
The industry buzzword is "Robotic Process Automation", which as a category of products has been focused on using various forms of ML/AI to glue these things together in a common/structured way (in addition to good old fashioned screen scraping).

Up to this point, these products have been quite brittle. The recent explosion of AI tech seems like quite a boon for this space.



I totally agree on all points, especially around what AI means for this.

I'm kind of in a happy accident situation because I was working on something for RPA, which then became a layer that was factored as its own product, but now might be able to come full circle as a result of AI.

Essentially this layer can function as a "delivery medium" for RPA agent creation that you can use on any device without a download. However, as it has many other uses I've been working on those, but I've been seeking a great reason to get back into RPA.

I have a cool idea to leverage human-guided AI creation of data maps and action tours for RPA, but similar to what you say, unless great care is taken you can end up with a brittle approach. Also, as the market has been quite saturated with many reasonable approaches, I just haven't felt compelled.

Yet now I think the possible merging of GPT level AIs with browser instrumentation to deliver an augmented way to browse the web makes that incredibly compelling.

So I'm incredibly thrilled that I have this happy accident of BrowserBox^0 (the factored-out layer originally from the RPA work above), which provides a pluggable/iframe-embeddable interface for remotely controlling a headless browser. So now I want to look at unifying BrowserBox with this kind of GPT-driven exploration.

It's even cooler because BB enables co-browsing by default (multiplayer browsing) and turns the browser into a "client-server" architecture. I can see that plugging in GPT-4V as a connecting client, with some kind of minimal API affordance for it to use (like the very cool Vimium keyboard-enabled browsing in the OP), would be such an interesting project to try!
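To make the "minimal API affordance" idea a bit more concrete, here is a rough sketch of what a tiny action vocabulary for a model client might look like. Everything in it is hypothetical: the action names, the JSON shape, and the `parse_action` helper are assumptions for illustration and are not BrowserBox's actual API. The `target` field is imagined as a Vimium-style link hint.

```python
import json
from dataclasses import dataclass

# Hypothetical action vocabulary; not taken from BrowserBox or Vimium.
ALLOWED_ACTIONS = {"click", "type", "navigate", "scroll"}


@dataclass
class Action:
    kind: str         # one of ALLOWED_ACTIONS
    target: str = ""  # e.g. a Vimium-style hint label, or a URL for "navigate"
    text: str = ""    # text to enter for "type"


def parse_action(reply: str) -> Action:
    """Validate a model's JSON reply against the small action vocabulary."""
    data = json.loads(reply)
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    return Action(
        kind=kind,
        target=str(data.get("target", "")),
        text=str(data.get("text", "")),
    )
```

Keeping the vocabulary this small means anything the model emits outside of it is rejected before it ever reaches the browser.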

We're open source, so if you want to check us out or join this quest, come say hi!

0: https://github.com/BrowserBox/BrowserBox


I have watched your project for a while as a possible option for embedded browsers in XR applications like WebXR, but the high licensing cost was a factor, and solutions like Hyperbeam or Vuplex in Unity have been possible. Definitely agree that multimodal LLM integration is a huge opportunity, and multiplayer browsing with AI in realtime is a super cool idea if you package it right.


Hi jimmySixDOF, thank you for the kind words and the attention on our project! :)

Regarding pricing, we have heard that feedback over time and gradually adjusted our licensing costs. It should now be much more affordable, as it is targeted towards large deployments, with decreasing cost and increasing value at scale.

If you'd like to email any thoughts on our current prices (listed at https://dosyago.com) to cris@dosyago.com, I'd highly value it!

Your idea of WebXR and embedding within Unity is very interesting, and I think it could be a fit.


In the OP's specific instance, when would you reach for a traditional ETL tool vs an RPA solution?


RPA is for data sources and destinations that are meant for human consumption and entry. So you’d use RPA to take an image of a table and enter every row into a web form.
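As a rough illustration of the "table rows into a web form" half of that, here is a minimal sketch. It assumes the image has already been turned into rows by some OCR step, and the column names, form field names, and sample values are all made up for the example. It only builds the form bodies and does not submit anything.

```python
from urllib.parse import urlencode

# Hypothetical rows, shaped the way an OCR step might return a table image.
ROWS = [
    {"name": "Ada Lovelace", "amount": "120.50"},
    {"name": "Alan Turing", "amount": "99.00"},
]

# Hypothetical mapping from table column to the web form's field name.
FIELD_MAP = {"name": "customer_name", "amount": "invoice_total"}


def row_to_form_body(row, field_map=FIELD_MAP):
    """Translate one table row into a URL-encoded form body."""
    form = {field_map[col]: value.strip() for col, value in row.items()}
    return urlencode(form)


def rows_to_form_bodies(rows):
    """Build one form submission body per table row."""
    return [row_to_form_body(row) for row in rows]
```

A real RPA tool would drive the form through the UI instead of posting bodies directly, but the row-to-field mapping step looks much the same either way.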


How much does the involvement of a bank of fax machines complicate things?


A little perhaps, but not much. One can replace a bank of physical fax machines with modems.

It would be an interesting job for sure. Why wasn't it done before? I can imagine only two reasons. One, there isn't that much data to move and it makes no sense to build software for something a few people spend 30 minutes a day on. Two, the data in the legacy system is images, and people are not just moving it between systems but also doing categorisation, verification, etc. In which case an AI model may be useful, but almost always hard-coded rules will be faster.
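For a sense of what "hard-coded rules" could mean here, a minimal sketch, assuming the fax text has already been extracted. The categories and patterns are invented for illustration; real rules would depend entirely on the documents involved.

```python
import re

# Hypothetical categories and patterns; the first matching rule wins.
RULES = [
    ("invoice", re.compile(r"\binvoice\b|\bamount due\b", re.IGNORECASE)),
    ("referral", re.compile(r"\breferr(al|ed)\b", re.IGNORECASE)),
]


def categorise(text):
    """Return the first matching category, or flag the document for a human."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "needs_review"
```

Anything the rules don't recognise falls through to "needs_review", which is where a human (or a model) would pick it up.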



