A new open-source AI framework powered by GPT-4V takes screenshots as input and outputs mouse clicks and keyboard commands, much as a human user would. The framework represents a major step toward sophisticated AI agents that replace conventional computer interfaces.
Late nights with his newborn daughter led OthersideAI developer Josh Bickett to a breakthrough idea for an AI system that can operate a computer on its own. As Bickett told VentureBeat, “I’ve been enjoying time with my four-week-old daughter, but I also had a little time and this idea kind of came to me because I saw different demos of GPT-4 vision. The thing we’re working on now can actually happen with GPT-4 vision.”
With his daughter in one arm, Bickett sketched out the basic framework. OthersideAI CEO Matt Shumer recognized its huge potential. “This is a milestone toward getting the equivalent of a self-driving car but for a computer,” Shumer said. “We have the sensors now. We have the LIDAR systems. Next we build the intelligence.”
The framework works by capturing screenshots and emitting mouse clicks and keyboard commands, as a human user would. As more advanced AI models are plugged in, the aim is for computers to handle all interactions through conversational commands.
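The screenshot-in, action-out loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the framework's actual code: the names `Action`, `run_agent`, `capture`, `decide` and `execute` are hypothetical, and the vision model and the operating-system input layer are replaced with simple stand-ins so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to perform (hypothetical structure)."""
    kind: str        # "click", "type", or "done"
    x: int = 0       # screen coordinates, used by "click"
    y: int = 0
    text: str = ""   # characters to enter, used by "type"


def run_agent(
    objective: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the model for the next action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                       # observe
        action = decide(objective, screenshot, history)  # decide
        if action.kind == "done":                    # model reports completion
            break
        execute(action)                              # act
        history.append(action)
    return history


def demo() -> List[Action]:
    """Run the loop with stand-ins for the screen, the model and the input layer."""
    script = [
        Action("click", x=120, y=40),
        Action("type", text="hello"),
        Action("done"),
    ]

    def capture() -> bytes:
        # Assumption: a real agent would grab actual screen pixels here.
        return b"fake-screenshot"

    def decide(objective: str, screenshot: bytes, history: List[Action]) -> Action:
        # Assumption: a real agent would send the screenshot to a vision model.
        return script[len(history)]

    def execute(action: Action) -> None:
        # Assumption: a real agent would move the mouse or press keys here.
        print(f"executing {action.kind}")

    return run_agent("write hello", capture, decide, execute)


if __name__ == "__main__":
    demo()
```

In a working agent, `decide` would be the call to a vision-capable model and `execute` would drive the real mouse and keyboard. The `max_steps` cap is a design choice in this sketch so that an unreliable model cannot loop indefinitely.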
As Shumer said, “Once this thing is sufficiently reliable, it is going to be your computer. It is going to be your interface to the digital world.” Different specialized models may emerge for speed, complex tasks, and enterprise or consumer use. The goal is models that can take over tedious tasks so that “somebody who can barely use a computer from the beginning can do it.”
Bickett believes the open-source framework will fuel worldwide experimentation. Realizing the vision will require immense resources, and funding is already flowing to related work: AI company Imbue secured $150 million to build a platform for developing reasoning models, which Imbue CEO Kanjun Qiu called “the core blocker to agents that work really well.”
The self-operating framework points toward an era of sophisticated AI agents that replace conventional computer interfaces with ordinary language. Late nights may spark ideas, but focused work can realize the vision of computers that “just work” for anyone, anywhere.