Google prepares Jarvis to fight the AI ‘computer use’ war

Google is throwing its hat into the ring in the AI “computer use” war, joining players such as Anthropic and OpenAI in an effort to gain a share of the nascent but fast-evolving market for AI-based automation driven by agentic AI.

The company is building Jarvis, which will let users automate tasks such as research and shopping in the Chrome browser using the company’s Gemini 2.0 large language model (LLM), according to The Information.

To control actions and complete tasks in the browser, Jarvis would combine multiple LLM-based techniques, such as reading and understanding screenshots, generating text, and simulating user interactions, according to sources quoted by The Information.

Google’s effort to automate user tasks with LLM-powered AI closely resembles the “computer use” ability Anthropic released last week. Experts believe the technology could revolutionize the automation market once it ships as a finished product, given how much work is still done on computers.

Anthropic’s “computer use” ability enables developers to instruct Claude 3.5 Sonnet, through the Anthropic API, to read and interpret what’s on the display, type text, move the cursor, click buttons, and switch between windows or applications. Today’s robotic process automation (RPA) tools can be instructed to do much the same, but far more laboriously.
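To make the idea concrete, the observe, decide, act cycle behind such agents can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic’s or Google’s implementation: `FakeScreen`, `stub_model`, and `run_agent` are hypothetical names, the “screenshot” is a text description rather than pixels, and the model call is replaced by a hard-coded stub.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real display; records what the agent did."""
    log: list = field(default_factory=list)
    typed: str = ""

    def screenshot(self) -> str:
        # A real agent would capture pixels; here we return a text description.
        return f"search box contains: {self.typed!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.typed += action.text
            self.log.append(f"type({action.text!r})")


def stub_model(goal: str, observation: str, step: int) -> Action:
    """Hypothetical placeholder for an LLM call that picks the next action."""
    if step == 0:
        return Action("click", x=200, y=40)   # assumed position of a search box
    if goal not in observation:
        return Action("type", text=goal)
    return Action("done")


def run_agent(goal: str, screen: FakeScreen, max_steps: int = 10) -> list:
    for step in range(max_steps):
        observation = screen.screenshot()              # 1. observe
        action = stub_model(goal, observation, step)   # 2. decide
        if action.kind == "done":
            break
        screen.apply(action)                           # 3. act
    return screen.log


screen = FakeScreen()
history = run_agent("running shoes", screen)
```

In a real system, the stub would be replaced by a model call that receives the screenshot and returns the next action, and `apply` would drive an actual browser or desktop.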

While Jarvis seems to be aimed at consumers, the technology could also be used across enterprises, given that many tools for development, workflow and automation management, CRM, and ERP are accessed in the browser through web-based clients or interfaces.

In fact, Google may have worked out how to determine coordinates from a screenshot or image well before Anthropic did, if Simon Willison, the co-creator of the Django web framework, is to be believed.

However, Anthropic may be the first to bring that ability to market in combination with other capabilities for controlling computers with AI-based agents and LLMs.

OpenAI has reportedly been working on developing a similar capability since February.

Separately, in a LinkedIn post, software expert Martin Bechard claimed that OpenAI has already developed a feature, named Tools, that follows the same underlying principles as Anthropic’s computer use capability.

Microsoft, Meta, and Apple are also in on the act.

Earlier this month, Microsoft showcased Vision, a new Copilot capability that can read and understand images and answer questions about them. Apple, meanwhile, has been working on bringing automation capabilities to its virtual assistant Siri through its Apple Intelligence updates.

Facebook-parent Meta, on the other hand, has been working to shrink its LLMs so they can run on smartphones.

Earlier this year, Google showcased several new AI-based features in the Chrome browser, including the ability to use AI to compare information between two tabs and surface suggestions on grouping similar tabs together.
