Browser Use Agent
The Browser Use Agent autonomously operates a web browser by interpreting natural language using large language models (LLMs).

API KEY
Enter an OpenAI or Google API key. The %VARIABLE% format is supported.
OpenAI API Key: Refer to https://platform.openai.com/api-keys
Google API Key: Refer to https://ai.google.dev/gemini-api/docs/api-key
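The %VARIABLE% format lets the key be supplied indirectly instead of being typed into the field. How the product resolves variables is not described here, so the following is only a minimal sketch of the general idea, assuming variables live in a plain dictionary. The function name expand_variables and the variable name OPENAI_KEY are hypothetical.

```python
import re

# Matches placeholders such as %OPENAI_KEY% (assumed naming rule, not the product's).
_PLACEHOLDER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_variables(text, variables):
    """Replace each %NAME% in text with variables[NAME].

    Unknown names are left untouched so a missing variable is easy to spot.
    """
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical usage: the API KEY field holds a placeholder, not the key itself.
api_key = expand_variables("%OPENAI_KEY%", {"OPENAI_KEY": "sk-example"})
```

Keeping the key in a variable means the same flow can be shared without exposing the secret.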
MODEL
The following models are currently supported:
OpenAI: o3, gpt-4.1, gpt-4.1-mini
Google Gemini: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
Generally, the models listed above are billed based on usage. For billing details, refer to:
OpenAI: https://openai.com/pricing
Google Gemini: https://ai.google.dev/gemini-api/docs/pricing
MAX ACTION
Defines the maximum number of actions the agent can perform. Range: 1–100.
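As an illustration of what an action limit does, the sketch below stops a loop once the budget is spent, even if the task is unfinished. It is not the agent's actual implementation: run_with_budget and the next_action callable are hypothetical stand-ins for the model deciding each step.

```python
def run_with_budget(next_action, max_action):
    """Call next_action() until it returns None or max_action steps have run."""
    # Range taken from the MAX ACTION description: 1 to 100.
    if not 1 <= max_action <= 100:
        raise ValueError("MAX ACTION must be between 1 and 100")

    performed = []
    for _ in range(max_action):
        action = next_action()
        if action is None:  # the hypothetical planner signals that the goal is met
            break
        performed.append(action)
    return performed


# Hypothetical usage: five steps are planned, but only three are allowed.
planned = iter(["open page", "type query", "click search", "scroll", "read result"])
steps = run_with_budget(lambda: next(planned, None), 3)
```

A lower limit caps the number of model calls, and therefore the cost, at the price of possibly stopping before the goal is reached.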
GUARDRAIL
Restricts where the agent is allowed to navigate:
SAME PAGE: the agent is confined to the current page
SAME SITE: the agent is confined to pages within the same domain
Because model usage is billed by consumption, set MAX ACTION and GUARDRAIL deliberately to avoid unexpected charges.
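The two guardrail modes can be thought of as a check applied to each URL before the agent navigates. The sketch below is one possible interpretation, not the product's actual rule: it assumes "same page" means the same URL ignoring the fragment, and "same domain" means an exact hostname match (subdomains may be treated differently in practice). The function name is_allowed is hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed(start_url, target_url, guardrail):
    """Return True if target_url is permitted under the given guardrail mode."""
    start, target = urlsplit(start_url), urlsplit(target_url)

    if guardrail == "SAME PAGE":
        # Assumption: in-page anchors (#fragment) still count as the same page.
        return (start.scheme, start.netloc, start.path, start.query) == (
            target.scheme, target.netloc, target.path, target.query)

    if guardrail == "SAME SITE":
        # Assumption: "same domain" is an exact, case-insensitive hostname match.
        return start.hostname == target.hostname

    raise ValueError(f"unknown guardrail: {guardrail!r}")


# Hypothetical usage with placeholder URLs.
start = "https://example.com/search"
same_page_ok = is_allowed(start, "https://example.com/search#results", "SAME PAGE")
same_site_ok = is_allowed(start, "https://example.com/about", "SAME SITE")
```

SAME PAGE is the stricter setting and suits form filling or in-page search; SAME SITE allows the agent to follow links as long as it stays on the starting domain.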
GOAL
Define the task using natural language.
ACTION
The next browser action, as determined and executed by the model. Each step is automatically captured as a screenshot and stored in the working directory for tracking and review.
Example
Goal: Operate the search engine in the webpage below using specific options

Use the natural language prompt to guide the model

Result:
