Browser Use Agent

The Browser Use Agent operates a web browser autonomously: it uses a large language model (LLM) to interpret a goal written in natural language and to decide which browser actions to perform.

API KEY

Enter an OpenAI or Google API key. The %VARIABLE% format is supported.

OpenAI API Key: Refer to https://platform.openai.com/api-keys
Google API Key: Refer to https://ai.google.dev/gemini-api/docs/api-key
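
To illustrate the %VARIABLE% format mentioned above, here is a minimal Python sketch of how such a placeholder might be resolved. The helper name expand_variables, the variable name OPENAI_API_KEY, and the rule that unknown names are left untouched are all assumptions made for this example; the product's actual substitution rules are not described on this page.

```python
import re

# Matches placeholders such as %OPENAI_API_KEY% (assumed naming rule, not a documented one).
_PLACEHOLDER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_variables(value: str, variables: dict[str, str]) -> str:
    """Replace each %NAME% with variables[NAME]; unknown names are left as-is."""

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(_lookup, value)


# Hypothetical usage: the field holds a placeholder instead of the raw key.
api_key_field = "%OPENAI_API_KEY%"
resolved = expand_variables(api_key_field, {"OPENAI_API_KEY": "sk-example"})
```

Storing the key in a variable and entering only the placeholder keeps the secret out of the scenario definition itself.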

MODEL

The following model types are currently supported:

Platform    Model
OpenAI      o3, gpt-4.1, gpt-4.1-mini
Google      gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash

Generally, the models listed above are billed based on usage. For billing details, refer to:
OpenAI: https://openai.com/pricing
Google Gemini: https://ai.google.dev/gemini-api/docs/pricing

MAX ACTION

Defines the maximum number of actions the agent can perform. Range: 1–100.

GUARDRAIL

Restricts where the agent is allowed to operate:

SAME PAGE: the agent is confined to the current page
SAME SITE: the agent is confined to pages within the same domain
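
The two guardrail modes can be pictured as a URL check performed before each navigation. The following is only a sketch under stated assumptions: the function is_allowed is hypothetical, "same page" is taken to mean the same URL ignoring the fragment, and "same domain" is taken to mean an exact hostname match. The agent's real rules (for example, how subdomains are treated) may differ.

```python
from urllib.parse import urlsplit


def _page_key(url: str) -> tuple:
    """Identify a page by scheme, host, port, path, and query (fragment ignored)."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port, parts.path or "/", parts.query)


def is_allowed(start_url: str, target_url: str, guardrail: str) -> bool:
    """Return True if navigating from start_url to target_url respects the guardrail."""
    if guardrail == "SAME PAGE":
        return _page_key(start_url) == _page_key(target_url)
    if guardrail == "SAME SITE":
        start_host = urlsplit(start_url).hostname
        # Assumption: "same domain" means an identical hostname.
        return start_host is not None and start_host == urlsplit(target_url).hostname
    raise ValueError(f"unknown guardrail: {guardrail!r}")


start = "https://example.com/search"
same_page = is_allowed(start, "https://example.com/search#results", "SAME PAGE")
other_page = is_allowed(start, "https://example.com/about", "SAME PAGE")
same_site = is_allowed(start, "https://example.com/about", "SAME SITE")
other_site = is_allowed(start, "https://other.example.org/", "SAME SITE")
```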

Because model usage is billed by consumption, set the MAX ACTION and GUARDRAIL parameters deliberately to avoid unexpected charges.

GOAL

Define the task using natural language.

ACTION

The next browser action determined and executed by the model. Each step is automatically captured as a screenshot and stored in the working directory for tracking and review.
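
How GOAL, MAX ACTION, and ACTION fit together can be sketched as a bounded loop. This is an illustrative outline, not the agent's implementation: run_agent, next_action, and capture are hypothetical names, the model call is replaced by a scripted stand-in, and the screenshot step is reduced to a callback.

```python
from typing import Callable, Optional


def run_agent(
    goal: str,
    next_action: Callable[[str, list[str]], Optional[str]],
    max_actions: int,
    capture: Callable[[int, str], None],
) -> list[str]:
    """Ask for actions until the model stops (returns None) or the limit is reached."""
    if not 1 <= max_actions <= 100:
        raise ValueError("MAX ACTION must be between 1 and 100")
    history: list[str] = []
    for step in range(1, max_actions + 1):
        action = next_action(goal, history)  # stand-in for the LLM decision
        if action is None:  # the model reports that the goal is complete
            break
        history.append(action)
        capture(step, action)  # stand-in for saving a screenshot of this step
    return history


# Hypothetical usage with a scripted "model" that wants three actions.
script = iter(["open page", "type query", "click search"])
captured: list[tuple[int, str]] = []
performed = run_agent(
    goal="Search for a product",
    next_action=lambda goal, history: next(script, None),
    max_actions=2,
    capture=lambda step, action: captured.append((step, action)),
)
```

With max_actions set to 2, the loop stops after the second action even though the scripted model has a third one queued, which is how the limit caps the number of billed model calls.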

Example

Goal: Operate the search engine in the webpage below using specific options

Use the natural language prompt to guide the model

Result:

