For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://modelgates.ai/docs/_mcp/server.
Web Search
Server tools are currently in beta. The API and behavior may change.
The modelgates:web_search server tool gives any model on ModelGates access to real-time web information. When the model determines it needs current information, it calls the tool with a search query. ModelGates executes the search and returns results that the model uses to formulate a grounded, cited response.
How It Works
- You include { "type": "modelgates:web_search" } in your tools array.
- Based on the user's prompt, the model decides whether a web search is needed and generates a search query.
- ModelGates executes the search using the configured engine (defaults to auto, which uses native provider search when available or falls back to Exa).
- The search results (URLs, titles, and content snippets) are returned to the model.
- The model synthesizes the results into its response. It may search multiple times in a single request if needed.
Quick Start
```typescript
const response = await fetch('https://modelgates.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: 'What were the major AI announcements this week?'
      }
    ],
    tools: [
      { type: 'modelgates:web_search' }
    ]
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

```python
import requests

response = requests.post(
    "https://modelgates.ai/api/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {{API_KEY_REF}}",
        "Content-Type": "application/json",
    },
    json={
        "model": "{{MODEL}}",
        "messages": [
            {
                "role": "user",
                "content": "What were the major AI announcements this week?"
            }
        ],
        "tools": [
            {"type": "modelgates:web_search"}
        ]
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
```

```bash
curl https://modelgates.ai/api/v1/chat/completions \
  -H "Authorization: Bearer {{API_KEY_REF}}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{MODEL}}",
    "messages": [
      {
        "role": "user",
        "content": "What were the major AI announcements this week?"
      }
    ],
    "tools": [
      {"type": "modelgates:web_search"}
    ]
  }'
```

Configuration
The web search tool accepts optional parameters to customize search behavior:
```json
{
  "type": "modelgates:web_search",
  "parameters": {
    "engine": "exa",
    "max_results": 5,
    "max_total_results": 20,
    "search_context_size": "medium",
    "allowed_domains": ["example.com"],
    "excluded_domains": ["reddit.com"]
  }
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | auto | Search engine to use: auto, native, exa, firecrawl, or parallel |
| max_results | integer | 5 | Maximum results per search call (1–25). Applies to Exa, Firecrawl, and Parallel engines; ignored with native provider search |
| max_total_results | integer | none | Maximum total results across all search calls in a single request. Useful for controlling cost and context size in agentic loops |
| search_context_size | string | none | How much context to retrieve: low, medium, or high. For Exa, pins a fixed per-result character cap (5K/15K/30K); when omitted, Exa picks adaptively (~2-4K per result). For Parallel, controls total characters across all results (defaults to medium). Ignored with native provider search and Firecrawl |
| user_location | object | none | Approximate user location for location-biased results. Currently only supported by native provider search; ignored with Exa, Firecrawl, and Parallel (see below) |
| allowed_domains | string[] | none | Limit results to these domains. Supported by Exa, Firecrawl, Parallel, and most native providers (see domain filtering) |
| excluded_domains | string[] | none | Exclude results from these domains. Supported by Exa, Firecrawl, Parallel, and some native providers (see domain filtering) |
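The constraints in this table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any ModelGates SDK: build_web_search_tool is a hypothetical helper name, and the checks only mirror the ranges and values listed above (the server may validate differently).

```python
from typing import Any, Optional

# Values taken from the parameter table; the helper itself is hypothetical.
ENGINES = {"auto", "native", "exa", "firecrawl", "parallel"}
CONTEXT_SIZES = {"low", "medium", "high"}


def build_web_search_tool(
    engine: Optional[str] = None,
    max_results: Optional[int] = None,
    max_total_results: Optional[int] = None,
    search_context_size: Optional[str] = None,
) -> dict[str, Any]:
    """Return a tools-array entry, omitting parameters that were not set."""
    if engine is not None and engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    if max_results is not None and not 1 <= max_results <= 25:
        raise ValueError("max_results must be between 1 and 25")
    if search_context_size is not None and search_context_size not in CONTEXT_SIZES:
        raise ValueError("search_context_size must be low, medium, or high")

    candidates = {
        "engine": engine,
        "max_results": max_results,
        "max_total_results": max_total_results,
        "search_context_size": search_context_size,
    }
    parameters = {key: value for key, value in candidates.items() if value is not None}

    tool: dict[str, Any] = {"type": "modelgates:web_search"}
    if parameters:
        tool["parameters"] = parameters
    return tool
```

The returned dictionary can be placed directly in the tools list of a request body. Leaving a parameter unset keeps the documented server-side default instead of hard-coding it in the client.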
User Location
Pass an approximate user location to bias search results geographically:
```json
{
  "type": "modelgates:web_search",
  "parameters": {
    "user_location": {
      "type": "approximate",
      "city": "San Francisco",
      "region": "California",
      "country": "US",
      "timezone": "America/Los_Angeles"
    }
  }
}
```

All fields within user_location are optional.
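Because every field is optional, a client can send only what it knows. The sketch below illustrates that idea; build_user_location is a hypothetical helper, and always including "type": "approximate" is an assumption based on the example above rather than a documented requirement.

```python
from typing import Optional


def build_user_location(
    city: Optional[str] = None,
    region: Optional[str] = None,
    country: Optional[str] = None,
    timezone: Optional[str] = None,
) -> dict[str, str]:
    """Build a user_location object containing only the known fields."""
    # Assumption: "type" is always sent as "approximate", as in the example.
    location = {"type": "approximate"}
    known = (("city", city), ("region", region), ("country", country), ("timezone", timezone))
    for key, value in known:
        if value is not None:
            location[key] = value
    return location
```

For example, build_user_location(country="US") produces an object with only the type and country keys, which can be set as parameters.user_location on the tool entry.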
Engine Selection
The web search server tool supports multiple search engines:
- auto (default): Uses native search if the provider supports it, otherwise falls back to Exa
- native: Forces the provider's built-in web search (falls back to Exa with a warning if the provider doesn't support it)
- exa: Uses Exa's search API, which combines keyword and embeddings-based search. Returns Exa highlights (excerpts drawn from each page that are most relevant to the search query) rather than truncated page text. See the Exa section below.
- firecrawl: Uses Firecrawl's search API (BYOK: bring your own key)
- parallel: Uses Parallel's search API
Engine Capabilities
| Feature | Exa | Firecrawl | Parallel | Native |
|---|---|---|---|---|
| Domain filtering | Yes | Yes | Yes | Varies by provider |
| Context size control | Yes* | No | Yes** | No |
| API key | Server-side | BYOK (your key) | Server-side | Provider-handled |
\* Exa: limit applies per result

\*\* Parallel: limit applies as a total across all results
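Taken together with the parameter descriptions under Configuration, this table implies that some parameters have no effect on some engines. A minimal sketch of a client-side check follows; ignored_parameters and the IGNORED_BY_ENGINE mapping are hypothetical names, and the mapping is only a transcription of what this page states. With auto, the engine is resolved server-side, so nothing is reported.

```python
# Parameters this page describes as ignored, keyed by engine.
IGNORED_BY_ENGINE = {
    "native": {"max_results", "search_context_size"},
    "firecrawl": {"search_context_size", "user_location"},
    "exa": {"user_location"},
    "parallel": {"user_location"},
}


def ignored_parameters(engine: str, parameters: dict) -> list[str]:
    """Return the names of supplied parameters the given engine ignores."""
    ignored = IGNORED_BY_ENGINE.get(engine, set())
    return sorted(set(parameters) & ignored)
```

Logging the result before sending a request can surface settings that silently do nothing, such as max_results combined with engine set to native.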
Exa
ModelGates requests Exa highlights for each result rather than the text content option. Highlights are extractive excerpts drawn directly from the page that Exa selects as most relevant to the search query, typically yielding higher-quality context per token than truncated page text for agentic web tooling.
By default, Exa selects an adaptive highlight size per query and document — typically ~2,000–4,000 characters per result. To pin a larger fixed per-result budget, set search_context_size, which maps to Exa's contents.highlights.maxCharacters parameter:
- low: 5,000 characters per result
- medium: 15,000 characters per result
- high: 30,000 characters per result
When search_context_size is omitted, ModelGates lets Exa pick the highlight size adaptively. The selected excerpts are returned to the model on each result and surfaced to API callers via url_citation annotations. Within a single result, excerpts that come from different parts of the page are separated by Exa's [...] markers, so the content field of a url_citation annotation may look like:
```
First excerpt drawn from the page.[...]Second excerpt drawn from elsewhere in the same page.[...]Third excerpt.
```

Firecrawl (BYOK)
Firecrawl uses your own API key. To set it up:
- Go to your ModelGates plugin settings and select Firecrawl as the web search engine
- Accept the Firecrawl Terms of Service — this creates a Firecrawl account linked to your email
- Your account starts with 10,000 free credits (credits expire after 3 months)
Firecrawl searches use your Firecrawl credits directly — no additional charge from ModelGates. Firecrawl supports domain filtering (allowed_domains / excluded_domains), but they are mutually exclusive — you cannot use both in the same request.
Parallel
Parallel supports domain filtering and context size control (search_context_size), and uses ModelGates credits at $0.005 per request. Each request includes up to 10 results; additional results cost $0.001 each.
Domain Filtering
Restrict which domains appear in search results using allowed_domains and excluded_domains:
```json
{
  "type": "modelgates:web_search",
  "parameters": {
    "allowed_domains": ["arxiv.org", "nature.com"],
    "excluded_domains": ["reddit.com"]
  }
}
```

| Engine | allowed_domains | excluded_domains | Notes |
|---|---|---|---|
| Exa | Yes | Yes | Both can be used simultaneously |
| Parallel | Yes | Yes | Mutually exclusive |
| Firecrawl | Yes | Yes | Mutually exclusive |
| Native (Anthropic) | Yes | Yes | Mutually exclusive |
| Native (OpenAI) | Yes | No | excluded_domains silently ignored |
| Native (xAI) | Yes | Yes | Mutually exclusive |
| Native (Perplexity) | No | No | Not supported via server tool path |
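The support matrix above can be encoded as data and checked before sending a request. This is a sketch under stated assumptions: check_domain_filters is a hypothetical helper, and the keys such as "native:openai" are labels invented here for the table rows, not values accepted by the API.

```python
# (allowed_domains supported, excluded_domains supported, both allowed together)
DOMAIN_SUPPORT = {
    "exa": (True, True, True),
    "parallel": (True, True, False),
    "firecrawl": (True, True, False),
    "native:anthropic": (True, True, False),
    "native:openai": (True, False, False),
    "native:xai": (True, True, False),
    "native:perplexity": (False, False, False),
}


def check_domain_filters(target: str, allowed=None, excluded=None) -> list[str]:
    """Return human-readable problems for a filter combination, if any."""
    allowed_ok, excluded_ok, both_ok = DOMAIN_SUPPORT[target]
    problems = []
    if allowed and not allowed_ok:
        problems.append("allowed_domains is not supported")
    if excluded and not excluded_ok:
        problems.append("excluded_domains is not supported")
    if allowed and excluded and allowed_ok and excluded_ok and not both_ok:
        problems.append("allowed_domains and excluded_domains are mutually exclusive")
    return problems
```

An empty list means the combination is consistent with the table. For OpenAI native search the helper reports excluded_domains as unsupported, which matches the table's note that the parameter is silently ignored.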
Controlling Total Results
When the model searches multiple times in a single request, use max_total_results to cap the cumulative number of results:
```json
{
  "type": "modelgates:web_search",
  "parameters": {
    "max_results": 5,
    "max_total_results": 15
  }
}
```

Once the limit is reached, subsequent search calls return a message telling the model the limit was hit instead of performing another search. This is useful for controlling cost and context window usage in agentic loops.
Works with the Responses API
The web search server tool also works with the Responses API:
```typescript
const response = await fetch('https://modelgates.ai/api/v1/responses', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    input: 'What is the current price of Bitcoin?',
    tools: [
      { type: 'modelgates:web_search', parameters: { max_results: 3 } }
    ]
  }),
});

const data = await response.json();
console.log(data);
```

```python
import requests

response = requests.post(
    "https://modelgates.ai/api/v1/responses",
    headers={
        "Authorization": f"Bearer {{API_KEY_REF}}",
        "Content-Type": "application/json",
    },
    json={
        "model": "{{MODEL}}",
        "input": "What is the current price of Bitcoin?",
        "tools": [
            {"type": "modelgates:web_search", "parameters": {"max_results": 3}}
        ]
    }
)

data = response.json()
print(data)
```

Usage Tracking
Web search usage is reported in the response usage object:
```json
{
  "usage": {
    "input_tokens": 105,
    "output_tokens": 250,
    "server_tool_use": {
      "web_search_requests": 2
    }
  }
}
```

The web_search_requests field counts the total number of search queries the model made during the request.
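Since the model may decide not to search at all, it is reasonable to read this field defensively. The sketch below assumes the parsed response is a plain dictionary shaped like the example above and treats a missing server_tool_use object as zero searches; that fallback is an assumption, not documented behavior, and web_search_request_count is a hypothetical helper name.

```python
def web_search_request_count(response: dict) -> int:
    """Return the number of web searches reported in a parsed response body."""
    usage = response.get("usage") or {}
    server_tool_use = usage.get("server_tool_use") or {}
    # Assumption: an absent field means the model made no search calls.
    return int(server_tool_use.get("web_search_requests", 0))
```

Calling it with the example payload above returns 2, and calling it with a response that has no usage object returns 0.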
Pricing
| Engine | Pricing |
|---|---|
| Exa | $0.005 per request using ModelGates credits. Includes up to 10 results, then $0.001 per additional result |
| Parallel | $0.005 per request using ModelGates credits. Includes up to 10 results in a request, then $0.001 per additional result |
| Firecrawl | Uses your Firecrawl credits directly — no ModelGates charge |
| Native | Passed through from the provider (OpenAI, Anthropic, Perplexity, xAI) |
All pricing is in addition to standard LLM token costs for processing the search result content.
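The Exa and Parallel rates can be turned into a rough search-cost estimate. This is a sketch of the arithmetic only, assuming the per-request fee and the per-additional-result fee apply independently to each search call; estimate_search_cost is a hypothetical helper, and LLM token costs, Firecrawl credits, and native provider pricing are not modeled.

```python
from decimal import Decimal

# Rates from the pricing table for Exa and Parallel (USD, ModelGates credits).
BASE_FEE = Decimal("0.005")         # per search request
INCLUDED_RESULTS = 10               # results covered by the base fee
EXTRA_RESULT_FEE = Decimal("0.001")  # per result beyond the included ones


def estimate_search_cost(results_per_request: list[int]) -> Decimal:
    """Estimate the search cost for a list of per-request result counts."""
    total = Decimal("0")
    for count in results_per_request:
        extra = max(0, count - INCLUDED_RESULTS)
        total += BASE_FEE + EXTRA_RESULT_FEE * extra
    return total
```

For two searches returning 5 and 12 results, the estimate is 0.005 + (0.005 + 2 × 0.001) = 0.012 USD. Decimal is used instead of float so that small per-result fees add up without rounding drift.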
Migrating from the Web Search Plugin
The web search plugin (plugins: [{ id: "web" }]) and the :online variant are deprecated. Use the modelgates:web_search server tool instead.
The key differences:
| | Web Search Plugin (deprecated) | Web Search Server Tool |
|---|---|---|
| How to enable | plugins: [{ id: "web" }] | tools: [{ type: "modelgates:web_search" }] |
| Who decides to search | Always searches once | Model decides when/whether to search |
| Call frequency | Once per request | 0 to N times per request |
| Engine options | Native, Exa, Firecrawl, Parallel | Auto, Native, Exa, Firecrawl, Parallel |
| Domain filtering | Yes (Exa, Parallel, some native) | Yes (Exa, Firecrawl, Parallel, most native) |
| Context size control | Via web_search_options | Via search_context_size parameter |
| Total results cap | No | Yes (max_total_results) |
| Pricing | Varies by engine | Varies by engine (same rates) |
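For codebases with many request builders, the switch from the plugin to the server tool can be done mechanically. The following is a minimal sketch: migrate_request is a hypothetical helper, and it only maps the plugin keys that appear on this page (engine, max_results, and include_domains, which becomes allowed_domains). Any other plugin options would need to be handled by hand.

```python
# Plugin keys shown on this page and their server-tool parameter names.
KEY_MAP = {
    "engine": "engine",
    "max_results": "max_results",
    "include_domains": "allowed_domains",
}
ONLINE_SUFFIX = ":online"


def migrate_request(body: dict) -> dict:
    """Convert a request body using the web plugin or :online to the server tool."""
    migrated = {key: value for key, value in body.items() if key != "plugins"}
    tools = list(migrated.get("tools", []))
    parameters = {}
    wants_search = False

    model = migrated.get("model", "")
    if model.endswith(ONLINE_SUFFIX):
        migrated["model"] = model[: -len(ONLINE_SUFFIX)]
        wants_search = True

    other_plugins = []
    for plugin in body.get("plugins", []):
        if plugin.get("id") == "web":
            wants_search = True
            for old_key, new_key in KEY_MAP.items():
                if old_key in plugin:
                    parameters[new_key] = plugin[old_key]
        else:
            other_plugins.append(plugin)  # leave unrelated plugins untouched
    if other_plugins:
        migrated["plugins"] = other_plugins

    if wants_search:
        tool = {"type": "modelgates:web_search"}
        if parameters:
            tool["parameters"] = parameters
        tools.append(tool)
        migrated["tools"] = tools
    return migrated
```

The helper returns a new dictionary and leaves the input unchanged, so it can be applied to stored request templates and the output diffed before rollout. Note that the behavior changes as described in the table: the model decides whether to search, so a migrated request may perform zero searches.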
Migration example
```json
// Before (deprecated)
{
  "model": "openai/gpt-5.2",
  "messages": [...],
  "plugins": [{ "id": "web", "max_results": 3 }]
}

// After
{
  "model": "openai/gpt-5.2",
  "messages": [...],
  "tools": [
    { "type": "modelgates:web_search", "parameters": { "max_results": 3 } }
  ]
}
```

```json
// Before (deprecated): engine and domain filtering
{
  "model": "openai/gpt-5.2",
  "messages": [...],
  "plugins": [{
    "id": "web",
    "engine": "exa",
    "max_results": 5,
    "include_domains": ["arxiv.org"]
  }]
}

// After
{
  "model": "openai/gpt-5.2",
  "messages": [...],
  "tools": [{
    "type": "modelgates:web_search",
    "parameters": {
      "engine": "exa",
      "max_results": 5,
      "allowed_domains": ["arxiv.org"]
    }
  }]
}
```

```json
// Before (deprecated): :online variant
{
  "model": "openai/gpt-5.2:online"
}

// After
{
  "model": "openai/gpt-5.2",
  "tools": [{ "type": "modelgates:web_search" }]
}
```

Next Steps
- Server Tools Overview — Learn about server tools
- Datetime — Get the current date and time
- Tool Calling — Learn about user-defined tool calling