# Chat Completions

Chat Completions accepts the same request shape as OpenAI's `/v1/chat/completions`, with a set of CoinMarketCap built-in tools the model can call for live crypto data. When the model invokes a built-in, the server runs it against CMC's data and includes the result on the response under `cmc.tool_traces[]`. You pay each model provider's published rate, plus a 10% surcharge on any request whose `tools` array contains a CMC built-in (regardless of whether the model actually invokes it).

To migrate from OpenAI: swap the base URL, change the auth header to `X-CMC_PRO_API_KEY`, replace the OpenAI model name with a CMC model identifier (e.g. `cmc-ai-v1-gpt-5.1`), and read the new `cmc` object (carrying `cost` and optional `tool_traces[]`) on successful responses. See [Migrating from OpenAI](#migrating-from-openai) for the field-level differences.

:::caution{title="Limited access"}

Currently available only to selected enterprise customers. [Request access](https://support.coinmarketcap.com/hc/en-us/requests/new?ticket_form_id=360001156492).

:::

## Availability

To request access, [fill out the access request form](https://support.coinmarketcap.com/hc/en-us/requests/new?ticket_form_id=360001156492) and tell us about your use case. We'll follow up.

Once your account is enabled, your existing `X-CMC_PRO_API_KEY` works on this endpoint just as it does on any other CoinMarketCap Pro endpoint. No key change is needed.

## Quick start

<CodeTabs>

```bash title="curl"
curl https://pro-api.coinmarketcap.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-CMC_PRO_API_KEY: <your_api_key>" \
  -d '{
    "model": "cmc-ai-v1-gpt-5.1",
    "messages": [
      {"role": "user", "content": "What is Bitcoin?"}
    ]
  }'
```

```python title="Python"
from openai import OpenAI

client = OpenAI(
    api_key="placeholder",
    base_url="https://pro-api.coinmarketcap.com/v1",
    default_headers={"X-CMC_PRO_API_KEY": "<your_api_key>"},
)

response = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "user", "content": "What is Bitcoin?"}],
)

print(response.choices[0].message.content)
```

```javascript title="Node.js"
const response = await fetch("https://pro-api.coinmarketcap.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-CMC_PRO_API_KEY": "<your_api_key>",
  },
  body: JSON.stringify({
    model: "cmc-ai-v1-gpt-5.1",
    messages: [{ role: "user", content: "What is Bitcoin?" }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

</CodeTabs>

The response carries the assistant's reply under `choices[0].message.content` and the request's cost under `cmc.cost`.

## Authentication

Pass your API key in the `X-CMC_PRO_API_KEY` header on every request. This is the same key you use for every other CoinMarketCap Pro API endpoint.

```http
X-CMC_PRO_API_KEY: <your_api_key>
```

If you don't have a key yet, sign in at [pro.coinmarketcap.com](https://pro.coinmarketcap.com/login) and copy it from the dashboard.

## Endpoint

```text
POST https://pro-api.coinmarketcap.com/v1/chat/completions
```

## Headers

| Name | Required | Description |
| --- | --- | --- |
| `X-CMC_PRO_API_KEY` | Yes | Your CoinMarketCap Pro API key. |
| `Content-Type` | Yes | Must be `application/json`. |
| `Accept` | No | Use `text/event-stream` when `stream: true`. Otherwise `application/json`. |
| `x-request-id` | No | Optional client-supplied request identifier. For support tickets, use the response body's `id` (`chatcmpl-...`) and the `x-server-traceid` response header. |

## Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | `string` | Yes | Model ID. The currently available identifier is `cmc-ai-v1-gpt-5.1`. Additional frontier closed-source (Claude Opus, Gemini) and open-source (DeepSeek, GLM) models are on our roadmap, with their own identifier strings. Contact us if you need a specific provider. The response `model` field returns the underlying model that served the request and may include a date suffix. |
| `messages` | `array` | Yes | The conversation so far, as a non-empty list of message objects. |
| `stream` | `boolean` | No | Stream the response as Server-Sent Events. Defaults to `false`. |
| `temperature` | `number` | No | Sampling temperature between `0` and `2`. Out-of-range values return a validation error. |
| `max_completion_tokens` | `integer` | No | Cap on tokens generated for the completion. Minimum effective value is around 15. Setting it too low (e.g. `1`) returns `400` with a message about needing higher `max_tokens`. |
| `tools` | `array` | No | Tools the model is allowed to call. See [Tools](#tools). |
| `tool_choice` | `string` | No | `auto` (default), `required`, or `none`. Unknown string values fall back to the default. The OpenAI object form (e.g. `{"type": "function", "function": {"name": "..."}}`) is not supported and returns HTTP 200 with CMC's standard error envelope (`status.error_code: "500"`, `error_message: "The system is busy..."`). Treat that response as a request-shape error. |
| `response_format` | `object` | No | Force structured output. Accepts `{"type": "json_object"}` and `{"type": "json_schema", "json_schema": {...}}`. |
| `stream_options` | `object` | No | Options for streaming. Accepts `{"include_usage": true}` for forward compatibility. The final SSE frame includes `usage` for every stream. |
| `reasoning_effort` | `string` | No | One of `minimal`, `low`, `medium`, `high`. Higher values let the model spend more tokens reasoning before answering, which can improve accuracy on complex prompts at the cost of latency and tokens. Populates `usage.reasoning_tokens`. Unknown string values fall back to the default. |
| `previous_response_id` | `string` | No | The `id` of a prior response to resume from. Send as a top-level field in the request body. Required when sending a `tool` message in response to a custom tool call. Use it to ask the model for an answer that uses the trace data after a CMC built-in runs. See [Tool call resumption](#tool-call-resumption). |
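
Several of the constraints in the table can be checked client-side before a request is sent. The sketch below is a hypothetical helper (`validate_request` is not part of any SDK or of the CMC API) and encodes only rules stated above: required fields, the `temperature` range, and the accepted string values for `tool_choice` and `reasoning_effort`.

```python
ALLOWED_TOOL_CHOICES = {"auto", "required", "none"}
ALLOWED_REASONING_EFFORTS = {"minimal", "low", "medium", "high"}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model is required")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    temperature = body.get("temperature")
    if temperature is not None:
        in_range = isinstance(temperature, (int, float)) and 0 <= temperature <= 2
        if not in_range:
            problems.append("temperature must be between 0 and 2")
    tool_choice = body.get("tool_choice")
    if tool_choice is not None:
        if not isinstance(tool_choice, str):
            # The object form is answered with an error envelope, so flag it early.
            problems.append("tool_choice object form is not supported")
        elif tool_choice not in ALLOWED_TOOL_CHOICES:
            problems.append("unknown tool_choice falls back to the default")
    effort = body.get("reasoning_effort")
    if effort is not None and effort not in ALLOWED_REASONING_EFFORTS:
        problems.append("unknown reasoning_effort falls back to the default")
    return problems


body = {
    "model": "cmc-ai-v1-gpt-5.1",
    "messages": [{"role": "user", "content": "What is Bitcoin?"}],
    "temperature": 3.0,
}
print(validate_request(body))  # ['temperature must be between 0 and 2']
```

The server remains the source of truth. A check like this only catches mistakes that would otherwise cost a round trip.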

### Messages

Each entry in `messages` describes one turn of the conversation.

| Field | Type | Description |
| --- | --- | --- |
| `role` | `string` | `system`, `user`, `assistant`, or `tool`. |
| `content` | `string` | Text content. May be empty when `tool_calls` is set on an assistant message. |
| `tool_call_id` | `string` | The tool call this message responds to. Required for `role: tool`. |
| `tool_calls` | `array` | Tool calls produced by the model. Only valid on assistant messages. |

Only the first `system` message is honored. If you send more than one, the rest are ignored. To layer multiple sets of rules into one prompt, concatenate them into a single `system` message.
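
As a minimal sketch of that concatenation, the hypothetical helper below folds every `system` message into one and places it first. The blank-line separator and the choice to move the merged message to the front are assumptions for illustration, not API requirements.

```python
def merge_system_messages(messages: list[dict]) -> list[dict]:
    """Fold all system messages into a single leading system message."""
    system_parts = [m["content"] for m in messages if m.get("role") == "system"]
    others = [m for m in messages if m.get("role") != "system"]
    if not system_parts:
        return others
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + others


messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "system", "content": "Quote prices in USD."},
    {"role": "user", "content": "What is Bitcoin?"},
]
merged = merge_system_messages(messages)
print(len(merged))           # 2
print(merged[0]["content"])  # both rule sets, separated by a blank line
```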

Assistant messages returned by the API also include `reasoning_content`. It's typically empty when `reasoning_effort` is unset and may be empty even when reasoning runs (the model can reason internally without surfacing reasoning text). Check `usage.reasoning_tokens` to see whether reasoning happened. See [Choices](#choices) below.

## Responses

The endpoint returns a `chat.completion` object on success. Validation errors (`400`, `404`) use the OpenAI-style `error` envelope. Authentication failures use CMC's standard error envelope under `status`. Method errors (`405`) use the gateway envelope. Errors from upstream model providers may keep their native shape. The sections below show each shape.

### `200` Successful

A non-streaming success response looks like this:

```json
{
  "id": "chatcmpl-Djh...",
  "object": "chat.completion",
  "created": 1779785684,
  "model": "gpt-5.1-2025-11-13",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Bitcoin is a decentralized digital currency launched in 2009 by an anonymous developer known as Satoshi Nakamoto. It runs on a peer-to-peer network secured by cryptographic proof-of-work, with no central authority.",
        "reasoning_content": ""
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 109,
    "completion_tokens": 132,
    "cached_tokens": 0,
    "reasoning_tokens": 0
  },
  "cmc": {
    "cost": {
      "currency": "USD",
      "total_cost": 0.00145625
    }
  },
  "service_tier": "default"
}
```

#### Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Unique identifier for this completion. Pass this back as `previous_response_id` to resume a tool call. |
| `object` | `string` | Always `chat.completion`. |
| `created` | `integer` | Unix timestamp (seconds) when the response was created. |
| `model` | `string` | The actual model that served the response. May include a date suffix. |
| `choices` | `array` | Currently always length 1. |
| `usage` | `object` | Token counts. See [Usage](#usage). |
| `cmc` | `object` | Cost and built-in tool traces. See [CMC](#cmc). |
| `service_tier` | `string` | Service tier the request was processed at, passed through from the underlying model. Currently `default`. |

#### Choices

| Field | Type | Description |
| --- | --- | --- |
| `index` | `integer` | Index of this choice. |
| `message` | `object` | The full assistant message. Present in non-streaming responses. |
| `delta` | `object` | A streaming chunk. Each frame typically carries only the new content under `delta`, while top-level fields like `id`, `object`, `created`, `model`, and `service_tier` repeat on every frame. |
| `finish_reason` | `string` | `stop`, `length`, `tool_calls`, or `content_filter`. |

##### Message and delta fields

| Field | Type | Description |
| --- | --- | --- |
| `role` | `string` | `assistant` when present. Streaming usually only sends this on the first frame. |
| `content` | `string` | Text from the model. Empty when the model returns a custom tool call, when a CMC built-in was invoked, or before the model has produced any text in a stream. |
| `tool_calls` | `array` | Custom tool calls the client must execute. Not present for CMC built-ins. |
| `reasoning_content` | `string` | Reasoning output from the model. Typically empty when `reasoning_effort` is unset, and may also be empty when reasoning runs internally without surfacing text. Use `usage.reasoning_tokens` to detect reasoning. |

#### Usage

| Field | Type | Description |
| --- | --- | --- |
| `prompt_tokens` | `integer` | Tokens in the input prompt. |
| `completion_tokens` | `integer` | Tokens in the generated completion. |
| `cached_tokens` | `integer` | Tokens served from prompt cache. |
| `reasoning_tokens` | `integer` | Tokens used by the model for reasoning. Always present, but `0` unless `reasoning_effort` triggered reasoning. |

Note: OpenAI's `total_tokens` field is not returned. SDKs that read `usage.total_tokens` will see `None`. Sum `prompt_tokens + completion_tokens + reasoning_tokens` if you need a total.
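
A small sketch of that sum, using the `usage` object from the sample response above. The helper name `total_tokens` is illustrative only.

```python
def total_tokens(usage: dict) -> int:
    """Sum the token counts the API returns, since total_tokens is absent."""
    return (
        usage.get("prompt_tokens", 0)
        + usage.get("completion_tokens", 0)
        + usage.get("reasoning_tokens", 0)
    )


usage = {
    "prompt_tokens": 109,
    "completion_tokens": 132,
    "cached_tokens": 0,
    "reasoning_tokens": 0,
}
print(total_tokens(usage))  # 241
```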

#### CMC

| Field | Type | Description |
| --- | --- | --- |
| `cost.currency` | `string` | Currency code. Currently `USD`. |
| `cost.total_cost` | `number` | Total cost of this request in `cost.currency`. Includes tokens and any built-in tool execution. |
| `tool_traces` | `array` | One entry per CMC built-in tool that ran server-side. Absent when no built-ins were invoked. |

##### Tool trace

Trace fields vary by tool. `id`, `name`, `arguments`, and `status` are always present. The remaining fields depend on which tool ran and whether it succeeded.

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Always present. Tool call ID. |
| `name` | `string` | Always present. Name of the tool that ran. |
| `arguments` | `string` | Always present. JSON-encoded string of arguments passed to the tool. Parse with `JSON.parse` before using. |
| `status` | `string` | Always present. Execution status (e.g. `completed`, `failed`). |
| `output_visibility` | `string` | Returned by some tools. When set to `sanitized`, the raw `output` is omitted and only `citations` plus `status` are returned. |
| `output` | `string` | JSON-encoded string of the result returned by the tool. Returned on success when `output_visibility` allows. Parse with `JSON.parse` before using. |
| `citations` | `array<object>` | Returned by some tools (e.g. `cmc_content_search`). Each entry has `source`, `title`, `url`, and an optional `published_at` ISO 8601 timestamp. |
| `error_message` | `string` | Returned only when the tool call failed. |

A successful `cmc_id_lookup` trace looks like this. `output` is a JSON-encoded string. Parsed, it is `{"coin_id_look_up_result": [...]}`, with each result carrying `asset_id`, `name`, `symbol`, `slug`, `rank`, `market_cap`, and other fields.

```json
{
  "id": "call_abc",
  "name": "cmc_id_lookup",
  "arguments": "{\"assetType\":\"coin\",\"keyword\":\"Bitcoin\"}",
  "status": "completed",
  "output": "{\"coin_id_look_up_result\":[{\"asset_id\":1,\"name\":\"Bitcoin\",\"symbol\":\"BTC\",\"slug\":\"bitcoin\",\"rank\":1,\"market_cap\":\"1.47 T\"}]}"
}
```

A successful `cmc_content_search` trace returns sanitized output and surfaces sources via `citations`. Each citation is an object, not a bare URL string. `citations` may be an empty array if the search produced no usable sources for that query.

```json
{
  "id": "call_xyz",
  "name": "cmc_content_search",
  "arguments": "{\"query\":\"bitcoin halving\"}",
  "status": "completed",
  "citations": [
    {
      "source": "thirdparty",
      "title": "What is Bitcoin Halving?",
      "url": "https://coinmarketcap.com/academy/article/...",
      "published_at": "2026-02-27T00:00:00Z"
    }
  ],
  "output_visibility": "sanitized"
}
```
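
Because `arguments` and `output` arrive as JSON-encoded strings, a client usually decodes them before use. The sketch below is one way to do that in Python with `json.loads`; `parse_trace` is a hypothetical helper, and returning `None` for missing or malformed strings is a design choice of this example, not documented API behavior.

```python
import json


def _decode(value):
    """Decode a JSON-encoded string; return None if missing or malformed."""
    if value is None:
        return None
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


def parse_trace(trace: dict) -> dict:
    """Return a copy of one cmc.tool_traces[] entry with JSON strings decoded."""
    parsed = dict(trace)
    parsed["arguments"] = _decode(trace.get("arguments"))
    parsed["output"] = _decode(trace.get("output"))  # None when sanitized or failed
    parsed["citations"] = trace.get("citations", [])
    return parsed


lookup_trace = {
    "id": "call_abc",
    "name": "cmc_id_lookup",
    "arguments": '{"assetType":"coin","keyword":"Bitcoin"}',
    "status": "completed",
    "output": '{"coin_id_look_up_result":[{"asset_id":1,"symbol":"BTC"}]}',
}
parsed = parse_trace(lookup_trace)
print(parsed["arguments"]["keyword"])                             # Bitcoin
print(parsed["output"]["coin_id_look_up_result"][0]["asset_id"])  # 1
```

For a sanitized trace such as the `cmc_content_search` example, `parsed["output"]` is `None` and the sources are in `parsed["citations"]`.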

### `400` Validation error

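Returned when a required field is missing, a numeric value is out of range, or an enum value is unknown on a field that doesn't fall back to a default (`tool_choice` and `reasoning_effort` fall back instead of returning an error).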

```json
{
  "error": {
    "statusCode": 400,
    "code": "decimal_above_max_value",
    "message": "Invalid 'temperature': decimal above maximum value. Expected a value <= 2, but got 3.0 instead.",
    "param": "temperature",
    "type": "invalid_request_error"
  }
}
```

### Authentication failure

Authentication failures return CMC's standard error envelope. The HTTP status is `200`, with the failure detail on `status.error_code` and `status.error_message`.

```json
{
  "status": {
    "timestamp": "2026-06-02T18:14:16.194Z",
    "error_code": "1001",
    "error_message": "This API Key is invalid. ",
    "elapsed": "0",
    "credit_count": 0
  }
}
```

`error_code: "1001"` indicates an invalid key. `error_code: "1002"` indicates a missing key. Branch on `status.error_code` in the response body to detect authentication failures.

### `404` Model not found

Returned when the `model` value doesn't match a supported identifier.

```json
{
  "error": {
    "statusCode": 404,
    "code": "model_not_found",
    "message": "The model `gpt-4o` does not exist or you do not have access to it.",
    "type": "invalid_request_error"
  }
}
```

### `405` Method not allowed

Returned for any HTTP verb other than `POST`.

```json
{
  "timestamp": 1779787340092,
  "path": "/v1/chat/completions",
  "status": 405,
  "error": "Method Not Allowed",
  "requestId": "7c7ab920-26866"
}
```
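
Since the error shapes differ and some failures arrive with HTTP `200`, a client has to branch on the body as well as the status code. The sketch below classifies the shapes shown in this section. `classify_response` and its return labels are hypothetical, and it assumes a successful body either has no `status` object or carries `error_code` `0`. Upstream provider errors in their native shape fall through to `unknown`.

```python
def classify_response(http_status: int, body: dict) -> str:
    """Map an HTTP status and decoded JSON body to a coarse outcome label."""
    error = body.get("error")
    status = body.get("status")
    if isinstance(error, dict):
        # OpenAI-style envelope used for 400 and 404.
        return "openai_error:" + str(error.get("code"))
    if isinstance(status, dict) and str(status.get("error_code", "0")) != "0":
        # CMC standard envelope; can arrive with HTTP 200.
        code = str(status["error_code"])
        if code in ("1001", "1002"):
            return "auth_error:" + code
        return "cmc_error:" + code
    if http_status == 405:
        return "method_not_allowed"
    if http_status == 200 and "choices" in body:
        return "ok"
    return "unknown"


auth_failure = {"status": {"error_code": "1001", "error_message": "This API Key is invalid. "}}
print(classify_response(200, auth_failure))  # auth_error:1001
```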

## Try it

<OpenPlaygroundButton
  server="https://pro-api.coinmarketcap.com"
  url="/v1/chat/completions"
  method="POST"
  headers={[
    { name: "X-CMC_PRO_API_KEY", defaultValue: "<your_api_key>" },
    { name: "Content-Type", defaultValue: "application/json" },
  ]}
  body={JSON.stringify({
    model: "cmc-ai-v1-gpt-5.1",
    messages: [{ role: "user", content: "What is Bitcoin?" }],
  }, null, 2)}
/>

## Migrating from OpenAI

The request body is wire-compatible with OpenAI's `/v1/chat/completions`, but the `model` value has to change and a few familiar fields behave differently.

Use a CMC model identifier (e.g. `cmc-ai-v1-gpt-5.1`) instead of an OpenAI model name like `gpt-4o`. The list of supported models is on the [`model` field](#request-body) row.

Watch for these differences when porting an existing integration:

- `n` is not honored. `choices` is always length 1.
- `stop` sequences are not enforced. The model may produce text containing the stop strings.
- `logprobs` and `top_logprobs` are accepted but not returned in the response.
- `parallel_tool_calls: false` is not honored. The model may still emit multiple tool calls in a single turn.
- Multimodal `content` (an array of parts with `image_url`, etc.) is not supported. Pass `content` as a string.
- `tool_choice` accepts only the string forms `auto`, `required`, and `none`. The OpenAI object form is not supported.

Other OpenAI request fields may be accepted by the parser without producing an error. If a field is not listed in [Request body](#request-body), don't assume it's honored.
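
When porting an existing integration, it can help to scan outgoing request bodies for the fields listed above. The sketch below is a hypothetical lint helper (`migration_warnings` is not part of any SDK) that covers only the differences named in this section.

```python
def migration_warnings(body: dict) -> list[str]:
    """Flag OpenAI request fields that behave differently on this endpoint."""
    warnings = []
    if body.get("n") not in (None, 1):
        warnings.append("n is not honored; choices is always length 1")
    if body.get("stop"):
        warnings.append("stop sequences are not enforced")
    if body.get("logprobs") or body.get("top_logprobs"):
        warnings.append("logprobs are accepted but not returned")
    if body.get("parallel_tool_calls") is False:
        warnings.append("parallel_tool_calls: false is not honored")
    if isinstance(body.get("tool_choice"), dict):
        warnings.append("tool_choice object form is not supported")
    if any(isinstance(m.get("content"), list) for m in body.get("messages", [])):
        warnings.append("multimodal content is not supported; pass a string")
    return warnings


openai_style = {
    "model": "cmc-ai-v1-gpt-5.1",
    "messages": [{"role": "user", "content": "What is Bitcoin?"}],
    "n": 2,
    "stop": ["\n\n"],
}
for warning in migration_warnings(openai_style):
    print(warning)
```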

## Tools

The endpoint accepts standard OpenAI-style tool definitions, plus a set of CoinMarketCap built-in tools (like `cmc_id_lookup` and `cmc_content_search`) that handle live crypto data lookups.

There's one important behavioral difference between custom tools and CMC built-in tools.

| Tool type | Where it executes | How you receive the result |
| --- | --- | --- |
| Custom tool (your own function name) | On your machine, after the API returns | `choices[0].message.tool_calls[]` |
| CMC built-in (e.g. `cmc_id_lookup`) | On CMC's servers, before the API returns | `cmc.tool_traces[]` |

When the model calls a built-in, the call has already happened by the time you receive the response. The shape is:

- `message.content` is empty.
- `message.tool_calls` is absent (built-ins never round-trip through the client).
- `finish_reason` is `tool_calls`.
- The result is under `cmc.tool_traces[]`.

To get a natural-language assistant reply that uses the trace data, send a follow-up request with `previous_response_id` set to the response `id` and a `messages` array carrying the user prompt. The model uses the trace from the prior turn instead of calling the tool again. See [Built-in tool synthesis](#built-in-tool-synthesis) for an example.
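
The response shape described above can be detected with a short check on the decoded response body. This is a sketch under the assumption that the response is available as a plain dict (for example from `response.json()` or the SDK's `model_dump()`); `builtin_traces` and `needs_synthesis` are hypothetical names.

```python
def builtin_traces(response: dict) -> list[dict]:
    """Return cmc.tool_traces[] if present, else an empty list."""
    return (response.get("cmc") or {}).get("tool_traces") or []


def needs_synthesis(response: dict) -> bool:
    """True when a CMC built-in ran and no assistant text was produced yet."""
    choice = response["choices"][0]
    message = choice.get("message") or {}
    return (
        choice.get("finish_reason") == "tool_calls"
        and not message.get("tool_calls")  # absent (or None in SDK dumps)
        and bool(builtin_traces(response))
    )


builtin_response = {
    "id": "chatcmpl-abc",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": ""},
        "finish_reason": "tool_calls",
    }],
    "cmc": {"tool_traces": [{"id": "call_abc", "name": "cmc_id_lookup",
                             "arguments": "{}", "status": "completed"}]},
}
print(needs_synthesis(builtin_response))  # True
```

A response with `finish_reason: "tool_calls"` and a populated `message.tool_calls` is a custom tool call instead, and the client has to run that tool itself.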

### Tool definition

```json
{
  "type": "function",
  "function": {
    "name": "lookup_user_id",
    "description": "Look up an internal user ID by email.",
    "parameters": {
      "type": "object",
      "additionalProperties": false,
      "properties": {"email": {"type": "string"}},
      "required": ["email"]
    },
    "strict": true
  }
}
```

`function.name` and `function.description` are required on custom tools. An empty string for `description` is accepted, but omitting the field returns a server error. `function.parameters` is optional: a tool defined without `parameters` runs as a zero-argument tool, and the model emits `arguments: "{}"` when it calls it. For built-in CMC tools, only `function.name` is required, and the server fills in the canonical schema. See [Built-in tools](#built-in-tools).

`function.strict` turns on strict schema adherence for the model's tool call arguments. When `strict` is true, the `parameters` schema must include `additionalProperties: false` and list every property under `required`. Requests that don't satisfy this return a `400`.
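
The two `strict` requirements can be checked locally before sending a tool definition. The sketch below is a hypothetical helper that inspects only the top level of the `parameters` schema. Whether the server applies the same rules to nested objects is not covered here.

```python
def strict_schema_problems(parameters: dict) -> list[str]:
    """Check a top-level parameters schema against the strict-mode rules."""
    problems = []
    if parameters.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    properties = set(parameters.get("properties", {}))
    missing = sorted(properties - set(parameters.get("required", [])))
    if missing:
        problems.append("properties missing from required: " + ", ".join(missing))
    return problems


schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"email": {"type": "string"}, "region": {"type": "string"}},
    "required": ["email"],
}
print(strict_schema_problems(schema))  # ['properties missing from required: region']
```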

### Built-in tools

| Name | Purpose |
| --- | --- |
| `cmc_id_lookup` | Resolve a coin, onchain token, exchange, crypto category, or NFT to its canonical CMC ID. |
| `cmc_market_data` | Live market data for one asset class per call: prices, market caps, returns, liquidity, derivatives metrics, and historical price series. |
| `cmc_asset_metadata` | Static asset metadata: names, descriptions, links, tags, supply schedules, exchange profiles, similar assets, and onchain security signals. |
| `cmc_technical_analysis` | Latest-bar moving averages, MACD, RSI, Fibonacci levels, and pivot points for coins, onchain tokens, or the total crypto market cap. |
| `cmc_market_overview` | Market-wide aggregates and curated bundles: total crypto market cap, dominance, ETF AUM, ETH gas, and macro sentiment. |
| `cmc_signal_list` | Curated discovery lists: trending coins, top gainers and losers, newly added tokens, narrative lists, and onchain buy signals. |
| `cmc_asset_screener` | Natural-language asset screener that returns the top 10 coins, onchain tokens, exchanges, or categories matching the request. |
| `cmc_sentiment` | X-based crypto sentiment and trending keywords, market-wide or for a specific coin. |
| `cmc_content_search` | Semantic search across CMC editorial content, project FAQs, project websites, exchange announcements, news, Twitter, and macro event calendars. |
| `cmc_math_eval` | Evaluate one arithmetic expression deterministically. Use when the answer depends on exact numbers like PnL, percentages, or allocations. |

Most asset-specific tools take a CMC ID as input. Include `cmc_id_lookup` in your `tools` array whenever you also include `cmc_market_data`, `cmc_asset_metadata`, `cmc_technical_analysis`, `cmc_sentiment` (for coin-specific sentiment), or `cmc_asset_screener` (for category filters). The model will chain the lookup automatically.
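
One way to follow this rule consistently is to build the `tools` array through a helper that adds `cmc_id_lookup` when needed. The sketch below is hypothetical (`builtin_tools` is not a CMC or SDK function) and deliberately conservative: it adds the lookup whenever `cmc_sentiment` or `cmc_asset_screener` is listed, even though those need it only for coin-specific sentiment and category filters. It uses the name-only tool form, which is all a built-in requires.

```python
NEEDS_ID_LOOKUP = {
    "cmc_market_data",
    "cmc_asset_metadata",
    "cmc_technical_analysis",
    "cmc_sentiment",
    "cmc_asset_screener",
}


def builtin_tools(*names: str) -> list[dict]:
    """Build a tools array of CMC built-ins, adding cmc_id_lookup if required."""
    ordered = list(dict.fromkeys(names))  # de-duplicate while keeping order
    if NEEDS_ID_LOOKUP.intersection(ordered) and "cmc_id_lookup" not in ordered:
        ordered.insert(0, "cmc_id_lookup")
    return [{"type": "function", "function": {"name": name}} for name in ordered]


tools = builtin_tools("cmc_market_data")
print([t["function"]["name"] for t in tools])  # ['cmc_id_lookup', 'cmc_market_data']
```

Each built-in in the array adds prompt tokens, so drop the automatic lookup for requests where the model will never need an ID.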

When you list a built-in by name, the server uses the canonical CMC schema for that tool. You only need `{"type": "function", "function": {"name": "cmc_id_lookup"}}`. Any `description`, `parameters`, and `strict` you send are ignored. The tool runs server-side and the result lands in `cmc.tool_traces[]`. The model fills in arguments from conversation context, so the user prompt needs to carry enough information for the call to succeed (e.g. the asset name when the model decides to call `cmc_id_lookup`). When a built-in fails, the trace carries `error_message` instead of `output`.

Budget `max_completion_tokens` generously when built-ins are in play. The model spends tokens generating the tool call's `arguments`, and a tight cap can truncate that JSON before the call commits, leaving `cmc.tool_traces[].arguments` unparseable.

Including a CMC built-in in the `tools` array adds a system-prompt prelude that teaches the model how to use the tool. Expect `prompt_tokens` to grow by several hundred to a few thousand tokens per built-in compared to the same prompt without tools. Plan cost estimates accordingly.

Avoid naming custom tools with the `cmc_` prefix. A custom tool whose name collides with a built-in will be intercepted as a built-in and your client-side handler will never run.
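
A simple guard at tool-registration time can catch such collisions early. The sketch below is a hypothetical check. The set of names is copied from the built-in tools table, and rejecting the whole `cmc_` prefix is a precaution in case more built-ins are added later.

```python
from typing import Optional

BUILTIN_NAMES = {
    "cmc_id_lookup", "cmc_market_data", "cmc_asset_metadata",
    "cmc_technical_analysis", "cmc_market_overview", "cmc_signal_list",
    "cmc_asset_screener", "cmc_sentiment", "cmc_content_search",
    "cmc_math_eval",
}


def custom_tool_name_problem(name: str) -> Optional[str]:
    """Return a reason the name is unsafe for a custom tool, or None if fine."""
    if name in BUILTIN_NAMES:
        return "collides with a CMC built-in; the client handler would never run"
    if name.startswith("cmc_"):
        return "uses the reserved-looking cmc_ prefix"
    return None


print(custom_tool_name_problem("lookup_user_id"))  # None
print(custom_tool_name_problem("cmc_id_lookup"))
```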

For the full set of CMC data tools across all integrations, see [CMC MCP](/ai-agent-hub/mcp).

## Streaming

Set `stream: true` to receive the response as Server-Sent Events. Responses use `Content-Type: text/event-stream;charset=UTF-8`. Most frames are JSON-encoded chunks in the same shape as a non-streaming response, with `choices[].message` replaced by `choices[].delta`. The final stats frame is the exception: it carries `cmc.cost` and `usage`, and for a normal text completion its `choices` array is empty. Today the stats frame includes `usage` regardless of `stream_options.include_usage`.

```text
data:{"choices":[{"delta":{"role":"assistant","content":"Bitcoin"},"index":0}]}

data:{"choices":[{"delta":{"content":" is..."},"index":0}]}

data:{"choices":[],"cmc":{"cost":{"currency":"USD","total_cost":0.00029}},"usage":{"prompt_tokens":109,"completion_tokens":12,"cached_tokens":0,"reasoning_tokens":0}}

data:[DONE]
```

The wire format emits `data:` followed immediately by the JSON payload, with no space. Some examples in the SSE spec include a space (`data: {...}`). Parsers should accept either, e.g. strip the `data:` prefix and any leading whitespace before parsing the remainder as JSON.

`role` typically appears only on the first frame. Subsequent frames carry `content`, `reasoning_content`, or `tool_calls` fragments.

The final `[DONE]` terminator is preceded by a stats frame. It always carries `cmc.cost` and `usage`. For a normal text completion, `choices` is empty in the stats frame. When a CMC built-in is invoked during the stream, the stats frame also carries `cmc.tool_traces[]` and `choices` carries a single entry with `delta: {}` and `finish_reason: "tool_calls"` instead of being empty.

`stream: true` is supported on every kind of request, including synthesis turns and turns inside a multi-turn conversation. The frame format is identical. A streamed synthesis turn (resume with `previous_response_id` and `stream: true`) emits content deltas the same way a normal completion does, ending with the same stats frame and `data:[DONE]` terminator. A streamed tool-call turn emits a single stats frame with `cmc.tool_traces[]` and `finish_reason: "tool_calls"` since the built-in runs to completion server-side before any frame is sent.
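
For clients that read the stream without an SDK, the frame handling described in this section can be sketched as follows. The helper names are hypothetical, and the code assumes each event is a single `data:` line, as in the example frames above. It accepts the prefix with or without a trailing space and stops at `[DONE]`.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON frames from SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].lstrip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_stream(lines):
    """Concatenate content deltas and capture the final stats frame."""
    text, stats = [], {}
    for frame in parse_sse_lines(lines):
        for choice in frame.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                text.append(content)
        if "usage" in frame or "cmc" in frame:
            stats = {"usage": frame.get("usage"), "cmc": frame.get("cmc")}
    return "".join(text), stats


raw_lines = [
    'data:{"choices":[{"delta":{"role":"assistant","content":"Bitcoin"},"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":" is..."},"index":0}]}',
    "",
    'data:{"choices":[],"cmc":{"cost":{"currency":"USD","total_cost":0.00029}},'
    '"usage":{"prompt_tokens":109,"completion_tokens":12}}',
    "",
    "data:[DONE]",
]
text, stats = collect_stream(raw_lines)
print(text)                               # Bitcoin is...
print(stats["cmc"]["cost"]["total_cost"])  # 0.00029
```

On a streamed tool-call turn the same loop yields empty text, and `stats["cmc"]["tool_traces"]` holds the built-in results.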

## Tool call resumption

`previous_response_id` is the mechanism for the immediate follow-up after a tool call, whether the tool was a custom one or a CMC built-in. It points at the prior response so the server can resume from that turn's state. It is single-use: reusing the same id after the model has produced a final reply (`finish_reason: "stop"`) or after you've already responded to its tool call with a `tool` message returns `400 "conversation is already completed"`. For general multi-turn chat, see [Multi-turn chat with tool calls](#multi-turn-chat-with-tool-calls) below.

The examples below use the `client` defined in [Quick start](#quick-start). The OpenAI Python SDK's `create()` method rejects keyword arguments it doesn't define, so `previous_response_id` is sent through `extra_body={"previous_response_id": ...}`. In raw HTTP, send it as a top-level field in the JSON body.

### Custom tools

When the model calls a custom tool, you run it on your side and send the result back so the model can continue. Pass the tool result as a `tool` message in a new request, along with the `id` of the previous response in `previous_response_id`. Send only the new tool result, not the full conversation history.

```python
import json

lookup_tool = {
    "type": "function",
    "function": {
        "name": "lookup_internal_id",
        "description": "Look up an internal asset ID by ticker.",
        "parameters": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
        "strict": True,
    },
}

# Turn 1: model returns a tool_call
# A short system prompt helps the model use the tool result in turn 2 instead of falling back to general knowledge.
r1 = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[
        {"role": "system", "content": "Answer the user's question using the tool result. Don't fall back to general knowledge."},
        {"role": "user", "content": "What's our internal ID for BTC?"},
    ],
    tools=[lookup_tool],
    tool_choice="required",
)
prev_id = r1.id
tool_call = r1.choices[0].message.tool_calls[0]

# Turn 2: run the tool locally and reply with the result
args = json.loads(tool_call.function.arguments)
result = json.dumps({"ticker": args["ticker"], "internal_id": "stub-btc-001"})

r2 = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "tool", "tool_call_id": tool_call.id, "content": result}],
    extra_body={"previous_response_id": prev_id},
)
print(r2.choices[0].message.content)
```

### Built-in tool synthesis

After a CMC built-in runs (see [Tools](#tools) for the response shape), get an assistant reply that uses the trace data by sending a follow-up request with `previous_response_id` set to the response `id` and a `messages` array carrying the user prompt. The model uses the prior trace instead of triggering another call.

```python
# Turn 1: ask a question that the model answers with a built-in tool.
# Only the name is required for built-ins. The server fills in the canonical schema.
r1 = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "user", "content": "What does CMC say about Bitcoin halving?"}],
    tools=[{"type": "function", "function": {"name": "cmc_content_search"}}],
    tool_choice="required",
    max_completion_tokens=200,
)
# r1.choices[0].message.content is empty.
# Read traces from r1.model_dump()["cmc"]["tool_traces"] (older SDKs strip cmc.*).

# Turn 2: resume with the same user prompt and previous_response_id.
# The server pulls the trace from the prior response, so the model
# answers from that data instead of calling the tool again.
r2 = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "user", "content": "What does CMC say about Bitcoin halving?"}],
    extra_body={"previous_response_id": r1.id},
    max_completion_tokens=200,
)
print(r2.choices[0].message.content)
```

Synthesis turns can be expensive. The server replays the prior turn's tool trace as input context, which can run to thousands of `prompt_tokens` for tools that return long content (e.g. `cmc_content_search` results). Inspect `cmc.cost.total_cost` on the synthesis response to see the actual charge.

### Multi-turn chat with tool calls

`previous_response_id` resumes one tool follow-up at a time. For a back-and-forth conversation where each user question may invoke its own tool call, build up a `messages` history yourself and send it on every request. After each tool follow-up, append only the assistant's final synthesized text to history. Do not carry the assistant message that produced the `tool_calls` or the `tool` result message into the next turn. Those exist only to bridge the immediate follow-up.

The pattern per turn is: send the full conversation history with the new user message, let the built-in run, resume with `previous_response_id` to get the synthesized reply, append that reply to history.

```python
history = []

def ask(user_message):
    """Send a user message, resume for the synthesized reply, append it to history."""
    history.append({"role": "user", "content": user_message})

    # Turn A: send full history with tools available. The built-in runs server-side.
    r1 = client.chat.completions.create(
        model="cmc-ai-v1-gpt-5.1",
        messages=history,
        tools=[{"type": "function", "function": {"name": "cmc_id_lookup"}}],
        tool_choice="required",
        max_completion_tokens=300,
    )

    # Turn B: resume to get the assistant's synthesized reply. Omit `tools` here so
    # the request doesn't pay the 10% built-in surcharge a second time.
    r2 = client.chat.completions.create(
        model="cmc-ai-v1-gpt-5.1",
        messages=[{"role": "user", "content": user_message}],
        extra_body={"previous_response_id": r1.id},
        max_completion_tokens=600,
    )
    final_text = r2.choices[0].message.content
    history.append({"role": "assistant", "content": final_text})
    return final_text

print(ask("What is BTC's CMC ID?"))     # built-in fires, synthesized reply appended
print(ask("And ETH's?"))                 # built-in fires again on the new turn
```

`tool_choice="required"` forces the built-in to run on each turn. Drop it (or use `"auto"`) when you want the model to decide for itself, e.g. when a follow-up can be answered from history alone.

The same shape works for custom tools, with two differences:

1. Read the model's tool call from `r1.choices[0].message.tool_calls` (not from `cmc.tool_traces`).
2. In the resume call, send the tool result you computed locally as a `tool` message in `messages` along with `previous_response_id` (the same shape shown in [Custom tools](#custom-tools) above for a single turn).

After the synthesis, append only `r2.choices[0].message.content` to history and continue. As with the built-in pattern, do not carry the assistant message that produced `tool_calls` or the `tool` result message into the next turn.

Facts produced by a tool reach later turns through the synthesized assistant text. The tool trace itself is not carried forward in `messages` history. If a later question depends on a specific tool-returned value, make sure the synthesis turn surfaces that value in its reply.

#### Replaying tool messages in history is not supported

Replaying tool calls back to the server in `messages` history returns `400` with the body message `"messages with role 'tool' must be a response to a preceeding message with 'tool_calls'"`. Both of these shapes are rejected:

- `[user, assistant_with_tool_calls, tool_result]` (the OpenAI stateless replay pattern).
- `[user, tool_result]` (same shape with the assistant message dropped).

Use `previous_response_id` for the immediate tool follow-up as shown above, then continue with text-only history.
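
If an existing client already stores OpenAI-style history, it can be reduced to the text-only form before each request. The sketch below is a hypothetical helper that drops `tool` messages and assistant messages carrying `tool_calls`, keeping only `role` and `content` on what remains.

```python
def text_only_history(messages: list[dict]) -> list[dict]:
    """Drop tool-call bridging messages so history can be replayed safely."""
    kept = []
    for message in messages:
        if message.get("role") == "tool":
            continue  # tool results only belong in the immediate follow-up
        if message.get("role") == "assistant" and message.get("tool_calls"):
            continue  # the assistant turn that requested the tool call
        kept.append({"role": message["role"], "content": message.get("content", "")})
    return kept


stored = [
    {"role": "user", "content": "What's our internal ID for BTC?"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"internal_id\": \"stub-btc-001\"}"},
    {"role": "assistant", "content": "The internal ID for BTC is stub-btc-001."},
]
for message in text_only_history(stored):
    print(message["role"], "->", message["content"])
```

Only the user question and the final assistant reply survive, which matches the history shape used in the multi-turn example.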

## OpenAI Python SDK

The endpoint is wire-compatible with the official `openai` Python client. The basic setup is in [Quick start](#quick-start). Pass your CMC key via `default_headers` so it goes out as `X-CMC_PRO_API_KEY`. The SDK requires its own `api_key` argument, but the value is unused by CMC.

Streaming uses the same `create()` call with `stream=True`. Iterate over the returned stream to read the chunks.

```python
stream = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "user", "content": "Tell me about Ethereum."}],
    stream=True,
    max_completion_tokens=200,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

### Reading CMC tool traces from the SDK

CMC-specific fields like `cmc.cost` and `cmc.tool_traces` aren't part of the OpenAI response model, so older SDK versions strip them. Call `response.model_dump()` to get a dict that includes the CMC fields. This is the SDK's serialized view, not byte-for-byte raw HTTP, so it can carry SDK-shaped placeholders (e.g. `usage.total_tokens: None`, `message.audio`, `message.refusal`, `message.function_call`) that the wire response doesn't include. Recent SDK versions also expose `response.cmc` directly as a dict. For built-in tools, you only need to specify the `name`. The SDK accepts `{"type": "function", "function": {"name": "cmc_id_lookup"}}` and the server fills in the canonical schema.

```python
response = client.chat.completions.create(
    model="cmc-ai-v1-gpt-5.1",
    messages=[{"role": "user", "content": "Tell me about Bitcoin."}],
    tools=[{"type": "function", "function": {"name": "cmc_id_lookup"}}],
    tool_choice="required",
    max_completion_tokens=200,
)

raw = response.model_dump()
print(raw["cmc"]["tool_traces"])
print(raw["cmc"]["cost"])
```

When a built-in runs, `response.choices[0].message.content` is empty and `tool_calls` is absent. The result is on `raw["cmc"]["tool_traces"]`.

## Notes on billing

- Token costs match each model provider's published rate exactly. We don't mark up the base model.
- A 10% surcharge applies on requests whose `tools` array contains a CMC built-in, regardless of whether the model invokes it. The surcharge is added on top of the token cost for that request.
- Custom tool execution happens on your end and doesn't add to `total_cost`.
- Each response carries the exact charge in `cmc.cost.total_cost`, in USD.
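
For rough budgeting, the rules above reduce to token cost times `1.10` when a CMC built-in is listed. The sketch below is an estimate only: `estimate_cost` is a hypothetical helper, the per-million-token rates are inputs you supply from the provider's price list, and cached or reasoning tokens are not priced separately here. The rates `1.25` and `10.0` are used because they reproduce the `total_cost` in the sample response above, not because this page states them.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate: float, completion_rate: float,
                  has_builtin_tool: bool) -> float:
    """Estimate USD cost; rates are USD per one million tokens."""
    token_cost = (
        prompt_tokens / 1_000_000 * prompt_rate
        + completion_tokens / 1_000_000 * completion_rate
    )
    # 10% surcharge when the tools array contains a CMC built-in.
    return token_cost * 1.10 if has_builtin_tool else token_cost


without_tools = estimate_cost(109, 132, 1.25, 10.0, has_builtin_tool=False)
with_tools = estimate_cost(109, 132, 1.25, 10.0, has_builtin_tool=True)
print(round(without_tools, 8))  # 0.00145625
print(round(with_tools, 8))
```

Treat `cmc.cost.total_cost` on the response as the authoritative figure and use an estimate like this only for planning.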
