ModelsPricingEnterprise
500+ AI Model API, All In One API.Just In CometAPI
Models API
Developer
Quick StartDocumentationAPI Dashboard
Company
About usEnterprise
Resources
AI ModelsBlogChangelogSupport
Terms of ServicePrivacy Policy
© 2026 CometAPI · All rights reserved
Home/Models/Zhipu AI/GLM 4.6
Z

GLM 4.6

Input:$0.8/M
Output:$3.2/M
Context:200,000
Max Output:128,000
Zhipu's latest flagship model GLM-4.6 released: total parameters 355B, active parameters 32B. Overall core capabilities surpass GLM-4.5. Coding: Aligns with Claude Sonnet 4, best in China. Context: Extended to 200K (originally 128K). Inference: Improved, supports Tool calls. Search: Optimized Tool and agent framework. Writing: More aligned with human preferences, writing style, and role-playing. Multilingual: Enhanced translation effect.
New
Commercial Use
Playground
Overview
Features
Pricing
API

GLM-4.6 is the latest major release in Z.ai’s (formerly Zhipu AI) GLM family: a 4th-generation, large-language MoE (Mixture-of-Experts) model tuned for agentic workflows, long-context reasoning and real-world coding. The release emphasizes practical agent/tool integration, a very large context window, and open-weight availability for local deployment.

Key features

  • Long context — native 200K token context window (expanded from 128K). (docs.z.ai)
  • Coding & agentic capability — marketed improvements on real-world coding tasks and better tool invocation for agents.
  • Efficiency — reported ~30% lower token consumption vs GLM-4.5 on Z.ai’s tests.
  • Deployment & quantization — first announced FP8 and Int4 integration for Cambricon chips; native FP8 support on Moore Threads via vLLM.
  • Model size & tensor type — published artifacts indicate a ~357B-parameter model (BF16 / F32 tensors) on Hugging Face.

Technical details

Modalities & formats. GLM-4.6 is a text-only LLM (input and output modalities: text). Context length = 200K tokens; max output = 128K tokens.

Quantization & hardware support. The team reports FP8/Int4 quantization on Cambricon chips and native FP8 execution on Moore Threads GPUs using vLLM for inference — important for lowering inference cost and allowing on-prem and domestic cloud deployments.

Tooling & integrations. GLM-4.6 is distributed through Z.ai’s API, third-party provider networks (e.g., CometAPI), and integrated into coding agents (Claude Code, Cline, Roo Code, Kilo Code).

Technical details

Modalities & formats. GLM-4.6 is a text-only LLM (input and output modalities: text). Context length = 200K tokens; max output = 128K tokens.

Quantization & hardware support. The team reports FP8/Int4 quantization on Cambricon chips and native FP8 execution on Moore Threads GPUs using vLLM for inference — important for lowering inference cost and allowing on-prem and domestic cloud deployments.

Tooling & integrations. GLM-4.6 is distributed through Z.ai’s API, third-party provider networks (e.g., CometAPI), and integrated into coding agents (Claude Code, Cline, Roo Code, Kilo Code).

Benchmark performance

  • Published evaluations: GLM-4.6 was tested on eight public benchmarks covering agents, reasoning and coding and shows clear gains over GLM-4.5. On human-evaluated, real-world coding tests (extended CC-Bench), GLM-4.6 uses ~15% fewer tokens vs GLM-4.5 and posts a ~48.6% win rate vs Anthropic’s Claude Sonnet 4 (near-parity on many leaderboards).
  • Positioning: results claim GLM-4.6 is competitive with leading domestic and international models (examples cited include DeepSeek-V3.1 and Claude Sonnet 4).

img

Limitations & risks

  • Hallucinations & mistakes: like all current LLMs, GLM-4.6 can and does make factual errors — Z.ai’s docs explicitly warn outputs may contain mistakes. Users should apply verification & retrieval/RAG for critical content.
  • Model complexity & serving cost: 200K context and very large outputs dramatically increase memory & latency demands and can raise inference costs; quantized/inference engineering is required to run at scale.
  • Domain gaps: while GLM-4.6 reports strong agent/coding performance, some public reports note it still lags certain versions of competing models in specific microbenchmarks (e.g., some coding metrics vs Sonnet 4.5). Assess per-task before replacing production models.
  • Safety & policy: open weights increase accessibility but also raise stewardship questions (mitigations, guardrails, and red-teaming remain the user’s responsibility).

Use cases

  • Agentic systems & tool orchestration: long agent traces, multi-tool planning, dynamic tool invocation; the model’s agentic tuning is a key selling point.
  • Real-world coding assistants: multi-turn code generation, code review and interactive IDE assistants (integrated in Claude Code, Cline, Roo Code—per Z.ai). Token efficiency improvements make it attractive for heavy-use developer plans.
  • Long-document workflows: summarization, multi-document synthesis, long legal/technical reviews due to the 200K window.
  • Content creation & virtual characters: extended dialogues, consistent persona maintenance in multi-turn scenarios.

How GLM-4.6 compares to other models

  • GLM-4.5 → GLM-4.6: step change in context size (128K → 200K) and token efficiency (~15% fewer tokens on CC-Bench); improved agent/tool use.
  • GLM-4.6 vs Claude Sonnet 4 / Sonnet 4.5: Z.ai reports near parity on several leaderboards and a ~48.6% win rate on the CC-Bench real-world coding tasks (i.e., close competition, with some microbenchmarks where Sonnet still leads). For many engineering teams, GLM-4.6 is positioned as a cost-efficient alternative.
  • GLM-4.6 vs other long-context models (DeepSeek, Gemini variants, GPT-4 family): GLM-4.6 emphasizes large context & agentic coding workflows; relative strengths depend on metric (token efficiency/agent integration vs raw code synthesis accuracy or safety pipelines). Empirical selection should be task-driven.

Zhipu AI’s latest flagship model GLM-4.6 released: 355B total params, 32B active. Surpasses GLM-4.5 in all core capabilities.

  • Coding: Aligns with Claude Sonnet 4, best in China.
  • Context: Expanded to 200K (from 128K).
  • Reasoning: Improved, supports tool calling during inference.
  • Search: Enhanced tool calling and agent performance.
  • Writing: Better aligns with human preferences in style, readability, and role-playing.
  • Multilingual: Boosted cross-language translation.

FAQ

1

test

Pricing for GLM 4.6

47474
Comet Price (USD / M Tokens)Official Price (USD / M Tokens)Discount
Input:$0.8/M
Output:$3.2/M
Input:$1/M
Output:$4/M
-20%

Sample code and API for GLM 4.6

GLM-4.6 is the latest major release in Z.ai’s (formerly Zhipu AI) GLM family: a 4th-generation, large-language MoE (Mixture-of-Experts) model tuned for agentic workflows, long-context reasoning and real-world coding. The release emphasizes practical agent/tool integration, a very large context window, and open-weight availability for local deployment.
POST
/v1/chat/completions
Python
JavaScript
Curl
from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY;
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  model: "glm-4.6",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});

console.log(completion.choices[0].message.content);

Curl Code Example

curl https://api.cometapi.com/v1/chat/completions \
     --header "Authorization: Bearer $COMETAPI_KEY" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "glm-4.6",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
}'