Kimi-K2.5 is Moonshot AI’s current flagship open model for people who need one system to handle coding, visual inputs, long-context reasoning, and agent-style execution. Officially released on January 27, 2026, it extends Kimi K2 with native multimodality, a 256K context window, thinking and non-thinking modes, and a new Agent Swarm workflow for parallel task execution.
That makes Kimi K2.5 more than “K2 with better scores.” In practice, it is a broader work model: it can turn screenshots into front-end code, reason over images and video, call tools through the API, and power product surfaces that generate documents, slides, sheets, websites, and research outputs.
What is Kimi K2.5?
Kimi K2.5 is an open-source multimodal agentic model from Moonshot AI. It keeps the large MoE backbone that made Kimi K2 notable, then adds native vision, longer context, stronger coding, and an optional Agent Swarm mode for more parallel, tool-heavy work. On Kimi web and app, Instant, Thinking, Agent, and Agent Swarm are product modes built around K2.5 workflows, not four separate foundation models.
What’s new in Kimi-K2.5
- Official release: January 27, 2026
- Built on Kimi K2 with continued pretraining over ~15T mixed visual + text tokens
- Adds native multimodality and a MoonViT 400M vision encoder
- Extends official context to 256K
- Introduces Agent Swarm with up to 100 sub-agents and up to 1,500 tool calls in parallel workflows
- Ships across Kimi web, Kimi app, API, and Kimi Code
Best for
Kimi K2.5 is best for teams and individuals who want one open model that can do serious front-end generation, multimodal reasoning, agentic coding, and long, tool-augmented workflows without splitting work across separate text and vision models. It is especially compelling for developers, research-heavy teams, and knowledge workers who want open weights plus official hosted access.
Not ideal for
Kimi K2.5 is less ideal if your workload is purely text-only and cost-sensitive, if you need the most polished proprietary benchmark leader in every category, or if your workflow depends on the official built-in web search tool while keeping K2.5 thinking mode enabled in the API. It is also not a lightweight local model; this is still a 1T-parameter MoE system with real deployment demands.
What makes Kimi K2.5 different from Kimi K2?
The short answer is that Kimi K2.5 takes K2’s agentic coding foundation and turns it into a broader multimodal work model. The biggest practical changes are native vision, a larger official context window, stronger front-end generation, and the addition of Agent Swarm.
Kimi K2 itself was introduced as a text-first MoE model optimized for agentic behavior, tool use, reasoning, and coding. Its public open model summary lists a 128K context window and explicitly describes Kimi-K2-Instruct as a reflex-grade model without long thinking. K2.5 keeps the same 1T / 32B sparse backbone pattern but adds a native vision stack, 256K context in official docs, thinking and non-thinking modes within the same model, and a product layer built around research, office outputs, and swarm execution.
The most important distinction for normal users is simple: K2 was already strong for text-heavy agentic work, but K2.5 is the version that is designed to see as well as reason. That matters if your inputs are screenshots, design files, video clips, charts, forms, scientific figures, or mixed office documents rather than plain prompts.
It also matters that Moonshot now exposes K2.5 on consumer-facing product surfaces in four modes: Instant, Thinking, Agent, and Agent Swarm. That makes the model easier to understand operationally: quick chat, deeper reasoning, work-product generation, and large parallel workflows all sit on one K2.5-centered stack rather than feeling like separate products.
Kimi K2.5 architecture and technical specs
At the architecture level, Kimi K2.5 is still a very large Mixture-of-Experts model, but it is no longer just a text model with some multimodal add-on behavior. Official materials describe it as a native multimodal architecture with a dedicated MoonViT vision encoder and support for visual + text input throughout official access surfaces.
Kimi K2.5 at a glance
| Spec | Kimi K2.5 |
|---|---|
| Release date | January 27, 2026 |
| Model type | Open-source multimodal agentic model |
| Architecture | Mixture-of-Experts (MoE) |
| Total parameters | 1T |
| Activated parameters | 32B |
| Layers | 61 |
| Experts | 384 |
| Experts selected per token | 8 |
| Vocabulary | 160K |
| Context window | 256K |
| Vision encoder | MoonViT, 400M parameters |
| Inputs | Text, image, video in official usage examples |
| Product modes | Instant, Thinking, Agent, Agent Swarm (Beta) |
Specs compiled from Moonshot’s official Kimi model page, GitHub repo, and API docs.
For non-specialists, three numbers matter most. First, 1T total / 32B active tells you this is a sparse model with very large capacity, but it does not activate the entire parameter count on every token. Second, 256K context means it is built for very long prompts, large documents, and multi-step agent loops. Third, the MoonViT 400M encoder is the clearest signal that the vision capability is structural, not an afterthought.
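For readers who want a concrete picture of what sparse activation means, here is a toy sketch of top-k expert routing in plain Python. It is purely illustrative and is not Moonshot's implementation; the only real numbers are the 384 experts and the 8-per-token selection from the spec table, and everything else (hidden size, router weights) is invented for the example.

```python
import numpy as np

# Toy illustration of top-k expert routing in a Mixture-of-Experts layer.
# The expert counts mirror the published K2.5 specs (384 experts, 8 per token),
# but the router and hidden size are random stand-ins, not the real model.
NUM_EXPERTS = 384
TOP_K = 8
HIDDEN = 16  # tiny hidden size, just for the sketch

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS))

def route_token(token_hidden_state: np.ndarray) -> list[int]:
    """Return the indices of the experts this token would be sent to."""
    logits = token_hidden_state @ router_weights  # one score per expert
    return sorted(np.argsort(logits)[-TOP_K:].tolist())  # keep the 8 best-scoring experts

token = rng.standard_normal(HIDDEN)
print(route_token(token))  # only 8 of 384 experts process this token
```

The takeaway is the ratio, not the code: on any single token, the vast majority of the 1T parameters sit idle, which is why the activated count (32B) is the better predictor of per-token compute.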
For developers, the practical point is that Kimi K2.5 combines three layers that often get separated in other stacks: multimodal understanding, deep reasoning, and tool-based execution. That unified design is why it shows up in product claims around visual coding, office workflows, and agentic search at the same time.
Kimi-K2.5 benchmarks explained
The headline on Kimi K2.5 benchmarks is not that it wins everything. The real story is that it performs especially well when tasks mix reasoning, tools, code, and visual inputs, while still trailing stronger closed models on some pure frontier benchmarks.
Here are the public benchmark numbers most worth knowing:
- HLE with tools: 50.2 — this is the clearest argument for K2.5 as a serious agent model. It suggests the model becomes much more competitive when it can search, browse, and use code tools rather than answer from the raw prompt alone.
- AIME 2025: 96.1 and HMMT 2025 (Feb): 95.4 — strong evidence that K2.5 is not just a UI or code model. It can handle difficult mathematical reasoning at a very high level.
- GPQA-Diamond: 87.6 — strong graduate-level science and knowledge reasoning, but not the top score in Moonshot’s own comparison table.
- MMMU-Pro: 78.5, MathVision: 84.2, MathVista: 90.1 — these numbers are why “multimodal” should be taken seriously here. K2.5 is competitive on image-heavy reasoning, not just OCR-style perception.
- SWE-Bench Verified: 76.8 and SWE-Bench Multilingual: 73.0 — this is strong evidence that K2.5 is useful for real software work, not just toy coding prompts.
Benchmark meaning in practice
If you translate those numbers into everyday decisions, the pattern is straightforward. HLE-with-tools and SWE-Bench tell you K2.5 is unusually promising for agent builders and developers. MMMU-Pro, MathVision, and MathVista tell you the vision side is strong enough for chart-heavy documents, design references, and scientific or technical visuals. AIME and HMMT tell you the model can reason carefully when you give it time.
The honest caveat is just as important. Official Moonshot comparisons still show K2.5 behind GPT-5.2 and Gemini 3 Pro on several tasks, including GPQA-Diamond and MMMU-Pro, and behind GPT-5.2 and Claude 4.5 Opus on SWE-Bench Verified. So the best reading is not “K2.5 beats closed models everywhere.” It is “K2.5 is unusually strong for an open model that combines vision, code, and agency in one stack.”
Moonshot also warns that benchmark reproduction can vary by deployment. Its benchmarking guide explicitly recommends using the official API for evaluation-sensitive testing and points users to Kimi Vendor Verifier when comparing third-party providers, because some endpoints show measurable accuracy drift.
Kimi K2.5 use cases
Kimi K2.5 is strongest when the job is bigger than a plain chat response. It shines when you need working code, multimodal analysis, structured outputs, or tool-driven execution across long contexts.
Visual coding and front-end generation
This is one of the clearest K2.5 use cases. Moonshot positions the model as a visual coding system that can turn text, screenshots, images, and even video references into functional front-end code with layout fidelity and interactive behavior. That makes it relevant for landing pages, internal tools, rapid prototypes, UI recreation, and visual debugging.
In plain English, K2.5 is not only “good at code.” It is a visual coding model. If your prompt starts with “build this from the screenshot,” “match this layout,” or “turn this mockup into React,” K2.5 is much more in its element than a text-only model.
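As a rough sketch of what that workflow looks like over the API, the snippet below sends a screenshot plus an instruction through an OpenAI-compatible chat completion call. The base URL, the exact image-content format, and the model identifier are assumptions based on Moonshot describing the API as OpenAI-compatible with visual input; verify them against the current API docs before relying on this.

```python
import base64
from openai import OpenAI

# Hedged sketch: screenshot-to-React via an OpenAI-compatible endpoint.
# Base URL, model name, and message format are assumptions to verify
# against Moonshot's current API documentation.
client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",
    base_url="https://api.moonshot.ai/v1",
)

with open("landing_page_mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Recreate this layout as one React component styled with Tailwind."},
        ],
    }],
)
print(response.choices[0].message.content)
```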
Agent Swarm research workflows
Agent Swarm is the feature that makes K2.5 feel different from earlier Kimi models. Official materials describe it as a self-directed system that can create up to 100 sub-agents, coordinate up to 1,500 tool calls, and reduce execution time by up to 4.5x versus a single-agent setup on some workloads.
Use Kimi K2.5 Agent Swarm when the problem is naturally parallel: broad research, batch gathering, long-form synthesis, many-file processing, or tasks where one agent would otherwise serialize the work. Do not use it for a simple question, a short code snippet, or basic drafting. Swarm is a scale tool, not a default mode.
API and tool use for developers
For developers, Kimi K2.5 API access is one of the model’s biggest strengths. Moonshot’s docs describe kimi-k2.5 as a 256K model with visual + text input, tool calling, thinking and non-thinking modes, JSON mode, partial mode, and official web search support. The platform also presents the API as OpenAI-compatible, and Moonshot’s GitHub README says it additionally offers Anthropic-compatible access patterns.
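A minimal sketch of what that tool-calling flow could look like is below. The base URL and the assumption that standard OpenAI-style `tools` and `tool_calls` fields work unchanged follow from the OpenAI-compatibility claim; the `get_repo_stats` function is a hypothetical tool invented for the example, not a Moonshot built-in.

```python
import json
from openai import OpenAI

# Hedged sketch of function calling through the OpenAI-compatible interface.
# Base URL and unchanged tools/tool_calls semantics are assumptions.
client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY",
                base_url="https://api.moonshot.ai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_repo_stats",  # hypothetical tool for illustration
        "description": "Fetch star and issue counts for a GitHub repository.",
        "parameters": {
            "type": "object",
            "properties": {"repo": {"type": "string"}},
            "required": ["repo"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user",
               "content": "How active is the moonshotai/Kimi-K2.5 repository?"}],
    tools=tools,
)

# If the model decided to call the tool, inspect the structured arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```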
There is one nuance that matters in production: Moonshot’s current docs say the official built-in $web_search tool is temporarily incompatible with Kimi K2.5 thinking mode, so you may need to disable thinking for search-heavy workflows. That is exactly the kind of detail that product pages often skip, but it matters if you are building real agent loops.
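If you do need built-in search today, the practical workaround is to run those requests without thinking mode. The sketch below assumes the `$web_search` builtin-function tool shape as described in Moonshot's docs and uses a hypothetical request-level thinking toggle; both field names should be confirmed against the current API reference, and a real integration also has to handle the tool-call round trip that built-in search uses.

```python
from openai import OpenAI

# Hedged sketch: a search-enabled request with thinking disabled.
# The $web_search tool shape is taken from Moonshot's docs as understood here,
# and the "thinking" toggle field is a hypothetical placeholder; confirm both
# against the current API reference before use.
client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY",
                base_url="https://api.moonshot.ai/v1")

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user",
               "content": "Summarize this week's public coverage of Kimi K2.5."}],
    tools=[{"type": "builtin_function", "function": {"name": "$web_search"}}],
    extra_body={"thinking": {"type": "disabled"}},  # hypothetical toggle
)
print(response.choices[0].message.content)
```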
Documents, slides, spreadsheets, and office workflows
Kimi’s own product layer is clearly trying to turn K2.5 into a practical work engine, not just a chat model. The official model page shows K2.5 being used for documents, slides, sheets, websites, and deep research. The tech blog adds that K2.5 Agent is intended for large, dense office-style tasks, including spreadsheets, PDFs, and slide decks.
For non-developers, the practical framing is that K2.5 can generate outputs people actually deliver: a slide deck, a report, a spreadsheet, a prototype site. At the same time, those are product-layer workflows, not proof that every raw API integration will behave like the consumer product.
Moonshot also reports internal benchmark gains over K2 Thinking for office and general-agent tasks. Those results are useful as directional evidence, but because they are internal benchmarks, they should be treated more cautiously than public benchmarks like AIME or SWE-Bench.
How to use Kimi K2.5
You can use Kimi K2.5 through the consumer product, the mobile app, the official developer API, Kimi Code, and at least one major third-party hosted provider. The right access path depends on whether you want chat, work-product generation, coding assistance, or programmable inference.
Kimi web
On Kimi web, K2.5 is exposed through four modes: Instant, Thinking, Agent, and Agent Swarm (Beta). This is the easiest path for most users because it wraps the model in higher-level workflows and lets you preview outputs such as docs, slides, sheets, websites, and research reports.
Kimi app
The Kimi app follows the same mode structure as the web surface, according to Moonshot’s official launch materials. If your use case is consumer-facing chat, quick analysis, or mobile access to Agent/Agent Swarm workflows, the app is the simplest route.
Kimi API
The Kimi K2.5 API is the route for developers. Moonshot’s platform currently lists kimi-k2.5 with visual + text input, thinking and non-thinking modes, dialogue + agent tasks, and pricing of $0.10 / 1M cached-hit tokens, $0.60 / 1M input tokens, and $3.00 / 1M output tokens on the official platform. The same platform describes billing as pay-as-you-go.
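To make those rates concrete, here is a back-of-the-envelope cost calculation using the listed prices. The token counts are invented for illustration, and real bills depend on cache-hit behavior and the platform's exact accounting rules.

```python
# Back-of-the-envelope cost at the listed official rates (USD per 1M tokens).
PRICE_CACHE_HIT = 0.10
PRICE_INPUT = 0.60
PRICE_OUTPUT = 3.00

# Illustrative long-context agent run; these token counts are made up.
cached_tokens = 180_000       # reused system prompt / context hitting the cache
fresh_input_tokens = 60_000   # new documents and tool results
output_tokens = 8_000         # generated answer and tool arguments

cost = (cached_tokens * PRICE_CACHE_HIT
        + fresh_input_tokens * PRICE_INPUT
        + output_tokens * PRICE_OUTPUT) / 1_000_000
print(f"${cost:.3f} per run")  # $0.078 for this example
```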
One more pricing caveat: there is no current official statement promising a standing free API tier for new users. In fact, an official February 2026 Moonshot setup guide says newly registered API accounts have no balance and need a recharge before the API key can be used. So the official API is best described as paid, usage-based access rather than “free with credits.”
Kimi Code
Kimi Code is Moonshot’s coding-focused access path. Official pages describe it as a coding perk within the Kimi membership/coding plan, and the product page shows it running as a CLI/IDE workflow powered by kimi-k2.5. Kimi Code docs also position it as a premium subscription tier with console-based key management and one-click authentication flows.
This is the best option if your goal is not “call a model” but “ship code faster.” It also pairs naturally with the broader Kimi documentation and API docs when you need deeper setup detail.
Third-party providers
For hosted third-party inference, Together AI currently exposes the model as moonshotai/Kimi-K2.5. Its model card positions K2.5 as a multimodal thinking agent and provides ready-to-use chat completion examples.
For open-weight access, the official weights are on Hugging Face, and Moonshot’s GitHub README recommends vLLM, SGLang, or KTransformers for deployment, with transformers >= 4.57.1. That matters more than it sounds: K2.5 may be open, but using it well still depends on the right runtime and config.
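As a rough illustration of how similar these paths look in practice, both Together AI's hosted endpoint and a self-hosted vLLM server expose OpenAI-compatible chat completions, so client code barely changes between them; only the base URL and model identifier differ. The endpoint URLs, the local port, and the serve command in the comment are assumptions to check against each provider's docs and Moonshot's deployment guidance.

```python
from openai import OpenAI

# Hedged sketch: the same client code against two OpenAI-compatible backends.
# Endpoint URLs, the local port, and the serve flags are assumptions.

# Option A: Together AI's hosted endpoint.
hosted = OpenAI(api_key="YOUR_TOGETHER_API_KEY",
                base_url="https://api.together.xyz/v1")
reply = hosted.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "One sentence on sparse MoE routing."}],
)
print(reply.choices[0].message.content)

# Option B: a self-hosted vLLM server, started with something like
#   vllm serve <path-or-repo-for-K2.5-weights> --tensor-parallel-size <N>
# (exact flags depend on hardware; follow Moonshot's deployment guidance).
local = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
```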
Is Kimi K2.5 open source?
Yes. Kimi K2.5 is open source in the practical sense most developers care about: Moonshot publishes both the code repository and the model weights. The official Kimi page points users to GitHub and Hugging Face, and the GitHub README explicitly says both the code and weights are released under the Modified MIT License.
The license is permissive, but it is not identical to plain MIT. Moonshot’s license text says that if a commercial product or service using the software exceeds 100 million monthly active users or $20 million in monthly revenue, the interface must prominently display “Kimi K2.5.” That is a real commercial-use nuance, so anyone deploying at scale should read the license itself before shipping.
Is Kimi K2.5 free?
Kimi K2.5 is free in some access paths, but not all of them. The clearest official wording is that Kimi web/app offer free access with usage limits, while paid plans unlock higher usage and stronger workflow access.
That does not mean every K2.5 access path is free. The official API is pay-as-you-go, and Moonshot’s own platform currently lists K2.5 token pricing on the developer platform. Kimi Code is also positioned as a paid coding plan or premium subscription tier. Agent Swarm is in beta, and Moonshot’s launch blog says free credits there are currently tied to high-tier paid users on Kimi.com.
So the clean answer is this: Kimi K2.5 is free to try on consumer surfaces, but serious API and coding usage should be treated as paid.
Kimi K2.5 vs other models
If you want one sentence: Kimi K2.5 is the best choice in the Kimi family when you need vision + coding + agent workflows + open weights together. Kimi K2 remains relevant for earlier text-first agentic use, and K2 Thinking remains useful when you specifically want a separate text-only reasoning model.
| Model | Best when | Modality | Context | Key note |
|---|---|---|---|---|
| Kimi-K2.5 | You want the most complete open Kimi stack | Text + image + video examples in official docs | 256K | Includes thinking/non-thinking modes and powers Agent / Agent Swarm workflows |
| Kimi K2 | You want the earlier text-first open K2 family | Text | 128K in the open K2 repo | Strong agentic/coding base, but no native vision layer |
| kimi-k2-thinking | You want a separate text-only reasoning model | Text | 256K in current API docs | Distinct API model; docs say it does not support vision |
Comparison built from Moonshot’s GitHub repos, pricing/docs pages, and K2 Thinking announcement.
A useful clarification for readers: K2 Thinking is not the same thing as Kimi K2.5 Thinking. kimi-k2-thinking is a separate API model family that Moonshot announced in November 2025 and that current docs describe as a text-only deep-reasoning model. By contrast, K2.5 has its own thinking and non-thinking modes inside the multimodal kimi-k2.5 model.
Against closed-source competitors, the balanced take is this: Moonshot’s own benchmark table shows K2.5 beating GPT-5.2, Claude 4.5 Opus, and Gemini 3 Pro on HLE with tools, and performing very strongly on MathVista mini. But the same table still shows GPT-5.2 or Gemini ahead on several pure reasoning and multimodal benchmarks, and Claude/GPT-5.2 ahead on SWE-Bench Verified. So K2.5 is strongest as an open, versatile, developer-friendly choice, not as the unquestioned number-one model in every benchmark slice.
For readers who need more context: the Kimi K2 page covers the older text-first model, the Kimi models hub covers the broader family, and the docs and API docs pages cover implementation details.
Limitations and tradeoffs
Kimi K2.5 is powerful, but it is not the right answer for every workflow. Its strengths come with real operational tradeoffs.
First, Agent Swarm is still beta on Kimi.com. That means it is promising and usable, but it should not be described as a fully settled production feature.
Second, provider quality can vary. Moonshot’s own docs warn that some third-party endpoints show noticeable accuracy drift and recommend the official API for evaluation-sensitive work. If you are doing benchmarks, audits, or tool-use-heavy production testing, this matters.
Third, tool behavior has caveats. Today, the official built-in $web_search tool is temporarily incompatible with K2.5 thinking mode in the API. For some developer workflows, that is a meaningful limitation, not a footnote.
Fourth, open source does not mean lightweight. Moonshot recommends specific inference engines and a recent transformers version, and the model remains a 1T MoE system. Self-hosting is possible, but it is not the same thing as casually running a 7B model on a laptop.
Fifth, cost still matters. K2.5’s official API input price matches the current kimi-k2 family, but output pricing is higher than K2’s listed output price. If your workload is text-only and output-heavy, the older K2 family may still be the more efficient choice.
FAQ
What is Kimi K2.5?
Kimi K2.5 is Moonshot AI’s open-source multimodal model for text, code, and visual reasoning. Official materials position it as a native multimodal agentic model that supports thinking and non-thinking modes, long context, and both conversational and agent-style workflows.
Is Kimi K2.5 open source?
Yes. Moonshot publishes both the public GitHub repo and the weights on Hugging Face, and the project is released under a Modified MIT License rather than a closed commercial-only license. Large-scale commercial products still need to review the attribution condition in the license text.
What is Agent Swarm and when should I use it?
Agent Swarm is K2.5’s parallel multi-agent execution mode. Moonshot says it can coordinate up to 100 sub-agents and up to 1,500 tool calls. It is best for broad, parallelizable work like large research projects, batch collection, and multi-part synthesis, not simple one-shot chats.
How do I access Kimi K2.5?
You can access Kimi K2.5 through Kimi web, the Kimi app, the official Moonshot developer API, Kimi Code, and Together AI’s hosted endpoint. The right choice depends on whether you want direct chat, structured work outputs, developer integration, or coding-agent workflows.
Is Kimi K2.5 free?
Consumer access is free with limits on Kimi web/app, but that does not extend cleanly to every access path. The official API is pay-as-you-go, Kimi Code is a paid coding tier, and Agent Swarm beta credits are currently tied to high-tier paid users on Kimi.com.
What is the difference between Kimi K2.5 and Kimi K2?
K2.5 adds native multimodality, a 256K context window in official docs, stronger visual coding, and Agent Swarm. Kimi K2 was the earlier text-first open model family: still strong for agentic coding and reasoning, but without K2.5’s native vision layer and expanded product modes.
Is K2 Thinking the same as Kimi K2.5 Thinking?
No. kimi-k2-thinking is a separate API model family introduced in November 2025 for deep reasoning and text-first agentic work. Kimi K2.5 Thinking is a mode inside the multimodal kimi-k2.5 model. Moonshot’s docs also say kimi-k2-thinking does not support vision.
Can developers build with the Kimi K2.5 API today?
Yes. Moonshot’s API docs present kimi-k2.5 as a current model with OpenAI-compatible access patterns, multimodal input, tool calls, JSON mode, and web search tooling. Together AI also exposes it as moonshotai/Kimi-K2.5 for hosted third-party use.
Conclusion
Kimi-K2.5 is best understood as Moonshot’s attempt to unify visual understanding, agentic execution, and real-world coding inside one open model. That does not make it the perfect choice for every workload, but it does make it one of the most interesting open models available right now if you care about multimodal work, agent loops, and production-minded coding.
From here, the useful next steps are model-family context through the Kimi models hub and the Kimi K2 page, and implementation detail through the docs and API docs. The aim of this page is to stay what it should be: a useful independent guide to Kimi-K2.5, not a copy of a model card.
Kimi-ai.chat is an independent resource and is not affiliated with Moonshot AI.
Last updated: March 29, 2026




