Cursor 3 and Gemma 4 Dropped on the Same Day. Your Stack Just Changed.
On April 2, 2026, Google shipped Gemma 4 (89% on AIME, 80% on LiveCodeBench, 86% on agentic tool use) and Cursor shipped a ground-up agent-first IDE. Here is what the new developer stack looks like.
Two releases shipped within hours of each other on April 2, 2026. One is the most capable open model Google has ever released. The other is a ground-up redesign of the most widely used AI coding editor. Neither is incremental. Together they define a stack shift that every developer building with AI should understand this week.
Gemma 4: Open Frontier Intelligence
Google DeepMind shipped Gemma 4 today, built on the same research and architecture as Gemini 3. The benchmark numbers are not a modest improvement over Gemma 3 — they are a generation jump:
| Benchmark | Gemma 4 31B | Gemma 4 26B | Gemma 3 27B | Delta |
|---|---|---|---|---|
| AIME 2026 (math) | 89.2% | 88.3% | 20.8% | +329% |
| LiveCodeBench v6 (coding) | 80.0% | 77.1% | 29.1% | +175% |
| τ2-bench (agentic tool use) | 86.4% | 85.5% | 6.6% | +1,209% |
| GPQA Diamond (science) | 84.3% | 82.3% | 42.4% | +99% |
| MMLU Multilingual | 85.2% | 82.6% | 67.6% | +26% |
| Arena AI | 1452 | 1441 | 1365 | +87 pts |
The agentic tool-use jump is the number that matters most for developers. τ2-bench measures real-world agent tasks — navigating apps, calling APIs, completing multi-step workflows. Gemma 3 scored 6.6%. Gemma 4 scores 86.4%. A 13× improvement is not refinement; it is a capability unlock.
The "E" variants are the edge story. Gemma 4 ships in four sizes: 31B, 26B (A4B — 4B active parameters), E4B, and E2B. The E (Efficient) variants target mobile and IoT devices, delivering frontier intelligence on personal computers and edge hardware. E2B on an iPhone is now a realistic deployment target.
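To make "realistic deployment target" concrete, a back-of-envelope weight-memory estimate helps. The sketch below assumes 4-bit weight quantization — an assumption on my part, since Google has not published exact E-variant footprints — and it counts weights only; KV cache and activations add overhead on top.

```python
# Rough weight memory for the E variants.
# ASSUMPTION: 4-bit quantized weights. Real footprints depend on the
# quantization scheme, and KV cache / activations are not counted here.

def approx_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate GB needed to hold the model weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("E2B", 2.0), ("E4B", 4.0)]:
    print(f"{name}: ~{approx_weight_gb(params):.1f} GB of weights")
# E2B lands around 1 GB at 4 bits, which is why a recent iPhone is plausible.
```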
What Gemma 4 Does That Gemma 3 Couldn't
Three capabilities define the step change:
Native function calling. Gemma 4 has built-in support for structured tool use, not prompt-engineered workarounds. Define a function schema, pass it to the model, get back structured calls. This is the foundation of every serious agent pipeline.
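As a concrete illustration, here is what a tool schema and the resulting structured call look like in the OpenAI tools format. The `get_weather` function and the sample response are invented for this example; the schema shape is the standard OpenAI format that the endpoint accepts.

```python
# Sketch of native function calling in the OpenAI tools format.
# get_weather and sample_tool_call are hypothetical; the schema shape
# is the standard one an OpenAI-compatible endpoint accepts.
import json

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# You would send [weather_tool] in the "tools" field of a chat request.
# The model answers with a structured call, roughly like this:
sample_tool_call = {
    "name": "get_weather",
    "arguments": json.dumps({"city": "Berlin"}),
}

def dispatch(tool_call: dict) -> dict:
    """Route a structured call to local Python -- the core of an agent loop."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "get_weather":
        return {"city": args["city"], "temp_c": 11}  # stub implementation
    raise ValueError(f"unknown tool: {tool_call['name']}")

print(dispatch(sample_tool_call))  # → {'city': 'Berlin', 'temp_c': 11}
```

The loop — send tools, receive a call, execute it, feed the result back — is what τ2-bench measures, and it is the part Gemma 3 could only fake through prompt engineering.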
Multimodal reasoning. Audio and visual understanding in the same model that does code and tool use. A single model can read a screenshot, understand what it shows, and write code to interact with it.
140-language support. Not translation — cultural context understanding. For teams building products for non-English markets, this removes the language bottleneck on fine-tuned specializations.
Running Gemma 4 Today
# Via Ollama (easiest path)
ollama pull gemma4:27b
ollama run gemma4:27b "Explain XNOR-popcount and why it matters for 1-bit LLMs"
# Via Hugging Face Transformers
pip install transformers torch
python3 - <<'EOF'
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "google/gemma-4-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
messages = [{"role": "user", "content": "Write a Fastify route handler that validates a JWT and returns the user profile."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
EOF
# Via OpenAI-compatible API (Google AI Studio)
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions \
-H "Authorization: Bearer $GOOGLE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gemma-4-27b-it", "messages": [{"role": "user", "content": "Hello"}]}'

The OpenAI-compatible API endpoint means Gemma 4 drops into any existing stack that accepts an OpenAI-format client — LangChain, LlamaIndex, Cursor's custom model config, all of it.
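To show the drop-in claim in code, the sketch below builds the same chat request with only the Python standard library. The URL and model name are taken from the curl example above; the helper only constructs the request, so it can be inspected without spending an API call.

```python
# Build the same chat request from Python with only the standard library.
# URL and model name follow the curl example; nothing is sent here, so
# the request can be inspected without an API key.
import json
import os
import urllib.request

ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"

def build_chat_request(model: str, messages: list[dict]) -> urllib.request.Request:
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GOOGLE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gemma-4-27b-it", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; any OpenAI-format client
# (the openai package, LangChain, LlamaIndex) can point at the same URL.
```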
Cursor 3: The IDE That Built Itself Over Again
Cursor shipped version 3 today and it is not an update to the IDE. It is a new interface built from scratch, centered on the assumption that agents write most of your code and humans review the results.
The original Cursor was a VS Code fork — the smartest IDE in the market, but still fundamentally a file editor with AI layered on top. Cursor 3 inverts this. The primary interface is your agents. Files are something you dive into when you need them.
What's Actually New
Multi-repo workspace. A single Cursor 3 window spans multiple repositories. If your backend, frontend, and mobile app live in different repos, they are all first-class citizens in one view. Agents working across them have full context of the whole system.
All your agents, in one sidebar. Every agent session — local, cloud, triggered from mobile, web, Slack, GitHub, or Linear — appears in a unified sidebar. You see what every agent is doing, what it has produced, and whether it needs your input.
Cursor 3 Sidebar
├── 🤖 fix/auth-bug (local, running)
├── 🌐 feature/payment-flow (cloud, waiting for review)
├── 📱 refactor/mobile-nav (triggered from Linear, complete)
└── ✅ docs/api-reference (merged 2h ago)
Cloud ↔ local handoff. You can push a local agent session to the cloud to keep it running while you close your laptop, or pull a cloud session local when you want to test changes on your own machine. The state transfers cleanly.
Commit to merged PR flow. The new diffs view shows what agents changed, lets you edit inline, stage, commit, and open a PR — without leaving Cursor. The PR loop is now inside the tool that generated the code.
Plugin Marketplace. Hundreds of plugins extend agents with MCPs, skills, and subagents. One-click install. Teams can publish private marketplaces of internal tools.
The key workflow shift in Cursor 3: Stop opening files first, then asking the agent what to do with them. Start by describing the outcome you want, let the agent identify and open the relevant files, then review what it produced. The "outcome first" workflow is what the new interface is built for.
Cursor 3 + Gemma 4: The Open Stack
Cursor supports custom model configurations via its OpenAI-compatible endpoint. That means you can point Cursor 3 at a locally-running Gemma 4 instance:
// Cursor Settings → Models → Add Custom Model
{
"name": "Gemma 4 27B (local)",
"apiKey": "not-needed",
"baseUrl": "http://localhost:11434/v1",
"model": "gemma4:27b"
}

With Ollama serving Gemma 4 locally, you get a fully open-source, fully local agentic coding environment: Cursor 3's orchestration layer backed by Gemma 4's frontier intelligence. No API costs, no data leaving your machine, no rate limits.
For sensitive codebases — fintech, healthcare, defense — this combination closes the gap between "capable AI assistance" and "acceptable security posture."
The Bigger Picture: Third Era Tooling Arrives
Cursor CEO Michael Truell named it in February: we are entering the third era of software development. Era one was manual editing. Era two was AI assistance (autocomplete, chat). Era three is autonomous agents shipping code while developers review outcomes.
Today's releases are the first tooling built for era three rather than retrofitted to support it.
Gemma 4's 86.4% on agentic tool use means an open model can now reliably plan, call functions, and complete multi-step tasks. That score would have been state-of-the-art for any model, open or closed, six months ago.
Cursor 3's multi-agent workspace means developers have a control plane for managing fleets of agents rather than individual conversations.
The micromanagement problem is not solved yet. Cursor's own blog acknowledges that engineers are "still micromanaging individual agents, trying to keep track of different conversations." Cursor 3 reduces that friction significantly — but autonomous agents that can be fully trusted to merge PRs without review are still ahead of us. The tools are getting there faster than most expected.
What Gets Eliminated, What Opens Up
| Before today | After today |
|---|---|
| Open-source models weak on tool use | Gemma 4 at 86.4% agentic τ2-bench |
| Separate windows per agent session | Unified sidebar for all agents |
| Local vs cloud agents as separate workflows | Seamless handoff between environments |
| PR creation needs external tool | Commit-to-PR inside Cursor |
| Capable AI requires paid cloud API | Gemma 4 local via Ollama, zero cost |
| Multi-repo work requires context switching | Single Cursor 3 workspace spans repos |
Getting Started This Week
Cursor 3: Download from cursor.com. The new interface is the default. If you have an existing workflow that depends on the old IDE, the option to switch back is available from the menu.
Gemma 4: Pull via ollama pull gemma4:27b (27B is the sweet spot for developer hardware) or access via Google AI Studio's API. The model card at ai.google.dev/gemma/docs/core/model_card_4 has full benchmark breakdowns.
The companion scripts for this article — a Gemma 4 function-calling agent template, a Cursor + local Gemma 4 setup script, and a multi-agent orchestration example — are at github.com/aistackinsights/stackinsights.
Sources & Further Reading
- Meet the New Cursor — Cursor 3. Cursor, April 2026
- Gemma 4 — Google DeepMind. Google DeepMind, April 2026
- Gemma 4 Model Card. Google AI, 2026
- Truell, M. (2026). The Third Era of AI Software Development. Cursor Blog
- Cursor Composer 2. Cursor, March 2026
- Cursor Plugin Marketplace. Cursor Docs, 2026
- τ2-bench: Evaluating AI Agents on Real-World Tasks. arXiv:2502.04938
- LiveCodeBench v6: Benchmarking LLMs on Competitive Programming. 2026
- Qwen3.6-Plus: Towards Real World Agents. Alibaba, April 2026
- Show HN: Cursor 3 — Hacker News Discussion. HN, April 2026
- Google Releases Gemma 4 — Hacker News Discussion. HN, April 2026
- Ollama Model Library — Gemma 4. Ollama, 2026