Sakana AI Brings AB-MCTS to Sakana Marlin, a Business Agent That Generates 100-Page Research Reports with Slides

Tokyo-based Sakana AI shipped its first commercial product, ‘Sakana Marlin’, this week. The Sakana team positions it as a Virtual CSO (Chief Strategy Officer). It is a B2B autonomous research agent built for businesses.
Marlin doesn’t respond in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to eight hours. Each run returns a long report and a presentation slide deck. Sakana says a single session generates hundreds to thousands of LLM queries.
What Is Sakana Marlin?
Marlin is a business research agent, not a chat assistant. You give it one topic or question. It then formulates hypotheses, consults sources, and validates findings on its own. It compresses weeks of strategic work into hours.
Deliverables are designed for decision makers. The Japanese announcement describes reports of dozens of pages. The English announcement cites reports of nearly 100 pages. At the press conference, reports were described as running 60–100 pages and citing 60–80 sources. Each report includes a main body, references, and appendices. Presentation slides are generated using image-generation AI.
The Sakana team refined Marlin through a closed beta in April 2026. About 300 experts tested it in real operations during that beta. That work included strategy development, market research, risk analysis, and competitive analysis. Sakana also partnered with MUFG and took a strategic investment from Citigroup.
Inside AB-MCTS: Wider or Deeper
The core of Marlin is AB-MCTS, or Adaptive Branching Monte Carlo Tree Search. It comes from Sakana’s earlier paper, “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”
AB-MCTS treats reasoning as a tree search problem. At each step the algorithm makes one decision. It can go wider by generating a new candidate response. Or it can go deeper by refining an existing promising answer. Standard repeated sampling only goes wide, in parallel, and hopes that one answer is correct.
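The wider-or-deeper decision can be illustrated with a toy search. This is a minimal sketch, not Sakana's implementation: `toy_generate` is a hypothetical stand-in for an LLM call, and the selection rule (Thompson sampling over Beta posteriors built from observed scores) is an assumption chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    state: str
    score: float  # normalized to [0, 1]
    children: list = field(default_factory=list)


def subtree(node):
    """Yield a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from subtree(child)


def toy_generate(parent, rng):
    """Hypothetical stand-in for an LLM call: refinements score near their parent."""
    if parent is None:
        return Node("draft", rng.random())
    score = min(1.0, max(0.0, parent.score + rng.uniform(-0.1, 0.2)))
    return Node(f"refined({parent.state})", score)


def choose(node, rng):
    """Return None to generate a new child here, or an existing child to descend into."""
    # "New child" arm: Beta(1, 1) prior updated with scores of children made here.
    a = 1 + sum(c.score for c in node.children)
    b = 1 + sum(1 - c.score for c in node.children)
    best_arm, best_draw = None, rng.betavariate(a, b)
    for child in node.children:
        # "Descend" arm: posterior built from every score in that child's subtree.
        scores = [n.score for n in subtree(child)]
        draw = rng.betavariate(1 + sum(scores), 1 + sum(1 - s for s in scores))
        if draw > best_draw:
            best_arm, best_draw = child, draw
    return best_arm


def search(budget=24, seed=0):
    rng = random.Random(seed)
    root = Node("root", 0.0)  # virtual root; it holds no answer of its own
    wider = deeper = 0
    for _ in range(budget):
        node = root
        while (arm := choose(node, rng)) is not None:
            node = arm
        if node is root:
            node.children.append(toy_generate(None, rng))  # fresh candidate
            wider += 1
        else:
            node.children.append(toy_generate(node, rng))  # refine an answer
            deeper += 1
    best = max((n for n in subtree(root) if n is not root), key=lambda n: n.score)
    return best, wider, deeper


if __name__ == "__main__":
    best, wider, deeper = search()
    print(f"best={best.score:.2f} wider={wider} deeper={deeper}")
```

Each unit of budget is one generation call. The split between fresh candidates and refinements is not fixed in advance; it emerges from the scores seen so far.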
The Multi-LLM variant adds a second choice: it can hand the step to a different model entirely. In Sakana’s reported ARC-AGI-2 experiments, this collaboration helped. Combining o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved about 27.5% of the tasks. The o4-mini model alone solved about 23%. Marlin uses the same adaptive search for long-horizon research.
The second key component of Marlin is the automated workflow from Sakana’s AI Scientist project. That project demonstrated autonomous scientific discovery and was published in Nature.
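One way to picture the model choice is as a bandit problem: models that have produced better answers so far get picked more often. The sketch below is an illustration under stated assumptions, not Sakana's code. The model names and their `true_quality` values are hypothetical, and Thompson sampling is assumed as the selection rule.

```python
import random


def pick_model(stats, rng):
    """Thompson sampling: draw from each model's Beta posterior and take the max."""
    draws = {
        name: rng.betavariate(1 + wins, 1 + losses)
        for name, (wins, losses) in stats.items()
    }
    return max(draws, key=draws.get)


def run(budget=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical chance that each model's answer passes; unknown to the search.
    true_quality = {"model_a": 0.3, "model_b": 0.5, "model_c": 0.7}
    stats = {name: (0.0, 0.0) for name in true_quality}  # (wins, losses)
    calls = {name: 0 for name in true_quality}
    for _ in range(budget):
        name = pick_model(stats, rng)
        score = 1.0 if rng.random() < true_quality[name] else 0.0
        wins, losses = stats[name]
        stats[name] = (wins + score, losses + 1 - score)
        calls[name] += 1
    return calls


if __name__ == "__main__":
    print(run())  # call counts per model
```

Calls should tend to concentrate on the stronger model as evidence accumulates, while weaker models still get occasional tries.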
Interactive demo: The embedded widget (marlin-abmcts-demo.html) shows the “wider or deeper” decision live. Press Run and watch the tree grow. Green nodes carry the highest scores, and the best path is highlighted. Switch to “Multi-LLM” to see steps routed across different models.
AB-MCTS: “Wider or Deeper?” (interactive search)
A simplified view of Sakana AI’s Adaptive Branching Monte Carlo Tree Search. At each step, a policy chooses to widen (generate a new candidate) or deepen (refine a promising line).
How Marlin Compares
Marlin competes on depth, not speed. Common deep research tools respond in minutes to tens of minutes. Marlin deliberately spends hours to raise the quality of its output. The run times for the competitors below are estimates drawn from reports, not official figures.
| Tool | Typical run time | Output | Primary users |
|---|---|---|---|
| Sakana Marlin | Up to ~8 hours | Report (dozens to ~100 pages) + slides | Business strategy teams |
| OpenAI Deep Research | ~Minutes to tens of minutes | Cited text report | General and professional users |
| Perplexity Deep Research | ~A few minutes | Cited text answer | General users |
| Google Gemini Deep Research | ~Minutes | Cited text report | General and workplace users |
The trade-off is clear. You wait longer and pay per run. In return you get in-depth hypothesis testing and a finished deliverable. You can cancel a run at any time, but the credits are still consumed.
Pricing
Sakana offers pay-as-you-go pricing alongside Pro, Team, and Enterprise tiers. Pay-as-you-go starts at 100 credits per run, at ¥98 per credit. Pro is ¥150,000 per month and includes 2,000 credits. Team is ¥400,000 per month and includes 6,000 credits. Enterprise pricing is custom, with dedicated support.
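To compare the tiers, it helps to work out the effective price per credit. The arithmetic below uses only the figures quoted above. It assumes every run costs exactly the 100-credit minimum, so real runs may cost more.

```python
import math

YEN_PER_CREDIT_PAYG = 98
MIN_CREDITS_PER_RUN = 100  # stated minimum; actual runs may use more credits

# plan name -> (monthly fee in yen, included credits)
PLANS = {"Pro": (150_000, 2_000), "Team": (400_000, 6_000)}

# Cheapest possible pay-as-you-go run.
payg_min_run_yen = YEN_PER_CREDIT_PAYG * MIN_CREDITS_PER_RUN

summary = {}
for name, (fee, credits) in PLANS.items():
    summary[name] = {
        # Effective price per credit if all included credits are used.
        "yen_per_credit": fee / credits,
        # Upper bound on runs per month at the 100-credit minimum.
        "max_min_cost_runs": credits // MIN_CREDITS_PER_RUN,
        # Credits per month at which the plan becomes cheaper than pay-as-you-go.
        "break_even_credits": math.ceil(fee / YEN_PER_CREDIT_PAYG),
    }

if __name__ == "__main__":
    print(f"Pay-as-you-go minimum run: ¥{payg_min_run_yen:,}")
    for name, row in summary.items():
        print(name, row)
```

On these numbers, a minimum pay-as-you-go run costs ¥9,800. Pro works out to ¥75 per credit and Team to about ¥66.67 per credit when the included credits are fully used.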
Use Cases and Examples
Marlin targets high-level questions where research is the bottleneck. Here are some concrete examples drawn from its target tasks.
- Market entry: ‘Explore Japan’s stablecoin and tokenized payments market after regulatory change.’ Marlin lays out the drivers, risks, and options in a structured report.
- Risk analysis: ‘Model the scenarios for a Strait of Hormuz blockade.’ It compares hypotheses, rather than just summarizing, before drawing conclusions.
- Competitive analysis: ‘Profile three competitors and rate our position.’ It returns slides ready for a strategy review.
Each example corresponds to one prompt and one unattended run. A human still reviews the cited output before any decision.
Try the Engine Yourself: TreeQuest
You can’t self-host Marlin. But you can use its core algorithm today. Sakana open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generation function, and run it with a fixed search budget.
```python
import random

import treequest as tq


# Each node holds a user-defined state; score must be normalized to [0, 1].
def generate(parent_state):
    if parent_state is None:  # None means expand from the root
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    score = random.random()  # swap this for an LLM-based score
    return new_state, score


algo = tq.ABMCTSA()  # Adaptive Branching MCTS (variant A)
search_tree = algo.init_tree()
for _ in range(10):  # generation budget of 10
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("BEST:", best_state, round(best_score, 3))
```
Replace the random score with an LLM judge to reproduce the real pattern. TreeQuest also supports multi-LLM search and checkpointing. Checkpointing matters because long sessions can hit API errors midway through.
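Swapping in a judge could look roughly like the sketch below. It builds a function with the same `generate(parent_state) -> (state, score)` shape as the example above, but does not import TreeQuest. The helper `make_generate`, the stub functions, the 0 to 10 rating scale, and the use of `RuntimeError` for a transient API failure are all assumptions for illustration.

```python
import time


def make_generate(draft_fn, judge_fn, max_retries=3, backoff_s=0.0):
    """Build generate(parent_state) -> (state, score) with a score in [0, 1].

    draft_fn(parent_state) -> str and judge_fn(state) -> rating on a 0-10 scale
    are hypothetical stand-ins for LLM calls.
    """

    def generate(parent_state):
        last_error = None
        for attempt in range(max_retries):
            try:
                state = draft_fn(parent_state)
                rating = judge_fn(state)
                # Normalize and clamp, since the search expects scores in [0, 1].
                return state, min(1.0, max(0.0, rating / 10.0))
            except RuntimeError as err:  # stand-in for a transient API error
                last_error = err
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
        raise RuntimeError(f"failed after {max_retries} attempts") from last_error

    return generate


# Stubs so the sketch runs without any API access.
def stub_draft(parent_state):
    return "Initial draft" if parent_state is None else f"Refined: {parent_state}"


_judge_calls = {"n": 0}


def flaky_judge(state):
    _judge_calls["n"] += 1
    if _judge_calls["n"] == 1:
        raise RuntimeError("simulated API error")  # first call fails once
    return 7 + state.count("Refined")  # toy rule: refinement earns a higher rating


generate = make_generate(stub_draft, flaky_judge)
```

The resulting `generate` could be passed in the dictionary given to `algo.step` in place of the random version. Retries cover a single failed call; checkpointing covers the case where the whole session dies.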
Strengths and Weaknesses
Strengths
- Peer-reviewed foundations: AB-MCTS at NeurIPS and AI Scientist in Nature.
- Completed deliverables, including references, appendices, and slides.
- Adaptive compute spends effort on the most promising branches.
- An open-source core (TreeQuest) allows AI researchers to study the method.
Weaknesses
- Long run times make iteration slower than with minute-scale research tools.
- Automated reports can contain serious errors that require human review.
- Pricing and design target businesses, not individual developers.
- Marlin itself is closed; only the core algorithm is open.
Key Takeaways
- Sakana Marlin conducts autonomous research for up to eight hours per job.
- A single run produces a long, multi-page report plus slides.
- Built on AB-MCTS (NeurIPS 2025 Spotlight) and AI Scientist workflows (Nature).
- Entry pricing is pay-as-you-go: runs start at 100 credits, at ¥98 per credit.
- Targets finance, corporate strategy, consulting, and think tanks.
Sources
- Sakana AI – Sakana Marlin Release:
- Sakana AI – Sakana Marlin Product Page:
- Sakana AI – AB-MCTS and TreeQuest research:
- SakanaAI/treequest (GitHub, Apache 2.0):



