Sakana AI Builds AB-MCTS into Sakana Marlin, a Business Agent Generating 100-Page Research Reports with Slides

pleasuremandarya@gmail.com 15/06/2026



Tokyo-based Sakana AI shipped its first commercial product, ‘Sakana Marlin’, this week. The Sakana team positions it as a Virtual CSO (Chief Strategy Officer). It is a B2B autonomous research agent built for businesses.

Marlin doesn’t respond in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to eight hours. Each run returns a long report and a presentation slide deck. Sakana says a single session generates hundreds to thousands of LLM queries.

What is Sakana Marlin

Marlin is a business research agent, not a chat assistant. You give it one topic or question. It then formulates hypotheses, consults sources, and validates findings on its own. It compresses weeks of strategic work into hours.

Deliverables are designed for decision makers. The Japanese announcement describes reports of dozens of pages. The English announcement cites reports of nearly 100 pages. At the press conference, the reports shown ran 60–100 pages and cited 60–80 sources. Each report includes a main body, references, and appendices. Presentation slides are generated with image-generation AI.

The Sakana team refined Marlin through a closed beta in April 2026. About 300 experts tested it in real operations during that beta. Those operations included strategy development, market research, risk analysis, and competitive analysis. Sakana has also partnered with MUFG and taken a strategic investment from Citigroup.

Inside AB-MCTS: Wider or Deeper

The core of Marlin is AB-MCTS, or Adaptive Branching Monte Carlo Tree Search. It was introduced in Sakana’s earlier research paper “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”

AB-MCTS treats reasoning as a tree search problem. At each step the algorithm makes one decision. It can go wider by generating a new candidate response. Or it can go deeper by refining an existing promising answer. Standard repeated sampling only goes wide, in parallel, and hopes that one answer is correct.
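The wider-or-deeper choice can be illustrated with a toy sketch. This is not Sakana's implementation. It assumes a single decision point, two hypothetical functions (`propose` and `refine`) that stand in for LLM calls, and a simple Beta-Bernoulli Thompson sampler, chosen here for illustration, that learns which action tends to improve the best score.

```python
import random

def propose(rng):
    """Hypothetical stand-in for an LLM call that drafts a fresh answer."""
    return rng.random() * 0.6            # fresh drafts score in [0, 0.6)

def refine(score, rng):
    """Hypothetical stand-in for an LLM call that revises an existing answer."""
    return min(1.0, max(0.0, score + rng.uniform(-0.1, 0.2)))

def widen_or_deepen(budget=24, seed=0):
    rng = random.Random(seed)
    scores = [propose(rng)]              # candidate answers found so far
    # Beta(wins + 1, losses + 1) belief that an action improves the best score.
    stats = {"widen": [1, 1], "deepen": [1, 1]}
    counts = {"widen": 0, "deepen": 0}
    for _ in range(budget - 1):
        # Thompson sampling: draw from each belief, act on the larger draw.
        action = max(stats, key=lambda a: rng.betavariate(*stats[a]))
        best = max(scores)
        new = propose(rng) if action == "widen" else refine(best, rng)
        scores.append(new)
        counts[action] += 1
        stats[action][0 if new > best else 1] += 1
    return max(scores), counts

best, counts = widen_or_deepen()
print(f"best score {best:.2f}, actions {counts}")
```

Plain repeated sampling corresponds to always choosing "widen". The sketch differs only in that it is allowed to spend part of the budget refining the best answer found so far.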

The Multi-LLM variant adds a second choice. It can route the step to a completely different model. In Sakana’s reported ARC-AGI-2 trials, this collaboration helped. Combining o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved about 27.5% of the tasks. The o4-mini model alone solved about 23%. Marlin uses the same adaptive search for long-horizon research.
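Handing a step to a different model can be framed as a bandit problem: call the model that currently looks most promising, while still occasionally trying the others. The sketch below is an illustration under stated assumptions, not Sakana's routing logic. The model names and their solve rates are hypothetical, and `call_model` replaces a real LLM call with a coin flip.

```python
import random

# Hypothetical per-model solve probabilities; a real system would not know these.
MODELS = {"model_a": 0.20, "model_b": 0.35, "model_c": 0.10}

def call_model(name, rng):
    """Stand-in for an LLM call; returns True when the attempt scores well."""
    return rng.random() < MODELS[name]

def route_calls(budget=200, seed=0):
    rng = random.Random(seed)
    beliefs = {name: [1, 1] for name in MODELS}   # Beta(successes + 1, failures + 1)
    calls = {name: 0 for name in MODELS}
    for _ in range(budget):
        # Sample a plausible success rate per model, then call the top draw.
        name = max(beliefs, key=lambda m: rng.betavariate(*beliefs[m]))
        success = call_model(name, rng)
        beliefs[name][0 if success else 1] += 1
        calls[name] += 1
    return calls

calls = route_calls()
print(calls)
```

Over a long enough budget, a sampler like this tends to send most calls to whichever model succeeds most often on the task at hand, without that preference being hard-coded.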

The second key component of Marlin is the automated workflow from Sakana’s AI Scientist project. That project demonstrated autonomous scientific discovery and was published in Nature.

Interactive demo (marlin-abmcts-demo.html): a simplified view of Sakana AI’s Adaptive Branching Monte Carlo Tree Search. At each step a policy chooses to widen (new candidate) or deepen (refine a promising line). Nodes are colored from low to high score, the best path is highlighted, and a “Multi-LLM” mode shows steps routed across different models.

How does Marlin compare

Marlin competes on depth, not speed. Common deep research tools respond in minutes to tens of minutes. Marlin deliberately spends hours to raise output quality. The competitor run times below are estimates from reports, not official figures.

Tool	Typical run time	Output	Primary users
Sakana Marlin	Up to ~8 hours	Report (dozens to 100 pages) + slides	Business strategy teams
OpenAI Deep Research	~Minutes to tens of minutes	Cited text report	General and professional users
Perplexity Deep Research	~A few minutes	Cited text answer	General users
Google Gemini Deep Research	~Minutes	Cited text report	General and Workspace users

The trade-off is clear. You wait longer and pay per run. In return you get deeper hypothesis testing and a finished deliverable. You can cancel a run at any time, but the credits are still consumed.

Pricing

Sakana offers pay-as-you-go pricing alongside Pro, Team, and Enterprise tiers. Pay-as-you-go starts at 100 credits per run, at ¥98 per credit. Pro is ¥150,000 per month and includes 2,000 credits. Team is ¥400,000 per month and includes 6,000 credits. Enterprise pricing is custom, with dedicated support.
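The quoted figures imply different effective rates per credit, which a few lines of arithmetic make explicit. The sketch below uses only the numbers stated above. The run counts assume every run costs exactly the 100-credit minimum and that subscription credits are spent at the same rate; neither assumption is confirmed, and real runs may cost more.

```python
# Figures quoted in the article; per-credit and per-run values are derived.
PAYG_YEN_PER_CREDIT = 98
MIN_CREDITS_PER_RUN = 100

plans = {
    "Pro": {"monthly_yen": 150_000, "credits": 2_000},
    "Team": {"monthly_yen": 400_000, "credits": 6_000},
}

payg_min_run_yen = PAYG_YEN_PER_CREDIT * MIN_CREDITS_PER_RUN
print(f"Pay-as-you-go: JPY {payg_min_run_yen:,} for a {MIN_CREDITS_PER_RUN}-credit run")

per_credit = {}
for name, plan in plans.items():
    per_credit[name] = plan["monthly_yen"] / plan["credits"]
    # Assumption: each run uses only the 100-credit minimum.
    runs = plan["credits"] // MIN_CREDITS_PER_RUN
    print(f"{name}: JPY {per_credit[name]:.2f} per credit, "
          f"up to {runs} minimum-size runs per month")
```

On these numbers a minimum-size pay-as-you-go run costs ¥9,800, while the effective per-credit rate falls to ¥75 on Pro and about ¥66.67 on Team.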

Use Cases and Examples

Marlin suits high-level questions where research is the bottleneck. Here are some concrete examples drawn from its target workflows.

Market entry: ‘Explore Japan’s stablecoin and tokenized payments market after regulatory change.’ Marlin lays out the drivers, risks, and options in a structured report.
Risk analysis: ‘Model scenarios for a blockade of the Strait of Hormuz.’ It weighs competing hypotheses, rather than just summarizing, before drawing conclusions.
Competitive analysis: ‘Profile three competitors and assess our position.’ It returns slides ready for a strategy review.

Each example corresponds to one prompt and one unattended run. A person still reviews the cited output before any decision.

Try the Engine for yourself: TreeQuest

You can’t self-host Marlin. But you can use its core algorithm today. Sakana open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generation function, and run the search with a fixed budget.

import random
import treequest as tq

# Each node holds a user-defined state; score must be normalized to [0, 1].
def generate(parent_state):
    if parent_state is None:               # None means expand from the root
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    score = random.random()                # swap this for an LLM-based score
    return new_state, score

algo = tq.ABMCTSA()                         # Adaptive Branching MCTS (variant A)
search_tree = algo.init_tree()

for _ in range(10):                         # generation budget of 10
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("BEST:", best_state, round(best_score, 3))

Replace the random score with an LLM judge to reproduce the original pattern. TreeQuest also supports multi-LLM search and checkpointing for long runs. Checkpointing matters because long sessions can hit API errors midway.
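Because long sessions can hit API errors midway, one common safeguard is to save progress after every successful step and resume from the saved file. The sketch below shows that generic pattern with the standard `pickle` module. It does not use TreeQuest's own API: `flaky_generate`, the failure rate, and the file name are hypothetical stand-ins.

```python
import os
import pickle
import random
import tempfile

def flaky_generate(parent_state, rng):
    """Stand-in for an LLM call that sometimes fails like a real API would."""
    if rng.random() < 0.3:
        raise ConnectionError("simulated API error")
    base = "Initial draft" if parent_state is None else f"Refined: {parent_state}"
    return base, rng.random()

def run(budget, path, seed=0):
    rng = random.Random(seed)
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            results = pickle.load(f)
    else:
        results = []
    while len(results) < budget:
        # Refine the best-scoring state so far; None means start from scratch.
        parent = max(results, key=lambda r: r[1])[0] if results else None
        try:
            results.append(flaky_generate(parent, rng))
        except ConnectionError:
            continue                      # retry; completed steps are already on disk
        with open(path, "wb") as f:       # checkpoint after every successful step
            pickle.dump(results, f)
    return results

path = os.path.join(tempfile.mkdtemp(), "search.pkl")
first = run(budget=5, path=path)
resumed = run(budget=10, path=path, seed=1)   # picks up the 5 saved results
print(len(first), len(resumed))
```

The second call does not repeat the first five steps. It loads them from disk and spends only the remaining budget, which is the property that matters when each step is a paid LLM call.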