
PaperBanana: Automating Academic Illustration for AI Scientists (arXiv:2601.23265)

What it is

An arXiv paper proposing a multi-agent framework for generating publication-ready academic diagrams and plots, plus the PaperBananaBench benchmark for evaluation.

Gabriel’s notes

PaperBanana is an agentic (multi-step, multi-role) framework, described in an arXiv paper submitted January 30, 2026, for generating “publication-ready” academic illustrations, especially methodology diagrams, using vision-language models (VLMs) and image generation models. The authors also introduce PaperBananaBench, a benchmark built from NeurIPS 2025 methodology diagrams (292 test cases, with a corresponding reference set), which scores generated figures on criteria such as faithfulness, readability, and aesthetics. A companion project site and an open-source repository are available. Dataset licensing details are unknown; they could not be confirmed from the dataset page alone.

Quick take: PaperBanana is one of those “finally, someone tried to automate the annoying part” papers—turning dense method sections into diagrams you’d actually dare to submit. The punchline: it’s not just text-to-image; it’s a pipeline of specialist agents with critique/refinement loops, which is where most figure generators either shine or faceplant.

I saved this under Research because diagrams are the silent gatekeepers of scientific communication: if your figure is confusing, your idea might as well be. (Yes, that’s unfair. Welcome to academia.)

Good fit if you want to:

  • Generate first-draft methodology diagrams from a paper’s method section + caption.
  • Iterate on figure style using reference examples (instead of “make it prettier” vibes-only prompting).
  • Explore an explicit agent breakdown (Retriever / Planner / Stylist / Visualizer / Critic) rather than a single monolithic prompt.
  • Benchmark figure generation quality on a curated diagram set (PaperBananaBench).
  • Prototype a “figure assistant” workflow inside a research team (with humans still doing final review).
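
To make the agent breakdown and the critique/refinement loop concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the PaperBanana repository's API: the `Agents` container, `run_pipeline`, `make_stub_agents`, and the string-based "figure" are all hypothetical stand-ins, and each stub would be replaced by a real VLM or image-model call in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agents:
    """Hypothetical container: one callable per role named in the notes above."""
    retriever: Callable[[str, List[str]], List[str]]   # method text, pool -> examples
    planner: Callable[[str, str, List[str]], str]      # method text, caption, examples -> plan
    stylist: Callable[[str, List[str]], str]           # plan, examples -> styled spec
    visualizer: Callable[[str], str]                   # spec -> figure (a string here)
    critic: Callable[[str, str], List[str]]            # figure, method text -> issues


def run_pipeline(method_text: str, caption: str, reference_pool: List[str],
                 agents: Agents, max_rounds: int = 3) -> Tuple[str, int]:
    """Run the roles in order, then let the critic request up to max_rounds revisions."""
    examples = agents.retriever(method_text, reference_pool)
    plan = agents.planner(method_text, caption, examples)
    spec = agents.stylist(plan, examples)
    figure = agents.visualizer(spec)

    revisions = 0
    for _ in range(max_rounds):
        issues = agents.critic(figure, method_text)
        if not issues:
            break  # critic is satisfied; stop refining
        # Feed the critique back into the spec and re-render.
        spec = spec + "\nFIX: " + "; ".join(issues)
        figure = agents.visualizer(spec)
        revisions += 1
    return figure, revisions


def make_stub_agents() -> Agents:
    """Toy stand-ins (assumption): real agents would call model APIs."""
    return Agents(
        retriever=lambda text, pool: pool[:2],
        planner=lambda text, caption, refs: f"PLAN({caption}): {text}",
        stylist=lambda plan, refs: f"{plan} | styled after {len(refs)} refs",
        visualizer=lambda spec: f"<figure>{spec}</figure>",
        critic=lambda figure, text: [] if "FIX:" in figure else ["add missing label"],
    )


if __name__ == "__main__":
    figure, revisions = run_pipeline(
        "encoder feeds decoder", "Overview", ["ref a", "ref b", "ref c"],
        make_stub_agents(),
    )
    print(revisions)
    print(figure)
```

Capping the loop with `max_rounds` is a design choice in this sketch, not something taken from the paper: it keeps a never-satisfied critic from running up an unbounded number of model calls.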

Pricing snapshot (auto-enriched)

The paper is free to read on arXiv, and the reference implementation is open-source (Apache-2.0). Actual runtime cost depends on which external model APIs you configure (the repo expects you to supply model names and at least one API key), so your bill will be whatever your chosen VLM and image generation providers charge. Specific dollar pricing is unknown and unconfirmed; it varies by provider and usage.

Work-use / compliance snapshot (auto-enriched)

Licensing is a two-layer story: the arXiv paper is posted under a Creative Commons Attribution 4.0 license (CC BY 4.0), while the GitHub code is Apache-2.0. If you pull the benchmark or datasets, their licensing and redistribution terms are unknown: the dataset page has no dataset card, so nothing could be confirmed from it. Also, because the system can call external model APIs, treat your method text as data you may be transmitting to third parties. Don’t upload proprietary or sensitive content unless your org is comfortable with the relevant provider terms and data-handling policies.

Alternatives (auto-enriched)

  • SciFig (arXiv:2601.04390): also targets publication-ready pipeline figures from paper text, using a hierarchical layout strategy; good to compare if you want a stronger layout-first emphasis.
  • AutoFigure + FigureBench (arXiv:2602.03828): another agentic approach paired with a larger benchmark; includes a public code release, so it’s handy if you want an adjacent open implementation to contrast design choices.

Before you adopt it:

  • Start with “draft figures” as the product, not “final figures.” Bake in a human review step for scientific correctness (connections, labels, and numeric fidelity are where tools like this tend to get spicy).
  • Decide your model stack up front (VLM + image model) and track cost/latency per figure—otherwise the tool will quietly become a very expensive way to avoid Illustrator.
  • If you use retrieval/few-shot, curate your reference set like you mean it; your style guide is only as sane as your examples.
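
One way to track cost and latency per figure is a small ledger that wraps each model call. The sketch below is an assumption-laden illustration: `FigureLedger`, `CallRecord`, and the stage names are hypothetical, and `cost_usd` is a number you supply yourself (for example, computed from your provider's usage report), since no provider pricing is confirmed here.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CallRecord:
    figure_id: str
    stage: str       # e.g. "planner" or "visualizer" (hypothetical labels)
    seconds: float   # wall-clock latency of the call
    cost_usd: float  # caller-supplied cost estimate for the call


@dataclass
class FigureLedger:
    records: List[CallRecord] = field(default_factory=list)

    def timed(self, figure_id: str, stage: str, cost_usd: float,
              fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run fn, record its latency and the supplied cost, and return its result."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        self.records.append(CallRecord(figure_id, stage, elapsed, cost_usd))
        return result

    def per_figure(self) -> Dict[str, Dict[str, float]]:
        """Aggregate total seconds, total cost, and call count for each figure."""
        totals: Dict[str, Dict[str, float]] = {}
        for r in self.records:
            t = totals.setdefault(
                r.figure_id, {"seconds": 0.0, "cost_usd": 0.0, "calls": 0}
            )
            t["seconds"] += r.seconds
            t["cost_usd"] += r.cost_usd
            t["calls"] += 1
        return totals


if __name__ == "__main__":
    ledger = FigureLedger()
    # The lambdas stand in for real model calls; the costs are made-up placeholders.
    ledger.timed("fig1", "planner", 0.01, lambda: "plan")
    ledger.timed("fig1", "visualizer", 0.04, lambda: "image")
    print(ledger.per_figure())
```

Logging every refinement round as its own record makes it easy to see whether extra critique passes are worth what they cost.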

Sources

  • https://arxiv.org/abs/2601.23265
  • https://papersbanana.com/
  • https://github.com/dwzhu-pku/PaperBanana
  • https://arxiv.org/abs/2601.04390
  • https://arxiv.org/abs/2602.03828
