Academic Research Framework

The DIALECTIC Framework:
Scientific Startup Evaluation

A peer-reviewed multi-agent AI system from TU Munich, Earlybird Venture Capital, and UVC Partners that transforms startup screening through structured argumentation and iterative debate.

Research Credits

Authors

Jae Yoon Bae*, Simon Malberg*, Joyce Galang*

Andre Retterath, Georg Groh

*Equal contribution

Institutions

Technical University of Munich

Earlybird Venture Capital

UVC Partners

Published: EACL 2025 (European Chapter of the Association for Computational Linguistics)

License: MIT License — Open source and free to use

The Problem: Attention Scarcity in Venture Capital

Venture capital faces a fundamental attention crisis. Starting a company has never been easier — AI copilots, no-code tools, and cloud infrastructure have collapsed traditional barriers to entry. The result? An explosion of startups competing for limited investor attention.

The Data Tells the Story

2022: VCs spent 2 minutes 42 seconds per pitch deck

2023: Down 20% to 2 minutes 12 seconds

2024: Down another 12% to 1 minute 56 seconds

Source: DocSend Pitch Deck Reports

Put yourself in the founder's position: You've spent weeks perfecting your narrative, refining every chart, polishing every message. An investor opens your deck between meetings, swipes for under two minutes, and closes it before reaching your "big moment."

This isn't a judgment on founders or investors. It's a structural problem. In today's venture market, attention, not capital, is the scarcest resource.

What is DIALECTIC?

DIALECTIC (Decision Iteration with Argument-Level Evidence and Counter-Thinking for Investment Conclusions) is a multi-agent AI system that models how real venture investors form investment decisions through structured argumentation and debate.

1. Fact Collection

Decomposes the startup into structured questions (team, product, market, traction) and builds a hierarchical knowledge base.

2. Simulated Debate

Multiple AI agents generate pro and contra arguments, critique each other, and iteratively refine through "survival of the fittest."

3. Decision Scoring

Produces numeric decision scores that rank opportunities, enabling investors to prioritize where attention should go.

The Key Insight

Most AI approaches to VC focus on prediction accuracy. DIALECTIC focuses on reasoning quality.

Instead of asking "Can we predict startup success?", it asks "How do investors actually form investment decisions, and can that process be modeled?"

How DIALECTIC Works: A Technical Overview

Phase 1: Fact Collection

Building a structured knowledge base

Seed Questions

Four high-level questions based on VC evaluation criteria:

  • How does the company align with VC strategy?
  • Who are the key members of the founding team?
  • What are the product's core features and technology?
  • What is the target market size and growth potential?

Question Decomposition

Each seed question is decomposed into industry-specific sub-questions, creating a hierarchical question tree tailored to the startup's sector.

Answer Agent

An AI agent answers all questions using company data, pitch deck content, team information, and web search (when needed for market data).

Output: A comprehensive fact base with structured Q&A pairs covering all major investment criteria.

Phase 2: Reasoning Through Debate

Multi-agent argumentation system

Generator Agent

Generates K initial arguments (default: 5) for both PRO (invest) and CONTRA (pass) stances, each citing specific facts from the knowledge base.

Critic Agent (Devil's Advocate)

Takes the opposite stance and critiques each argument, pointing out weak evidence, alternative interpretations, and challenging assumptions.

Evaluator Agent

Judges argument quality on 14 criteria based on Wachsmuth et al.'s (2017) argument quality taxonomy:

Local Acceptability
Local Relevance
Local Sufficiency
Cogency
Credibility
Emotional Appeal
Clarity
Appropriateness
Arrangement
Effectiveness
Global Acceptability
Global Relevance
Global Sufficiency
Reasonableness

Each criterion scored 1-7, producing quality scores up to 98 per argument.
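The arithmetic behind the cap is simple to sketch (criterion names follow Wachsmuth et al. 2017; the dictionary-based scorer itself is an illustrative assumption, not the paper's implementation):

```python
# Each of the 14 criteria is rated on a 1-7 scale, so the maximum
# total quality score per argument is 14 * 7 = 98.
CRITERIA = [
    "local_acceptability", "local_relevance", "local_sufficiency", "cogency",
    "credibility", "emotional_appeal", "clarity", "appropriateness",
    "arrangement", "effectiveness", "global_acceptability", "global_relevance",
    "global_sufficiency", "reasonableness",
]

def quality_score(ratings: dict[str, int]) -> int:
    """Sum per-criterion ratings, enforcing the 1-7 scale on all 14 criteria."""
    assert set(ratings) == set(CRITERIA), "all 14 criteria must be rated"
    assert all(1 <= r <= 7 for r in ratings.values()), "ratings are 1-7"
    return sum(ratings.values())

perfect = {c: 7 for c in CRITERIA}
print(quality_score(perfect))  # 98
```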

Refiner Agent

Improves arguments based on critique and quality scores, addressing weaknesses and strengthening evidence.

Iterative Loop (T iterations)

The critique → evaluate → refine cycle repeats T times (optimal: T=2). After each iteration, only the highest-quality arguments survive.

Output: 4 refined PRO arguments and 4 refined CONTRA arguments, each with quality scores and supporting evidence.
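The phase described above can be sketched as a generic loop (the agents here are toy stubs passed in as functions; the real system would back each with an LLM, and the function names are assumptions for illustration):

```python
import random

# Illustrative debate loop: critique -> refine -> evaluate, repeated T times,
# keeping only the K highest-scoring arguments after each iteration.
def debate(generate, critique, evaluate, refine, k=4, t=2, n_initial=5):
    args = [generate() for _ in range(n_initial)]
    for _ in range(t):
        critiques = [critique(a) for a in args]
        args = [refine(a, c) for a, c in zip(args, critiques)]
        # "Survival of the fittest": sort by quality score, keep the top k.
        args.sort(key=evaluate, reverse=True)
        args = args[:k]
    return args

# Toy agents: an argument is just its quality score; refinement nudges it up.
random.seed(0)
survivors = debate(
    generate=lambda: random.uniform(0, 98),
    critique=lambda a: None,
    evaluate=lambda a: a,
    refine=lambda a, c: min(a + 5, 98),
)
print(len(survivors))  # 4 surviving arguments, highest-scored first
```

In the full system this loop runs once per stance, yielding the 4 refined PRO and 4 refined CONTRA arguments; the "overthinking" finding discussed later corresponds to raising t beyond 2.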

Phase 3: Decision Making

Converting arguments into actionable scores

Decision Score Formula

Decision Score = Sum(PRO scores) - Sum(CONTRA scores) + threshold
  • PRO scores: Sum of quality scores of all surviving PRO arguments
  • CONTRA scores: Sum of quality scores of all surviving CONTRA arguments
  • Threshold: Decision threshold capturing investor risk preference

Investment Recommendation

Decision: INVEST if decision score > 0, otherwise PASS

Confidence level determined by magnitude of decision score.

Output: Decision score (typically -10 to +10), investment recommendation (INVEST/PASS/UNCERTAIN), confidence level (HIGH/MEDIUM/LOW), and full argument justifications.
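The scoring step above reduces to a few lines (a minimal sketch of the stated formula; the numeric argument scores are hypothetical, and the UNCERTAIN/confidence banding is omitted since the source only specifies the INVEST/PASS cutoff):

```python
def decision_score(pro_scores, contra_scores, threshold=0):
    """Decision score = Sum(PRO scores) - Sum(CONTRA scores) + threshold."""
    return sum(pro_scores) - sum(contra_scores) + threshold

def recommend(score):
    # Per the rule above: INVEST if the decision score is positive, else PASS.
    return "INVEST" if score > 0 else "PASS"

# Hypothetical quality scores for the 4 surviving arguments per stance.
pro = [82, 77, 74, 70]
contra = [80, 76, 73, 71]
score = decision_score(pro, contra, threshold=0)
print(score, recommend(score))  # 3 INVEST
```

Shifting `threshold` up or down is how an investor's risk preference enters: a risk-averse fund sets a negative threshold so that only strongly PRO-dominated debates clear zero.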

Research Results: What the Data Shows

Dataset

  • 259 early-stage startups (seed/pre-seed)
  • Data from 5 different VC funds
  • Added to watchlists in 2021
  • Success = raised Series A or later by Sept 2025
  • 25% success rate (industry-typical)

Performance

  • Matches human VC precision in predicting Series A+ outcomes
  • AUC-PR: 0.24 on test set (validation: 0.37)
  • Produces full ranking vs single operating point
  • Optimal hyperparameters: K=4 arguments, T=2 iterations

🤔 The "Overthinking" Discovery

A fascinating finding: performance peaks at T=2 iterations, then declines.

As iterations increase beyond 2:

  • Argument quality scores keep rising
  • Arguments get longer
  • More facts are cited
  • But predictive performance decreases

This mirrors human investment committees: more words, more sophistication, less signal. The value of debate is not infinite.

Key Takeaway

DIALECTIC doesn't outperform human VCs in prediction accuracy. Instead, it matches their precision while providing full rankings and transparent justifications — enabling better attention allocation at scale.

Why Scientific Evaluation Matters for Venture Capital

1. Reproducibility & Consistency

Human VC decisions are influenced by mood, time of day, recent portfolio events, and cognitive biases. A scientific framework provides consistent evaluation criteria that can be audited, improved, and reproduced.

2. Transparency & Explainability

Unlike black-box ML models, DIALECTIC produces human-readable arguments with evidence. Investors can understand why a decision was made, challenge the reasoning, and override when needed.

3. Scalability Without Quality Loss

Human attention doesn't scale. A VC partner can only deeply evaluate 10-20 companies per month. DIALECTIC can process hundreds while maintaining structured, high-quality analysis — bringing iterative reasoning to the top of the funnel where it's typically impossible.

4. Continuous Improvement

A scientific framework can be measured, tested, and improved. By tracking which arguments correlate with future success, the system can evolve. Human intuition is valuable but hard to systematically improve.

5. Democratization of Expertise

Top-tier VC firms have decades of pattern recognition. DIALECTIC encodes investment thinking processes that can be accessed by emerging VCs, corporate venture arms, and founders — leveling the playing field.

How GRID Uses DIALECTIC

We discovered the DIALECTIC framework through the Data Driven VC newsletter by Andre Retterath. Recognizing its potential, we implemented it as part of GRID's two-tier evaluation system.

Quick Score (Free/€9)

Fast single-agent analysis (30 seconds) on a 0-100 scale. Perfect for iteration and rapid feedback.

Try Quick Score

DIALECTIC Premium (€29)

Full academic implementation (3-5 minutes) with pro/contra arguments, evidence citations, and decision scoring. 4 validations per month.

View Pricing

Our Implementation Status

  • MIT License — open source and commercially usable
  • No affiliation with TU Munich or Earlybird VC
  • Full credit to original researchers in all documentation
  • We will share our findings and correlation data with the research community

Frequently Asked Questions

What is the DIALECTIC framework?

DIALECTIC (Decision Iteration with Argument-Level Evidence and Counter-Thinking for Investment Conclusions) is a multi-agent AI system developed by researchers from TU Munich, Earlybird VC, and UVC Partners.

It evaluates startups through three phases: (1) Fact Collection with hierarchical question trees, (2) Simulated Debate with pro/contra argumentation and iterative refinement, and (3) Decision Scoring based on argument quality.

How accurate is DIALECTIC compared to human VCs?

In a backtest on 259 early-stage startups from 5 VC funds, DIALECTIC achieved the same precision as human investors when predicting which companies would later raise a Series A or beyond.

The key advantage is not superior accuracy, but the ability to produce consistent, explainable rankings at scale — enabling VCs to prioritize where their attention should go.

Why does performance decline after 2 iterations?

Research showed optimal performance at T=2 refinement iterations. Beyond that, argument quality scores continue rising, but predictive performance actually decreases.

This mirrors human investment committees: overthinking leads to sophistication without signal. The lesson: the value of debate is not infinite.

Is DIALECTIC open source? Can I use it?

Yes! DIALECTIC is released under the MIT License, which means it's free to use, modify, and implement commercially.

GitHub repository: github.com/pantageepapa/DIALECTIC

What's the difference between Quick Score and DIALECTIC Premium?

                Quick Score      DIALECTIC Premium
Speed           30 seconds       3-5 minutes
Approach        Single-agent     Multi-agent debate
Output          0-100 score      Decision + arguments
Use Case        Iteration        Final validation
Availability    Free & €9        €29 Pro
Why is DIALECTIC Premium limited to 4 uses per month?

DIALECTIC Premium requires significantly more computational resources than our Quick Score: it spawns multiple AI agents that converse with each other, analyzing your deck from different perspectives.

The 4-per-month limit keeps our €29 Pro tier sustainable while encouraging founders to run full analyses only when they've made meaningful updates to their deck. We recommend using Quick Scores for rapid iteration and saving Premium analyses for your final polish.

Can I cite the DIALECTIC paper in my research?

Yes! Citation format:

Bae, J. Y., Malberg, S., Galang, J., Retterath, A., & Groh, G. (2025).
DIALECTIC: A Multi-Agent System for Startup Evaluation.
EACL 2025.

What LLM does DIALECTIC use?

The original research used OpenAI GPT-4o-mini.

Our implementation uses Google Gemini 2.5 Flash for cost efficiency while maintaining quality. We're tracking correlation and will publish findings.

Experience DIALECTIC-Powered Evaluation

Start with Quick Score to iterate, then validate with DIALECTIC Premium when you're investor-ready.