The DIALECTIC Framework:
Scientific Startup Evaluation
A peer-reviewed multi-agent AI system from TU Munich and Earlybird VC that transforms startup screening through structured argumentation and iterative debate.
Research Credits
Authors
Jae Yoon Bae*, Simon Malberg*, Joyce Galang*
Andre Retterath, Georg Groh
*Equal contribution
Institutions
Technical University of Munich
Earlybird Venture Capital
UVC Partners
Published: EACL 2025 (European Chapter of the Association for Computational Linguistics)
License: MIT License — Open source and free to use
The Problem: Attention Scarcity in Venture Capital
Venture capital faces a fundamental attention crisis. Starting a company has never been easier — AI copilots, no-code tools, and cloud infrastructure have collapsed traditional barriers to entry. The result? An explosion of startups competing for limited investor attention.
The Data Tells the Story
2022: VCs spent 2 minutes 42 seconds per pitch deck
2023: Down 20% to 2 minutes 12 seconds
2024: Down another 12% to 1 minute 56 seconds
Source: DocSend Pitch Deck Reports
Put yourself in the founder's position: You've spent weeks perfecting your narrative, refining every chart, polishing every message. An investor opens your deck between meetings, swipes for under two minutes, and closes it before reaching your "big moment."
This isn't a judgment on founders or investors. It's a structural problem. In today's venture market, attention, not capital, is the scarcest resource.
What is DIALECTIC?
DIALECTIC (Decision Iteration with Argument-Level Evidence and Counter-Thinking for Investment Conclusions) is a multi-agent AI system that models how real venture investors form investment decisions through structured argumentation and debate.
Fact Collection
Decomposes the startup into structured questions (team, product, market, traction) and builds a hierarchical knowledge base.
Simulated Debate
Multiple AI agents generate pro and contra arguments, critique each other, and iteratively refine through "survival of the fittest."
Decision Scoring
Produces numeric decision scores that rank opportunities, enabling investors to prioritize where attention should go.
The Key Insight
Most AI approaches to VC focus on prediction accuracy. DIALECTIC focuses on reasoning quality.
Instead of asking "Can we predict startup success?", it asks "How do investors actually form investment decisions, and can that process be modeled?"
How DIALECTIC Works: A Technical Overview
Phase 1: Fact Collection
Building a structured knowledge base
Seed Questions
Four high-level questions based on VC evaluation criteria:
- How does the company align with VC strategy?
- Who are the key members of the founding team?
- What are the product's core features and technology?
- What is the target market size and growth potential?
Question Decomposition
Each seed question is decomposed into industry-specific sub-questions, creating a hierarchical question tree tailored to the startup's sector.
Answer Agent
An AI agent answers all questions using company data, pitch deck content, team information, and web search (when needed for market data).
Output: A comprehensive fact base with structured Q&A pairs covering all major investment criteria.
Phase 2: Reasoning Through Debate
Multi-agent argumentation system
Generator Agent
Generates K initial arguments (default: 5) for both PRO (invest) and CONTRA (pass) stances, each citing specific facts from the knowledge base.
Critic Agent (Devil's Advocate)
Takes the opposite stance and critiques each argument, pointing out weak evidence, alternative interpretations, and challenging assumptions.
Evaluator Agent
Judges argument quality on 14 criteria drawn from Wachsmuth et al.'s (2017) argument quality taxonomy. Each criterion is scored from 1 to 7, for a maximum quality score of 98 per argument.
Refiner Agent
Improves arguments based on critique and quality scores, addressing weaknesses and strengthening evidence.
Iterative Loop (T iterations)
The critique → evaluate → refine cycle repeats T times (optimal: T=2). After each iteration, only the highest-quality arguments survive.
Output: 4 refined PRO arguments and 4 refined CONTRA arguments, each with quality scores and supporting evidence.
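The generate → critique → evaluate → refine loop above can be sketched as follows. The agent callables (`generate`, `critique`, `evaluate`, `refine`) stand in for LLM calls, and the toy lambdas at the bottom are placeholders; this is a minimal sketch of the loop's control flow under those assumptions, not the paper's code.

```python
def run_debate(generate, critique, evaluate, refine, k=4, t=2):
    """Sketch of Phase 2: per stance, generate K arguments, then run the
    critique -> evaluate -> refine cycle T times, keeping only the
    highest-scoring arguments ('survival of the fittest')."""
    survivors = {}
    for stance in ("PRO", "CONTRA"):
        args = generate(stance, k)                            # Generator Agent
        for _ in range(t):
            critiques = [critique(stance, a) for a in args]   # Critic Agent
            scores = [evaluate(a) for a in args]              # Evaluator Agent
            args = [refine(a, c, s)                           # Refiner Agent
                    for a, c, s in zip(args, critiques, scores)]
            args = sorted(args, key=evaluate, reverse=True)[:k]  # survival
        survivors[stance] = args
    return survivors

# Toy stand-ins for the LLM-backed agents:
gen = lambda stance, k: [f"{stance} argument {i}" for i in range(k)]
crit = lambda stance, arg: "evidence is thin"
score = lambda arg: min(len(arg), 98)          # quality capped at 98 (14 x 7)
ref = lambda arg, critique, s: arg + " (refined)"

result = run_debate(gen, crit, score, ref, k=4, t=2)
```

With K=4 and T=2, each surviving argument has passed through two refinement rounds, matching the paper's optimal configuration.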
Phase 3: Decision Making
Converting arguments into actionable scores
Decision Score Formula
- PRO scores: Sum of quality scores of all surviving PRO arguments
- CONTRA scores: Sum of quality scores of all surviving CONTRA arguments
- Threshold: an adjustable cutoff encoding the investor's risk preference
Investment Recommendation
Decision: INVEST if the decision score is positive, PASS if negative; scores near zero are flagged UNCERTAIN.
Confidence level is determined by the magnitude of the decision score.
Output: Decision score (typically -10 to +10), investment recommendation (INVEST/PASS/UNCERTAIN), confidence level (HIGH/MEDIUM/LOW), and full argument justifications.
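One plausible reading of the formula above: sum the quality scores of the surviving arguments on each side, take the difference, and shift by the risk threshold. The exact aggregation and the score normalization are not spelled out here, so treat this as a sketch under those assumptions; the `uncertainty_band` parameter is hypothetical.

```python
def decision_score(pro_scores, contra_scores, threshold=0.0):
    """Assumed form: total PRO quality minus total CONTRA quality,
    shifted by a risk-preference threshold."""
    return sum(pro_scores) - sum(contra_scores) - threshold

def recommend(score, uncertainty_band=5.0):
    """Map the score to INVEST / PASS, with a near-zero band flagged
    UNCERTAIN (the band width is a hypothetical parameter)."""
    if abs(score) < uncertainty_band:
        return "UNCERTAIN"
    return "INVEST" if score > 0 else "PASS"

# Four surviving arguments per stance, each scored out of 98:
score = decision_score([82, 75, 71, 64], [70, 66, 60, 55])
verdict = recommend(score)
```

A higher threshold makes the system more conservative: more of the borderline deals fall to PASS or UNCERTAIN.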
Research Results: What the Data Shows
Dataset
- 259 early-stage startups (seed/pre-seed)
- Data from 5 different VC funds
- Added to watchlists in 2021
- Success = raised Series A or later by Sept 2025
- 25% success rate (industry-typical)
Performance
- Matches human VC precision in predicting Series A+ outcomes
- AUC-PR: 0.24 on test set (validation: 0.37)
- Produces full ranking vs single operating point
- Optimal hyperparameters: K=4 arguments, T=2 iterations
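For context on the AUC-PR numbers: average precision is a standard estimator of the area under the precision-recall curve, and a random ranking scores roughly the positive prevalence (about 0.25 for this dataset). A minimal pure-Python implementation, for illustration only:

```python
def average_precision(y_true, y_score):
    """Average precision, a standard AUC-PR estimate: rank items by
    predicted score, then average the precision at each true positive."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    total_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:                 # a true positive at this rank
            hits += 1
            ap += hits / rank         # precision among the top `rank` items
    return ap / total_pos

# Toy example: labels (1 = raised Series A+) and model scores
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Because DIALECTIC emits a continuous decision score, it yields the full ranking this metric requires, rather than a single accept/reject cut.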
🤔 The "Overthinking" Discovery
A fascinating finding: performance peaks at T=2 iterations, then declines.
As iterations increase beyond 2:
- Argument quality scores keep rising
- Arguments get longer
- More facts are cited
- But predictive performance decreases
This mirrors human investment committees: more words, more sophistication, less signal. The value of debate is not infinite.
Key Takeaway
DIALECTIC doesn't outperform human VCs in prediction accuracy. Instead, it matches their precision while providing full rankings and transparent justifications — enabling better attention allocation at scale.
Why Scientific Evaluation Matters for Venture Capital
1. Reproducibility & Consistency
Human VC decisions are influenced by mood, time of day, recent portfolio events, and cognitive biases. A scientific framework provides consistent evaluation criteria that can be audited, improved, and reproduced.
2. Transparency & Explainability
Unlike black-box ML models, DIALECTIC produces human-readable arguments with evidence. Investors can understand why a decision was made, challenge the reasoning, and override when needed.
3. Scalability Without Quality Loss
Human attention doesn't scale. A VC partner can only deeply evaluate 10-20 companies per month. DIALECTIC can process hundreds while maintaining structured, high-quality analysis — bringing iterative reasoning to the top of the funnel where it's typically impossible.
4. Continuous Improvement
A scientific framework can be measured, tested, and improved. By tracking which arguments correlate with future success, the system can evolve. Human intuition is valuable but hard to systematically improve.
5. Democratization of Expertise
Top-tier VC firms have decades of pattern recognition. DIALECTIC encodes investment thinking processes that can be accessed by emerging VCs, corporate venture arms, and founders — leveling the playing field.
How GRID Uses DIALECTIC
We discovered the DIALECTIC framework through the Data Driven VC newsletter by Andre Retterath. Recognizing its potential, we implemented it as part of GRID's two-tier evaluation system.
Quick Score (Free/€9)
Fast single-agent analysis (30 seconds) on a 0-100 scale. Perfect for iteration and rapid feedback.
DIALECTIC Premium (€29)
Full academic implementation (3-5 minutes) with pro/contra arguments, evidence citations, and decision scoring. 4 validations per month.
Our Implementation Status
- MIT License — open source and commercially usable
- No affiliation with TU Munich or Earlybird VC
- Full credit to original researchers in all documentation
- We will share our findings and correlation data with the research community
Frequently Asked Questions
What is the DIALECTIC framework?
DIALECTIC (Decision Iteration with Argument-Level Evidence and Counter-Thinking for Investment Conclusions) is a multi-agent AI system developed by researchers from TU Munich, Earlybird VC, and UVC Partners.
It evaluates startups through three phases: (1) Fact Collection with hierarchical question trees, (2) Simulated Debate with pro/contra argumentation and iterative refinement, and (3) Decision Scoring based on argument quality.
How accurate is DIALECTIC compared to human VCs?
In a backtest on 259 early-stage startups from 5 VC funds, DIALECTIC achieved the same precision as human investors when predicting which companies would later raise a Series A or beyond.
The key advantage is not superior accuracy, but the ability to produce consistent, explainable rankings at scale — enabling VCs to prioritize where their attention should go.
Why does performance decline after 2 iterations?
Research showed optimal performance at T=2 refinement iterations. Beyond that, argument quality scores continue rising, but predictive performance actually decreases.
This mirrors human investment committees: overthinking leads to sophistication without signal. The lesson: the value of debate is not infinite.
Is DIALECTIC open source? Can I use it?
Yes! DIALECTIC is released under the MIT License, which means it's free to use, modify, and implement commercially.
GitHub repository: github.com/pantageepapa/DIALECTIC
What's the difference between Quick Score and DIALECTIC Premium?
| | Quick Score | DIALECTIC Premium |
|---|---|---|
| Speed | 30 seconds | 3-5 minutes |
| Approach | Single-agent | Multi-agent debate |
| Output | 0-100 score | Decision + arguments |
| Use Case | Iteration | Final validation |
| Availability | Free & €9 | €29 Pro |
Why is DIALECTIC Premium limited to 4 uses per month?
DIALECTIC Premium requires significantly more computational resources than our Quick Score. It spawns multiple AI agents that converse with each other, analyzing your deck from different perspectives. The 4-per-month limit keeps our €29 Pro tier sustainable while encouraging founders to only run full analyses when they've made meaningful updates to their deck. We recommend using Quick Scores for rapid iteration, and saving Premium analyses for your final polish.
Can I cite the DIALECTIC paper in my research?
Yes! Suggested citation:
Bae, J. Y., Malberg, S., Galang, J., Retterath, A., & Groh, G. (2025). DIALECTIC: A Multi-Agent System for Startup Evaluation. EACL 2025.
What LLM does DIALECTIC use?
The original research used OpenAI GPT-4o-mini.
Our implementation uses Google Gemini 2.5 Flash for cost efficiency while maintaining quality. We're tracking correlation and will publish findings.
Experience DIALECTIC-Powered Evaluation
Start with Quick Score to iterate, then validate with DIALECTIC Premium when you're investor-ready.