
GraphRAG-Benchmark

GraphRAG-Benchmark evaluates graph-based RAG holistically: graph construction, retrieval/path selection, and reasoning consistency—not just answers. On the Medical domain, we measure how the WGS RAG layer improves end-to-end quality and evidence grounding.

+15 points on average: WGS turns strong backbones into stronger systems

Average RAG Performance (Medical domain)

Model             Avg Performance
Naive RAG         62
Fast-Graph RAG    68
Graph RAG         71
Light RAG         75
Path RAG          77
Raptor            78
Hippo RAG2        79
WGS RAG           88
Model             Evidence Recall (avg)
Naive RAG         0.79
Fast-Graph RAG    0.87
Light RAG         0.89
Raptor            0.90
Hippo RAG2        0.90
Graph RAG         0.90
Path RAG          0.92
WGS RAG           0.98

Generation accuracy is largely model-dependent; Evidence Recall better reflects system-level performance.
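
To make the Evidence Recall column concrete, here is a minimal sketch of how such a metric is commonly computed. The set-based definition below is an illustrative assumption; GraphRAG-Benchmark's exact scoring may differ.

```python
# Sketch of an evidence-recall metric: the fraction of gold evidence
# items that appear in the retrieved evidence set.
# NOTE: this set-based definition is an assumption for illustration;
# GraphRAG-Benchmark's actual scoring may differ.

def evidence_recall(retrieved: set[str], gold: set[str]) -> float:
    """Return |retrieved ∩ gold| / |gold| (1.0 if gold is empty)."""
    if not gold:
        return 1.0
    return len(retrieved & gold) / len(gold)

# Example: 3 of 4 gold evidence passages were retrieved.
retrieved = {"e1", "e2", "e3", "e5"}
gold = {"e1", "e2", "e3", "e4"}
print(evidence_recall(retrieved, gold))  # → 0.75
```

A recall-style metric rewards retrieving all the supporting evidence, which is why it tracks grounding quality more directly than answer fluency.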

Overall Rank: #1 (avg score: 88)
vs. HippoRAG2: +12% (+9 points)
vs. GraphRAG: +24% (+17 points)
Evidence Recall: 0.98 (verified grounding)
End-to-End Graph Reasoning

What GraphRAG-Benchmark Measures

GraphRAG-Benchmark evaluates the full GraphRAG pipeline:

  • Graph Construction: extracting entities/relations and building useful structure
  • Knowledge Retrieval: selecting correct nodes/paths for multi-hop questions
  • Generation & Reasoning: connecting evidence into a logically consistent explanation and answer

It also scores evidence relevance and reasoning consistency, reflecting enterprise requirements where “why this answer” matters as much as the answer itself.
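
As a rough illustration of the Knowledge Retrieval stage above, multi-hop retrieval can be thought of as finding a relation path through a knowledge graph. The toy graph, entities, and BFS search below are invented for illustration; real GraphRAG retrievers are considerably more sophisticated.

```python
from collections import deque

# Toy knowledge graph: adjacency list of (relation, target) edges.
# All entities and relations here are invented for illustration only.
GRAPH = {
    "aspirin": [("inhibits", "COX-1"), ("treats", "pain")],
    "COX-1": [("produces", "thromboxane")],
    "thromboxane": [("promotes", "platelet aggregation")],
}

def find_path(graph, start, goal):
    """BFS for a relation path from start to goal; returns an edge list or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

# Multi-hop question: how does aspirin affect platelet aggregation?
path = find_path(GRAPH, "aspirin", "platelet aggregation")
print(path)
# → [('aspirin', 'inhibits', 'COX-1'),
#    ('COX-1', 'produces', 'thromboxane'),
#    ('thromboxane', 'promotes', 'platelet aggregation')]
```

The returned edge list is exactly the kind of evidence chain a benchmark can score for relevance and reasoning consistency, since each hop is an auditable claim.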

[Figures: graph construction diagram; system architecture diagram]

Graph-based RAG validation

Why We Ran This

Our goal is to prove that processing data through the Wisdom Graph structure strengthens end-to-end RAG performance.
By applying our reasoning, the graph can create new nodes, merge or remove duplicates, and refine relationships—resulting in a more coherent, higher-signal knowledge structure for retrieval and grounded answering.
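
The refinement operations described above (merging duplicates, pruning, rewiring relationships) can be sketched roughly as follows. The alias-based matching and triple layout are simplifying assumptions for illustration, not the actual Wisdom Graph implementation.

```python
# Rough sketch of duplicate-node merging in a knowledge graph.
# The alias table and triple representation are simplifying
# assumptions, not the actual Wisdom Graph refinement logic.

def merge_duplicates(edges, aliases):
    """Rewrite (src, rel, dst) triples so aliased node names collapse to
    a canonical name, dropping duplicate edges that result."""
    canonical = lambda name: aliases.get(name, name)
    merged, seen = [], set()
    for src, rel, dst in edges:
        edge = (canonical(src), rel, canonical(dst))
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    return merged

edges = [
    ("acetylsalicylic acid", "treats", "pain"),
    ("aspirin", "treats", "pain"),          # becomes a duplicate after merging
    ("aspirin", "inhibits", "COX-1"),
]
aliases = {"acetylsalicylic acid": "aspirin"}
print(merge_duplicates(edges, aliases))
# → [('aspirin', 'treats', 'pain'), ('aspirin', 'inhibits', 'COX-1')]
```

Collapsing duplicates like this raises the signal-to-noise ratio of the graph: retrieval sees one well-connected node instead of several fragmented ones.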

We tested WGS RAG to measure how consistently these graph-level improvements translate into better graph-based retrieval and evidence-grounded, multi-hop reasoning, beyond prompt-only tuning.
Strong results in this setting increase confidence that the same approach can transfer to other complex, evidence-heavy domains that demand reliability and auditability.

What This Result Shows

These results highlight that performance improvements come from more than retrieval alone.
A graph-aware layer that improves path selection and evidence-grounded reasoning can materially raise quality in complex, multi-hop medical queries.

WGS adds structured knowledge retrieval that improves RAG quality across implementations.

“WGS RAG improves medical-domain GraphRAG quality by raising evidence recall and reasoning consistency—not just answer fluency.”

Why It Matters in Production

High-stakes domains (medicine, healthcare, law, finance) require systems that are not only accurate, but also grounded, reproducible, and explainable.
Medical-domain GraphRAG results provide evidence that WGS RAG can improve trust and reliability where incorrect or ungrounded outputs are unacceptable.

Another Benchmark: UniADILR

See Benchmark Results

Interested in our Wisdom Graph System beyond this benchmark?

Get a demo, see the benchmarks, or integrate today.

Contact Us
Copyright © 2026 Mind AI Inc.