GraphRAG-Benchmark evaluates graph-based RAG holistically: graph construction, retrieval/path selection, and reasoning consistency—not just final answers. In the Medical domain, we measure how the WGS RAG layer improves end-to-end quality and evidence grounding.
+15 points on average: WGS turns strong backbones into stronger systems
| System | Overall score |
|---|---|
| Naive RAG | 62 |
| Fast-Graph RAG | 68 |
| Graph RAG | 71 |
| Light RAG | 75 |
| Path RAG | 77 |
| Raptor | 78 |
| Hippo RAG2 | 79 |
| WGS RAG | 88 |
| Model | Evidence Recall AVG |
|---|---|
| Fast-Graph RAG | 0.87 |
| Naive RAG | 0.79 |
| Raptor | 0.90 |
| Light-RAG | 0.89 |
| Hippo RAG2 | 0.90 |
| Path RAG | 0.92 |
| Graph RAG | 0.90 |
| WGS RAG | 0.98 |
Generation accuracy is largely model-dependent; Evidence Recall better reflects the retrieval system's own contribution.
| Model | Fact Retrieval | Complex Reasoning | Contextual Summarize | Creative Generation | AVG |
|---|---|---|---|---|---|
| Fast-Graph RAG | 0.84 | 0.88 | 0.88 | 0.89 | 0.87 |
| Naive RAG | 0.79 | 0.78 | 0.73 | 0.85 | 0.79 |
| Raptor | 0.92 | 0.89 | 0.93 | 0.84 | 0.90 |
| Light-RAG | 0.90 | 0.89 | 0.96 | 0.80 | 0.89 |
| Hippo RAG2 | 0.89 | 0.94 | 0.92 | 0.83 | 0.90 |
| Path RAG | 0.94 | 0.90 | 0.92 | 0.91 | 0.92 |
| Graph RAG | 0.90 | 0.89 | 0.90 | 0.92 | 0.90 |
| WGS RAG | 0.99 | 0.99 | 0.99 | 0.95 | 0.98 |
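Evidence Recall is conventionally the fraction of gold evidence items that appear in the retrieved context. The sketch below illustrates that convention; the exact GraphRAG-Benchmark scorer may differ, and `gold_evidence` / `retrieved` are illustrative names, not benchmark APIs:

```python
def evidence_recall(gold_evidence, retrieved):
    """Fraction of gold evidence snippets found in the retrieved context."""
    if not gold_evidence:
        return 1.0  # nothing to recover, trivially perfect
    hits = sum(1 for ev in gold_evidence
               if any(ev in chunk for chunk in retrieved))
    return hits / len(gold_evidence)

def average_recall(examples):
    """Mean recall over (gold_evidence, retrieved) pairs."""
    return sum(evidence_recall(g, r) for g, r in examples) / len(examples)
```

A 0.98 average therefore means that, across the test set, nearly every gold evidence item was present in the context handed to the generator.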
- Overall rank: #1 (avg score 88)
- vs. HippoRAG2: +9 points (+11%)
- vs. GraphRAG: +17 points (+24%)
- Evidence Recall: 0.98 (verified grounding)
End-to-End Graph Reasoning
GraphRAG-Benchmark evaluates the full GraphRAG pipeline: graph construction, retrieval/path selection, and grounded answer generation.
It also scores evidence relevance and reasoning consistency, reflecting enterprise requirements where “why this answer” matters as much as the answer itself.


Graph-based RAG validation
Our goal is to prove that processing data through the Wisdom Graph structure strengthens end-to-end RAG performance.
By applying our reasoning, the graph can create new nodes, merge or remove duplicates, and refine relationships—resulting in a more coherent, higher-signal knowledge structure for retrieval and grounded answering.
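As an illustration of the merge step described above, here is a minimal sketch of duplicate-node consolidation over triples. The function name, the triple format, and the normalization rule (case-folding) are assumptions for the example, not WGS internals:

```python
def merge_duplicates(edges, normalize=str.lower):
    """Collapse nodes that normalize to the same key, unioning their edges.

    edges: iterable of (source, relation, target) triples.
    Returns a deduplicated set of triples over canonical node names.
    """
    canonical = {}  # normalized key -> first surface form seen
    merged = set()
    for src, rel, dst in edges:
        s = canonical.setdefault(normalize(src), src)
        d = canonical.setdefault(normalize(dst), dst)
        if s != d:  # drop self-loops created by merging
            merged.add((s, rel, d))
    return merged

triples = [
    ("Aspirin", "treats", "Headache"),
    ("aspirin", "treats", "headache"),   # duplicate surface forms
    ("Aspirin", "interacts_with", "Warfarin"),
]
# Collapses to 2 unique triples over canonical node names.
print(merge_duplicates(triples))
```

Real systems would add entity-linking and embedding similarity on top, but the effect is the same: fewer redundant nodes and a higher-signal graph for retrieval.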
We evaluated WGS RAG to measure how consistently these graph-level improvements, rather than prompt-only tuning, translate into better graph-based retrieval and evidence-grounded multi-hop reasoning.
Strong results in this setting increase confidence that the same approach can transfer to other complex, evidence-heavy domains that demand reliability and auditability.
These results highlight that performance improvements come from more than retrieval alone.
A graph-aware layer that improves path selection and evidence-grounded reasoning can materially raise quality in complex, multi-hop medical queries.
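One hedged way to picture "path selection" for a multi-hop query: rank candidate evidence paths by the product of per-edge confidence scores. This is purely illustrative; WGS's actual scoring is not described here, and the node names and confidences are made up:

```python
from math import prod

def score_path(path, edge_conf):
    """Product of confidences along consecutive hops of a path."""
    return prod(edge_conf[(a, b)] for a, b in zip(path, path[1:]))

def best_path(candidates, edge_conf):
    """Pick the candidate path with the highest aggregate confidence."""
    return max(candidates, key=lambda p: score_path(p, edge_conf))

conf = {
    ("drug", "enzyme"): 0.9,
    ("enzyme", "condition"): 0.8,
    ("drug", "condition"): 0.5,   # weak direct edge
}
paths = [["drug", "enzyme", "condition"], ["drug", "condition"]]
print(best_path(paths, conf))  # two-hop path wins: 0.72 vs 0.5
```

The point of a graph-aware layer is exactly this kind of choice: preferring a well-supported two-hop chain over a weak shortcut, so the generator reasons from stronger evidence.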


“WGS RAG improves medical-domain GraphRAG quality by raising evidence recall and reasoning consistency—not just answer fluency.”
High-stakes domains (medicine, healthcare, law, finance) require systems that are not only accurate, but also grounded, reproducible, and explainable.
Medical-domain GraphRAG results provide evidence that WGS RAG can improve trust and reliability where incorrect or ungrounded outputs are unacceptable.

Get a demo, see the benchmarks, or integrate today.
Contact Us