UniADILR evaluates reasoning as a process—Abduction → Deduction → Induction—rather than answers alone. The WGS (Wisdom Graph System) improves reasoning performance and efficiency across different backbone models.
Up to ~2× performance gains for frontier models, varying by model and conditions.

UniADILR scores by configuration (baseline vs. with WGS):
GPT-4o mini: 46.27
GPT-4.1: 64.95
GPT-5: 73.33
Gemini 2.5 Pro: 76.45
Wisdom (GPT-4o-mini): 80.38
Wisdom (GPT-5): 90.75
Wisdom (Gemini 2.5 Pro): 91.88
In our UniADILR runs, a smaller model equipped with WGS outperformed the prompt-engineered performance of a much larger frontier model (e.g., GPT-5). This result demonstrates a structural efficiency advantage: higher quality at significantly lower cost and latency.
Across backbones, Baseline + WGS consistently outperforms Baseline on UniADILR. A lightweight model also shows a large uplift (~2× improvement), indicating that the layer can materially improve reasoning quality under practical constraints.
Average gain: +15 points across all runs
Lightweight: ~2× boost with low overhead
Consistency: robust across backbones
Efficiency: outperforms frontier models
Logical consistency
UniADILR scores the quality of an end-to-end reasoning chain. It goes beyond "right vs wrong" by evaluating logical consistency, evidence fit, and reproducibility.
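As a rough illustration of chain-level scoring, the sketch below aggregates the three criteria named above into one score. The step structure, the equal weights, and the `score_chain` function are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of chain-level scoring in the spirit of UniADILR.
# Weights and scoring interface are assumptions, not the real benchmark.
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    claim: str
    evidence: list[str]

def score_chain(steps: list[ReasoningStep],
                consistency: float,    # logical consistency, 0..1
                evidence_fit: float,   # how well evidence supports claims, 0..1
                reproducibility: float) -> float:
    """Aggregate per-chain criteria into one 0-100 score (equal weights assumed)."""
    if not steps:
        return 0.0  # an empty chain earns nothing
    return 100 * (consistency + evidence_fit + reproducibility) / 3

chain = [ReasoningStep("hypothesis H explains observation O", ["O"]),
         ReasoningStep("H implies prediction P", ["H"])]
print(round(score_chain(chain, 0.9, 0.8, 0.7), 2))  # prints 80.0
```

The point of the sketch is only that the unit of evaluation is the whole chain: an answer with no supporting steps scores zero regardless of correctness.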
"Reasoning is evaluated as a chain, not a single answer."


[Diagram: WGS Layer stacked on top of a Backbone Model]

Measurable reasoning gains
Reasoning layer validation
Our goal is to verify whether a dedicated layer can add reasoning performance on top of existing models. We measure how consistently it improves scores across multiple backbones.
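The "layer on top of a backbone" setup being validated can be sketched as a simple wrapper. `with_reasoning_layer`, the prompt templates, and the three-stage breakdown below are hypothetical stand-ins; the page does not describe the real WGS interface.

```python
# Minimal sketch of the backbone + reasoning-layer composition (hypothetical).
from typing import Callable

Backbone = Callable[[str], str]  # any model: prompt in, answer out

def with_reasoning_layer(backbone: Backbone) -> Backbone:
    """Wrap a backbone so it answers via an explicit reasoning chain."""
    def layered(prompt: str) -> str:
        # 1. Abduction: have the backbone propose a hypothesis.
        hypothesis = backbone(f"Propose a hypothesis for: {prompt}")
        # 2. Deduction: derive an answer from the hypothesis.
        derivation = backbone(f"Given '{hypothesis}', deduce an answer to: {prompt}")
        # 3. Induction: check and finalize the answer.
        return backbone(f"Verify and finalize: {derivation}")
    return layered

# Usage with a toy echo backbone (swap in a real model client):
echo = lambda p: p.split(": ", 1)[-1]
answer = with_reasoning_layer(echo)("What explains the data?")
```

Because the wrapper only assumes a prompt-in/answer-out backbone, the same layer can sit on top of any of the models compared above, which is what makes cross-backbone comparison meaningful.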
"Backbone model + WGS layer measurable reasoning gains."
These results are not about ranking models. They show that a well-designed layer can reliably improve reasoning quality on top of strong backbones, and that performance and efficiency are not determined by the model alone.


"WGS adds measurable reasoning gains across different backbones, and can unlock multi-fold efficiency in practice."
In real deployments, teams face constraints such as cost, latency, and operational limits that often require smaller or more affordable models. UniADILR results indicate that the WGS layer can raise reasoning quality even in lightweight settings, improving cost–quality trade-offs without relying solely on bigger models.

Get a demo, explore the full system, or discuss integration.
Contact Us