Whitepaper v1.0 — April 2026
Capital allocation in cryptocurrency markets relies predominantly on a single metric: historical returns. A trader who achieved 50x on a single leveraged position ranks above one who compounded 40% annually with a 2.5 Sharpe ratio, despite the latter demonstrating systematically superior risk-adjusted performance. This paper introduces the Trader Quality Score (TQS) — a 0–100 composite metric that quantifies the quality of intelligence behind trading decisions, independent of outcome.
Existing trader evaluation systems, including leaderboards, copy-trading platforms, and proprietary firm challenges, rely on outcome-based metrics that conflate skill with variance. This approach is structurally inadequate.
TQS evaluates returns quality, behavioral discipline, and risk management. It applies identically to human traders and autonomous AI agents.
The long-term vision is to decentralize the scoring process — removing any single entity, including Beneat, from controlling evaluation methodology or outcomes. Bittensor subnets are being explored as a potential substrate. The objective is a capital allocation framework governed by verifiable intelligence scores rather than reputation or historical returns.
Return-based trader ranking systems exhibit two well-documented statistical failures:
Survivorship bias. Traders who achieve extreme returns through concentrated leveraged positions appear at the top of rankings. Identical strategies that resulted in total loss are excluded from the sample. The observable distribution is systematically skewed toward high-variance outcomes.
Adverse selection. When the ranking metric is raw P&L, the system preferentially surfaces high-variance strategies. Traders who compound 2–3% monthly with controlled drawdowns are structurally underrepresented relative to those capturing tail events.
History demonstrates what happens when the entity measuring quality profits from the outcome:
A centralized Beneat would face the same structural conflict: the temptation to rate favorably the agents that drive vault TVL. Decentralization removes this conflict.
A trader who passes a prop firm evaluation at Firm A starts from zero at Firm B. There is no portable, verifiable record of trading competence. TQS is designed to be a credit score for traders — persistent, portable, and verifiable by anyone.
Beneat is a professional crypto futures trading terminal. The system connects to 10 exchanges through a universal adapter architecture, consolidating market data, order execution, behavioral analysis, and risk enforcement into a unified interface.
The terminal presents a unified interface across Binance, Bybit, Bitget, OKX, Hyperliquid, BloFin, Aster, Kraken Futures, KuCoin Futures, and a demo environment. Under the adapter pattern, each exchange's quirks (parameter formats, symbol conventions, margin modes) are handled at the adapter layer, not in the UI.
The Neutrino signal engine derives proprietary support and resistance levels by estimating probability densities over historical price distributions. Five timeframe instances (3-minute through weekly) run continuously. When multiple timeframes converge on the same price zone, a confluence score quantifies the degree of agreement across scales.
Seven pattern detectors analyze each trader's execution history in real time:
| Pattern | What It Detects |
|---|---|
| Revenge Trading | Size or frequency spike immediately after a loss |
| FOMO | Rapid-fire entries chasing price after a missed move |
| Overconfidence | Leverage increase following a win streak |
| Panic Exits | Premature closure during normal volatility |
| Overtrading | Position count exceeding the trader's own baseline |
| Tilt | Degrading win rate after consecutive losses |
| Patience | Holding winners to target — the positive signal |
Detection thresholds are adaptive. After 20+ trades, the system builds a statistical baseline of each trader's style using robust statistics (median and MAD, not mean and standard deviation). A scalper entering 30 trades per day is not flagged for overtrading. The same frequency from a swing trader is.
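Robust statistics resist distortion by outliers: a single oversized position inflates the standard deviation but barely moves the median. The baseline therefore uses the Median Absolute Deviation, MAD = median(|x_i - median(x)|), instead of the standard deviation.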
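A minimal sketch of such a personal baseline follows, assuming daily trade counts as the observed quantity. The 1.4826 scale factor (which makes MAD estimate sigma for normally distributed data), the 3.0 flag threshold, and the 20-observation minimum are illustrative assumptions, and `robust_z` and `is_overtrading` are hypothetical names, not Beneat's implementation.

```python
from statistics import median

# Assumption: scale MAD so it estimates sigma for normally distributed data.
MAD_SCALE = 1.4826


def mad(values: list[float]) -> float:
    """Median Absolute Deviation: median of |x_i - median(x)|."""
    m = median(values)
    return median([abs(x - m) for x in values])


def robust_z(x: float, baseline: list[float]) -> float:
    """How far x sits from the trader's own median, in MAD-based sigma units."""
    m = median(baseline)
    spread = MAD_SCALE * mad(baseline)
    if spread == 0:
        # Degenerate baseline: any departure from the median is extreme.
        if x == m:
            return 0.0
        return float("inf") if x > m else float("-inf")
    return (x - m) / spread


def is_overtrading(todays_trades: int, daily_history: list[int],
                   threshold: float = 3.0, min_history: int = 20) -> bool:
    """Flag only when today's count is far above this trader's own norm."""
    if len(daily_history) < min_history:
        return False  # too little data to build a personal baseline
    return robust_z(todays_trades, daily_history) > threshold


scalper = [28, 30, 32, 29, 31, 30, 27, 33, 30, 29,
           31, 30, 28, 32, 30, 29, 31, 30, 29, 31]
swing = [1, 2, 3, 2, 1, 3, 2, 4, 1, 2,
         3, 2, 1, 2, 3, 2, 1, 4, 2, 3]
print(is_overtrading(30, scalper))  # False: 30/day is this scalper's norm
print(is_overtrading(30, swing))    # True: far outside this swing baseline
```

The same 30-trade day is judged against each trader's own median and spread, which is the behavior the adaptive-threshold paragraph describes. Note also that `mad([1, 2, 3, 4, 100])` is 1: the outlier at 100 does not inflate the spread.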
Traders can deploy autonomous or semi-autonomous AI agents on their own exchange accounts. Agents have direct access to 14 tools covering market data, account state, order management, and Neutrino probability density levels. An 8-gate autonomy guard constrains every write action.
The Arena runs live 1v1 competitions between a human trader and an LLM-powered bot on testnet, with six AI models competing on real market data. It is both a product feature and a proving ground for the thesis that behavioral quality, not model size, predicts trading performance.
The copytrading pipeline provides production-grade copy execution across exchanges. Leader eligibility is gated by behavioral metrics. Signals propagate to followers with per-follower size calculation, leverage sync, and stop-loss/take-profit (SL/TP) handling.
TQS is a 0–100 composite designed to separate skill from luck, and process from outcome. It evaluates four behavioral dimensions (emotional control, discipline, patience, and risk awareness) and combines them into a unified composite.
The tradeFactor uses logarithmic scaling so that a trader with 500 trades is not penalized 50× more than one with 10 trades for the same behavioral pattern rate. This ensures fairness across activity levels.
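The effect of logarithmic scaling can be illustrated with a short sketch. The exact tradeFactor formula, pattern weights, and composite weighting are not given in this paper, so everything below is an assumption: the factor is taken as ln(1 + n), a pattern penalty as weight × pattern rate × tradeFactor, and the composite as an equal-weight mean of the four dimension scores. All function names are hypothetical.

```python
import math


def trade_factor(n_trades: int) -> float:
    """Hypothetical log scaling: ln(1 + n) grows with activity, but slowly."""
    return math.log1p(n_trades)


def pattern_penalty(flagged: int, n_trades: int, weight: float = 10.0) -> float:
    """Penalty = weight * pattern rate * trade factor (all hypothetical)."""
    if n_trades <= 0:
        return 0.0
    rate = flagged / n_trades
    return weight * rate * trade_factor(n_trades)


def composite(emotional: float, discipline: float,
              patience: float, risk: float) -> float:
    """Hypothetical equal-weight blend of four 0-100 dimension scores."""
    score = (emotional + discipline + patience + risk) / 4
    return max(0.0, min(100.0, score))


# Same 10% revenge-trading rate at two activity levels.
small = pattern_penalty(flagged=1, n_trades=10)    # about 2.40
large = pattern_penalty(flagged=50, n_trades=500)  # about 6.22
print(round(large / small, 2))  # 2.59, versus 50 for a raw count penalty
```

Under these assumptions the 500-trade trader is penalized roughly 2.6× as much as the 10-trade trader at an identical pattern rate, instead of 50×.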
Emotional control measures a trader's ability to maintain composure after losses. Three destructive patterns are penalized: revenge trading, FOMO, and panic exits.
Discipline measures adherence to systematic process: avoiding overtrading, tilt, and overconfidence.
The patience dimension starts at a neutral baseline and is adjusted by both positive and negative signals.
Risk awareness measures position sizing consistency using the coefficient of variation (the standard deviation of position size divided by its mean), which captures how much a trader's sizing deviates from their own norm.
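A minimal sketch of the sizing-consistency calculation follows. The use of population standard deviation and the linear mapping from CV to a 0–100 score (with a hypothetical `cv_cap` of 1.0) are assumptions for illustration; the paper does not specify how CV is converted into the dimension score.

```python
from statistics import mean, pstdev


def sizing_cv(position_sizes: list[float]) -> float:
    """Coefficient of variation: population stdev of sizes / mean size."""
    mu = mean(position_sizes)
    if mu <= 0:
        raise ValueError("mean position size must be positive")
    return pstdev(position_sizes) / mu


def risk_awareness_score(position_sizes: list[float],
                         cv_cap: float = 1.0) -> float:
    """Hypothetical map: CV 0 -> 100, CV >= cv_cap -> 0, linear in between."""
    cv = sizing_cv(position_sizes)
    return 100.0 * max(0.0, 1.0 - cv / cv_cap)


steady = [1000, 1050, 950, 1000]   # notional sizes in USDT
erratic = [500, 3000, 800, 5000]
print(round(risk_awareness_score(steady), 1))
print(round(risk_awareness_score(erratic), 1))
```

Because CV is scale-free, a trader sizing consistently at 100 USDT and one sizing consistently at 100,000 USDT score the same; only deviation from each trader's own norm matters.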
When input data is incomplete, the system applies conservative caps rather than producing uncalibrated scores:
Each detector uses the trader's own statistical baseline — built from robust estimators (MAD, not standard deviation) — to classify deviations into severity tiers. Tilt detection measures how far trade sizing deviates from expected behavior in σ-units. Revenge trading detection combines post-loss timing windows with size increase ratios. Both use calibrated, multi-tier severity thresholds.
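The two detector styles described above can be sketched as follows. The calibrated severity boundaries, the 15-minute post-loss window, and the 1.5× size ratio are not published here and are purely hypothetical values, as are the `Trade` fields and function names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier boundaries in robust sigma units; the calibrated TQS
# thresholds are not published in this paper.
SEVERITY_TIERS = [(4.0, "severe"), (3.0, "moderate"), (2.0, "mild")]


def severity(z: float) -> Optional[str]:
    """Map a deviation in sigma units to the highest tier it reaches."""
    for bound, label in SEVERITY_TIERS:
        if z >= bound:
            return label
    return None


@dataclass
class Trade:
    opened_at: float  # Unix seconds
    closed_at: float  # Unix seconds
    size: float       # notional
    pnl: float


def is_revenge(prev: Trade, nxt: Trade,
               window_s: float = 15 * 60, size_ratio: float = 1.5) -> bool:
    """Entry soon after a losing close AND materially larger than that trade."""
    after_loss = prev.pnl < 0
    quick = 0 <= nxt.opened_at - prev.closed_at <= window_s
    bigger = nxt.size >= size_ratio * prev.size
    return after_loss and quick and bigger
```

Requiring both the timing condition and the size condition keeps a normal re-entry after a loss, or a larger trade placed hours later, from being flagged as revenge.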
| TQS Range | Tier |
|---|---|
| 80–100 | Diamond |
| 60–79 | Gold |
| 40–59 | Silver |
| 0–39 | Bronze |
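The tier table translates directly into a lookup. This sketch assumes fractional scores are not rounded up (so 79.6 is Gold); the function name is hypothetical.

```python
def tqs_tier(score: float) -> str:
    """Map a 0-100 TQS to its tier using the lower bound of each band."""
    if not 0 <= score <= 100:
        raise ValueError("TQS must lie in [0, 100]")
    for lower, tier in ((80, "Diamond"), (60, "Gold"), (40, "Silver")):
        if score >= lower:
            return tier
    return "Bronze"


print(tqs_tier(82), tqs_tier(78))  # Diamond Gold
```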
The equity curve is the single most honest artifact a trader produces. Every decision — entry, exit, sizing, timing — is encoded in its shape. A smooth, upward-sloping curve with shallow drawdowns tells a fundamentally different story than a jagged line with explosive spikes and deep valleys, even if both end at the same P&L.
Rolling return consistency, a TQS sub-metric, rewards compounding and penalizes lottery-ticket returns. A trader who makes 3% per month for twelve months scores higher than one who makes 36% in one month and breaks even for the other eleven.
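One way to make the comparison concrete is to count the share of rolling windows with a positive compounded return. The paper does not define the exact sub-metric, so the 3-month window and the "share of positive windows" formulation are assumptions for illustration.

```python
import math


def rolling_consistency(monthly_returns: list[float], window: int = 3) -> float:
    """Share of rolling windows whose compounded return is positive."""
    if window < 1 or len(monthly_returns) < window:
        raise ValueError("need at least one full window")
    n_windows = len(monthly_returns) - window + 1
    positive = sum(
        1
        for i in range(n_windows)
        if math.prod(1.0 + r for r in monthly_returns[i:i + window]) > 1.0
    )
    return positive / n_windows


steady = [0.03] * 12             # 3% every month
lottery = [0.36] + [0.0] * 11    # one 36% month, then flat
print(rolling_consistency(steady))   # 1.0
print(rolling_consistency(lottery))  # 0.1
```

Every 3-month window of the steady trader is profitable; only the single window containing the 36% month is profitable for the lottery-ticket trader.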
Outlier independence test: remove the single highest-return trade from the sample. If the strategy is no longer profitable, performance is attributable to a single tail event rather than systematic edge.
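The test is simple to express in code. This sketch assumes profitability means positive summed P&L across trades; the function name is hypothetical.

```python
def survives_outlier_removal(trade_pnls: list[float]) -> bool:
    """True if net P&L stays positive after dropping the best trade."""
    if len(trade_pnls) < 2:
        return False  # one trade cannot demonstrate a systematic edge
    without_best = sorted(trade_pnls)[:-1]
    return sum(without_best) > 0


tail_event = [5000, -100, -120, -80, 50, -60]
systematic = [120, 80, -50, 90, -30, 60]
print(survives_outlier_removal(tail_event))   # False
print(survives_outlier_removal(systematic))   # True
```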
Drawdown recovery time measures resilience. Two traders can have identical max drawdowns, but if one recovers in two weeks and the other takes three months, they are not equivalent. Fast recovery signals adaptive risk management and emotional control. Slow recovery often signals tilt.
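A sketch of how recovery time can be measured from an equity curve follows, counting periods from each peak to the first return to that peak. Reporting an unrecovered drawdown separately is an assumption; the function name is hypothetical.

```python
def recovery_times(equity: list[float]) -> tuple[list[int], int]:
    """Return (periods to recover each closed drawdown, length of open drawdown).

    A drawdown starts when equity dips below its running peak and ends when
    equity first returns to or above that peak. Durations run from the peak.
    """
    if not equity:
        return [], 0
    completed: list[int] = []
    peak, peak_idx = equity[0], 0
    in_drawdown = False
    for i, value in enumerate(equity[1:], start=1):
        if value >= peak:
            if in_drawdown:
                completed.append(i - peak_idx)
                in_drawdown = False
            peak, peak_idx = value, i
        else:
            in_drawdown = True
    open_length = len(equity) - 1 - peak_idx if in_drawdown else 0
    return completed, open_length


fast = [100, 80, 90, 100, 104]              # -20% drawdown
slow = [100, 80, 82, 85, 88, 92, 96, 100]   # same -20% drawdown
print(recovery_times(fast))  # ([3], 0)
print(recovery_times(slow))  # ([7], 0)
```

Both curves share a 20% maximum drawdown, yet the recovery durations separate them, which is the distinction the paragraph draws.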
Empirical evidence across asset classes demonstrates that position sizing and exposure management contribute more to long-term portfolio performance than entry signal quality. Risk management is the primary determinant of compounding sustainability.
The Beneat terminal implements a 6-gate pre-trade risk engine that validates every order before it reaches the exchange:
The enforcement model is pre-execution: an order that violates any gate is rejected before it reaches the exchange. This architectural choice means risk constraints are enforced deterministically rather than issued as advisory warnings.
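The six gates are not enumerated here, so the following is only a sketch of the pre-execution pattern under stated assumptions: three hypothetical gates (maximum leverage, maximum notional as a multiple of equity, and a daily loss limit) stand in for the real ones, and an order is assumed to be sent only when no gate returns a rejection reason. All names and limits are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Order:
    symbol: str
    notional: float  # position value in quote currency
    leverage: float


@dataclass
class Account:
    equity: float
    loss_today: float  # realized loss so far today, as a positive number


# A gate returns None if the order passes, otherwise a rejection reason.
Gate = Callable[[Order, Account], Optional[str]]


def max_leverage(limit: float) -> Gate:
    def gate(o: Order, a: Account) -> Optional[str]:
        return None if o.leverage <= limit else f"leverage {o.leverage}x > {limit}x"
    return gate


def max_notional(multiple_of_equity: float) -> Gate:
    def gate(o: Order, a: Account) -> Optional[str]:
        cap = multiple_of_equity * a.equity
        return None if o.notional <= cap else f"notional {o.notional} > {cap}"
    return gate


def daily_loss_limit(fraction_of_equity: float) -> Gate:
    def gate(o: Order, a: Account) -> Optional[str]:
        cap = fraction_of_equity * a.equity
        return None if a.loss_today < cap else "daily loss limit reached"
    return gate


def pre_trade_check(order: Order, account: Account,
                    gates: list[Gate]) -> list[str]:
    """Run every gate; an empty list means the order may be sent."""
    return [r for g in gates if (r := g(order, account)) is not None]


gates = [max_leverage(10), max_notional(3.0), daily_loss_limit(0.05)]
acct = Account(equity=10_000, loss_today=200)
print(pre_trade_check(Order("BTCUSDT", 20_000, 5), acct, gates))   # []
print(pre_trade_check(Order("BTCUSDT", 50_000, 20), acct, gates))  # two reasons
```

Running every gate, rather than stopping at the first failure, returns the full list of violations, which is useful feedback for a human trader and a clean signal for an agent's logs.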
A core design principle of TQS: the same scoring framework applies to both human traders and autonomous AI agents. This is not a philosophical statement — it is an architectural decision.
The seven behavioral pattern detectors that analyze human execution history run identically on agent execution logs. When an AI agent revenge-trades — increases position size immediately after a loss — it is flagged the same way a human would be.
AI agents exhibit failure modes that humans do not, so the behavioral framework is extended with four additional detectors for agent-specific behavior.
An agent with a TQS of 82 and a human with a TQS of 78 can be compared directly. Capital can be allocated across both based on a single, verifiable standard.
TQS is already live and running inside the Beneat terminal. But a centralized TQS — controlled by a single company — inherits the same conflicts of interest that plague every centralized rating system. If Beneat controls TQS and also operates yield vaults that profit from high-TQS agents, the incentive to inflate scores is structural.
The long-term architecture removes this conflict by distributing the scoring process:
A scoring system is only as good as its resistance to manipulation:
Decentralized validation networks — including Bittensor — are under active investigation as the substrate for trustless TQS computation, ensuring no single entity controls evaluation methodology or outcomes.
The traditional hedge fund has a general partner who decides which strategies get capital and how much. This model is opaque, permissioned, and concentrated. TQS enables a different model: capital allocation governed by verifiable intelligence scores.
Three proposed vault tiers gate access by TQS threshold. The higher the threshold, the more concentrated the strategy — but also the more rigorously vetted the operators.
| Vault | Strategy Profile | TQS Threshold |
|---|---|---|
| Equilibrium | Delta-neutral, funding arbitrage | ≥ 60 |
| Harmonic | Systematic arbitrage, mean reversion | ≥ 75 |
| Vector | Directional, momentum | ≥ 90 |
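Vault gating reduces to a threshold comparison against the table. This sketch assumes eligibility is checked against the operator's current TQS (not, for example, a time-averaged score); the function name is hypothetical.

```python
# Thresholds from the vault table; dicts preserve insertion order.
VAULT_THRESHOLDS = {"Equilibrium": 60, "Harmonic": 75, "Vector": 90}


def eligible_vaults(tqs: float) -> list[str]:
    """Vaults whose TQS threshold (inclusive) the operator currently meets."""
    return [name for name, t in VAULT_THRESHOLDS.items() if tqs >= t]


print(eligible_vaults(82))  # ['Equilibrium', 'Harmonic']
```

Note that a Diamond-tier operator (80 or above) is not automatically eligible for Vector, whose threshold of 90 sits well inside the Diamond band.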
The thesis: rational capital allocation favors risk-adjusted returns over nominal yield. Given sufficient behavioral transparency, an investor can distinguish between a 15% return generated by a Diamond-rated agent with verified process quality and a 30% return from an unscored strategy with unknown risk characteristics.
In a traditional fund, you trust the GP's judgment. In a TQS-gated vault, you trust the measurement. The score is verifiable, the algorithm is open-source, and the execution logs are auditable.
Academic neurofinance research has established clear links between physiological state and decision quality. Cortisol levels predict risk-taking behavior. Heart rate variability (HRV) correlates with cognitive flexibility. The Beneat terminal operationalizes these findings within the trading execution workflow.
The next evolution is continuous biometric streaming from wearable devices. HRV data from Apple Watch, Whoop, or Oura Ring provides a real-time proxy for autonomic nervous system state. Galvanic skin response may capture the physiological signature of a revenge trade forming before the trader is consciously aware of it.
Biometric integration enables pre-trade gating: when physiological indicators fall below calibrated thresholds, the risk engine can restrict order submission until cognitive readiness is restored.
The core components required for decentralized scoring are already built, tested, and running in production.
| Component | Status |
|---|---|
| 7 behavioral pattern detectors | Live |
| Adaptive threshold personalization | Live |
| 4-dimension behavioral scoring | Live |
| Agent behavioral analysis | Live |
| 6-gate pre-trade risk engine | Live |
| 10 exchange integrations | Live |
| Probability density signal engine | Live |
| Biometric readiness system | Live |
| AI agent framework (14 tools) | Live |
| Arena (human vs. AI) | Live |
| Copytrading pipeline | Live |
| MCP server (19 tools) | Deployed |
| On-chain agent leaderboard | Live |
Should a decentralized validation network be pursued, the remaining technical work includes porting the behavioral analysis to a validator-compatible runtime and implementing the frozen market context snapshot mechanism for deterministic consensus. The measurement framework itself is fully implemented and deployed in production.
Existing infrastructure for evaluating trading intelligence relies on outcome-based metrics that conflate skill with variance, create conflicts of interest through centralized control, and produce non-portable reputation. As autonomous AI agents manage increasing amounts of capital, the absence of a unified standard for comparing human and artificial trading intelligence becomes a systemic risk.
TQS addresses this gap. It measures process quality across four dimensions — emotional control, discipline, patience, and risk awareness — with adaptive personalization and explicit data quality adjustment. The framework applies identically to humans and agents. The vision for decentralized validation — potentially via a Bittensor subnet — would ensure that no single entity controls evaluation methodology or outcomes.
The objective is a capital allocation framework in which verifiable intelligence scores, rather than historical returns, serve as the primary basis for trust and allocation decisions.