Abstract

Existing trader evaluation systems (leaderboards, copy-trading platforms, and proprietary firm challenges) rely on outcome-based metrics that conflate skill with variance. This approach is structurally inadequate.

The Trader Quality Score (TQS) evaluates three dimensions: returns quality, behavioral discipline, and risk management. It applies identically to human traders and autonomous AI agents.

The long-term vision is to decentralize the scoring process — removing any single entity, including Beneat, from controlling evaluation methodology or outcomes. Bittensor subnets are being explored as a potential substrate. The objective is a capital allocation framework governed by verifiable intelligence scores rather than reputation or historical returns.

01

The Problem

Systemic Failures in Outcome-Based Ranking

Return-based trader ranking systems exhibit two well-documented statistical failures:

Survivorship bias. Traders who achieve extreme returns through concentrated leveraged positions appear at the top of rankings. Identical strategies that resulted in total loss are excluded from the sample. The observable distribution is systematically skewed toward high-variance outcomes.

Adverse selection. When the ranking metric is raw P&L, the system preferentially surfaces high-variance strategies. Traders who compound 2–3% monthly with controlled drawdowns are structurally underrepresented relative to those capturing tail events.

Centralized Ratings Create Conflicts of Interest

History demonstrates what happens when the entity measuring quality profits from the outcome:

Moody's and S&P, 2008. Credit rating agencies were paid by the issuers whose securities they rated. Many AAA-rated mortgage-backed securities were later downgraded to junk or defaulted.
Arthur Andersen and Enron. The auditor's consulting revenue exceeded its audit fees. Independence was structural fiction.
Copy-trading platforms, today. Platforms profit from trading volume, not from the quality of traders they surface.

A centralized Beneat would face the same structural conflict: the temptation to favorably rate agents that drive vault total value locked (TVL). Decentralizing the scoring process is intended to remove this conflict.

No Portable Trader Reputation

A trader who passes a prop firm evaluation at Firm A starts from zero at Firm B. There is no portable, verifiable record of trading competence. TQS is designed to be a credit score for traders — persistent, portable, and verifiable by anyone.

02

Terminal

The Beneat Terminal

Beneat is a professional crypto futures trading terminal. The system connects to 10 exchanges through a universal adapter architecture, consolidating market data, order execution, behavioral analysis, and risk enforcement into a unified interface.

Multi-Exchange Trading

A unified interface across Binance, Bybit, Bitget, OKX, Hyperliquid, BloFin, Aster, Kraken Futures, KuCoin Futures, and a demo environment. The adapter pattern means each exchange's quirks — parameter formats, symbol conventions, margin modes — are handled at the adapter layer, not in the UI.

Neutrino: Probability Density Signal Engine

Proprietary support and resistance levels derived from probability density estimation over historical price distributions. Five timeframe instances (3-minute through weekly) run continuously. When multiple timeframes converge on the same price zone, a confluence score quantifies the degree of agreement across scales.

Behavioral Psychoanalysis Engine

Seven pattern detectors analyze each trader's execution history in real time:

Pattern	What It Detects
Revenge Trading	Size or frequency spike immediately after a loss
FOMO	Rapid-fire entries chasing price after a missed move
Overconfidence	Leverage increase following a win streak
Panic Exits	Premature closure during normal volatility
Overtrading	Position count exceeding the trader's own baseline
Tilt	Degrading win rate after consecutive losses
Patience	Holding winners to target — the positive signal

Detection thresholds are adaptive. After 20+ trades, the system builds a statistical baseline of each trader's style using robust statistics (median and MAD, not mean and standard deviation). A scalper entering 30 trades per day is not flagged for overtrading. The same frequency from a swing trader is.

Adaptive Threshold Calibration

\begin{aligned} \alpha &= \text{clamp}\!\left(\frac{\text{totalTrades} - T_{\min}}{T_{\max} - T_{\min}},\; 0,\; 1\right) \\[6pt] \text{threshold} &= \alpha \cdot \text{adaptive} + (1 - \alpha) \cdot \text{default} \end{aligned}

Below T_min → 100% defaults · Above T_max → 100% adaptive
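The blend above can be sketched in Python as follows. The text does not give values for T_min and T_max, so the defaults `t_min=20` and `t_max=100` are illustrative assumptions, as are the function names.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def blended_threshold(total_trades: int, adaptive: float, default: float,
                      t_min: int = 20, t_max: int = 100) -> float:
    """Blend default and adaptive thresholds by trade count.

    t_min and t_max are hypothetical values, not taken from the source.
    """
    alpha = clamp((total_trades - t_min) / (t_max - t_min), 0.0, 1.0)
    return alpha * adaptive + (1.0 - alpha) * default


# Below t_min the default applies; above t_max the adaptive value applies.
few_trades = blended_threshold(10, adaptive=30.0, default=10.0)
halfway = blended_threshold(60, adaptive=30.0, default=10.0)
many_trades = blended_threshold(200, adaptive=30.0, default=10.0)
```

With these assumed bounds, a trader with 60 trades sits halfway between the default and the personalized threshold.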

Robust statistics avoid outlier distortion. The baseline uses Median Absolute Deviation (MAD) instead of standard deviation:

Robust Deviation Estimator

\begin{aligned} \text{MAD} &= \text{median}\!\left(\left|x_i - \text{median}(X)\right|\right) \\[4pt] \sigma_{\text{robust}} &\approx 1.4826 \times \text{MAD} \end{aligned}
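As a rough illustration of why the MAD-based estimator resists outliers, the sketch below computes it with the Python standard library. The function name and sample data are assumptions made for the example.

```python
from statistics import median, pstdev


def robust_sigma(values: list[float]) -> float:
    """Estimate spread as 1.4826 * MAD, as in the formula above."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return 1.4826 * mad


# One extreme value barely moves the robust estimate but inflates
# the ordinary standard deviation.
sample = [1, 2, 3, 4, 100]
robust = robust_sigma(sample)   # median is 3, MAD is 1
classic = pstdev(sample)
```

Here the single outlier of 100 leaves the robust estimate near 1.5, while the standard deviation is dominated by it.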

AI Agent Framework

Traders can deploy autonomous or semi-autonomous AI agents on their own exchange accounts. Agents access 14 tools — market data, account state, order management, and Neutrino probability density levels directly. An 8-gate autonomy guard constrains every write action.

Arena: Human vs. AI

Live 1v1 competition between a human trader and an LLM-powered bot on testnet. Six AI models compete with real market data. The Arena is both a product feature and a proving ground for the thesis that behavioral quality, not model size, predicts trading performance.

Copytrading

Full production-grade copy execution across exchanges. Leader eligibility is gated by behavioral metrics. Signals propagate to followers with per-follower size calculation, leverage sync, and SL/TP handling.

03

TQS

Trader Quality Score (TQS)

TQS is a 0–100 composite designed to separate skill from luck and process from outcome. It evaluates four behavioral dimensions and averages them into a single composite:

TQS Composite

\text{TQS} = \frac{\text{EmotionalControl} + \text{Discipline} + \text{Patience} + \text{RiskAwareness}}{4}

In the dimension scores below, tradeFactor normalizes pattern counts by activity level.

The tradeFactor uses logarithmic scaling so that a trader with 500 trades is not penalized 50× more than one with 10 trades for the same behavioral pattern rate. This ensures fairness across activity levels.

TQS Composition

Emotional Control (0–100)

Measures a trader's ability to maintain composure after losses. Three destructive patterns are penalized:

Emotional Control Score

\begin{aligned} \text{EmotionalControl} &= 100 - \sum \text{penalties} \quad (\text{capped}) \\[8pt] \text{Revenge trading:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_1 \\[4pt] \text{FOMO:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_2 \\[4pt] \text{Panic exits:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_3 \end{aligned}

Weights w₁ > w₂ > w₃ — revenge trading is penalized most heavily
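A minimal sketch of the Emotional Control calculation follows. The source states only that tradeFactor is logarithmic and that w₁ > w₂ > w₃. The specific form `max(1, log10(trades))`, the weight values, and the reading of "capped" as a floor at zero are all assumptions made for illustration.

```python
import math


def trade_factor(total_trades: int) -> float:
    """Assumed logarithmic activity normalizer (hypothetical form)."""
    return max(1.0, math.log10(max(total_trades, 1)))


def emotional_control(revenge: int, fomo: int, panic: int, total_trades: int,
                      weights: tuple[float, float, float] = (15.0, 10.0, 5.0)) -> float:
    """100 minus weighted, activity-normalized pattern penalties.

    weights are hypothetical and only respect the ordering w1 > w2 > w3.
    """
    w1, w2, w3 = weights
    tf = trade_factor(total_trades)
    penalty = (revenge * w1 + fomo * w2 + panic * w3) / tf
    return max(0.0, 100.0 - penalty)  # assumed cap: never below zero


score = emotional_control(revenge=2, fomo=1, panic=1, total_trades=100)
```

The Discipline score has the same shape, so the same function could be reused with the overtrading, tilt, and overconfidence counts and their own weights.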

Discipline (0–100)

Measures adherence to systematic process — avoiding overtrading, tilt, and overconfidence:

Discipline Score

\begin{aligned} \text{Discipline} &= 100 - \sum \text{penalties} \quad (\text{capped}) \\[8pt] \text{Overtrading:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_4 \\[4pt] \text{Tilt:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_5 \\[4pt] \text{Overconfidence:}\quad & \frac{\text{count}}{\text{tradeFactor}} \times w_6 \end{aligned}

Patience (0–100)

The patience dimension is initialized at a neutral baseline and adjusted by both positive and negative signals:

Patience Score

\begin{aligned} \text{Patience} &= P_0 + \min(C_{\max},\; \text{patienceCount} \times w_7) \\ & \quad - \min(C_{\min},\; \text{panicCount} \times w_8 + \text{fomoCount} \times w_9) \\ & \quad + \text{winRateBonus} \end{aligned}

P₀ is a neutral baseline · winRateBonus is tiered by performance thresholds
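The Patience formula can be sketched as below. Every numeric value (baseline, caps, weights, and the win-rate tiers) is a placeholder chosen for the example, since the source does not publish them. Clamping the result to 0–100 is also an assumption, based on the stated score range.

```python
def patience_score(patience_count: int, panic_count: int, fomo_count: int,
                   win_rate: float,
                   p0: float = 50.0, c_max: float = 30.0, c_min: float = 30.0,
                   w7: float = 5.0, w8: float = 5.0, w9: float = 3.0) -> float:
    """Baseline plus capped bonus, minus capped penalty, plus win-rate bonus."""
    bonus = min(c_max, patience_count * w7)
    penalty = min(c_min, panic_count * w8 + fomo_count * w9)

    # Hypothetical tiers for winRateBonus.
    if win_rate >= 0.60:
        win_rate_bonus = 10.0
    elif win_rate >= 0.50:
        win_rate_bonus = 5.0
    else:
        win_rate_bonus = 0.0

    raw = p0 + bonus - penalty + win_rate_bonus
    return max(0.0, min(100.0, raw))  # assumed clamp to the 0-100 range


score = patience_score(patience_count=4, panic_count=1, fomo_count=1, win_rate=0.55)
```

The two `min` calls bound how far positive and negative signals can move the score away from the baseline.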

Risk Awareness (0–100)

Measures position sizing consistency using the Coefficient of Variation — how much a trader's sizing deviates from their own norm:

Risk Awareness Score (CV-Based)

\begin{aligned} \text{CV} &= \frac{\sigma(\text{positionSizes})}{\mu(\text{positionSizes})} \\[4pt] \text{sizingScore} &= \max\!\left(0,\; 100 - \text{CV} \times 100\right) \\[8pt] \text{extremeRatio} &= \frac{|\{t : \text{size}_t \;\text{deviates beyond thresholds}\}|}{N} \\[4pt] \text{RiskAwareness} &= \max\!\left(0,\; \text{sizingScore} - \text{extremeRatio} \times w_{10}\right) \end{aligned}
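The CV-based score can be sketched as follows. The source does not define the "extreme" thresholds or w₁₀, so the bounds (below 0.5× or above 2× the mean size) and the weight are assumptions, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev


def risk_awareness(sizes: list[float], extreme_low: float = 0.5,
                   extreme_high: float = 2.0, w10: float = 50.0) -> float:
    """Score position-sizing consistency from the coefficient of variation."""
    if not sizes:
        raise ValueError("at least one position size is required")
    mu = mean(sizes)
    if mu <= 0:
        raise ValueError("mean position size must be positive")

    cv = pstdev(sizes) / mu
    sizing_score = max(0.0, 100.0 - cv * 100.0)

    # Share of trades whose size is far from the trader's own norm
    # (hypothetical thresholds expressed as multiples of the mean).
    extreme = sum(1 for s in sizes if s < extreme_low * mu or s > extreme_high * mu)
    extreme_ratio = extreme / len(sizes)

    return max(0.0, sizing_score - extreme_ratio * w10)


steady = risk_awareness([1.0, 1.0, 1.0, 1.0])
erratic = risk_awareness([1.0, 1.0, 1.0, 5.0])
```

Perfectly uniform sizing scores 100, while a single oversized position lowers both the CV term and the extreme-ratio term.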

Data Quality Adjustment

When input data is incomplete, the system applies conservative caps rather than producing uncalibrated scores:

Data Quality Caps

\begin{aligned} & \textbf{Without duration data:} \\ & \quad \text{cap EmotionalControl and Patience at } C_d \\[8pt] & \textbf{Without size variance data:} \\ & \quad \text{set RiskAwareness to neutral} \\ & \quad \text{cap Discipline at } C_s \end{aligned}

Caps C_d and C_s are calibrated to reflect increased uncertainty
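One way to apply these caps is sketched below. The cap values `c_d` and `c_s`, the neutral value of 50, and the dictionary keys are hypothetical; the source does not state them.

```python
def apply_data_quality_caps(scores: dict[str, float], has_duration: bool,
                            has_size_variance: bool, c_d: float = 70.0,
                            c_s: float = 70.0, neutral: float = 50.0) -> dict[str, float]:
    """Return a capped copy of the dimension scores; the input is not modified."""
    capped = dict(scores)
    if not has_duration:
        capped["emotional_control"] = min(capped["emotional_control"], c_d)
        capped["patience"] = min(capped["patience"], c_d)
    if not has_size_variance:
        capped["risk_awareness"] = neutral
        capped["discipline"] = min(capped["discipline"], c_s)
    return capped


raw = {"emotional_control": 90.0, "discipline": 85.0,
       "patience": 60.0, "risk_awareness": 95.0}
adjusted = apply_data_quality_caps(raw, has_duration=False, has_size_variance=False)
```

Caps only ever lower a score, so a dimension that is already below its cap is left unchanged.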

Pattern Detection

Each detector uses the trader's own statistical baseline — built from robust estimators (MAD, not standard deviation) — to classify deviations into severity tiers. Tilt detection measures how far trade sizing deviates from expected behavior in σ-units. Revenge trading detection combines post-loss timing windows with size increase ratios. Both use calibrated, multi-tier severity thresholds.

General Detection Framework

\text{deviation} = \frac{\text{observed} - \text{expected}}{\sigma_{\text{robust}}}

Severity tiers are calibrated per-pattern with proprietary thresholds
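The deviation formula and tiering can be illustrated as below. The actual thresholds are described as proprietary, so the cutoffs of 1, 2, and 3 σ-units and the tier labels are placeholders.

```python
def classify_deviation(observed: float, expected: float, sigma_robust: float,
                       tiers: tuple[tuple[float, str], ...] = (
                           (3.0, "severe"), (2.0, "moderate"), (1.0, "mild"),
                       )) -> tuple[float, str]:
    """Return the deviation in robust sigma units and a severity label.

    tiers must be ordered from the highest cutoff to the lowest.
    """
    if sigma_robust <= 0:
        raise ValueError("sigma_robust must be positive")
    deviation = (observed - expected) / sigma_robust
    for cutoff, label in tiers:
        if deviation >= cutoff:
            return deviation, label
    return deviation, "none"


# A position three robust sigmas above the trader's usual size.
deviation, label = classify_deviation(observed=5.0, expected=2.0, sigma_robust=1.0)
```

Because the baseline is per trader, the same observed value can be "severe" for one trader and unremarkable for another.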

Tier Grades

TQS Range	Tier
80–100	Diamond
60–79	Gold
40–59	Silver
0–39	Bronze
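The composite and the tier table can be combined in a few lines of Python. Treating each range's lower bound as the cutoff (so that 79.5 maps to Gold) is an assumption, since the table lists integer ranges only.

```python
def tqs(emotional_control: float, discipline: float,
        patience: float, risk_awareness: float) -> float:
    """Equal-weight average of the four dimension scores."""
    return (emotional_control + discipline + patience + risk_awareness) / 4


def tier(score: float) -> str:
    """Map a TQS value to its tier using lower-bound cutoffs."""
    if score >= 80:
        return "Diamond"
    if score >= 60:
        return "Gold"
    if score >= 40:
        return "Silver"
    return "Bronze"


composite = tqs(90.0, 80.0, 70.0, 60.0)
grade = tier(composite)
```

With dimension scores of 90, 80, 70, and 60, the composite is 75 and the tier is Gold.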

04

Equity Curve

The Equity Curve as Truth

The equity curve is the single most honest artifact a trader produces. Every decision — entry, exit, sizing, timing — is encoded in its shape. A smooth, upward-sloping curve with shallow drawdowns tells a fundamentally different story than a jagged line with explosive spikes and deep valleys, even if both end at the same P&L.

Rolling return consistency, a TQS sub-metric, rewards compounding and penalizes lottery-ticket returns. A trader who makes 3% per month for twelve months scores higher than one who makes 36% in one month and breaks even for the other eleven.

Outlier independence test: remove the single highest-return trade from the sample. If the strategy is no longer profitable, performance is attributable to a single tail event rather than systematic edge.
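The outlier independence test reduces to a short check. This sketch assumes per-trade P&L values as input and measures "highest-return trade" by absolute P&L; the function name is illustrative.

```python
def survives_outlier_removal(trade_pnls: list[float]) -> bool:
    """True if the strategy stays profitable without its single best trade."""
    if len(trade_pnls) < 2:
        return False  # nothing remains once the only trade is removed
    return sum(trade_pnls) - max(trade_pnls) > 0


# Profitable overall, but only because of one tail event.
lottery = survives_outlier_removal([-10.0, -10.0, -10.0, 100.0])

# Profitable even after dropping the best trade.
systematic = survives_outlier_removal([5.0, 5.0, 5.0, -3.0, 6.0])
```

The first series fails the test because its remaining trades sum to a loss; the second passes.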

Drawdown recovery time measures resilience. Two traders can have identical max drawdowns, but if one recovers in two weeks and the other takes three months, they are not equivalent. Fast recovery signals adaptive risk management and emotional control. Slow recovery often signals tilt.
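One simple way to quantify recovery time is the longest run of consecutive periods spent below a prior equity peak. The sketch below uses that definition as an assumption; the source does not specify how recovery time is computed.

```python
def longest_underwater(equity: list[float]) -> int:
    """Longest run of consecutive periods below the running equity peak.

    A drawdown that has not recovered by the end of the series is
    counted up to the last observation.
    """
    peak = float("-inf")
    current = 0
    longest = 0
    for value in equity:
        if value >= peak:
            peak = value
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


fast_recovery = longest_underwater([100, 110, 105, 108, 111, 109, 112])
slow_recovery = longest_underwater([100, 90, 80, 85])
```

Two curves with the same maximum drawdown can produce very different values here, which is the distinction the paragraph above draws.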

05

Risk Engine

Pre-Trade Risk Enforcement

Empirical evidence across asset classes demonstrates that position sizing and exposure management contribute more to long-term portfolio performance than entry signal quality. Risk management is the primary determinant of compounding sustainability.

The Beneat terminal implements a 6-gate pre-trade risk engine that validates every order before it reaches the exchange:

6-Gate Pre-Trade Risk Engine

Gate 1 — Daily Loss Circuit Breaker

\begin{aligned} \text{dailyLossLimit} &= \frac{\text{dailyLossLimit\%}}{100} \times \text{accountEquity} \\[4pt] & \text{reject if } \text{dailyPnL} \leq -\text{dailyLossLimit} \end{aligned}

Gate 2 — Capital at Risk

\begin{aligned} \text{capitalAtRisk} &= |\text{entryPrice} - \text{stopLoss}| \times \text{positionSize} \\[4pt] \text{riskPercent} &= \frac{\text{capitalAtRisk}}{\text{accountEquity}} \times 100 \end{aligned}
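Gates 1 and 2 translate directly into code. The sketch below follows the two formulas above; the function and parameter names are chosen for the example and are not the terminal's actual API.

```python
def daily_loss_breached(daily_pnl: float, account_equity: float,
                        daily_loss_limit_pct: float) -> bool:
    """Gate 1: True if today's loss has reached the configured limit."""
    daily_loss_limit = daily_loss_limit_pct / 100.0 * account_equity
    return daily_pnl <= -daily_loss_limit


def risk_percent(entry_price: float, stop_loss: float,
                 position_size: float, account_equity: float) -> float:
    """Gate 2: capital at risk as a percentage of account equity."""
    capital_at_risk = abs(entry_price - stop_loss) * position_size
    return capital_at_risk / account_equity * 100.0


# A 10,000 account with a 3% daily limit rejects orders after a 350 loss.
blocked = daily_loss_breached(daily_pnl=-350.0, account_equity=10_000.0,
                              daily_loss_limit_pct=3.0)

# 50 units with a stop 2.0 away from entry puts about 1% of equity at risk.
pct = risk_percent(entry_price=100.0, stop_loss=98.0,
                   position_size=50.0, account_equity=10_000.0)
```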

Gate 3 — Risk:Reward Ratio

\begin{aligned} \text{Single TP:}\quad RR &= \frac{|\text{takeProfit} - \text{entry}|}{|\text{entry} - \text{stopLoss}|} \\[6pt] \text{Multi-TP:}\quad \text{weightedReward} &= \sum_i \left(|\text{TP}_i - \text{entry}| \times \text{sizePercent}_i\right) \\[4pt] RR &= \frac{\text{weightedReward}}{|\text{entry} - \text{stopLoss}|} \end{aligned}

All TP allocations must sum to 1.0 (±2% tolerance)
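The single-TP and multi-TP cases can share one function, since a single take-profit is just one allocation of 1.0. This sketch reads the "±2% tolerance" as an absolute tolerance of 0.02 on the allocation sum, which is an interpretation rather than something the source spells out.

```python
def risk_reward(entry: float, stop_loss: float,
                take_profits: list[tuple[float, float]],
                tolerance: float = 0.02) -> float:
    """Weighted risk:reward ratio.

    take_profits is a list of (price, size_fraction) pairs whose
    fractions must sum to 1.0 within the tolerance.
    """
    total_allocation = sum(fraction for _, fraction in take_profits)
    if abs(total_allocation - 1.0) > tolerance:
        raise ValueError("take-profit allocations must sum to 1.0")

    risk = abs(entry - stop_loss)
    if risk == 0:
        raise ValueError("stop loss must differ from entry")

    weighted_reward = sum(abs(price - entry) * fraction
                          for price, fraction in take_profits)
    return weighted_reward / risk


single = risk_reward(100.0, 95.0, [(110.0, 1.0)])
multi = risk_reward(100.0, 95.0, [(110.0, 0.5), (120.0, 0.5)])
```

With entry at 100 and stop at 95, a single target at 110 gives a ratio of 2, and splitting evenly between 110 and 120 gives 3.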

Gate 5 — Cold Streak Sizing

\begin{aligned} \text{Normal:}\quad & \text{maxRisk} = R_n\%\;\text{of equity} \\ \text{After streak:}\quad & \text{maxRisk} = R_s\%\;\text{of equity} \quad (R_s < R_n) \\[6pt] & \text{reject if } \text{riskPercent} > \text{maxRisk} \end{aligned}

Streak length and risk reduction factor R_s/R_n are configurable per policy
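Gate 5 can be sketched as a lookup of the allowed risk followed by a comparison. The streak length of 3 and the risk limits of 2% and 1% are placeholder policy values; the source says only that they are configurable and that R_s < R_n.

```python
def max_risk_pct(consecutive_losses: int, streak_length: int = 3,
                 normal_pct: float = 2.0, streak_pct: float = 1.0) -> float:
    """Allowed risk per trade, reduced once a losing streak is reached."""
    if consecutive_losses >= streak_length:
        return streak_pct
    return normal_pct


def passes_cold_streak_gate(risk_pct: float, consecutive_losses: int) -> bool:
    """Gate 5: reject when the order risks more than the current limit."""
    return risk_pct <= max_risk_pct(consecutive_losses)


# The same 1.5% order passes normally but is rejected after three losses.
normal = passes_cold_streak_gate(risk_pct=1.5, consecutive_losses=0)
after_streak = passes_cold_streak_gate(risk_pct=1.5, consecutive_losses=3)
```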

The enforcement model is pre-execution: orders that violate any gate are rejected before reaching the exchange. This architectural choice ensures that risk constraints are enforced deterministically rather than offered as advisory warnings.

06

Humans & Agents

Humans and Agents: One Standard

A core design principle of TQS: the same scoring framework applies to both human traders and autonomous AI agents. This is not a philosophical statement — it is an architectural decision.

Same Detectors, Same Score

The seven behavioral pattern detectors that analyze human execution history run identically on agent execution logs. When an AI agent revenge-trades — increases position size immediately after a loss — it is flagged the same way a human would be.

Agent-Specific Extensions

AI agents exhibit failure modes that humans do not. The behavioral framework extends with four additional detectors:

Stuck Position — holds beyond expected duration without action
Missed Exit — take-profit hit but position not closed
Over-Execution — trade frequency exceeds expected rate by 2x+
Guardrail Rejection — risk engine blocked 3+ trades within 24h

Agent Execution Health

\text{status} = \begin{cases} \text{critical} & \text{if errorRate} > \epsilon_c \;\text{OR}\; \text{rejectionRate} > \rho_c \\ \text{warning} & \text{if errorRate} > \epsilon_w \;\text{OR}\; \text{rejectionRate} > \rho_w \\ \text{healthy} & \text{otherwise} \end{cases}

Thresholds ε and ρ are calibrated from production agent data
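The status rule maps to a two-step check, with the critical condition evaluated first. The threshold values below are illustrative assumptions; the source says only that they are calibrated from production data.

```python
def agent_status(error_rate: float, rejection_rate: float,
                 eps_c: float = 0.10, rho_c: float = 0.30,
                 eps_w: float = 0.05, rho_w: float = 0.15) -> str:
    """Classify agent execution health from error and rejection rates."""
    if error_rate > eps_c or rejection_rate > rho_c:
        return "critical"
    if error_rate > eps_w or rejection_rate > rho_w:
        return "warning"
    return "healthy"


status = agent_status(error_rate=0.02, rejection_rate=0.20)
```

Either rate alone is enough to escalate the status, so a low error rate does not mask a high rejection rate.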

An agent with a TQS of 82 and a human with a TQS of 78 can be compared directly. Capital can be allocated across both based on a single, verifiable standard.

07

Decentralization

Toward Decentralized Scoring

TQS is already live and running inside the Beneat terminal. But a centralized TQS — controlled by a single company — inherits the same conflicts of interest that plague every centralized rating system. If Beneat controls TQS and also operates yield vaults that profit from high-TQS agents, the incentive to inflate scores is structural.

The long-term architecture removes this conflict by distributing the scoring process:

Independent validators audit and score each trader by cross-referencing exchange data and applying behavioral detectors alongside a frozen market context snapshot — the exact price, volatility, and position state at the moment each trade fired.
Open-source scoring algorithm — the TQS computation is fully transparent. Anyone can verify how a score was derived.
Consensus across validators — outlier scores are clipped. No single validator can manipulate the final score.
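The consensus step is not specified in detail. One plausible reading, sketched below, clips each validator's score to a band around the median before averaging. The band width `k` and the use of the MAD-based spread are assumptions made for illustration, not the protocol's defined rule.

```python
from statistics import mean, median


def consensus_score(validator_scores: list[float], k: float = 2.0) -> float:
    """Average validator scores after clipping outliers toward the median."""
    med = median(validator_scores)
    mad = median([abs(s - med) for s in validator_scores])
    spread = 1.4826 * mad  # robust spread estimate

    lower, upper = med - k * spread, med + k * spread
    clipped = [min(upper, max(lower, s)) for s in validator_scores]
    return mean(clipped)


# Four validators agree near 70; one reports 100.
scores = [70.0, 72.0, 71.0, 69.0, 100.0]
final = consensus_score(scores)
```

In this example the outlying report of 100 is pulled down to the upper edge of the band, so the final score stays close to the honest majority instead of the raw mean of 76.4.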

Anti-Gaming

A scoring system is only as good as its resistance to manipulation:

Minimum capital-at-risk — traders must have real money in positions. Paper trading doesn't count.
Score decay — TQS requires continuous quality. Historical reputation without ongoing performance degrades over time.
Sybil resistance — staking requirements prevent a single actor from running puppet accounts to game the distribution.
Behavioral detection — the same pattern detectors that identify revenge trading in humans can identify mechanical wash-trading in bots. Artificial behavioral patterns are statistically distinct from genuine ones.

Decentralized validation networks — including Bittensor — are under active investigation as the substrate for trustless TQS computation, ensuring no single entity controls evaluation methodology or outcomes.

08

Quant Fund

TQS-Gated Capital Allocation

The traditional hedge fund has a general partner who decides which strategies get capital and how much. This model is opaque, permissioned, and concentrated. TQS enables a different model: capital allocation governed by verifiable intelligence scores.

Vault Architecture

Three proposed vault tiers gate access by TQS threshold. The higher the threshold, the more concentrated the strategy — but also the more rigorously vetted the operators.

Vault	Strategy Profile	TQS Threshold
Equilibrium	Delta-neutral, funding arbitrage	≥ 60
Harmonic	Systematic arbitrage, mean reversion	≥ 75
Vector	Directional, momentum	≥ 90

TQS-Gated Vault Tiers
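Gating by threshold amounts to a lookup against the table above. The dictionary and function names in this sketch are illustrative. It also assumes eligibility is cumulative, so an operator who qualifies for a higher tier qualifies for the lower ones as well.

```python
# Thresholds taken from the vault table above.
VAULT_THRESHOLDS = {
    "Equilibrium": 60,
    "Harmonic": 75,
    "Vector": 90,
}


def eligible_vaults(tqs_score: float) -> list[str]:
    """Return the vaults whose TQS threshold the score meets."""
    return [name for name, threshold in VAULT_THRESHOLDS.items()
            if tqs_score >= threshold]


vaults = eligible_vaults(82)
```

A score of 82 qualifies for Equilibrium and Harmonic but not Vector.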

The thesis: rational capital allocation favors risk-adjusted returns over nominal yield. Given sufficient behavioral transparency, an investor can distinguish between a 15% return generated by a Diamond-rated agent with verified process quality and a 30% return from an unscored strategy with unknown risk characteristics.

In a traditional fund, you trust the GP's judgment. In a TQS-gated vault, you trust the measurement. The score is verifiable, the algorithm is open-source, and the execution logs are auditable.

09

Neurofinance

Neurofinance: Bridging Biology and Trading

Academic neurofinance research links physiological state to decision quality: cortisol levels have been associated with changes in risk-taking behavior, and heart rate variability (HRV) with cognitive flexibility. The Beneat terminal operationalizes these findings within the trading execution workflow.

What's Live Today

Manual check-in — sleep duration, quality, focus, stress, exercise normalized to 0–100
N-Back Stroop cognitive primer — 25-stimulus test measuring impulsivity and cognitive fatigue
Cognitive state classification — personal EMA baseline classifies peak flow, fatigue, impulsivity, or exhaustion
Circadian alignment — identifies personal peak trading hours from historical win rate data
Correlation engine — personalized insights: sleep vs. P&L, stress vs. win rate

The Wearable Vision

The next evolution is continuous biometric streaming from wearable devices. HRV data from Apple Watch, Whoop, or Oura Ring provides a real-time proxy for autonomic nervous system state. Galvanic skin response captures the physiological signature of a revenge trade forming before the trader is consciously aware of it.

Biometric integration enables pre-trade gating: when physiological indicators fall below calibrated thresholds, the risk engine can restrict order submission until cognitive readiness is restored.

10

Infrastructure

Existing Infrastructure

The core components required for decentralized scoring are already built, tested, and running in production.

Component	Status
7 behavioral pattern detectors	Live
Adaptive threshold personalization	Live
4-dimension behavioral scoring	Live
Agent behavioral analysis	Live
6-gate pre-trade risk engine	Live
10 exchange integrations	Live
Probability density signal engine	Live
Biometric readiness system	Live
AI agent framework (14 tools)	Live
Arena (human vs. AI)	Live
Copytrading pipeline	Live
MCP server (19 tools)	Deployed
On-chain agent leaderboard	Live

Should a decentralized validation network be pursued, the remaining technical work includes porting the behavioral analysis to a validator-compatible runtime and implementing the frozen market context snapshot mechanism for deterministic consensus. The measurement framework itself is fully implemented and deployed in production.

11

Conclusion

Existing infrastructure for evaluating trading intelligence relies on outcome-based metrics that conflate skill with variance, create conflicts of interest through centralized control, and produce non-portable reputation. As autonomous AI agents manage increasing amounts of capital, the absence of a unified standard for comparing human and artificial trading intelligence becomes a systemic risk.

TQS addresses this gap. It measures process quality across four dimensions — emotional control, discipline, patience, and risk awareness — with adaptive personalization and explicit data quality adjustment. The framework applies identically to humans and agents. The vision for decentralized validation — potentially via a Bittensor subnet — would ensure that no single entity controls evaluation methodology or outcomes.

The objective is a capital allocation framework in which verifiable intelligence scores, rather than historical returns, serve as the primary basis for trust and allocation decisions.

app.beneat.ai·beneat.ai

Decentralized Intelligence Measurement for Traders and Agents