Mispriced

fundamental valuation software

Model Methodology

1. Overview

What the Model Does

The model predicts fair market capitalization from financial statement data using machine learning. It learns historical relationships between company fundamentals (revenue, profits, debt, cash flows) and market valuations across thousands of stocks.

What the Model Does NOT Do

  • It does not predict future stock prices or returns
  • It does not account for growth expectations or momentum
  • It does not provide buy/sell recommendations
  • It is not a fundamental DCF or comparable company analysis

Key Insight

Mispricing signals are relative, not absolute. A stock showing 20% mispricing means the model's predicted fair value, based on fundamentals alone, is 20% below the current market cap (equivalently, the market cap is 25% above the predicted value). This indicates overvaluation: investors are willing to pay beyond what fundamentals suggest, which could reflect growth expectations, brand value, or other intangibles not captured by financial statements.

Cross-Sectional, Not Time-Series

A separate model is trained for each quarter, so the model only compares companies within the same quarter. This means:

  • No future leakage: The model cannot learn from future quarters
  • Market regime adaptation: Valuation multiples change over time (e.g., tech was valued higher in 2021)
  • Fair comparison: Companies are valued against contemporaries, not historical norms
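The per-quarter setup described above can be sketched roughly as follows. This is a minimal illustration rather than the actual pipeline: the record layout, the `quarter` field name, and the `fit_model` callback are all hypothetical.

```python
from collections import defaultdict

def group_by_quarter(records):
    """Bucket company snapshots by quarter label (e.g. '2021Q4')."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["quarter"]].append(rec)
    return dict(buckets)

def train_per_quarter(records, fit_model):
    """Fit one independent model per quarter; no rows cross quarter boundaries."""
    return {q: fit_model(rows) for q, rows in group_by_quarter(records).items()}

# Hypothetical toy data: two quarters, three snapshots.
records = [
    {"ticker": "AAA", "quarter": "2021Q4", "market_cap": 5.0e9},
    {"ticker": "BBB", "quarter": "2021Q4", "market_cap": 2.0e9},
    {"ticker": "AAA", "quarter": "2022Q1", "market_cap": 4.0e9},
]

# Stand-in "model" that only remembers how many rows it was trained on.
models = train_per_quarter(records, fit_model=lambda rows: {"n_train": len(rows)})
```

Because each model sees only its own quarter's rows, a company's 2022Q1 snapshot can never influence the 2021Q4 fit.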

2. Model Architecture

Algorithm: XGBoost (gradient boosted decision trees)

Target Variable: log(market_cap), log-transformed for numerical stability

Why Tree-Based Models?

  • Non-linear relationships: Financial ratios have complex, non-linear effects on valuation
  • Feature interactions: Trees naturally capture interactions (e.g., high debt is worse for low-margin companies)
  • Robustness: Less sensitive to outliers and missing data than linear models
  • No scaling required: Tree splits are invariant to monotonic feature transformations

Fixed Hyperparameters

n_estimators: 200
max_depth: 5
learning_rate: 0.1
subsample: 0.8
colsample_bytree: 0.8
objective: reg:absoluteerror

Fixed parameters ensure consistency across quarters. No per-quarter hyperparameter tuning is performed.
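As a sketch, the settings listed above could be kept in one shared dictionary and reused for every quarter. The names `XGB_PARAMS` and `make_model` are illustrative, and the lazy import is only there so the configuration can be inspected without XGBoost installed. The `reg:absoluteerror` objective requires XGBoost 1.7 or later.

```python
# Hyperparameters as listed above, held constant for every quarter.
XGB_PARAMS = {
    "n_estimators": 200,
    "max_depth": 5,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "reg:absoluteerror",  # L1 loss on log(market_cap)
}

def make_model(params=XGB_PARAMS):
    """Build a fresh regressor with the shared settings (hypothetical helper)."""
    from xgboost import XGBRegressor  # imported lazily; needs xgboost >= 1.7
    return XGBRegressor(**params)
```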

3. Cross-Validation Methodology

The model uses repeated K-fold cross-validation to generate prediction distributions. This approach prevents data leakage and provides uncertainty estimates.

CV Repeats: 10
Folds per Repeat: 5
Predictions per Stock: 50

[Figure: K-fold cross-validation diagram. Each row shows one of the five folds. Blue = training data, Red = test data (held-out).]

Why Repeated CV?

  • Uncertainty quantification: The standard deviation across 50 predictions measures model confidence
  • Robustness: Averaging reduces sensitivity to specific train/test splits
  • No data leakage: Each prediction is made on held-out data the model has never seen
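The repeated K-fold idea can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the production code: the median "model" is a stand-in for XGBoost, and `repeated_kfold` and `out_of_fold_predictions` are hypothetical helper names. In this textbook scheme, 10 repeats of 5 folds fit 50 models and hold each stock out once per repeat; how the page assembles its per-stock prediction count is not specified, so the counts here describe only this sketch.

```python
import random
import statistics

def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: n_splits folds for each of n_repeats shuffles."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for k in range(n_splits):
            test = order[k::n_splits]  # every n_splits-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in order if i not in test_set]
            yield train, test

def out_of_fold_predictions(targets, fit_predict, **cv_kwargs):
    """Collect the held-out predictions made for each sample across all repeats."""
    preds = [[] for _ in targets]
    for train, test in repeated_kfold(len(targets), **cv_kwargs):
        for i, p in zip(test, fit_predict(train, test)):
            preds[i].append(p)
    return preds

# Toy log market caps and a stand-in model that predicts the training median.
log_mcap = [20.1, 21.3, 22.0, 22.8, 23.5, 24.1, 24.9, 25.6, 26.2, 27.0]

def median_model(train, test):
    m = statistics.median(log_mcap[i] for i in train)
    return [m for _ in test]

preds = out_of_fold_predictions(log_mcap, median_model)
# Mean and spread of the held-out predictions for each stock.
summary = [(statistics.mean(p), statistics.stdev(p)) for p in preds]
```

The spread in `summary` plays the role of the uncertainty estimate: a stock whose prediction changes a lot from one split to the next gets a larger standard deviation.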

4. Feature Engineering

Features are extracted from quarterly financial statements. The model uses a combination of raw fundamentals and financial ratios.

Core Features

Feature          Category       Transform  Fill Strategy
Total Revenue    Fundamentals   log1p      Required
Gross Profit     Fundamentals   log1p      Zero
EBITDA           Fundamentals   log1p      Median
Net Income       Fundamentals   -          Zero
Total Debt       Balance Sheet  log1p      Zero
Total Cash       Balance Sheet  log1p      Zero
Free Cash Flow   Cash Flow      -          Zero
Profit Margin    Ratio          -          Median
Debt-to-Equity   Ratio          log        Median
ROE / ROA        Ratio          -          Median

Current Data Coverage

Feature availability across ~32,000 quarterly snapshots:

Revenue: 91%
Net Income: 72%
Total Debt: 72%
Total Cash: 72%
EBITDA: 64%
Free Cash Flow: 69%
ROA/ROE: 65-71%
Gross Profit: 43%

Transforms Explained

  • log1p: Applies log(1 + x) to handle large scale differences and zeros
  • log: Standard log transform for ratio features (excludes zeros)
  • Median fill: Replaces missing values with sector/industry median
  • Zero fill: Assumes missing financial data indicates zero (conservative)
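The fill strategies and the log1p transform above might look like this in code. It is a minimal sketch: the peer values are made up, the helper names are hypothetical, and the inputs are assumed to be non-negative, since the page does not say how negative values are handled before log1p.

```python
import math
import statistics

def fill_median(values):
    """Replace None with the median of the observed values in the same peer group."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def fill_zero(values):
    """Replace None with zero."""
    return [0.0 if v is None else v for v in values]

# Hypothetical values for a few peers in one sector; None marks missing data.
gross_profit = [1.2e9, None, 3.0e8]
ebitda = [4.0e8, None, 1.0e8, 2.5e8]

# Gross Profit: zero fill, then log1p (log1p(0) == 0, so zeros stay defined).
gross_profit_feat = [math.log1p(v) for v in fill_zero(gross_profit)]
# EBITDA: median fill, then log1p (assumes non-negative inputs in this sketch).
ebitda_feat = [math.log1p(v) for v in fill_median(ebitda)]
```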

5. Mispricing Calculation

Raw Mispricing

mispricing = (actual_mcap - predicted_mcap) / actual_mcap

Positive Mispricing

Current market cap exceeds model's predicted fair value. Suggests potential overvaluation — investors are paying beyond fundamentals.

Negative Mispricing

Current market cap is below model's predicted fair value. Suggests potential undervaluation based on fundamentals.
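As a worked example of the raw mispricing formula, the sketch below converts a log-space prediction back to a market cap and applies the definition. The function name and the numbers are illustrative only.

```python
import math

def raw_mispricing(actual_mcap, predicted_log_mcap):
    """Signed gap between actual and model-implied market cap, as a fraction of actual."""
    predicted_mcap = math.exp(predicted_log_mcap)  # model target is log(market_cap)
    return (actual_mcap - predicted_mcap) / actual_mcap

# Market cap of 10bn with a predicted fair value of 8bn: about +0.20 (overvalued).
over = raw_mispricing(10e9, math.log(8e9))
# Market cap of 10bn with a predicted fair value of 12bn: about -0.20 (undervalued).
under = raw_mispricing(10e9, math.log(12e9))
```

Because the denominator is the actual market cap, the measure is not symmetric: it is bounded above by 1 but unbounded below when the predicted value is far above the market cap.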

Size Premium Correction

Raw mispricing exhibits a systematic size effect: smaller companies tend to show positive mispricing while larger companies show negative mispricing. This reflects the historical "size premium" where smaller companies trade at higher multiples.

size_neutral_mispricing = raw_mispricing - size_premium(market_cap)

The size premium is estimated by fitting a smooth curve (spline or polynomial) to the mispricing vs. market cap relationship. This correction isolates stock-specific mispricing from the systematic size effect.
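The correction can be sketched with a deliberately simplified curve. The text describes a spline or polynomial; the example below swaps in a straight line fitted against log(market cap) purely for brevity, and both the helper name and the data points are made up.

```python
import math

def fit_size_premium(market_caps, raw_mispricings):
    """Least-squares line of raw mispricing against log(market cap).

    A straight line stands in for the spline or polynomial described in the text.
    """
    xs = [math.log(c) for c in market_caps]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(raw_mispricings) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, raw_mispricings))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda cap: intercept + slope * math.log(cap)

# Toy data: small caps show positive raw mispricing, large caps negative.
caps = [1e8, 1e9, 1e10, 1e11]
raw = [0.30, 0.10, -0.05, -0.25]

size_premium = fit_size_premium(caps, raw)
size_neutral = [r - size_premium(c) for r, c in zip(raw, caps)]
```

After the subtraction, what remains in `size_neutral` is each stock's deviation from the typical mispricing at its size.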

When to Use Each Mode

  • Raw: Compare stocks within similar market cap ranges
  • Size-Neutral: Compare stocks across different market caps (recommended)

Uncertainty Measure

relative_std = prediction_std / actual_mcap

Higher relative standard deviation indicates less confident predictions. Stocks with unusual financial profiles or sparse comparable data will have higher uncertainty.
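A small sketch of the uncertainty measure follows. It assumes the standard deviation is taken over the cross-validation predictions after converting them from log space back to currency units; the page does not state which space is used, so treat that choice as an assumption.

```python
import math
import statistics

def relative_std(predicted_log_mcaps, actual_mcap):
    """Std of the predictions (converted back from log space) divided by actual market cap."""
    predicted = [math.exp(p) for p in predicted_log_mcaps]
    return statistics.stdev(predicted) / actual_mcap

# Predictions that agree closely: roughly 0.02.
tight = relative_std([math.log(v) for v in (9.8e9, 10.0e9, 10.2e9)], 10e9)
# Predictions that disagree strongly: roughly 0.40.
wide = relative_std([math.log(v) for v in (6e9, 10e9, 14e9)], 10e9)
```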

6. Signal Quality & Backtesting

Backtest results measure whether historical mispricing signals predicted future price movements.

Information Coefficient (IC)

IC = correlation(mispricing_signal, future_return)

Interpreting IC

  • IC > 0: Signal worked — overvalued stocks underperformed, undervalued outperformed
  • IC ~ 0: No predictive signal
  • IC < 0: Signal inverted — overvalued stocks outperformed (momentum dominated)

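Because positive mispricing denotes overvaluation, a working signal produces a negative raw correlation with future returns. On the dashboard, the sign is flipped so that positive IC = good signal (mispricing predicted subsequent returns correctly).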
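The sign convention can be made concrete with a short sketch. Two assumptions are made here: the correlation is a plain Pearson correlation (the page does not say whether Pearson or rank correlation is used), and the displayed IC is the negated correlation, so that overvalued stocks underperforming yields a positive number.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def displayed_ic(mispricing, future_return):
    """Sign-flipped correlation so that a positive value means the signal worked."""
    return -pearson(mispricing, future_return)

# Toy data: the most overvalued stock falls the most, the most undervalued rises the most.
mispricing = [0.30, 0.10, -0.05, -0.25]
future_ret = [-0.08, -0.01, 0.02, 0.06]
ic = displayed_ic(mispricing, future_ret)
```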

Hit Rate

hit_rate = % of stocks where mispricing direction matched return direction

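A hit is counted when an overvalued stock (positive mispricing) subsequently falls or an undervalued stock subsequently rises. A hit rate above 50% indicates the signal has some directional predictive power. However, the magnitude of returns matters more than hit rate for portfolio construction.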
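A minimal sketch of the hit rate, assuming a "match" means the return moved against the sign of the mispricing (overvalued stocks fell, undervalued stocks rose) and that exact zeros are skipped. Both choices are assumptions, not details confirmed by the page.

```python
def hit_rate(mispricing, future_return):
    """Share of stocks where overvalued names fell or undervalued names rose."""
    pairs = [(m, r) for m, r in zip(mispricing, future_return) if m != 0 and r != 0]
    hits = sum(1 for m, r in pairs if (m > 0) != (r > 0))
    return hits / len(pairs)

# Three of the four toy stocks move in the direction the signal implies.
rate = hit_rate([0.30, 0.10, -0.05, -0.25], [-0.08, 0.03, 0.02, 0.06])
```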

Statistical Significance

P-values are corrected using the Benjamini-Hochberg procedure to control the false discovery rate when testing multiple hypotheses (horizons × sectors/indices).

Significance Stars

  • ★ p < 0.05 (significant)
  • ★★ p < 5e-4 (highly significant)
  • ★★★ p < 5e-8 (extremely significant)
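The Benjamini-Hochberg adjustment and the star thresholds can be sketched as below. The helper names are hypothetical, one star is assumed to correspond to p < 0.05, and applying the stars to the adjusted (rather than raw) p-values is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def stars(p):
    """Map a p-value to the significance stars listed above."""
    if p < 5e-8:
        return "★★★"
    if p < 5e-4:
        return "★★"
    if p < 0.05:
        return "★"
    return ""

# Four toy tests, e.g. one per horizon.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
labels = [stars(p) for p in adj_p]
```

In this toy case three raw p-values are below 0.05, but only the smallest remains below 0.05 after adjustment, which shows how the correction trims marginal results when many horizon and sector combinations are tested.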

Horizon Analysis

Backtests are run across multiple forward-looking horizons (e.g., 5, 10, 21, 63, 126 trading days) to understand signal persistence and decay. Shorter horizons capture momentum effects while longer horizons reflect fundamental mean reversion.
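Forward returns at several horizons might be computed along these lines. This is a sketch assuming simple (not log) returns over a list of daily closing prices; the function name and the toy price series are illustrative.

```python
HORIZONS = (5, 10, 21, 63, 126)  # trading days, as in the examples above

def forward_returns(prices, start, horizons=HORIZONS):
    """Simple return from `start` to `start + h` for each horizon that fits in the series."""
    return {
        h: prices[start + h] / prices[start] - 1.0
        for h in horizons
        if start + h < len(prices)
    }

# Toy series of 70 daily closes rising by 0.5 per day from 100.
prices = [100.0 + 0.5 * d for d in range(70)]
fwd = forward_returns(prices, start=0)  # the 126-day horizon is skipped: not enough data
```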

7. Limitations & Caveats

Not Financial Advice

This tool is for research and educational purposes only. The mispricing signals should not be used as the sole basis for investment decisions. Always consult with a qualified financial advisor and conduct your own due diligence.

Model Limitations

  • Backward-looking fundamentals: Financial statements are historical. The model cannot capture future growth expectations, pending acquisitions, or unreleased products.
  • No intangibles: Brand value, intellectual property, network effects, and other intangible assets are not directly measured in financial statements.
  • Cross-sectional only: The model compares companies at a single point in time. It does not model time-series dynamics or macroeconomic factors.
  • Sector mixing: The model trains on all sectors together. Industry-specific valuation multiples may not be fully captured.
  • Survivorship bias: The dataset includes only currently traded stocks. Delisted companies are not included in backtests.

Data Limitations

  • Variable coverage: Revenue has ~91% coverage, but some features like gross profit have lower availability (~43%). Missing values are filled with sector medians or zeros.
  • Point-in-time accuracy: Quarterly snapshots may not perfectly align with earnings release dates.
  • Market cap timing: Historical market caps are reconstructed from price × shares outstanding.

Backtest Caveats

  • Look-ahead bias: The fixed model hyperparameters were chosen using the full dataset, so true out-of-sample performance may differ.
  • Transaction costs: Backtests do not include trading costs, slippage, or market impact.
  • Past performance: Historical signal quality does not guarantee future results.