Life Without Bloomberg: A CFTC COT REST API Built From Scratch

Backend Architecture

The CFTC publishes futures positioning data every Friday, with more than 30 years of history, buried in legacy ZIP archives that hold a 300-column spreadsheet and come with no API. We built one: a REST API that ingests, normalizes, and exposes COT data across 398 futures contracts with Z-scores, pressure metrics, COT Index, and divergence signals computed on the fly. Full architecture walkthrough: Django ORM, incremental sync pipeline, API key authentication, and a live Z-score snapshot across all mainstream contracts.

Antoine Perrin

CEO - CodeMarketLabs

2026-04-21

Every Friday at 3:30 PM Eastern, the U.S. Commodity Futures Trading Commission publishes the Commitments of Traders report. It contains the full breakdown of open interest across roughly 400 futures markets: hedge funds, commercial hedgers, swap dealers, non-reportable traders. Who is long, who is short, how many contracts, how concentrated. It is one of the most actionable public datasets in finance. And it is delivered as a ZIP file containing a legacy XLS spreadsheet with 300+ columns and no documentation. This article documents the build: how we parsed, structured, and exposed more than 30 years of COT data as a clean REST API, with Z-scores, pressure metrics, COT Index, and divergence signals computed on the fly.

What this article covers

  • The data architecture: Django ORM models for OpenInterest, Traders, Concentration across 398 contracts from 1986 to present.
  • The ingestion pipeline: automated sync from CFTC historical ZIPs, incremental updates, idempotent bulk_create.
  • The analytics layer: Z-score, pressure (|z| / max|z|), COT Index (position within the min-max range), divergence (Commercial vs Non-Commercial), and active signal detection.
  • The API design: 25+ REST endpoints, API key authentication with per-day rate limiting, mainstream contract filtering.
  • Live Z-score snapshot: where every mainstream futures contract sits right now relative to its full history.

1. The Problem With COT Data

The CFTC publishes COT data in two formats: a viewable HTML table on their website, and a compressed annual ZIP containing a legacy XLS file. The XLS has 300+ columns with names like NonComm_Postions_Spread_All — note the typo in 'Positions', baked into the official dataset since 1986 and never corrected. There is no API. There is no JSON. There is no way to query 'give me the net Non-Commercial positioning on Gold for the last 5 years' without downloading multiple annual files, concatenating them, filtering by contract code, and computing the net yourself. This is the state of public financial data in 2026.

The bigger problem: the raw numbers are useless without context. Knowing that Non-Commercial traders hold 180,000 long contracts on Gold means nothing without knowing the historical range. Is that elevated? Is it near the all-time high? Has it been increasing for 4 weeks? The COT report requires normalization, and normalization requires history. The only way to do this properly is to ingest the full 30-year dataset and compute relative metrics at query time — which is exactly what this API does.

2. Data Architecture

The database has four core models: Contract (398 markets, each with a CFTC contract code, commodity type, exchange, and region), OpenInterest (weekly positions for Non-Commercial, Commercial, and Non-Reportable traders), Traders (weekly count of distinct reporting traders per category), and Concentration (top 4 and top 8 trader concentration ratios, gross and net). Each OpenInterest record stores raw positions; derived metrics like net_non_comm, open_interest, and all percentage breakdowns are computed as model properties at read time, not stored.

python
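from django.db import models


class OpenInterest(models.Model):
    # Contract is the reference model for the 398 markets, defined in the same app
    contract = models.ForeignKey(Contract, on_delete=models.CASCADE)
    reported_date = models.DateField()
    non_comm_long_all = models.IntegerField()
    non_comm_short_all = models.IntegerField()
    non_comm_spread = models.IntegerField()
    comm_long_all = models.IntegerField()
    comm_short_all = models.IntegerField()
    non_reportable_long_all = models.IntegerField()
    non_reportable_short_all = models.IntegerField()

    @property
    def net_non_comm(self):
        return self.non_comm_long_all - self.non_comm_short_all

    # Assumed definition of the reportable totals: spreading positions
    # count once on the long side and once on the short side.
    @property
    def total_reported_long(self):
        return self.non_comm_long_all + self.non_comm_spread + self.comm_long_all

    @property
    def total_reported_short(self):
        return self.non_comm_short_all + self.non_comm_spread + self.comm_short_all

    @property
    def open_interest(self):
        return max(
            self.non_reportable_long_all + self.total_reported_long,
            self.non_reportable_short_all + self.total_reported_short,
        )

    class Meta:
        # One row per contract per weekly report; also what makes reruns idempotent
        unique_together = ('contract', 'reported_date')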
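
The "computed at read time" idea can be tried without a database. The following is a minimal sketch, not the project's code: WeeklyPositions is a hypothetical plain-Python stand-in for one OpenInterest row, the position numbers are made up, and the open interest formula assumes that spreading positions count on both the long and the short side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyPositions:
    """Hypothetical stand-in for one OpenInterest row (no Django required)."""
    non_comm_long_all: int
    non_comm_short_all: int
    non_comm_spread: int
    comm_long_all: int
    comm_short_all: int
    non_reportable_long_all: int
    non_reportable_short_all: int

    @property
    def net_non_comm(self) -> int:
        return self.non_comm_long_all - self.non_comm_short_all

    @property
    def net_comm(self) -> int:
        return self.comm_long_all - self.comm_short_all

    @property
    def open_interest(self) -> int:
        # Assumption: spreading counts once on each side of the market.
        long_side = (self.non_comm_long_all + self.non_comm_spread
                     + self.comm_long_all + self.non_reportable_long_all)
        short_side = (self.non_comm_short_all + self.non_comm_spread
                      + self.comm_short_all + self.non_reportable_short_all)
        return max(long_side, short_side)

    def pct_of_open_interest(self, contracts: int) -> float:
        """Express a raw position as a percentage of total open interest."""
        oi = self.open_interest
        return round(100.0 * contracts / oi, 2) if oi else 0.0


# Illustrative numbers only, not real CFTC data.
row = WeeklyPositions(
    non_comm_long_all=180_000, non_comm_short_all=60_000, non_comm_spread=20_000,
    comm_long_all=90_000, comm_short_all=230_000,
    non_reportable_long_all=30_000, non_reportable_short_all=10_000,
)
print(row.net_non_comm, row.open_interest, row.pct_of_open_interest(row.non_comm_long_all))
```

Because nothing derived is stored, a change to a formula needs no migration or backfill; the cost is recomputing on every read.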

3. The Ingestion Pipeline

The CFTC publishes annual ZIP files for the Legacy Futures Only format going back to 1986. The ingestion pipeline is a Django management command (sync_cftc) that downloads the relevant years, parses the XLS, and bulk-inserts only the records not already in the database. The key design choices: name the parsed date column report_date rather than _date (itertuples renames columns that start with an underscore to positional names), filter on report_date > latest_db_date for incremental updates, and use bulk_create with ignore_conflicts=True so that reruns are idempotent.
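
The underscore pitfall can be reproduced with the standard library alone. pandas documents that itertuples renames columns to positional names when they are invalid identifiers, repeated, or start with an underscore; the sketch below assumes this matches what collections.namedtuple does with rename=True, and the column list is a simplified stand-in for the real frame.

```python
from collections import namedtuple

# Simplified stand-in for the columns of a parsed DataFrame, index first.
columns = ["Index", "_date", "report_date"]

# rename=True replaces invalid field names with positional ones (_0, _1, ...).
Row = namedtuple("Pandas", columns, rename=True)
print(Row._fields)  # ('Index', '_1', 'report_date')

row = Row(0, "2026-04-14", "2026-04-14")
print(hasattr(row, "_date"))  # False: the underscore column lost its name
print(row.report_date)        # 2026-04-14
```

A column called _date is therefore only reachable as row._1, which breaks silently if the column order changes. A name without a leading underscore avoids the problem.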

python
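import pandas as pd

# Excerpt from the sync_cftc management command. get_latest_db_date,
# download_and_extract, ingest_df and CURRENT_YEAR live elsewhere in the module.
def handle(self, *args, **options):
    latest_date = get_latest_db_date()  # most recent reported_date already stored
    # Load every Contract once and index it by its CFTC code
    contract_map = {c.contract_code.strip(): c for c in Contract.objects.all()}

    # Year of the last stored report plus the current year; the set removes
    # the duplicate when both are the same year
    years_to_fetch = sorted({latest_date.year, CURRENT_YEAR})

    for year in years_to_fetch:
        df = download_and_extract(year)  # GET dea_fut_xls_{year}.zip
        df['report_date'] = pd.to_datetime(df['Report_Date_as_MM_DD_YYYY']).dt.date
        df = df[df['report_date'] > latest_date]  # keep only weeks not yet stored
        ingest_df(df, contract_map)

# python manage.py sync_cftc              # incremental
# python manage.py sync_cftc --year 2012  # specific year
# python manage.py sync_cftc --full       # full history since 1986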
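
The two safeguards described above, the date filter and the conflict-ignoring insert, can be illustrated outside Django. This is a sketch under stated assumptions: it uses sqlite3 with INSERT OR IGNORE as a stand-in for bulk_create(ignore_conflicts=True), the table layout is reduced to three columns, and the position numbers are invented.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE open_interest (
        contract_code     TEXT NOT NULL,
        reported_date     TEXT NOT NULL,
        non_comm_long_all INTEGER NOT NULL,
        UNIQUE (contract_code, reported_date)
    )
    """
)


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM open_interest").fetchone()[0]


def latest_date():
    """Most recent stored report date, or None on an empty table."""
    (value,) = conn.execute("SELECT MAX(reported_date) FROM open_interest").fetchone()
    return date.fromisoformat(value) if value else None


def ingest(rows):
    """Insert rows, silently skipping (contract, date) pairs already stored."""
    before = count_rows()
    conn.executemany(
        "INSERT OR IGNORE INTO open_interest VALUES (?, ?, ?)",
        [(code, day.isoformat(), longs) for code, day, longs in rows],
    )
    conn.commit()
    return count_rows() - before


def sync(rows):
    """Incremental sync: drop rows at or before the latest stored date, then ingest."""
    cutoff = latest_date()
    fresh = [r for r in rows if cutoff is None or r[1] > cutoff]
    return ingest(fresh)


# Illustrative rows for Gold (088691); the numbers are made up.
batch = [("088691", date(2026, 4, 7), 180_000), ("088691", date(2026, 4, 14), 185_000)]
first_run = sync(batch)     # both rows are new
second_run = sync(batch)    # the date filter removes everything
forced_rerun = ingest(batch)  # the unique constraint removes everything
print(first_run, second_run, forced_rerun)
```

The date filter keeps routine runs cheap, while the unique constraint is the safety net when a year is reloaded explicitly.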

4. The Analytics Layer

Four metrics drive the API's analytical value. The Z-score normalizes the net Non-Commercial position (long minus short) relative to its mean and standard deviation over a configurable window. A Z-score of +2 means positioning is 2 standard deviations above its mean for that window, which is historically elevated. The Pressure metric normalizes the Z-score further: |z_current| / max(|z|) over the period, giving a 0-to-1 score where 1 means the current week is the most extreme positioning recorded in the window, in either direction. The COT Index is the simplest: (current - min) / (max - min) × 100. Strictly speaking it is a position within the window's min-max range rather than a percentile rank. A reading of 100 is the most stretched long in the window and 0 the most stretched short; above 80 sits in the top fifth of that range, below 20 in the bottom fifth. The Divergence metric tracks the gap between net Non-Commercial and net Commercial positioning. When speculators are at a maximum long and hedgers at a maximum short, the gap is at its widest, and such extremes have tended to mean-revert.

python
import statistics

# qs: one contract's weekly rows as dicts (e.g. from .values()), oldest first, non-empty
vals = [row['non_comm_long_all'] - row['non_comm_short_all'] for row in qs]

mean = statistics.mean(vals)
std  = statistics.pstdev(vals) or 1.0  # flat series: avoid dividing by zero
zscores = [(v - mean) / std for v in vals]

# Pressure = how extreme is the current z relative to all historical z
abs_z    = [abs(z) for z in zscores]
max_z    = max(abs_z) or 1.0  # every z is 0 on a flat series
pressure = abs(zscores[-1]) / max_z  # 0.0 → 1.0

# COT Index = position of the latest value within the window's min-max range
min_v, max_v = min(vals), max(vals)
cot_index = round((vals[-1] - min_v) / (max_v - min_v) * 100, 2) if max_v != min_v else 50.0
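
Divergence is the one metric not covered by the snippet above. The article defines it as the gap between net Non-Commercial and net Commercial positioning; how the API scales or thresholds that gap is not stated, so the Z-score step below is an assumption, and the rows are invented for illustration.

```python
import statistics


def divergence_series(rows):
    """Net Non-Commercial minus net Commercial, one value per weekly row."""
    out = []
    for row in rows:
        net_non_comm = row["non_comm_long_all"] - row["non_comm_short_all"]
        net_comm = row["comm_long_all"] - row["comm_short_all"]
        out.append(net_non_comm - net_comm)
    return out


def zscore_last(values):
    """Z-score of the latest value against the whole series (assumed scaling)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0
    return (values[-1] - mean) / std


# Three made-up weeks: speculators add longs while hedgers add shorts.
rows = [
    {"non_comm_long_all": 100, "non_comm_short_all": 50, "comm_long_all": 40, "comm_short_all": 90},
    {"non_comm_long_all": 120, "non_comm_short_all": 40, "comm_long_all": 30, "comm_short_all": 110},
    {"non_comm_long_all": 150, "non_comm_short_all": 30, "comm_long_all": 20, "comm_short_all": 140},
]

divergence = divergence_series(rows)
divergence_z = zscore_last(divergence)
print(divergence, round(divergence_z, 2))
```

Normalizing the gap the same way as the net position keeps divergence comparable across contracts of very different sizes.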

5. API Design

The API exposes 25+ endpoints organized in four layers: reference data (contract list, types, search), raw historical data (open interest, traders, concentration series), per-contract analytics (Z-score series, pressure series, COT index, divergence, full summary), and cross-universe analytics (Z-score ranking across all mainstream contracts, extremes above a threshold, active signal detection). Authentication uses a custom API key header (X-API-KEY) with per-day rate limiting tracked in the database — no Redis required. The mainstream flag on each Contract filters the universe to the ~55 most liquid markets, eliminating noise from obscure basis swaps and regional electricity contracts.
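
Per-day rate limiting backed only by the database can be sketched in a few lines. This is not the project's implementation: the table names, the DAILY_LIMIT value, and the check_request helper are hypothetical, and sqlite3 stands in for the Django ORM.

```python
import sqlite3
from datetime import date

DAILY_LIMIT = 3  # hypothetical quota, kept tiny for the demonstration

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_keys (key TEXT PRIMARY KEY, active INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE api_usage ("
    "key TEXT NOT NULL, day TEXT NOT NULL, hits INTEGER NOT NULL, "
    "PRIMARY KEY (key, day))"
)
conn.execute("INSERT INTO api_keys VALUES ('demo-key', 1)")
conn.commit()


def check_request(api_key, today):
    """Return an HTTP-style status for one request carrying api_key."""
    row = conn.execute(
        "SELECT active FROM api_keys WHERE key = ?", (api_key,)
    ).fetchone()
    if row is None or not row[0]:
        return 401  # unknown or disabled key

    day = today.isoformat()
    # One counter row per key per calendar day, created on first use.
    conn.execute("INSERT OR IGNORE INTO api_usage VALUES (?, ?, 0)", (api_key, day))
    (hits,) = conn.execute(
        "SELECT hits FROM api_usage WHERE key = ? AND day = ?", (api_key, day)
    ).fetchone()
    if hits >= DAILY_LIMIT:
        conn.commit()
        return 429  # quota exhausted for today

    conn.execute(
        "UPDATE api_usage SET hits = hits + 1 WHERE key = ? AND day = ?",
        (api_key, day),
    )
    conn.commit()
    return 200


same_day = [check_request("demo-key", date(2026, 4, 21)) for _ in range(4)]
next_day = check_request("demo-key", date(2026, 4, 22))
bad_key = check_request("nope", date(2026, 4, 21))
print(same_day, next_day, bad_key)
```

Keying the counter on (key, day) means the quota resets by itself at the date change, with no scheduled cleanup. The trade-off against an in-memory store is one extra write per request.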

bash
# Full summary for Gold
curl -H "X-API-KEY: your_key" \
  https://api.cotreports.codemarketlabs.com/api/contracts/088691/summary/

# Z-score ranking across all mainstream contracts
curl -H "X-API-KEY: your_key" \
  "https://api.cotreports.codemarketlabs.com/api/contracts/zscore/ranking/?weeks=156"

# Active signals — all mainstream contracts, 3-year window
curl -H "X-API-KEY: your_key" \
  "https://api.cotreports.codemarketlabs.com/api/contracts/signals/?weeks=156"

# Extremes — FX contracts with |z-score| > 2
curl -H "X-API-KEY: your_key" \
  "https://api.cotreports.codemarketlabs.com/api/contracts/extremes/?type=FX&threshold=2.0"
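
The same calls can be prepared from Python with the standard library. The sketch below only builds the request, since the article does not document the response schema; build_request is a hypothetical helper, and the base URL, paths, and header name are taken from the curl examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.cotreports.codemarketlabs.com/api"


def build_request(path, api_key, **params):
    """Build (but do not send) a GET request carrying the X-API-KEY header."""
    query = f"?{urlencode(params)}" if params else ""
    url = f"{BASE_URL}/{path.strip('/')}/{query}"
    return Request(url, headers={"X-API-KEY": api_key})


extremes = build_request("contracts/extremes", "your_key", type="FX", threshold=2.0)
summary = build_request("contracts/088691/summary", "your_key")
print(extremes.full_url)
print(summary.full_url)
```

Passing the returned object to urllib.request.urlopen sends it. urllib stores the header name as X-api-key internally; HTTP header names are case-insensitive, so the server sees the same key.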

6. Current Z-Score Snapshot

The chart below shows where every mainstream futures contract currently sits in Z-score terms, computed since inception on each contract's full available history. The red dot marks the historical minimum (most extreme short positioning ever recorded), the green dot marks the historical maximum (most extreme long). The dark bar is the current Z-score. The light bar is the Z-score from the previous week. This gives full context: not just where positioning is, but where it has been, and whether it is moving toward or away from an extreme.

[Figure: COT Z-score snapshot across mainstream contracts]
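
The four markers plotted for each contract can be derived from one pass over its net position history. This is a minimal sketch: zscore_snapshot and its key names are hypothetical, the toward_extreme flag is one possible reading of "moving toward an extreme", and the input series is invented.

```python
import statistics


def zscore_snapshot(net_positions):
    """Current, previous-week, minimum and maximum Z-score for one contract."""
    if len(net_positions) < 2:
        raise ValueError("need at least two weekly observations")
    mean = statistics.mean(net_positions)
    std = statistics.pstdev(net_positions) or 1.0
    z = [(v - mean) / std for v in net_positions]
    return {
        "current": z[-1],    # dark bar
        "previous": z[-2],   # light bar
        "min": min(z),       # red dot
        "max": max(z),       # green dot
        # Assumed definition: further from zero than a week ago.
        "toward_extreme": abs(z[-1]) > abs(z[-2]),
    }


# Made-up net positions, oldest first.
snapshot = zscore_snapshot([10, 20, 30, 40, 50])
print({k: round(v, 3) if isinstance(v, float) else v for k, v in snapshot.items()})
```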

7. What's Next

The current API covers the Legacy Futures Only report — Non-Commercial, Commercial, Non-Reportable. The next build adds the Disaggregated format (Producer/Merchant, Swap Dealers, Managed Money, Other Reportables), which gives a sharper picture of who is actually driving the positioning moves. Managed Money — the COT category that most closely maps to systematic hedge funds — is a cleaner signal than the legacy Non-Commercial bucket, which mixes trend-followers with macro funds and option dealers. The COT Index and Z-score methodology transfers directly; the data model needs four new position categories per record.

What does the Z-score actually measure?

The net Non-Commercial position (long minus short contracts) normalized by its mean and standard deviation over a chosen window. A Z-score of +2 means speculative positioning is 2 standard deviations above its historical average. The default window is since inception (full available history per contract). Override with ?weeks=52 (1 year), ?weeks=156 (3 years), etc.
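
The effect of the window is easiest to see on a small series. The sketch below assumes that ?weeks=N simply keeps the last N weekly observations before computing the Z-score; window_zscore is a hypothetical helper and the numbers are invented.

```python
import statistics


def window_zscore(net_positions, weeks=None):
    """Z-score of the latest value over the last `weeks` rows (all rows if None)."""
    if weeks is not None and weeks < 2:
        raise ValueError("weeks must be at least 2")
    window = net_positions if weeks is None else net_positions[-weeks:]
    if len(window) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(window)
    std = statistics.pstdev(window) or 1.0
    return (window[-1] - mean) / std


# A contract that used to carry much larger net longs than it does today.
history = [100, 100, 100, 100, 10, 20, 30]

since_inception = window_zscore(history)   # below zero: low versus the full history
last_three = window_zscore(history, weeks=3)  # above zero: high versus recent weeks
print(round(since_inception, 2), round(last_three, 2))
```

The same current position reads as depressed over the full history and as elevated over the last three weeks, which is why the window belongs in the request rather than in the stored data.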

What is the difference between Z-score and COT Index?

Z-score is a statistical measure relative to mean and standard deviation, so it is unbounded and can go above 3 or below -3 in extreme markets. COT Index is a bounded range position: (current - min) / (max - min) × 100, always between 0 and 100. COT Index above 80 = net positioning in the top 20% of its historical min-max range, which is not the same as the top 20% of weeks. Both are useful; the COT Index is more intuitive for traders, the Z-score is more useful for systematic models.
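
A small worked comparison shows why the two measures are not interchangeable. Both series below are invented and both end at their all-time high, so the COT Index is 100 in each case, yet the Z-scores differ widely. The helper names are illustrative.

```python
import statistics


def zscore_last(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0
    return (values[-1] - mean) / std


def cot_index_last(values):
    lo, hi = min(values), max(values)
    return (values[-1] - lo) / (hi - lo) * 100 if hi != lo else 50.0


steady_climb = list(range(21))    # 0, 1, ..., 20: a gradual build-up
sudden_spike = [0] * 20 + [100]   # flat for 20 weeks, then one jump

for name, series in (("steady_climb", steady_climb), ("sudden_spike", sudden_spike)):
    print(name, cot_index_last(series), round(zscore_last(series), 2))
```

The COT Index says only that the latest value is the highest in the window. The Z-score also says how unusual that value is relative to the typical week-to-week variation.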

How often is the data updated?

The CFTC publishes weekly, every Friday at 3:30 PM Eastern, covering positions as of the prior Tuesday. The sync_cftc command runs incrementally: it downloads the annual files for the relevant years and inserts only records more recent than the last date in the database.

What is the mainstream filter?

The CFTC tracks 398 futures contracts, including obscure basis swap instruments, regional electricity hubs, and micro contracts with minimal open interest. The mainstream=true filter restricts the universe to ~55 liquid, widely-followed markets across FX, Rates, Equity Index, Energy, Metals, Agricultural, and Crypto. All cross-universe endpoints default to mainstream=true. Pass ?mainstream=false to query the full 398-contract universe.

Is the full 30-year history available?

Yes. The CFTC Legacy Futures Only format goes back to January 1986 for most contracts. The database stores the complete history. For newer contracts (Crypto, recent equity sector ETF futures) the history is shorter. The inception-window Z-score always uses the full available history per contract: a Gold Z-score uses data back to 1986, a Bitcoin Z-score uses data since 2018.