Why AI Needs Cited, Point-in-Time SEC Data

The Shift

Research Used to Be a Workflow. Now It's a Question.

What used to mean logging into a platform, clicking through menus, exporting data, and stitching it together in a doc now happens in one natural-language question — answered in seconds, with citations.

Before

Log in. Click. Export. Stitch.

Open three or four platforms and a spreadsheet
Click through menus, run exports, reconcile tickers
Paste it into a doc and hope nothing's stale
Re-do it all when someone asks "as of when?"

An afternoon, per name

With Kaleidoscope MCP

Ask once. Cited answer in seconds.

Ask Claude in plain English
It composes the right tools dynamically — many at once
You get a synthesized read, each fact cited to a filing
Add "as of Q3 2021" and it reconstructs that exact moment

One question, seconds

The Problem

Speed Is Table Stakes. Being Right Is the Hard Part.

Frontier LLMs are brilliant at language and reasoning. But three structural gaps, none of which model scaling fixes, mean Claude alone can't be trusted with serious research. Kaleidoscope MCP fills exactly those gaps.

Gap 01 · Frozen training cutoff

It doesn't know what happened after training

The latest 10-K, this morning's M&A announcement, last quarter's 13F — none of it exists for a model frozen at its training date.

With MCP: Claude queries filings, news (2020–2026, refreshed daily), and institutional holdings as they're published — and cites them.

Gap 02 · No point-in-time

It can't reconstruct "what was true on date X"

Ask Claude to rebuild a company's pipeline as of Q3 2018 and it returns today's pipeline (wrong) or a confident reconstruction (untrustworthy).

With MCP: Every tool accepts an as_of date and returns the state of the world as it was knowable then — leak-safe by construction.
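To make "as it was knowable then" concrete, here is a minimal sketch of a point-in-time filter. It is not Kaleidoscope's implementation: the Fact record, the as_of_view helper, and the ACME sample data are all hypothetical. The idea it shows is that a fact is visible only if the filing that disclosed it was public on or before the as_of date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """Hypothetical record: one disclosed value and the date it became public."""
    company: str
    field: str
    value: str
    filed_on: date


def as_of_view(facts, as_of):
    """Latest value per (company, field), using only filings public on or before as_of."""
    latest = {}
    for fact in sorted(facts, key=lambda f: f.filed_on):
        if fact.filed_on <= as_of:  # later filings are never read, so nothing leaks
            latest[(fact.company, fact.field)] = fact
    return latest


# Illustrative data only; ACME is not a real issuer.
FACTS = [
    Fact("ACME", "lead_asset_phase", "Phase 1", date(2017, 3, 1)),
    Fact("ACME", "lead_asset_phase", "Phase 2", date(2018, 11, 15)),
    Fact("ACME", "lead_asset_phase", "Discontinued", date(2020, 2, 20)),
]

view = as_of_view(FACTS, date(2018, 9, 30))
print(view[("ACME", "lead_asset_phase")].value)  # Phase 1
```

Filtering on the filing date, rather than on the period a filing describes, is what keeps the Phase 2 update out of a Q3 2018 view: it was filed in November 2018, so it was not yet knowable.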

Gap 03 · Survivor bias

Its base rates are the success rate of the winners

Training is dominated by the companies and drugs that got written about. The failures rarely make the corpus, so any base rate Claude computes is inflated.

With MCP: Base rates are computed over a survivorship-complete population — failures, delistings, and discontinued programs included in the denominator.
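A small worked example of why the denominator matters. The six programs below are invented for illustration, and base_rate is a hypothetical helper, not a Kaleidoscope tool. The same success count produces very different rates depending on whether programs at delisted companies are counted.

```python
# Invented sample: two approvals, four failures, three of them at delisted companies.
PROGRAMS = [
    {"name": "A", "outcome": "approved", "still_listed": True},
    {"name": "B", "outcome": "approved", "still_listed": True},
    {"name": "C", "outcome": "failed", "still_listed": True},
    {"name": "D", "outcome": "failed", "still_listed": False},
    {"name": "E", "outcome": "failed", "still_listed": False},
    {"name": "F", "outcome": "failed", "still_listed": False},
]


def base_rate(population):
    """Share of programs that were approved; None for an empty population."""
    if not population:
        return None
    successes = sum(1 for p in population if p["outcome"] == "approved")
    return successes / len(population)


# What a corpus skewed toward surviving companies would see.
survivors_only = [p for p in PROGRAMS if p["still_listed"]]

print(f"survivors only: {base_rate(survivors_only):.2f}")  # 0.67
print(f"full population: {base_rate(PROGRAMS):.2f}")       # 0.33
```

In this toy sample, dropping the delisted programs doubles the apparent success rate.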

In Practice

How Investors Are Actually Using It

Each of these is one natural-language question. Claude composes the tools and returns a synthesized read, not a raw data dump, with every fact traceable to its source.

You ask "Brief me on this name."

Claude fires get_company_profile and returns a synthesized read — echo composite, distress signal, recent 8-K activity, insider trades, credit move, and latest-quarter fundamentals — each line cited to the filing that established it. An hour of tab-switching, compressed to one answer.

get_company_profile get_fundamentals

You ask "Reconstruct their pipeline as it was knowable end-2020."

Not today's pipeline, and not a confident guess. Claude calls bio__pipeline in point-in-time mode and returns the assets, phases, and indications that existed then — each cited to the specific 10-K accession and snippet. Leak-safe by construction.

bio__pipeline get_company_profile
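For readers curious what such a call looks like on the wire: MCP clients invoke tools with a JSON-RPC tools/call request, which the AI client normally builds for you. The sketch below constructs one for bio__pipeline. The as_of argument comes from the description above; the ticker argument name, the ISO date format, and the ACME symbol are assumptions for illustration, not the server's documented schema.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "ticker" and the date format are assumed; check the tool's published input schema.
request = build_tool_call(
    1,
    "bio__pipeline",
    {"ticker": "ACME", "as_of": "2020-12-31"},
)

print(json.dumps(request, indent=2))
```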

You ask "What's the real base rate here?"

"What fraction of programs against this target have actually succeeded?" Claude computes it over a survivorship-complete population — 17,296 reconstructed program trajectories with the discontinued and delisted programs in the denominator. The number Claude can't get from its training corpus.

bio__target_base_rate bio__target_trajectory

You ask "Who's been quietly accumulating?"

Claude screens institutional flow for cluster entries, quiet accumulation, and consecutive-quarter adds across hundreds of notable funds, then reads smart-money consensus on any name. The signals track actual changes in shares held, not position values inflated by price moves.

thirteenf_screen thirteenf_smart_money_consensus
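The share-versus-value distinction is easy to see with arithmetic. In the sketch below, the holdings figures are invented and pct_change and position_value are hypothetical helpers. A fund that does nothing still shows a 50% jump in reported position value when the price rises, while its share count shows no buying at all.

```python
def pct_change(previous, current):
    """Fractional change from previous to current."""
    return (current - previous) / previous


def position_value(holding):
    """Reported position value: shares held times the quarter-end price."""
    return holding["shares"] * holding["price"]


# Invented 13F-style snapshots: the fund did not trade, but the price rose.
q1 = {"shares": 1_000_000, "price": 10}
q2 = {"shares": 1_000_000, "price": 15}

value_signal = pct_change(position_value(q1), position_value(q2))
share_signal = pct_change(q1["shares"], q2["shares"])

print(f"value change: {value_signal:+.0%}")  # +50%
print(f"share change: {share_signal:+.0%}")  # +0%
```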

You ask "Show me active hostile campaigns demanding board seats."

Claude queries event-driven 13D/13G data, filed within days of crossing 5% (10 calendar days historically; five business days for a 13D under the SEC's amended rules), with the stated purpose of transaction, the accumulation trail, and LLM-classified intent and tone. This is the signal quarterly 13F data misses entirely.

search_activist_positions thirteenf_search_managers

You ask "Watch my list — anything moving?"

Claude scans your whole watchlist at once against pre-aggregated signals over 152M filing sentences and 127M news sentences — distress echoes, rising classifiers, restructuring-language 8-Ks — and ranks what changed. A multi-hour batch job, returned sub-second.

screen_companies search_classifiers

The Difference

Why This Is Hard to Replicate

"Couldn't someone just point an LLM at SEC EDGAR?" The short answer is no. The long answer is five things that took years to build.

Extraction Pipelines Tuned Over Years

The 529-classifier taxonomy and 152M classified filing sentences aren't a corpus you can spin up. They're the output of an extraction system tuned over years against real outcomes.

The Survivorship-Complete Denominator

Including delistings, failed Phase 3s, and acquired-for-pennies trajectories means reconstructing companies that no longer exist from filings no aggregator surfaces. The base rate is the moat.

Joined-Panel Signals

Risk score = XBRL × 13F × extraction × out-of-sample harness. Manager track record = 13F × price × outcome resolution. These aren't retrievable; they're computed from the joined panel, then served as one pre-validated number.

The Provenance Contract

Every fact carries its source filing, snippet, and as-of date. Most APIs return data; we return data plus verifiability, which is what makes the answers of an LLM agent built on it trustworthy.
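One way to picture the contract is as a record type in which provenance is not optional. This is a sketch under assumptions: the CitedFact fields mirror the three items named above, but the class, the helper functions, and the sample values are hypothetical, not Kaleidoscope's response format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CitedFact:
    """Hypothetical shape for a fact that carries its own provenance."""
    claim: str
    source_filing: str  # e.g. form type plus accession number
    snippet: str        # the passage in the filing that supports the claim
    as_of: date


def uncited(facts):
    """Facts missing a source filing or supporting snippet."""
    return [f for f in facts if not (f.source_filing and f.snippet)]


def format_citation(fact):
    """Render a claim with its source and as-of date."""
    return f"{fact.claim} [{fact.source_filing}, as of {fact.as_of.isoformat()}]"


# Invented sample answer; the second fact fails the contract.
answer = [
    CitedFact("Lead asset is in Phase 2", "10-K (placeholder accession)",
              "our lead candidate is in a Phase 2 trial", date(2020, 12, 31)),
    CitedFact("Cash runway exceeds 24 months", "", "", date(2020, 12, 31)),
]

print(format_citation(answer[0]))
print(len(uncited(answer)))  # 1
```

A client that rejects or flags anything returned by uncited is one simple way an agent could enforce that every claim traces back to a filing.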

Continuous Freshness

New filings flow through the pipeline as the SEC publishes them; news refreshes daily; biotech extractions run continuously. A competitor would need to match not just the data but the cadence.

The Bottom Line

A Faster Wrong Answer Is Still Wrong

Every research platform is racing to bolt an AI chat onto its data. The hard part was never the chat — it's whether the answer is true as of the date you asked, whether the base rate counts the failures, and whether every claim traces to a filing you can open. That's the layer Kaleidoscope MCP adds, and it's the layer that took years to build.

FAQ

Common Questions

What Kaleidoscope MCP is, which AI clients it works with, and how to connect it.

What is the Model Context Protocol (MCP)?

MCP is an open standard that lets AI assistants like Claude, ChatGPT, Gemini, and Cursor call external tools and data sources directly. Kaleidoscope MCP is a family of MCP servers that connect those assistants to cited SEC, SEDAR, and market research data.

What is Kaleidoscope MCP?

Kaleidoscope MCP is a family of read-only research servers — one per domain — that give any MCP-compatible AI client direct access to SEC filings, institutional 13F flow, activist 13D/13G campaigns, fundamentals, news, Canadian SEDAR filings, and more — every answer cited to a source filing and accurate as of any historical date.

Which AI clients does Kaleidoscope MCP work with?

Any client that supports the Model Context Protocol over HTTP — including Claude (Desktop, Code, and claude.ai), ChatGPT, Google Gemini, Cursor, and custom agents.

How do I connect Kaleidoscope MCP to Claude or ChatGPT?

Request a demo and we'll allowlist your email for the servers your engagement covers. Add a server's endpoint URL to your client's MCP settings; on first use your client opens a one-time email sign-in — no API keys or bearer tokens to manage. Every tool is read-only and side-effect-free, so clients can auto-approve them.

How do I get access to Kaleidoscope MCP?

Access is by invitation, per server. Request a demo and we'll walk through the data on your own questions, then allowlist your team's emails for the servers you need.

What data can I access through Kaleidoscope MCP?

Ten live servers across the market: SEC filings, 13F flow and activist ownership, XBRL fundamentals, news, Canadian SEDAR and NI 43-101 technical reports, private-company funding, material agreements, M&A deals, biotech pipelines, catalysts, power, and corporate debt. Every fact returns its source filing and as-of date.

AI Can Finally Do the Research. The Question Is Whether You Can Trust It.