How the denoising works
For a facts product, trust comes from verifiable provenance, not from accuracy claims. Here is exactly how a raw headline becomes a typed, entity-resolved fact - and what we throw away.
1. Ingestion & classification
The raw stream (financial news + tracked-actor posts) is classified rule-by-rule into typed events - whale transfer, regulation, M&A deal, macro policy, earnings, partnership, upgrade, hack/exploit, listing, sanction, ETF flow. No LLM guessing in the hot path: the types are rule-derived and auditable.
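A minimal sketch of rule-based classification, assuming each event type is matched by an ordered list of regular expressions where the first match wins. The rule IDs, patterns and type names below are hypothetical stand-ins, not the production rule set. The point is that every label can be traced back to the rule that produced it.

```python
import re
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    event_type: str
    rule_id: str  # which rule fired, kept so every label can be audited


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("R1", "hack_exploit", re.compile(r"\b(hacked|exploit|exploited|drained)\b", re.I)),
    ("R2", "m_and_a", re.compile(r"\b(acquires|acquisition|merger)\b", re.I)),
    ("R3", "etf_flow", re.compile(r"\bETF\b.*\b(inflows?|outflows?)\b", re.I)),
    ("R4", "listing", re.compile(r"\b(lists|listing|delists)\b", re.I)),
    ("R5", "sanction", re.compile(r"\bsanction(s|ed)?\b", re.I)),
]


def classify(headline: str) -> Optional[Classification]:
    """Return the first matching event type, or None when no rule fires."""
    for rule_id, event_type, pattern in RULES:
        if pattern.search(headline):
            return Classification(event_type, rule_id)
    return None


print(classify("Nvidia acquires startup for $700 million"))
# Classification(event_type='m_and_a', rule_id='R2')
print(classify("10 stocks to buy now"))  # None: no rule fires
```

Returning the rule ID alongside the type is what keeps the output auditable: a reviewer can see which pattern fired, and a headline that matches nothing comes back as None instead of a guess.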
2. Two-stage denoise
A fact survives only if it carries a type, a resolved subject and - where the event implies one - a magnitude. Generic "10 stocks to buy" headlines are dropped at the source. In a recent run this took 28,179 raw items down to 2,402 typed facts, removing about 91.5% of the input, with zero false positives found in review.
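The survival rule described above can be sketched as a single predicate. Which event types imply a magnitude is an assumption here (the MAGNITUDE_REQUIRED set is hypothetical). The removal arithmetic uses the figures from the run above: 1 - 2,402 / 28,179 ≈ 0.915.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: event types assumed to imply a dollar magnitude.
MAGNITUDE_REQUIRED = {"whale_transfer", "m_and_a", "etf_flow", "hack_exploit"}


@dataclass
class Candidate:
    event_type: Optional[str]       # None if no classification rule fired
    subject_id: Optional[str]       # None if the resolver declined to resolve
    magnitude_usd: Optional[float]  # None if no dollar figure was parsed


def survives(c: Candidate) -> bool:
    """Keep a candidate only if it has a type, a resolved subject and,
    where the type implies one, a magnitude."""
    if c.event_type is None or c.subject_id is None:
        return False
    if c.event_type in MAGNITUDE_REQUIRED and c.magnitude_usd is None:
        return False
    return True


def removal_rate(raw: int, kept: int) -> float:
    """Fraction of raw items dropped by the filter."""
    return 1 - kept / raw


print(f"{removal_rate(28_179, 2_402):.1%}")  # 91.5%
```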
3. Entity resolution
Every subject is resolved to one canonical ID, across equities and crypto: NVDA, Nvidia, @nvidia and $NVDA all map to the same ticker. We favor precision over recall - the resolver returns null rather than risk a wrong merge, so your agent never silently conflates two entities.
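A minimal resolver sketch, assuming a hypothetical alias table that maps each normalized mention to the set of canonical IDs it could refer to. Stripping leading $ and @ characters and case-folding lets NVDA, Nvidia, @nvidia and $NVDA all reach the same canonical ID. An unknown alias, or one with more than one candidate, returns None, which is the precision-over-recall choice described above.

```python
from typing import Optional

# Hypothetical alias table: normalized mention -> candidate canonical IDs.
ALIASES: dict[str, set[str]] = {
    "nvda": {"NVDA"},
    "nvidia": {"NVDA"},
    "eth": {"ETH"},
    "ethereum": {"ETH"},
    # Hypothetical ambiguous alias shared by an equity and a token.
    "apex": {"APEX_EQUITY", "APEX_TOKEN"},
}


def normalize(mention: str) -> str:
    """Trim whitespace, drop leading $ and @ characters, then case-fold."""
    return mention.strip().lstrip("$@").casefold()


def resolve(mention: str) -> Optional[str]:
    """Return the single canonical ID for a mention, or None.

    Precision over recall: an unknown or ambiguous mention yields None
    rather than a guess that could merge two distinct entities.
    """
    candidates = ALIASES.get(normalize(mention), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


for m in ["NVDA", "Nvidia", "@nvidia", "$NVDA", "apex"]:
    print(m, "->", resolve(m))
```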
4. Magnitude & direction
Where the text carries a dollar figure, it is parsed into a USD magnitude and a direction hint is attached. The hint is explicitly a hint, never a graded call - we do not dress a headline up as a signal.
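A sketch of dollar-figure parsing, assuming amounts appear as a $ sign followed by a number and an optional scale word (million, bn, M and so on), and that only the first figure in the text is used. The keyword lists behind the direction hint are hypothetical, and the function returns a plain label rather than a score, in keeping with the hint-not-signal rule.

```python
import re
from typing import Optional

# Scale words mapped to multipliers; a bare number means whole dollars.
SCALES = {
    "billion": 1e9, "bn": 1e9, "b": 1e9,
    "million": 1e6, "mn": 1e6, "m": 1e6,
    "thousand": 1e3, "k": 1e3,
}

# Assumed format: "$", a number (optional thousands commas and decimals),
# then an optional scale word.
USD = re.compile(
    r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)\s?(billion|bn|b|million|mn|m|thousand|k)?\b",
    re.I,
)

# Hypothetical keyword lists for the direction hint.
NEGATIVE = re.compile(r"\b(outflows?|exploit|hacked|drained|sanctioned)\b", re.I)
POSITIVE = re.compile(r"\b(inflows?|acquires|upgraded)\b", re.I)


def parse_usd(text: str) -> Optional[float]:
    """Return the first dollar figure in the text as USD, or None."""
    match = USD.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    scale = match.group(2)
    return number * (SCALES[scale.lower()] if scale else 1.0)


def direction_hint(text: str) -> Optional[str]:
    """A coarse label, not a graded call; None when no keyword matches."""
    if NEGATIVE.search(text):
        return "negative"
    if POSITIVE.search(text):
        return "positive"
    return None
```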
5. Canonical events & the graph
Facts that mark a structural inflection are dated and graded with a p_canonical score; together they form the early-warning layer. Only interpreted, typed edges (not co-mention noise) feed the knowledge graph, so what you traverse is meaning, not coincidence.
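A sketch of a graph that accepts typed edges only, assuming each fact carries a relation label (None for a bare co-mention) and a p_canonical score from the grading step, whose computation is not shown. The Fact and TypedGraph names and the 0.8 cut-off in early_warnings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject_id: str
    relation: Optional[str]   # typed relation such as "acquired"; None for a co-mention
    object_id: str
    event_date: date
    p_canonical: float = 0.0  # score from the grading step (computation not shown)


class TypedGraph:
    """Adjacency list that stores typed, dated edges only."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str, date]]] = defaultdict(list)

    def add(self, fact: Fact) -> bool:
        """Add an edge for a typed fact; bare co-mentions are rejected."""
        if fact.relation is None:
            return False
        self.edges[fact.subject_id].append((fact.relation, fact.object_id, fact.event_date))
        return True

    def neighbors(self, node: str, relation: Optional[str] = None) -> list[str]:
        """Objects reachable from node, optionally filtered by relation type."""
        return [
            obj
            for rel, obj, _ in self.edges.get(node, [])
            if relation is None or rel == relation
        ]


def early_warnings(facts: list[Fact], threshold: float = 0.8) -> list[Fact]:
    """Facts at or above a hypothetical p_canonical cut-off, oldest first."""
    return sorted(
        (f for f in facts if f.p_canonical >= threshold),
        key=lambda f: f.event_date,
    )
```

Rejecting co-mentions at insertion time means a traversal never has to tell a real relationship apart from two names that happened to share a headline.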
What we deliberately do not do
We do not claim a price edge, we do not give investment advice, and we do not expose the internal paper-trading books - those are research instrumentation, not the product. Every number on the track record is forward-only, with the control group and global false-discovery rate disclosed.