Composite What You Trust, Watch What You Don't: A Trust Boundary for Data With Money Attached

By Eric Caskey · June 17, 2026 · 10 min read

AI finance AWS Python software-development side-projects

A few times a quarter, one of my holdings lights up everywhere at once. StockTwits gets loud and lopsided. A prediction market reprices its next earnings overnight. The urge to act on it is immediate. So I pull up the position's health grade, the single number this whole system is built around, and it has not moved.

It was built not to. That stillness is the most deliberate thing in the whole system.

I wrote earlier about the first rule I gave this engine: the math decides, the AI only describes. The scoring is fixed and runs the same way every time, the AI writes the explanation, and it is never allowed to overrule a number. That post was deliberately narrow. This one is about a second rule, the one that kept the grade still while the crowd screamed: which data is allowed to touch the number, and which data only gets to sit beside it. Both rules come from the same place. There is real money attached, and the person I trust least with it is me, on a day when something is moving.

This is an old trick from security engineering, where it goes by the name taint tracking: data from the outside world is treated as untrusted and kept away from the decisions that matter until something explicitly vouches for it. A grade with money riding on it is precisely the kind of decision you guard that way. Most of what I can pull off the internet has not earned that trust, so the architecture's one job at this boundary is to keep it out of the number while still keeping it in view.
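As a minimal sketch of the idea, not the engine's actual code, taint tracking can be expressed with two wrapper types and a single gate. Every name here (`Untrusted`, `Trusted`, `vouch`, `grade_input`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Untrusted(Generic[T]):
    """A value from an outside source that nothing has vouched for yet."""
    value: T
    source: str


@dataclass(frozen=True)
class Trusted(Generic[T]):
    """A value that has been explicitly vouched for."""
    value: T
    source: str


def vouch(item: Untrusted[T], reason: str) -> Trusted[T]:
    """The only sanctioned path from Untrusted to Trusted; a reason is required."""
    if not reason.strip():
        raise ValueError("vouching requires a stated reason")
    return Trusted(item.value, f"{item.source} (vouched: {reason})")


def grade_input(item: object) -> float:
    """Gate in front of the number: refuses anything that is not Trusted."""
    if not isinstance(item, Trusted):
        raise TypeError("only Trusted values may reach the grade")
    return float(item.value)
```

The point of the shape is that an untrusted value stays visible and usable for display, but passing it to the gate fails loudly instead of silently blending in.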

Here is the system on one screen. The solid arrows are the trusted core, the only data allowed to reach a grade. The dashed arrows are the watched perimeter: signals I want in front of me, wired so they can never move the number.

One grade, several faces

The center of the system is a single health measurement for every position. It is composited only from sources I trust, and it is built so I can rotate it and read it from several angles: Quality, Valuation, Momentum, and Health. One number with several faces, not several numbers I have to reconcile by feel. It is the same engine whether I own the stock or am only researching an unheld name, with no special logic for the ones I happen to like.

Those four factors are not the ones I started with. The first version of this engine graded modern companies against a value rubric written in 1949 and handed one of the most dominant firms in the market a D+. I rewrote it around modern factor research, and that rewrite is its own story. What matters here is what feeds the result. Financial Modeling Prep (FMP) is the source of truth underneath it, supplying fundamentals, prices, and analyst grades, and it is the only feed that flows directly into the composite.

The rule from the first post still holds. A missing number never becomes a confident zero. If FMP cannot give me a real value for a factor, that factor is marked failed and its weight redistributes across the factors that still have real data, rather than scoring a hole as a zero and quietly dragging the grade down.
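A minimal sketch of that redistribution, under stated assumptions: scores are on a 0 to 100 scale, a missing value is represented as `None`, and the four factors carry equal weights. The function name `composite`, the weights, and the sample scores are all hypothetical and do not come from the production engine.

```python
from typing import Optional


def composite(scores: dict[str, Optional[float]], weights: dict[str, float]) -> float:
    """Weighted average over the factors that have real data.

    A factor whose score is None is treated as failed: it is dropped and the
    remaining weights are renormalized, so a gap is never scored as a zero.
    """
    live = {name: s for name, s in scores.items() if s is not None}
    if not live:
        raise ValueError("no factor has real data; refusing to produce a grade")
    total_weight = sum(weights[name] for name in live)
    return sum(weights[name] * s for name, s in live.items()) / total_weight


# Hypothetical equal weights and scores, for illustration only.
weights = {"Quality": 0.25, "Valuation": 0.25, "Momentum": 0.25, "Health": 0.25}
scores = {"Quality": 80.0, "Valuation": 60.0, "Momentum": None, "Health": 70.0}

print(composite(scores, weights))  # 70.0
```

Scoring the missing Momentum as a zero would have produced 52.5 from the same inputs. The gap would have cost 17.5 points without any real evidence behind it.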

There is one deliberate exception, because even this rule can be gamed by absence. If the missing factor is Valuation, blind redistribution would let a company with genuinely terrible valuation slip out of the signal entirely, just by having a gap where its numbers should be. So a missing valuation is not redistributed away. It floors at a conservative default instead. The redistribution rule protects the grade from bad data; the floor protects it from absent data pretending to be neutral.
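The exception can be sketched as a small step that runs before any redistribution. The floor value of 30.0 on a 0 to 100 scale is an assumption made up for this example, since the actual default is not stated here, and `apply_valuation_floor` is a hypothetical name.

```python
from typing import Optional

# Hypothetical conservative default on a 0-100 scale; the real value is not given.
VALUATION_FLOOR = 30.0


def apply_valuation_floor(scores: dict[str, Optional[float]]) -> dict[str, Optional[float]]:
    """Return a copy in which a missing Valuation is pinned to the floor.

    Other missing factors stay None, so they can still be handled by
    weight redistribution downstream. A real Valuation score is left alone.
    """
    patched = dict(scores)
    if patched.get("Valuation") is None:
        patched["Valuation"] = VALUATION_FLOOR
    return patched


raw = {"Quality": 80.0, "Valuation": None, "Momentum": None, "Health": 70.0}
print(apply_valuation_floor(raw))
```

After this step Valuation always carries weight in the composite, while Momentum in the example is still eligible to be dropped and redistributed.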

The watched perimeter

Everything else I pull is there to be watched, not believed. I collect prediction-market data from Kalshi and Polymarket, and crowd sentiment from StockTwits. Polymarket carries most of the names, since Kalshi lists few individual tickers; the engine reads Kalshi only where it has a clean per-ticker contract and falls back to Polymarket for breadth. None of it touches the grade.
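The venue preference can be sketched as a plain lookup. This assumes the odds have already been fetched into dictionaries keyed by ticker; no real Kalshi or Polymarket API call is shown, and `pick_market_odds` along with the sample tickers and probabilities is hypothetical.

```python
from typing import Optional


def pick_market_odds(
    ticker: str,
    kalshi: dict[str, float],
    polymarket: dict[str, float],
) -> Optional[tuple[str, float]]:
    """Prefer a per-ticker Kalshi contract; otherwise fall back to Polymarket.

    Returns (venue, implied_probability), or None when neither venue lists the name.
    """
    if ticker in kalshi:
        return ("kalshi", kalshi[ticker])
    if ticker in polymarket:
        return ("polymarket", polymarket[ticker])
    return None


# Made-up, already-fetched odds for illustration.
kalshi_odds = {"AAA": 0.62}
polymarket_odds = {"AAA": 0.60, "BBB": 0.41}

print(pick_market_odds("AAA", kalshi_odds, polymarket_odds))  # ('kalshi', 0.62)
print(pick_market_odds("BBB", kalshi_odds, polymarket_odds))  # ('polymarket', 0.41)
print(pick_market_odds("CCC", kalshi_odds, polymarket_odds))  # None
```

Returning the venue alongside the probability keeps the provenance attached to the number wherever it is displayed.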

That quarantine is a design decision, not an oversight. These are public signals I want to be aware of precisely because the composite is built to ignore them. The use case is narrow and specific: if prediction-market odds on a name start moving ahead of its price, I want to know before the fundamentals catch up. It is a tripwire on the edge of the system, wired to get my attention, not to move my grade. On the Today view it sits next to a position as context, never folded into it.

I know the quarantine holds because a test enforces it. One of the checks in the suite swings the crowd sentiment on a ticker from cold to red hot and then asserts that the production grade comes out byte for byte identical. The watched signals can scream; the number does not flinch. A boundary you only describe in a design doc is a hope. A boundary with a failing test behind it is a fact.
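A toy version of that kind of check looks like the following. The `Fundamentals` type and `production_grade` function are stand-ins invented for this sketch, not the real engine. The only thing carried over from the description above is the shape of the assertion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fundamentals:
    quality: float
    valuation: float
    momentum: float
    health: float


def production_grade(f: Fundamentals, crowd_sentiment: float) -> str:
    """Toy grade that reads only the trusted fundamentals.

    crowd_sentiment is accepted so a test can try to influence the result,
    but it is deliberately never used in the computation.
    """
    score = (f.quality + f.valuation + f.momentum + f.health) / 4
    return f"{score:.2f}"


def test_sentiment_cannot_move_grade() -> None:
    f = Fundamentals(quality=80, valuation=60, momentum=70, health=90)
    cold = production_grade(f, crowd_sentiment=-1.0)
    red_hot = production_grade(f, crowd_sentiment=1.0)
    # Compare the encoded bytes so the check is literally byte for byte.
    assert cold.encode() == red_hot.encode()


test_sentiment_cannot_move_grade()
```

If someone later wires sentiment into the score, even with a tiny weight, this test fails and the change has to be argued for rather than slipped in.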

There is a second reason I keep these signals at arm's length, and it is not only caution. In a separate experiment that never reaches the live grade, where I do let sentiment touch a number, it enters backwards: extreme bullish chatter and high retail attention both tend to forecast underperformance rather than strength, so loud crowd enthusiasm pushes that experimental score down, not up. The crowd is more useful to me as a fade than as a follow. Whether any of these watched signals actually predicts anything is a question I can only answer by testing it forward without fooling myself, which is a discipline of its own.
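One way to picture sentiment "entering backwards" is a penalty that can only push a score down. This is a sketch under loud assumptions: the functional form, the `scale` of 5 points, and the use of a z-score for attention are all invented for illustration and are not the experiment's real formula.

```python
def contrarian_adjustment(bullish_share: float, attention_z: float, scale: float = 5.0) -> float:
    """Points to add to an experimental score; never positive.

    bullish_share: fraction of bullish messages, 0.0 to 1.0 (0.5 is balanced).
    attention_z: how unusual the message volume is, as a z-score.
    Only a bullish lean and unusually high attention are penalized; a bearish
    or quiet crowd leaves the score unchanged.
    """
    if not 0.0 <= bullish_share <= 1.0:
        raise ValueError("bullish_share must be between 0.0 and 1.0")
    bullish_lean = max((bullish_share - 0.5) * 2.0, 0.0)  # 0.0 .. 1.0
    excess_attention = max(attention_z, 0.0)
    return 0.0 - scale * (bullish_lean + excess_attention)


# A loud, lopsided crowd: 90% bullish with attention two standard deviations high.
print(contrarian_adjustment(0.9, 2.0))  # roughly -14 points
```

The asymmetry is deliberate in this sketch: it encodes only the claim made above, that enthusiasm and attention forecast underperformance, and makes no claim about what a bearish crowd predicts.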

The one time I didn't hold the line

I have made this mistake from the inside, which is part of why I am strict about it now.

I wanted my positions to refresh onto the screen automatically instead of by hand, so I reached for Plaid. Before I built anything, I asked an AI coding assistant whether Plaid supported Fidelity, and it told me yes. I had just spent a whole post explaining that I built this system around never trusting a confident, unverified claim. Then I trusted exactly one.

I did the real work on the strength of it. Getting application access to Plaid is not a weekend toy. I hardened the app to qualify, wrote terms-and-conditions pages, produced user guides for my own system, and stood up the consent surface a real integration needs. I got a sandbox working end to end. It felt close. Then I moved to production, logged in from the live side, and found that the only Fidelity entity supported was Fidelity Charitable, the donor-advised-fund arm. Not the brokerage. Not the thing I needed. I lost a day or two.

I call it a successful failure for two reasons. The integration and hardening work is real and reusable, so if I ever do want hands-free account linking I have already been through the approval gauntlet once. And the dead end clarified something I had not fully admitted to myself: I am not sure I want an AI reading my spending habits just yet. So today I enter positions and trades by hand. The failure turned the manual path into an honest choice instead of a fallback, and it left me a standing reminder of what a single unverified yes can cost.

The boundary that runs through me

There is one more boundary in here, and it is the one I am proudest of, because it runs between me and myself. When I open a position I pre-register the thesis: the catalyst, what I think is already priced in, a benchmark, a target, a stop, and a probability. Those fields freeze at commit time. The record is append-only and the row is never overwritten, so I cannot quietly edit yesterday's reasoning to match today's outcome. The storage layer simply will not let me. When a position resolves, it books the realized return against the frozen forecast and, on a loss, files a post-mortem.
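An in-memory sketch of that record, assuming nothing about the real storage layer: a frozen dataclass stands in for the frozen fields and a list that is only ever appended to stands in for the append-only table. The names `Thesis` and `Journal`, the field types, and the sample values are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class Thesis:
    ticker: str
    catalyst: str
    priced_in: str
    benchmark: str
    target: float
    stop: float
    probability: float
    committed_at: datetime


class Journal:
    """Append-only log: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[Thesis] = []

    def commit(self, **fields) -> Thesis:
        # The commit timestamp is set here, not supplied by the caller.
        entry = Thesis(committed_at=datetime.now(timezone.utc), **fields)
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[Thesis, ...]:
        # A tuple copy, so callers cannot mutate the underlying log.
        return tuple(self._entries)


journal = Journal()
thesis = journal.commit(
    ticker="XYZ", catalyst="next earnings", priced_in="modest beat",
    benchmark="SPY", target=120.0, stop=85.0, probability=0.6,
)
try:
    thesis.target = 999.0  # an attempt to rewrite yesterday's reasoning
except FrozenInstanceError:
    print("frozen: the committed thesis cannot be edited")
```

In Python this is a convention rather than a guarantee, since a determined caller can reach `_entries` directly. A real append-only guarantee has to live in the storage layer, as described above.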

The honest difficulty is not the design, it is the upkeep: keeping the journal in step with the live portfolio as I trade. I am actively brainstorming systematic ways to capture the thesis at stock-update time, so the pre-registration happens with as little manual friction as possible instead of being a separate chore I have to remember.

Where the line is

The boundary is the right default, and like any default it is wrong in specific places, so it is worth holding honestly.

Quarantine too aggressively and you blind yourself to real information. Prediction markets and crowd attention are noisy, but they are not nothing, and a system that refused to even display them would miss the occasional moment when the edge of the network knows something before the fundamentals do. The answer is not to ignore the signal. It is to watch it without letting it vote.

Composite too eagerly and you do the opposite damage: you launder a rumor into a number. The instant a soft signal earns a weight in the grade, it inherits all the authority of the hard ones, and a reader, including me, can no longer tell which part of the score they should actually believe. One authoritative number that secretly blends a balance sheet with a message board is worse than two honest numbers kept apart.

Some data sits deliberately in between. My net worth is real and trusted, but it does not feed any grade. It sets context. A position that is 10% of the trading account might be 3% of everything I own and owe, and that changes how much the grade should worry me without changing the grade itself. Trusted enough to interpret the number, not to compute it.
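The arithmetic behind that example, with made-up dollar figures chosen only so the percentages match: a $6,000 position in a $60,000 trading account, against a net worth of $200,000. The helper name `position_share` is hypothetical.

```python
def position_share(position_value: float, base: float) -> float:
    """Fraction of `base` that the position represents."""
    if base <= 0:
        raise ValueError("base must be positive")
    return position_value / base


# Hypothetical figures for illustration only.
position = 6_000.0
trading_account = 60_000.0
net_worth = 200_000.0

print(f"{position_share(position, trading_account):.0%} of the trading account")  # 10%
print(f"{position_share(position, net_worth):.0%} of net worth")                  # 3%
```

Nothing in this calculation reads or writes the grade. It only changes the denominator used to judge how much the grade matters.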

The same suspicion runs one level deeper, into the rules themselves. The weights that turn factors into a grade are not free to drift. No change to the scoring logic reaches production without a human signing off, because the recipe that converts data into a verdict deserves at least as much scrutiny as the data going into it.
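How the sign-off is enforced is not described here, so the following is only one hypothetical way to make drift detectable: fingerprint the weights and refuse to run if the fingerprint differs from the one a human last approved. The function names and the sample weights are assumptions.

```python
import hashlib
import json


def fingerprint(weights: dict[str, float]) -> str:
    """Stable SHA-256 fingerprint of a weights table, independent of key order."""
    canonical = json.dumps(weights, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_signed_off(weights: dict[str, float], approved_fingerprint: str) -> None:
    """Raise if the live weights differ from the version a human approved."""
    if fingerprint(weights) != approved_fingerprint:
        raise RuntimeError("scoring weights changed without sign-off")


# Hypothetical weights; the approved fingerprint would normally be stored
# separately, at the time of review, not computed next to the live weights.
approved = fingerprint({"Quality": 0.25, "Valuation": 0.25, "Momentum": 0.25, "Health": 0.25})

assert_signed_off(
    {"Health": 0.25, "Momentum": 0.25, "Valuation": 0.25, "Quality": 0.25}, approved
)  # same weights in a different key order: passes silently
```

A check like this does not replace the human review. It only guarantees that a change to the recipe cannot reach a running system unnoticed.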

The test

So here is the test I would hand anyone building a system that turns many signals into one consequential number. Point at the number and ask: can you separate the inputs you would stake money on from the ones you are only watching? If you cannot, the number is blending them, and it is lying to you with a straight face.

This is not really about finance. Any system that fuses signals into a single verdict has the same fault line: a credit decision, an anomaly score, an incident severity, a model that ranks anything that matters. The durable move is the same one the rest of this engine is built on. Decide what gets to count before you are under pressure, keep the rest where you can see it but it cannot vote, and write the boundary down so a test can defend it when you are tempted to soften it later.

The first post's line keeps the AI out of the deciding. This one keeps the unverified data out of the grade. The same line, drawn in a different place, for the same reason: when there are consequences, the parts that carry them should only ever read from sources you can stand behind. Everything else gets to sit on the outside, in plain sight, where you can watch it without being moved by it.

Related

Building a Personal Finance Reviewer: What Survived the Rewrite: the first boundary, between the parts that decide and the parts that describe.
When the Spec Was Wrong: Rewriting a Shipped Decision: why these four factors, and what grading modern companies against a 1949 rubric got wrong.
How to Backtest Without Fooling Yourself: whether any of the watched signals actually predicts anything, tested forward without cheating.

Further reading, the honest sources behind the factors:

Asness, Frazzini, and Pedersen, Quality Minus Junk.
Bailey and Lopez de Prado, The Deflated Sharpe Ratio, on not mistaking a lucky backtest for an edge.
Aswath Damodaran for valuation done seriously.
