RESEARCH

How we validate a strategy.

Ideas are cheap. Proving one is real is the expensive part, and most strategies that look real in research fall apart in production. The usual culprits are well understood: selection effects, overfitting, and evaluations that flatter the researcher instead of testing the strategy. Our standards exist to catch them before capital is at risk. They are house policy, and we do not make exceptions.

A result is worth trusting only when nothing that shaped it was chosen by looking at it.

Selection is separated from reporting.

No result is trusted if anything that produced it was chosen by looking at it. The choices that shape an evaluation are made before the evaluation runs, and the record of those choices is kept. A result that cannot demonstrate this separation is treated as exploratory, whatever it shows, and exploratory results do not advance.

Evaluations are pre-registered.

What will be measured, and what would count as failure, is written down before the evaluation runs. Robustness is measured across independent trials rather than a single favorable run, and statistical significance is corrected for the breadth of the search that produced the candidate. Breadth of search is a cost the evidence has to cover.
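
To make the cost of search breadth concrete, here is a minimal sketch of one standard multiple-testing correction, the Holm step-down procedure. It is an assumed stand-in: the policy above does not name a specific method, and the function names and p-values below are hypothetical and purely illustrative.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Visit hypotheses from the smallest raw p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Scale by the number of hypotheses not yet processed, capped at 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


def survivors(p_values, alpha=0.05):
    """Indices of candidates still significant after the correction."""
    return [i for i, p in enumerate(holm_adjust(p_values)) if p <= alpha]


# Hypothetical raw p-values for four variants tried in one search.
raw = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(raw))
print(survivors(raw))
```

All four variants would pass an uncorrected 5% threshold. After the correction only two do, and the gap widens as more variants are tried.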

Held-out data is earned.

Data exists that research never touches. A strategy earns its way to it by surviving everything else first, and each strategy gets one look. There is no second attempt against the same held-out period; a failure there is final for that line of work.
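
The one-look rule can be pictured as a ledger that refuses a second request. The sketch below is hypothetical throughout: the class name, identifiers, and period label are invented for illustration, and a real ledger would need durable, tamper-evident storage rather than an in-memory set.

```python
class HeldOutLedger:
    """Records which line of work has already used which held-out period."""

    def __init__(self):
        # Pairs of (line_of_work, period) that have already been spent.
        self._spent = set()

    def request_look(self, line_of_work, period):
        """Grant the single permitted look, or raise if it was already used."""
        key = (line_of_work, period)
        if key in self._spent:
            raise PermissionError(
                f"{line_of_work!r} has already been evaluated on {period!r}"
            )
        # The look is recorded before any result exists, so a bad outcome
        # cannot be quietly retried.
        self._spent.add(key)
        return True


ledger = HeldOutLedger()
ledger.request_look("example-strategy", "holdout-A")  # first look is granted
try:
    ledger.request_look("example-strategy", "holdout-A")  # second look is refused
except PermissionError as err:
    print(err)
```

Recording the look before the evaluation runs, rather than after, is the design choice that matters: the ledger never depends on how the result turned out.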

Simulation precedes capital.

Before a dollar follows a strategy, the strategy trades on paper: live, forward in time, at realistic costs. Forward validation is a working filter, not a formality. Most surviving candidates are expected to fail here, and the stage exists to give them room to fail.

THE SAME BAR, EVERYWHERE

One standard, applied without exception.

The four standards apply to every strategy, including the ones we are most attached to. A rule that bends when the result looks good has stopped protecting anything, so ours do not bend. The harder a candidate is to tell apart from luck, the more evidence we ask of it, and the breadth of the search that produced it counts against the significance of what it shows.

The process cannot promise a strategy will make money; no honest process can. What it can do is make sure the case behind a strategy was built without self-deception. Until real capital and real time weigh in, that is as far as the evidence takes us, and it is further than most research ever gets.

We use statistical and machine learning methods where they are warranted. The standard of evidence does not change with the method.

We write about the thinking behind these standards in our notes.