Felipe Sinisterra · Creator

Opus 4.8 For Market Research: The One Setting That Beats The Benchmark

Teleprompter scripts — the literal text the HeyGen avatar speaks. Spoken words only, no markings.

HeyGen
Validated source · nate-herk

Reverse-engineered from a real nate-herk YouTube video (q5lg3npxjAc).

YouTube video (transcript analysis)

Long-form script · ~5 min · 762 words

YouTube · horizontal · HeyGen

Most analysts read a model launch backwards. A new version drops, the benchmarks beat the last one, and they assume their research just got better on its own. It does not work like that. A higher benchmark score does not make it a better tool for your specific job. Let me show you what actually changed in Opus 4.8, and how I would set it up for filings and earnings work.

Here's the thing most people skip. The most useful change in 4.8 is not a benchmark. It is a dial. Claude now lets you set how much effort the model spends on a task. Low, medium, high, extra-high, all the way to max. The default is high. Most people never touch it. That is the mistake. Effort is the number-one lever now, not the prompt.

And effort cuts both ways. Set it too low on a complex 10-K teardown, and the model gives up early. That is the exact laziness people complained about in the last version. Set it too high on a simple filing lookup, and it over-reasons, over-engineers, and burns tokens on something that should take five seconds. The model did not get worse. The setting was wrong for the job. On low versus extra-high, it honestly feels like a different model.

So here's how I would run it. Three moves.

First. Match effort to the task, not to habit. A quick filing fact, like what Q3 revenue was, runs on low or medium. You get the answer faster, with fewer tokens, and the same number. A multi-step earnings teardown, where you are reading a full transcript and a 10-K and building a comp table, runs on extra-high or max. That job needs sustained reasoning across many steps. Those two tasks should not run at the same setting. If you default everything to high, you are slow on the easy stuff, and you can still quit early on the hard stuff.

Second. Tell it what to do, not what not to do. And give it the why. Most people write "don't make things up." That is a weak instruction. Instead, write "pull only figures stated in the filing, because I need an auditable source." Now the model follows the reason behind the instruction, not just the no. Why it matters: in research, your output has to trace back to a source. When the model understands you need an auditable number, it behaves differently than when you just tell it not to lie.

Third. Hand it the source before it reasons, not after. Opus 4.8 reasons before it calls tools. So if the answer depends on a document, give it the document up front. Do not ask the question first and make it go look. Load the 10-K, load the transcript, then ask. The reasoning starts with the source instead of a guess. For document-dependent work, that one ordering change can be the difference between a clean answer and a fabricated one.

A couple of things to watch. Honesty got better in this version, but it is not guaranteed. The model used to over-claim its own progress. It would say it had pushed all fifty when only fifteen actually went through. So you still verify completion claims. When it tells you it pulled every figure, you check. This is also an early release. Test it on a low-stakes task before you migrate a live research workflow onto it. Don't assume it behaves at least as well as the last version just because the version number went up.

Here's why this matters. The one thing that holds true across every model launch is this: a better benchmark score is not a better tool for your specific problem. So don't judge the new model by the leaderboard. Test it on the thing that actually frustrated you last week. Pull the filing you fought with. Run the teardown that gave up early. See if the effort dial fixes your real failure mode.

And keep the line clear. The model does the work faster. It does not make the call. You still own the assumptions, you still own the classification, and you still verify every number before it goes in a memo. AI structures the research process. It does not pick the position. Cite your source, cite the date, cite the filing section, and end every output with what to verify next.

That's the setup. Match effort to the task. Give it the why. Feed it the source first. The setting is the edge, not the launch. This is educational only, not investment advice, and verify all data before you act on it. If you want the research-effort setup card, comment the word EFFORT and I'll send it over.

Also available — Short-form cut

Short-form script · ~67s · 168 words

Reels / Shorts / TikTok · vertical · HeyGen

Stop reading the benchmarks. A new model is not automatically a better tool for your job. The thing that actually changed in Opus 4.8 is a setting most people skip. An effort dial. Low to max. It defaults to high, and almost nobody touches it. That dial changes your research, not the benchmark. And it cuts both ways. Too low on a 10-K teardown, and it quits early. Too high on a quick filing lookup, and it over-thinks and burns tokens. So, three moves. First, match effort to the task. Low for a lookup, max for a teardown. Second, tell it what to do, not what not to do, and give it the why. Third, hand it the source before it reasons, not after. The model is not making the call. It does the work faster. You own the judgment, and you verify every number. Test it on what frustrated you last week. Not the leaderboard. The setting is the edge, not the launch. Educational only. Not investment advice.

Wall Street Prompt
Wall Street Prompt — internal