about

what otis is

an instrument for one question, built around an otter.


the one-line version

otis takes a task, tries it many times in parallel across several gpus, checks which tries actually worked, and tells you how the success rate climbs as you spend more compute.

everything else on this site is detail around that sentence.
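
as a rough illustration of that sentence, here is a minimal python sketch. it is not otis's code. the names `run_tries`, `success_rate`, `attempt` and `check` are made up for this example, and a thread pool stands in for the gpus.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tries(attempt, check, n_tries, n_workers=4):
    """run attempt(i) n_tries times in parallel and check every result.

    attempt and check are hypothetical callables supplied by the caller.
    returns one boolean per try, in try order.
    """
    def one_try(i):
        result = attempt(i)          # one independent shot at the task
        return bool(check(result))   # did this shot actually work?

    # threads stand in for gpus here; this is an assumption of the sketch
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(one_try, range(n_tries)))


def success_rate(outcomes):
    """fraction of tries that passed the check."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# toy stand-ins: a "model" that gets the answer right on every third try
outcomes = run_tries(lambda i: 4 if i % 3 == 0 else 5,
                     lambda answer: answer == 4,
                     n_tries=9)
print(outcomes, success_rate(outcomes))
```

the point of the shape is that the tries never talk to each other, so adding workers only changes how fast the list fills up, not what ends up in it.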

the question it studies

there are two broad ways to make a language model better at a task. you can train a bigger, smarter model, which is slow and expensive. or you can take the model you already have and let it think harder at the moment you ask, by spending more compute per question. the second one is called test-time compute, and it is the trend otis is built to poke at.

the cleanest way to spend that compute is also the most parallel: instead of one careful attempt, make many independent attempts and keep the best. that only works if you can tell which attempt is actually good. so otis pairs two things that have to go together: many parallel attempts, and a real check on each one.

why it has to be a real check

if you ask a model "is your answer correct?" it will usually say yes, confidently, whether or not it is. selecting the best of many tries only works when selection is honest. so a task in otis ships with a verifier: a shell command that runs against the result and exits zero only on success. run the tests. compare the output. check the file. no vibes.
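
a verifier of this kind can be sketched in a few lines of python. this is an assumption about the shape, not otis's actual code: the function name `verify`, the `workdir` argument and the timeout are all invented for the example. the only rule taken from the text is that exit status zero means success.

```python
import subprocess


def verify(command, workdir=".", timeout_s=60):
    """return True only if the shell command exits with status 0.

    hypothetical helper: command is a shell string such as a test
    runner or a diff, run inside the directory that holds one try's result.
    """
    try:
        done = subprocess.run(
            command,
            shell=True,           # the verifier is described as a shell command
            cwd=workdir,
            capture_output=True,  # keep the command's output off the console
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # a check that never finishes is treated as a failure
        return False
    return done.returncode == 0
```

for example, `verify("exit 0")` returns True and `verify("exit 3")` returns False. the model's opinion of its own answer never enters into it.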

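because the check is real, "did any try pass" and "did the try we picked pass" are the same number: if any try passes, the verifier finds it, and that is the try that gets kept. with a weaker selector, the first number is only an optimistic upper bound on the second. here the result is an honest success rate.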
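
one common way to turn a pile of pass/fail results into a success-rate-versus-budget curve is the standard combinatorial pass@k estimator. whether otis uses this exact estimator is not stated here, so treat the sketch below as an assumption. `pass_at_k` and `curve` are names made up for the example.

```python
from math import comb


def pass_at_k(n, c, k):
    """estimate the chance that at least one of k tries passes.

    n = tries actually run, c = tries that passed the verifier,
    k = budget being asked about (1 <= k <= n).
    computed as 1 - C(n - c, k) / C(n, k): one minus the chance that
    k tries drawn without replacement from the n are all failures.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # fewer than k failures exist, so any k tries include a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def curve(n, c, budgets):
    """success rate at each budget k, as (k, rate) pairs."""
    return [(k, pass_at_k(n, c, k)) for k in budgets]


# 10 tries, 3 of which passed: how does the rate climb with budget?
for k, rate in curve(10, 3, [1, 2, 4, 8]):
    print(k, round(rate, 3))
```

with a real verifier doing the picking, this curve is also the success rate of the try that gets kept, which is why it can be read as a result rather than a ceiling.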

what otis actually is, concretely

otis is a small python harness, not a service. it has:

the model serving is handled by vllm, one copy per gpu. read the details on the gpus page and the full procedure on the method page.

what otis is not

it is not a chatbot, an assistant, or a product you point at your work. it is a measurement instrument. you give it a task with a known answer and it tells you how reliably a given model solves it for a given compute budget. the output is a number and a curve. that is the whole job, and it does it well.

why an otter

sea otters hold things on their bellies and float in groups called rafts. a raft of otters, each holding its own gpu, each taking its own shot at the same fish — that is the picture. the name stuck before the code did.


     .-""-.   .-""-.   .-""-.
    ( o  o ) ( o  o ) ( o  o )
     '-..-'   '-..-'   '-..-'
       a small raft of otis