Llama 2 on DOS
Just when you thought running retro software on modern machines was impressive, someone flipped the script. What’s more impressive than Chrome running on Windows 3.1 or Doom playing on a pregnancy test? How about a conversational assistant bootstrapped straight into 640K of real-mode memory on a classic IBM PC? Meet Llama 2 on DOS. Yes, you read that right. Forget your RTX cards and cloud GPUs: this is command-line commoditization at its finest.
How Did We Get Here?
It all started with a challenge: “Will it run?” And not just “will it run Blender” on a dinosaur of a laptop, but “can it run a 2020s-era chat engine under MS-DOS?” That’s where gizmo-fueled curiosity meets some real digital wizardry. Thanks to renowned retro tinkerer Matt Sarnoff, we now have a purpose-built solution running via custom execution layers wrapped with just the right amount of DOS memory paging trickery.
Llama 2 (yes, that Llama) is wrangled by Sarnoff’s clever tool called llm4dos, a full pipeline that packages miniature inference models to operate within the rigid constraints of MS-DOS. The result: a surprisingly coherent text-generating tool running on hardware that struggles with modern GIFs. Sarnoff even provides a neat front-end that wouldn’t feel out of place next to qbasic.exe.
Zero GPUs, All Fun
There’s a certain poetry in watching today’s most talked-about tech perform on yesterday’s forgotten silicon. Think CPU-only, single-threaded execution on a processor clocked at under 100MHz. The software makes use of quantized weights (roughly 4 bits per weight), trimming datasets and optimizing just enough to nudge the model into the realm of possibility. And what do you get for all this compression sorcery? A REPL that can answer questions, write haikus, and describe quantum physics like it just booted up from a floppy disk.
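To make the "~4-bit" idea concrete, here is a minimal sketch of symmetric 4-bit quantization in C. It is an illustration only, not Sarnoff's code and not ggml's actual on-disk format (real formats typically keep a separate scale per small block of weights; this sketch assumes a single scale for the whole array). The function names quantize_q4 and dequantize_q4 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest integer without pulling in libm. */
static int round_nearest(float x)
{
    return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/*
 * Quantize n floats (n must be even) to signed 4-bit values in [-7, 7],
 * using one shared scale for the whole array (an assumption of this sketch).
 * Two values are packed into each output byte: even index in the low
 * nibble, odd index in the high nibble. Returns the scale.
 */
float quantize_q4(const float *src, size_t n, uint8_t *dst)
{
    assert(n % 2 == 0);

    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = src[i] < 0.0f ? -src[i] : src[i];
        if (a > max_abs)
            max_abs = a;
    }

    /* Map the largest magnitude to 7; avoid dividing by zero. */
    float scale = (max_abs > 0.0f) ? max_abs / 7.0f : 1.0f;

    for (size_t i = 0; i < n; i += 2) {
        int lo = round_nearest(src[i] / scale);
        int hi = round_nearest(src[i + 1] / scale);
        /* Store with an offset of 8 so each nibble is unsigned. */
        dst[i / 2] = (uint8_t)((lo + 8) | ((hi + 8) << 4));
    }
    return scale;
}

/* Recover an approximation of the i-th original value. */
float dequantize_q4(const uint8_t *src, size_t i, float scale)
{
    uint8_t byte = src[i / 2];
    int nibble = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    return (float)(nibble - 8) * scale;
}
```

Each weight shrinks from 32 bits to 4, at the cost of rounding error. An inference loop would call dequantize_q4 (or an unrolled equivalent) on the fly inside its matrix multiplies rather than expanding the whole array in memory.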
Is it fast? Not by today’s standards. This isn’t about speed. It’s about possibility, and it’s that joyous combination of overengineering and childhood nostalgia that makes this oddly thrilling.
DOS Isn’t Dead: It’s Dormant
With modern tech heroes often focused on speed benchmarks and power consumption, this project takes a left turn into sheer possibility. The takeaway here isn’t about productivity; it’s about exploration. It’s a love letter to the raw ecosystem of 80s computing, drawing a zig-zag line from autoexec.bat to modern linguistic libraries. Sarnoff’s implementation uses ggml under the hood, dynamically linked by a custom DOS extender. In other words: he’s jamming square pegs into vintage round sockets and somehow making them fit.
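Some back-of-envelope arithmetic shows why both the 4-bit weights and a DOS extender (which gives a program access to memory beyond the conventional 640K) matter here. The sketch below is only an illustration: the parameter counts in the usage note are hypothetical round numbers, not figures from Sarnoff's project, and the helper names are made up for this example.

```c
#include <stdint.h>

/* Conventional memory available to a real-mode DOS program, in bytes. */
#define CONVENTIONAL_BYTES (640ULL * 1024ULL)

/* Bytes needed to store `params` weights at `bits` bits each, rounded up. */
uint64_t weight_bytes(uint64_t params, unsigned bits)
{
    return (params * bits + 7ULL) / 8ULL;
}

/* Nonzero if the weights alone would fit in conventional memory. */
int fits_conventional(uint64_t bytes)
{
    return bytes <= CONVENTIONAL_BYTES;
}
```

For a hypothetical 260,000-parameter model, weight_bytes(260000, 4) gives 130,000 bytes, which fits under 640K with room to spare. A hypothetical 15-million-parameter model needs 7,500,000 bytes at 4 bits (and 60,000,000 at 32 bits), so it has to live in extended memory regardless of quantization. Note that this counts weights only; activations, the tokenizer, and the program itself need space too.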
“This isn’t going to replace your laptop, but you’ll be thinking about it all week.” (An amused systems engineer on Reddit)
Why Do This? Because We Can
There’s no grand strategic need for a vintage system to answer your trivia queries. But projects like this fire up a deeply human urge: to push boundaries, just to see if we can. Whether it’s running Tetris on an oscilloscope or streaming Netflix through a Commodore 64, the joy here lies in subverting expectations.
For developers, it’s a testbed for serialization, quantization, memory-limited inference, and UI-for-unexpected-environments. For everyone else, it’s a case study in how computing history loops upon itself: unexpectedly relevant, delightfully inefficient, and utterly beautiful.
How to Try It Yourself
If you’ve got a working 486 PC, a bootable DOS disk, and some C compiler chops, you can follow Sarnoff’s project on GitHub. Otherwise, a DOSBox setup will get you most of the experience (minus the tactile joy of listening to your hard drive click-boom under pressure).
Once installed, execution is straightforward:
C:\> llm4dos.exe llama.bin
And voilà, you’ve got a prompt. From here, it behaves like a stripped-down shell that writes back complete sentences, poems, or witty responses in glorious white-on-black VGA output.
The Bigger Picture
Strip away the layers of tech-speak and you’ve got a compelling story of digital minimalism, a punk rock attitude in a world obsessed with 8K displays and neural acceleration units. We need projects like this to remind us that innovation doesn’t always scream from data centers or next-gen silicon. Sometimes, it quietly beeps from a beige rectangle in someone’s attic.
Sure, it won’t help you code your next mobile app, but it will remind you why we fell in love with computers in the first place: because they do what we tell them to. Even if what we tell them makes no practical sense.
Final Thoughts
If you ever doubted the longevity of classic platforms, Llama 2 on DOS stands as proof that the old dogs can still learn new tricks, especially if those dogs are booted from a 3.5″ floppy disk. It’s raw, it’s ridiculous, and it’s radically fun.
Long live the blinking cursor.
Now someone port this to a fax machine, and we’re done here.