How Will Smith Eating Spaghetti Became a Benchmark for AI Video Progress

3 Jun 2026

In March 2023, when mainstream AI mania was still finding its shape, a Reddit user, chaindrop, posted an AI-generated video to r/StableDiffusion with the simple prompt “Will Smith Eating Spaghetti.” The result was what some have called “cursed.” A synthetic version of the global movie star appeared to eat pasta while his face warped, his hands dissolved, and the noodles tested the laws of physics. The entire scene looked like a fever dream from hell.

It was funny because it was bad. Then it became useful because it stayed bad in very specific ways.

Over the years, “Will Smith eating spaghetti” has become the internet’s unofficial benchmark for AI video progress. The reason is that the prompt unwittingly compresses generative video’s hardest problems into one scene: a recognizable human, using hands, interacting with a deformable object, moving food into a mouth, chewing, maintaining identity, obeying physics, and now syncing all of that with audio.


2023: Terrible Beginnings

The original Will Smith spaghetti clip arrived when text-to-video was still visibly primitive. AI image generation had already entered the mainstream, but video remained unstable, slippery, and strange.

That is what made the clip memorable.

The model understood the ingredients of the prompt, but not the event. There was a man. There was food. There was an attempt at eating. But the face changed shape, the hands failed, the pasta behaved like a living organism, and the scene seemed to forget itself from one frame to the next.

The clip became a meme because it was grotesque. It became a benchmark because its failures were so specific.

It exposed the central problem of early AI video: generating motion is not the same thing as understanding action.

https://arstechnica.com/information-technology/2023/03/yes-virginia-there-is-ai-joy-in-seeing-fake-will-smith-ravenously-eat-spaghetti/?embedable=true

2024: Weird Benchmarks Find Their Purpose

By 2024, the Will Smith spaghetti test had left the realm of inside jokes and become part of a wider pattern: the internet was inventing weird, informal benchmarks faster than the industry could explain its formal ones.

https://www.instagram.com/reel/C3i5vAZvRS3/?embedable=true

TechCrunch grouped Will Smith eating spaghetti alongside other odd AI tests, including Minecraft-building agents and AI systems playing games like Pictionary and Connect 4. The point was not that these were scientifically superior. It was that they were understandable. A benchmark score on a hard math test may impress researchers, but a cursed celebrity failing to eat noodles tells everyone exactly what is wrong.

This was also the year Sora changed the conversation around AI video. OpenAI framed Sora not merely as a video generator, but as a step toward models that can simulate the physical world in motion.

That shift matters. The spaghetti test is, at its core, a physical-world test. The model has to preserve identity, objects, contact, movement, and cause-and-effect across time. The 2023 clip failed because it could not maintain a stable world.

Sora made the industry’s ambition clearer: AI video was no longer just about generating moving images. It was about generating scenes that hold together.

2025: Veo and the Crunchy Spaghetti Problem

By 2025, newer video models were producing dramatically better versions of the spaghetti test. Faces were more stable. Hands looked less monstrous. The food moved more plausibly. The whole thing became less obviously cursed.

https://arstechnica.com/ai/2025/05/googles-will-smith-double-is-better-at-eating-ai-spaghetti-but-its-crunchy/?embedable=true

Then audio entered the chat.

Google’s Veo 3 pushed AI video further into native audiovisual generation. That made the spaghetti test even more interesting, because the model now had to generate not just what eating pasta looked like, but what it sounded like.

And that created a new failure mode: crunchy spaghetti.

That detail is funny, but it is also revealing. Spaghetti should not sound like chips. If the visuals look convincing but the audio misunderstands the material, the illusion breaks.

The test had evolved. In 2023, the question was whether AI could generate a man eating pasta without producing a demon. By 2025, the question was whether it could make the face, hands, noodles, motion, and sound agree with each other.

https://africa.businessinsider.com/news/then-vs-now-ai-videos-of-will-smith-eating-spaghetti-show-just-how-advanced-the-tech/pc7ee96?embedable=true

2026: We’re in Our “Has AI Figured Out Video?” Era

By 2026, the latest versions of the spaghetti test had become unsettling for the opposite reason: they were no longer immediately broken.

https://www.techradar.com/ai-platforms-assistants/chatgpt/will-smith-eating-spaghetti-was-peak-ai-chaos-in-2023-now-it-shows-how-fast-the-tech-has-evolved?embedable=true

Models like Kling and Seedance have been used in newer comparisons showing far better lighting, motion, character consistency, and scene coherence. The old nightmare-fuel quality is fading. The newer clips can look, at least at a glance, like a normal person eating food.

https://x.com/MarioNawfal/status/2053023918576005476?s=20&embedable=true

That is real progress.

But “at a glance” is doing a lot of work.

The spaghetti test is still not a controlled benchmark. We usually do not know the exact prompt, number of generations, whether reference images were used, whether the clip was edited, or how many failed attempts were discarded. TechCrunch made the same broader point about weird AI benchmarks: they are entertaining and easy to understand, but not empirical or fully generalizable.
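The effect of discarded attempts can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes each generation independently has the same chance of looking clean, and the 5% figure in the usage note is a made-up number, not a measurement of any model.

```python
def chance_of_clean_clip(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` generations looks clean.

    Assumes every attempt is independent and has the same success
    probability `p_single` (a simplifying, hypothetical assumption).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Chance that every attempt fails is (1 - p)^n; invert it.
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # Hypothetical 5% per-attempt success rate.
    for attempts in (1, 5, 20, 50):
        print(attempts, round(chance_of_clean_clip(0.05, attempts), 3))
```

Under those assumptions, a model that produces a clean clip only 5% of the time still yields at least one shareable result in roughly 64% of 20-attempt sessions. A single impressive clip therefore says little without the number of tries behind it.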

So no, the test is not “solved” in any rigorous sense.

But the direction is obvious. AI video has moved from chaotic prompt interpretation to something much closer to controllable scene generation.

What’s the Real Story Behind the Will Smith AI Spaghetti Videos?

The Will Smith spaghetti meme is not important because of Will Smith. It is important because eating pasta is a deceptively dense simulation problem.

A model that handles this prompt well has likely improved across several core areas of video generation.

Identity Persistence

The subject has to remain recognizably the same person across frames. Early AI video treated faces like unstable textures. Newer systems are better at keeping a character’s appearance consistent, which is essential for storytelling, advertising, product demos, games, and any real production workflow.
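One rough way to think about identity drift is to compare a face embedding from each frame against the first frame. The sketch below assumes such per-frame embedding vectors already exist (how they are produced is outside its scope), and the function names are hypothetical. It is not how any particular model or evaluation suite measures identity.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


def identity_drift(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of every later frame to the first frame.

    Values near 1.0 suggest the face stayed put; falling values
    suggest the subject is turning into someone else.
    """
    if not embeddings:
        return []
    reference = embeddings[0]
    return [cosine_similarity(reference, e) for e in embeddings[1:]]
```

A clip whose similarity scores slide steadily downward is drifting away from the person it started with, even if each individual frame looks plausible.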

Hands and Human Anatomy

Hands are still one of the easiest ways to spot bad generative media. The spaghetti prompt puts hands right in the center of the action. The model cannot hide them. It has to deal with grip, fingers, utensils, and motion.

Mouth-Object Interaction

Eating is hard because it requires timing and contact. The fork approaches. The mouth opens. The food bends. Some of it disappears behind the lips. The jaw moves. If any part of that sequence is off, the clip becomes instantly fake.

Food Physics

Spaghetti is a nightmare object. It is flexible, tangled, wet, thin, and irregular. It does not behave like a cube, a car, or a chair. It droops, clumps, slides, stretches, and falls. That makes it a surprisingly good stress test for physical plausibility.

Temporal Coherence

The scene has to remember itself. A fork cannot become a finger. A noodle cannot become part of the face. The plate should not rearrange itself every second. Temporal coherence is the difference between “a bunch of generated frames” and “a video.” OpenAI’s Sora framing is useful here because it describes video generation as a step toward simulators of the physical world, not just prettier frame generation.
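The gap between “a bunch of generated frames” and “a video” can be illustrated with a deliberately crude flicker measure: the average pixel change between consecutive frames. This is a toy sketch, assuming frames are small grids of grayscale values between 0 and 1. Serious evaluations rely on far richer signals such as motion tracking, and a frozen image would score perfectly here, so low flicker alone proves nothing.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # rows of grayscale pixel values


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def flicker_scores(frames: Sequence[Frame]) -> List[float]:
    """Difference between each frame and the one that follows it."""
    return [
        frame_difference(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
```

A scene that rearranges itself every frame produces consistently high scores, while a coherent one changes only where something is actually moving.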

Audio-Video Alignment

Native audio makes everything harder. The sound must match the action, the material, and the timing. Crunchy spaghetti is funny because it reveals a deeper issue: audiovisual realism depends on cross-modal consistency.

Final Takeaway

Will Smith eating pasta is not a real benchmark.

It has no formal scoring system. It has no standardized setup. It is cherry-pickable, legally awkward, and scientifically messy.

The useful thing about the spaghetti test is that it does not let AI video hide behind mood, style, or spectacle. It asks for a familiar action and punishes every weak spot: the face that drifts, the hand that cheats, the fork that changes shape, the noodle that forgets how matter works, the audio that mistakes pasta for chips.

That is why the test keeps coming back. It gives regular people a way to see progress without needing a leaderboard.

The next phase of AI video will not be judged only by whether a model can make a famous man eat pasta convincingly. It will be judged by whether developers, filmmakers, creators, and businesses can reliably direct these systems with more control, clearer consent boundaries, and outputs that hold up outside a cherry-picked demo.

For now, the internet has its benchmark: a fork, a face, a plate of pasta, and three years of AI progress hiding inside the world’s dumbest stress test.
