Hand-Crafted, Machine-Made: How we make learning games with AI

By Ben Goldsmith

Falling into the pit and dying, but for math

In Super Mario Bros, you don’t get a text box saying “You pressed the jump button too early.” You fall into the pit and die. That’s feedback you can feel.

Since Brilliant’s early days, we’ve been chasing that same visceral learning experience – the kind where understanding clicks through doing, not reading.

When ChatGPT hit the scene, there was a massive wave of AI tutoring chatbots – but we decided to stay on the sidelines. Our internal product testing convinced us that this wasn’t the way forward, even though we were impressed with the explanations the chatbots could give. Hallucinations were certainly a problem, but something more didn’t feel right…

We kept coming back to Mario. Text explanations, no matter how perfect, just aren’t our goal. We want learners to develop intuition through interaction, to build understanding through experimentation, and to have fun. We want them to feel the math.

Creating tactile learning for 10,000 concepts

Creating these tactile learning moments is hard. Really hard. Take our Scientific Thinking course: there comes a moment when you drag weights into place and suddenly understand how forces and moments balance. That flash of insight is the result of careful game design.

We spend weeks designing the core game for each topic, and even more time building the levels, aiming for that perfect difficulty curve that keeps you in flow.

Let’s break down what it takes to create just one course – say, an introductory course on core pre-algebra concepts:

  • 50+ core concepts to teach
  • 20+ problems per concept
  • → ~1,000+ individual problems per course

That last number kept us up at night. You can’t just explain a concept once and move on. You need enough variations to let learners truly master each idea, enough edge cases to build real understanding, and enough of a ramp in difficulty to create that perfect learning curve.

Designing the right game and sequence of concepts, so that learning feels like flow, is the fun part. But then you need to make a thousand carefully calibrated problems. And that part is a lot less fun – and it takes a long time.

We want to eventually teach all of STEM through games. This will require covering thousands of concepts, and we’ll never get there if we’re spending 90% of our time configuring each level by hand.

This is where AI has proved invaluable for us. Instead of using AI to explain concepts to students, we’re using it to help us build better learning games, faster.

90% design, 10% technical implementation

We’ve been on this journey since 2019, with early experiments on GPT-2 and GPT-3. Those models didn’t quite work for us, but the experiments shaped how we built our systems. We wanted to be ready when the models got smarter.

GPT-3.5 was a huge turning point. It was the first time we could actually use natural language to interact with our game engine. But even then, the outputs of the models were... less than stellar. Our early prompts produced janky, unusable assets. The gears wouldn’t mesh. The circuits weren’t circuiting. Sometimes, the problems were completely unsolvable.

The longer you look at it, the worse it gets...

But we kept iterating. We built chains of prompts, redesigned the APIs for our games, and even resorted to begging the models to work. And over time, as the models have gotten smarter and we’ve built better tools, we’ve seen dramatic improvements.

Take our gear train puzzles. Surprisingly, reasoning about how a gear train works is quite tricky for an LLM. In the 48 hours before we published this post, we went from a 0% to a 93% success rate in generating creative and correct puzzles. The breakthrough didn’t come from upgrading to R1 or o3; it came from making the representations in our game engine more LLM-friendly. We’re adding new game types to our LLM workflows every day.
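
To make "LLM-friendly representation" a little more concrete, here is a hypothetical sketch; it is not our actual engine format. It assumes a simple gear train stored as an ordered list of teeth counts, where each gear meshes with the next, rather than as positions on a canvas. With the structure spelled out, a few lines of code can work out what the last gear does, which is exactly the reasoning a model tends to fumble: every mesh reverses the direction of rotation and scales the speed by the ratio of teeth counts. The `Gear`, `GearTrain`, and `simulate` names are illustrative assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Gear:
    """One gear in a hypothetical, LLM-friendly puzzle format."""
    name: str
    teeth: int


@dataclass(frozen=True)
class GearTrain:
    """A simple gear train: gears[i] meshes directly with gears[i + 1]."""
    gears: tuple[Gear, ...]

    def __post_init__(self) -> None:
        if len(self.gears) < 2:
            raise ValueError("a gear train needs at least two gears")
        if any(g.teeth <= 0 for g in self.gears):
            raise ValueError("every gear needs a positive number of teeth")


def simulate(train: GearTrain, input_rpm: int, clockwise: bool = True) -> tuple[Fraction, bool]:
    """Return (output speed in rpm, does the last gear turn clockwise?)."""
    rpm = Fraction(input_rpm)
    direction = clockwise
    for driver, driven in zip(train.gears, train.gears[1:]):
        # Meshing gears trade speed for teeth: rpm_driven = rpm_driver * T_driver / T_driven.
        rpm = rpm * driver.teeth / driven.teeth
        # Each mesh reverses the direction of rotation.
        direction = not direction
    return rpm, direction


train = GearTrain((Gear("A", 20), Gear("B", 10), Gear("C", 40)))
speed, cw = simulate(train, input_rpm=60)
print(speed, "rpm,", "clockwise" if cw else "counterclockwise")  # 30 rpm, clockwise
```

The middle gear cancels out, so the output speed depends only on the first and last gears (60 × 20 / 40 = 30 rpm), while the number of meshes decides the direction. A check like this lets a generator reject an unsolvable or mislabeled puzzle before it ever reaches human review.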

Crucially, the human is still the creative director. Our course authors are still firmly in the driver’s seat – they’re responsible for the learning objective, the progression, and the “aha moment” they’re trying to create. At Brilliant, we live or die on the quality of our learning games, and AI can’t meet that bar for level design. The AI just handles the technical implementation – and it’s great at that.

Below is our asset generation pipeline in action. Learning designers describe the type of problem they want, and in seconds the AI tool generates an interactive puzzle and solution that are ready to tweak, playtest, or publish.

We’ve found that this AI tool lets us sketch out lots of variations while we focus on the creative direction. Don’t like the specific layout? Hit regenerate. Want to tweak something? The configuration is right there so a designer can go in and make it just right.
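
The regenerate-and-tweak workflow can be sketched as a loop around a model call. This is a hypothetical illustration, not our production tool: `generate` stands in for whatever LLM call returns a puzzle configuration as JSON, and `validate` is any automated check (a solver, say) that returns a list of problems. Feeding the validation errors back into the next prompt is an assumption of this sketch. Whatever passes comes back as a plain dictionary, the kind of configuration a designer could then edit by hand.

```python
import json
from typing import Callable, Optional


def generate_until_valid(
    prompt: str,
    generate: Callable[[str], str],         # hypothetical stand-in for an LLM call
    validate: Callable[[dict], list[str]],  # returns a list of problems; empty means OK
    max_attempts: int = 5,
) -> Optional[dict]:
    """Ask for a puzzle config, check it, and regenerate until it passes or we give up."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            config = json.loads(raw)
        except json.JSONDecodeError as err:
            feedback = f"\nYour last answer was not valid JSON ({err.msg}). Try again."
            continue
        if not isinstance(config, dict):
            feedback = "\nYour last answer must be a JSON object. Try again."
            continue
        problems = validate(config)
        if not problems:
            return config  # an editable config for the designer to tweak or playtest
        feedback = "\nYour last answer had problems: " + "; ".join(problems)
    return None  # hand the request back to a person instead of shipping a broken puzzle


def check_gears(config: dict) -> list[str]:
    """Hypothetical validator for a gear-train config like {"gears": [20, 10, 40]}."""
    teeth = config.get("gears")
    if not isinstance(teeth, list):
        return ["'gears' must be a list of teeth counts"]
    problems = []
    if len(teeth) < 2:
        problems.append("a gear train needs at least two gears")
    if not all(isinstance(t, int) and t > 0 for t in teeth):
        problems.append("teeth counts must be positive integers")
    return problems


# A fake model that fails twice before producing a valid config.
canned_replies = iter(['gears: 20, 10, 40', '{"gears": [20, 0]}', '{"gears": [20, 10, 40]}'])
config = generate_until_valid("Make a three-gear puzzle.", lambda p: next(canned_replies), check_gears)
print(config)  # {'gears': [20, 10, 40]}
```

Capping the number of attempts is the important design choice here: if the model keeps failing, the request falls back to a person rather than looping forever or publishing a puzzle that never passed the check.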

Once we have a level design that we’re happy with, we always want to make a few more practice sets just like it, so that users can make sure they’ve really mastered the idea before moving on. We have a different tool that makes variations on the puzzles while keeping the same learning objective.
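
A variant generator can be sketched in the same spirit. In this hypothetical example, which is not our actual tool, the "same learning objective" is modeled as a few invariants: every variant keeps the template's number of gears (so the output turns the same direction), has a whole-number answer, and never turns at the input speed, so the answer is not trivial. New teeth counts are sampled at random and kept only when a quick calculation confirms the invariants. The function names, the allowed teeth counts, and the invariants themselves are illustrative assumptions.

```python
import random
from fractions import Fraction


def output_rpm(teeth: tuple[int, ...], input_rpm: int) -> Fraction:
    """Speed of the last gear in a simple train; intermediate gears cancel out."""
    return Fraction(input_rpm) * teeth[0] / teeth[-1]


def make_variants(
    template: list[int],
    input_rpm: int,
    count: int,
    seed: int = 0,
    allowed_teeth: tuple[int, ...] = (8, 10, 12, 16, 20, 24, 30, 40),
    max_tries: int = 10_000,
) -> list[dict]:
    """Sample new teeth counts that keep the template's learning objective intact."""
    rng = random.Random(seed)  # seeded so a practice set is reproducible
    seen = {tuple(template)}
    variants = []
    for _ in range(max_tries):
        if len(variants) == count:
            break
        # Same gear count as the template, so the output direction is unchanged.
        candidate = tuple(rng.choice(allowed_teeth) for _ in template)
        if candidate in seen:
            continue
        rpm = output_rpm(candidate, input_rpm)
        if rpm.denominator != 1 or rpm == input_rpm:
            continue  # reject fractional answers and trivial 1:1 ratios
        seen.add(candidate)
        variants.append({"gears": list(candidate), "answer_rpm": int(rpm)})
    return variants


for variant in make_variants([20, 10, 40], input_rpm=60, count=3):
    print(variant)
```

Each accepted variant would still go to a designer for review; the invariants only mean that variants arrive already solvable and on-objective, so review can focus on whether each one teaches well.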

Our pipeline for generating variants of practice sets.

The AI handles the heavy lifting of implementation, but always under human creative direction. Every generated problem also goes through multiple rounds of human review for design and correctness. In learning, correctness isn’t just nice to have – it’s essential. A single wrong problem can shake a learner’s confidence or reinforce misconceptions.

These workflows have been game-changing for our authors. Asset configuration that used to take hours per lesson now happens in minutes, which frees them to spend time experimenting with different sequences, prototyping new ideas, and hunting for those Mario moments where learning happens through play.

Obsessed with the intersection of AI and games for learning? Join us.

We’re laser-focused on building a product where you learn by doing, not reading. An AI at Brilliant needs to speak to the learner in rich, interactive games – not text.

If you’re excited about using AI to make games for learning – we’d love to talk. We’re looking for engineers and learning designers who are passionate about building tools that make learning feel like play.

Try some of our latest games, like Solving Equations, Thinking in Code, Visual Algebra, or Scientific Thinking. Check out our open positions at brilliant.org/careers, and join us in reimagining what education can be in the age of AI.