Snake Game
Pathfinding vs. ML algorithms on the Snake Game
Project in a Nutshell
This interactive project turns the classic Snake arcade game into a research sandbox for graph search, real-time heuristics and safe-navigation techniques.
- Compare Random, BFS and A* (Hilbert Path).
- Visualise path-planning decisions live.
- Inspect complexity, connectivity filters & loop-breaking.
- Execute a pre-trained Q-Learning policy for immediate in-browser gameplay.
Click the demo card to run and switch agents on the fly ➡️
This is a demo of a fully functional project; the complete source code is available on my GitHub.
Interactive Snake Demo
Formal Problem Statement & Q-Table
We model Snake as a Markov Decision Process (MDP) \(\mathcal{M} = (S, A, P, R, \gamma)\) where:
- States \(S = \{0,\dots,127\}\): each state encodes the direction to the fruit (8 values) and 4 "danger" bits (walls/self-traps), so \(|S| = 8\times16 = 128\).
- Actions \(A = \{\uparrow,\rightarrow,\downarrow,\leftarrow\}\).
- Dynamics \(P(s' \mid s,a)\) is deterministic under the game rules and safety checks.
- Reward \(R(s,a)\): \(+1\) on eating fruit, \(0\) on normal move, and \(-1\) on collision.
- Discount factor \(\gamma = 0.95\).
For the default board, \(m = n = 10\), so \(|V| = m\times n = 100\) grid cells, \(|S|=8\times16=128\) encoded states, and \(|A|=4\) possible actions.
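To make the encoding concrete, here is a minimal sketch in TypeScript. The function name, bit layout, and direction ordering are illustrative assumptions rather than the project's exact code:

```typescript
// Hypothetical state encoder for the 128-state MDP described above:
// 8 fruit-direction sectors x 16 combinations of four "danger" bits.
type Danger = { up: boolean; right: boolean; down: boolean; left: boolean };

function encodeState(fruitDirection: number, danger: Danger): number {
  // fruitDirection is an index 0..7 (e.g. N, NE, E, SE, S, SW, W, NW).
  const dangerBits =
    (danger.up ? 1 : 0) |
    (danger.right ? 2 : 0) |
    (danger.down ? 4 : 0) |
    (danger.left ? 8 : 0);
  return fruitDirection * 16 + dangerBits; // value in 0..127
}
```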
After offline Q-learning, we obtain a table
\(Q: S\times A \to \mathbb{R}\) of shape (128, 4).
Each row of the table holds the four action values for one encoded state (e.g. state 0).
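The table itself is filled offline with the standard tabular Q-learning update; the learning rate \(\alpha\) is not specified in this summary, but the rewards and \(\gamma\) are those defined above:

\[
Q(s,a) \;\leftarrow\; Q(s,a) + \alpha\Bigl[R(s,a) + \gamma \max_{a'\in A} Q(s',a') - Q(s,a)\Bigr]
\]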
Algorithmic Pipeline
Random
Uniform legal move → baseline.
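A minimal sketch of the baseline, assuming a helper that lists the currently legal (non-colliding) moves:

```typescript
// Baseline: pick uniformly among the legal moves (0 = up, 1 = right,
// 2 = down, 3 = left). `legalMoves` is an assumed helper output, not
// part of the project's published API.
function randomAction(legalMoves: number[]): number {
  return legalMoves[Math.floor(Math.random() * legalMoves.length)];
}
```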
Breadth-First Search
Shortest path, but ignorant of future body movement.
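A self-contained sketch of the idea, assuming cells are indexed 0..99 row by row and the snake's body at planning time is passed as a blocked set (names and signatures are illustrative):

```typescript
// BFS from the head to the fruit on the 10x10 grid. The path is shortest at
// planning time but, as noted above, ignores how the body will move while
// the snake follows it.
function bfsPath(
  start: number,
  goal: number,
  blocked: Set<number>,
  width = 10,
  height = 10,
): number[] | null {
  const prev = new Map<number, number>();
  const queue: number[] = [start];
  const seen = new Set<number>([start]);
  while (queue.length > 0) {
    const cell = queue.shift()!;
    if (cell === goal) {
      // Walk the predecessor chain back to the start to rebuild the path.
      const path = [goal];
      let cur = goal;
      while (cur !== start) {
        cur = prev.get(cur)!;
        path.push(cur);
      }
      return path.reverse();
    }
    const x = cell % width;
    const y = Math.floor(cell / width);
    for (const [nx, ny] of [[x, y - 1], [x + 1, y], [x, y + 1], [x - 1, y]]) {
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      const next = ny * width + nx;
      if (seen.has(next) || blocked.has(next)) continue;
      seen.add(next);
      prev.set(next, cell);
      queue.push(next);
    }
  }
  return null; // fruit unreachable behind the body
}
```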
A* with Hilbert Heuristic
- Pre-computes a Hilbert curve path that covers the whole grid and biases the agent toward it when no better alternative is found (see the sketch below).
- Before committing to a path, it checks whether following it would trap the snake.
- A visited-state set breaks cyclic wandering.
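Two illustrative pieces of this agent follow; the exact weighting of the Hilbert term and the safety threshold used in the real project are assumptions in this sketch:

```typescript
// Heuristic: Manhattan distance plus a small bias toward the precomputed
// Hilbert ordering, scaled well below one move so it can only break ties and
// nudge the snake onto the space-filling path when alternatives are equal.
function heuristic(cell: number, goal: number, hilbertIndex: number[], width = 10): number {
  const dx = Math.abs((cell % width) - (goal % width));
  const dy = Math.abs(Math.floor(cell / width) - Math.floor(goal / width));
  return dx + dy + hilbertIndex[cell] * 1e-3;
}

// Trap check: flood-fill from the head position the snake would have after
// following a candidate path; if too little of the grid stays reachable,
// the path is rejected as self-trapping.
function isSafe(futureHead: number, futureBody: Set<number>, width = 10, height = 10): boolean {
  const reachable = new Set<number>([futureHead]);
  const stack = [futureHead];
  while (stack.length > 0) {
    const cell = stack.pop()!;
    const x = cell % width;
    const y = Math.floor(cell / width);
    for (const [nx, ny] of [[x, y - 1], [x + 1, y], [x, y + 1], [x - 1, y]]) {
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      const next = ny * width + nx;
      if (reachable.has(next) || futureBody.has(next)) continue;
      reachable.add(next);
      stack.push(next);
    }
  }
  // Assumed threshold: demand free space at least the size of the body.
  return reachable.size >= futureBody.size;
}
```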
Q-Learning
Uses the offline-trained Q-table to select \(\displaystyle a = \arg\max_{a'} Q(s,a')\) per state. Zero runtime search; pure table lookup.
Q-Learning Agent Definition
The Q-Learning agent uses the pre-trained Q-table to act:
- Observe current state \(s\in S\).
- Select action \(\displaystyle a = \arg\max_{a'\in A} Q(s,a')\).
- Execute \(a\) in the game loop; no further online updates.
This allows the browser to run the learned policy instantly, without any additional computation.
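As a sketch (assuming the Q-table is shipped to the browser as a plain 128 x 4 number array), the whole runtime policy reduces to:

```typescript
// Greedy policy over the pre-trained table: argmax over the four action
// values of the current state. No search, no learning, O(|A|) per step.
type Action = 0 | 1 | 2 | 3; // up, right, down, left

function greedyAction(qTable: number[][], state: number): Action {
  const row = qTable[state];
  let best: Action = 0;
  for (let a = 1; a < row.length; a++) {
    if (row[a] > row[best]) best = a as Action;
  }
  return best;
}
```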
Complexity & Performance
| Agent | Time/step | Space |
|---|---|---|
| Random | \( \Theta(1) \) | \( \Theta(1) \) |
| BFS | \( \Theta(|V|) \) | \( \Theta(|V|) \) |
| A* | \( \Theta(|V|\log|V|) \) | \( \Theta(|V|) \) |
| Q-Learning | \( \Theta(1) \) | \( \Theta(|S|\times|A|) \) |
On a 10 × 10 board A* expands ≤ 30 nodes on average, so the demo stays above 60 FPS.
Key Skills Demonstrated
- Formal modelling of grid MDP.
- Heuristic search & Hilbert tie-breaking.
- Connectivity-based safety analysis.
- Real-time Canvas rendering.
- Responsive frosted-glass UI.