Snake Game
Pathfinding vs. ML algorithms on the Snake Game
Project in a Nutshell
This interactive project turns the classic Snake arcade game into a research sandbox for graph search, real-time heuristics and safe-navigation techniques.
- Compare Random, BFS and A* (Hilbert Path).
- Visualise path-planning decisions live.
- Inspect complexity, connectivity filters & loop-breaking.
- Execute a pre-trained Q-Learning policy for immediate in-browser gameplay.
Click the demo card to run and switch agents on the fly ➡️
This is a demo of a fully functional project; the complete source code is available on my GitHub.
Interactive Snake Demo
Formal Problem Statement & Q-Table
We model Snake as a Markov Decision Process (MDP) \(\mathcal{M} = (S, A, P, R, \gamma)\) where:
- States \(S = \{0,\dots,127\}\): each state encodes the direction to the fruit (8 values) and 4 "danger" bits (walls/self-traps), so \(|S| = 8\times16 = 128\).
- Actions \(A = \{\uparrow,\rightarrow,\downarrow,\leftarrow\}\).
- Dynamics \(P(s' \mid s,a)\) is deterministic under the game rules and safety checks.
- Reward \(R(s,a)\): \(+1\) on eating fruit, \(0\) on normal move, and \(-1\) on collision.
- Discount factor \(\gamma = 0.95\).
For the default board, \(m = n = 10\), so \(|V| = m\times n = 100\) grid cells, \(|S|=8\times16=128\) encoded states, and \(|A|=4\) possible actions.
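To make the encoding concrete, here is a minimal sketch in TypeScript. The function name, bit layout, and direction ordering are illustrative assumptions rather than the project's exact code:

```typescript
// Hypothetical state encoder for the 128-state MDP described above:
// 8 fruit-direction sectors x 16 combinations of four "danger" bits.
type Danger = { up: boolean; right: boolean; down: boolean; left: boolean };

function encodeState(fruitDirection: number, danger: Danger): number {
  // fruitDirection is an index 0..7 (e.g. N, NE, E, SE, S, SW, W, NW).
  const dangerBits =
    (danger.up ? 1 : 0) |
    (danger.right ? 2 : 0) |
    (danger.down ? 4 : 0) |
    (danger.left ? 8 : 0);
  return fruitDirection * 16 + dangerBits; // value in 0..127
}
```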
After offline Q-learning, we obtain a table
\(Q: S\times A \to \mathbb{R}\) of shape (128, 4).
Each row of the table holds the four action values for one encoded state (e.g. state 0).
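The table itself is filled offline with the standard tabular Q-learning update; the learning rate \(\alpha\) is not specified in this summary, but the rewards and \(\gamma\) are those defined above:

\[
Q(s,a) \;\leftarrow\; Q(s,a) + \alpha\Bigl[R(s,a) + \gamma \max_{a'\in A} Q(s',a') - Q(s,a)\Bigr]
\]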
Algorithmic Pipeline
Random
Uniform legal move → baseline.
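A minimal sketch of the baseline, assuming a helper that lists the currently legal (non-colliding) moves:

```typescript
// Baseline: pick uniformly among the legal moves (0 = up, 1 = right,
// 2 = down, 3 = left). `legalMoves` is an assumed helper output, not
// part of the project's published API.
function randomAction(legalMoves: number[]): number {
  return legalMoves[Math.floor(Math.random() * legalMoves.length)];
}
```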
Breadth-First Search
Shortest path, but ignorant of future body movement.
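A self-contained sketch of the idea, assuming cells are indexed 0..99 row by row and the snake's body at planning time is passed as a blocked set (names and signatures are illustrative):

```typescript
// BFS from the head to the fruit on the 10x10 grid. The path is shortest at
// planning time but, as noted above, ignores how the body will move while
// the snake follows it.
function bfsPath(
  start: number,
  goal: number,
  blocked: Set<number>,
  width = 10,
  height = 10,
): number[] | null {
  const prev = new Map<number, number>();
  const queue: number[] = [start];
  const seen = new Set<number>([start]);
  while (queue.length > 0) {
    const cell = queue.shift()!;
    if (cell === goal) {
      // Walk the predecessor chain back to the start to rebuild the path.
      const path = [goal];
      let cur = goal;
      while (cur !== start) {
        cur = prev.get(cur)!;
        path.push(cur);
      }
      return path.reverse();
    }
    const x = cell % width;
    const y = Math.floor(cell / width);
    for (const [nx, ny] of [[x, y - 1], [x + 1, y], [x, y + 1], [x - 1, y]]) {
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      const next = ny * width + nx;
      if (seen.has(next) || blocked.has(next)) continue;
      seen.add(next);
      prev.set(next, cell);
      queue.push(next);
    }
  }
  return null; // fruit unreachable behind the body
}
```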
A* with Hilbert Heuristic
- Pre-computes a Hilbert curve path that covers the whole grid and biases the agent toward it when no better alternative is found (see the sketch below).
- Before committing to a path, it checks whether following it would trap the snake.
- A visited-state set breaks cyclic wandering.
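Two illustrative pieces of this agent follow; the exact weighting of the Hilbert term and the safety threshold used in the real project are assumptions in this sketch:

```typescript
// Heuristic: Manhattan distance plus a small bias toward the precomputed
// Hilbert ordering, scaled well below one move so it can only break ties and
// nudge the snake onto the space-filling path when alternatives are equal.
function heuristic(cell: number, goal: number, hilbertIndex: number[], width = 10): number {
  const dx = Math.abs((cell % width) - (goal % width));
  const dy = Math.abs(Math.floor(cell / width) - Math.floor(goal / width));
  return dx + dy + hilbertIndex[cell] * 1e-3;
}

// Trap check: flood-fill from the head position the snake would have after
// following a candidate path; if too little of the grid stays reachable,
// the path is rejected as self-trapping.
function isSafe(futureHead: number, futureBody: Set<number>, width = 10, height = 10): boolean {
  const reachable = new Set<number>([futureHead]);
  const stack = [futureHead];
  while (stack.length > 0) {
    const cell = stack.pop()!;
    const x = cell % width;
    const y = Math.floor(cell / width);
    for (const [nx, ny] of [[x, y - 1], [x + 1, y], [x, y + 1], [x - 1, y]]) {
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      const next = ny * width + nx;
      if (reachable.has(next) || futureBody.has(next)) continue;
      reachable.add(next);
      stack.push(next);
    }
  }
  // Assumed threshold: demand free space at least the size of the body.
  return reachable.size >= futureBody.size;
}
```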
Q-Learning
Uses the offline-trained Q-table to select \(\displaystyle a = \arg\max_{a'} Q(s,a')\) per state. Zero runtime search; pure table lookup.
Q-Learning Agent Definition
The Q-Learning agent uses the pre-trained Q-table to act:
- Observe current state \(s\in S\).
- Select action \(\displaystyle a = \arg\max_{a'\in A} Q(s,a')\).
- Execute \(a\) in the game loop; no further online updates.
This allows the browser to run the learned policy instantly, without any additional computation.
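As a sketch (assuming the Q-table is shipped to the browser as a plain 128 x 4 number array), the whole runtime policy reduces to:

```typescript
// Greedy policy over the pre-trained table: argmax over the four action
// values of the current state. No search, no learning, O(|A|) per step.
type Action = 0 | 1 | 2 | 3; // up, right, down, left

function greedyAction(qTable: number[][], state: number): Action {
  const row = qTable[state];
  let best: Action = 0;
  for (let a = 1; a < row.length; a++) {
    if (row[a] > row[best]) best = a as Action;
  }
  return best;
}
```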
Complexity & Performance
| Agent | Time/step | Space |
|---|---|---|
| Random | \( \Theta(1) \) | \( \Theta(1) \) |
| BFS | \( \Theta(|V|) \) | \( \Theta(|V|) \) |
| A* | \( \Theta(|V|\log|V|) \) | \( \Theta(|V|) \) |
| Q-Learning | \( \Theta(1) \) | \( \Theta(|S|\times|A|) \) |
On a 10 × 10 board A* expands ≤ 30 nodes on average, so the demo stays above 60 FPS.
Key Skills Demonstrated
- Formal modelling of grid MDP.
- Heuristic search & Hilbert tie-breaking.
- Connectivity-based safety analysis.
- Real-time Canvas rendering.
- Responsive frosted-glass UI.