Snake Game

Pathfinding vs. ML algorithms on the classic Snake game

Project in a Nutshell

This interactive project turns the classic Snake arcade game into a research sandbox for graph search, real-time heuristics and safe-navigation techniques.

  • Compare Random, BFS and A* (Hilbert Path).
  • Visualise path-planning decisions live.
  • Inspect complexity, connectivity filters & loop-breaking.
  • Execute a pre-trained Q-Learning policy for immediate in-browser gameplay.

Click the demo card to run and switch agents on the fly ⚑️

This is a demo of a fully functional project that you can find on my GitHub.

Interactive Snake Demo


Formal Problem Statement & Q-Table

We model Snake as a Markov Decision Process (MDP) \(\mathcal{M} = (S, A, P, R, \gamma)\) where:

  • States \(S = \{0,\dots,127\}\): each state encodes the direction to the fruit (8 values) and 4 “danger” bits for walls/self-traps (\(2^4 = 16\) combinations), so \(|S| = 8\times16 = 128\) (see the encoding sketch after this list).
  • Actions \(A = \{\uparrow,\rightarrow,\downarrow,\leftarrow\}\).
  • Dynamics \(P(s' \mid s,a)\) is deterministic under the game rules and safety checks.
  • Reward \(R(s,a)\): \(+1\) on eating fruit, \(0\) on normal move, and \(-1\) on collision.
  • Discount factor \(\gamma = 0.95\).
  • Board size: \(m=10\) and \(n=10\), so \(|V| = m\times n = 100\) grid cells, \(|S| = 8\times16 = 128\) encoded states, and \(|A| = 4\) possible actions.
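
As a rough illustration, the state index can be assembled from a 3-bit fruit direction and four danger bits. The sketch below assumes this layout; the helper names (`fruitDirection`, `dangerBits`, `encodeState`) and the exact bit ordering are illustrative, not necessarily what the repository uses.

```ts
// Illustrative state encoder for the layout above: 8 fruit directions x 16
// danger-bit combinations = 128 states. Helper names and bit ordering are
// assumptions, not necessarily what the repository uses.
type Point = { x: number; y: number };

// Map the sign of (fruit - head) to one of 8 compass directions (0..7).
function fruitDirection(head: Point, fruit: Point): number {
  const dx = Math.sign(fruit.x - head.x); // -1, 0 or 1
  const dy = Math.sign(fruit.y - head.y);
  const dirs = [
    [0, -1], [1, -1], [1, 0], [1, 1],
    [0, 1], [-1, 1], [-1, 0], [-1, -1],
  ];
  const i = dirs.findIndex(([x, y]) => x === dx && y === dy);
  return i === -1 ? 0 : i; // fruit on the head: fall back to direction 0
}

// One bit per orthogonal neighbour: set if moving there hits a wall or the body.
function dangerBits(head: Point, blocked: (p: Point) => boolean): number {
  const neighbours: Point[] = [
    { x: head.x, y: head.y - 1 }, // up
    { x: head.x + 1, y: head.y }, // right
    { x: head.x, y: head.y + 1 }, // down
    { x: head.x - 1, y: head.y }, // left
  ];
  return neighbours.reduce((bits, p, i) => bits | (Number(blocked(p)) << i), 0);
}

// Final index in 0..127: direction in the high bits, danger bits in the low bits.
function encodeState(head: Point, fruit: Point, blocked: (p: Point) => boolean): number {
  return fruitDirection(head, fruit) * 16 + dangerBits(head, blocked);
}
```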

After offline Q-learning, we obtain a table \(Q: S\times A \to \mathbb{R}\) of shape (128, 4), i.e. one row of four action values per encoded state; reading a sample row (state 0) is sketched below.
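
Being only 128 × 4 numbers, the table ships with the page. One possible in-browser representation is a flat `Float32Array` indexed row-major; the file name `q_table.json` and the loader below are assumptions for the sketch, not necessarily the repository's actual format.

```ts
// Sketch: the (128, 4) table stored row-major in a flat Float32Array, so that
// qTable[s * 4 + a] = Q(s, a). The file name "q_table.json" and the JSON
// layout (an array of 128 rows of 4 numbers) are assumptions.
const NUM_STATES = 128;
const NUM_ACTIONS = 4;

async function loadQTable(url = "q_table.json"): Promise<Float32Array> {
  const rows: number[][] = await (await fetch(url)).json();
  const table = new Float32Array(NUM_STATES * NUM_ACTIONS);
  rows.forEach((row, s) => row.forEach((q, a) => { table[s * NUM_ACTIONS + a] = q; }));
  return table;
}

// Read one row, e.g. the four action values of state 0.
function qRow(table: Float32Array, state: number): Float32Array {
  return table.subarray(state * NUM_ACTIONS, (state + 1) * NUM_ACTIONS);
}
```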


Algorithmic Pipeline

Random

Picks a uniformly random legal move each tick; serves as the baseline.
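
A minimal sketch of this baseline, assuming a `legalMoves` helper that lists the actions not blocked by a wall or the body:

```ts
// Sketch of the random baseline. `legalMoves` is an assumed helper returning
// the actions that do not immediately hit a wall or the body.
type Action = 0 | 1 | 2 | 3; // up, right, down, left

function randomAgent(legalMoves: Action[]): Action {
  if (legalMoves.length === 0) return 0; // boxed in: any move loses anyway
  return legalMoves[Math.floor(Math.random() * legalMoves.length)];
}
```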

Breadth-First Search

Computes the shortest path to the fruit, but is ignorant of how the body will move while the path is being followed.
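
A sketch of the per-tick BFS, assuming a `blocked` predicate that marks walls and the snake's current body cells (helper names are illustrative):

```ts
// Sketch: BFS from the head to the fruit on a width x height grid.
// Returns the path (excluding the start cell) or null if unreachable.
type Cell = { x: number; y: number };

function bfsPath(
  start: Cell,
  goal: Cell,
  width: number,
  height: number,
  blocked: (c: Cell) => boolean,
): Cell[] | null {
  const key = (c: Cell) => c.y * width + c.x;
  const parent = new Map<number, number>(); // child key -> parent key
  const queue: Cell[] = [start];
  const seen = new Set<number>([key(start)]);

  while (queue.length > 0) {
    const cur = queue.shift()!;
    if (cur.x === goal.x && cur.y === goal.y) {
      // Reconstruct the path by walking parent pointers back to the start.
      const path: Cell[] = [];
      let k = key(cur);
      while (k !== key(start)) {
        path.push({ x: k % width, y: Math.floor(k / width) });
        k = parent.get(k)!;
      }
      return path.reverse();
    }
    const neighbours: Cell[] = [
      { x: cur.x, y: cur.y - 1 },
      { x: cur.x + 1, y: cur.y },
      { x: cur.x, y: cur.y + 1 },
      { x: cur.x - 1, y: cur.y },
    ];
    for (const n of neighbours) {
      const inside = n.x >= 0 && n.x < width && n.y >= 0 && n.y < height;
      if (!inside || blocked(n) || seen.has(key(n))) continue;
      seen.add(key(n));
      parent.set(key(n), key(cur));
      queue.push(n);
    }
  }
  return null; // fruit unreachable this tick
}
```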

A* with Hilbert Heuristic

  • Pre-computes a Hilbert curve that fills the whole board and incentivises the agent to follow it when no better alternative is found (see the sketch after this list).
  • Before committing to a path, checks whether following it would trap the snake.
  • A visited-state set breaks cyclic wandering.
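
One way to realise the Hilbert bias and the trap check described above is sketched below: a standard A* priority with a small Hilbert-order tie-break, plus a flood-fill connectivity count used as the safety filter. The weighting and the helper names (`hilbertIndex`, `reachableCells`) are assumptions, not necessarily what the repository does.

```ts
// Sketch: A* priority that prefers short paths to the fruit but, among equally
// good candidates, follows the pre-computed Hilbert (space-filling) ordering.
// `hilbertIndex` is assumed to map a cell key to its position on the curve.
type GridCell = { x: number; y: number };

function manhattan(a: GridCell, b: GridCell): number {
  return Math.abs(a.x - b.x) + Math.abs(a.y - b.y);
}

// f(n) = g(n) + h(n) with a tiny Hilbert-order term as a tie-breaker.
function priority(
  g: number,                        // cost from the head to this cell
  cell: GridCell,
  fruit: GridCell,
  hilbertIndex: Map<number, number>,
  width: number,
): number {
  const h = manhattan(cell, fruit);
  const tieBreak = (hilbertIndex.get(cell.y * width + cell.x) ?? 0) * 1e-3;
  return g + h + tieBreak;
}

// Connectivity-based safety filter (sketch): after simulating a candidate
// path, count how many free cells the head could still reach via flood fill.
function reachableCells(
  from: GridCell,
  width: number,
  height: number,
  blocked: (c: GridCell) => boolean,
): number {
  const key = (c: GridCell) => c.y * width + c.x;
  const stack: GridCell[] = [from];
  const seen = new Set<number>([key(from)]);
  while (stack.length > 0) {
    const cur = stack.pop()!;
    const neighbours: GridCell[] = [
      { x: cur.x, y: cur.y - 1 }, { x: cur.x + 1, y: cur.y },
      { x: cur.x, y: cur.y + 1 }, { x: cur.x - 1, y: cur.y },
    ];
    for (const n of neighbours) {
      const inside = n.x >= 0 && n.x < width && n.y >= 0 && n.y < height;
      if (inside && !blocked(n) && !seen.has(key(n))) {
        seen.add(key(n));
        stack.push(n);
      }
    }
  }
  return seen.size;
}
```
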
Q-Learning

Uses the offline-trained Q-table to select \(\displaystyle a = \arg\max_{a'} Q(s,a')\) per state. Zero runtime search: pure table lookup.

Q-Learning Agent Definition

The Q-Learning agent uses the pre-trained Q-table to act:

  1. Observe current state \(s\in S\).
  2. Select action \(\displaystyle a = \arg\max_{a'\in A} Q(s,a')\).
  3. Execute \(a\) in the game loop; no further online updates.

This allows the browser to run the learned policy instantly, without any additional computation.
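
Concretely, each tick reduces to an argmax over one row of the table. A sketch, reusing the flat (128, 4) layout assumed earlier; the `snake.turn` call is illustrative:

```ts
// Sketch: greedy action from the pre-trained table, using the flat row-major
// layout assumed earlier. O(1) per tick, no search and no learning updates.
function greedyAction(qTable: Float32Array, state: number): number {
  const base = state * 4;
  let best = 0;
  for (let a = 1; a < 4; a++) {
    if (qTable[base + a] > qTable[base + best]) best = a;
  }
  return best; // 0..3 -> up, right, down, left
}

// Inside the game loop (illustrative):
// const s = encodeState(head, fruit, blocked);
// snake.turn(greedyAction(qTable, s));
```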

Complexity & Performance

Agent        Time per step               Space
Random       \( \Theta(1) \)             \( \Theta(1) \)
BFS          \( \Theta(|V|) \)           \( \Theta(|V|) \)
A*           \( \Theta(|V|\log|V|) \)    \( \Theta(|V|) \)
Q-Learning   \( \Theta(1) \)             \( \Theta(|S|\times|A|) \)

On a 10 × 10 board, A* expands ≤ 30 nodes on average, so the demo stays above 60 FPS.

Key Skills Demonstrated

  • Formal modelling of a grid-world MDP.
  • Heuristic search & Hilbert tie-breaking.
  • Connectivity-based safety analysis.
  • Real-time Canvas rendering.
  • Responsive frosted-glass UI.