JAX experiments on tiny character-level language models with clean baselines.
## Goal
Study how capacity and training choices affect small character-level language models, using a consistent pipeline and reproducible baselines.
## My Contributions
- End-to-end script - Built a single runner to execute all experiments and log results
- Data pipeline - Implemented fixed-length context/target sampling from a character corpus
- Report - Produced plots, timing tables, and sample generations in the project report
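The fixed-length context/target sampling can be sketched as follows. This is a minimal NumPy version under assumed conventions (the repository's actual function names and encoding are not shown here): `corpus_ids` is a 1-D integer array of character ids, and the target for each window is the character immediately after it.

```python
import numpy as np

def sample_batch(corpus_ids, context_len, batch_size, rng):
    # Draw random start offsets, cut fixed-length context windows,
    # and take the character right after each window as the target.
    starts = rng.integers(0, len(corpus_ids) - context_len, size=batch_size)
    contexts = np.stack([corpus_ids[s:s + context_len] for s in starts])
    targets = corpus_ids[starts + context_len]
    return contexts, targets  # shapes: (batch, context_len), (batch,)
```

Keeping every example the same length makes batching trivial and lets all models share one data path.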
## Experiments
- Regression warm-up - Quadratic fit with signed gradient descent
- Baselines - Constant and linear context models with cross-entropy loss
- Nonlinear models - Single-hidden-layer MLP and a two-layer ReLU variant
- Generation - Sampled text from each model to compare structure and coherence
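The regression warm-up can be sketched as below: fitting a quadratic with signed gradient descent, where each parameter moves a fixed step in the direction opposite its gradient's sign. The target coefficients and learning rate here are illustrative, not the report's actual settings.

```python
import jax
import jax.numpy as jnp

def mse(params, x, y):
    a, b, c = params
    return jnp.mean((a * x**2 + b * x + c - y) ** 2)

@jax.jit
def signed_gd_step(params, x, y, lr=1e-2):
    # Signed GD: step size is fixed at lr; only the sign of each
    # gradient component matters, not its magnitude.
    grads = jax.grad(mse)(params, x, y)
    return tuple(p - lr * jnp.sign(g) for p, g in zip(params, grads))

x = jnp.linspace(-1.0, 1.0, 64)
y = 2.0 * x**2 - 1.0 * x + 0.5          # hypothetical target quadratic
params = (jnp.zeros(()), jnp.zeros(()), jnp.zeros(()))
for _ in range(1000):
    params = signed_gd_step(params, x, y)
```

Because the step size never shrinks, the parameters converge to a small oscillation around the optimum rather than to it exactly, which is part of what makes this a useful warm-up.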
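The linear context baseline with cross-entropy loss might look like the following sketch. The vocabulary size, context length, and function names (`logits_fn`, `xent`) are assumptions for illustration: each context position is one-hot encoded, flattened, and mapped to next-character logits by a single affine layer.

```python
import jax
import jax.numpy as jnp

VOCAB = 27      # hypothetical: 26 letters + space
CONTEXT = 4     # hypothetical context length

def init_params(key):
    W = jax.random.normal(key, (CONTEXT * VOCAB, VOCAB)) * 0.01
    b = jnp.zeros(VOCAB)
    return W, b

def logits_fn(params, contexts):
    # One-hot each context position, flatten, apply one affine map.
    W, b = params
    x = jax.nn.one_hot(contexts, VOCAB).reshape(contexts.shape[0], -1)
    return x @ W + b

def xent(params, contexts, targets):
    # Mean cross-entropy of the true next character under the model.
    logp = jax.nn.log_softmax(logits_fn(params, contexts))
    return -jnp.mean(jnp.take_along_axis(logp, targets[:, None], axis=1))
```

A useful sanity check: with all-zero parameters the loss equals log(VOCAB), the entropy of a uniform guess, so any trained model should beat that number.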
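The generation step can be sketched as a simple autoregressive loop that works with any of the models above. `logits_fn` here stands in for whichever model is being sampled (an assumption, not the report's interface), and the prompt is assumed to be at least `context_len` ids long.

```python
import jax
import jax.numpy as jnp

def generate(params, logits_fn, prompt_ids, n_steps, key, context_len):
    # Autoregressive sampling: feed the last `context_len` ids to the
    # model, draw the next id from its categorical distribution, repeat.
    ids = list(prompt_ids)
    for _ in range(n_steps):
        ctx = jnp.array(ids[-context_len:])[None, :]
        key, sub = jax.random.split(key)
        nxt = jax.random.categorical(sub, logits_fn(params, ctx)[0])
        ids.append(int(nxt))
    return ids
```

Running the same loop over every model makes the qualitative comparison direct: only the logits function changes, never the sampling code.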
## Technical Stack
- Python, Jupyter - Experimentation and tooling
- JAX - Model training
- NumPy, Matplotlib - Data handling and plots