Overview#

Introduction to Interpretability for Language Models is a three-week crash course on interpretability and language modeling. The course starts with an introduction to Python, moves on to language models in natural language processing, and ends with a week on large language models like BERT and the GPT series.

Syllabus#

| Day | Title | Topics |
|-----|-------|--------|
| 1 | Getting Started | terminal, environment management, Python introduction |
| 2 | Python Basics | control flow, data structures, functions |
| 3 | Data Analysis in Python | tabular data, plotting, corpus analytics |
| 4 | N-gram Models | n-grams, probability models, sampling |
| 5 | Vectorization | the document-term matrix, weighting, classification |
| 6 | Vector Space Semantics | semantic spaces, vector operations, static embeddings |
| 7 | Introduction to LLMs | subwords, model architectures, dynamic embeddings |
| 8 | BERT | fine-tuning, model evaluation, SHAP values |
| 9 | GPT | next-token prediction, reverse engineering, activation patching |

Data#

As of this writing (July 2024), a zipped data directory for the course, dtl_2024.zip, may be found at tylershoemaker.info/data. Download this file, move it to the directory on your computer where you'll be working, and unzip it.
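If you prefer to do this from Python, the following is a minimal sketch using only the standard library. It assumes the archive is served at https://tylershoemaker.info/data/dtl_2024.zip; the exact URL may differ, so adjust it to match the download link on that page.

```python
# Sketch: fetch and unpack the course data with the Python standard library.
# The full URL below is an assumption based on the page linked above.
from pathlib import Path
from urllib.request import urlretrieve
import zipfile

url = "https://tylershoemaker.info/data/dtl_2024.zip"  # assumed full URL
dest = Path("dtl_2024.zip")

urlretrieve(url, dest)        # download the archive to the working directory
with zipfile.ZipFile(dest) as zf:
    zf.extractall(".")        # unzip it in place
```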