diff --git a/Makefile b/Makefile
index a9bda5d9718bb657895c38729082cd1586570439..89778a3c5d7bfc9da0dd3388da0471e70df59ccd 100644
--- a/Makefile
+++ b/Makefile
@@ -14,7 +14,7 @@ chibicc: $(OBJS)
 $(OBJS): chibicc.h
 
 test/%.exe: chibicc test/%.c
-	./chibicc -Itest -c -o test/$*.o test/$*.c
+	./chibicc -Iinclude -Itest -c -o test/$*.o test/$*.c
 	$(CC) -o $@ test/$*.o -xc test/common
 
 test: $(TESTS)
@@ -28,10 +28,9 @@ test-all: test test-stage2
 stage2/chibicc: $(OBJS:%=stage2/%)
 	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
 
-stage2/%.o: chibicc self.py %.c
+stage2/%.o: chibicc %.c
 	mkdir -p stage2/test
-	./self.py chibicc.h $*.c > stage2/$*.c
-	./chibicc -c -o stage2/$*.o stage2/$*.c
+	./chibicc -c -o $(@D)/$*.o $*.c
 
 stage2/test/%.exe: stage2/chibicc test/%.c
 	mkdir -p stage2/test
diff --git a/README.md b/README.md
index 60eb40476584c58cd1d5e659b634d1b3d5a7ebea..a02d7f468e2e179eb81aa9015cddc06ba352fc86 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,123 @@
-This is the reference implementation of https://www.sigbus.info/compilerbook.
+# chibicc: A Teaching C Compiler
+
+chibicc is a C compiler for educational purposes. I wrote it with the
+following goals in mind:
+
+- Simple: The compiler should be as simple as possible to help the
+  reader understand how it works.
+
+- Small: The compiler should be small enough to be covered in a
+  semester.
+
+- Demonstrating an incremental approach: It's git history should start
+  from a minimal compiler implementation, and the compiler should gain
+  one feature at a time with a small incremental patch. That should
+  help the reader understand how to write a large program from
+  scratch, which requires a different kind of skill set than writing a
+  patch for an existing large project.
+
+- Correctness: It should correctly capture the semantics of the major
+  but obscure C language features, such as "usual arithmetic
+  conversion" or "arrays decay into pointers".
+
+- Completeness: While the compiler doesn't have to support all C
+  language features, it should be able to compile nontrivial programs
+  including itself.
+
+I believe all the above goals are met. chibicc's source code is in my
+opinion small and pretty easy to read, and not only the current state
+of the code but _every commit_ was written with readability in
+mind. The first commit is a minimalistic compiler that compiles an
+integer to a program that exits with the given number as the exit
+code. Then I added operators (e.g. `+` or `-`), local variable,
+control structures (e.g. `if` or `while`), function call, function
+definition, global variable, and other language features one at a
+time. As the compiler gained features with a series of small patches,
+the language the compiler accepts looked more and more like the real C
+language.
+
+When I found a bug in a previous commit, I edited the commit by
+rewriting the git history instead of creating a new commit. This is an
+unusual and undesirable development style for most projects, but for
+my purpose, keeping clean commit history is more important than
+avoiding git forced-pushes.
+
+chibicc's internal design was carefully chosen to naturally support
+the core C language semantics. It supports many C language features
+including the preprocessor. chibicc is written in C and can compile
+itself. I didn't try to avoid certain C features when writing this
+compiler for ease of self-hosting, so I can say that it can compile at
+least one ordinary C program.
+
+Being said that, there are many missing features. They are left as an
+exercise for the reader.
+
+## Internals
+
+chibicc consists of the following stages:
+
+- Tokenize: A tokenizer takes a string as an input, breaks it into
+  a list of tokens and returns them.
+
+- Preprocess: A preprocessor takes as an input a list of tokens and
+  output a new list of macro-expanded tokens. It interprets
+  preprocessor directives while expanding macros.
+
+- Parse: A recursive descendent parser constructs abstract syntax trees
+  from the output of the preprocessor. It also adds a type to each
+  AST node.
+
+- Codegen: A code generator emits an assembly text for given AST nodes.
+
+Currently, there's no optimization pass, but there's a plan to add one
+to elimnate obvious inefficiencies in the chibicc's output.
+
+Note that chibicc allocates memory using malloc() but never calls
+free(). Once memory is allocated, it won't be released until the
+process exits.  This may look like an odd design choice, and perhaps
+it is, but in practice this memory management policy (or the lack of
+thereof) works well for short-lived programs like chibicc. This design
+eliminates all scaffolding and complexity of manual memory management
+and makes the compiler much simpler than it would otherwise have been.
+If the memory consumption becomes a real issue, you can plug in [Boehm
+GC](https://en.wikipedia.org/wiki/Boehm_garbage_collector) for
+automatic memory management.
+
+## Book
+
+I'm writing an online book about the C compiler. The draft is
+available at https://www.sigbus.info/compilerbook, though currently it
+is in Japanese. I have a plan to translate to English once it's
+complete.
+
+## About the Author
+
+I'm Rui Ueyama. I'm the creator of [8cc](https://github.com/rui314/8cc),
+which is a hobby C compiler, and also the original creator of the
+current version of [LLVM lld](https://lld.llvm.org) linker, which is a
+production-quality linker used by various operating systems and
+large-scale build systems.
+
+## References
+
+- [tcc](https://bellard.org/tcc/): A small C compiler written by
+  Fabrice Bellard. I learned a lot from this compiler, but the design
+  of tcc and chibicc are largely different. In particular, tcc is a
+  one-pass compiler, while chibicc is a multi-pass one.
+
+- [lcc](https://github.com/drh/lcc): Another small C compiler. The
+  creators wrote a
+  [book](https://sites.google.com/site/lccretargetablecompiler/) about
+  the internals of lcc, which I found a good resource to see how a
+  compiler is implemented.
+
+- [An Incremental Approach to Compiler
+  Construction](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf)
+
+## Project ideas
+
+- Add missing features
+- Port to different ISAs such as RISC-V
+- Rewrite the compiler in a different language than C
+- Use LLVM as a backend
+- Add optimization passes
diff --git a/self.py b/self.py
deleted file mode 100755
index 919985757b2d5713e42ee79b852a47e62cb4df15..0000000000000000000000000000000000000000
--- a/self.py
+++ /dev/null
@@ -1,109 +0,0 @@
-#!/usr/bin/python3
-import re
-import sys
-
-print("""
-typedef signed char int8_t;
-typedef short int16_t;
-typedef int int32_t;
-typedef long int64_t;
-
-typedef unsigned char uint8_t;
-typedef unsigned short uint16_t;
-typedef unsigned int uint32_t;
-typedef unsigned long uint64_t;
-
-typedef unsigned long size_t;
-
-typedef struct FILE FILE;
-extern FILE *stdin;
-extern FILE *stdout;
-extern FILE *stderr;
-
-typedef struct {
-  int gp_offset;
-  int fp_offset;
-  void *overflow_arg_area;
-  void *reg_save_area;
-} __va_elem;
-
-typedef __va_elem va_list[1];
-
-struct stat {
-  char _[512];
-};
-
-void *malloc(long size);
-void *calloc(long nmemb, long size);
-void *realloc(void *buf, long size);
-int *__errno_location();
-char *strerror(int errnum);
-FILE *fopen(char *pathname, char *mode);
-long fread(void *ptr, long size, long nmemb, FILE *stream);
-int fclose(FILE *fp);
-int feof(FILE *stream);
-static void assert() {}
-int strcmp(char *s1, char *s2);
-int strncasecmp(char *s1, char *s2);
-int printf(char *fmt, ...);
-int sprintf(char *buf, char *fmt, ...);
-int fprintf(FILE *fp, char *fmt, ...);
-int vfprintf(FILE *fp, char *fmt, va_list ap);
-long strlen(char *p);
-int strncmp(char *p, char *q);
-void *memcpy(char *dst, char *src, long n);
-char *strdup(char *p);
-char *strndup(char *p, long n);
-char *strdup(char *p);
-int isspace(int c);
-int ispunct(int c);
-int isdigit(int c);
-int isxdigit(int c);
-char *strstr(char *haystack, char *needle);
-char *strchr(char *s, int c);
-double strtod(char *nptr, char **endptr);
-static void va_end(va_list ap) {}
-long strtoul(char *nptr, char **endptr, int base);
-void exit(int code);
-char *basename(char *path);
-char *strrchr(char *s, int c);
-int unlink(char *pathname);
-int mkstemp(char *template);
-int close(int fd);
-int fork(void);
-int execvp(char *file, char **argv);
-void _exit(int code);
-int wait(int *wstatus);
-int atexit(void (*)(void));
-FILE *open_memstream(char **ptr, size_t *sizeloc);
-char *dirname(char *path);
-char *strncpy(char *dest, char *src, long n);
-int stat(char *pathname, struct stat *statbuf);
-int stat(char *pathname, struct stat *statbuf);
-char *dirname(char *path);
-char *basename(char *path);
-char *strrchr(char *s, int c);
-int unlink(char *pathname);
-int mkstemp(char *template);
-int close(int fd);
-int fork(void);
-int execvp(char *file, char **argv);
-void _exit(int code);
-int wait(int *wstatus);
-int atexit(void (*)(void));
-""")
-
-for path in sys.argv[1:]:
-    with open(path) as file:
-        s = file.read()
-        s = re.sub(r'\\\n', '', s)
-        s = re.sub(r'^\s*#.*', '', s, flags=re.MULTILINE)
-        s = re.sub(r'\bbool\b', '_Bool', s)
-        s = re.sub(r'\berrno\b', '*__errno_location()', s)
-        s = re.sub(r'\btrue\b', '1', s)
-        s = re.sub(r'\bfalse\b', '0', s)
-        s = re.sub(r'\bNULL\b', '0', s)
-        s = re.sub(r'\bva_start\(([^)]*),([^)]*)\)', '*(\\1)=*(__va_elem*)__va_area__', s)
-        s = re.sub(r'\bunreachable\b', 'error', s)
-        s = re.sub(r'\bMIN\(([^)]*),([^)]*)\)', '((\\1)<(\\2)?(\\1):(\\2))', s)
-        print(s)