Commit 74c0aecc, authored 4 years ago by Rui Ueyama (parent 810f094f)

Self-host: including preprocessor, chibicc can compile itself

3 changed files with 126 additions and 114 deletions:

- Makefile: 3 additions, 4 deletions
- README.md: 123 additions, 1 deletion
- self.py: 0 additions, 109 deletions (file deleted)

Makefile (+3 −4)

```diff
@@ -14,7 +14,7 @@ chibicc: $(OBJS)
 $(OBJS): chibicc.h
 
 test/%.exe: chibicc test/%.c
-	./chibicc -Itest -c -o test/$*.o test/$*.c
+	./chibicc -Iinclude -Itest -c -o test/$*.o test/$*.c
 	$(CC) -o $@ test/$*.o -xc test/common
 
 test: $(TESTS)
@@ -28,10 +28,9 @@ test-all: test test-stage2
 stage2/chibicc: $(OBJS:%=stage2/%)
 	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
 
-stage2/%.o: chibicc self.py %.c
+stage2/%.o: chibicc %.c
 	mkdir -p stage2/test
-	./self.py chibicc.h $*.c > stage2/$*.c
-	./chibicc -c -o stage2/$*.o stage2/$*.c
+	./chibicc -c -o $(@D)/$*.o $*.c
 
 stage2/test/%.exe: stage2/chibicc test/%.c
 	mkdir -p stage2/test
```

README.md (+123 −1)

This is the reference implementation of https://www.sigbus.info/compilerbook.
# chibicc: A Teaching C Compiler
chibicc is a C compiler for educational purposes. I wrote it with the
following goals in mind:

- Simple: The compiler should be as simple as possible to help the
  reader understand how it works.
- Small: The compiler should be small enough to be covered in a
  semester.
- Demonstrating an incremental approach: Its git history should start
  from a minimal compiler implementation, and the compiler should gain
  one feature at a time with a small incremental patch. That should
  help the reader understand how to write a large program from
  scratch, which requires a different skill set than writing a
  patch for an existing large project.
- Correctness: It should correctly capture the semantics of major
  but obscure C language features, such as the "usual arithmetic
  conversions" or "arrays decay into pointers".
- Completeness: While the compiler doesn't have to support all C
  language features, it should be able to compile nontrivial programs,
  including itself.
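
As a concrete instance of the first of those obscure features: the usual arithmetic conversions decide which type both operands of a binary operator such as `+` are converted to before the operation runs. The sketch below is simplified and is not chibicc's implementation. It is written in Python, assumes an LP64 target such as x86-64 Linux, and covers only `char`, `short`, `int`, `long`, `float`, and `double`; unsigned types, `long long`, and `long double` follow more involved rules and are left out. The names `INT_SIZES`, `integer_promote`, and `common_type` are hypothetical.

```python
# Integer sizes in bytes on an LP64 target (an assumption of this sketch).
# For these four signed types, a larger size also means a higher conversion rank.
INT_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8}


def integer_promote(ty):
    """Integer promotion: any integer type narrower than int becomes int."""
    return "int" if INT_SIZES[ty] < INT_SIZES["int"] else ty


def common_type(a, b):
    """Type that both operands of a binary arithmetic operator convert to."""
    if "double" in (a, b):
        return "double"
    if "float" in (a, b):
        return "float"
    # Both operands are integers: promote first, then the wider type wins.
    a, b = integer_promote(a), integer_promote(b)
    return a if INT_SIZES[a] >= INT_SIZES[b] else b


print(common_type("char", "char"))   # int: char + char is computed in int
print(common_type("int", "long"))    # long
print(common_type("long", "float"))  # float
```

The `char` case is the one that most often surprises readers: two `char` operands are never added as `char`, which is why a compiler has to insert these conversions explicitly.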

I believe all the above goals are met. In my opinion, chibicc's source
code is small and pretty easy to read, and not only the current state
of the code but _every commit_ was written with readability in
mind. The first commit is a minimalistic compiler that compiles an
integer into a program that exits with the given number as its exit
code. Then I added operators (e.g. `+` or `-`), local variables,
control structures (e.g. `if` or `while`), function calls, function
definitions, global variables, and other language features one at a
time. As the compiler gained features through a series of small
patches, the language it accepts looked more and more like the real C
language.
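
To make that starting point concrete, here is a sketch of what such a first-step compiler does, written in Python for brevity rather than in C. It turns an integer into x86-64 assembly (AT&T syntax) for a `main` that returns that integer, so the integer becomes the program's exit status. The function name `compile_integer` is hypothetical, and the exact text emitted by chibicc's first commit may differ.

```python
def compile_integer(n):
    """Return x86-64 assembly for a program whose main() returns n."""
    return "\n".join([
        "  .globl main",
        "main:",
        f"  mov ${n}, %rax",  # main's return value becomes the exit status
        "  ret",
    ]) + "\n"


print(compile_integer(42), end="")
```

Saving the output as `tmp.s`, assembling it with a C toolchain (for example `cc -o tmp tmp.s`), and running `./tmp` on x86-64 Linux should give exit status 42. Exit statuses are reported modulo 256, so small non-negative integers are the easiest to check.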

When I found a bug in a previous commit, I edited that commit by
rewriting the git history instead of creating a new commit. This is an
unusual and undesirable development style for most projects, but for
my purpose, keeping a clean commit history is more important than
avoiding git force-pushes.

chibicc's internal design was carefully chosen to naturally support
the core C language semantics. It supports many C language features,
including the preprocessor. chibicc is written in C and can compile
itself. When writing this compiler, I didn't avoid any C features for
the sake of easier self-hosting, so I can say that it can compile at
least one ordinary C program.

That being said, there are many missing features. They are left as an
exercise for the reader.
## Internals
chibicc consists of the following stages:

- Tokenize: A tokenizer takes a string as input, breaks it into
  a list of tokens, and returns them.
- Preprocess: A preprocessor takes a list of tokens as input and
  outputs a new list of macro-expanded tokens. It interprets
  preprocessor directives while expanding macros.
- Parse: A recursive descent parser constructs abstract syntax trees
  from the output of the preprocessor. It also adds a type to each
  AST node.
- Codegen: A code generator emits assembly text for given AST nodes.
  Currently, there's no optimization pass, but there's a plan to add one
  to eliminate obvious inefficiencies in chibicc's output.
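
The four stages above can be illustrated with a deliberately tiny model. This is a sketch under stated assumptions, not chibicc's code: it is written in Python rather than C, handles only integer arithmetic with `+ - * /` and parentheses, supports only object-like macros passed in as a dictionary (no `#define` directives), skips the type-annotation step of the parser, and emits x86-64 assembly in AT&T syntax using a simple stack-machine scheme with no optimization. All function names are hypothetical.

```python
import re

def tokenize(src):
    """Tokenize: split a string into (kind, text) tokens, skipping whitespace."""
    tokens = []
    for num, ident, punct in re.findall(r"(\d+)|([A-Za-z_]\w*)|(\S)", src):
        if num:
            tokens.append(("num", num))
        elif ident:
            tokens.append(("ident", ident))
        else:
            tokens.append(("punct", punct))
    return tokens

def preprocess(tokens, macros, hidden=frozenset()):
    """Preprocess: expand object-like macros given as {name: body text}."""
    out = []
    for kind, text in tokens:
        # `hidden` lists macros already being expanded, so X -> "X+1" stops.
        if kind == "ident" and text in macros and text not in hidden:
            out.extend(preprocess(tokenize(macros[text]), macros, hidden | {text}))
        else:
            out.append((kind, text))
    return out

def parse(tokens):
    """Parse: recursive descent; returns ("num", n) or (op, lhs, rhs) tuples."""
    pos = 0

    def peek():
        return tokens[pos][1] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():  # expr = mul (("+" | "-") mul)*
        node = mul()
        while peek() in ("+", "-"):
            node = (advance()[1], node, mul())
        return node

    def mul():  # mul = primary (("*" | "/") primary)*
        node = primary()
        while peek() in ("*", "/"):
            node = (advance()[1], node, primary())
        return node

    def primary():  # primary = "(" expr ")" | num
        kind, text = advance()
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        if kind != "num":
            raise SyntaxError(f"unexpected token {text!r}")
        return ("num", int(text))

    node = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return node

def codegen(node, out):
    """Codegen: stack-machine x86-64 (AT&T syntax); result is left in %rax."""
    if node[0] == "num":
        out.append(f"  mov ${node[1]}, %rax")
        return
    op, lhs, rhs = node
    codegen(rhs, out)
    out.append("  push %rax")
    codegen(lhs, out)
    out.append("  pop %rdi")  # now %rax = lhs, %rdi = rhs
    if op == "+":
        out.append("  add %rdi, %rax")
    elif op == "-":
        out.append("  sub %rdi, %rax")
    elif op == "*":
        out.append("  imul %rdi, %rax")
    else:  # "/": sign-extend %rax into %rdx:%rax, then signed divide
        out.extend(["  cqo", "  idiv %rdi"])

def compile_expr(src, macros=None):
    """Run all four stages; main() returns the expression's value."""
    asm = ["  .globl main", "main:"]
    codegen(parse(preprocess(tokenize(src), macros or {})), asm)
    asm.append("  ret")
    return "\n".join(asm) + "\n"

print(compile_expr("N*(2+3)", {"N": "4"}), end="")
```

Each stage consumes only the previous stage's output, which mirrors the multi-pass structure described above; with `N` defined as `4`, the printed program should exit with status 20. The `hidden` set is a much-simplified stand-in for the hide sets a real C preprocessor uses to stop recursive macro expansion.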

Note that chibicc allocates memory using malloc() but never calls
free(). Once memory is allocated, it isn't released until the
process exits. This may look like an odd design choice, and perhaps
it is, but in practice this memory management policy (or the lack
thereof) works well for short-lived programs like chibicc. This design
eliminates all the scaffolding and complexity of manual memory
management and makes the compiler much simpler than it would otherwise
be. If memory consumption becomes a real issue, you can plug in
[Boehm GC](https://en.wikipedia.org/wiki/Boehm_garbage_collector) for
automatic memory management.
## Book
I'm writing an online book about the C compiler. The draft is
available at https://www.sigbus.info/compilerbook, though it is
currently in Japanese. I plan to translate it into English once it's
complete.
## About the Author
I'm Rui Ueyama. I'm the creator of [8cc](https://github.com/rui314/8cc),
which is a hobby C compiler, and also the original creator of the
current version of the [LLVM lld](https://lld.llvm.org) linker, which
is a production-quality linker used by various operating systems and
large-scale build systems.
## References
- [tcc](https://bellard.org/tcc/): A small C compiler written by
  Fabrice Bellard. I learned a lot from this compiler, but the designs
  of tcc and chibicc are largely different. In particular, tcc is a
  one-pass compiler, while chibicc is a multi-pass one.
- [lcc](https://github.com/drh/lcc): Another small C compiler. The
  creators wrote a [book](https://sites.google.com/site/lccretargetablecompiler/)
  about the internals of lcc, which I found to be a good resource for
  seeing how a compiler is implemented.
- [An Incremental Approach to Compiler Construction](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf)
## Project ideas
- Add missing features
- Port to different ISAs such as RISC-V
- Rewrite the compiler in a language other than C
- Use LLVM as a backend
- Add optimization passes

self.py (deleted; file mode 100755 → 0; +0 −109), as of parent commit 810f094f:

```python
#!/usr/bin/python3
import re
import sys

print("""
typedef signed char int8_t;
typedef short int16_t;
typedef int int32_t;
typedef long int64_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef unsigned long uint64_t;
typedef unsigned long size_t;
typedef struct FILE FILE;
extern FILE *stdin;
extern FILE *stdout;
extern FILE *stderr;
typedef struct {
  int gp_offset;
  int fp_offset;
  void *overflow_arg_area;
  void *reg_save_area;
} __va_elem;
typedef __va_elem va_list[1];
struct stat {
  char _[512];
};
void *malloc(long size);
void *calloc(long nmemb, long size);
void *realloc(void *buf, long size);
int *__errno_location();
char *strerror(int errnum);
FILE *fopen(char *pathname, char *mode);
long fread(void *ptr, long size, long nmemb, FILE *stream);
int fclose(FILE *fp);
int feof(FILE *stream);
static void assert() {}
int strcmp(char *s1, char *s2);
int strncasecmp(char *s1, char *s2);
int printf(char *fmt, ...);
int sprintf(char *buf, char *fmt, ...);
int fprintf(FILE *fp, char *fmt, ...);
int vfprintf(FILE *fp, char *fmt, va_list ap);
long strlen(char *p);
int strncmp(char *p, char *q);
void *memcpy(char *dst, char *src, long n);
char *strdup(char *p);
char *strndup(char *p, long n);
char *strdup(char *p);
int isspace(int c);
int ispunct(int c);
int isdigit(int c);
int isxdigit(int c);
char *strstr(char *haystack, char *needle);
char *strchr(char *s, int c);
double strtod(char *nptr, char **endptr);
static void va_end(va_list ap) {}
long strtoul(char *nptr, char **endptr, int base);
void exit(int code);
char *basename(char *path);
char *strrchr(char *s, int c);
int unlink(char *pathname);
int mkstemp(char *template);
int close(int fd);
int fork(void);
int execvp(char *file, char **argv);
void _exit(int code);
int wait(int *wstatus);
int atexit(void (*)(void));
FILE *open_memstream(char **ptr, size_t *sizeloc);
char *dirname(char *path);
char *strncpy(char *dest, char *src, long n);
int stat(char *pathname, struct stat *statbuf);
int stat(char *pathname, struct stat *statbuf);
char *dirname(char *path);
char *basename(char *path);
char *strrchr(char *s, int c);
int unlink(char *pathname);
int mkstemp(char *template);
int close(int fd);
int fork(void);
int execvp(char *file, char **argv);
void _exit(int code);
int wait(int *wstatus);
int atexit(void (*)(void));
""")

for path in sys.argv[1:]:
    with open(path) as file:
        s = file.read()
        s = re.sub(r'\\\n', '', s)
        s = re.sub(r'^\s*#.*', '', s, flags=re.MULTILINE)
        s = re.sub(r'\bbool\b', '_Bool', s)
        s = re.sub(r'\berrno\b', '*__errno_location()', s)
        s = re.sub(r'\btrue\b', '1', s)
        s = re.sub(r'\bfalse\b', '0', s)
        s = re.sub(r'\bNULL\b', '0', s)
        s = re.sub(r'\bva_start\(([^)]*),([^)]*)\)', '*(\\1)=*(__va_elem*)__va_area__', s)
        s = re.sub(r'\bunreachable\b', 'error', s)
        s = re.sub(r'\bMIN\(([^)]*),([^)]*)\)', '((\\1)<(\\2)?(\\1):(\\2))', s)
        print(s)
```
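
To show what the deleted script did to chibicc's sources before stage-1 chibicc compiled them (a step this commit drops from the stage-2 build), the sketch below replays four of its substitutions on a single line of C. The patterns are copied from self.py and applied in the same relative order; the `SUBSTITUTIONS` list and the `rewrite` function are hypothetical names introduced for this illustration.

```python
import re

# Hypothetical demo: four of the deleted self.py's substitutions, in the same
# relative order the script applied them to whole files.
SUBSTITUTIONS = [
    (r'\bbool\b', '_Bool'),
    (r'\btrue\b', '1'),
    (r'\bNULL\b', '0'),
    (r'\bMIN\(([^)]*),([^)]*)\)', r'((\1)<(\2)?(\1):(\2))'),
]


def rewrite(line):
    """Apply each substitution in turn and return the rewritten text."""
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return line


print(rewrite("bool ok = p != NULL && MIN(a, b) == true;"))
# prints: _Bool ok = p != 0 && ((a)<( b)?(a):( b)) == 1;
```

The space kept inside `( b)` shows that the regex copies macro arguments verbatim. Because `[^)]*` cannot match a closing parenthesis, a call such as `MIN(f(x), y)` would be left unrewritten, which is one limitation of emulating the preprocessor with regular expressions.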