The techniques used in compiler construction are widely applicable to day-to-day software development and employ a diverse and interesting set of data structures and algorithms. Unfortunately, novice-level material on compiler construction is notably thin, and the mainstay textbooks cover a lot of technical theory that the average amateur compiler writer simply does not need.
Over a series of posts, I’m going to discuss the basic concepts of compiler construction in a way that should allow a beginner to understand what’s going on and, hopefully, apply it to their own project. A back-of-napkin plan for the series goes like this:
- Basic components of a compiler and how they interoperate. Language specification, what to expect out of the project compiler, planning ahead
- Introduction to lexical scanning/tokenising – developing a simple, extensible scanner based on finite state automata, error handling
- Parsing – developing a recursive descent parser, error handling/recovery, scannerless parsers, how recursive descent relates to language grammar
- Code generation – designing a virtual stack machine language, outputting opcodes, discussion about efficiency
- Running generated code – virtual machine, shortcut to compiled code using C and macros
- Advanced concepts – register machine code, optimisation, library calls
I don’t necessarily expect each of those list items to take only a single post; some will, some won’t. How far down the rabbit hole of advanced topics we go hasn’t yet been decided.
As per the title, I’ll be coding in Python. This isn’t required. However, the compiler will be based on object-oriented principles (compiler construction is a poster child for this development approach), so it would make sense to stick to an OO language.
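To give a rough feel for how the stages in the plan might fit together as objects, here is a minimal sketch. It is not the code from the series: the class names (`Scanner`, `Parser`, `StackMachine`), the helper `compile_and_run`, the opcode names and the toy language (integers joined by `+` and `-`) are all assumptions made purely for illustration.

```python
class Scanner:
    """Hypothetical scanner: turns source text into (kind, value) tokens."""

    def __init__(self, text):
        self.text = text

    def tokens(self):
        i = 0
        while i < len(self.text):
            ch = self.text[i]
            if ch.isspace():
                i += 1
            elif "0" <= ch <= "9":
                start = i
                while i < len(self.text) and "0" <= self.text[i] <= "9":
                    i += 1
                yield ("NUM", int(self.text[start:i]))
            elif ch in "+-":
                yield ("OP", ch)
                i += 1
            else:
                raise SyntaxError(f"unexpected character {ch!r}")
        yield ("EOF", None)


class Parser:
    """Hypothetical recursive descent parser for: NUM (('+' | '-') NUM)*
    It emits stack machine opcodes as it recognises each construct."""

    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self.current = next(self.tokens)
        self.code = []

    def advance(self):
        token = self.current
        if token[0] != "EOF":
            self.current = next(self.tokens)
        return token

    def number(self):
        kind, value = self.advance()
        if kind != "NUM":
            raise SyntaxError(f"expected a number, got {kind}")
        self.code.append(("PUSH", value))

    def expression(self):
        self.number()
        while self.current[0] == "OP":
            _, op = self.advance()
            self.number()
            self.code.append(("ADD" if op == "+" else "SUB", None))
        if self.current[0] != "EOF":
            raise SyntaxError("unexpected input after expression")
        return self.code


class StackMachine:
    """Hypothetical virtual machine that runs the generated opcodes."""

    def run(self, code):
        stack = []
        for opcode, arg in code:
            if opcode == "PUSH":
                stack.append(arg)
            else:
                right = stack.pop()
                left = stack.pop()
                stack.append(left + right if opcode == "ADD" else left - right)
        return stack.pop()


def compile_and_run(source):
    # Scanner -> parser/code generator -> virtual machine.
    code = Parser(Scanner(source).tokens()).expression()
    return StackMachine().run(code)
```

Calling `compile_and_run("1 + 2 - 4")` scans the text, parses it into a short list of opcodes and then executes them. Each stage only knows about the output of the stage before it, which is the property that makes an object-oriented layout a natural fit.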
That’s all for now. I have already written the code for the first three parts of the series, so we should be off to a start pretty soon.