The typechk phase performs semantic analysis and validation, turning the generic parse AST into a typed AST of legal C code. It keeps the namespacing convention used by the rest of the project: all public types share a short prefix, in this case td (for typed).
It is mostly a mapping over the parser AST, but not all types are 1:1. For example, a declaration in the AST is:
struct ast_declaration {
struct ast_declaration_specifier_list specifier_list;
struct ast_init_declarator_list declarator_list;
struct text_span span;
};
However, post typing, it looks like this:
struct td_declaration {
enum td_storage_class_specifier storage_class_specifier;
enum td_function_specifier_flags function_specifier_flags;
// this field can be ignored! it is only needed for C11 anonymous structs & unions
struct td_var_ty base_ty;
size_t num_var_declarations;
struct td_var_declaration *var_declarations;
};
struct td_var_declaration {
enum td_var_declaration_ty ty;
struct td_var_ty var_ty;
struct td_var var;
struct td_init *init;
unsigned long long bitfield_width;
};
The declaration and type specifiers have been split into a more canonical format, which makes analysis and IR building simpler. Storage and function specifiers are stored explicitly, and each individual struct td_var_declaration contains the type of the declaration, the initializer expression, and the bitfield width if applicable.
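As a rough illustration of that split, the sketch below walks a flat specifier list and pulls the storage class and function specifiers out into explicit fields. Every name prefixed ex_ is hypothetical and invented for this example; the project's real td_ types and logic are richer (for instance, this sketch ignores combinations such as static with _Thread_local and does not build the type itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the project's real types. */
enum ex_specifier {
  EX_SPEC_STATIC,
  EX_SPEC_EXTERN,
  EX_SPEC_INLINE,
  EX_SPEC_NORETURN,
  EX_SPEC_INT,
  EX_SPEC_CONST
};

enum ex_storage_class { EX_STORAGE_NONE, EX_STORAGE_STATIC, EX_STORAGE_EXTERN };

enum {
  EX_FN_FLAG_NONE = 0,
  EX_FN_FLAG_INLINE = 1 << 0,
  EX_FN_FLAG_NORETURN = 1 << 1
};

struct ex_canonical_specifiers {
  enum ex_storage_class storage_class;
  unsigned function_flags; /* bitwise OR of EX_FN_FLAG_* */
};

/* Splits a flat specifier list into explicit fields.
 * Returns false if two storage classes are given, which is where a real
 * checker would emit a diagnostic. */
static bool ex_split_specifiers(const enum ex_specifier *specs,
                                size_t num_specs,
                                struct ex_canonical_specifiers *out) {
  assert(specs && out);
  out->storage_class = EX_STORAGE_NONE;
  out->function_flags = EX_FN_FLAG_NONE;

  for (size_t i = 0; i < num_specs; i++) {
    enum ex_storage_class storage = EX_STORAGE_NONE;

    switch (specs[i]) {
    case EX_SPEC_STATIC:   storage = EX_STORAGE_STATIC; break;
    case EX_SPEC_EXTERN:   storage = EX_STORAGE_EXTERN; break;
    case EX_SPEC_INLINE:   out->function_flags |= EX_FN_FLAG_INLINE; break;
    case EX_SPEC_NORETURN: out->function_flags |= EX_FN_FLAG_NORETURN; break;
    case EX_SPEC_INT:
    case EX_SPEC_CONST:
      /* Type specifiers and qualifiers would feed the type; not shown. */
      break;
    }

    if (storage != EX_STORAGE_NONE) {
      if (out->storage_class != EX_STORAGE_NONE) {
        return false; /* conflicting storage classes */
      }
      out->storage_class = storage;
    }
  }
  return true;
}
```

Under these assumptions, the specifiers of static inline int f(void) come out as EX_STORAGE_STATIC plus the EX_FN_FLAG_INLINE bit, while static extern int x is reported as a conflict.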
This phase is also responsible for folding all expressions which are constant at compile time. For example, an array size is an expression within the parse AST, but here it is finalised into the numeric value of the array size, or rejected with a diagnostic if it is not a valid constant expression.
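The folding step can be pictured with a minimal recursive evaluator. This is only a sketch under simplifying assumptions: the ex_ names are hypothetical, only unsigned addition and multiplication are handled, and overflow and the many other forms of C constant expression are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature expression tree; NOT the project's real AST. */
enum ex_cexpr_kind { EX_CONST, EX_VAR, EX_ADD, EX_MUL };

struct ex_cexpr {
  enum ex_cexpr_kind kind;
  unsigned long long value;         /* used by EX_CONST only */
  const struct ex_cexpr *lhs, *rhs; /* used by EX_ADD and EX_MUL only */
};

/* Tries to fold `e` to a single value.
 * Returns true and writes *out on success; returns false if any part of the
 * expression is not constant (here: any variable reference). */
static bool ex_fold(const struct ex_cexpr *e, unsigned long long *out) {
  assert(e && out);
  unsigned long long l = 0, r = 0;

  switch (e->kind) {
  case EX_CONST:
    *out = e->value;
    return true;
  case EX_VAR:
    return false; /* value is not known at compile time */
  case EX_ADD:
  case EX_MUL:
    if (!ex_fold(e->lhs, &l) || !ex_fold(e->rhs, &r)) {
      return false;
    }
    *out = (e->kind == EX_ADD) ? l + r : l * r;
    return true;
  }
  return false;
}
```

For an array size written as 2 + 3 * 4, a tree of this shape folds to 14. For a size such as 2 + n, ex_fold returns false, which is the point at which a checker would decide what diagnostic, if any, to emit.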
The majority of diagnostics, such as variables not existing, invalid casts, calling things that are not functions, etc., are emitted here. It also performs some canonicalisation of the AST so that the IR builder does not need to concern itself with the intricacies of the C spec. In C, both arr[10] and 10[arr] are legal, but typechk will ensure that the left-hand side of an array-access expression is the pointer type and the right-hand side is the offset (swapping the operands is safe because there is no sequence point within an array-access expression).
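The operand swap described above could look roughly like the following. It is a sketch, not the project's implementation: the ex_ types are hypothetical, and an array operand is assumed to have already decayed to a pointer by the time this runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified typed expressions; NOT the project's real types. */
enum ex_ty_kind { EX_TY_INT, EX_TY_POINTER };

struct ex_expr {
  enum ex_ty_kind ty; /* type assigned to the expression during typing */
  const char *name;   /* source spelling, for illustration only */
};

struct ex_array_access {
  struct ex_expr *lhs; /* after canonicalisation: always the pointer */
  struct ex_expr *rhs; /* after canonicalisation: always the integer offset */
};

/* Ensures lhs is the pointer and rhs is the offset, swapping if needed.
 * Returns false when the operands are not exactly one pointer and one
 * integer, which is where a real checker would emit a diagnostic. */
static bool ex_canonicalise_array_access(struct ex_array_access *access) {
  assert(access && access->lhs && access->rhs);

  bool lhs_is_ptr = access->lhs->ty == EX_TY_POINTER;
  bool rhs_is_ptr = access->rhs->ty == EX_TY_POINTER;

  if (lhs_is_ptr == rhs_is_ptr) {
    return false; /* neither or both are pointers: not a valid subscript */
  }

  if (rhs_is_ptr) {
    /* Written as 10[arr]; rewrite to the arr[10] form. */
    struct ex_expr *tmp = access->lhs;
    access->lhs = access->rhs;
    access->rhs = tmp;
  }
  return true;
}
```

With this in place, later stages only ever see the pointer on the left, so code that lowers the access to pointer arithmetic does not need a second case for the reversed spelling.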
Debugging
Similarly to the parse stage, typechk has the ability to pretty-print its AST.
You can compare the TD AST to the parse AST from Parse. Note that all expressions and variables are typed.
PRINTING TD
DECLARATION
STORAGE CLASS SPECIFIER
NONE
VARS
VAR DECLARATION
VARIABLE 'printf'
SCOPE GLOBAL
TYPE
VARIADIC FUNC (
POINTER TO
signed char
) RETURNS
int
FUNCTION DEFINITION
STORAGE CLASS
NONE
VAR DECLARATION
VARIABLE 'main'
SCOPE GLOBAL
TYPE
UNSPECIFIED FUNC (
) RETURNS
int
BODY
COMPOUND STATEMENT:
EXPRESSION
int
COMPOUND EXPRESSION:
EXPRESSION
int
CALL
TARGET
EXPRESSION
VARIADIC FUNC (
POINTER TO
signed char
) RETURNS
int
VARIABLE 'printf'
SCOPE GLOBAL
ARGLIST:
EXPRESSION
CONST
POINTER TO
char
CONSTANT "Hello, World!\n"
RETURN
EXPRESSION
int
CONSTANT 0