Tinkering with Rust Procedural Macros

Macros are a useful tool for meta programming. Think of macros like writing code for code. For example, lets say we have some pseudocode that looks like this:

function Antelope(){}
function Bird(){}
function Cat(){}

We can write a macro with pseudocode like:

for animal in ["Antelope", "Bird", "Cat"] {
    function #animal(){}
}

This means for every animal in the array, declare an empty function block. This should directly replace the first pseudocode that was defined.

These macros are evaluated at compile time, so there's no runtime overhead for these expressions. The generated code can of course incure additional runtime overhead but thats not because of the macros. An example of a feature with runtime overhead is dynamic dispatch which can be achieved in rust with trait objects or dynamic traits. These are like interfaces in other languages and since many types can implement an interface, a vtable lookup will be done at runtime to find the correct function to call.

That being said, macros add an extra layer of complexity as they don't abide by the same rules as code. Eg. They have slightly different syntax, They receive a TokenStream as an argument and return the same type. This is because macros are processed after the abstract syntax tree is constructed so they have access to the tokens which are gotten after tokenization^[2] and they should return valid syntax in whatever language they're written in.

Proc(edural) Macros

Broadly speaking, there are two kinds of macros^[1] in rust. I'll write an example with proc macros as i think they'll give more clarity to the utility and essence of macros and they're less pain on the eye.

Derive-Proc macros use the #[derive] attribute. This attribute can be used to implement traits or in this case macros on specific types. For an example with traits:

#[derive(Debug, Default, Clone, Copy, Eq, PartialEq)]
struct Greet{
    morning: String
}

Because of the Debug trait derived on the Greet struct, we can print the contents of the struct to stdout.

let hi = Greet { morning: "hi".to_string() };
println!("{:?}", hi);

The reason why we can print the variable hi to stdout using println! (which is a function-like proc macro) is because we implement Debug. Without implementing that trait we would see something like this from the compiler

`Greet` doesn't implement `Debug`
the trait `Debug` is not implemented for `Greet`
add `#[derive(Debug)]` to `Greet` or manually `impl Debug for Greet`

This means the debug trait can be derived or implemented manually, like it is done for types in the standard library. Eg. The bool type

Mathy

I want to implement a derive proc-macro called Mathy which will expose some math related methods to its associated type. For example, if i have a struct like:

struct Count {
    morning: u32
    afternoon: u32
    evening: u32
    comment: String
}

This struct holds the number of times a person has eaten at different times in the day with a comment.

let count = Count {
    morning: 5, 
    afternoon: 0, 
    evening: 1, 
    comment: "Had so much to eat in the morning".to_string()
};

If i derive macro Mathy on this struct, i should be able to call a method sum that will sum up all the integer-like types in the struct. In this case, it'll ignore the comment field and sum up others to return 6.

The benefit of using a macro over a function in this case is that for every struct this is derived on, at compile time, the macro knows the integer-like types on the struct and computes what the sum method should be.

In this case, when Mathy is expanded, it will look like:

Impl Count {
    fn sum(&self) -> isize {
        self.morning + self.afternoon + self.evening
    }
}

To get started, we ensure the library crate is labeled as a proc-macro in Cargo.toml

[lib]
proc-macro = true

[dependencies]
quote = "1.0.37"
syn = "2.0.79"

Then we define all the integer-like types we know of in rust

const INTEGER_TYPES: [&str; 12] = [
    "u8", "u16", "u32", "u64", "u128", "usize", "i8", "i16", "i32", "i64", "i128", "isize",
];

#[proc_macro_derive(Mathy)]
pub fn derive_math_functions(item: TokenStream) -> TokenStream {}

The function derive_math_functions is what implements the macro. Its signature accepts a TokenStream and returns same.

// ...
let input = parse_macro_input!(item as DeriveInput);
let name = &input.ident;
let mut integer_fields = Vec::new();
if let Data::Struct(s) = input.data {
    if let Fields::Named(namen) = s.fields {
        for field in namen.named {
            for field_type in field.clone().ty.into_token_stream() {
                if INTEGER_TYPES.contains(&field_type.to_string().as_str()) {
                    integer_fields.push(field.clone().ident.unwrap());
                }
            }
        }
    }
}

This is us pushing all the integer-like fields on the struct to an array. Most importantly, in the block below, we implement the sum function which uses the pattern #(...)* to run for every field in the array.

// ...
let result = if integer_fields.len() > 0 {
    let first = integer_fields.remove(0);
    quote! {
        impl #name {
            pub fn sum(&self) -> isize {
                let mut result = self.#first as isize;
                #(
                    result += self.#integer_fields as isize;
                )*
                result
            }
        }
    }
}else {
    quote! {}
};

TokenStream::from(result)

Cargo Expand

With a tool called cargo-expand we can see what the macro expands to and the methods that have been exposed.

If we derive Mathy on the count struct we declared above,

#[derive(Mathy)]
struct Count {
    morning: u32
    afternoon: u32
    evening: u32
    comment: String
}

let count = Count {
    morning: 5, 
    afternoon: 0, 
    evening: 1, 
    comment: "Had so much to eat in the morning".to_string()
}

// 
println!("The sum is {}", count.sum()); // 6

To see what exactly was derived on struct Count, we run:

cargo expand

We can see that an impl Count block has been added because Mathy was derived on this struct and we can see that the sum function contains all integer-like fields in it and all this code was generated at compile time!

struct Count {
    morning: usize,
    afternoon: usize,
    evening: usize,
    comment: String,
}
// ...... ADDED BECAUSE OF THE MACRO.....
impl Count {
    pub fn sum(&self) -> isize {
        let mut result = self.morning as isize;
        result += self.afternoon as isize;
        result += self.evening as isize;
        result
    }
}

Happy Macro-ing and Mathy-ing

References

[1] https://veykril.github.io/tlborm/proc-macros/methodical.html

[2] https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization