Unions

Unions are becoming increasingly rare, as years of experience have shown that they are quite dangerous to use; this is especially true of the C variant, which has no selector field to indicate which of the union’s members is currently valid. Some front ends may still need unions for legacy reasons. LLVM itself, however, does not support unions at all:

union Foo
{
    int a;
    char *b;
    double c;
};

Foo Union;

Becomes this when run through clang++:

%union.Foo = type { double }
@Union = global %union.Foo { double 0.0 }

What happened here? Where did the other union members go? The answer is that in LLVM there are no unions; there are only structs that can be cast into whichever type the front-end wants to cast the struct into. So to access the above union from LLVM IR, you’d use the bitcast instruction to cast a pointer to the “union” into whatever pointer type you need:

%1 = bitcast %union.Foo* @Union to i32*
store i32 1, i32* %1
%2 = bitcast %union.Foo* @Union to i8**
store i8* null, i8** %2

This may seem strange, but the truth is that a union is nothing more than a piece of memory that is being accessed using different implicit pointer casts. There is no type-safety when dealing with unions.
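To make the "one piece of memory, many views" idea concrete, here is a minimal C++ sketch. It reuses the Foo union from above; the helper functions members_share_storage and foo_size are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

union Foo {
    int a;
    char *b;
    double c;
};

// Returns true when every member starts at the same address as the union
// itself, i.e. the members are just different views of one memory block.
bool members_share_storage() {
    Foo u;
    u.a = 1;
    const void *base = &u;
    return base == static_cast<const void *>(&u.a) &&
           base == static_cast<const void *>(&u.b) &&
           base == static_cast<const void *>(&u.c);
}

// The union occupies at least as much space as its largest member.
std::size_t foo_size() { return sizeof(Foo); }
```

Only addresses are compared here; no inactive member is read, since doing so would be undefined behaviour in C++.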

If you want to support unions in your front-end language, you should simply allocate the total size of the union (i.e. the size of the largest member, rounded up to the strictest alignment among the members) and then generate code to reinterpret the allocated memory as needed.

The cleanest approach might be to simply allocate an array of bytes (i8), possibly with alignment padding at the end, and then cast whenever you access the structure. That way every access goes through the same explicit cast, so you’d be sure it is handled consistently.
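The byte-array approach can be sketched in C++ roughly as follows. This is only an illustration under assumptions: RawUnion, store_as and load_as are hypothetical names, the member types are the ones from the Foo example, and std::memcpy stands in for the pointer cast a front-end would emit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical raw storage: large enough and aligned strictly enough for
// any of the members int, char* and double.
struct RawUnion {
    static constexpr std::size_t kSize =
        std::max({sizeof(int), sizeof(char *), sizeof(double)});
    static constexpr std::size_t kAlign =
        std::max({alignof(int), alignof(char *), alignof(double)});
    alignas(kAlign) unsigned char bytes[kSize];
};

// Reinterpret the storage as a T and write a value into it.
template <typename T>
void store_as(RawUnion &u, const T &value) {
    static_assert(sizeof(T) <= RawUnion::kSize, "member does not fit");
    static_assert(std::is_trivially_copyable<T>::value, "raw copy required");
    std::memcpy(u.bytes, &value, sizeof(T));
}

// Reinterpret the storage as a T and read the value back.
template <typename T>
T load_as(const RawUnion &u) {
    static_assert(sizeof(T) <= RawUnion::kSize, "member does not fit");
    static_assert(std::is_trivially_copyable<T>::value, "raw copy required");
    T value;
    std::memcpy(&value, u.bytes, sizeof(T));
    return value;
}
```

Nothing here remembers which type was stored last; as with the bitcast version, the caller has to know.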

Tagged Unions

When dealing with unions in C, one typically adds another field that signals the content of the union, since accidentally interpreting the bytes of a double as a char* can have disastrous consequences.
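A hand-written tag of this kind might look like the sketch below. It is written in the C-compatible subset of C++; FooTag, TaggedFoo, make_int, make_double and get_int are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Hypothetical selector that records which member is currently valid.
enum FooTag { FOO_INT, FOO_STR, FOO_DOUBLE };

struct TaggedFoo {
    FooTag tag;
    union {
        int a;
        char *b;
        double c;
    } value;
};

TaggedFoo make_int(int v) {
    TaggedFoo f;
    f.tag = FOO_INT;
    f.value.a = v;
    return f;
}

TaggedFoo make_double(double v) {
    TaggedFoo f;
    f.tag = FOO_DOUBLE;
    f.value.c = v;
    return f;
}

// Reads the int member only when the tag says it is valid.
bool get_int(const TaggedFoo &f, int *out) {
    if (f.tag != FOO_INT) {
        return false;
    }
    *out = f.value.a;
    return true;
}
```

The compiler does not enforce anything here: the tag only helps if every access checks it first.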

Many modern programming languages feature type-safe tagged unions. Rust has enum types whose variants can optionally carry values. C++ has had std::variant since C++17.
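For comparison, a type-safe tagged union in C++17 could be sketched with std::variant as shown here. FooVariant and describe are hypothetical names for this example; std::get_if returns a null pointer when the variant holds a different alternative.

```cpp
#include <cassert>
#include <string>
#include <variant>

// A type-safe tagged union over three primitive types.
using FooVariant = std::variant<bool, int, double>;

// Reports the stored boolean, or says that the variant holds something else.
std::string describe(const FooVariant &f) {
    if (const bool *b = std::get_if<bool>(&f)) {
        return *b ? "A boolean! true" : "A boolean! false";
    }
    return "not a boolean";
}
```

The variant tracks the active alternative itself, so the check cannot be forgotten the way a manual tag comparison can.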

Consider the following short Rust program, which defines an enum type that can hold one of three different primitive types.

enum Foo {
    ABool(bool),
    AInteger(i32),
    ADouble(f64),
}

fn main() {
    let x = Foo::AInteger(42);
    let y = Foo::ADouble(1337.0);
    let z = Foo::ABool(true);

    if let Foo::ABool(b) = x {
        println!("A boolean! {}", b)
    }
    if let Foo::ABool(b) = y {
        println!("A boolean! {}", b)
    }
    if let Foo::ABool(b) = z {
        println!("A boolean! {}", b)
    }
}

rustc generates something similar to the following LLVM IR to initialize the Foo variables.

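; basic type definition: a one-byte tag followed by an opaque payload that
; pads %Foo to the size of its largest variant ({ i8, double } occupies
; 16 bytes on typical 64-bit targets)
%Foo = type { i8, [15 x i8] }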
; Variants of Foo
%Foo_ABool = type { i8, i8 }       ; tagged with 0
%Foo_AInteger = type { i8, i32 }   ; tagged with 1
%Foo_ADouble = type { i8, double } ; tagged with 2

; allocate the first Foo
%x = alloca %Foo, align 8
; pointer to the first element of type i8 (the tag)
%0 = getelementptr inbounds %Foo, %Foo* %x, i32 0, i32 0
; set tag to '1'
store i8 1, i8* %0
; bitcast Foo to the right Foo variant
%1 = bitcast %Foo* %x to %Foo_AInteger*
; store the constant '42'
%2 = getelementptr inbounds %Foo_AInteger, %Foo_AInteger* %1, i32 0, i32 1
store i32 42, i32* %2

; allocate and initialize the second Foo
%y = alloca %Foo, align 8
%3 = getelementptr inbounds %Foo, %Foo* %y, i32 0, i32 0
; this time the tag is '2'
store i8 2, i8* %3
; cast to variant and store double constant
%4 = bitcast %Foo* %y to %Foo_ADouble*
%5 = getelementptr inbounds %Foo_ADouble, %Foo_ADouble* %4, i32 0, i32 1
store double 1.337000e+03, double* %5

To check whether the given Foo object is a certain variant, the tag must be retrieved and compared to the desired value.

%9 = getelementptr inbounds %Foo, %Foo* %x, i32 0, i32 0
%10 = load i8, i8* %9
; check if tag is '0', which identifies the variant Foo_ABool
%11 = icmp eq i8 %10, 0
br i1 %11, label %bb1, label %bb2

bb1:
  ; cast to variant
  %12 = bitcast %Foo* %x to %Foo_ABool*
  ; retrieve boolean
  %13 = getelementptr inbounds %Foo_ABool, %Foo_ABool* %12, i32 0, i32 1
  %14 = load i8, i8* %13
  %15 = trunc i8 %14 to i1
  ; <...>
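
The same store-tag, store-payload, check-tag sequence can be mirrored in C++ as a rough sketch. LoweredFoo, make_integer, make_bool and as_bool are hypothetical names, and the layout is simplified: the payload sits at one fixed offset here, whereas in the IR above each variant struct decides its own payload offset.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of %Foo: a tag byte followed by an opaque payload.
struct LoweredFoo {
    std::uint8_t tag;  // 0 = ABool, 1 = AInteger, 2 = ADouble
    alignas(double) unsigned char payload[sizeof(double)];
};

// Store the tag, then store the payload through a "cast" (memcpy here).
LoweredFoo make_integer(std::int32_t v) {
    LoweredFoo f{};
    f.tag = 1;
    std::memcpy(f.payload, &v, sizeof v);
    return f;
}

LoweredFoo make_bool(bool v) {
    LoweredFoo f{};
    f.tag = 0;
    f.payload[0] = v ? 1 : 0;
    return f;
}

// Load the tag, compare it with 0, and only then read the payload byte.
bool as_bool(const LoweredFoo &f, bool *out) {
    if (f.tag != 0) {
        return false;  // corresponds to the branch to bb2
    }
    // corresponds to bb1: load the i8 and keep its lowest bit (trunc to i1)
    *out = (f.payload[0] & 1) != 0;
    return true;
}
```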