Scott's Ramblings building Chesterton's Fence
Photo by Annie Spratt / Unsplash

All Along the Calltower


In my last post, we explored the native stack and how it manages function calls and local variables in compiled languages. I elided a bunch of other detail with some hand-wavey “here be dragons!” talk. In this post we’re going to dive into one of those details - now that we’ve got the stack and a way to jump into and return from functions, how do we pass their arguments and return values?

Functions can take all sorts of things as arguments - big, and small - as well as essentially an arbitrary number of them. So - where do we put them?

Let’s dive in!

Calling Conventions?

The very short answer is we follow a calling convention - a description of how we can use CPU registers and the stack to encode function arguments. When we share compiled code between libraries and languages we need them all to be callable in the same way so that the code plays together nicely.

We’re going to be looking at AAPCS64 (the ARM 64-bit calling convention), if only because I have a Mac handy. If we ignore Windows [1], the other popular ABI these days is System V AMD64, which is used on x86-64. In practice the two are very similar, and for this post I want to give you a feel for how they work in general, so the differences aren’t so important!

Calling Conventions!

No arguments

Let’s start at the absolute simplest case: a function that takes no arguments at all.

To the side is the disassembly so we can dive into the calling convention; don’t be horrified - it’s straightforward once you know what you’re looking for. If you’re playing along at home, you can use objdump -d to produce this from a binary yourself. Note - this isn’t the whole disassembly, just the bits that are involved in the call.

// ...
int z = no_args();

Now, let’s have a look at no_args itself - how’s it returning that value?

int no_args() {
    return 0;
}

There’s a bit to take in here, but it establishes the baseline for our calling convention:

  • A register is used to store where to return control to when a function completes (x30)
  • We physically call the function by jumping execution to the function’s address in the application’s memory
  • A register is used to store the return value from a function (x0)
  • On AArch64, x0-x30 are the 64-bit general-purpose registers, and w0-w30 are their lower 32-bit halves; we often see the two mixed in the same assembly
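As a rough sketch of that baseline, here is the same C again with comments describing the instructions a compiler typically emits on AArch64. The exact output depends on your compiler and optimisation level, so treat the comments as an approximation rather than a real disassembly. call_no_args is a hypothetical wrapper I’ve added for illustration.

```c
/* Callee: the result is placed in w0 (the low half of x0),
 * then `ret` jumps back to the address held in x30. */
int no_args(void) {
    return 0;   /* typically: mov w0, #0 ; ret */
}

/* Hypothetical caller, for illustration only. */
int call_no_args(void) {
    /* typically: bl no_args
     * `bl` (branch with link) stores the return address in x30 and
     * branches to no_args; afterwards the result is read from w0. */
    int z = no_args();
    return z;
}
```

Calling call_no_args() simply hands back whatever no_args left in w0.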

That’s it - we can see how we move control between functions and how we return a value. So - what about passing arguments?

Simple arguments

Let’s start with simple arguments:

To the side are the relevant bits of the disassembly; note that as we go along, I’m omitting bits that we’ve covered above, and focussing on what is new:

// ...
int mixed = simple_func(1, 2, 3.0, 4.0);

Now, let’s see what simple_func does:

int simple_func(long a, long b, double x, double y) {
    return (int)(a + b + (long)(x + y));
}

A bit more to take in here, but straightforward enough:

  • Small function arguments go into registers
  • Different registers are used for integer and floating point values
  • We have 8 registers of each type for passing arguments (x0-x7 for integers, d0-d7 for doubles)
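To make the register assignment concrete, here is a sketch with the expected AAPCS64 mapping written as comments. interleaved_func is a hypothetical variant I’ve added to show that the two register classes are filled independently, so mixing up the parameter order does not change which register each value lands in.

```c
/* a -> x0, b -> x1   (integer registers)
 * x -> d0, y -> d1   (floating-point registers)
 * result -> w0 */
int simple_func(long a, long b, double x, double y) {
    return (int)(a + b + (long)(x + y));
}

/* Hypothetical variant: each class fills its own registers in order,
 * so the assignment is the same as above.
 * a -> x0, x -> d0, b -> x1, y -> d1 */
int interleaved_func(long a, double x, long b, double y) {
    return (int)(a + b + (long)(x + y));
}
```

Both simple_func(1, 2, 3.0, 4.0) and interleaved_func(1, 3.0, 2, 4.0) compute the same sum; only the order in which the caller writes the arguments differs.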

An obvious question is - what happens when we run out of registers?

Lots of arguments

Let’s work it out by looking at how we can call func_ints, a function that takes 9 integer arguments:

// Conspicuously has 9 arguments!
int result = func_ints(1, 2, 3, 4, 5, 6, 7, 8, 9);

Now, let’s see what func_ints does:

int func_ints(int a, int b, int c, int d, int e, int f, int g, int h, int i) {
    return a + b + c + d + e + f + g + h + i;
}

So we can see that once we exhaust the registers allocated for passing arguments (w0-w7), we store any additional arguments on the stack, in the frame of the caller. This is called spilling.
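Here is a sketch of where each argument should end up, again as comments. func_ints_then_double is a hypothetical function I’ve added: it shows that running out of integer registers does not affect the floating-point ones. The exact layout of the stack slots varies between platforms, so I’ve left it out.

```c
/* a..h -> w0..w7
 * i    -> the stack, in the caller's frame */
int func_ints(int a, int b, int c, int d, int e, int f, int g, int h, int i) {
    return a + b + c + d + e + f + g + h + i;
}

/* Hypothetical variant: the integer registers are exhausted by a..h,
 * so i still goes on the stack, but x is the first floating-point
 * argument and so still travels in d0. */
int func_ints_then_double(int a, int b, int c, int d, int e, int f, int g,
                          int h, int i, double x) {
    return func_ints(a, b, c, d, e, f, g, h, i) + (int)x;
}
```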

It should be clear now why we need a calling convention: there’s no “universally obvious” way of describing how a function should be called. Every convention balances performance (registers are much faster than memory) against the fact that registers are a limited resource.

A calling convention also defines who must preserve register values across a call. For instance, on AArch64 per the AAPCS64 convention:

  • Caller-saved registers may be freely overwritten by the callee; the caller must save them if it cares.
  • Callee-saved registers must be restored by the callee before it returns.
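A small sketch of why this matters: below, b has to survive a call to another function. Both helper and live_across_call are hypothetical names of my own, and the comments describe what a compiler typically does rather than guaranteed output (with optimisation on, helper may simply be inlined).

```c
/* Hypothetical callee; any function call would do. */
long helper(long v) {
    return v * 2;
}

long live_across_call(long a, long b) {
    /* b arrives in x1, a caller-saved register that helper is free to
     * overwrite. To keep b alive across the call, the compiler typically
     * either stores it on the stack or copies it into a callee-saved
     * register such as x19. In the latter case this function is itself
     * responsible for saving and restoring x19's previous value. */
    long first = helper(a);
    return first + b;
}
```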

Small structs

But what about when we pass a struct? Let’s look at func_small, which takes and returns a small structure by value.

typedef struct Small {
    int a;
    double b;
} Small;
// ...
Small s2 = func_small(s, 5);

Now, let’s see what func_small does:

Small func_small(Small s, int x) {
    s.a += x;
    s.b *= 2.0;
    return s;
}

We see:

  • Small structs are passed by, and split across, registers
  • Small structs are returned in registers in the same fashion
  • “small” means 16 bytes or less
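The sketch below checks the size limit and annotates where I’d expect the pieces of Small to travel under AAPCS64. The register comments are my reading of the convention for a struct that mixes an int and a double, not output copied from a disassembler.

```c
typedef struct Small {
    int a;      /* 4 bytes, usually followed by 4 bytes of padding */
    double b;   /* 8 bytes */
} Small;

/* 16 bytes on AArch64, so it only just fits under the "small" limit. */
_Static_assert(sizeof(Small) <= 16, "Small should fit in two registers");

/* s      -> x0 (holding a) and x1 (holding the bits of b)
 * x      -> w2, the next free integer register
 * result -> x0 and x1, laid out the same way */
Small func_small(Small s, int x) {
    s.a += x;
    s.b *= 2.0;
    return s;
}
```

Calling func_small with a = 1, b = 2.0 and x = 5 gives back a struct with a = 6 and b = 4.0, without the struct itself ever needing to touch memory as far as the convention is concerned.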

Sooo … what happens if we try to pass a big struct?

Big structs

Let’s try:

typedef struct Large {
    char data[64];
    int len;
} Large;
// ...
Large l2 = func_large(l, 42);

There’s a great deal of shuffling things back and forth between registers and stack in the unoptimised assembler of func_large, so I’ve omitted it in the interests of brevity.

The rules for big structs are already clear:

  • The caller allocates space for the result and passes its address in x8
  • The callee writes the result there directly, then returns normally
  • If a struct argument exceeds 16 bytes, the caller copies it onto the stack, in its own frame, and passes a pointer to that copy instead
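One way to picture these rules is to write the lowering out by hand in C. Everything below is a sketch under assumptions: the post doesn’t show the body of func_large, so the one here is made up, and func_large_lowered, demo_by_value and demo_lowered are hypothetical helpers. One difference from the real thing is that the result pointer travels in x8 rather than as an ordinary first argument.

```c
typedef struct Large {
    char data[64];
    int len;
} Large;

/* Hypothetical body; the real func_large is not shown in the post. */
Large func_large(Large l, int x) {
    l.len += x;
    return l;
}

/* Roughly what the compiler turns func_large into: the result and the
 * big argument are both reached through pointers supplied by the caller. */
void func_large_lowered(Large *result, const Large *arg_copy, int x) {
    *result = *arg_copy;   /* the callee writes the result in place */
    result->len += x;
}

/* Ordinary by-value call, as written in the source. */
int demo_by_value(int len, int x) {
    Large l = { .data = "hi", .len = len };
    return func_large(l, x).len;
}

/* The same call with the caller's work made explicit. */
int demo_lowered(int len, int x) {
    Large l = { .data = "hi", .len = len };
    Large copy = l;   /* the caller copies the argument into its own frame */
    Large result;     /* the caller allocates space for the result */
    func_large_lowered(&result, &copy, x);
    return result.len;
}
```

Both demo functions return the same value; the second just spells out the copy and the result buffer that the calling convention asks the caller to provide.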

Footnotes

  1. I find ignoring Windows to be generally a good practice. Windows 11: what even is that?