Structs and Memory

Last night I started working on a program for making the interface between Neptune and C code easier to work with. Here, I’ll explain what makes the interface difficult in some cases and how this new tool will make it easier. Beware: long post!

The OSVM system has a really simple interface for calling external C code. You can define a Neptune method as external by using the extern keyword:

class X {
  extern int my_external_method(int a, int b);
}

When you invoke that method, the C function named my_external_method in the underlying system is located and called. Similarly, if I want to call the standard time function in C I just make an external method called time in some Neptune object, and when I invoke that method, the call will go through to the time function in C because they have the same name. You can only pass integers as arguments, and only integers can be returned from the function. Nice and simple.

Well, except that sometimes you really want to call a C function with a piece of structured data, or have it returned. For instance, you might want to use a graphics library written in C, and in that case it would be nice to be able to pass in a text string when calling the draw_string function:

void draw_string(char *str, int x, int y);

How do you get from a fancy Neptune string with bells and whistles to a character array, and how do you pass it to the function? The solution is to use Memory objects. A Memory object allows you to allocate and deallocate memory in the underlying system. It is essentially the same as malloc and free except that a memory object checks that you don’t access memory outside of the allocated area. So you can create a character array by allocating a piece of memory of the right size, copying the characters from the Neptune string into the memory area, and then finally pass the address of the memory to the external call:

String str = "Whatever...";
Memory mem = new Memory(str.size + 1); // Allocate memory
for (int i = 0; i < str.size; i++)
  mem.set_byte(i, str[i].as_integer()); // Copy characters
mem.set_byte(str.size, 0); // Null-terminate
draw_string(mem.address, 0, 0); // Perform call
mem.free(); // Free memory

We’re still just passing integers through the C call, but one of them is a pointer to the character array which contains our string. Unfortunately, since the memory is allocated in the underlying system, it is outside the reach of the garbage collector, so you have to do manual memory management. However, in some cases, for instance with externalizing strings, there are convenience methods that free you from dealing directly with memory objects:

str.externalize_during(fun (Memory c_string) {
  draw_string(c_string.address, 0, 0);
});

In this case, string has a method which externalizes the string, invokes the given block, and then cleans up.
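To make the bounds-checking idea behind Memory objects a bit more concrete, here is a minimal sketch in C of what such a wrapper could look like. This is not how OSVM implements Memory; the checked_mem type and its function names are made up for illustration, and the integer read assumes host byte order.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of a Memory object: a raw block plus its size. */
typedef struct {
    unsigned char *base;
    size_t size;
} checked_mem;

/* Allocate a zero-filled block; returns 0 on failure. */
int checked_mem_alloc(checked_mem *m, size_t size) {
    m->base = calloc(size ? size : 1, 1);
    m->size = m->base ? size : 0;
    return m->base != NULL;
}

/* Store one byte; returns 0 instead of writing outside the block. */
int checked_mem_set_byte(checked_mem *m, size_t offset, unsigned char value) {
    if (offset >= m->size) return 0;
    m->base[offset] = value;
    return 1;
}

/* Read a 32-bit integer in host byte order; returns 0 if it would not fit. */
int checked_mem_get_int(const checked_mem *m, size_t offset, int32_t *out) {
    if (m->size < sizeof *out || offset > m->size - sizeof *out) return 0;
    memcpy(out, m->base + offset, sizeof *out);
    return 1;
}

/* Release the block and make later accesses fail the bounds check. */
void checked_mem_free(checked_mem *m) {
    free(m->base);
    m->base = NULL;
    m->size = 0;
}
```

The only real difference from plain malloc and free is that every access goes through a size check, which is the property the Memory object adds.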

A character array is a pretty simple thing and the approach above solves some but not all problems with external calls. For instance, say you want to use an external C library that can use GPS to give the current position:

struct point {
  int x, y;
};

struct point *get_position();

When you call this external function from Neptune you will get the address of a C point structure. Now, you can access the contents of this piece of memory by using a memory object. Memory objects can be used in two ways: either to allocate a fresh piece of memory or to give access to a piece of memory that has already been allocated:

int address = get_position();
Memory mem = new Memory(address, 8); // using existing memory
int x = mem.get_int(0);
int y = mem.get_int(4);

Here we have to make some assumptions about how structs are implemented in C. We guess that the point takes up 8 bytes of memory and that x and y are integers starting at byte offset 0 and 4 respectively. It is probably true, but it really is just a guess. The C standard does say something about the implementation of structs, but it doesn’t define them completely. For instance, it says that x must occur before y. However, there are holes in the standard which allow different compilers to implement structs differently. Some standard structures, for instance in network code (tcp.h), use a lot of bit fields, which allow you to access individual bits or groups of bits within a struct. The C standard says that a C compiler is free to decide how to implement bit fields. So if an external call returns a TCP header I may have access to the contents of the struct, but if I want to know the value of the SYN field I have no way to know where to find it. Bugger.
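A quick way to see that a struct's size is the compiler's decision rather than the sum of its members is to ask sizeof. The sketch below is only an illustration: struct tagged and the three functions are hypothetical names, and the numbers you get depend on your compiler and target.

```c
#include <stddef.h>
#include <stdio.h>

struct point {
    int x, y;
};

/* Hypothetical struct mixing member sizes; many compilers pad after `tag`. */
struct tagged {
    char tag;
    int value;
};

/* Bytes in struct tagged that belong to no member (padding). */
size_t tagged_padding(void) {
    return sizeof(struct tagged) - (sizeof(char) + sizeof(int));
}

/* Do the sizes match the "8 bytes, two 4-byte ints" guess on this compiler?
   This checks sizes only; it says nothing about member offsets. */
int point_guess_holds(void) {
    return sizeof(int) == 4 && sizeof(struct point) == 8;
}

/* Print what this particular compiler decided. */
void report_sizes(void) {
    printf("sizeof(int) = %zu, sizeof(struct point) = %zu\n",
           sizeof(int), sizeof(struct point));
    printf("sizeof(struct tagged) = %zu, padding = %zu\n",
           sizeof(struct tagged), tagged_padding());
}
```

Calling report_sizes() from a main on a typical desktop compiler will most likely report some padding in struct tagged, but nothing in the standard promises a particular amount.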

Maybe I’ll experiment with my compiler and figure out that it always puts SYN in word 4 as bit 21. That’s not enjoyable work, and a TCP header has more than 15 fields so that produces a lot of nasty code full of magic numbers.
Worse, the code is now tied to a particular compiler, in fact a particular version of a particular compiler since they are free to change the implementation of structs in a later version. Bleh!

As I said in the beginning, I’ve started working on a tool for making interaction easier between Neptune and C code. In fact, the tool is exactly designed to, if not solve the problem with structs, then at least make it a lot easier to deal with. The tool processes the definition of a struct and then generates a Neptune class which wraps a memory object and provides accessors for the struct’s fields. For instance, for a simple example such as the point struct from before:

class Point {

  Memory memory;

  Point(int address) {
    memory = new Memory(address, 8);
  }

  int accessor x {
    return memory.get_int(0);
  }

  int accessor x=(int new_x) {
    memory.set_int(0, new_x);
  }

  ...

}

So now you can access a point struct as if it were a real object

Point p = new Point(get_position());
int x = p.x;
p.y = 8;

without dealing with offsets and other nastiness. Of course, that doesn’t actually solve the bit field problem; I just wanted to show how the interface works.

The tool deals with the bit field problem by simply asking the compiler what the layout of a structure is. When generating the Neptune class for a structure, you must also specify which compiler you’re using. Through various trickery, the tool gets the compiler to describe the layout of the structure and uses that information to generate the class correctly. That way the tool takes care of determining, for instance, where the SYN bit is in a TCP header, and ensures that the code for extracting and setting that bit is correct. If the compiler changes the way it lays out structures the generated code will break, but you can just run the tool again and get new working classes. It doesn’t solve all problems but it does in fact solve the problems that can be solved. And besides figuring out the layout of bit fields, it generates code that makes for really easy access to the fields, for instance setters so you can do

TCPHeader hdr = new TCPHeader();
hdr.SYN = true;
hdr.ACK = false;

The SYN=(bool v) accessor handles all the bit fiddling and masking which causes the bits to be set correctly in the underlying C structure. This means that it will behave correctly if you pass the structure to an external call (using hdr.address):

extern void process_header(int header);
...
process_header(hdr.address);

void process_header(struct tcphdr *hdr) {
  if (hdr->SYN) {
    ...
  }
}
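The bit fiddling and masking that a generated setter like SYN= has to do can be sketched in C roughly as follows. This is an assumption about the general technique, not the tool's actual output: the bit_layout record and the function names are hypothetical, and it treats the struct as an array of 32-bit words, so the word and bit numbers only make sense for the compiler and byte order they were measured on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout record, as a layout tool might report it for one field. */
typedef struct {
    size_t word;     /* index of the 32-bit word holding the field */
    unsigned shift;  /* position of the field's lowest bit within that word */
    unsigned width;  /* number of bits in the field */
} bit_layout;

/* Mask covering exactly the field's bits inside its word. */
uint32_t field_mask(bit_layout f) {
    uint32_t ones = f.width >= 32 ? UINT32_MAX
                                  : (UINT32_C(1) << f.width) - 1;
    return ones << f.shift;
}

/* Extract the field's value, shifted down so it starts at bit 0. */
uint32_t get_field(const uint32_t *words, bit_layout f) {
    return (words[f.word] & field_mask(f)) >> f.shift;
}

/* Overwrite the field, leaving every other bit in the word untouched. */
void set_field(uint32_t *words, bit_layout f, uint32_t value) {
    uint32_t mask = field_mask(f);
    words[f.word] = (words[f.word] & ~mask) | ((value << f.shift) & mask);
}
```

With the made-up "word 4, bit 21" location from earlier, the layout record would be {4, 21, 1}, and set_field(words, syn, 1) flips just that one bit.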

How does the tool figure out what the layout of a structure is? It generates a C program that uses the struct, compiles it with the specified compiler, and then runs the program, which prints out a description of the structures. It then uses that description when generating the Neptune classes. I can’t take credit for that idea; Kasper came up with it. For instance, if you want to know the offset of point.y you can use the offsetof macro, which is defined in stddef.h. For each member, the generated C program contains a line like this one

printf("%s %s %zu\n", "point", "y", offsetof(struct point, y));

Lo and behold, when I run the program it prints out

point y 4
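Put together, the body of such a generated probe might look something like the sketch below. The DESCRIBE macro, the describe_point function and the extra size line are assumptions added for illustration; the post only shows the single printf line.

```c
#include <stddef.h>
#include <stdio.h>

struct point {
    int x, y;
};

/* Emit one "struct member offset" line in the format shown above.
   The stringized names keep the output in sync with the real identifiers. */
#define DESCRIBE(out, type, member) \
    fprintf((out), "%s %s %zu\n", #type, #member, offsetof(struct type, member))

/* One DESCRIBE line per member, plus the total size (an added assumption:
   a class generator would also need to know how many bytes to wrap). */
void describe_point(FILE *out) {
    DESCRIBE(out, point, x);
    DESCRIBE(out, point, y);
    fprintf(out, "%s size %zu\n", "point", sizeof(struct point));
}
```

A main that calls describe_point(stdout) is all the probe needs; the offsets and size it prints are whatever the compiler that built it decided on.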

One problem with this is that it doesn’t work with bit fields — I can’t do offsetof(struct tcphdr, SYN) because you can’t take the address of a bit field and that’s how offsetof is implemented. Instead, I use a little trick which I can take credit for: I simply fill the struct with zeroes, then set the field to the largest possible value it can contain, and then scan through the struct bit-by-bit to find the area where bits have been set.

struct tcphdr my_hdr;
memset(&my_hdr, 0, sizeof(my_hdr));
my_hdr.SYN = 1;
find_set_bits(&my_hdr);

Cheesy eh…
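The find_set_bits helper isn't shown above, so here is one way it could be written. Treat it as a sketch under assumptions: struct flags is a hypothetical stand-in for a header with bit fields, this version takes the size as a second argument, and bits are numbered byte by byte with the least significant bit of each byte first.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a header with bit fields. */
struct flags {
    unsigned int reserved : 4;
    unsigned int syn : 1;
    unsigned int window : 11;
};

/* Result of a scan: where the first set bit is and how many bits are set. */
typedef struct {
    size_t first;  /* bit position of the first set bit (0 if none) */
    size_t count;  /* total number of set bits found */
} bit_range;

/* Walk the object's bytes bit by bit and record the set bits. */
bit_range find_set_bits(const void *object, size_t size) {
    const unsigned char *bytes = object;
    bit_range r = {0, 0};
    for (size_t i = 0; i < size * CHAR_BIT; i++) {
        if ((bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1) {
            if (r.count == 0) r.first = i;
            r.count++;
        }
    }
    return r;
}

/* Probe a one-bit field: zero the struct, set the field, scan. */
bit_range locate_syn(void) {
    struct flags f;
    memset(&f, 0, sizeof f);
    f.syn = 1;
    return find_set_bits(&f, sizeof f);
}

/* Probe a wider field by storing its largest value (11 bits, all ones). */
bit_range locate_window(void) {
    struct flags f;
    memset(&f, 0, sizeof f);
    f.window = 0x7FF;
    return find_set_bits(&f, sizeof f);
}
```

The count always equals the field's width. The first position is only meaningful relative to this byte-wise numbering, and for a multi-bit field on some targets the set bits need not be contiguous in it.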

The tool is not really done but neither is the new OSVM containing Neptune. Both will be freely available later this spring…
