Week 8 – Memory management by C routines

Last week we concluded the series of discussions on generic functions and data types in C. All this while we intended to know more and more about the C language, the low level system mechanics and the relation between the two. Today, we begin the start to the next phase : implementation of C routines like malloc(), realloc() and free().
Before we try to know about malloc() and the other functions, its important to know about the portion of memory called the stack and likewise the heap memory. The two portions have resemblance with the respective data structures, but here only their purpose it relevent to us - as of now.

The Stack memory, or better known as the program stack, is the place where the local variables and the parameters of the functions are stored. Say we have a program like this :

int main()
int a,b;
foo1(a,b); //Do not bother about the values of a and b
return 0;

And implementation of foo1..

void foo1(int a, int b)
int x,y;

and foo2...
void foo2()
int l,m;

When foo1() is called in the main(), its not that the variables 'a' and 'b' simply go away. They are simply disabled. These parameters must be saved such that the next called functions doesn't get mess up its local variables because of the caller functions variables. This is ensured by the program stack. The stack pushes in the variables of the last function and pops them only after the function ends – last function variables in, get out first.

In the coming weeks, we get more in detail about the stack as we write assembly codes and try to understand how the compiler works. Lets move on to the heap memory.

The Heap memory is where all the dynamically allocated memory comes from. It is this heap memory which we will be talking mostly about as malloc(),realloc() and free() use this portion of memory.

So what really happens when malloc() is called?
Say we have a statement like this :
void *ptr = malloc(40);
After all these weeks programming, we'd say that malloc() gets us 40 bytes memory for us and returns the pointer to us for a reason as simple as this : how else do we access the memory assigned to us?
In reality, there's more. Actually more than 40 bytes of memory is set aside or us. This amount could be any number close to 40 (in our example) but remember - always greater than 40.Just before the this block, it reserves a few bytes, depending on the compiler, to store some information about this block of memory. Usually this happens to be the size of the block.
Note that when we talk of the internals of these functions, there umpteen number ways to do this and manage the memory. What we discuss here are the popular ones.

This header serves in freeing the memory. When free() is called to free the memory, it jumps to the header which is just behind the starting address. Here it reads the value and assumes that so many bytes had been allocated earlier and hence frees only so many bytes. If this value were to be altered, consequences could be serious as C doesnot do any internal checking if the value at the header is a legal value or not. Consider the following code :

int *Arr = (int*)malloc(10*4);
Arr[-1] = 5;

free() in this case would be able to free only 5 bytes.
Futhermore, consider the following :

int I;
int *ptr = (int*) malloc(100*4);
for( I = 0; I < 100; ++I)
ptr[I] = 10;

Simple as it is, all the elements of the array is assigned an initial value of 10. Now...

ptr = ptr + 100;

What do you think happens? Well, free() is given the address of the 25th element of the array. free() checks one block behind (24th element) and finds the value 10. It thinks only 10 bytes were allocated previously when malloc() was called. Hence, it frees just 10 bytes.

Allocation of new nodes: how does the system do this?
Well to understand this, we need to know what a linked list is.
A linked list is a data structure, first of all. In detail what a data structure is cannot been covered here – hit the search engine if you are really curious or mail me. A linked list is basically a collection of nodes that are linked together using pointers. We access linked lists using the address of the first node. Linked list are of many types and for now we just need to know about the singly linked lists – the other types do not concern us at this moment.
A singly linked list is a collection of nodes that are linked only in one direction. If you are wondering what nodes are, nodes are nothing but a block of memory (in terms of C its a structure).

What the system does is, it maintains a linked list of all the free memory blocks in the memory – each of its own size. Whenever a request is made, the malloc() traverses this linked list to find the best fit. As we know it now, malloc() always chooses a block bigger than the required amount. This help realloc(). When realloc() is called, first thing it does it to check the header to see if theres enough space to occupy the new request. If no, then a new block is searched for in the linked list of free memory blocks and on finding one, all the data is copied there. Then the older block is freed. If the current block can occupy the new request, then, only the header is updated.

Well thats it! This should suffice to lead us to system programming. Today it was mostly about the heap memory. Next week on, it will be more about the program stack. We will be working at the assembly language level, trying to understand how the C codes we write are converted to machine code. Time and again all these concepts about the program stack, Heap memory etc. will be re-visited. Stay tuned!