Aside from the latent typing, the major source of constraints on a Scheme implementation’s data representation is the garbage collector. The collector must be able to traverse every live object in the heap, to determine which objects are not live, and thus collectable.
There are many ways to implement this. Guile’s garbage collection is built on a library, the Boehm-Demers-Weiser conservative garbage collector (BDW-GC). The BDW-GC “just works”, for the most part. But since it is interesting to know how these things work, we include here a high-level description of what the BDW-GC does.
Garbage collection has two logical phases: a mark phase, in which the set of live objects is enumerated, and a sweep phase, in which objects not traversed in the mark phase are collected. Correct functioning of the collector depends on being able to traverse the entire set of live objects.
In the mark phase, the collector scans the system’s global variables and the local variables on the stack to determine which objects are immediately accessible by the C code. It then scans those objects to find the objects they point to, and so on. The collector logically sets a mark bit on each object it finds, so each object is traversed only once.
When the collector can find no unmarked objects pointed to by marked objects, it assumes that any objects that are still unmarked will never be used by the program (since there is no path of dereferences from any global or local variable that reaches them) and deallocates them.
In the above paragraphs, we did not specify how the garbage collector finds the global and local variables; as usual, there are many different approaches. Frequently, the programmer must maintain a list of pointers to all global variables that refer to the heap, and another list (adjusted upon entry to and exit from each function) of local variables, for the collector’s benefit.
The list of global variables is usually not too difficult to maintain, since global variables are relatively rare. However, an explicitly maintained list of local variables (in the author’s personal experience) is a nightmare to maintain. Thus, the BDW-GC uses a technique called conservative garbage collection, to make the local variable list unnecessary.
The trick to conservative collection is to treat the C stack as an ordinary range of memory, and assume that every word on the C stack is a pointer into the heap. Thus, the collector marks all objects whose addresses appear anywhere in the C stack, without knowing for sure how that word is meant to be interpreted.
In addition to the stack, the BDW-GC will also scan static data sections. This means that global variables are also scanned when looking for live Scheme objects.
Obviously, such a system will occasionally retain objects that are actually garbage, and should be freed. In practice, this is not a problem, as the set of conservatively-scanned locations is fixed; the Scheme stack is maintained apart from the C stack, and is scanned precisely (as opposed to conservatively). The GC-managed heap is also partitioned into parts that can contain pointers (such as vectors) and parts that can’t (such as bytevectors), limiting the potential for confusing a raw integer with a pointer to a live object.
Interested readers should see the BDW-GC web page at http://www.hboehm.info/gc/, for more information on conservative GC in general and the BDW-GC implementation in particular.