9.2.3 Cheaper Pairs

However, there is yet another issue to confront. Most Scheme heaps contain more pairs than any other type of object; Jonathan Rees said at one point that pairs occupy 45% of the heap in his Scheme implementation, Scheme 48. However, our representation above spends three SCM-sized words per pair — one for the type, and two for the CAR and CDR. Is there any way to represent pairs using only two words?

Let us refine the convention we established earlier. Let us assert that:

Here is the new C code:

enum type { string, vector, ... };

typedef struct value *SCM;

struct value {
  enum type type;
  union {
    struct { int length; char *elts; } string;
    struct { int length; SCM  *elts; } vector;
    ...
  } value;
};

struct pair {
  SCM car, cdr;
};

#define POINTER_P(x) (((int) (x) & 7) == 0)

#define INTEGER_P(x)  (((int) (x) & 7) == 1)
#define GET_INTEGER(x)  ((int) (x) >> 3)
#define MAKE_INTEGER(x) ((SCM) (((x) << 3) | 1))

#define PAIR_P(x) (((int) (x) & 7) == 2)
#define GET_PAIR(x) ((struct pair *) ((int) (x) & ~7))

Notice that enum type and struct value now only contain provisions for vectors and strings; both integers and pairs have become special cases. The code above also assumes that an int is large enough to hold a pointer, which isn’t generally true.

Our list of examples is now as follows:

This change in representation reduces our heap size by 15%. It also makes it cheaper to decide if a value is a pair, because no memory references are necessary; it suffices to check the bottom two bits of the SCM value. This may be significant when traversing lists, a common activity in a Scheme system.

Again, most real Scheme systems use a slightly different implementation; for example, if GET_PAIR subtracts off the low bits of x, instead of masking them off, the optimizer will often be able to combine that subtraction with the addition of the offset of the structure member we are referencing, making a modified pointer as fast to use as an unmodified pointer.