5.4.5 Multi-Threading

Guile can be used in multi-threaded programs just as well as in single-threaded ones.

Each thread that wants to use functions from libguile must put itself into guile mode and must then follow a few rules. If it doesn’t want to honor these rules in certain situations, a thread can temporarily leave guile mode (but can no longer use libguile functions during that time, of course).

Threads enter guile mode by calling scm_with_guile, scm_boot_guile, or scm_init_guile. As explained in the reference documentation for these functions, Guile will then learn about the stack bounds of the thread and can protect the SCM values that are stored in local variables. When a thread puts itself into guile mode for the first time, it gets a Scheme representation and is listed by all-threads, for example.

Threads in guile mode can block (e.g., do blocking I/O) without causing any problems5; temporarily leaving guile mode with scm_without_guile before blocking slightly improves GC performance, though. For some common blocking operations, Guile provides convenience functions. For example, if you want to lock a pthread mutex while in guile mode, you might want to use scm_pthread_mutex_lock which is just like pthread_mutex_lock except that it leaves guile mode while blocking.

All libguile functions are (intended to be) robust in the face of multiple threads using them concurrently. This means that there is no risk of the internal data structures of libguile becoming corrupted in such a way that the process crashes.

A program might still produce nonsensical results, though. Taking hashtables as an example, Guile guarantees that you can use them from multiple threads concurrently and a hashtable will always remain a valid hashtable and Guile will not crash when you access it. It does not guarantee, however, that inserting into it concurrently from two threads will give useful results: only one insertion might actually happen, none might happen, or the table might in general be modified in a totally arbitrary manner. (It will still be a valid hashtable, but not the one that you might have expected.) Guile might also signal an error when it detects a harmful race condition.

Thus, you need to put in additional synchronizations when multiple threads want to use a single hashtable, or any other mutable Scheme object.

When writing C code for use with libguile, you should try to make it robust as well. An example that converts a list into a vector will help to illustrate. Here is a correct version:

SCM
my_list_to_vector (SCM list)
{
  SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED);
  size_t len, i;

  len = scm_c_vector_length (vector);
  i = 0;
  while (i < len && scm_is_pair (list))
    {
      scm_c_vector_set_x (vector, i, scm_car (list));
      list = scm_cdr (list);
      i++;
    }

  return vector;
}

The first thing to note is that storing into a SCM location concurrently from multiple threads is guaranteed to be robust: you don’t know which value wins but it will in any case be a valid SCM value.

But there is no guarantee that the list referenced by list is not modified in another thread while the loop iterates over it. Thus, while copying its elements into the vector, the list might get longer or shorter. For this reason, the loop must check both that it doesn’t overrun the vector and that it doesn’t overrun the list. Otherwise, scm_c_vector_set_x would raise an error if the index is out of range, and scm_car and scm_cdr would raise an error if the value is not a pair.

It is safe to use scm_car and scm_cdr on the local variable list once it is known that the variable contains a pair. The contents of the pair might change spontaneously, but it will always stay a valid pair (and a local variable will of course not spontaneously point to a different Scheme object).

Likewise, a vector such as the one returned by scm_make_vector is guaranteed to always stay the same length so that it is safe to only use scm_c_vector_length once and store the result. (In the example, vector is safe anyway since it is a fresh object that no other thread can possibly know about until it is returned from my_list_to_vector.)

Of course the behavior of my_list_to_vector is suboptimal when list does indeed get asynchronously lengthened or shortened in another thread. But it is robust: it will always return a valid vector. That vector might be shorter than expected, or its last elements might be unspecified, but it is a valid vector and if a program wants to rule out these cases, it must avoid modifying the list asynchronously.

Here is another version that is also correct:

SCM
my_pedantic_list_to_vector (SCM list)
{
  SCM vector = scm_make_vector (scm_length (list), SCM_UNDEFINED);
  size_t len, i;

  len = scm_c_vector_length (vector);
  i = 0;
  while (i < len)
    {
      scm_c_vector_set_x (vector, i, scm_car (list));
      list = scm_cdr (list);
      i++;
    }

  return vector;
}

This version relies on the error-checking behavior of scm_car and scm_cdr. When the list is shortened (that is, when list holds a non-pair), scm_car will throw an error. This might be preferable to just returning a half-initialized vector.

The API for accessing vectors and arrays of various kinds from C takes a slightly different approach to thread-robustness. In order to get at the raw memory that stores the elements of an array, you need to reserve that array as long as you need the raw memory. During the time an array is reserved, its elements can still spontaneously change their values, but the memory itself and other things like the size of the array are guaranteed to stay fixed. Any operation that would change these parameters of an array that is currently reserved will signal an error. In order to avoid these errors, a program should of course put suitable synchronization mechanisms in place. As you can see, Guile itself is again only concerned about robustness, not about correctness: without proper synchronization, your program will likely not be correct, but the worst consequence is an error message.

Real thread-safety often requires that a critical section of code is executed in a certain restricted manner. A common requirement is that the code section is not entered a second time when it is already being executed. Locking a mutex while in that section ensures that no other thread will start executing it, blocking asyncs ensures that no asynchronous code enters the section again from the current thread, and the error checking of Guile mutexes guarantees that an error is signaled when the current thread accidentally reenters the critical section via recursive function calls.

Guile provides two mechanisms to support critical sections as outlined above. You can either use the macros SCM_CRITICAL_SECTION_START and SCM_CRITICAL_SECTION_END for very simple sections; or use a dynwind context together with a call to scm_dynwind_critical_section.

The macros only work reliably for critical sections that are guaranteed to not cause a non-local exit. They also do not detect an accidental reentry by the current thread. Thus, you should probably only use them to delimit critical sections that do not contain calls to libguile functions or to other external functions that might do complicated things.

The function scm_dynwind_critical_section, on the other hand, will correctly deal with non-local exits because it requires a dynwind context. Also, by using a separate mutex for each critical section, it can detect accidental reentries.


Footnotes

(5)

In Guile 1.8, a thread blocking in guile mode would prevent garbage collection to occur. Thus, threads had to leave guile mode whenever they could block. This is no longer needed with Guile 2.x.