Gnuastro's thread related functions (GNU Astronomy Utilities)


The POSIX Threads functions offered in the C library are very low-level and give you extensive control over the properties of the threads. So if you are interested in customizing your tools for complicated multi-threaded applications, you are strongly encouraged to become familiar with them. Some resources were introduced in Multithreaded programming (threads.h).

However, in many cases in astronomical data analysis, the threads do not need to communicate with each other and each target operation can be done independently. Since such operations are very common, Gnuastro provides the tools below to facilitate the creation and management of such jobs without requiring any particular knowledge of POSIX Threads. The most interesting high-level functions of this section are gal_threads_number, which identifies the number of threads available on the system, and gal_threads_spin_off, which spins off the threads. You can see a demonstration of using these functions in Library demo - multi-threaded operation.

C struct: gal_threads_params

Structure keeping the parameters of each thread. When each thread is created, a pointer to this structure is passed to it. The params element can be a pointer to a user-defined structure that contains all the parameters the worker function needs. The rest of the elements in this structure are set internally by gal_threads_spin_off and are for use by the worker function.

struct gal_threads_params
{
  size_t            id; /* Id of this thread.                  */
  void         *params; /* User-identified pointer.            */
  size_t       *indexs; /* Target indices given to this thread. */
  pthread_barrier_t *b; /* Barrier for all threads.            */
};

Function:
size_t
gal_threads_number ()

Return the number of threads that the operating system has available for your program. This number is usually fixed for a single machine, so this function is mainly useful when you want to run the same program on different machines (with different CPUs) without hard-coding the number of threads.
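Gnuastro's own implementation is not shown in this section; the sketch below only illustrates the kind of system query involved, using sysconf with _SC_NPROCESSORS_ONLN (supported by glibc, the BSDs and macOS). The name demo_threads_number is hypothetical, and falling back to 1 when the query fails is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: number of processors currently online, or 1 if the
   query fails, so a caller can always spin off at least one thread. */
size_t
demo_threads_number(void)
{
  long n=sysconf(_SC_NPROCESSORS_ONLN);
  size_t out = n>0 ? (size_t)n : 1;

  assert(out>=1);          /* Never report zero threads. */
  return out;
}
```

The returned value is what you would typically pass as the numthreads argument of the functions below.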

Function:
void
gal_threads_spin_off (void *(*worker)(void *), void *caller_params, size_t numactions, size_t numthreads, size_t minmapsize, int quietmmap)

Distribute numactions jobs between numthreads threads and spin-off each thread by calling the worker function. The caller_params pointer will also be passed to worker as part of the gal_threads_params structure. For a fully working example of this function, please see Library demo - multi-threaded operation.

If there are many jobs (millions or billions) to organize, memory issues may become important. With minmapsize you specify the size (in bytes) above which the necessary space is allocated in a memory-mapped file instead of RAM. If quietmmap is zero, a warning will be printed upon creating a memory-mapped file (a non-zero value suppresses it). For more on Gnuastro’s memory management, see Memory management.
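The general RAM-versus-memory-mapped-file decision that minmapsize controls can be sketched with plain POSIX calls (mkstemp, ftruncate, mmap). This is not Gnuastro's implementation (see Memory management for the real functions): the helper names demo_ram_or_mmap and demo_free, the /tmp location, and the rule that only allocations strictly larger than minmapsize are memory-mapped are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: place 'size' bytes in RAM when
   size<=minmapsize, otherwise in a memory-mapped temporary file.
   '*out' points to the memory (NULL on failure). The return value is
   the backing file's name, or NULL when the memory is in RAM. */
char *
demo_ram_or_mmap(size_t size, size_t minmapsize, void **out)
{
  char name[]="/tmp/demo_mmap_XXXXXX", *copy;
  void *map;
  int fd;

  assert(size>0);
  *out=NULL;

  /* Small allocation: plain RAM. */
  if(size<=minmapsize) { *out=malloc(size); return NULL; }

  /* Large allocation: a 'size'-byte file mapped into memory. */
  fd=mkstemp(name);
  if(fd==-1) return NULL;
  if(ftruncate(fd, (off_t)size)==-1) { close(fd); unlink(name); return NULL; }
  map=mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);               /* The mapping stays valid after closing. */
  if(map==MAP_FAILED) { unlink(name); return NULL; }

  /* Keep a copy of the name so the file can be deleted later. */
  copy=malloc(strlen(name)+1);
  if(copy==NULL) { munmap(map, size); unlink(name); return NULL; }
  *out=map;
  return strcpy(copy, name);
}

/* Release memory obtained from 'demo_ram_or_mmap'. */
void
demo_free(void *ptr, size_t size, char *mmapname)
{
  if(mmapname)
    {
      munmap(ptr, size);
      unlink(mmapname);    /* Delete the backing file. */
      free(mmapname);
    }
  else
    free(ptr);
}
```

With minmapsize=(size_t)-1 every request stays in RAM, in line with the advice given for gal_threads_dist_in_threads below; with a small minmapsize, large arrays land in a file on disk instead of consuming RAM.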

Function:
void
gal_threads_attr_barrier_init (pthread_attr_t *attr, pthread_barrier_t *b, size_t limit)

This is a low-level function in case you do not want to use gal_threads_spin_off. It will initialize the general thread attribute attr and the barrier b, with limit threads to wait behind the barrier. For maximum efficiency, the threads created with this attribute will be detached. Therefore they cannot be joined: in particular, pthread_join will not work on these threads. You have to use the barrier to wait for all threads to finish.

Function:
char *
gal_threads_dist_in_threads (size_t numactions, size_t numthreads, size_t minmapsize, int quietmmap, size_t **indexs, size_t *icols)

This is a low-level function in case you do not want to use gal_threads_spin_off. The job of this function is to distribute numactions jobs/actions between numthreads threads. To do this, it will assign each job an ID, ranging from 0 to numactions-1. The outputs are the allocated *indexs array and the number *icols. In memory, it is just a simple 1D array that has numthreads \(\times\) *icols elements. But you can visualize it as a 2D array with numthreads rows and *icols columns. For more on the logic of the distribution, see below.

When you have millions/billions of jobs to distribute, indexs will become very large. For memory management (when to use a memory-mapped file, and when to use RAM), you need to specify the minmapsize and quietmmap arguments. For more on memory management, see Memory management. In general, if your distributed jobs will not be on the scale of billions (and you want everything to always be kept in RAM), just set minmapsize=-1 and quietmmap=1: since minmapsize is a size_t, -1 is converted to the largest possible value, so nothing will be memory-mapped.

When indexs is actually in a memory-mapped file, this function will return a string containing the name of the file (that you can later give to gal_pointer_mmap_free to free/delete). When indexs is in RAM, this function will return a NULL pointer. So after you are finished with indexs, you can free it like this:

#include <stdlib.h>
#include <gnuastro/threads.h>
#include <gnuastro/pointer.h>

int
main(void)
{
  char *mmapname;
  int quietmmap=1;
  size_t *indexs, thrdcols;
  size_t numactions=5000, minmapsize=-1;
  size_t numthreads=gal_threads_number();

  /* Distribute the jobs. */
  mmapname=gal_threads_dist_in_threads(numactions, numthreads,
                                       minmapsize, quietmmap,
                                       &indexs, &thrdcols);

  /* Do any processing you want... */

  /* Free the 'indexs' array. */
  if(mmapname) gal_pointer_mmap_free(&mmapname, quietmmap);
  else         free(indexs);

  return EXIT_SUCCESS;
}

Here is a brief description of the reasoning behind the indexs array and how the jobs are distributed. Let’s assume you have \(A\) actions (where there is only one function and the input values differ for each action) and \(T\) threads available to the system, with \(A>T\) (common values for these two would be \(A>1000\) and \(T<10\)). Spinning off a thread is not cheap: it requires a significant number of CPU cycles. Therefore, creating \(A\) threads is not the best way to address such a problem. The most efficient approach is to create only \(T\) threads and have each thread work through its own list of actions in series (one after the other). This way your CPU will get all the actions done with minimal overhead.

The purpose of this function is to do what we explained above: each row in the indexs array contains the indices of the actions which must be done by one thread (so it has numthreads rows with *icols columns). However, when using indexs, you do not need to know the number of columns: it is guaranteed that every row finishes with GAL_BLANK_SIZE_T (see Library blank values (blank.h)). The GAL_BLANK_SIZE_T macro plays a role very similar to a string’s \0: since every row finishes with this macro, you can stop parsing the indices in a row as soon as you encounter GAL_BLANK_SIZE_T. For a real example, please see the program in tests/lib/multithread.c.
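A minimal sketch of such a distribution, kept entirely in RAM (no memory-mapping), is below. It assigns the actions round-robin (action i goes to row i%numthreads), which is an assumption for illustration: Gnuastro's own ordering within the rows may differ. The names demo_dist_in_threads and DEMO_BLANK (standing in for GAL_BLANK_SIZE_T) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for GAL_BLANK_SIZE_T. */
#define DEMO_BLANK SIZE_MAX

/* Distribute actions 0..numactions-1 over 'numthreads' rows,
   round-robin, ending every row with DEMO_BLANK. The output is 1D
   with numthreads*(*icols) elements: row 't' starts at t*(*icols).
   Returns NULL if the allocation fails. */
size_t *
demo_dist_in_threads(size_t numactions, size_t numthreads, size_t *icols)
{
  size_t i, *indexs;

  assert(numthreads>0);

  /* Longest row: ceil(numactions/numthreads) actions, plus the blank. */
  *icols=(numactions+numthreads-1)/numthreads+1;

  indexs=malloc(numthreads * *icols * sizeof *indexs);
  if(indexs==NULL) return NULL;

  /* Start with every slot blank, so unused slots end the rows. */
  for(i=0; i<numthreads * *icols; ++i) indexs[i]=DEMO_BLANK;

  /* Action 'i' goes to row i%numthreads, column i/numthreads. */
  for(i=0; i<numactions; ++i)
    indexs[(i%numthreads) * *icols + i/numthreads]=i;

  return indexs;
}
```

Thread t then only needs the start of its row: a loop like for(j=0; indexs[t*icols+j]!=DEMO_BLANK; ++j) visits its actions in order. For example, with 10 actions and 3 threads, icols is 5 and row 0 holds 0, 3, 6, 9 followed by the blank.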


12.3.2.2 Gnuastro’s thread related functions