The POSIX Threads functions offered in the C library are very low-level and offer a great range of control over the properties of the threads. So if you are interested in customizing your tools for complicated thread applications, you are strongly encouraged to become familiar with them. Some resources were introduced in Multithreaded programming (threads.h).
However, in many astronomical data analysis tasks, the threads do not need to communicate with each other and each target operation can be done independently.
Since such operations are very common, Gnuastro provides the tools below to facilitate the creation and management of jobs without any particular knowledge of POSIX Threads.
The most interesting high-level functions of this section are gal_threads_number and gal_threads_spin_off, which identify the number of threads on the system and spin off threads, respectively.
You can see a demonstration of using these functions in Library demo - multi-threaded operation.
struct gal_threads_params
Structure keeping the parameters of each thread.
When each thread is created, a pointer to this structure is passed to it.
The params element can be a pointer to a user-defined structure that contains all the parameters the worker function needs.
The rest of the elements within this structure are set internally by gal_threads_spin_off and are relevant to the worker function.

struct gal_threads_params
{
  size_t            id;  /* Id of this thread.                   */
  void         *params;  /* User-identified pointer.             */
  size_t       *indexs;  /* Target indices given to this thread.  */
  pthread_barrier_t *b;  /* Barrier for all threads.             */
};
size_t ()
Return the number of threads that the operating system has available for your program.
This number is usually fixed for a single machine and does not change, so this function is useful when you want to run your program on different machines (with different CPUs).
void (void *(*worker)(void *), void *caller_params, size_t numactions, size_t numthreads, size_t minmapsize, int quietmmap)
Distribute numactions jobs between numthreads threads and spin off each thread by calling the worker function.
The caller_params pointer will also be passed to worker as part of the gal_threads_params structure.
For a fully working example of this function, please see Library demo - multi-threaded operation.
If there are many jobs (millions or billions) to organize, memory issues may become important.
With minmapsize you can specify the minimum size (in bytes) of an allocation that will be placed in a memory-mapped file instead of RAM.
If quietmmap is zero, a warning will be printed upon creating a memory-mapped file; a non-zero value suppresses this warning.
For more on Gnuastro’s memory management, see Memory management.
void (pthread_attr_t *attr, pthread_barrier_t *b, size_t limit)
This is a low-level function in case you do not want to use gal_threads_spin_off.
It will initialize the general thread attribute attr and the barrier b with limit threads to wait behind the barrier.
For maximum efficiency, attr is set so that the threads created with it will be detached.
Therefore these threads cannot be joined; in particular, pthread_join will not work on them.
You have to use the barrier to wait for all threads to finish.
char * (size_t numactions, size_t numthreads, size_t minmapsize, int quietmmap, size_t **indexs, size_t *icols)
This is a low-level function in case you do not want to use gal_threads_spin_off.
The job of this function is to distribute numactions jobs/actions among numthreads threads.
To do this, it will assign each job an ID, ranging from 0 to numactions-1.
The output is the allocated *indexs array and the *icols number.
In memory, it is just a simple 1D array that has numthreads \(\times\) *icols elements.
However, you can visualize it as a 2D array with numthreads rows and *icols columns.
For more on the logic of the distribution, see below.
When you have millions/billions of jobs to distribute, indexs will become very large.
For memory management (when to use a memory-mapped file, and when to use RAM), you need to specify the minmapsize and quietmmap arguments.
For more on memory management, see Memory management.
In general, if your distributed jobs will not be on the scale of billions (and you want everything to always be written in RAM), just set minmapsize=-1 and quietmmap=1.
Since minmapsize is a size_t, -1 is converted to the largest value it can hold (SIZE_MAX), so no allocation will be large enough to go into a memory-mapped file.
When indexs is actually in a memory-mapped file, this function will return a string containing the name of the file (that you can later give to gal_pointer_mmap_free to free/delete).
When indexs is in RAM, this function will return a NULL pointer.
So after you are finished with indexs, you can free it like this:
#include <stdlib.h>
#include <gnuastro/pointer.h>
#include <gnuastro/threads.h>

char *mmapname;
int quietmmap=1;
size_t *indexs, thrdcols;
size_t numactions=5000, minmapsize=-1;
size_t numthreads=gal_threads_number();

/* Distribute the jobs. */
mmapname=gal_threads_dist_in_threads(numactions, numthreads,
                                     minmapsize, quietmmap,
                                     &indexs, &thrdcols);

/* Do any processing you want... */

/* Free the 'indexs' array. */
if(mmapname) gal_pointer_mmap_free(&mmapname, quietmmap);
else         free(indexs);
Here is a brief description of the reasoning behind the indexs array and how the jobs are distributed.
Let’s assume you have \(A\) actions (where there is only one function and the input values differ for each action) and \(T\) threads available to the system with \(A>T\) (common values for these two would be \(A>1000\) and \(T<10\)).
Spinning off a thread is not a cheap job and requires a significant number of CPU cycles.
Therefore, creating \(A\) threads is not the best way to address such a problem.
The most efficient way to manage the actions is such that only \(T\) threads are created, and each thread works on a list of actions identified for it in series (one after the other).
This way your CPU will get all the actions done with minimal overhead.
The purpose of this function is to do what we explained above: each row in the indexs array contains the indices of actions which must be done by one thread (so it has numthreads rows with *icols columns).
However, when using indexs, you do not have to know the number of columns.
It is guaranteed that all the rows finish with GAL_BLANK_SIZE_T (see Library blank values (blank.h)).
The GAL_BLANK_SIZE_T macro plays a role very similar to a string’s \0: every row finishes with this macro, so you can easily stop parsing the indices in a row as soon as you encounter GAL_BLANK_SIZE_T.
For a real example, please see the program in tests/lib/multithread.c.
GNU Astronomy Utilities 0.23 manual, July 2024.