N-tuples
This chapter describes functions for creating and manipulating ntuples, sets of values associated with events. The ntuples are stored in files. Their values can be extracted in any combination and booked in a histogram using a selection function.
The values to be stored are held in a user-defined data structure, and an ntuple is created associating this data structure with a file. The values are then written to the file (normally inside a loop) using the ntuple functions described below.
A histogram can be created from ntuple data by providing a selection function and a value function. The selection function specifies whether an event should be included in the subset to be analyzed or not. The value function computes the entry to be added to the histogram for each event.
All the ntuple functions are defined in the header file gsl_ntuple.h.
The ntuple struct
type gsl_ntuple
Ntuples are manipulated using the gsl_ntuple struct. This struct contains information on the file where the ntuple data is stored, a pointer to the current ntuple data row and the size of the user-defined ntuple data struct:
typedef struct
{
  FILE * file;
  void * ntuple_data;
  size_t size;
} gsl_ntuple;
Creating ntuples
gsl_ntuple *gsl_ntuple_create(char *filename, void *ntuple_data, size_t size)
This function creates a new write-only ntuple file filename for ntuples of size size and returns a pointer to the newly created ntuple struct. Any existing file with the same name is truncated to zero length and overwritten. A pointer to memory for the current ntuple row, ntuple_data, must be supplied; this is used to copy ntuples in and out of the file.
Opening an existing ntuple file
gsl_ntuple *gsl_ntuple_open(char *filename, void *ntuple_data, size_t size)
This function opens an existing ntuple file filename for reading and returns a pointer to a corresponding ntuple struct. The ntuples in the file must have size size. A pointer to memory for the current ntuple row, ntuple_data, must be supplied; this is used to copy ntuples in and out of the file.
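The requirement that the rows in the file have size size can be checked before analysis. The following is a minimal sketch in plain C, assuming that rows are stored back-to-back as raw bytes with no header (an assumption that the text above does not state explicitly). The helper count_rows is hypothetical and is not part of GSL.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GSL.  Returns the number of whole
   rows of `size` bytes in the stream, or -1 if the file length is not
   a multiple of `size` or an I/O error occurs.  Assumes rows are
   stored back-to-back with no header. */
long
count_rows (FILE *fp, size_t size)
{
  long length;

  if (size == 0 || fseek (fp, 0L, SEEK_END) != 0)
    return -1;

  length = ftell (fp);
  if (length < 0)
    return -1;

  rewind (fp);                  /* leave the stream at the first row */

  if ((unsigned long) length % size != 0)
    return -1;

  return (long) ((unsigned long) length / size);
}
```

A result of -1 suggests that the file was written with a different row struct than the one being used to read it.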
Writing ntuples
int gsl_ntuple_write(gsl_ntuple *ntuple)
This function writes the current ntuple ntuple->ntuple_data of size ntuple->size to the corresponding file.
int gsl_ntuple_bookdata(gsl_ntuple *ntuple)
This function is a synonym for gsl_ntuple_write().
Reading ntuples
int gsl_ntuple_read(gsl_ntuple *ntuple)
This function reads the current row of the ntuple file for ntuple and stores the values in ntuple->ntuple_data.
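As a rough mental model of the write and read functions, the sketch below moves fixed-size rows through a file with plain stdio calls, reusing a single row buffer in the same way the ntuple_data pointer is reused. It assumes rows are raw binary structs stored one after another. The names put_row, get_row, sum_x and struct event are hypothetical and only illustrate the idea; they are not GSL functions.

```c
#include <stdio.h>

struct event { double x; double y; double z; };   /* example row layout */

/* Append one row of `size` bytes; returns 0 on success, -1 on failure. */
int
put_row (FILE *fp, const void *row, size_t size)
{
  return fwrite (row, size, 1, fp) == 1 ? 0 : -1;
}

/* Read the next row into `row`; returns 1 if a row was read,
   0 at end of file, -1 on an I/O error. */
int
get_row (FILE *fp, void *row, size_t size)
{
  if (fread (row, size, 1, fp) == 1)
    return 1;

  return feof (fp) ? 0 : -1;
}

/* Rewind the stream and sum the x field over all rows.  The same row
   buffer is overwritten on every read. */
double
sum_x (FILE *fp)
{
  struct event row;
  double total = 0.0;

  rewind (fp);
  while (get_row (fp, &row, sizeof row) == 1)
    total += row.x;

  return total;
}
```

Because each read overwrites the row buffer, any value needed later has to be copied out before the next row is read.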
Closing an ntuple file
int gsl_ntuple_close(gsl_ntuple *ntuple)
This function closes the ntuple file ntuple and frees its associated allocated memory.
Histogramming ntuple values
Once an ntuple has been created its contents can be histogrammed in various ways using the function gsl_ntuple_project(). Two user-defined functions must be provided: a function to select events and a function to compute scalar values. The selection function and the value function both accept the ntuple row as a first argument and other parameters as a second argument.
type gsl_ntuple_select_fn
The selection function determines which ntuple rows are selected for histogramming. It is defined by the following struct:
typedef struct
{
  int (* function) (void * ntuple_data, void * params);
  void * params;
} gsl_ntuple_select_fn;
The struct component function should return a non-zero value for each ntuple row that is to be included in the histogram.
type gsl_ntuple_value_fn
The value function computes scalar values for those ntuple rows selected by the selection function:
typedef struct
{
  double (* function) (void * ntuple_data, void * params);
  void * params;
} gsl_ntuple_value_fn;
In this case the struct component function should return the value to be added to the histogram for the ntuple row.
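The params pointer lets one callback serve many different cuts. The sketch below illustrates the pattern with a selection function that accepts rows inside a window on z and a value function that ignores its parameters. To compile without GSL it declares local stand-in types with the same shape as the two structs above. The names select_fn, value_fn, struct z_window, in_z_window and sum_xy are hypothetical.

```c
#include <stddef.h>

struct event { double x; double y; double z; };   /* example row layout */

/* Local stand-ins with the same shape as gsl_ntuple_select_fn and
   gsl_ntuple_value_fn, so that the sketch compiles without GSL. */
typedef struct
{
  int (* function) (void * ntuple_data, void * params);
  void * params;
} select_fn;

typedef struct
{
  double (* function) (void * ntuple_data, void * params);
  void * params;
} value_fn;

/* Hypothetical parameter block: accept rows with lo <= z < hi. */
struct z_window { double lo; double hi; };

int
in_z_window (void *ntuple_data, void *params)
{
  const struct event *e = ntuple_data;
  const struct z_window *w = params;

  return e->z >= w->lo && e->z < w->hi;
}

/* Value function that needs no parameters: returns x + y. */
double
sum_xy (void *ntuple_data, void *params)
{
  const struct event *e = ntuple_data;

  (void) params;                /* unused */
  return e->x + e->y;
}
```

A cut would then be set up as select_fn S = { in_z_window, &w }; with w a struct z_window, and changing the window only requires changing w, not the callback.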
int gsl_ntuple_project(gsl_histogram *h, gsl_ntuple *ntuple, gsl_ntuple_value_fn *value_func, gsl_ntuple_select_fn *select_func)
This function updates the histogram h from the ntuple ntuple using the functions value_func and select_func. For each ntuple row where the selection function select_func returns a non-zero value, the corresponding value of that row is computed using the function value_func and added to the histogram. Those ntuple rows where select_func returns zero are ignored. New entries are added to the histogram, so subsequent calls can be used to accumulate further data in the same histogram.
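The select-then-fill logic can be sketched in a few lines of plain C. This is only an illustration under simplifying assumptions, not the library's implementation: the rows live in an in-memory array rather than a file, the histogram is a fixed array of NBINS uniform bins, and values outside the histogram range are dropped. All names here (project, struct hist, select_fn, value_fn, positive, identity) are hypothetical.

```c
#include <stddef.h>

#define NBINS 10

typedef struct { int (* function) (void * row, void * params); void * params; } select_fn;
typedef struct { double (* function) (void * row, void * params); void * params; } value_fn;

/* Simplified stand-in for a histogram: NBINS uniform bins on [lo, hi). */
struct hist { double lo; double hi; double bin[NBINS]; };

/* For each of the n rows (row_size bytes each), add the row's value to
   the histogram if the selection function returns non-zero.  Existing
   bin contents are kept, so repeated calls accumulate. */
void
project (struct hist *h, void *rows, size_t n, size_t row_size,
         value_fn *value, select_fn *select)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      void *row = (char *) rows + i * row_size;
      double v;
      size_t k;

      if (!select->function (row, select->params))
        continue;               /* row rejected by the cut */

      v = value->function (row, value->params);
      if (v < h->lo || v >= h->hi)
        continue;               /* outside the histogram range */

      k = (size_t) ((v - h->lo) / (h->hi - h->lo) * NBINS);
      if (k >= NBINS)
        k = NBINS - 1;          /* guard against rounding at the upper edge */

      h->bin[k] += 1.0;
    }
}

/* Example callbacks for rows that are single doubles. */
int
positive (void *row, void *params)
{
  (void) params;
  return *(double *) row > 0.0;
}

double
identity (void *row, void *params)
{
  (void) params;
  return *(double *) row;
}
```

Calling project twice on the same struct hist doubles every bin count, which mirrors the accumulating behaviour described above.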
Examples
The following example programs demonstrate the use of ntuples in managing a large dataset. The first program creates a set of 10,000 simulated "events", each with 3 associated values $(x, y, z)$. These are generated from a Gaussian distribution with unit variance, for demonstration purposes, and written to the ntuple file test.dat.
#include <gsl/gsl_ntuple.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

struct data
{
  double x;
  double y;
  double z;
};

int
main (void)
{
  const gsl_rng_type * T;
  gsl_rng * r;

  struct data ntuple_row;
  int i;

  gsl_ntuple *ntuple
    = gsl_ntuple_create ("test.dat", &ntuple_row,
                         sizeof (ntuple_row));

  gsl_rng_env_setup ();

  T = gsl_rng_default;
  r = gsl_rng_alloc (T);

  for (i = 0; i < 10000; i++)
    {
      ntuple_row.x = gsl_ran_ugaussian (r);
      ntuple_row.y = gsl_ran_ugaussian (r);
      ntuple_row.z = gsl_ran_ugaussian (r);

      gsl_ntuple_write (ntuple);
    }

  gsl_ntuple_close (ntuple);
  gsl_rng_free (r);

  return 0;
}
The next program analyses the ntuple data in the file test.dat. The analysis procedure is to compute the squared-magnitude of each event, $E^2 = x^2 + y^2 + z^2$, and select only those which exceed a lower limit of 1.5. The selected events are then histogrammed using their $E^2$ values.
#include <math.h>
#include <gsl/gsl_ntuple.h>
#include <gsl/gsl_histogram.h>

struct data
{
  double x;
  double y;
  double z;
};

int sel_func (void *ntuple_data, void *params);
double val_func (void *ntuple_data, void *params);

int
main (void)
{
  struct data ntuple_row;

  gsl_ntuple *ntuple
    = gsl_ntuple_open ("test.dat", &ntuple_row,
                       sizeof (ntuple_row));
  double lower = 1.5;

  gsl_ntuple_select_fn S;
  gsl_ntuple_value_fn V;

  gsl_histogram *h = gsl_histogram_alloc (100);
  gsl_histogram_set_ranges_uniform(h, 0.0, 10.0);

  S.function = &sel_func;
  S.params = &lower;

  V.function = &val_func;
  V.params = 0;

  gsl_ntuple_project (h, ntuple, &V, &S);
  gsl_histogram_fprintf (stdout, h, "%f", "%f");
  gsl_histogram_free (h);
  gsl_ntuple_close (ntuple);

  return 0;
}

int
sel_func (void *ntuple_data, void *params)
{
  struct data * data = (struct data *) ntuple_data;
  double x, y, z, E2, scale;
  scale = *(double *) params;

  x = data->x;
  y = data->y;
  z = data->z;

  E2 = x * x + y * y + z * z;

  return E2 > scale;
}

double
val_func (void *ntuple_data, void *params)
{
  (void)(params); /* avoid unused parameter warning */
  struct data * data = (struct data *) ntuple_data;
  double x, y, z;

  x = data->x;
  y = data->y;
  z = data->z;

  return x * x + y * y + z * z;
}
Fig. 15 shows the distribution of the selected events. Note the cut-off at the lower bound.
References and Further Reading
Further information on the use of ntuples can be found in the documentation for the CERN packages PAW and HBOOK (available online).