Commit 3ae8a66

committed
[df] Add more docs to the Snapshot with variations section
1 parent 80a7b45 commit 3ae8a66

1 file changed

+27
-5
lines changed

tree/dataframe/src/RDataFrame.cxx

Lines changed: 27 additions & 5 deletions
@@ -1264,11 +1264,30 @@ When Filters are employed, some variations might not pass the selection cuts (li
 In that case, RDataFrame will snapshot the filtered columns in a memory-efficient way by writing zero into the memory of fundamental types, or write a
 default-constructed object in case of classes. If none of the filters pass like in row 6, the entire event is omitted from the snapshot.
 
-To tell apart a genuine `0` (like `x` in row 0) from a variation that didn't pass the selection, RDataFrame writes a bitmask for each event, indicating which variations
-are valid (see last column). A mapping of column names to this bitmask is placed in the same file as the output dataset, and automatically loaded when
-RDataFrame opens a file that was snapshot with variations.
-Attempting to read such missing values with RDataFrame will produce an error, but RDataFrame can either skip these values or fill in defaults as
-described in the \ref missing-values "section on dealing with missing values".
+To tell apart a genuine `0` (like `x` in row 0) from a case where the nominal or a variation didn't pass a selection,
+RDataFrame writes a bitmask for each event, see last column of the table above. Every bit indicates whether its
+associated columns are valid. The bitmask is implemented as a 64-bit `std::bitset` in memory, written to the output
+dataset as a `std::uint64_t`. For every 64 columns, a new bitmask column is added to the output dataset.
+
+For each column that gets varied, the nominal and all variation columns are each assigned a bit to denote whether their
+entries are valid. A mapping of column names to the corresponding bitmask is placed in the same file as the output
+dataset, with a name that follows the pattern `"R_rdf_branchToBitmaskMapping_<NAME_OF_THE_DATASET>"`. It is of type
+`std::unordered_map<std::string, std::pair<std::string, unsigned int>>`, and maps a column name to the name of the
+bitmask column and the index of the relevant bit. For example, in the same file as the dataset "Events" there would be
+an object named `R_rdf_branchToBitmaskMapping_Events`. This object would, for example, describe a connection such as:
+
+~~~
+muon_pt --> (R_rdf_mask_Events_0, 42)
+~~~
+
+which means that the validity of the entries in `muon_pt` is established by the bit `42` in the bitmask found in the
+column `R_rdf_mask_Events_0`.
+
+When RDataFrame opens a file, it checks for the existence of this mapping between columns and bitmasks, and loads it automatically if found. As such,
+RDataFrame makes the treatment of the bitmasks completely transparent to the user.
+
+In case certain values are labeled invalid by the corresponding bit, this will result in reading a missing value. The semantics of such a scenario follow the
+rules described in the \ref missing-values "section on dealing with missing values" and can be dealt with accordingly.
 
 \note Snapshot with variations is currently restricted to single-threaded TTree snapshots.
 
@@ -1780,6 +1799,9 @@ more of its entries. For example:
 - When joining different datasets horizontally according to some index value
 (e.g. the event number), if the index does not find a match in one or more
 other datasets for a certain entry.
+- If, for a certain event, a column is invalid because it results from a Snapshot
+with systematic variations, and that variation didn't pass its filters. For
+more details, see \ref snapshot-with-variations.
 
 For example, suppose that column "y" does not have a value for entry 42:
 
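
Reviewer note: the column-to-bitmask lookup documented in the first hunk can be illustrated with a small standalone sketch. It uses only the standard library and is not RDataFrame's implementation. The mapping type and the names `muon_pt` and `R_rdf_mask_Events_0` are taken from the documentation text; the alias `BranchToBitmaskMapping` and the helpers `IsColumnValid` and `NumberOfBitmaskColumns` are hypothetical names introduced here. The sketch assumes that bit index 0 is the least significant bit of the stored 64-bit value and that a set bit means "valid", neither of which the documentation states explicitly.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical alias for the mapping type quoted in the documentation:
// column name -> (name of the bitmask column, index of the relevant bit).
using BranchToBitmaskMapping =
   std::unordered_map<std::string, std::pair<std::string, unsigned int>>;

// Hypothetical helper: decide whether `column` holds a valid value in the
// current event. `maskValues` maps each bitmask column name to the 64-bit
// value read for this event.
bool IsColumnValid(const BranchToBitmaskMapping &mapping,
                   const std::unordered_map<std::string, std::uint64_t> &maskValues,
                   const std::string &column)
{
   const auto entry = mapping.find(column);
   if (entry == mapping.end())
      return true; // Assumption: a column without a mapping entry is never masked.

   const auto &[maskColumn, bitIndex] = entry->second;
   const auto mask = maskValues.find(maskColumn);
   if (mask == maskValues.end())
      return false; // Assumption: without its bitmask, the value cannot be trusted.

   // Assumption: bit 0 is the least significant bit, and a set bit means "valid".
   return std::bitset<64>(mask->second).test(bitIndex);
}

// One bitmask column covers 64 tracked columns, so round up.
constexpr std::size_t NumberOfBitmaskColumns(std::size_t nTrackedColumns)
{
   return (nTrackedColumns + 63) / 64;
}
```

With the mapping `muon_pt --> (R_rdf_mask_Events_0, 42)`, `IsColumnValid` returns true for an event whose `R_rdf_mask_Events_0` value has bit 42 set, and false when that bit is cleared. Under the "one bitmask column per 64 columns" rule, 65 tracked columns would need two bitmask columns.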
