tree/dataframe/src/RDataFrame.cxx
Lines changed: 27 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -1264,11 +1264,30 @@ When Filters are employed, some variations might not pass the selection cuts (li
1264
1264
In that case, RDataFrame will snapshot the filtered columns in a memory-efficient way by writing zero into the memory of fundamental types, or write a
1265
1265
default-constructed object in case of classes. If none of the filters pass like in row 6, the entire event is omitted from the snapshot.
1266
1266
1267
-
To tell apart a genuine `0` (like `x` in row 0) from a variation that didn't pass the selection, RDataFrame writes a bitmask for each event, indicating which variations
1268
-
are valid (see last column). A mapping of column names to this bitmask is placed in the same file as the output dataset, and automatically loaded when
1269
-
RDataFrame opens a file that was snapshot with variations.
1270
-
Attempting to read such missing values with RDataFrame will produce an error, but RDataFrame can either skip these values or fill in defaults as
1271
-
described in the \ref missing-values "section on dealing with missing values".
1267
+
To tell apart a genuine `0` (like `x` in row 0) from a case where nominal or variation didn't pass a selection,
1268
+
RDataFrame writes a bitmask for each event (see the last column of the table above). Every bit indicates whether its
1269
+
associated columns are valid. The bitmask is implemented as a 64-bit `std::bitset` in memory, written to the output
1270
+
dataset as a `std::uint64_t`. For every 64 columns, a new bitmask column is added to the output dataset.
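
As a rough illustration of the storage scheme described above (not RDataFrame's actual implementation), the following sketch keeps validity flags in a 64-bit `std::bitset` and converts them to a 64-bit unsigned integer for writing. The helper names `packMask` and `isValid` are hypothetical, and the bit positions used with them are arbitrary.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert the in-memory bitset into the integer
// representation that would be stored in a bitmask column.
inline std::uint64_t packMask(const std::bitset<64> &mask)
{
   return static_cast<std::uint64_t>(mask.to_ullong());
}

// Hypothetical helper: check whether a given bit of a stored mask is set.
// Precondition: bit < 64, since one mask holds at most 64 validity flags.
inline bool isValid(std::uint64_t storedMask, unsigned int bit)
{
   assert(bit < 64 && "bit index must be below 64");
   return ((storedMask >> bit) & 1u) != 0;
}
```

With this layout, setting bits 0 and 42 in the bitset and packing it yields an integer in which exactly those two bits are set, so `isValid` returns true for them and false for all others.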
1271
+
1272
+
For each column that gets varied, the nominal and all variation columns are each assigned a bit to denote whether their
1273
+
entries are valid. A mapping of column names to the corresponding bitmask is placed in the same file as the output
1274
+
dataset, with a name that follows the pattern `"R_rdf_branchToBitmaskMapping_<NAME_OF_THE_DATASET>"`. It is of type
1275
+
`std::unordered_map<std::string, std::pair<std::string, unsigned int>>`, and maps a column name to the name of the
1276
+
bitmask column and the index of the relevant bit. For example, in the same file as the dataset "Events" there would be
1277
+
an object named `R_rdf_branchToBitmaskMapping_Events`. This object could describe a connection such as:
1278
+
1279
+
~~~
1280
+
muon_pt --> (R_rdf_mask_Events_0, 42)
1281
+
~~~
1282
+
1283
+
which means that the validity of the entries in `muon_pt` is established by the bit `42` in the bitmask found in the
1284
+
column `R_rdf_mask_Events_0`.
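
To make the lookup concrete, here is a minimal sketch of how such a mapping could be used to decide whether an entry is valid. It uses only the standard library. The `EventMasks` type, the `entryIsValid` helper, and the choice to treat unmapped columns as always valid are assumptions for illustration and do not reflect RDataFrame internals.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Same type as the mapping described in the text:
// column name -> (name of the bitmask column, index of the relevant bit).
using BranchToBitmaskMap = std::unordered_map<std::string, std::pair<std::string, unsigned int>>;

// Illustrative stand-in for the bitmask column values of a single event,
// keyed by the name of the bitmask column.
using EventMasks = std::unordered_map<std::string, std::uint64_t>;

// Hypothetical helper: returns whether `column` holds a valid value for this event.
// Assumption: a column without a mapping entry is not masked and counts as valid.
inline bool entryIsValid(const BranchToBitmaskMap &mapping, const EventMasks &masks, const std::string &column)
{
   const auto mapIt = mapping.find(column);
   if (mapIt == mapping.end())
      return true;

   const std::string &maskName = mapIt->second.first;
   const unsigned int bit = mapIt->second.second;
   assert(bit < 64 && "bit index must be below 64");

   const auto maskIt = masks.find(maskName);
   if (maskIt == masks.end())
      return false; // the mask value for this event is not available

   return ((maskIt->second >> bit) & 1u) != 0;
}
```

For the `muon_pt` entry shown above, the helper reports a valid value only when bit `42` of `R_rdf_mask_Events_0` is set for the event in question.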
1285
+
1286
+
When RDataFrame opens a file, it checks for the existence of this mapping between columns and bitmasks, and loads it automatically if found. As such,
1287
+
RDataFrame makes the handling of these bitmasks completely transparent to the user.
1288
+
1289
+
If the corresponding bit marks a value as invalid, reading it yields a missing value. The semantics of such a scenario follow the
1290
+
rules described in the \ref missing-values "section on dealing with missing values" and can be dealt with accordingly.
1272
1291
1273
1292
\note Snapshot with variations is currently restricted to single-threaded TTree snapshots.
1274
1293
@@ -1780,6 +1799,9 @@ more of its entries. For example:
1780
1799
- When joining different datasets horizontally according to some index value
1781
1800
(e.g. the event number), if the index does not find a match in one or more
1782
1801
other datasets for a certain entry.
1802
+
- If, for a certain event, a column is invalid because it results from a Snapshot
1803
+
with systematic variations, and that variation didn't pass its filters. For
1804
+
more details, see \ref snapshot-with-variations.
1783
1805
1784
1806
For example, suppose that column "y" does not have a value for entry 42: