Commit fb22b87

Improving the IO - works now with rank of arrays
1 parent db82fc5 commit fb22b87

7 files changed: +160 -108 lines changed
docs/source/howtocite.rst

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -10,7 +10,7 @@ Please cite mpi4py-fft using
1010
year = {{2019}},
1111
title = {{Fast parallel multidimensional FFT using advanced MPI}},
1212
journal = {{Journal of Parallel and Distributed Computing}},
13-
volume = {{in press}}
13+
doi = {10.1016/j.jpdc.2019.02.006}
1414
}
1515
@electronic{mpi4py-fft,
1616
author = {{Lisandro Dalcin and Mikael Mortensen}},

docs/source/io.rst

Lines changed: 59 additions & 39 deletions
@@ -24,26 +24,31 @@ reads data in parallel. A simple example of usage is::
2424
u[:] = np.random.random(u.shape)
2525
# Store by first creating output files
2626
fields = {'u': [u], 'v': [v]}
27-
f0 = HDF5File('h5test.h5', global_shape=N, mode='w')
28-
f1 = NCFile('nctest.nc', global_shape=N, mode='w')
27+
f0 = HDF5File('h5test.h5', mode='w')
28+
f1 = NCFile('nctest.nc', mode='w')
2929
f0.write(0, fields)
3030
f1.write(0, fields)
3131
v[:] = 3
3232
f0.write(1, fields)
3333
f1.write(1, fields)
34-
# Alternatively, just use write method of each distributed array
35-
u.write('h5test.h5', 'u', step=2)
36-
v.write('h5test.h5', 'v', step=2)
37-
u.write('nctest.nc', 'u', step=2)
38-
v.write('nctest.nc', 'v', step=2)
3934

4035
Note that we are here creating two datafiles ``h5test.h5`` and ``nctest.nc``,
4136
for storing in HDF5 or NetCDF4 formats respectively. Normally, one would be
4237
satisfied using only one format, so this is only for illustration. We store
4338
the fields ``u`` and ``v`` on three different occasions,
4439
so the datafiles will contain three snapshots of each field ``u`` and ``v``.
4540

46-
The stored dataarrays can be retrieved later on::
41+
Also note that an alternative and perhaps simpler approach is to just use
42+
the ``write`` method of each distributed array::
43+
44+
u.write('h5test.h5', 'u', step=2)
45+
v.write('h5test.h5', 'v', step=2)
46+
u.write('nctest.nc', 'u', step=2)
47+
v.write('nctest.nc', 'v', step=2)
48+
49+
The two different approaches can be used on the same output files.
50+
51+
The stored dataarrays can also be retrieved later on::
4752

4853
u0 = newDistArray(T, forward_output=False)
4954
u1 = newDistArray(T, forward_output=False)
@@ -53,26 +58,33 @@ The stored dataarrays can be retrieved later on::
5358
#u0.read('nctest.nc', 'u', 0)
5459
#u1.read('nctest.nc', 'u', 1)
5560

56-
5761
Note that one does not have to use the same number of processors when
5862
retrieving the data as when they were stored.
5963

6064
It is also possible to store only parts of the, potentially large, arrays.
61-
Any chosen slice may be stored, using a *global* view of the arrays::
65+
Any chosen slice may be stored, using a *global* view of the arrays. It is
66+
possible to store both complete fields and slices in one single call by
67+
using the following approach::
6268

63-
f2 = HDF5File('variousfields.h5', global_shape=N, mode='w')
69+
f2 = HDF5File('variousfields.h5', mode='w')
6470
fields = {'u': [u,
6571
(u, [slice(None), slice(None), 4]),
6672
(u, [5, 5, slice(None)])],
6773
'v': [v,
6874
(v, [slice(None), 6, slice(None)])]}
6975
f2.write(0, fields)
7076
f2.write(1, fields)
71-
f2.write(2, fields)
72-
# or, using write method of field, e.g.
73-
#u.write('variousfields.h5', 'u', 0, [slice(None), slice(None), 4])
7477

75-
This will lead to an hdf5-file with groups::
78+
Alternatively, one can use the write method of each field with the ``global_slice``
79+
keyword argument::
80+
81+
u.write('variousfields.h5', 'u', 2)
82+
u.write('variousfields.h5', 'u', 2, global_slice=[slice(None), slice(None), 4])
83+
u.write('variousfields.h5', 'u', 2, global_slice=[5, 5, slice(None)])
84+
v.write('variousfields.h5', 'v', 2)
85+
v.write('variousfields.h5', 'v', 2, global_slice=[slice(None), 6, slice(None)])
86+
87+
In the end this will lead to an hdf5-file with groups::
7688

7789
variousfields.h5/
7890
├─ u/
@@ -86,41 +98,49 @@ This will lead to an hdf5-file with groups::
8698
| | ├─ 0
8799
| | ├─ 1
88100
| | └─ 2
89-
| └─ 3D/
90-
| ├─ 0
91-
| ├─ 1
92-
| └─ 2
93-
├─ v/
94-
| ├─ 2D/
95-
| | └─ slice_6_slice/
96-
| | ├─ 0
97-
| | ├─ 1
98-
| | └─ 2
99-
| └─ 3D/
100-
| ├─ 0
101-
| ├─ 1
102-
| └─ 2
103-
└─ mesh/
104-
├─ x0
105-
├─ x1
106-
└─ x2
107-
108-
Note that a mesh is stored along with all the data. This mesh can be given in
109-
two different ways when creating the datafiles:
101+
| ├─ 3D/
102+
| | ├─ 0
103+
| | ├─ 1
104+
| | └─ 2
105+
| └─ mesh/
106+
| ├─ x0
107+
| ├─ x1
108+
| └─ x2
109+
└─ v/
110+
├─ 2D/
111+
| └─ slice_6_slice/
112+
| ├─ 0
113+
| ├─ 1
114+
| └─ 2
115+
├─ 3D/
116+
| ├─ 0
117+
| ├─ 1
118+
| └─ 2
119+
└─ mesh/
120+
├─ x0
121+
├─ x1
122+
└─ x2
123+
124+
Note that a mesh is stored along with each group of data. This mesh can be
125+
given in two different ways when creating the datafiles:
110126

111127
1) A sequence of 2-tuples, where each 2-tuple contains the (origin, length)
112128
of the domain along its dimension. For example, a uniform mesh that
113129
originates from the origin, with lengths :math:`\pi, 2\pi, 3\pi`, can be
114-
given as::
130+
given when creating the output file as::
131+
132+
f0 = HDF5File('filename.h5', domain=((0, np.pi), (0, 2*np.pi), (0, 3*np.pi)))
133+
134+
or, using the write method of the distributed array::
115135

116-
f0 = HDF5File('filename.h5', global_shape=N, domain=((0, pi), (0, 2*np.pi), (0, 3*np.pi)))
136+
u.write('filename.h5', 'u', 0, domain=((0, np.pi), (0, 2*np.pi), (0, 3*np.pi)))
117137

118138
2) A sequence of arrays giving the coordinates for each dimension. For example::
119139

120140
d = (np.arange(N[0], dtype=np.float)*1*np.pi/N[0],
121141
np.arange(N[1], dtype=np.float)*2*np.pi/N[1],
122142
np.arange(N[2], dtype=np.float)*2*np.pi/N[2])
123-
f0 = HDF5File('filename.h5', global_shape=N, domain=d)
143+
f0 = HDF5File('filename.h5', domain=d)
124144

125145
With NetCDF4 the layout is somewhat different. For ``variousfields`` above,
126146
if we were using :class:`.NCFile` instead of :class:`.HDF5File`,
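
To make the relation between the two ``domain`` forms in io.rst concrete, here is a minimal sketch that expands (origin, length) 2-tuples into one coordinate list per dimension, using the same ``arange(N)*L/N`` spacing as the array example in the documentation. The helper name ``uniform_mesh`` is hypothetical and is not part of mpi4py-fft; how the library builds the stored mesh internally is not shown in this diff.

```python
import math


def uniform_mesh(domain, shape):
    """Expand (origin, length) pairs into coordinate lists.

    Hypothetical helper for illustration only; not an mpi4py-fft function.
    Point i along a dimension with n points sits at origin + i*length/n,
    so the endpoint origin + length is excluded (periodic-style spacing).
    """
    if len(domain) != len(shape):
        raise ValueError("need one (origin, length) pair per dimension")
    return [
        [origin + i * length / n for i in range(n)]
        for (origin, length), n in zip(domain, shape)
    ]


# The 2-tuple form used in the documentation example
domain = ((0, math.pi), (0, 2 * math.pi), (0, 3 * math.pi))
mesh = uniform_mesh(domain, (4, 8, 16))
```

Either form could then be passed as ``domain`` when creating the file or calling ``write``; which one to use is mostly a question of whether the mesh is uniform.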

mpi4py_fft/distarray.py

Lines changed: 11 additions & 3 deletions
@@ -55,7 +55,7 @@ class DistArray(np.ndarray):
5555
"""
5656
def __new__(cls, global_shape, subcomm=None, val=None, dtype=np.float,
5757
buffer=None, alignment=None, rank=0):
58-
if len(global_shape) < 2:
58+
if len(global_shape[rank:]) < 2:
5959
obj = np.ndarray.__new__(cls, global_shape, dtype=dtype, buffer=buffer)
6060
if buffer is None and isinstance(val, Number):
6161
obj.fill(val)
@@ -356,7 +356,7 @@ def redistribute(self, axis=None, out=None):
356356
return out
357357

358358
def write(self, filename, name='darray', step=0, global_slice=None,
359-
as_scalar=False):
359+
domain=None, as_scalar=False):
360360
"""Write snapshot ``step`` of ``self`` to file ``filename``
361361
362362
Parameters
@@ -370,6 +370,14 @@ def write(self, filename, name='darray', step=0, global_slice=None,
370370
Index used for snapshot in file.
371371
global_slice : sequence of slices or integers, optional
372372
Store only this global slice of ``self``
373+
domain : sequence, optional
374+
An optional spatial mesh or domain to go with the data.
375+
Sequence of either
376+
377+
- 2-tuples, where each 2-tuple contains the (origin, length)
378+
of each dimension, e.g., (0, 2*pi).
379+
- Arrays of coordinates, e.g., np.linspace(0, 2*pi, N). One
380+
array per dimension
373381
as_scalar : boolean, optional
374382
Whether to store rank > 0 arrays as scalars. Default is False.
375383
@@ -382,7 +390,7 @@ def write(self, filename, name='darray', step=0, global_slice=None,
382390
"""
383391
if isinstance(filename, str):
384392
writer = HDF5File if filename.endswith('.h5') else NCFile
385-
f = writer(filename, u=self, mode='a')
393+
f = writer(filename, domain=domain, mode='a')
386394
elif isinstance(filename, FileBase):
387395
f = filename
388396
field = [self] if global_slice is None else [(self, global_slice)]
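
The changed condition in ``DistArray.__new__`` can be read as "count only the axes that remain after the leading ``rank`` axes". The sketch below isolates that check in plain Python. It assumes that ``rank`` is the number of leading tensor axes in ``global_shape``, which this diff does not state explicitly, and the function name ``has_fewer_than_two_spatial_axes`` is hypothetical.

```python
def has_fewer_than_two_spatial_axes(global_shape, rank=0):
    """Mirror the condition len(global_shape[rank:]) < 2 from the diff.

    Assumption: the first `rank` entries of `global_shape` are tensor
    axes, and only the remaining entries are spatial axes.
    """
    spatial_shape = tuple(global_shape)[rank:]
    return len(spatial_shape) < 2


# A 1D scalar array, as in the new test_1Darray, takes the simple branch
one_d = has_fewer_than_two_spatial_axes((8,))
# A vector (rank 1) on a 1D mesh also has only one spatial axis
vector_on_1d = has_fewer_than_two_spatial_axes((3, 8), rank=1)
# A vector on a 2D mesh has two spatial axes
vector_on_2d = has_fewer_than_two_spatial_axes((3, 8, 8), rank=1)
```

Under the old check, ``len(global_shape) < 2``, the ``(3, 8)`` case with ``rank=1`` would have counted the tensor axis as well and skipped the simple branch.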

mpi4py_fft/io/h5py_file.py

Lines changed: 3 additions & 3 deletions
@@ -18,8 +18,6 @@ class HDF5File(FileBase):
1818
----------
1919
h5name : str
2020
Name of hdf5 file to be created.
21-
mode : str, optional
22-
``r``, ``w`` or ``a`` for read, write or append. Default is ``a``.
2321
domain : sequence, optional
2422
An optional spatial mesh or domain to go with the data.
2523
Sequence of either
@@ -28,8 +26,10 @@ class HDF5File(FileBase):
2826
of each dimension, e.g., (0, 2*pi).
2927
- Arrays of coordinates, e.g., np.linspace(0, 2*pi, N). One
3028
array per dimension.
29+
mode : str, optional
30+
``r``, ``w`` or ``a`` for read, write or append. Default is ``a``.
3131
"""
32-
def __init__(self, h5name, mode='a', domain=None, **kw):
32+
def __init__(self, h5name, domain=None, mode='a', **kw):
3333
FileBase.__init__(self, domain=domain, **kw)
3434
import h5py
3535
self.filename = h5name

mpi4py_fft/io/nc_file.py

Lines changed: 5 additions & 4 deletions
@@ -23,8 +23,6 @@ class NCFile(FileBase):
2323
----------
2424
ncname : str
2525
Name of netcdf file to be created
26-
mode : str
27-
``r``, ``w`` or ``a`` for read, write or append. Default is ``a``.
2826
domain : Sequence, optional
2927
An optional spatial mesh or domain to go with the data.
3028
Sequence of either
@@ -33,7 +31,10 @@ class NCFile(FileBase):
3331
of each dimension, e.g., (0, 2*pi).
3432
- Arrays of coordinates, e.g., np.linspace(0, 2*pi, N). One
3533
array per dimension.
34+
mode : str
35+
``r``, ``w`` or ``a`` for read, write or append. Default is ``a``.
3636
clobber : bool, optional
37+
3738
Note
3839
----
3940
Each class instance creates one unique NetCDF4-file, with one step-counter.
@@ -42,7 +43,7 @@ class NCFile(FileBase):
4243
every 10th timestep and another every 20th timestep, then use two different
4344
class instances and as such two NetCDF4-files.
4445
"""
45-
def __init__(self, ncname, domain=None, clobber=True, mode='a', **kw):
46+
def __init__(self, ncname, domain=None, mode='a', clobber=True, **kw):
4647
FileBase.__init__(self, domain=domain, **kw)
4748
from netCDF4 import Dataset
4849
self.filename = ncname
@@ -122,7 +123,7 @@ def write(self, step, fields, **kw):
122123
it = np.argwhere(nc_t.__array__() == step)[0][0]
123124
else:
124125
nc_t[it] = step
125-
FileBase.write(self, it, fields)
126+
FileBase.write(self, it, fields, **kw)
126127
self.close()
127128

128129
def read(self, u, name, **kw):

tests/test_darray.py

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,12 @@
55

66
comm = MPI.COMM_WORLD
77

8+
def test_1Darray():
9+
N = (8,)
10+
z = DistArray(N, val=2)
11+
assert z[0] == 2
12+
assert z.shape == N
13+
814
def test_2Darray():
915
N = (8, 8)
1016
for subcomm in ((0, 1), (1, 0), None, Subcomm(comm, (0, 1))):
@@ -114,6 +120,7 @@ def test_newDistArray():
114120
assert a.base.rank == rank
115121

116122
if __name__ == '__main__':
123+
test_1Darray()
117124
test_2Darray()
118125
test_3Darray()
119126
test_newDistArray()
