Implement serialization and deserialization #93

Merged
merged 10 commits into from
Aug 15, 2024
96 changes: 91 additions & 5 deletions doc/sphinx/src/databox.rst
Expand Up @@ -107,7 +107,7 @@ yourself. For example:
You can also resize a ``DataBox``, which you can use to modify a
``DataBox`` in-place. For example:

.. code-block::
.. code-block:: cpp

Spiner::DataBox<double> db; // empty
// clears old memory, resizes the underlying array,
Expand All @@ -124,7 +124,7 @@ If you want to change the stride without changing the underlying data,
you can use ``reshape``, which modifies the dimensions of the
array, without modifying the underlying memory. For example:

.. code-block::
.. code-block:: cpp

// allocate a 1D databox
Spiner::DataBox<double> db(nx3*nx2*nx1);
Expand Down Expand Up @@ -170,7 +170,7 @@ Semantics and Memory Management
``DataBox`` has reference semantics---meaning that copying a
``DataBox`` does not copy the underlying data. In other words,

.. code-block::
.. code-block:: cpp

Spiner::DataBox<double> db1(size);
Spiner::DataBox<double> db2 = db1;
Expand Down Expand Up @@ -230,7 +230,7 @@ call ``free`` for you, so long as you use them with a custom
deleter. Spiner provides the following deleter for use in this
scenario:

.. code-block::
.. code-block:: cpp

struct DBDeleter {
template <typename T>
Expand All @@ -242,7 +242,7 @@ scenario:

It can be used, for example, with a ``std::unique_ptr`` via:

.. code-block::
.. code-block:: cpp

// needed for smart pointers
#include <memory>
Expand All @@ -259,6 +259,92 @@ It can be used, for example, with a ``std::unique_ptr`` via:

// when you leave scope, the data box will be freed.

Serialization and de-serialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Shared memory models, such as `MPI Windows`_, require memory to be
allocated through an external API call (e.g.,
``MPI_Win_allocate_shared``), and tabulated data must then be written
into that memory. ``Spiner`` supports this model through
**serialization** and **de-serialization**. The relevant methods are
as follows. The function

.. cpp:function:: std::size_t DataBox::serializedSizeInBytes() const;

reports the number of bytes that must be allocated externally to hold
a serialized ``DataBox`` object. The function

.. cpp:function:: std::size_t DataBox::serialize(char *dst) const;

takes a ``char*`` pointer, assumed to point to a buffer of at least
``serializedSizeInBytes()`` bytes, and stores all information needed
for the ``DataBox`` to reconstruct itself. The return value is the
number of bytes written to the buffer by the serialized ``DataBox``
object. This method is non-destructive; the original ``DataBox`` is
unchanged. Serialization is not supported for a ``DataBox`` whose data
is allocated in device memory. The function

.. cpp:function:: std::size_t DataBox::setPointer(T *src);

with the overload

.. cpp:function:: std::size_t DataBox::setPointer(char *src);

sets the underlying tabulated data from the ``src`` pointer, which is
assumed to be the right size and shape, and returns the number of
bytes of ``src`` used. This is useful for the ``deSerialize`` function
(described below) and for building your own
serialization/de-serialization routines in composite objects. The
function

.. cpp:function:: std::size_t DataBox::deSerialize(char *src);

initializes a ``DataBox`` to match the serialized ``DataBox`` stored
at the ``src`` pointer and returns the number of bytes read. The
target ``DataBox`` must be empty or unmanaged.

.. note::

The de-serialized ``DataBox`` has **unmanaged** memory, as it is
assumed that the ``src`` pointer manages its memory for it.
Therefore, you **cannot** ``free`` the ``src`` pointer until you are
finished with the de-serialized ``DataBox``.

Putting this all together, an application of
serialization/de-serialization might look like this:

.. code-block:: cpp

// load a databox from, e.g., file
Spiner::DataBox<double> db;
db.loadHDF(filename);

// get size of databox
std::size_t allocate_size = db.serializedSizeInBytes();

// Allocate the memory for the new databox.
// In practice this would be an API call for, e.g., shared memory
char *memory = (char*)malloc(allocate_size);

// serialize the old databox
std::size_t write_size = db.serialize(memory);

// make a new databox and de-serialize it
Spiner::DataBox<double> db2;
std::size_t read_size = db2.deSerialize(memory);

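// read_size, write_size, and allocate_size should all be the same.
assert((read_size == write_size) && (write_size == allocate_size));

// db2 is unmanaged and points into memory, so free the buffer
// only once db2 is no longer needed.
free(memory);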
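
The header-plus-payload layout used by ``serialize`` and ``deSerialize`` can be illustrated without Spiner. The sketch below is a minimal stand-in, not Spiner's implementation: ``ToyBox``, its members, and ``roundTripToyBox`` are hypothetical names invented for illustration, and the struct carries only a pointer, a length, and an ownership flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a DataBox (NOT Spiner's class): a trivially
// copyable header that points at a payload of doubles.
// alignas(double) keeps sizeof(ToyBox) a multiple of alignof(double), so the
// payload that follows the header in a malloc'd buffer is suitably aligned.
struct alignas(double) ToyBox {
  double *data = nullptr;
  std::size_t n = 0;
  bool owns = false;

  std::size_t sizeBytes() const { return n * sizeof(double); }
  std::size_t serializedSizeInBytes() const {
    return sizeof(ToyBox) + sizeBytes();
  }

  // Copy the header, then the payload, into dst. Returns bytes written.
  std::size_t serialize(char *dst) const {
    std::memcpy(dst, this, sizeof(ToyBox));
    std::size_t offst = sizeof(ToyBox);
    if (sizeBytes() > 0) {
      std::memcpy(dst + offst, data, sizeBytes());
      offst += sizeBytes();
    }
    return offst;
  }

  // Copy the header out of src, then point data at the payload that is
  // still stored inside src. Returns bytes read. The box does not own src.
  std::size_t deSerialize(char *src) {
    std::memcpy(static_cast<void *>(this), src, sizeof(ToyBox));
    std::size_t offst = sizeof(ToyBox);
    data = nullptr; // the copied pointer value belongs to the source box
    if (sizeBytes() > 0) {
      data = reinterpret_cast<double *>(src + offst);
      offst += sizeBytes();
    }
    owns = false;
    return offst;
  }
};

// Round trip through a malloc'd buffer; returns true if the copy matches.
inline bool roundTripToyBox() {
  double values[3] = {1.0, 2.0, 3.0};
  ToyBox a;
  a.data = values;
  a.n = 3;

  char *buf = static_cast<char *>(std::malloc(a.serializedSizeInBytes()));
  assert(buf != nullptr);
  const std::size_t written = a.serialize(buf);

  ToyBox b;
  const std::size_t read = b.deSerialize(buf);
  const bool ok = (read == written) && (b.data != a.data) && (b.n == a.n) &&
                  (b.data[2] == 3.0);

  std::free(buf); // only after we are done with b
  return ok;
}
```

Because the header is copied byte for byte, the pointer stored in it is only meaningful to the original object. That is why the de-serializing side overwrites it with an address inside the buffer, and why the buffer must outlive the de-serialized object.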

.. warning::

The serialization routines described here are **not** architecture
aware. Serializing and de-serializing on a single architecture
inside a single executable will work fine. However, do not use
serialization as a file I/O strategy, as there is no guarantee that
the serialized format for a ``DataBox`` on one architecture will be
the same as on another. This is due to, for example,
architecture-specific differences in endianness and padding.
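
Both effects can be observed directly. The following is a small sketch, not part of Spiner: ``Header``, ``paddingBytes``, ``firstByteInMemory``, and ``hostIsLittleEndian`` are hypothetical names used only for illustration. It reports how many bytes of a struct are compiler-inserted padding and which byte of a multi-byte integer comes first in memory. Both answers can differ between platforms, which is why a raw byte copy of an object is not a portable file format.

```cpp
#include <cassert> // assert, for callers that want to check these values
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header type, for illustration only.
struct Header {
  char tag;
  double value;
};

// Bytes in Header not occupied by its members; compiler/ABI dependent.
constexpr std::size_t paddingBytes() {
  return sizeof(Header) - sizeof(char) - sizeof(double);
}

// First byte in memory of a 32-bit value; depends on host byte order.
inline unsigned char firstByteInMemory(std::uint32_t value) {
  unsigned char first = 0;
  std::memcpy(&first, &value, 1);
  return first;
}

// True when the least significant byte is stored first.
inline bool hostIsLittleEndian() { return firstByteInMemory(1u) == 1; }
```

A buffer written with ``memcpy`` on one machine embeds both the padding and the byte order of that machine, so reading it back is only well defined on a machine with the same layout.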

.. _`MPI Windows`: https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report/node311.htm

Accessing Elements of a ``DataBox``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down
54 changes: 54 additions & 0 deletions spiner/databox.hpp
Expand Up @@ -305,6 +305,60 @@ class DataBox {
return indices_[i];
}

// serialization routines
// ------------------------------------
// this one reports size for serialize/deserialize
std::size_t serializedSizeInBytes() const {
return sizeBytes() + sizeof(*this);
}
// this one takes the pointer `dst`, which is assumed to have
// sufficient memory allocated, and fills it with the
// databox. Return value is the number of bytes written.
std::size_t serialize(char *dst) const {
PORTABLE_REQUIRE(status_ != DataStatus::AllocatedDevice,
"Serialization cannot be performed on device memory");
memcpy(dst, this, sizeof(*this));
std::size_t offst = sizeof(*this);
if (sizeBytes() > 0) { // could also do data_ != nullptr
memcpy(dst + offst, data_, sizeBytes());
offst += sizeBytes();
}
return offst;
}

// This sets the internal pointer based on a passed in src pointer,
// which is assumed to be the right size. Used below in deSerialize
// and may be used for serialization routines. Returns amount of src
// pointer used.
std::size_t setPointer(T *src) {
if (sizeBytes() > 0) { // could also do data_ != nullptr
data_ = src;
// TODO(JMM): If portable arrays ever change maximum rank, this
// line needs to change.
dataView_.NewPortableMDArray(data_, dim(6), dim(5), dim(4), dim(3),
dim(2), dim(1));
makeShallow();
}
return sizeBytes();
}
std::size_t setPointer(char *src) { return setPointer((T *)src); }

// This one takes a src pointer, which is assumed to contain a
// databox and initializes the current databox. Note that the
// databox becomes unmanaged, as the contents of the box are still
// the externally managed pointer.
std::size_t deSerialize(char *src) {
PORTABLE_REQUIRE(
(status_ == DataStatus::Empty || status_ == DataStatus::Unmanaged),
"Must not de-serialize into an active databox.");
memcpy(this, src, sizeof(*this));
std::size_t offst = sizeof(*this);
// now sizeBytes is well defined after copying the "header" of the source.
offst += setPointer(src + offst);
return offst;
}
// ------------------------------------

DataBox<T, Grid_t, Concept>
getOnDevice() const { // getOnDevice is always a deep copy
if (size() == 0 ||
Expand Down
15 changes: 0 additions & 15 deletions spiner/regular_grid_1d.hpp
Expand Up @@ -62,21 +62,6 @@ class RegularGrid1D {
PORTABLE_ALWAYS_REQUIRE(min_ < max_ && N_ > 0, "Valid grid");
}

// Assignment operator
/*
Default copy constructable
PORTABLE_INLINE_FUNCTION RegularGrid1D &operator=(const RegularGrid1D &src) {
if (this != &src) {
min_ = src.min_;
max_ = src.max_;
dx_ = src.dx_;
idx_ = src.idx_;
N_ = src.N_;
}
return *this;
}
*/

// Forces x in the interval
PORTABLE_INLINE_FUNCTION int bound(int ix) const {
#ifndef SPINER_DISABLE_BOUNDS_CHECKS
Expand Down
107 changes: 96 additions & 11 deletions test/test.cpp
Expand Up @@ -53,8 +53,8 @@ PORTABLE_INLINE_FUNCTION Real linearFunction(Real b, Real a, Real z, Real y,
return x + y + z + a + b;
}

TEST_CASE("PortableMDArrays can be allocated from a pointer",
"[PortableMDArray]") {
SCENARIO("PortableMDArrays can be allocated from a pointer",
"[PortableMDArray]") {
constexpr int N = 2;
constexpr int M = 3;
std::vector<int> data(N * M);
Expand Down Expand Up @@ -529,19 +529,22 @@ TEST_CASE("DataBox Interpolation with piecewise grids",

WHEN("We construct and fill a 3D DataBox based on this grid") {
constexpr int RANK = 3;
PiecewiseDB<NGRIDS> db(Spiner::AllocationTarget::Device, NCOARSE, NCOARSE,
NCOARSE);
PiecewiseDB<NGRIDS> dbh(Spiner::AllocationTarget::Host, NCOARSE, NCOARSE,
NCOARSE);
Comment on lines +532 to +533
Collaborator Author

These changes were actually unnecessary. I originally had this test merged with the new test and needed host-side data. However, I split them into separate tests for clarity. Nevertheless, there's no harm in this change.

for (int i = 0; i < RANK; ++i) {
db.setRange(i, g);
dbh.setRange(i, g);
}
portableFor(
"Fill 3D Databox", 0, NCOARSE, 0, NCOARSE, 0, NCOARSE,
PORTABLE_LAMBDA(const int iz, const int iy, const int ix) {
for (int iz = 0; iz < NCOARSE; ++iz) {
for (int iy = 0; iy < NCOARSE; ++iy) {
for (int ix = 0; ix < NCOARSE; ++ix) {
Real x = g.x(ix);
Real y = g.x(iy);
Real z = g.x(iz);
db(iz, iy, ix) = linearFunction(z, y, x);
});
dbh(iz, iy, ix) = linearFunction(z, y, x);
}
}
}
auto db = dbh.getOnDevice();

THEN("We can interpolate it to a finer grid and get the right answer") {
Real error = 0;
Expand All @@ -561,8 +564,90 @@ TEST_CASE("DataBox Interpolation with piecewise grids",
error);
REQUIRE(error <= EPSTEST);
}

// cleanup
free(db);
free(dbh);
}
}
}

SCENARIO("Serializing and deserializing a DataBox",
"[DataBox][PiecewiseGrid1D][Serialize]") {
GIVEN("A piecewise grid") {
constexpr int NGRIDS = 2;
constexpr Real xmin = 0;
constexpr Real xmax = 1;

RegularGrid1D g1(xmin, 0.35 * (xmax - xmin), 3);
RegularGrid1D g2(0.35 * (xmax - xmin), xmax, 4);
PiecewiseGrid1D<NGRIDS> g = {{g1, g2}};

const int NCOARSE = g.nPoints();

THEN("The piecewise grid contains a number of points equal to the sum of "
"the points of the individual grids") {
REQUIRE(g.nPoints() == g1.nPoints() + g2.nPoints());
}

WHEN("We construct and fill a 3D DataBox based on this grid") {
constexpr int RANK = 3;
PiecewiseDB<NGRIDS> dbh(Spiner::AllocationTarget::Host, NCOARSE, NCOARSE,
NCOARSE);
for (int i = 0; i < RANK; ++i) {
dbh.setRange(i, g);
}
for (int iz = 0; iz < NCOARSE; ++iz) {
for (int iy = 0; iy < NCOARSE; ++iy) {
for (int ix = 0; ix < NCOARSE; ++ix) {
Real x = g.x(ix);
Real y = g.x(iy);
Real z = g.x(iz);
dbh(iz, iy, ix) = linearFunction(z, y, x);
}
}
}
WHEN("We serialize the DataBox") {
std::size_t serial_size = dbh.serializedSizeInBytes();
REQUIRE(serial_size == (sizeof(dbh) + dbh.sizeBytes()));

char *db_serial = (char *)malloc(serial_size * sizeof(char));
std::size_t write_offst = dbh.serialize(db_serial);
REQUIRE(write_offst == serial_size);

THEN("We can initialize a new databox based on the serialized one") {
PiecewiseDB<NGRIDS> dbh2;
std::size_t read_offst = dbh2.deSerialize(db_serial);
REQUIRE(read_offst == write_offst);

AND_THEN("They do not point to the same memory") {
// checks DataBox pointer
REQUIRE(dbh2.data() != dbh.data());
// checks accessor agrees
REQUIRE(&dbh2(0) != &dbh(0));
}
Collaborator Author

This check is important, as it ensures that we didn't accidentally just create a shallow copy of the original DataBox.
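
To illustrate what this check guards against, here is a minimal sketch outside of Spiner. ``View``, ``shallowCopy``, ``deepCopyInto``, and ``sharesMemory`` are hypothetical names made up for the example and are not part of the library. A shallow copy passes a contents comparison just as a real copy does, so comparing the data pointers is the part that tells the two apart.

```cpp
#include <cassert> // assert, for callers that want to check these values
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a block of doubles.
struct View {
  double *data;
  std::size_t n;
};

// Shallow copy: the new view aliases the same memory as the old one.
inline View shallowCopy(const View &v) { return v; }

// Deep copy: the values are copied into storage, and the returned view
// points at that new memory instead of the original.
inline View deepCopyInto(const View &v, std::vector<double> &storage) {
  storage.assign(v.data, v.data + v.n);
  return View{storage.data(), v.n};
}

// The pointer comparison that distinguishes the two cases.
inline bool sharesMemory(const View &a, const View &b) {
  return a.data == b.data;
}
```

Writing through the original after copying makes the difference visible: the shallow copy sees the new value, while the deep copy keeps the old one.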


AND_THEN("The shape is correct") {
REQUIRE(dbh2.rank() == dbh.rank());
REQUIRE(dbh2.size() == dbh.size());
for (int d = 1; d <= 3; ++d) {
REQUIRE(dbh2.dim(d) == dbh.dim(d));
}
}

AND_THEN("The contents are correct") {
for (int i = 0; i < dbh.size(); ++i) {
REQUIRE(dbh(i) == dbh2(i));
}
}
}

// cleanup
free(db_serial);
}

// cleanup
free(dbh);
}
}
}
Expand Down Expand Up @@ -702,7 +787,7 @@ SCENARIO("Using unique pointers to garbage collect DataBox",
}

#if SPINER_USE_HDF
TEST_CASE("PiecewiseGrid HDF5", "[PiecewiseGrid1D][HDF5]") {
SCENARIO("PiecewiseGrid HDF5", "[PiecewiseGrid1D][HDF5]") {
GIVEN("A piecewise grid") {
RegularGrid1D g1(0, 0.25, 3);
RegularGrid1D g2(0.25, 0.75, 11);
Expand Down