Connecting with Awkward Array #29

Open
alimanfoo opened this issue Apr 6, 2020 · 4 comments

@alimanfoo
Member

Via @jpivarski (and @martindurant) I learned about the Awkward Array project. Development has been mainly driven by high energy physics use cases, but the need for variable-length data structures and other "awkward" structures is common to other domains, and work in Awkward Array could inform whether and how Zarr might support some of these features. A recent update from @jpivarski:

I've finished a lot of development on my side, and Awkward is ready to use (front page; Doxygen C++ and Sphinx Python documentation is done; tutorials are not). I remember that the interaction between regular-sized arrays and variable-sized arrays was important for your data, so RegularArrays (C++, Python) are worth taking a look at.

I'd also like to know if Zarr is taking a more columnar approach to ragged arrays. Even if it's not, I could write a C++ function to de-interlace list sizes from list contents, which could then be exposed to the Python layer for Zarr → Awkward for analysis. (There's an awkward1._io extension module for these sorts of things, including some special cases for ROOT.)
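To make the "de-interlace" idea concrete, here is a rough sketch in pure Python rather than C++. It assumes a hypothetical interlaced layout in which each list is stored as its length followed by its items; that layout, and the function name `deinterlace`, are illustrative assumptions, not Zarr's actual ragged-array encoding or anything in `awkward1._io`.

```python
def deinterlace(buffer):
    """Split an interlaced buffer into (counts, content).

    Assumed (hypothetical) layout: [n0, item, ..., n1, item, ...],
    i.e. each list's size immediately precedes its items.
    """
    counts = []
    content = []
    pos = 0
    while pos < len(buffer):
        n = buffer[pos]
        # Guard against a corrupt or truncated buffer.
        if n < 0 or pos + 1 + n > len(buffer):
            raise ValueError(f"bad list size {n} at position {pos}")
        counts.append(n)
        content.extend(buffer[pos + 1 : pos + 1 + n])
        pos += 1 + n
    return counts, content


# Three lists: [10, 20, 30], [], [40, 50]
interlaced = [3, 10, 20, 30, 0, 2, 40, 50]
counts, content = deinterlace(interlaced)
```

The two outputs are the columnar form: one flat buffer of sizes and one flat buffer of contents, each of which could be stored as an ordinary homogeneous array.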

@martindurant
Member

Development has been mainly driven by high energy physics use cases

The original awkward already showed how you could write loopy custom code and run at C speeds over deeply nested structures of lists and maps. Such "json-like" data is very different from the N-D arrays we usually think of, but it is perfectly possible that you could have an N-D array of such structs, or some other combination. The point is that each leaf node of the struct has its data stored in (some chunks of) homogeneous arrays, and the nested structure is defined by corresponding arrays of offsets.
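As a minimal illustration of the offsets idea, here is a sketch in which plain Python lists stand in for the homogeneous buffers (in practice these would be NumPy arrays or Zarr chunks). The helper names `to_columnar` and `get_list` are made up for this example and are not part of Awkward Array or Zarr.

```python
def to_columnar(nested):
    """Flatten a list of lists into (offsets, content).

    offsets has len(nested) + 1 entries; sublist i occupies
    content[offsets[i]:offsets[i + 1]].
    """
    offsets = [0]
    content = []
    for sub in nested:
        content.extend(sub)
        offsets.append(len(content))
    return offsets, content


def get_list(offsets, content, i):
    """Recover the i-th sublist by slicing the flat content buffer."""
    return content[offsets[i] : offsets[i + 1]]


nested = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
offsets, content = to_columnar(nested)
# offsets -> [0, 3, 3, 5]
# content -> [1.1, 2.2, 3.3, 4.4, 5.5]
```

Deeper nesting would repeat the same pattern: the content of one level is itself described by another offsets array, so only the leaves hold actual values.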

@jakirkham
Member

IIUC they are giving a SciPy talk this year. As everything has gone digital, it should be on YouTube pretty quickly. There's then a Q&A session after the videos go up where one can ask the speakers questions.

@jpivarski

That's true—I've recorded it and everything. I think the talks go live on July 5 and there's a moderated discussion on July 7 at 2:30‒3:45pm U.S. Central time (schedule; title is "Awkward Array: Manipulating JSON-like Data with NumPy-like Idioms").

I think you'll be there, too, right? If so, see you then!

@jakirkham
Member

Looking forward to it 🙂
