This issue was moved to a discussion. You can continue the conversation there.

Writing ntuple larger than 2GB fails when no compression is used #1130

Closed
grzanka opened this issue Feb 16, 2024 · 6 comments
Labels
bug (unverified): The problem described would be a bug, but needs to be triaged

grzanka commented Feb 16, 2024

The problem

Trying to save an ntuple (TTree) with more than 2 GB of data and no compression fails with the following error:

error: 'i' format requires -2147483648 <= number <= 2147483647

The minimum code to reproduce:

from pathlib import Path
import numpy as np
import uproot
data_dict = {
        "x": np.ones(1000_000_000, dtype=np.float64),
}
with uproot.recreate(Path('file.root'), compression=None) as fout:
    fout["tree"] = data_dict

Details

More details, with the original code in which the problem appeared, are given below. In the comments below I've also provided more examples.

I was trying to write a ROOT ntuple with the following code:

import time
from pathlib import Path
import h5py
from hdf import peak_count
import uproot

def convert_hdf_to_ntuple(input_path: Path):
    ntuple_path = input_path.with_suffix('.root')
    print(f"Saving ntuple to {ntuple_path}")
    before_write = time.time()
    ntuple_path.unlink(missing_ok=True)

    uproot.create(ntuple_path, compression=None)

    file = uproot.reading.ReadOnlyFile(ntuple_path)
    print(f"file 64 bit (check via file.is_64bit) {file.is_64bit}")
    
    for channel_no in range(4):
        with h5py.File(input_path, 'r') as f, uproot.update(ntuple_path) as fout:
            print(f"Processing channel {channel_no}")
            gain_mV = f[f'channel_{channel_no}'].attrs['gain_mV']
            offset_mV = f[f'channel_{channel_no}'].attrs['offset_mV']
            horiz_interval_ns = f[f'channel_{channel_no}'].attrs['horiz_interval_ns']
            fout[f'channel_{channel_no}/gain_mV'] = str(gain_mV)
            fout[f'channel_{channel_no}/offset_mV'] = str(offset_mV)
            fout[f'channel_{channel_no}/horiz_interval_ns'] = str(horiz_interval_ns)

            peaks_in_bucket = 10000000
            for peak_type in ['positive', 'negative']:
                print(f"Processing {peak_type} peaks")
                total_number_of_peaks = peak_count(f, channel_no, peak_type)
                for i in range(0, total_number_of_peaks, peaks_in_bucket):
                    dict_bucket = {}
                    for name, dataset in f[f'channel_{channel_no}/{peak_type}'].items():
                        dict_bucket[name] = dataset[i:i + peaks_in_bucket]
                    dict_bucket['peak_value_mV'] = dict_bucket['peak_value'] * gain_mV
                    dict_bucket['peak_length_ns'] = dict_bucket['peak_length'] * horiz_interval_ns
                    dict_bucket['peak_start_us'] = dict_bucket['peak_start'] * horiz_interval_ns / 1000
                    dict_bucket['peak_cfd_us'] = dict_bucket['peak_cfd_index'] * horiz_interval_ns / 1000
                    dict_bucket['peak_rise_ns'] = dict_bucket['rise_time'] * horiz_interval_ns
                    dict_bucket['peak_area_ns_mV'] = dict_bucket['peak_area'] * horiz_interval_ns * gain_mV
                    dict_bucket['peak_baseline_mV'] = dict_bucket['peak_baseline'] * gain_mV - offset_mV
                    dict_bucket['peak_noise_mV'] = dict_bucket['peak_noise'] * gain_mV
                    dict_bucket['peak_fwhm_ns'] = dict_bucket['peak_fwhm'] * horiz_interval_ns

                    del dict_bucket['peak_value']
                    del dict_bucket['peak_length']
                    del dict_bucket['peak_area']
                    del dict_bucket['peak_cfd_index']
                    del dict_bucket['rise_time']
                    del dict_bucket['peak_baseline']
                    del dict_bucket['peak_noise']
                    del dict_bucket['peak_fwhm']
                    if i == 0:
                        fout[f'channel_{channel_no}/{peak_type}'] = dict_bucket
                    else:
                        fout[f'channel_{channel_no}/{peak_type}'].extend(dict_bucket)
                    print(f"num entries {fout[f'channel_{channel_no}/{peak_type}'].num_entries} , num baskets {fout[f'channel_{channel_no}/{peak_type}'].num_baskets}")

    after_write = time.time()
    print(f"Writing took {after_write - before_write:.3f} s")

This works nicely as long as the files are small, say smaller than 2 GB.

When trying to save a larger file I get the following error:

Traceback (most recent call last):
  File "/net/people/plgrid/plgkongruencj/2022-krakow-lgad/src/convert_from_lv1_to_lv2.py", line 146, in <module>
    main()
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/people/plgrid/plgkongruencj/2022-krakow-lgad/src/convert_from_lv1_to_lv2.py", line 134, in main
    convert_hdf_to_ntuple(input_path)
  File "/net/people/plgrid/plgkongruencj/2022-krakow-lgad/src/root.py", line 58, in convert_hdf_to_ntuple
    fout[f'channel_{channel_no}/{peak_type}'] = dict_bucket
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py", line 984, in __setitem__
    self.update({where: what})
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py", line 1555, in update
    uproot.writing.identify.add_to_directory(v, name, directory, streamers)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/identify.py", line 152, in add_to_directory
    tree.extend(data)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py", line 1834, in extend
    self._cascading.extend(self._file, self._file.sink, data)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascadetree.py", line 816, in extend
    totbytes, zipbytes, location = self.write_np_basket(
                                   ^^^^^^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascadetree.py", line 1427, in write_np_basket
    self._freesegments.write(sink)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascade.py", line 782, in write
    super().write(sink)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascade.py", line 132, in write
    dependency.write(sink)
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascade.py", line 102, in write
    tmp = self.serialize()
          ^^^^^^^^^^^^^^^^
  File "/memfs/7649613/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascade.py", line 377, in serialize
    format.pack(
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I saw a similar error reported long ago here: scikit-hep/uproot3#462

Also, when looking at the source code of the extend method in class NTuple(CascadeNode), it seems that all calls to add_rblob are made with the big=False argument, which suggests that only 4-byte pointers are being used.

See:

page_key = self.add_rblob(sink, data_bytes, len(data_bytes), big=False)

This is my uproot version:

Python 3.11.3 (main, Nov 19 2023, 23:25:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.__version__
'5.2.2'
grzanka added the bug (unverified) label on Feb 16, 2024

grzanka commented Feb 16, 2024

Another comparison:

Version failing

Code:

import time
from pathlib import Path
import h5py
from hdf import peak_count
import uproot


def convert_hdf_to_ntuple(input_path: Path):
    ntuple_path = input_path.with_suffix('.root')
    print(f"Saving ntuple to {ntuple_path}")
    before_write = time.time()
    ntuple_path.unlink(missing_ok=True)

    saving_ok = True

    with h5py.File(input_path, 'r') as f, uproot.recreate(ntuple_path, compression=None) as fout:
        for channel_no in range(4):
            print(f"Processing channel {channel_no}")
            gain_mV = f[f'channel_{channel_no}'].attrs['gain_mV']
            offset_mV = f[f'channel_{channel_no}'].attrs['offset_mV']
            horiz_interval_ns = f[f'channel_{channel_no}'].attrs['horiz_interval_ns']
            fout[f'channel_{channel_no}/gain_mV'] = str(gain_mV)
            fout[f'channel_{channel_no}/offset_mV'] = str(offset_mV)
            fout[f'channel_{channel_no}/horiz_interval_ns'] = str(horiz_interval_ns)

            peaks_in_bucket = 10000000
            for peak_type in ['positive', 'negative']:
                print(f"Processing {peak_type} peaks")
                total_number_of_peaks = peak_count(f, channel_no, peak_type)
                for i in range(0, total_number_of_peaks, peaks_in_bucket):
                    dict_bucket = {}
                    for name, dataset in f[f'channel_{channel_no}/{peak_type}'].items():
                        dict_bucket[name] = dataset[i:i + peaks_in_bucket]
                    dict_bucket['peak_value_mV'] = dict_bucket['peak_value'] * gain_mV
                    dict_bucket['peak_length_ns'] = dict_bucket['peak_length'] * horiz_interval_ns
                    dict_bucket['peak_start_us'] = dict_bucket['peak_start'] * horiz_interval_ns / 1000
                    dict_bucket['peak_cfd_us'] = dict_bucket['peak_cfd_index'] * horiz_interval_ns / 1000
                    dict_bucket['peak_rise_ns'] = dict_bucket['rise_time'] * horiz_interval_ns
                    dict_bucket['peak_area_ns_mV'] = dict_bucket['peak_area'] * horiz_interval_ns * gain_mV
                    dict_bucket['peak_baseline_mV'] = dict_bucket['peak_baseline'] * gain_mV - offset_mV
                    dict_bucket['peak_noise_mV'] = dict_bucket['peak_noise'] * gain_mV
                    dict_bucket['peak_fwhm_ns'] = dict_bucket['peak_fwhm'] * horiz_interval_ns

                    try:
                        if i == 0:
                            fout[f'channel_{channel_no}/{peak_type}'] = dict_bucket
                        else:
                            fout[f'channel_{channel_no}/{peak_type}'].extend(dict_bucket)
                    except Exception as e:
                        print(f"Error {e} while writing {i} to {i + peaks_in_bucket}")
                        saving_ok = False
                        break
                    
    if not saving_ok:
        #print the size of saved file
        print(f"Generated file with {ntuple_path.stat().st_size} bytes")
        ntuple_path.unlink()

    after_write = time.time()
    print(f"Writing took {after_write - before_write:.3f} s")

Effect:

[ares][plgkongruencj@ac0084 2022-krakow-lgad]$ time poetry run src/convert_from_lv1_to_lv2.py /memfs/7649613/8nA.slim.hdf --save-ntuple
Converting from LV1 to LV2
Command used: convert_from_lv1_to_lv2.py /memfs/7649613/8nA.slim.hdf --save-ntuple
Input path: /memfs/7649613/8nA.slim.hdf
Saving ntuple
Saving ntuple to /memfs/7649613/8nA.slim.root
Processing channel 0
Processing positive peaks
Processing negative peaks
Processing channel 1
Processing positive peaks
Error 'i' format requires -2147483648 <= number <= 2147483647 while writing 0 to 10000000
Processing negative peaks
Error 'i' format requires -2147483648 <= number <= 2147483647 while writing 0 to 10000000
Processing channel 2
Processing positive peaks
Processing negative peaks
Processing channel 3
Processing positive peaks
Processing negative peaks
Generated file with 7498169826 bytes
Writing took 18.070 s

real    0m19.113s
user    0m8.029s
sys     0m10.919s

Version working:

It seems that enabling the default compression makes it possible to write correct files larger than 2 GB.

Code:

import time
from pathlib import Path
import h5py
from hdf import peak_count
import uproot


def convert_hdf_to_ntuple(input_path: Path):
    ntuple_path = input_path.with_suffix('.root')
    print(f"Saving ntuple to {ntuple_path}")
    before_write = time.time()
    ntuple_path.unlink(missing_ok=True)

    saving_ok = True

    with h5py.File(input_path, 'r') as f, uproot.recreate(ntuple_path) as fout:
        for channel_no in range(4):
            print(f"Processing channel {channel_no}")
            gain_mV = f[f'channel_{channel_no}'].attrs['gain_mV']
            offset_mV = f[f'channel_{channel_no}'].attrs['offset_mV']
            horiz_interval_ns = f[f'channel_{channel_no}'].attrs['horiz_interval_ns']
            fout[f'channel_{channel_no}/gain_mV'] = str(gain_mV)
            fout[f'channel_{channel_no}/offset_mV'] = str(offset_mV)
            fout[f'channel_{channel_no}/horiz_interval_ns'] = str(horiz_interval_ns)

            peaks_in_bucket = 10000000
            for peak_type in ['positive', 'negative']:
                print(f"Processing {peak_type} peaks")
                total_number_of_peaks = peak_count(f, channel_no, peak_type)
                for i in range(0, total_number_of_peaks, peaks_in_bucket):
                    dict_bucket = {}
                    for name, dataset in f[f'channel_{channel_no}/{peak_type}'].items():
                        dict_bucket[name] = dataset[i:i + peaks_in_bucket]
                    dict_bucket['peak_value_mV'] = dict_bucket['peak_value'] * gain_mV
                    dict_bucket['peak_length_ns'] = dict_bucket['peak_length'] * horiz_interval_ns
                    dict_bucket['peak_start_us'] = dict_bucket['peak_start'] * horiz_interval_ns / 1000
                    dict_bucket['peak_cfd_us'] = dict_bucket['peak_cfd_index'] * horiz_interval_ns / 1000
                    dict_bucket['peak_rise_ns'] = dict_bucket['rise_time'] * horiz_interval_ns
                    dict_bucket['peak_area_ns_mV'] = dict_bucket['peak_area'] * horiz_interval_ns * gain_mV
                    dict_bucket['peak_baseline_mV'] = dict_bucket['peak_baseline'] * gain_mV - offset_mV
                    dict_bucket['peak_noise_mV'] = dict_bucket['peak_noise'] * gain_mV
                    dict_bucket['peak_fwhm_ns'] = dict_bucket['peak_fwhm'] * horiz_interval_ns

                    try:
                        if i == 0:
                            fout[f'channel_{channel_no}/{peak_type}'] = dict_bucket
                        else:
                            fout[f'channel_{channel_no}/{peak_type}'].extend(dict_bucket)
                    except Exception as e:
                        print(f"Error {e} while writing {i} to {i + peaks_in_bucket}")
                        saving_ok = False
                        break
                    
    if not saving_ok:
        #print the size of saved file
        print(f"Generated file with {ntuple_path.stat().st_size} bytes")
        ntuple_path.unlink()

    after_write = time.time()
    print(f"Writing took {after_write - before_write:.3f} s")

Effect:

[ares][plgkongruencj@ac0084 2022-krakow-lgad]$ time poetry run src/convert_from_lv1_to_lv2.py /memfs/7649613/8nA.slim.hdf --save-ntuple
Converting from LV1 to LV2
Command used: convert_from_lv1_to_lv2.py /memfs/7649613/8nA.slim.hdf --save-ntuple
Input path: /memfs/7649613/8nA.slim.hdf
Saving ntuple
Saving ntuple to /memfs/7649613/8nA.slim.root
Processing channel 0
Processing positive peaks
Processing negative peaks
Processing channel 1
Processing positive peaks
Processing negative peaks
Processing channel 2
Processing positive peaks
Processing negative peaks
Processing channel 3
Processing positive peaks
Processing negative peaks
Writing took 205.975 s

real    3m26.941s
user    3m12.136s
sys     0m14.046s
[ares][plgkongruencj@ac0084 2022-krakow-lgad]$ ls -alh /memfs/7649613/
total 9.4G
drwx------ 5 plgkongruencj root    180 Feb 16 23:19 .
drwxr-xr-x 3 root          root     60 Feb 16 15:34 ..
drwxr-xr-x 2 plgkongruencj plgrid   80 Feb 16 20:40 20231204m2
-rw-r--r-- 1 plgkongruencj plgrid 4.3G Feb 16 20:49 8nA.slim.hdf
-rw-r--r-- 1 plgkongruencj plgrid 5.1G Feb 16 23:22 8nA.slim.root
-rwxr-xr-x 1 plgkongruencj plgrid  20M Feb 13 20:15 code
drwxr-xr-x 5 plgkongruencj plgrid  100 Feb 16 18:47 poetry_cache
drwxr-xr-x 2 plgkongruencj plgrid   60 Feb 16 19:42 poetry_config
-rw-r--r-- 1 plgkongruencj plgrid 8.2M Feb 16 15:34 vscode_cli.tar.gz


grzanka commented Feb 19, 2024

The simplest code to reproduce the problem (path to the output file can be adjusted):

from pathlib import Path
import numpy as np
import uproot
ntuple_path = Path('/memfs/7680475/file.root')
data_size = 1000_000_000
data_dict = {
        "x": np.ones(data_size, dtype=np.float64),
}
with uproot.recreate(ntuple_path, compression=None) as fout:
    fout["tree"] = data_dict

This snippet gives me the error:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
Cell In[4], line 2
      1 with uproot.recreate(ntuple_path, compression=None) as fout:
----> 2     fout["tree"] = data_dict

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py:984, in WritableDirectory.__setitem__(self, where, what)
    982 if self._file.sink.closed:
    983     raise ValueError("cannot write data to a closed file")
--> 984 self.update({where: what})

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py:1555, in WritableDirectory.update(self, pairs, **more_pairs)
   1552     for item in path:
   1553         directory = directory[item]
-> 1555     uproot.writing.identify.add_to_directory(v, name, directory, streamers)
   1557 self._file._cascading.streamers.update_streamers(self._file.sink, streamers)

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/identify.py:152, in add_to_directory(obj, name, directory, streamers)
    150 if is_ttree:
    151     tree = directory.mktree(name, metadata)
--> 152     tree.extend(data)
    154 else:
    155     writable = to_writable(obj)

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/writable.py:1834, in WritableTree.extend(self, data)
   1807 def extend(self, data):
   1808     """
   1809     Args:
   1810         data (dict of str → arrays): More array data to add to the TTree.
   (...)
   1832         **As a word of warning,** be sure that each call to :ref:`uproot.writing.writable.WritableTree.extend` includes at least 100 kB per branch/array. (NumPy and Awkward Arrays have an `nbytes <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.nbytes.html>`__ property; you want at least ``100000`` per array.) If you ask Uproot to write very small TBaskets, it will spend more time working on TBasket overhead than actually writing data. The absolute worst case is one-entry-per-:ref:`uproot.writing.writable.WritableTree.extend`. See `#428 (comment) <https://github.com/scikit-hep/uproot5/pull/428#issuecomment-908703486>`__.
   1833     """
-> 1834     self._cascading.extend(self._file, self._file.sink, data)

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascadetree.py:816, in Tree.extend(self, file, sink, data)
    813     datum["fEntryOffsetLen"] = 4 * (len(big_endian_offsets) - 1)
    815 elif big_endian_offsets is None:
--> 816     totbytes, zipbytes, location = self.write_np_basket(
    817         sink, branch_name, compression, big_endian
    818     )
    819 else:
    820     totbytes, zipbytes, location = self.write_jagged_basket(
    821         sink, branch_name, compression, big_endian, big_endian_offsets
    822     )

File /memfs/7680475/poetry_cache/virtualenvs/2022-krakow-lgad-_qGHPVZk-py3.11/lib/python3.11/site-packages/uproot/writing/_cascadetree.py:1399, in Tree.write_np_basket(self, sink, branch_name, compression, array)
   1395 location = self._freesegments.allocate(fNbytes, dry_run=False)
   1397 out = []
   1398 out.append(
-> 1399     uproot.reading._key_format_big.pack(
   1400         fNbytes,
   1401         1004,  # fVersion
   1402         fObjlen,
   1403         uproot._util.datetime_to_code(datetime.datetime.now()),  # fDatime
   1404         fKeylen,
   1405         0,  # fCycle
   1406         location,  # fSeekKey
   1407         parent_location,  # fSeekPdir
   1408     )
   1409 )
   1410 out.append(fClassName)
   1411 out.append(fName)

error: 'i' format requires -2147483648 <= number <= 2147483647

grzanka changed the title from "Writing ntuple larger than 2GB fails" to "Writing ntuple larger than 2GB fails when no compression is used" on Feb 19, 2024

grzanka commented Feb 19, 2024

Interestingly, I've also tried the same simple code but with compression enabled:

from pathlib import Path
import numpy as np
import uproot
ntuple_path = Path('file.root')
data_size = 1000_000_000
data_dict = {
        "x": np.ones(data_size, dtype=np.float64),
}
with uproot.recreate(ntuple_path) as fout:
    fout["tree"] = data_dict

The code took 20 min to run on my cluster, used ~40 GB of RAM, and crashed with:

Traceback (most recent call last):
  File "/memfs/7685922/bug.py", line 10, in <module>
    fout["tree"] = data_dict
    ~~~~^^^^^^^^
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/writable.py", line 984, in __setitem__
    self.update({where: what})
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/writable.py", line 1555, in update
    uproot.writing.identify.add_to_directory(v, name, directory, streamers)
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/identify.py", line 152, in add_to_directory
    tree.extend(data)
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/writable.py", line 1834, in extend
    self._cascading.extend(self._file, self._file.sink, data)
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/_cascadetree.py", line 816, in extend
    totbytes, zipbytes, location = self.write_np_basket(
                                   ^^^^^^^^^^^^^^^^^^^^^
  File "/memfs/7685922/venv/lib/python3.11/site-packages/uproot/writing/_cascadetree.py", line 1399, in write_np_basket
    uproot.reading._key_format_big.pack(
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

jpivarski (Member) commented

I've been meaning to get back to this. Maybe we could add an error message, but the ROOT format itself does not allow TBaskets to be bigger than 2 GB because both the fNbytes (compressed size) and fObjlen (uncompressed size) are 32-bit integers. Here's where it's trying to write a TKey for the TBasket (where you see the exception):

uproot.reading._key_format_big.pack(
    fNbytes,
    1004,  # fVersion
    fObjlen,
    uproot._util.datetime_to_code(datetime.datetime.now()),  # fDatime
    fKeylen,
    0,  # fCycle
    location,  # fSeekKey
    parent_location,  # fSeekPdir
)

and here's the definition of _key_format_big (a "big" TKey uses 64-bit integers for the location of the TBasket, so that files can be bigger than 2 GB, but no single object, such as a TBasket, can be that large).

_key_format_big = struct.Struct(">ihiIhhqq")

Here's the ROOT definition of a TKey:

https://root.cern.ch/doc/master/classTKey.html#ab2e59bcc49663466e74286cabd3d42c1

in which fNbytes and fObjlen are declared to be type Int_t, which is 32 bits.
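
To make the limit concrete, here is a minimal standalone check (plain Python struct, no uproot involved) showing that a 64-bit 'q' field happily takes offsets beyond 2 GiB, while a 32-bit 'i' field raises exactly the error from the tracebacks above:

import struct

# ">ihiIhhqq" is the TKey layout quoted above: fNbytes and fObjlen are
# 'i' (32-bit signed), while fSeekKey and fSeekPdir are 'q' (64-bit signed).
key_format_big = struct.Struct(">ihiIhhqq")

too_big = 2**31  # one past the largest signed 32-bit value

struct.pack(">q", too_big)  # fine: 64-bit seek pointers allow files > 2 GB

try:
    struct.pack(">i", too_big)  # 32-bit size fields do not
except struct.error as err:
    print(err)  # 'i' format requires -2147483648 <= number <= 2147483647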


Do you know that

file["tree_name"] = {"branch": branch_data}

writes all of the branch_data as one TBasket? TBaskets are the granular unit of reading and writing ROOT TTrees, so if you write all of the data in one TBasket, any reader (ROOT, Uproot, UnROOT) will have to read it all into memory at once—ROOT TTrees can only be read piecemeal if they're written as multiple TBaskets. In Uproot, the way to do that is

file["tree_name"] = {"branch": first_basket}
file["tree_name"].extend({"branch": second_basket})
file["tree_name"].extend({"branch": third_basket})
...

It can't be an interface that takes all of the data in one call because the TBasket data might not fit in memory, especially if you have many TBranches (each with one TBasket). This interface is documented here.
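
For example, the minimal reproducer from earlier in this thread can be rewritten as many modest TBaskets instead of one 8 GB TBasket. This is only a sketch: it uses the default compression (which the comparison above showed working), and the 10-million-entry chunk (~80 MB per TBasket) is just an illustrative choice:

import numpy as np
import uproot

n_entries = 1_000_000_000  # 8 GB of float64 in total
chunk = 10_000_000         # ~80 MB per TBasket, well under the 2 GB limit

with uproot.recreate("file.root") as fout:  # default compression
    # the first assignment creates the TTree and writes the first TBasket
    fout["tree"] = {"x": np.ones(chunk, dtype=np.float64)}
    # every extend call appends one more TBasket per branch
    for _ in range(n_entries // chunk - 1):
        fout["tree"].extend({"x": np.ones(chunk, dtype=np.float64)})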


In most files, ROOT TBaskets tend to be too small: they tend to be on the order of kilobytes, when it would be more efficient to read if they were megabytes. If you ask ROOT to make big TBaskets, on the order of megabytes or bigger, it just doesn't do it—there seems to be some internal limit. Uproot does exactly what you ask, and you were asking for gigabyte-sized TBaskets. If you didn't run into the 2 GB limit, I wonder if ROOT would be able to read them. Since it prevents the writing of TBaskets that large, I wouldn't be surprised if there's an implicit assumption in the reading code. Did you ever write 1 GB TBaskets and then read them back in ROOT?


About this issue, I think I can close it because the format simply doesn't accept integers of that size, and most likely, you intended to write multiple TBaskets with uproot.WritableTree.extend.


grzanka commented Feb 20, 2024

@jpivarski thanks for your detailed explanation. Indeed, the uproot documentation is great on these aspects; it just needs a careful reading.

Can you suggest some methods for inspecting basket size and checking the exact limits?

I am playing with uproot as a tool to convert large HDF files into something that can then be inspected online using JSROOT. The ROOT ntuples are transferred to the S3 filesystem provided by our supercomputing center (ACK Cyfronet in Krakow). As S3 provides a nice way to share files via URL, JSROOT can load them directly. I exploit the partial-read-over-HTTP feature there (root-project/jsroot#284).

I've played a bit with basket size, and for my reading use case the optimum is about 1000000 (10^6) rows/entries per basket. This gives the fastest loading time in JSROOT. My tree has ~20 branches of mostly 64-bit floats.
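
For reference, this is roughly how I check what was actually written; a sketch assuming uproot's read-side TBranch inspection methods (num_baskets, basket_entry_start_stop, basket_compressed_bytes, basket_uncompressed_bytes) and a file/branch named like in the minimal snippets above:

import uproot

with uproot.open("file.root") as f:
    branch = f["tree"]["x"]
    print("number of baskets:", branch.num_baskets)
    for i in range(branch.num_baskets):
        start, stop = branch.basket_entry_start_stop(i)
        print(
            f"basket {i}: entries [{start}, {stop}), "
            f"{branch.basket_uncompressed_bytes(i)} uncompressed bytes, "
            f"{branch.basket_compressed_bytes(i)} compressed bytes"
        )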

For a small benchmark I took the same HDF file and generated two ROOT files, one with 100k entries per basket and one with 1000k entries.

You can play with them yourselves:

100k entries/basket

1000k entries/basket

In general I feel that fewer HTTP requests of ~1 MB size give the best performance. Going down to a basket size of 10k entries slows JSROOT down even more.

The problem

Now the problem is the following: with 1000000 entries per basket I cannot process larger files.
I've used the following code to generate ROOT from HDF, using the extend approach and limiting the amount of data that goes into a basket. I am not sure if the calculation of the basket size is correct here, but it at least gives an order of magnitude:

    from pathlib import Path

    import h5py
    import uproot

    from hdf import peak_count  # local helper module from the original script

    input_path = Path('/memfs/7699480/4nA.slim.hdf')  # path from the run shown in the log below
    ntuple_path = input_path.with_suffix('.root')

    saving_ok = True
    all_basket_count = 0
    with h5py.File(input_path, 'r') as f, uproot.recreate(ntuple_path) as fout:
        for channel_no in range(4):
            print(f"Processing channel {channel_no}")
            # calibration constants stored as HDF attributes
            gain_mV = f[f'channel_{channel_no}'].attrs['gain_mV']
            offset_mV = f[f'channel_{channel_no}'].attrs['offset_mV']
            horiz_interval_ns = f[f'channel_{channel_no}'].attrs['horiz_interval_ns']
            fout[f'channel_{channel_no}/gain_mV'] = str(gain_mV)
            fout[f'channel_{channel_no}/offset_mV'] = str(offset_mV)
            fout[f'channel_{channel_no}/horiz_interval_ns'] = str(horiz_interval_ns)

            peaks_in_bucket = 1_000_000
            for peak_type in ['positive', 'negative']:
                print(f"Processing {peak_type} peaks")
                total_number_of_peaks = peak_count(f, channel_no, peak_type)
                for i in range(0, total_number_of_peaks, peaks_in_bucket):
                    # read one chunk of every raw dataset, then derive calibrated branches
                    dict_bucket = {}
                    for name, dataset in f[f'channel_{channel_no}/{peak_type}'].items():
                        dict_bucket[name] = dataset[i:i + peaks_in_bucket]
                    dict_bucket['peak_value_mV'] = dict_bucket['peak_value'] * gain_mV
                    dict_bucket['peak_length_ns'] = dict_bucket['peak_length'] * horiz_interval_ns
                    dict_bucket['peak_start_us'] = dict_bucket['peak_start'] * horiz_interval_ns / 1000
                    dict_bucket['peak_cfd_us'] = dict_bucket['peak_cfd_index'] * horiz_interval_ns / 1000
                    dict_bucket['peak_rise_ns'] = dict_bucket['rise_time'] * horiz_interval_ns
                    dict_bucket['peak_area_ns_mV'] = dict_bucket['peak_area'] * horiz_interval_ns * gain_mV
                    dict_bucket['peak_baseline_mV'] = dict_bucket['peak_baseline'] * gain_mV - offset_mV
                    dict_bucket['peak_noise_mV'] = dict_bucket['peak_noise'] * gain_mV
                    dict_bucket['peak_fwhm_ns'] = dict_bucket['peak_fwhm'] * horiz_interval_ns

                    try:
                        if i == 0:
                            # first chunk creates the TTree ...
                            fout[f'channel_{channel_no}/{peak_type}'] = dict_bucket
                        else:
                            # ... subsequent chunks are appended to it
                            fout[f'channel_{channel_no}/{peak_type}'].extend(dict_bucket)
                        basket_size = 0
                        entry_size = 0
                        for key, value in dict_bucket.items():
                            basket_size += value.nbytes
                            entry_size += value.dtype.itemsize
                        basket_entries = dict_bucket['peak_value_mV'].shape[0]
                        print(f"\tadding basket {i//peaks_in_bucket:d} with {basket_entries} peaks")
                        print(f"\t\tentry size: {entry_size} bytes, total basket size: {basket_size / 1024 / 1024} MB")
                        all_basket_count += 1
                        print(f"\t\ttotal number of baskets: {all_basket_count}")
                    except Exception as e:
                        print(f"Error {e} while writing {i} to {i + peaks_in_bucket}")
                        saving_ok = False
                        break
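
A side note on the bookkeeping above: if I understand the uproot documentation correctly, the initial assignment creates the TTree and every `extend` call afterwards writes one new TBasket to each branch, so `all_basket_count` really counts `extend` calls; with 19 branches per tree (152 bytes per entry / 8 bytes per float64) the actual number of TBaskets in the file is about 19 times larger.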

When running this code I get the error `Error 'i' format requires -2147483648 <= number <= 2147483647 while writing 0 to 1000000` and the following log:

Converting from LV1 to LV2
Command used: convert_from_lv1_to_lv2.py /memfs/7699480/4nA.slim.hdf --save-ntuple
Input path: /memfs/7699480/4nA.slim.hdf
Saving ntuple
Saving ntuple to /memfs/7699480/4nA.slim.root
Processing channel 0
Processing positive peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 1
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 2
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 3
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 4
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 5
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 6
	adding basket 6 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 7
	adding basket 7 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 8
	adding basket 8 with 507702 peaks
		entry size: 152 bytes, total basket size: 73.59571838378906 MB
		total number of baskets: 9
Processing negative peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 10
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 11
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 12
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 13
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 14
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 15
	adding basket 6 with 433912 peaks
		entry size: 152 bytes, total basket size: 62.89923095703125 MB
		total number of baskets: 16
Processing channel 1
Processing positive peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 17
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 18
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 19
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 20
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 21
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 22
	adding basket 6 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 23
	adding basket 7 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 24
	adding basket 8 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 25
	adding basket 9 with 345506 peaks
		entry size: 152 bytes, total basket size: 50.08403015136719 MB
		total number of baskets: 26
Processing negative peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 27
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 28
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 29
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 30
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 31
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 32
	adding basket 6 with 699947 peaks
		entry size: 152 bytes, total basket size: 101.46326446533203 MB
		total number of baskets: 33
Processing channel 2
Processing positive peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 34
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 35
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 36
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 37
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 38
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 39
	adding basket 6 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 40
	adding basket 7 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 41
	adding basket 8 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 42
	adding basket 9 with 378574 peaks
		entry size: 152 bytes, total basket size: 54.87751770019531 MB
		total number of baskets: 43
Processing negative peaks
	adding basket 0 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 44
	adding basket 1 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 45
	adding basket 2 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 46
	adding basket 3 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 47
	adding basket 4 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 48
	adding basket 5 with 1000000 peaks
		entry size: 152 bytes, total basket size: 144.95849609375 MB
		total number of baskets: 49
	adding basket 6 with 985117 peaks
		entry size: 152 bytes, total basket size: 142.80107879638672 MB
		total number of baskets: 50
Processing channel 3
Processing positive peaks
Error 'i' format requires -2147483648 <= number <= 2147483647 while writing 0 to 1000000
Processing negative peaks
Error 'i' format requires -2147483648 <= number <= 2147483647 while writing 0 to 1000000
Generated file with 3964309449 bytes
Removing /memfs/7699480/4nA.slim.root, file corrupted
Writing took 152.494 s
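
For context, and with the caveat that I have not traced this through the uproot source: the message is the standard error raised by Python's struct module when a value outside the signed 32-bit range is packed with the `'i'` format, which is consistent with a byte offset past 2 GiB being written into a 32-bit file header field. The failure is easy to reproduce in isolation:

    import struct

    # 3_964_309_449 is the file size reported in the log above; it does not
    # fit into a signed 32-bit integer, so struct raises exactly this error:
    struct.pack(">i", 3_964_309_449)
    # struct.error: 'i' format requires -2147483648 <= number <= 2147483647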


grzanka commented Feb 20, 2024

@jpivarski I am not sure whether a discussion on a closed issue is the best place? Should I convert this into a discussion (https://github.com/scikit-hep/uproot5/discussions)?

scikit-hep locked and limited conversation to collaborators Feb 20, 2024
jpivarski converted this issue into discussion #1135 Feb 20, 2024
