idr0091-julou-lacinduction S-BIAD852 #650

Open
will-moore opened this issue Feb 22, 2023 · 44 comments

@will-moore
Member

idr0091-julou-lacinduction

@will-moore will-moore moved this to test convert in NGFF conversion Feb 22, 2023
@dominikl
Member

Issue with conversion:

(base) [dlindner@pilot-zarr2-dev idr0091]$ time /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt 20170919_glyc_lac_1_MMStack.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp4590289654988610984/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2023-02-22 13:49:12,806 [main] ERROR loci.formats.Memoizer - deleting invalid memo file: ../memo/uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/.20170919_glyc_lac_1_MMStack_metadata.txt.bfmemo
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at ome.xml.model.Annotation.<init>(Annotation.java:123)
        at ome.xml.model.TextAnnotation.<init>(TextAnnotation.java:91)
        at ome.xml.model.XMLAnnotation.<init>(XMLAnnotation.java:97)

I even tried with export BF_MAX_MEM=56G, but watching the process, memory usage never got over 20G before crashing.

@sbesson
Member

sbesson commented Feb 22, 2023

Pretty sure that BF_MAX_MEM is specific to the Bio-Formats command-line utilities and will not be recognized by bioformats2raw. Have you tried JAVA_OPTS="-Xmx<NN>G" ?

@dominikl
Member

👍 It finally worked with export JAVA_OPTS="-Xmx50G" !
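
The fix above works because the bioformats2raw launcher reads JAVA_OPTS, whereas BF_MAX_MEM is only read by the Bio-Formats command-line tools. A minimal sketch of that setup follows; the input and output names are hypothetical placeholders, not the paths used in this thread, and the conversion only runs if the tool and the input are actually present.

```shell
# Heap size for the JVM; 50G is the value reported to work in this thread.
export JAVA_OPTS="-Xmx50G"

# Hypothetical input/output names, for illustration only.
input="example_MMStack_metadata.txt"
output="example_MMStack.ome.zarr"

if command -v bioformats2raw >/dev/null 2>&1 && [ -e "$input" ]; then
  bioformats2raw "$input" "$output"
else
  echo "bioformats2raw or input not found; JAVA_OPTS=$JAVA_OPTS"
fi
```

Exporting the variable in the same shell session that launches the conversion is what matters; a value set in a different session will not be seen by the JVM.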

@dominikl dominikl moved this from test convert to re-import test image in NGFF conversion Feb 27, 2023
@sbesson
Member

sbesson commented Feb 27, 2023

50G definitely feels excessive. I recall some improvements were targeted at handling similar issues for large Micro-Manager metadata files in the past. One thing possibly worth testing independently is whether bioformats2raw 0.6.0 would handle the same data with lower memory requirements /cc @melissalinkert

Semi-related, I would expect this particular file format to work without issues with OMERO 5.6.6. What is our policy for these types of submissions of mixed file formats (probably only a handful of them)? Are we converting everything or only the minimal amount of data? /cc @jburel

@dominikl
Member

Oh, I should test a different image then. Didn't notice that this submission had different file formats.

@melissalinkert

ome/bioformats#3229 is the last time we addressed memory issues in Micro-Manager, so I'd be surprised if bioformats2raw 0.6.0 helps. Based on the partial stack trace, I'd guess it's original metadata annotations that are causing the problem.

Comparing memory usage for showinf -nopix -omexml /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt and showinf -nopix -omexml -no-sas /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt should confirm whether that is indeed the issue.

@dominikl
Member

dominikl commented Mar 7, 2023

Also converted one of the pattern files and re-imported it, which worked fine. But the converted MMStack can't be re-imported, again because of a memory issue:

2023-03-07 11:54:22,437 17151      [      main] ERROR     ome.formats.importer.cli.ErrorHandler - FILE_EXCEPTION: /data/ngff/idr0091/20170920_glyc_lac_6h_1_MMStack.ome.zarr/OME/METADATA.ome.xml
java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

@dominikl dominikl added the bug label Mar 7, 2023
@will-moore
Member Author

@dominikl - are you able to try the -no-sas option suggested by @melissalinkert above and see if that affects memory usage?

@melissalinkert If that is the case, does it suggest a workaround for bioformats2raw or is a fix still a much bigger issue?

A possible option is to use omero-cli-zarr to export since it's only 342 Images (according to IDR/idr-utils#56)

@melissalinkert

bioformats2raw does not have a direct equivalent to bfconvert's -no-sas. The closest workaround at the moment is bioformats2raw --no-ome-meta-export, which entirely prevents OME/METADATA.ome.xml from being written; that's likely not what you want. I'm not opposed to adding an equivalent to -no-sas in bioformats2raw, but would like to know if that actually would solve the problem first.

@will-moore
Member Author

Going to start exporting with omero-cli-zarr since I can also do this on the idr-ftp machine which doesn't have the raw data mounted...

$ ssh -A idr-ftp.openmicroscopy.org
$ conda create -n omero_zarr_export -c ome python=3.9 zeroc-ice36-python
$ conda activate omero_zarr_export
$ conda install -c conda-forge omero-py
$ pip install git+https://github.com/will-moore/omero-cli-zarr.git@name_option
...
omero-cli-zarr-0.1.dev452+ge882a62

cd /data/ngff/
mkdir idr0091 && cd idr0091

Export 100 images

omero login
for id in $IMAGE_IDS; do  # IMAGE_IDS: placeholder for the list of Image IDs, which is not shown here
  echo $id;
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

After about 17 hours we have 50 images... (about 3 an hour):

(base) [wmoore@idrftp-ftp ~]$ ls -alh /data/ngff/idr0091
...
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 21:50 20151218_switch8h_pos2_GL02.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 22:16 20151218_switch8h_pos2_GL04.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 22:44 20151218_switch8h_pos2_GL05.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:07 20151218_switch8h_pos5_GL03.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:32 20151218_switch8h_pos5_GL05.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:52 20151218_switch8h_pos5_GL06.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 12 00:18 20151218_switch8h_pos5_GL08.pattern.ome.zarr
drwxrwxr-x.  3 wmoore wmoore   42 Jul 12 00:18 20151218_switch8h_pos5_GL09.pattern.ome.zarr

@will-moore
Member Author

will-moore commented Jul 12, 2023

Moved 51 zarrs to batch1 and renamed image.pattern.ome.zarr to image.ome.zarr...

(base) [wmoore@idrftp-ftp batch1]$  for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done

# zip, with -move
(base) [wmoore@idrftp-ftp batch1]$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
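
The rename above parses the output of ls and pipes each name through sed. A sketch of the same rename using a shell glob and parameter expansion is given here as an alternative, not as the commands that were run; strip_pattern and rename_exports are hypothetical helper names.

```shell
# Strip the ".pattern" infix from an exported zarr name using parameter
# expansion; names without that suffix are returned unchanged.
strip_pattern() {
  case "$1" in
    *.pattern.ome.zarr) printf '%s\n' "${1%.pattern.ome.zarr}.ome.zarr" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Rename every *.pattern.ome.zarr entry inside the given directory.
# The glob is expanded by the shell, so names containing spaces are safe.
rename_exports() {
  for path in "$1"/*.pattern.ome.zarr; do
    [ -e "$path" ] || continue   # the glob matched nothing
    name=$(basename "$path")
    mv -- "$path" "$1/$(strip_pattern "$name")"
  done
}
```

For example, rename_exports /data/ngff/idr0091/batch1 would rename the directories in place, after which the zip loop above can run unchanged.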

@will-moore
Member Author

Created s3 bucket for testing...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0091
make_bucket: idr0091
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0091 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0091 --cors-configuration file://cors.json
$ ./mc cp -r /data/ngff/idr0091/20151218_switch8h_pos6_GL01.pattern.ome.zarr uk1s3/idr0091/zarr
...pattern.ome.zarr/3/99/2/0/0: 574.64 MiB / 574.64 MiB ━━━━━━━━━━━━━━━━━━ 38.54 MiB/s 14s

Looks good: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0091/zarr/20151218_switch8h_pos6_GL01.pattern.ome.zarr

Screenshot 2023-07-12 at 04 40 15

@will-moore
Member Author

will-moore commented Jul 12, 2023

Zipping of 51 images in batch1 above only took an hour.

Upload to BioStudies...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091/batch1/idr0091 [email protected]:5f/xxxxxx
...
20151218_switch8h_pos5_GL13.ome.zarr.zip                       100%  433MB  487Mb/s    06:04    
20151218_switch8h_pos5_GL14.ome.zarr.zip                       100%  433MB  323Mb/s    06:12    
Completed: 22054051K bytes transferred in 372 seconds
 (484481K bits/sec), in 51 files, 1 directory.

# deleted
$ rm -rf batch1/

@will-moore will-moore moved this from re-import test image to convert all data to NGFF in NGFF conversion Jul 13, 2023
@will-moore
Member Author

Other 49 images from batch 1 completed...
Zipping..

Also starting to export ALL the remaining images...

for id in $IMAGE_IDS; do  # IMAGE_IDS: placeholder for the list of Image IDs, which is not shown here
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

Looks like the last 2 images here (batch1) didn't export properly - too small:

(base) [wmoore@idrftp-ftp idr0091]$ ls -alh
...
-rw-rw-r--. 1 wmoore wmoore 436M Jul 13 03:49 20160912_Pos0_GL11.pattern.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 437M Jul 13 03:49 20160912_Pos0_GL12.pattern.ome.zarr.zip
drwxrwxr-x. 3 wmoore wmoore   42 Jul 13 00:49 20160912_Pos0_GL14.pattern.ome.zarr
-rw-rw-r--. 1 wmoore wmoore 2.8M Jul 13 05:13 20160912_Pos0_GL14.pattern.ome.zarr.zip
drwxrwxr-x. 3 wmoore wmoore   42 Jul 13 00:49 20160912_Pos0_GL15.pattern.ome.zarr
-rw-rw-r--. 1 wmoore wmoore 427K Jul 13 05:14 20160912_Pos0_GL15.pattern.ome.zarr.zip

Deleted them.
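
The two failed exports were spotted by eye because their zips were far smaller than the others (2.8M and 427K against roughly 436M). A sketch of an automated check follows; list_small_zips is a hypothetical helper, and the size threshold is an assumption that would need choosing per dataset.

```shell
# Print every .zip in a directory that is smaller than min_bytes.
list_small_zips() {
  dir=$1
  min_bytes=$2
  for f in "$dir"/*.zip; do
    [ -f "$f" ] || continue            # the glob matched nothing
    size=$(( $(wc -c < "$f") ))        # arithmetic strips any padding from wc
    if [ "$size" -lt "$min_bytes" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Calling list_small_zips /data/ngff/idr0091 104857600 would list zips under 100 MiB. A legitimately small image, such as one with a single timepoint, would also be listed, so the output is a prompt for inspection rather than a list to delete.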

Rename 49 others (remove .pattern) and zip..

for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done
(base) [wmoore@idrftp-ftp idr0091]$ ls
20151218_switch8h_pos6_GL01.ome.zarr  20160526_pos0_GL12.ome.zarr  20160526_pos0_GL26.ome.zarr  20160526_pos4_GL20.ome.zarr  20160912_Pos0_GL02.ome.zarr
20151218_switch8h_pos6_GL03.ome.zarr  20160526_pos0_GL13.ome.zarr  20160526_pos4_GL01.ome.zarr  20160526_pos4_GL21.ome.zarr  20160912_Pos0_GL03.ome.zarr
20151218_switch8h_pos6_GL04.ome.zarr  20160526_pos0_GL16.ome.zarr  20160526_pos4_GL03.ome.zarr  20160526_pos4_GL24.ome.zarr  20160912_Pos0_GL04.ome.zarr
20151218_switch8h_pos6_GL05.ome.zarr  20160526_pos0_GL17.ome.zarr  20160526_pos4_GL06.ome.zarr  20160526_pos4_GL25.ome.zarr  20160912_Pos0_GL05.ome.zarr
20151218_switch8h_pos6_GL06.ome.zarr  20160526_pos0_GL18.ome.zarr  20160526_pos4_GL09.ome.zarr  20160526_pos4_GL27.ome.zarr  20160912_Pos0_GL06.ome.zarr
20151218_switch8h_pos6_GL07.ome.zarr  20160526_pos0_GL19.ome.zarr  20160526_pos4_GL10.ome.zarr  20160526_pos5_GL03.ome.zarr  20160912_Pos0_GL07.ome.zarr
20151218_switch8h_pos6_GL09.ome.zarr  20160526_pos0_GL21.ome.zarr  20160526_pos4_GL11.ome.zarr  20160526_pos5_GL09.ome.zarr  20160912_Pos0_GL10.ome.zarr
20151218_switch8h_pos6_GL10.ome.zarr  20160526_pos0_GL22.ome.zarr  20160526_pos4_GL12.ome.zarr  20160526_pos5_GL12.ome.zarr  20160912_Pos0_GL11.ome.zarr
20160526_pos0_GL01.ome.zarr           20160526_pos0_GL23.ome.zarr  20160526_pos4_GL17.ome.zarr  20160526_pos5_GL13.ome.zarr  20160912_Pos0_GL12.ome.zarr
20160526_pos0_GL05.ome.zarr           20160526_pos0_GL24.ome.zarr  20160526_pos4_GL19.ome.zarr  20160912_Pos0_GL01.ome.zarr

@will-moore
Member Author

will-moore commented Jul 13, 2023

Upload the 2nd lot of 49 images from batch1...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091 [email protected]:5f/xxxxxx
...
20160912_Pos0_GL03.ome.zarr.zip                               100%  437MB  309Mb/s    07:53    
20160912_Pos0_GL10.ome.zarr.zip                                100%  436MB  171Mb/s    08:05    
Completed: 19024387K bytes transferred in 485 seconds
 (320849K bits/sec), in 49 files, 1 directory.

@will-moore
Member Author

Current progress....

Exported 127 of 342 Images.

(342 - 127) / 3 = 72 hours.

First batch of 100 images (2 failed and need re-exporting).
2nd batch of 242 images is running on idr-ftp server, into:

(base) [wmoore@idrftp-ftp ngff]$ ls -alh /data/ngff/idr0091_batch2/
total 4.0K
drwxrwxr-x. 29 wmoore wmoore 4.0K Jul 13 09:10 .
drwxr-xr-x.  9 wmoore root    208 Jul 13 00:52 ..
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:11 20160912_Pos0_GL14.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:29 20160912_Pos0_GL15.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:48 20160912_Pos0_GL16.pattern.ome.zarr
...

...and this should complete in 3 days.
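
The estimate above is a plain rate calculation: remaining images divided by the observed export rate of about 3 per hour. A small sketch, where eta_hours is a hypothetical helper name and the result is rounded up to whole hours:

```shell
# Hours needed to export the remaining images at a fixed hourly rate,
# rounded up to the next whole hour.
eta_hours() {
  total=$1
  done_count=$2
  per_hour=$3
  remaining=$(( total - done_count ))
  echo $(( (remaining + per_hour - 1) / per_hour ))
}

eta_hours 342 127 3   # prints 72
```

72 hours matches the 3 days quoted above, assuming the export rate stays constant.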

@will-moore
Member Author

Looks like all remaining zarrs exported OK...

$ ls /data/ngff/idr0091_batch2/ | wc
    242     242    9327

rename to remove .pattern and zip...

$ screen -r idr0091_zip
$ cd /data/ngff/idr0091_batch2/
$ for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done
$ for i in */; do zip -mr "${i%/}.zip" "$i"; done

@will-moore will-moore moved this from convert all data to NGFF to Zip and upload to BioStudies in NGFF conversion Aug 7, 2023
@will-moore
Member Author

will-moore commented Aug 15, 2023

Started uploading 242 zips...

$ screen -r idr0091_aspera
$ sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091_batch2/idr0091 [email protected]:5f/****

@will-moore
Member Author

Checked size of zips on BioStudies. 20160912_Pos4_GL06.ome.zarr.zip is smaller than the others, as this image has only a single timepoint: https://idr.openmicroscopy.org/webclient/?show=image-10648217

Use JS to list files from submissions page:

let names = [];
[].forEach.call(document.querySelectorAll("div [role='row'] .ag-cell[col-id='name']"), function(div) {
  names.push(div.innerHTML.trim());
});
console.log(names.map(function(name) { return "idr0091/" + name; }).join("\n"));
console.log(names.length);

@will-moore will-moore moved this from Zip and upload to BioStudies to BioStudies Submission in NGFF conversion Aug 16, 2023
@will-moore will-moore removed the bug label Aug 17, 2023
@will-moore
Member Author

Looks like the pixels table hasn't been updated for this image:

idr=> select path, name from pixels where image = 10648757;
                                       path                                       |            name            
----------------------------------------------------------------------------------+----------------------------
 demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837/20200817-pattern | 20161207_Pos1_GL06.pattern

@will-moore
Member Author

The sql doesn't contain OME/METADATA.ome.xml...

4053851.sql

begin;
    select mkngff_fileset(
      4053851,
      '22c41bb8-36e5-4386-9825-179b180d8238',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/',
      array[
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/0/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '1', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/1/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '2', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/2/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '3', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/3/', '.zarray', 'application/octet-stream']
      ]::text[][]
    );
commit;

@will-moore
Member Author

will-moore commented Aug 30, 2023

@joshmoore I see from https://github.com/IDR/omero-mkngff/blob/4c1e32bb32a7b92f427634630e6b552cbb186509/src/omero_mkngff/__init__.py#L108 that mkngff expects to find a METADATA.ome.xml with which to update the pixels table, but in the case of omero-cli-zarr-exported NGFF data, we don't have METADATA.ome.xml, so the pixels table won't get updated, leading to the errors above.

We'll need to pick another file to update the pixels table with.

I'll open an issue on the repo: IDR/omero-mkngff#7

@will-moore
Member Author

Running this sql fixes the image

UPDATE pixels SET name = '.zattrs', path = 'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr' where image in (select id from Image where fileset = 5287497);

http://localhost:1080/webclient/?show=image-10648757

Screenshot 2023-08-30 at 17 32 31

@will-moore
Member Author

Actually, it seems that Bio-Formats is not fussy about which file is referenced in the pixels table.
After this, the image is still viewable...

idr=> UPDATE pixels SET name = '.zarray', path = 'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/3' where image in (select id from Image where fileset = 5287497);

@will-moore will-moore moved this from Data on Embassy s3 to create new Filesets in idr-next in NGFF conversion Aug 31, 2023
@will-moore
Member Author

We now have all 342 Filesets available at https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html

Let's use the next batch (not the first 11 above) for testing IDR/omero-mkngff#8
Testing on idr0138-pilot this time...

Update to branch

conda activate mkngff
pip uninstall omero-mkngff
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@always_update_pixels'
idr0091/20160912_Pos8_GL21.ome.zarr,S-BIAD852/03e12e59-d0cd-456a-99fa-c55dba56b029,4053336
idr0091/20160526_pos5_GL03.ome.zarr,S-BIAD852/040d0262-cf47-4ddd-b5c7-cad13bf98ada,4053438
idr0091/20161130_switch_IPTG1uM_Pos0_GL06.ome.zarr,S-BIAD852/043c117e-1b42-4691-88e6-87f0bd67917d,4053797
idr0091/20161021_Pos5_GL04.ome.zarr,S-BIAD852/053569d5-6ca3-40ec-a1f0-ba163109cc0f,4053499
idr0091/20151218_switch8h_pos5_GL13.ome.zarr,S-BIAD852/057a0a1c-96d1-4cc5-8e4f-c63ce4961080,4053189
idr0091/20160526_pos4_GL21.ome.zarr,S-BIAD852/058b1fac-f751-48d1-8e54-65ce179e1bdb,4053434
idr0091/20161007_Pos0_GL05.ome.zarr,S-BIAD852/05eb785a-9989-4e93-a18d-adf6dd60615b,4053374
idr0091/20161212_Pos0_GL19.ome.zarr,S-BIAD852/07608c5c-ea6d-4e93-9443-efe56fc27ea0,4053451
idr0091/20151204_switch6h_pos0_GL10.ome.zarr,S-BIAD852/07cabbd4-5946-4cf7-ba0f-2b29b60f1184,4053146
idr0091/20160912_Pos4_GL12.ome.zarr,S-BIAD852/08f9303d-b58d-49f9-9655-b858d7218443,4053316

Took about 8 minutes to generate each sql file...

...
BEGIN
 mkngff_fileset 
----------------
        5811622
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811623
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811624
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811625
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811626
(1 row)
COMMIT
UPDATE 0

Find image from last Fileset created and check pixels name, path...

idr=> select id from image where fileset =5811626;
    id    
----------
 10648222
(1 row)

idr=> select name, path from pixels where image = 10648222;
            name            |                                      path                                       
----------------------------+---------------------------------------------------------------------------------
 20160912_Pos4_GL12.pattern | demo_2/Blitz-0-Ice.ThreadPool.Server-2/2020-10/02/23-00-58.921/20200817-pattern
(1 row)

I realise that this didn't work because I used the OLD Fileset ID to update pixels after the new Fileset was created.
Pushed fix to IDR/omero-mkngff@2314311

Then re-installed...

@will-moore
Member Author

Try with fresh filesets...

idr0091/20161014_Pos1_GL02.ome.zarr,S-BIAD852/09369079-50e6-486e-9e72-40e7a0eef8ec,4053346
idr0091/20151218_switch8h_pos5_GL12.ome.zarr,S-BIAD852/0a1ff011-a78f-4b11-b8f5-c24ffd0972f6,4053188
idr0091/20151204_switch6h_pos0_GL20.ome.zarr,S-BIAD852/0a812f66-99dd-4280-bb59-7d04f7e75b39,4053152
idr0091/20160912_Pos0_GL16.ome.zarr,S-BIAD852/0a858893-dcb1-40f7-ac4d-86cd80d1587d,4053302

@will-moore
Member Author

After running sql commands, get Image IDs from Fileset IDs..

idr=> select id from image where fileset in (5811627, 5811628, 5811629, 5811630)
idr-> ;
    id    
----------
 10648252
 10648094
 10648058
 10648208
(4 rows)

Check pixels...
=> select path, name from pixels where image = 10648252;
                                                       path                                                       |  name   
------------------------------------------------------------------------------------------------------------------+---------
 demo_2/Blitz-0-Ice.ThreadPool.Server-12/2020-10/03/02-02-34.667_mkngff/09369079-50e6-486e-9e72-40e7a0eef8ec.zarr | .zattrs

Image is directly viewable!

Screenshot 2023-08-31 at 15 55 59

@will-moore
Member Author

will-moore commented Sep 22, 2023

Going to generate mkngff sql on ALL Filesets on idr0125-pilot. https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html
(The above tests were run on idr0138-pilot, so its DB doesn't have the original Fileset IDs now.)

idr0091.csv commit

for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
done

NB: The first 10 sql scripts failed as they had already been run on idr0125-pilot above. Need to sort this out...

... took 25 mins in total.
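
The loop above splits each CSV row with cut. A sketch of the same field extraction using read follows; it only prints the Fileset ID and zarr path that would be passed to omero mkngff sql, and does not call omero or psql. print_targets is a hypothetical helper, and the three-column layout (name, S-BIAD852/uuid, fileset ID) is taken from the rows shown earlier in this thread.

```shell
# For each "name,S-BIADxxx/uuid,fileset_id" row, print the Fileset ID and
# the zarr path under /bia-integrator-data. Carriage returns are removed
# first so that CSVs with Windows line endings give clean IDs.
print_targets() {
  tr -d '\r' < "$1" | while IFS=, read -r name biapath fsid || [ -n "$name" ]; do
    [ -n "$fsid" ] || continue          # skip blank or short rows
    uuid=${biapath#*/}                  # text after the first "/"
    printf '%s /bia-integrator-data/%s/%s.zarr\n' "$fsid" "$biapath" "$uuid"
  done
}
```

Printing the targets before generating any SQL gives a quick way to confirm that every row yields a non-empty Fileset ID and a well-formed path.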

@will-moore
Member Author

Also saw another random fail for

idr0091/20161207_Pos1_GL06.ome.zarr,S-BIAD852/de82a935-3143-4ce3-9439-9ab986237b09,4053851

just caught this...

ERROR:  duplicate key value violates unique constraint "originalfile_repo_path_index"
DETAIL:  Key (repo, regexp_split_to_array((('/'::text || path) || name) || '/'::text, '/+'::text))=(cdf35825-def1-4580-8d0b-9c349b8f78d6, {"",demo_2,Blitz-0-Ice.ThreadPool.Server-16,2020-10,03,18-15-40.837_mkngff,de82a935-3143-4ce3-9439-9ab986237b09.zarr,.zattrs,""}) already exists.
CONTEXT:  SQL statement "insert into originalfile
          (id, permissions, creation_id, group_id, owner_id, update_id, mimetype, repo, path, name)
          values (nextval('seq_originalfile'), old_perms, new_event, old_group, old_owner, new_event,
            info[i][3], repo, info[i][1], uuid || info[i][2])
          returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 42 at SQL statement
ROLLBACK

@will-moore
Member Author

Re-exporting on idr-ftp with pixels type fix as at ome/omero-cli-zarr#157 with merge branch

pip install 'omero-cli-zarr @ git+https://github.com/will-moore/omero-cli-zarr@merge_prs'

omero login
for id in $IMAGE_IDS; do  # IMAGE_IDS: placeholder for the list of Image IDs, which is not shown here
  echo $id;
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

Also exported "batch2" as above...

Renamed ALL 342 filesets to remove pattern

for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done

Zip - not deleting...

$ for i in */; do zip -r "${i%/}.zip" "$i"; done

@will-moore
Member Author

will-moore commented Jan 4, 2024

on idr-testing... (goofys is at /usr/bin/goofys)...

sudo mkdir /idr0091 && sudo /usr/bin/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0091 /idr0091

(base) [wmoore@test120-omeroreadwrite ~]$ ls /idr0091/zarr/
20151218_switch8h_pos6_GL01.pattern.ome.zarr

On idr-ftp, delete the existing (invalid) data and upload all images...

./mc rm --recursive uk1s3/idr0091/zarr/20151218_switch8h_pos6_GL01.pattern.ome.zarr

./mc cp -r /data/ngff/idr0091/idr0091/ uk1s3/idr0091/zarr
..._Pos1_GL26.ome.zarr/3/99/2/0/0: 96.73 GiB / 96.73 GiB ━━━━━━━━━

idr-testing...

(base) [wmoore@test120-omeroreadwrite ~]$ ls /idr0091/zarr/ | wc
    342     342   10722

E.g. looks good: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0091/zarr/20160526_pos0_GL01.ome.zarr
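
The ls | wc check above confirms by eye that 342 entries arrived. A sketch that turns it into a pass/fail check is shown below; check_entry_count is a hypothetical helper name.

```shell
# Compare the number of entries in a directory with an expected count.
# Returns non-zero and reports on stderr when they differ.
check_entry_count() {
  dir=$1
  expected=$2
  actual=$(( $(ls -A "$dir" | wc -l) ))
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected entries in $dir, found $actual" >&2
    return 1
  fi
  echo "ok: $actual entries in $dir"
}
```

For this study the call would be check_entry_count /idr0091/zarr 342. It only counts names, so it would not catch a zarr that is present but incomplete.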

@will-moore
Member Author

On idr-testing, let's try to update symlink to fix dtype issues...

Test with Image: 20151204_switch6h_pos0_GL01.pattern, ID: 10648046...
Existing failure:

$ python check_pixels.py --max-planes=sizeC Image:10648046
Start: 2024-01-04 22:12:48.846978
Checking Image:10648046
max_planes: sizeC
max_images: 0
0/1 Check Image:10648046 20151204_switch6h_pos0_GL01.pattern
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 174528 bytes

That Image has symlink like this:

(venv3) (base) [wmoore@test120-omeroreadwrite scripts]$ ls -alh !$
ls -alh /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/
total 8.0K
drwxr-sr-x.  2 omero-server omero-server  126 Nov  1 15:40 .
drwxrwsr-x. 22 omero-server omero-server 4.0K Oct 11 09:49 ..
lrwxrwxrwx.  1 omero-server omero-server  109 Oct 11 09:41 971f2809-c748-4259-8044-81ba6c774fdd.zarr -> /bia-integrator-data/S-BIAD852/971f2809-c748-4259-8044-81ba6c774fdd/971f2809-c748-4259-8044-81ba6c774fdd.zarr
-rw-r--r--.  1 omero-server omero-server   25 Nov  1 15:40 971f2809-c748-4259-8044-81ba6c774fdd.zarr.bfoptions

As omero-server...

rm /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/971f2809-c748-4259-8044-81ba6c774fdd.zarr

$ ln -s /idr0091/zarr/20151204_switch6h_pos0_GL01.ome.zarr /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/971f2809-c748-4259-8044-81ba6c774fdd.zarr

Symlink looks good:

$ ls -alh /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/
total 8.0K
drwxr-sr-x.  2 omero-server omero-server  126 Jan  4 22:32 .
drwxrwsr-x. 22 omero-server omero-server 4.0K Oct 11 09:49 ..
lrwxrwxrwx.  1 omero-server omero-server   50 Jan  4 22:32 971f2809-c748-4259-8044-81ba6c774fdd.zarr -> /idr0091/zarr/20151204_switch6h_pos0_GL01.ome.zarr
-rw-r--r--.  1 omero-server omero-server   25 Nov  1 15:40 971f2809-c748-4259-8044-81ba6c774fdd.zarr.bfoptions

Fixed!

$ python check_pixels.py --max-planes=sizeC Image:10648046
Start: 2024-01-04 22:37:35.901700
Checking Image:10648046
max_planes: sizeC
max_images: 0
0/1 Check Image:10648046 20151204_switch6h_pos0_GL01.pattern
End: 2024-01-04 22:38:05.999497
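
The manual fix above is two steps: remove the old symlink, then create a new one with the same name pointing at the re-exported zarr. A sketch wrapping those steps with two safety checks follows; retarget_symlink is a hypothetical helper, not part of idr-utils.

```shell
# Point an existing symlink at a new target. Refuses to act if the first
# argument is not a symlink or if the new target does not exist, so a
# mistyped path cannot delete real data or leave a dangling link.
retarget_symlink() {
  link=$1
  target=$2
  if [ ! -L "$link" ]; then
    echo "refusing: $link is not a symlink" >&2
    return 1
  fi
  if [ ! -e "$target" ]; then
    echo "refusing: target $target does not exist" >&2
    return 1
  fi
  rm -- "$link"
  ln -s -- "$target" "$link"
}
```

As in the manual steps, this would need to run as the user that owns the ManagedRepository directory (omero-server in this thread).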

@will-moore
Member Author

We can actually use IDR/idr-utils#54 script to do this, if we provide mapping.csv

Test with a single Image on idr-testing...
ID: 10648047, Name 20151204_switch6h_pos0_GL02.pattern
mapping.csv (existing symlink -> new target)

f12bdada-57eb-4fab-90ef-9655e4106497.zarr,20151204_switch6h_pos0_GL02.ome.zarr

As omero-server...

$ echo f12bdada-57eb-4fab-90ef-9655e4106497.zarr,20151204_switch6h_pos0_GL02.ome.zarr > idr0091_symlinks.csv

login as public user, then..

$ python /uod/idr/metadata/idr-utils/scripts/managed_repo_symlinks.py Image:10648047 /idr0091/zarr/ --repo /data/OMERO/ManagedRepository --fileset-mappings idr0091_symlinks.csv --report

fileset_dirs {'f12bdada-57eb-4fab-90ef-9655e4106497.zarr': '20151204_switch6h_pos0_GL02.ome.zarr'}

Fileset: 6314412 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-46-33.031_mkngff/
Render Image 10648047
fs_contents ['f12bdada-57eb-4fab-90ef-9655e4106497.zarr', 'f12bdada-57eb-4fab-90ef-9655e4106497.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-46-33.031_mkngff/f12bdada-57eb-4fab-90ef-9655e4106497.zarr to /idr0091/zarr/20151204_switch6h_pos0_GL02.ome.zarr
Symlink target not found: /idr0091/zarr/f12bdada-57eb-4fab-90ef-9655e4106497.zarr.bfoptions

Success!

$ python scripts/check_pixels.py Image:10648047 --max-planes=sizeC
Start: 2024-01-05 10:02:34.840789
Checking Image:10648047
max_planes: sizeC
max_images: 0
0/1 Check Image:10648047 20151204_switch6h_pos0_GL02.pattern
End: 2024-01-05 10:02:51.334694

@will-moore
Member Author

will-moore commented Jan 5, 2024

On idr-testing, make idr0091_temp.csv, which is idr0091.csv modified to remove the idr0091/ and S-BIAD852/ prefixes on each row:

20161212_Pos0_GL14.ome.zarr,0008e8fc-721f-4465-8ff2-bebcce8bca8a,4053448
20161212_Pos1_GL19.ome.zarr,0044dd95-07e1-4937-938b-dde53ebbb719,4053473
20161007_Pos0_GL01.ome.zarr,00602c54-e3bd-406c-83fd-a802b58182b0,4053371
...

From that, we can make symlinks mapping file as above:

for r in $(cat idr0091_temp.csv); do
  name=$(echo $r | cut -d',' -f1)
  uuid=$(echo $r | cut -d',' -f2)
  echo "$uuid.zarr,$name" >> idr0091_symlinks.csv
done
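
A sketch of the same mapping step written with read instead of cut is shown here. It writes to standard output, so the caller can redirect with > and a re-run replaces the mapping file instead of appending duplicate rows; make_symlink_mapping is a hypothetical helper name.

```shell
# Turn "name,uuid,fileset_id" rows into "uuid.zarr,name" rows, which is
# the existing-symlink -> new-target layout described above.
make_symlink_mapping() {
  while IFS=, read -r name uuid fsid || [ -n "$name" ]; do
    [ -n "$uuid" ] || continue          # skip blank or short rows
    printf '%s.zarr,%s\n' "$uuid" "$name"
  done < "$1"
}
```

Usage would be make_symlink_mapping idr0091_temp.csv > idr0091_symlinks.csv.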

Now we run managed_repo_symlinks for each Image...

for r in $(cat idr0091_imageids.csv); do
  python /uod/idr/metadata/idr-utils/scripts/managed_repo_symlinks.py Image:$r /idr0091/zarr/ --repo /data/OMERO/ManagedRepository --fileset-mappings idr0091_symlinks.csv --report
done
...
Fileset: 6314331 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-21/2020-10/02/17-11-44.019_mkngff/
Render Image 10648074
fs_contents ['b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr', 'b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-21/2020-10/02/17-11-44.019_mkngff/b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr to /idr0091/zarr/20151204_switch6h_pos5_GL12.ome.zarr
Symlink target not found: /idr0091/zarr/b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr.bfoptions
...

EDIT... took about 15 mins to do 342 images...

...
Fileset: 6314271 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2020-10/03/18-48-59.765_mkngff/
Render Image 10648771
fs_contents ['882f80fa-f40f-455b-b923-09dce086675b.zarr', '882f80fa-f40f-455b-b923-09dce086675b.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2020-10/03/18-48-59.765_mkngff/882f80fa-f40f-455b-b923-09dce086675b.zarr to /idr0091/zarr/20161207_Pos1_GL26.ome.zarr
Symlink target not found: /idr0091/zarr/882f80fa-f40f-455b-b923-09dce086675b.zarr.bfoptions

@will-moore
Member Author

will-moore commented Jan 5, 2024

python /uod/idr/metadata/idr-utils/scripts/check_pixels.py Project:1351 --max-planes=sizeC > /tmp/check_pixels_20240105_idr0091.log

All good 👍

(base) [wmoore@test120-omeroreadwrite ~]$ grep pattern /tmp/check_pixels_20240105_idr0091.log | wc
    342    1368   20719
(base) [wmoore@test120-omeroreadwrite ~]$ grep Error /tmp/check_pixels_20240105_idr0091.log | wc
      0       0       0
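
Note that grep Error is case-sensitive, so it would not match lines such as the ERROR:omero.gateway message or the struct.error line shown in the earlier failure. Whether such lines reach the log file at all depends on how check_pixels.py writes its output, which is not shown here. A sketch of a case-insensitive count, with count_error_lines as a hypothetical helper:

```shell
# Count log lines containing "error" in any letter case.
# grep -c prints the count but exits non-zero when it is 0, hence "|| true".
count_error_lines() {
  grep -ci 'error' "$1" || true
}
```

Calling count_error_lines /tmp/check_pixels_20240105_idr0091.log and getting 0 would be a slightly stronger confirmation than the case-sensitive grep.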

@will-moore
Member Author

On idr-ftp, the zips created on 18th Dec (above) have been uploaded (not sure of exact date), following deletion of the old idr0091 folder on 16th Jan:
from history...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091/idr0091 [email protected]:5f/136e8d-e...

@will-moore
Member Author

will-moore commented Feb 20, 2024

Images updated on https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html

New idr0090.csv file at IDR/mkngff_upgrade_scripts@0522d43 and IDR/mkngff_upgrade_scripts@c92c217 based on csv provided by Kola.

Running mkngff on idr-next (since this has the NGFF filesets that we wish to replace), using --fs_suffix=None so we don't add an extra _mkngff to Fileset paths.

(venv3) [wmoore@prod120-omeroreadwrite ~]$ git clone https://github.com/IDR/mkngff_upgrade_scripts.git
(venv3) [wmoore@prod120-omeroreadwrite ~]$ cd mkngff_upgrade_scripts/ngff_filesets/


for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  omero mkngff sql $fsid --fs_suffix=None --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr" "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done
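
To make the field handling in the loop above easier to follow, here is a rough Python equivalent of the three `cut` calls. `parse_row` is a hypothetical helper written for illustration, not part of mkngff. The sample row is the last row of idr0091.csv quoted later in this thread:

```python
def parse_row(row):
    """Split one 'zarr_name,bia_path,fileset_id' CSV row the way the shell loop does."""
    fields = row.strip().split(",")
    biapath = fields[1]                 # cut -d',' -f2
    uuid = biapath.split("/")[1]        # cut -d'/' -f2
    fsid = "".join(fields[2].split())   # cut -d',' -f3 | tr -d '[:space:]'
    return biapath, uuid, fsid


row = "idr0091/20161212_Pos1_GL04.ome.zarr,S-BIAD852/fe795db1-82c3-42b0-bbf8-5c4230bebdc9,6314392\n"
biapath, uuid, fsid = parse_row(row)
symlink_target = f"/bia-integrator-data/{biapath}/{uuid}.zarr"
print(fsid, symlink_target)
```

One difference from the shell version: `cut -d'/' -f2` prints the whole line when it contains no `/`, whereas this sketch raises an IndexError, so a row without the `S-BIAD852/` prefix would fail loudly instead of producing a wrong path.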

EDIT: something went wrong as all the .sql files are empty!

Fixed the idr0091.csv (missing S-BIAD852/ from each row). Running again...
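
A quick way to catch this kind of failure before pushing is to look for zero-byte .sql files. The following is a minimal sketch, assuming the generated files sit in a single directory. `empty_files` is a hypothetical helper, and the demo runs on a temporary directory rather than the real output folder:

```python
import os
import tempfile


def empty_files(directory, suffix=".sql"):
    """Return sorted names of zero-byte files in `directory` ending with `suffix`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(suffix)
        and os.path.getsize(os.path.join(directory, name)) == 0
    )


# Demo on a throwaway directory with one empty and one non-empty file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "1.sql"), "w").close()
    with open(os.path.join(tmp, "2.sql"), "w") as f:
        f.write("BEGIN;\n")
    found = empty_files(tmp)
    print(found)
```

An empty result list would mean every .sql file has some content. It does not check that the content is valid SQL.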

Pushed at IDR/mkngff_upgrade_scripts@03b02e7

Won't test these yet as idr-testing is being used for microservices testing.

@will-moore
Member Author

On new pilot #675 (comment)

Ran all the mkngff SQL scripts... ending for idr0091 with...

...
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr -> /bia-integrator-data/S-BIAD852/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr.bfoptions
UPDATE 1
BEGIN
 mkngff_fileset
----------------
        6319888
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff/fe65c558-7099-48c4-8222-a5dc54da884a.zarr -> /bia-integrator-data/S-BIAD852/fe65c558-7099-48c4-8222-a5dc54da884a/fe65c558-7099-48c4-8222-a5dc54da884a.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff/fe65c558-7099-48c4-8222-a5dc54da884a.zarr.bfoptions
UPDATE 1
BEGIN
 mkngff_fileset
----------------
        6319889
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr -> /bia-integrator-data/S-BIAD852/fe795db1-82c3-42b0-bbf8-5c4230bebdc9/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr.bfoptions

Last row in idr0091.csv at https://github.com/IDR/mkngff_upgrade_scripts/blob/1b64ab85fab537faafd62d6e19c01cf5ab32d11f/ngff_filesets/idr0091.csv
is
idr0091/20161212_Pos1_GL04.ome.zarr,S-BIAD852/fe795db1-82c3-42b0-bbf8-5c4230bebdc9,6314392

this image is http://localhost:1080/webclient/?show=image-10648367
and the Fileset ID is 4053461.

So, the idr0091.csv above is out of date, and was missed from the update at IDR/mkngff_upgrade_scripts@03b02e7

@will-moore
Member Author

Try to clean up (delete) the 342 Filesets created above. The last one has ID 6319889.
First one ID = 6319548?

idr=> select id from Image where fileset=6319548;
 15150680
(1 row)

http://localhost:1080/webclient/?show=image-15150680 in webclient on pilot-idrngff is a TIFF image but has the wrong Fileset, with 44e015db3952.zarr, which corresponds to the first row of idr0090.csv.

For all Filesets 6319548 -> 6319889 we want to:

  • Find the original Fileset that it replaced.
  • Switch the Images back to the Original Fileset
  • Delete the new Fileset!

For Last Image/Fileset...

idr=> select child from FilesetAnnotationLink where parent=6319889;
  child   
----------
 38302449
idr=> select longvalue from Annotation where id=38302449;
 longvalue 
-----------
   6314392

This corresponds to the Fileset IDs updated in IDR/mkngff_upgrade_scripts@25c5372

So, NEW Fileset IDs are 6319548 -> 6319889
OLD Fileset IDs are in idr0091.csv before that commit.
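
As a sanity check on that range, the number of IDs from 6319548 to 6319889 inclusive should equal the 342 Filesets created. A minimal sketch, assuming the new IDs are contiguous (which the thread suggests but does not prove):

```python
first_new, last_new = 6319548, 6319889
new_ids = range(first_new, last_new + 1)  # inclusive of both ends
print(len(new_ids))  # 342
```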

First row...

  • Old Fileset ID 6314330 (from old idr0091.csv), New Fileset ID: 6319548 (to be deleted), Image: 15150680
update image set fileset = 6314330 where fileset = 6319548;
for i in {6319548..6319889}; do echo $i >> idr0091_ids.csv; done

idr0091_ids.csv (removed first line 6319548,6314330, since that update was already done above).

NEW Fileset ID, OLD Fileset ID

6319549,6314371
6319550,6314286
6319551,6314139
...
6319887,6314352
6319888,6314232
6319889,6314392

Then

for r in $(cat idr0091_ids.csv); do
  newid=$(echo $r | cut -d',' -f1)
  oldid=$(echo $r | cut -d',' -f2)
  psql -U omero -d idr -h $DBHOST -c "update image set fileset = $oldid where fileset = $newid"
done


for r in $(cat idr0091_ids.csv); do
  newid=$(echo $r | cut -d',' -f1)
  echo $newid && omero delete Fileset:$newid
done
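
Before running the two loops above against the database, it can help to print the statements they would issue. This is a minimal dry-run sketch in Python. `rollback_commands` is a hypothetical helper that only builds strings and does not call psql or omero. The sample pairs are taken from the list above:

```python
def rollback_commands(pairs_text):
    """Turn 'NEW,OLD' lines into (sql_update, delete_command) tuples."""
    commands = []
    for line in pairs_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # int() rejects anything that is not a plain numeric ID.
        new_id, old_id = (int(x) for x in line.split(","))
        sql = f"update image set fileset = {old_id} where fileset = {new_id};"
        delete = f"omero delete Fileset:{new_id}"
        commands.append((sql, delete))
    return commands


pairs = """6319549,6314371
6319550,6314286
6319889,6314392
"""
for sql, delete in rollback_commands(pairs):
    print(sql)
    print(delete)
```

Converting each ID with `int()` means a malformed row raises a ValueError instead of ending up in an SQL statement.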

@will-moore will-moore moved this from check_pixels to NGFF studies in NGFF conversion May 21, 2024