[Bug]: Concatenation warning that stops all_statistics.csv from being produced #969

Closed
7 tasks done
llwiggins opened this issue Oct 16, 2024 · 6 comments · Fixed by #972
Labels
bug Something isn't working

Comments

@llwiggins
Collaborator

llwiggins commented Oct 16, 2024

Checklist

  • Re-run analysis with topostats process --core 1.
  • Describe the bug.
  • Include the configuration file.
  • Copy of the output.
  • The exact command that failed. This is what you typed at the command line, including any options.
  • TopoStats version, this is reported by topostats --version
  • Operating System and Python Version

Describe the bug

Both @MaxGamill-Sheffield and I keep running into the same concatenation warning when running topostats process. The warning occurs right at the end of processing and results in no all_statistics.csv being output. It looks as though the issue arises from pandas deprecating its current behaviour when concatenating data frames that contain empty or all-NA entries; the suggested resolution is to exclude these entries prior to concatenation.

Copy of the output

Traceback (most recent call last):
  File "/Users/laura/miniconda3/envs/topoly/bin/topostats", line 8, in <module>
    sys.exit(entry_point())
             ^^^^^^^^^^^^^
  File "/Users/laura/TopoStats/topostats/entry_point.py", line 386, in entry_point
    args.func(args)
  File "/Users/laura/TopoStats/topostats/run_topostats.py", line 171, in run_topostats
    results = pd.concat(results.values())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 395, in concat
    return op.get_result()
           ^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 684, in get_result
    new_data = concatenate_managers(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 189, in concatenate_managers
    values = _concatenate_join_units(join_units, copy=copy)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/laura/miniconda3/envs/topoly/lib/python3.11/site-packages/pandas/core/internals/concat.py", line 491, in _concatenate_join_units
    warnings.warn(
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
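
The warning text suggests excluding empty or all-NA entries before calling pd.concat(). A minimal sketch of that idea follows. The function name concat_excluding_empty and the example frames are hypothetical, and this is not necessarily how TopoStats resolved the issue. Note that dropping all-NA columns from each frame means a column that is all-NA in every frame disappears from the result.

```python
import pandas as pd


def concat_excluding_empty(frames):
    """Concatenate DataFrames after dropping empty frames and all-NA columns.

    Hypothetical helper: an illustration of the workaround the warning
    describes, not code taken from TopoStats.
    """
    kept = []
    for df in frames:
        # Drop columns in which every value is missing
        cleaned = df.dropna(axis="columns", how="all")
        # Skip frames that have no rows or no remaining columns
        if not cleaned.empty:
            kept.append(cleaned)
    if not kept:
        return pd.DataFrame()
    # Columns missing from some frames are filled with NaN by pd.concat
    return pd.concat(kept)


# Made-up example frames standing in for per-image results
results = {
    "img1": pd.DataFrame({"image": ["img1"], "length": [1.5]}),
    "img2": pd.DataFrame({"image": ["img2"], "length": [None]}),
    "img3": pd.DataFrame(columns=["image", "length"]),
}
combined = concat_excluding_empty(results.values())
print(combined)
```

Filtering per column as well as per frame is a deliberate choice here, because the warning mentions all-NA entries and not only empty frames.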

Include the configuration file

base_dir: /Volumes/shared/pyne_group/Shared/AFM_Data/Metallodrugs/data # Directory in which to search for data files
output_dir: /Volumes/shared/pyne_group/Shared/AFM_Data/Metallodrugs/output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
cores: 1 # Number of CPU cores to utilise for processing multiple files simultaneously.
file_ext: .spm # File extension of the data files.
loading:
  channel: Height # Channel to pull data from in the data files.
filter:
  run: true # Options : true, false
  row_alignment_quantile: 0.5 # lower values may improve flattening of larger features
  threshold_method: std_dev # Options : otsu, std_dev, absolute
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: 10.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  threshold_absolute:
    below: -1.0 # Threshold for data below the image background
    above: 1.0 # Threshold for data above the image background
  gaussian_size: 1.0121397464510862 # Gaussian blur intensity in px
  gaussian_mode: nearest
  # Scar removal parameters. Be careful with editing these as making the algorithm too sensitive may
  # result in ruining legitimate data.
  remove_scars:
    run: true
    removal_iterations: 2 # Number of times to run scar removal.
    threshold_low: 0.250 # lower values make scar removal more sensitive
    threshold_high: 0.666 # lower values make scar removal more sensitive
    max_scar_width: 4 # Maximum thickness of scars in pixels.
    min_scar_length: 16 # Minimum length of scars in pixels.
grains:
  run: true # Options : true, false
  # Thresholding by height
  threshold_method: std_dev # Options : std_dev, otsu, absolute, unet
  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: 10.0 # Threshold for grains below the image background
    above: 1.0 # Threshold for grains above the image background
  threshold_absolute:
    below: -1.0 # Threshold for grains below the image background
    above: 1.0 # Threshold for grains above the image background
  direction: above # Options: above, below, both (defines whether to look for grains above or below thresholds or both)
  # Thresholding by area
  smallest_grain_size_nm2: 50 # Size in nm^2 of tiny grains/blobs (noise) to remove, must be > 0.0
  absolute_area_threshold:
    above: [300, 30000] # above surface [Low, High] in nm^2 (also takes null)
    below: [null, null] # below surface [Low, High] in nm^2 (also takes null)
  remove_edge_intersecting_grains: true # Whether or not to remove grains that touch the image border
  unet_config:
    model_path: null # Path to a trained U-Net model
    grain_crop_padding: 2 # Padding to apply to the grain crop bounding box
    upper_norm_bound: 5.0 # Upper bound for normalisation of input data. This should be slightly higher than the maximum desired / expected height of grains.
    lower_norm_bound: -1.0 # Lower bound for normalisation of input data. This should be slightly lower than the minimum desired / expected height of the background.
grainstats:
  run: true # Options : true, false
  edge_detection_method: binary_erosion # Options: canny, binary erosion. Do not change this unless you are sure of what this will do.
  cropped_size: -1 # Length (in nm) of square cropped images (can take -1 for grain-sized box)
  extract_height_profile: true # Extract height profiles along maximum feret of molecules
disordered_tracing:
  run: true # Options : true, false
  min_skeleton_size: 10 # Minimum number of pixels in a skeleton for it to be retained.
  pad_width: 1 # Pixels to pad grains by when tracing
  mask_smoothing_params:
    gaussian_sigma: 2 # Gaussian smoothing parameter 'sigma' in pixels.
    dilation_iterations: 2 # Number of dilation iterations to use for grain smoothing.
    holearea_min_max: [0, null] # Range (min, max) of a hole area in nm to refill in the smoothed masks.
  skeletonisation_params:
    method: topostats # Options : zhang | lee | thin | topostats
    height_bias: 0.6 # Percentage of lowest pixels to remove each skeletonisation iteration. 1 equates to zhang.
  pruning_params:
    method: topostats # Method to clean branches of the skeleton. Options : topostats
    max_length: 10.0 # Maximum length in nm to remove a branch containing an endpoint.
    height_threshold: # The height to remove branches below.
    method_values: mid # The method to obtain a branch's height for pruning. Options : min | median | mid.
    method_outlier: mean_abs # The method to prune branches based on height. Options : abs | mean_abs | iqr.
nodestats:
  run: true # Options : true, false
  node_joining_length: 7.0 # The distance over which to join nearby crossing points.
  node_extend_dist: 14.0 # The distance over which to join nearby odd-branched nodes.
  branch_pairing_length: 20.0 # The length from the crossing point to pair and trace, obtaining FWHM's.
  pair_odd_branches: false # Whether to try and pair odd-branched nodes. Options: true and false.
  pad_width: 1 # Pixels to pad grains by when tracing (should be the same as disordered_tracing).
ordered_tracing:
  run: true
  ordering_method: nodestats # The method of ordering the disordered traces.
  pad_width: 1 # Pixels to pad grains by when tracing (should be the same as disordered_tracing).
splining:
  run: true # Options : true, false
  method: "rolling_window" # Options : "spline", "rolling_window"
  rolling_window_size: 20.0e-9 # size in nm of the rolling window.
  spline_step_size: 7.0e-9 # The sampling rate of the spline in metres.
  spline_linear_smoothing: 5.0 # The amount of smoothing to apply to linear splines.
  spline_circular_smoothing: 5.0 # The amount of smoothing to apply to circular splines.
  spline_degree: 3 # The polynomial degree of the spline.
#  cores: 1 # Number of cores to use for parallel processing
plotting:
  run: true # Options : true, false
  style: topostats.mplstyle # Options : topostats.mplstyle or path to a matplotlibrc params file
  savefig_format: null # Options : null, png, svg or pdf. tif is also available although no metadata will be saved. (defaults to png) See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  savefig_dpi: 600 # Options : null (defaults to the value in topostats/plotting_dictionary.yaml), see https://afm-spm.github.io/TopoStats/main/configuration.html#further-customisation and https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html
  pixel_interpolation: null # Options : https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html
  image_set: core # Options : all, core
  zrange: [-2, 5] # low and high height range for core images (can take [null, null]). low <= high
  colorbar: true # Options : true, false
  axes: true # Options : true, false (due to off being a bool when parsed)
  num_ticks: [null, null] # Number of ticks to have along the x and y axes. Options : null (auto) or integer > 1
  cmap: null # Colormap/colourmap to use (default is 'nanoscope' which is used if null, other options are 'afmhot', 'viridis' etc.)
  mask_cmap: blue_purple_green # Options : blu, jet_r and any in matplotlib
  histogram_log_axis: false # Options : true, false
summary_stats:
  run: true # Whether to make summary plots for output data
  config: null

To Reproduce

No response

TopoStats Version

Git main branch

Python Version

3.11

Operating System

MacOS M1/M2 (post-2021)

Python Packages

No response

@llwiggins llwiggins added the bug Something isn't working label Oct 16, 2024
@MaxGamill-Sheffield
Collaborator

Just adding that when I found this issue, I had all dnatracing, plotting and summary_stats turned off.

@ns-rse
Collaborator

ns-rse commented Oct 16, 2024

The pandas version, at the very least, would be useful to know, as it's pd.concat() that does the concatenation.

pip show pandas
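
If several environments are installed, it can also help to check the version from inside the same Python interpreter that runs topostats. A small sketch, assuming pandas is importable in that interpreter; the function name environment_versions is hypothetical:

```python
import sys
from importlib.metadata import version

import pandas as pd


def environment_versions():
    """Return the Python and pandas versions seen by this interpreter."""
    return {
        "python": sys.version.split()[0],
        # Version of the pandas module that was actually imported
        "pandas_imported": pd.__version__,
        # Version recorded in the installed package metadata
        "pandas_metadata": version("pandas"),
    }


for name, value in environment_versions().items():
    print(f"{name}: {value}")
```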

@llwiggins
Collaborator Author

pandas v2.2.3

@MaxGamill-Sheffield
Collaborator

A small update on where I got to with this earlier, using Laura's smaller test set.

  • Managed to replicate on two images (there must be at least two for the dataframe concatenation mismatch to occur). These are attached in minimal_topostats_example.zip.
  • It fails on the concatenation of each image's grainstats.csv / result (different to the failure of imagestats.csv I saw earlier).
  • Saving the CSVs (attached in saved_csvs.zip) and writing a script (below) to concatenate the dataframes from the values of a dictionary, in a similar way to run_topostats.py, causes no issue in the same environment 🙃.
import pandas as pd

# Directory containing the saved CSVs, and the two image names
base = "/Users/Maxgamill/Desktop/Uni/PhD/topo_test/TopoStats/concat/"
img1 = "20230526_puc19_tube1_24hr_mg.0_00003"
img2 = "20230526_puc19_tube1_24hr_mg.0_00002"

# Map image name -> dataframe, similar to the results dictionary in run_topostats.py
results = {}
for img in [img1, img2]:
    results[img] = pd.read_csv(base + img + ".csv")

# Concatenate the dictionary values, as run_topostats.py does
total_df = pd.concat(results.values())
  • I've made branch maxgamill-sheffield/969-concat-issue in which I've attempted to make a few fixes:
    • The folder_<stats>.csv was being overwritten by the dis and mol stats so that has been modified to produce all folder stats.
    • The error / failed outputs of the better tracing pipeline now add the columns that should have been added should it have succeeded.
  • Thought it might have been because of columns that are present in one but not the other due to failure but alas nope.
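
One possible reason the saved CSVs do not reproduce the problem is that pd.read_csv() re-infers column dtypes, so frames loaded from disk need not have the same dtypes as the in-memory frames that were concatenated in run_topostats.py. This is only a hypothesis. The sketch below shows one way to compare dtypes and all-NA columns across frames; the function name column_report and the example data are hypothetical.

```python
import io

import pandas as pd


def column_report(frames):
    """Tabulate dtype and all-NA status for every column of every frame.

    `frames` maps a name (e.g. an image name) to a DataFrame.
    """
    rows = []
    for name, df in frames.items():
        for col in df.columns:
            rows.append(
                {
                    "frame": name,
                    "column": col,
                    "dtype": str(df[col].dtype),
                    "all_na": bool(df[col].isna().all()),
                }
            )
    return pd.DataFrame(rows, columns=["frame", "column", "dtype", "all_na"])


# Made-up frame with an all-NA column, standing in for a failed tracing result
in_memory = pd.DataFrame({"grain": [0, 1], "contour_length": [None, None]})

# Round-trip through CSV text, as happens when saving and reloading the results
buffer = io.StringIO(in_memory.to_csv(index=False))
reloaded = pd.read_csv(buffer)

print(column_report({"in_memory": in_memory, "reloaded": reloaded}))
```

Running column_report on the actual values of the results dictionary just before the pd.concat() call would show whether any frame carries an all-NA column whose dtype differs from the same column in the other frames.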

Package list:

absl-py                   2.1.0                    pypi_0    pypi
accessible-pygments       0.0.5                    pypi_0    pypi
afmreader                 0.0.1                    pypi_0    pypi
alabaster                 0.7.16                   pypi_0    pypi
appnope                   0.1.4                    pypi_0    pypi
argparse                  1.4.0                    pypi_0    pypi
astroid                   3.1.0                    pypi_0    pypi
asttokens                 2.4.1                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
babel                     2.16.0                   pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
beautifulsoup4            4.12.3                   pypi_0    pypi
biopython                 1.84                     pypi_0    pypi
black                     24.4.2                   pypi_0    pypi
bzip2                     1.0.8                h93a5062_5    conda-forge
ca-certificates           2024.2.2             hf0a4a13_0    conda-forge
certifi                   2024.7.4                 pypi_0    pypi
cfgv                      3.4.0                    pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
cheap-repr                0.5.1                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
comm                      0.2.2                    pypi_0    pypi
contourpy                 1.2.1                    pypi_0    pypi
coverage                  7.5.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
debugpy                   1.8.1                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
distlib                   0.3.8                    pypi_0    pypi
docutils                  0.20.1                   pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
et-xmlfile                1.1.0                    pypi_0    pypi
exceptiongroup            1.2.1                    pypi_0    pypi
execnet                   2.1.1                    pypi_0    pypi
executing                 2.0.1                    pypi_0    pypi
filelock                  3.14.0                   pypi_0    pypi
filetype                  1.2.0                    pypi_0    pypi
flatbuffers               24.3.25                  pypi_0    pypi
fonttools                 4.51.0                   pypi_0    pypi
gast                      0.6.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.66.2                   pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
identify                  2.5.36                   pypi_0    pypi
idna                      3.8                      pypi_0    pypi
igor2                     0.5.6                    pypi_0    pypi
imageio                   2.34.1                   pypi_0    pypi
imagesize                 1.4.1                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
ipykernel                 6.29.4                   pypi_0    pypi
ipython                   8.24.0                   pypi_0    pypi
isort                     5.13.2                   pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
jupyter-client            7.4.9                    pypi_0    pypi
jupyter-core              5.7.2                    pypi_0    pypi
keras                     3.5.0                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.4                      pypi_0    pypi
libclang                  18.1.1                   pypi_0    pypi
libffi                    3.4.2                h3422bc3_5    conda-forge
libsqlite                 3.45.3               h091b4b1_0    conda-forge
libzlib                   1.2.13               h53f4e23_5    conda-forge
llvmlite                  0.43.0                   pypi_0    pypi
loguru                    0.7.2                    pypi_0    pypi
markdown                  3.7                      pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.8.4                    pypi_0    pypi
matplotlib-inline         0.1.7                    pypi_0    pypi
mccabe                    0.7.0                    pypi_0    pypi
mdit-py-plugins           0.4.1                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
ml-dtypes                 0.4.1                    pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
myst-parser               4.0.0                    pypi_0    pypi
namex                     0.0.8                    pypi_0    pypi
ncurses                   6.4.20240210         h078ce10_0    conda-forge
nest-asyncio              1.6.0                    pypi_0    pypi
networkx                  3.3                      pypi_0    pypi
nodeenv                   1.8.0                    pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
numpydoc                  1.8.0                    pypi_0    pypi
numpyencoder              0.3.0                    pypi_0    pypi
openpyxl                  3.1.5                    pypi_0    pypi
openssl                   3.3.0                h0d3ecfb_0    conda-forge
opt-einsum                3.4.0                    pypi_0    pypi
optree                    0.12.1                   pypi_0    pypi
packaging                 24.0                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
parso                     0.8.4                    pypi_0    pypi
pathspec                  0.12.1                   pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    10.3.0                   pypi_0    pypi
pip                       24.2                     pypi_0    pypi
platformdirs              4.2.1                    pypi_0    pypi
pluggy                    1.5.0                    pypi_0    pypi
pockets                   0.9.1                    pypi_0    pypi
pre-commit                3.7.0                    pypi_0    pypi
prompt-toolkit            3.0.43                   pypi_0    pypi
protobuf                  4.25.5                   pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
pydata-sphinx-theme       0.15.4                   pypi_0    pypi
pyfiglet                  1.0.2                    pypi_0    pypi
pygments                  2.18.0                   pypi_0    pypi
pylint                    3.1.0                    pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
pyspm                     0.6.1                    pypi_0    pypi
pytest                    7.4.4                    pypi_0    pypi
pytest-cov                5.0.0                    pypi_0    pypi
pytest-durations          1.2.0                    pypi_0    pypi
pytest-github-actions-annotate-failures 0.2.0                    pypi_0    pypi
pytest-lazy-fixture       0.6.3                    pypi_0    pypi
pytest-mpl                0.17.0                   pypi_0    pypi
pytest-regtest            2.1.1                    pypi_0    pypi
pytest-testmon            2.1.1                    pypi_0    pypi
pytest-xdist              3.6.1                    pypi_0    pypi
python                    3.10.14         h2469fbe_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyupgrade                 3.15.2                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     26.0.3                   pypi_0    pypi
readline                  8.2                  h92ec313_1    conda-forge
requests                  2.32.3                   pypi_0    pypi
rich                      13.9.1                   pypi_0    pypi
ruamel-yaml               0.18.6                   pypi_0    pypi
ruamel-yaml-clib          0.2.8                    pypi_0    pypi
schema                    0.7.7                    pypi_0    pypi
scikit-image              0.23.2                   pypi_0    pypi
scikit-learn              1.4.2                    pypi_0    pypi
scipy                     1.13.0                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
setuptools                69.5.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
skan                      0.11.1                   pypi_0    pypi
snakeviz                  2.2.0                    pypi_0    pypi
snoop                     0.4.3                    pypi_0    pypi
snowballstemmer           2.2.0                    pypi_0    pypi
soupsieve                 2.6                      pypi_0    pypi
sphinx                    7.4.7                    pypi_0    pypi
sphinx-autoapi            3.2.1                    pypi_0    pypi
sphinx-autodoc-typehints  2.2.3                    pypi_0    pypi
sphinx-markdown-tables    0.0.17                   pypi_0    pypi
sphinx-multiversion       0.2.4                    pypi_0    pypi
sphinx-rtd-theme          2.0.0                    pypi_0    pypi
sphinxcontrib-applehelp   2.0.0                    pypi_0    pypi
sphinxcontrib-devhelp     2.0.0                    pypi_0    pypi
sphinxcontrib-htmlhelp    2.1.0                    pypi_0    pypi
sphinxcontrib-jquery      4.1                      pypi_0    pypi
sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
sphinxcontrib-mermaid     0.9.2                    pypi_0    pypi
sphinxcontrib-napoleon    0.7                      pypi_0    pypi
sphinxcontrib-qthelp      2.0.0                    pypi_0    pypi
sphinxcontrib-serializinghtml 2.0.0                    pypi_0    pypi
spyder-kernels            2.3.3                    pypi_0    pypi
stack-data                0.6.3                    pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.17.1                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorflow                2.17.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.37.1                   pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tifffile                  2024.5.3                 pypi_0    pypi
tk                        8.6.13               h5083fa2_1    conda-forge
tokenize-rt               5.2.0                    pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
tomlkit                   0.12.5                   pypi_0    pypi
toolz                     0.12.1                   pypi_0    pypi
topofileformats           0.1.0                    pypi_0    pypi
topoly                    1.0.2                    pypi_0    pypi
topostats                 2.2.2.dev896+gcc66a1fa9          pypi_0    pypi
tornado                   6.4                      pypi_0    pypi
tqdm                      4.66.4                   pypi_0    pypi
traitlets                 5.14.3                   pypi_0    pypi
typing-extensions         4.11.0                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
urllib3                   2.2.2                    pypi_0    pypi
virtualenv                20.26.1                  pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
werkzeug                  3.0.4                    pypi_0    pypi
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
wrapt                     1.16.0                   pypi_0    pypi
wurlitzer                 3.1.0                    pypi_0    pypi
xz                        5.2.6                h57fd34a_0    conda-forge

@ns-rse
Collaborator

ns-rse commented Oct 21, 2024

Re-opening as #973 is still open.

@ns-rse ns-rse reopened this Oct 21, 2024
@ns-rse
Collaborator

ns-rse commented Oct 24, 2024

Closed by #973

@ns-rse ns-rse closed this as completed Oct 24, 2024