XClone BAF xclone.pp.xclonedata > cell_anno.index.name = None #24

stublakemore · 2024-12-04T08:48:52Z

Dear Xianjie,

After spending a lot of time trying to resolve this issue myself, I wonder whether you can help me (again)!

I am using Python 3.9 and XClone v0.3.8, as in my other, now-resolved issues. When I try to create the BAF_adata object with the xclone.pp.xclonedata function, following the standard API documentation (https://xclone-cnv.readthedocs.io/en/latest/API.html#baf-module), I get the following error when running exactly this code:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment

Maybe there's something amiss with my .mtx files, because I have three rather than two (cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx, cellSNP.tag.OTH.mtx), but running an adapted call to xclone.pp.xclonedata leads to the same error:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file, OTH_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment
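
For readers hitting the same traceback: in Python, an UnboundLocalError of this kind means the function reached a line that uses a local variable which was only assigned on a code path that did not run. The sketch below reproduces the pattern in isolation. It is not XClone's code; `load_cell_anno` and its argument are hypothetical names chosen for illustration, and which condition is skipped inside `xclonedata` is not established here.

```python
def load_cell_anno(barcodes_file=None):
    # Hypothetical loader, NOT XClone's implementation: cell_anno is assigned
    # only inside the branch, so it stays unbound when the branch is skipped.
    if barcodes_file is not None:
        cell_anno = [barcodes_file]
    return cell_anno


try:
    load_cell_anno()  # no barcodes file given, so the branch never runs
except UnboundLocalError as err:
    message = str(err)
    print("UnboundLocalError:", message)
```

The later call in this thread, which passes `genome_mode` as a keyword argument, gets past line 251 and fails at line 262 instead, so checking how the positional arguments map onto the function's parameters may be a reasonable first step.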

I wondered whether I needed to initially load the features.tsv from the RDR analysis first as cell_anno, but that also kicks up the same error.

Following your advice to look at the log files, I inspected the log but could not find anything suggesting that my resolved xcltk issue isn't actually resolved. Please find the log file attached; you will surely spot something I can't!

Many thanks,

Stuart
pileup.log

stublakemore · 2024-12-11T11:39:17Z

Dear Xianjie,

I would really appreciate it if you could give me your insight on this! I don't have any further developments to report regarding either a solution or the cause of the issue.

Cheers,

Stuart

hxj5 · 2024-12-13T03:26:36Z

Do you have any suggestions @Rongtingting?

Rongtingting · 2024-12-22T07:07:57Z

Hi Stuart,
@stublakemore
Could you try the steps at https://xclone-cnv.readthedocs.io/en/latest/preprocessing.html#baf-load, or attach the code you used?
I suspect the paths to the data files may not be specified correctly.
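
One quick way to rule out a path problem before calling `xclone.pp.xclonedata` is to confirm that every input file exists. A minimal sketch using only the standard library; `missing_inputs` is a hypothetical helper and the paths are placeholders to replace with your own.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths (as strings) that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Placeholder locations: substitute the real data directory and file names.
data_dir = Path("/path/to/data_dir")
inputs = [data_dir / name for name in ("AD.mtx", "DP.mtx", "barcodes.tsv")]

for path in missing_inputs(inputs):
    print("missing input file:", path)
```

If anything is printed, the corresponding path needs fixing before the loader is called.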

Bests,
Rongting

stublakemore · 2024-12-23T08:46:50Z

Dear Rongting,

Thanks for getting in touch. The exact code I used is in my initial post at the top of this thread. Specifically, I wonder whether it's a problem with the xcltk baf preprocessing: my files are named cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx and cellSNP.tag.OTH.mtx rather than "AD.mtx" and "DP.mtx" as in the readthedocs example, and I have cellSNP.samples.tsv rather than "barcodes.tsv". I've also tried taking the barcodes.tsv file from the RDR pre-processing object, without success. My code is below:

Attempt 1:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "cellSNP.samples.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.

Attempt 2:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "barcodes.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.

Here is also the head of the AD matrix, which shows 4 columns but only contains 3 columns' worth of data:
head cellSNP.tag.AD.mtx
%%MatrixMarket matrix coordinate integer general
%
1680 757 74066
1 6 1
1 42 1
1 43 1
1 51 1
1 90 1
1 118 1
1 120 1

The DP matrix looks the same:
%%MatrixMarket matrix coordinate integer general
%
1680 757 151948
1 6 1
1 12 2
1 25 1
1 42 1
1 43 1
1 49 1
1 51 1
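
A note on reading these files: in the Matrix Market coordinate format, the first non-comment line is a size line giving the number of rows, columns, and stored entries, and every following line is a (row, column, value) triplet. Three fields per line is therefore expected, and the 1680 in the size line matches the 1680 columns reported in the ValueError above. A small sketch that extracts the size line with plain Python; `mtx_shape` is a hypothetical helper, not part of XClone or xcltk.

```python
def mtx_shape(lines):
    """Return (n_rows, n_cols, n_entries) from Matrix Market coordinate text."""
    for line in lines:
        if line.startswith("%") or not line.strip():
            continue  # skip the banner, comment lines, and blank lines
        n_rows, n_cols, n_entries = (int(field) for field in line.split())
        return n_rows, n_cols, n_entries
    raise ValueError("no size line found")


# Header copied from the AD matrix shown in this post.
head = """%%MatrixMarket matrix coordinate integer general
%
1680 757 74066
1 6 1
""".splitlines()

n_rows, n_cols, n_entries = mtx_shape(head)
print(n_rows, n_cols, n_entries)
```

Comparing these numbers with the line count of the barcodes file and with the number of rows in the feature annotation shows which dimension disagrees.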

If you need anything else from me to be able to resolve the issue, let me know and I'll look to provide further information!

Cheers,

Stuart

hxj5 · 2024-12-25T10:11:01Z

Hi Stuart, thanks for the detailed information. It seems you were using the matrices in pileup folder (from xcltk baf) as inputs to XClone. Actually, the inputs to XClone should be the matrices in baf_fc folder, if you were running xcltk v0.3.0 or above.
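
Before loading, it can help to confirm that the xcltk output directory actually contains the `baf_fc` subfolder. A minimal sketch with the standard library; `subfolders` and `has_baf_fc` are hypothetical helper names, the directory path is a placeholder, and the file names inside `baf_fc` are deliberately not assumed.

```python
from pathlib import Path


def subfolders(out_dir):
    """Return the sorted names of the immediate subfolders of out_dir."""
    return sorted(p.name for p in Path(out_dir).iterdir() if p.is_dir())


def has_baf_fc(out_dir):
    """Return True if out_dir contains a baf_fc subfolder."""
    return (Path(out_dir) / "baf_fc").is_dir()


out_dir = Path("/path/to/xcltk_baf_output")  # placeholder: your xcltk baf output folder
if out_dir.is_dir():
    print("subfolders:", subfolders(out_dir))
    print("baf_fc present:", has_baf_fc(out_dir))
else:
    print("output folder not found:", out_dir)
```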

stublakemore · 2024-12-29T06:55:06Z

Hi Xianjie,

Thanks for the feedback! I'm running xcltk v0.3.1. Yes, I was using the matrices in the pileup folder from xcltk baf, since the output of my xcltk baf run (folder name Sample_6_baf) only has the following folder structure:

phasing pileup Sample_6.genotype.vcf.gz scripts

Could it be that the previous xcltk issue, which I thought I had resolved, still isn't fixed (see hxj5/xcltk#10)? I can't imagine it's as simple as rerunning xcltk baf and naming the output folder Sample_6_baf_fc...

I'm wondering whether this is more likely to be a hg19 to hg38 issue?

Just so you know, the basefc xcltk and XClone analysis has been successful, so it's only the baf side of the analysis that I'm struggling with.

Thanks for the continued support!

Stuart

stublakemore · 2025-01-09T05:31:44Z

Hi Xianjie,

I hope you had a good Christmas break. Just wondering whether you could help me continue troubleshooting this issue, especially if it has more to do with my previous xcltk issue, which I thought I had resolved?

Many thanks,

Stuart

hxj5 · 2025-01-09T06:48:26Z

Hi Stuart, I agree that you should first check the log files in the scripts/phasing/ folder and then fix the issue, as discussed in thread hxj5/xcltk#10, to get the baf_fc folder containing the sparse matrices.

stublakemore · 2025-01-09T07:56:48Z

Hi Xianjie,

It seems that the xcltk baf analysis tried to use Eagle from my laptop rather than from the server:

/Users/stuartblakemore/Documents/SJ_Blakemore/xcltk_requirements/Eagle_v2.4.1/eagle: cannot execute binary file

So I think that as long as I call Eagle from its correct location on the HPC, the issue should be resolved. I consider this issue now closed! Thank you for your input once again!
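
The message "cannot execute binary file" often indicates that the file is not in an executable format the current machine understands, for example a macOS build being run on a Linux server. One hedged way to check which build a path points to is to read its first four bytes. The sketch below uses only the standard library; `binary_format` is a hypothetical helper, and looking up an executable named `eagle` on PATH is an assumption about how the tool is installed.

```python
import shutil


def binary_format(path):
    """Guess the executable format from the file's first four bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
    if magic == b"\x7fELF":
        return "ELF (Linux)"
    # Little-endian thin Mach-O magic numbers (64-bit, then 32-bit).
    if magic in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"):
        return "Mach-O (macOS)"
    return "unknown"


# Assumption: the binary is reachable as "eagle" on PATH; otherwise pass its full path.
eagle_path = shutil.which("eagle")
if eagle_path is None:
    print("eagle not found on PATH")
else:
    print(eagle_path, "->", binary_format(eagle_path))
```

Fat (universal) macOS binaries and shell-script wrappers are reported as "unknown" by this sketch.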

Cheers,

Stuart

stublakemore closed this as completed Jan 9, 2025
