XClone BAF xclone.pp.xclonedata > cell_anno.index.name = None #24

Closed
stublakemore opened this issue Dec 4, 2024 · 9 comments

@stublakemore

Dear Xianjie,

After spending a lot of time trying to resolve this issue myself, I wonder whether you can help me (again)!

I am using Python 3.9 and XClone v0.3.8, as in my other (now resolved) issues. When I try to create the BAF_adata object with the xclone.pp.xclonedata function, following the standard API documentation (https://xclone-cnv.readthedocs.io/en/latest/API.html#baf-module), I get the following error when running exactly this code:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment

Maybe something is amiss with my .mtx files, since I have three rather than two: cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx and cellSNP.tag.OTH.mtx. However, an adapted call to xclone.pp.xclonedata that passes all three leads to the same error:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file, OTH_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment

I wondered whether I first needed to load the features.tsv from the RDR analysis as cell_anno, but that also raises the same error.

Following your suggestion to look at the log files, I inspected the log, but I couldn't find anything suggesting that my xcltk issue is, in fact, still unresolved. Please find the log file attached; you will surely spot something I can't!

Many thanks,

Stuart
pileup.log
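One plausible reading of the first traceback: an UnboundLocalError means a local variable was read on a code path where it was never assigned. The sketch below is a hypothetical illustration of that pattern, not XClone's actual code (build_cell_anno is invented for the example). It suggests that the loader skipped whichever branch would normally have built cell_anno, for example because an input did not match what it expected.

```python
def build_cell_anno(barcodes):
    """Hypothetical loader (not XClone's code).

    cell_anno is only assigned when barcodes is not None, so calling this
    with None reaches the return statement with cell_anno still unbound.
    """
    if barcodes is not None:
        cell_anno = {"barcodes": list(barcodes)}
    return cell_anno  # raises UnboundLocalError if the branch was skipped


try:
    build_cell_anno(None)
except UnboundLocalError as err:
    # The exact message wording differs between Python versions.
    print(f"{type(err).__name__}: {err}")
```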

@stublakemore
Author

Dear Xianjie,

I would really appreciate it if you could give me your insight on this! I don't have any further developments to report regarding either a solution or the cause of the issue.

Cheers,

Stuart

@hxj5
Collaborator

hxj5 commented Dec 13, 2024

Do you have any suggestions @Rongtingting?

@Rongtingting
Collaborator

Hi Stuart,
@stublakemore
Could you try the BAF loading steps at https://xclone-cnv.readthedocs.io/en/latest/preprocessing.html#baf-load, or attach the code you used?
I suspect that you may not have specified the right paths to the data files.

Bests,
Rongting
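Before calling xclone.pp.xclonedata, a quick existence check on every input path can rule out the wrong-path explanation. A minimal sketch: data_dir is a placeholder, and the file names AD.mtx, DP.mtx and barcodes.tsv are the readthedocs example names, so adjust them to whatever your xcltk output actually contains. missing_inputs is a hypothetical helper.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the input paths that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


data_dir = Path("/path/to/xcltk_output")  # placeholder: replace with your folder
inputs = [data_dir / "AD.mtx", data_dir / "DP.mtx", data_dir / "barcodes.tsv"]

missing = missing_inputs(inputs)
if missing:
    print("Missing input files:", *missing, sep="\n  ")
else:
    print("All input files found.")
```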

@stublakemore
Author

Dear Rongting,

Thanks for getting in touch. The exact code I used is in my initial post at the top of this thread. I wonder whether it's a problem with the xcltk BAF preprocessing: my files are named cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx and cellSNP.tag.OTH.mtx rather than "AD.mtx" and "DP.mtx" as in the readthedocs example, and I have cellSNP.samples.tsv rather than "barcodes.tsv". I've also tried using the barcodes.tsv file from the RDR preprocessing, without success. Here is my code:

Attempt 1:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "cellSNP.samples.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.

Attempt 2:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "barcodes.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.
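Both ValueErrors come from AnnData's shape check rather than from the file paths: AnnData lays out X as cells × features, so var needs exactly one row per column of X. A minimal sketch of that check, using the numbers reported in the tracebacks (1680 columns in X against 32696 rows in the hg19_genes annotation); diagnose_var is a hypothetical helper. The mismatch suggests the pileup matrices have one row per SNP rather than one per gene, which fits the later advice in this thread to use the baf_fc matrices.

```python
def diagnose_var(n_x_columns, n_var_rows):
    """Hypothetical re-statement of AnnData's var shape requirement.

    AnnData lays out X as cells x features, so the feature annotation (var)
    must have exactly one row per column of X.
    """
    if n_var_rows == n_x_columns:
        return None
    return (f"var has {n_var_rows} rows but X has {n_x_columns} columns; "
            "the matrix and the feature annotation describe different features")


# Numbers from the tracebacks: 1680 features in the pileup matrices versus
# 32696 entries in the hg19_genes annotation.
message = diagnose_var(n_x_columns=1680, n_var_rows=32696)
print(message)
```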

Here is also the head of the AD matrix, which seems to show four columns but only contains three columns' worth of data:
head cellSNP.tag.AD.mtx
%%MatrixMarket matrix coordinate integer general
%
1680 757 74066
1 6 1
1 42 1
1 43 1
1 51 1
1 90 1
1 118 1
1 120 1

The DP matrix has the same structure:
%%MatrixMarket matrix coordinate integer general
%
1680 757 151948
1 6 1
1 12 2
1 25 1
1 42 1
1 43 1
1 49 1
1 51 1
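For reference, these are MatrixMarket coordinate files. After the %%MatrixMarket banner and any % comment lines, the first line holds the number of rows, the number of columns and the number of non-zero entries; every following line is one (row, column, value) triplet with 1-based indices. So three fields per entry line is exactly what the format specifies, and the AD header describes a 1680 × 757 matrix with 74066 non-zero entries. DP having more non-zero entries (151948) is what one would expect if AD counts a subset of the reads counted in DP. A minimal stdlib parser for the size line, written for illustration (read_mtx_header is a hypothetical name):

```python
import io


def read_mtx_header(stream):
    """Return (n_rows, n_cols, n_nonzero) from a MatrixMarket coordinate file.

    Skips the '%%MatrixMarket' banner and any '%' comment lines, then parses
    the size line. The entry lines that follow are 'row col value' triplets.
    """
    banner = stream.readline()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("not a MatrixMarket file")
    for line in stream:
        if line.startswith("%") or not line.strip():
            continue
        n_rows, n_cols, nnz = (int(x) for x in line.split())
        return n_rows, n_cols, nnz
    raise ValueError("size line not found")


head = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "1680 757 74066\n"
    "1 6 1\n"
)
print(read_mtx_header(head))  # (1680, 757, 74066)
```

For a file on disk, pass an open text handle, e.g. read_mtx_header(open(AD_file)).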

If you need anything else from me to be able to resolve the issue, let me know and I'll look to provide further information!

Cheers,

Stuart

@hxj5
Collaborator

hxj5 commented Dec 25, 2024

Hi Stuart, thanks for the detailed information. It seems you were using the matrices in the pileup folder (from xcltk baf) as inputs to XClone. The inputs to XClone should instead be the matrices in the baf_fc folder, if you are running xcltk v0.3.0 or above.
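To make that input choice explicit, a small guard can refuse to proceed unless a baf_fc folder exists, so the pileup matrices are never picked up by accident. A minimal sketch; it assumes baf_fc sits directly inside the xcltk baf output folder, which may not hold for every xcltk version, and find_baf_fc_dir is a hypothetical helper.

```python
from pathlib import Path


def find_baf_fc_dir(xcltk_out_dir):
    """Return the baf_fc folder inside an xcltk baf output directory.

    Hypothetical helper. It assumes baf_fc is a direct subfolder and
    deliberately does not fall back to the pileup folder.
    """
    out = Path(xcltk_out_dir)
    baf_fc = out / "baf_fc"
    if baf_fc.is_dir():
        return baf_fc
    found = sorted(p.name for p in out.iterdir()) if out.is_dir() else []
    raise FileNotFoundError(
        f"No baf_fc folder in {out} (found: {found}). The run may have "
        "stopped before feature counting; check its log files."
    )
```

Raising instead of falling back keeps a half-finished run from silently producing SNP-level inputs.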

@stublakemore
Copy link
Author

Hi Xianjie,

Thanks for the feedback! I'm running xcltk v0.3.1. Yes, I was using the matrices in the pileup folder from xcltk baf, since the output of my xcltk baf run (folder Sample_6_baf) only contains the following folder structure:

phasing pileup Sample_6.genotype.vcf.gz scripts

Could it be that the previous xcltk issue, which I thought I had resolved, still isn't fixed (see hxj5/xcltk#10)? I can't imagine it's as simple as rerunning xcltk baf and naming the output folder Sample_6_baf_fc...

I'm wondering whether this is more likely to be a hg19 to hg38 issue?

Just so you know, the basefc xcltk and XClone analysis has been successful, so it's only the BAF side of the analysis that I'm struggling with.

Thanks for the continued support!

Stuart

@stublakemore
Author

Hi Xianjie,

I hope you had a good Christmas break. I was just wondering whether you could help me continue troubleshooting this issue, especially if it has more to do with my previous xcltk issue, which I thought I had resolved?

Many thanks,

Stuart

@hxj5
Copy link
Collaborator

hxj5 commented Jan 9, 2025

Hi Stuart, I agree that you should first check the log files in the scripts/phasing/ folder and then fix the issue, as discussed in hxj5/xcltk#10, to get the baf_fc folder containing the sparse matrices.
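Checking the phasing logs by hand is tedious when there are many of them. A minimal sketch that searches a log folder recursively for common failure phrases; the *.log file pattern and the phrase list are assumptions, not xcltk's documented log format, so adjust both to the files actually present in scripts/phasing/. scan_logs is a hypothetical helper.

```python
from pathlib import Path

# Assumed failure phrases; extend this tuple for your own logs.
FAILURE_PHRASES = ("error", "cannot execute", "no such file", "not found")


def scan_logs(log_dir, pattern="*.log", phrases=FAILURE_PHRASES):
    """Yield (file name, line number, line) for log lines containing a phrase.

    Searches log_dir recursively; matching is case-insensitive.
    """
    for log in sorted(Path(log_dir).rglob(pattern)):
        with open(log, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                lowered = line.lower()
                if any(phrase in lowered for phrase in phrases):
                    yield log.name, lineno, line.rstrip()
```

Iterating over scan_logs("scripts/phasing") from the xcltk output folder and printing each tuple gives a compact list of suspicious lines to read first.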

@stublakemore
Copy link
Author

Hi Xianjie,

It seems that the xcltk baf run tried to use Eagle from my laptop rather than from the server:

/Users/stuartblakemore/Documents/SJ_Blakemore/xcltk_requirements/Eagle_v2.4.1/eagle: cannot execute binary file

So I think that as long as I call Eagle from its correct location on the HPC, the issue should be resolved. I consider this issue closed! Thank you once again for your input!

Cheers,

Stuart
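"cannot execute binary file" is the shell's usual complaint when a file is a binary built for a different platform, for example a macOS (Mach-O) build on a Linux (ELF) cluster. A minimal sketch that inspects a binary's first four bytes to tell the two apart; binary_kind and check_tool are hypothetical helpers, while the magic numbers are the standard ELF and Mach-O ones.

```python
import os
import sys
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # Linux and most other Unix systems
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit Mach-O (macOS)
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O (macOS)
    b"\xca\xfe\xba\xbe",  # universal Mach-O (Java .class files share this value)
}


def binary_kind(path):
    """Classify a file as 'ELF', 'Mach-O' or 'unknown' from its first 4 bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


def check_tool(path):
    """Return a short diagnosis for an external binary such as Eagle."""
    p = Path(path)
    if not p.is_file():
        return f"{p} does not exist on this machine"
    kind = binary_kind(p)
    if sys.platform.startswith("linux") and kind == "Mach-O":
        return f"{p} is a macOS binary and cannot run on Linux"
    if not os.access(p, os.X_OK):
        return f"{p} is not marked executable"
    return f"{p} looks usable ({kind})"
```

Running check_tool on the Eagle path configured for xcltk, on the HPC node itself, distinguishes a wrong path from a wrong-platform binary.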
