How to control smoothing #74

Open
EmilioKolo opened this issue Aug 31, 2024 · 0 comments
Comments

@EmilioKolo

Hi,

I'm using joypy to display the positions of mutations associated with different diseases along the length of a gene, and I need tight control over the smoothing parameter of the ridgeline plot.

Example table with dummy data:

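| mutation    | position | disease |
|-------------|----------|---------|
| p.Cys100Arg | 100      | D1      |
| p.Gly120Arg | 120      | D1      |
| p.Trp122Ser | 122      | D1      |
| p.His60Gln  | 60       | D2      |
| p.Gln110His | 110      | D2      |
| p.Arg65Cys  | 65       | D3      |
| p.Arg67Gln  | 67       | D3      |
| p.Ser70Trp  | 70       | D3      |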

When I run the joyplot function, I set x_range to the length of the entire gene, padded by about 10 on each side to fine-tune the x axis. So, if the gene is 150 amino acids long, I set x_range to (-10, 160). For example:

joypy.joyplot(dataframe_table, by='disease', column='position', x_range=(-10,160))

This produces an over-smoothed plot in which far-apart mutations appear to be "connected" by their frequencies. D2, for example, looks like a smooth cap over the entire plot, when it is only two mutations at positions 60 and 110.

[Image: ex_joyplot]

I'm assuming that, by default, the function looks at the values around each point on the x axis and applies some transformation to smooth things out. I would like some control over that smoothing factor, because letting a value on the x axis be influenced by another value that is too far away is counterproductive for my purposes.
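To make the effect concrete, here is a minimal standard-library sketch, assuming the default curve is a Gaussian kernel density estimate. It does not call joypy; the function name `gaussian_kde_curve` and the two bandwidth values are illustrative choices, not joypy's defaults. It evaluates the D2 positions from the dummy table with a wide and a narrow bandwidth and compares the density halfway between the two mutations with the density at one of them.

```python
import math


def gaussian_kde_curve(points, grid, bandwidth):
    """Evaluate a Gaussian KDE with a fixed bandwidth (in x-axis units) on grid."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        for x in grid
    ]


# D2 from the dummy table: two mutations at positions 60 and 110.
d2_positions = [60, 110]
grid = list(range(-10, 161))  # same span as x_range=(-10, 160)

# Illustrative bandwidths only (assumed values, in amino-acid positions).
wide = gaussian_kde_curve(d2_positions, grid, bandwidth=25.0)
narrow = gaussian_kde_curve(d2_positions, grid, bandwidth=3.0)

mid = grid.index(85)   # halfway between the two mutations
peak = grid.index(60)  # exactly on the first mutation

print(f"wide:   density at 85 / density at 60 = {wide[mid] / wide[peak]:.2f}")
print(f"narrow: density at 85 / density at 60 = {narrow[mid] / narrow[peak]:.2e}")
```

With the wide bandwidth the midpoint is at least as high as the mutation sites, which is the "smooth cap" described above; with the narrow one the two mutations stay separate peaks. SciPy's `gaussian_kde` takes a `bw_method` argument for this purpose; whether joyplot forwards such a keyword to its KDE is an assumption that would need checking against the joypy source.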

The data points can be quite sparse in many regions, especially for the rarest diseases, so a histogram would not be ideal: bins are fixed boxes with no relevance to the function of the gene. Having every point slightly influenced by nearby mutations, on the other hand, has some biological relevance... up to a point.

I tried the kind="values" workaround, but it requires some reshaping of the data and may not look as good. If this isn't a planned feature, I will probably just do that.
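For reference, the reshaping would look roughly like the sketch below, assuming one series per disease with one value per amino-acid position. It uses only the dummy rows from the table above and plain Python; `gene_length` and the `counts` layout are illustrative choices, and the call that would pass these series to joypy is deliberately left out.

```python
# Dummy rows from the example table: (mutation, position, disease).
rows = [
    ("p.Cys100Arg", 100, "D1"),
    ("p.Gly120Arg", 120, "D1"),
    ("p.Trp122Ser", 122, "D1"),
    ("p.His60Gln", 60, "D2"),
    ("p.Gln110His", 110, "D2"),
    ("p.Arg65Cys", 65, "D3"),
    ("p.Arg67Gln", 67, "D3"),
    ("p.Ser70Trp", 70, "D3"),
]
gene_length = 150  # amino acids, as in the example above

# One list per disease; index i holds the number of mutations at position i.
counts = {}
for _, position, disease in rows:
    per_position = counts.setdefault(disease, [0] * (gene_length + 1))
    per_position[position] += 1

# Show which positions are non-zero for each disease.
for disease, per_position in sorted(counts.items()):
    hits = [i for i, n in enumerate(per_position) if n]
    print(disease, hits)
```

Each per-disease list could then be smoothed with whatever kernel width is appropriate before plotting, which is where the extra work compared with the default density plot comes from.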

Thanks in advance.
