Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JBrowse2 #6021

Closed
wants to merge 215 commits into from
Closed

Add JBrowse2 #6021

wants to merge 215 commits into from

Conversation

bgruening
Copy link
Member

@bgruening bgruening commented May 21, 2024

This PR is adding JBrowse2. It tries to merge #5695 with @abretaud work from #3997.

I tried my best to keep all the features from both branches, but I'm aware its not perfect.
However, I think it's in a solid state and was tested by various people from the VGP and ERGA projects, so I think we should deploy it properly.

It's a super exciting but complicated tool, so I assume we will see many updates over time as we did with JBrowse1.

One major improvement over version 1 is that we have now proper, more complicated tests. We use the location attribute in Galaxy tests now and create complicated tracks. Test data is currently here. The resulting zip file is then extensively "inspected". I think we can use this in the future to increase the test coverage.

A few other things in this PR:

  • HiC, Juicebox HiC, PAF, CRAM ... tracks added
  • The requested repeats from Jbrowse2 #5695 are included.
  • I added <required_files> so hopefully this helps with Pulsar
  • added creators with Ross, Helena and Anthony

Thanks to @abretaud, @hexylena and @fubar2 for working on this.

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

Copy link
Member

@hexylena hexylena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We, appreciate, you trying to resolve this. However, we still feel uncomfortable with this PR, as we've had (internal) discussions around what needs to be part of a jbrowse2 PR.

What you've done here seems to be to incorporate Ross' changes on top of the work in #3997, replacing it in many places.

That code as it was, did not fit with what the genome annotation community feels is important moving forward. I refer you to my review comments in #5697 (review), this PR does not address any of those concerns. Namely: it still uses JSON to configure tracks which is fragile and undesirable, when there are solid CLI commands that we can be sure will continue working going forward if they make internal structure changes. Ross reports (unfiled) issues with the CLI tools, we should work to resolve those for the health of the community and use limited exceptions where we need things sooner.

by making a separate PR you've taken away from the diff that would be visible, had you made a PR to Anthony's branch in #3997. So it is harder to see what is being added to or changed from that.

If you really want the community to accept this, I would request that you:

  • Close this PR and re-open it against anthony's branch. This is needed so we can see what changes are being made to that existing extremely solid base of a PR and discuss those on a case by case basis.
  • Revert to using the CLI, rather than JSON blobs that are not as future proof.

We need to review this in more detail, and I think there are other bits of code smell like the SSL context that give me pause. I've made some small review comments but it is not a full review, I'll save that for when we can see a reduced diff of what has changed against the existing PR.

tools/jbrowse2/jb2_urlconf.py Outdated Show resolved Hide resolved
Comment on lines +495 to +496
scontext.check_hostname = False
scontext.verify_mode = ssl.VerifyMode.CERT_NONE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disabling ssl hosname checking and disabling any certificate validation makes me extremely uncomfortable. Could you explain why this is necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a fallback. We had some users putting in URLs with invalid certs. By default, the cert is checked. If you prefer I can remove the fallback.

Comment on lines +927 to +949
trackDict = {
"type": "AlignmentsTrack",
"trackId": tId,
"name": trackData["name"],
"category": [
categ,
],
"assemblyNames": [trackData["assemblyNames"]],
"adapter": {
"type": "CramAdapter",
"cramLocation": {"uri": url},
"craiLocation": {
"uri": url + ".crai",
},
"sequenceAdapter": genseqad,
},
"displays": [
{
"type": "LinearAlignmentsDisplay",
"displayId": "%s-LinearAlignmentsDisplay" % tId,
},
],
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite fragile I fear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, it works so far for the tests that have been done by the VGP/ERGA community. I hope we can make this more stable as we go.

Comment on lines +323 to +324
<option value="uri">Provide a URI (e.g. "https://..." for a genome reference as
tabix bgzip with predictable index file URI</option>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide an argument for why this is necessary despite violating the principles of galaxy (reproducibility, provenance tracking)? I cannot think of any other galaxy tool that allows you to do this, use data from an arbitrary URL. We won't know anything about where that dataset came from.

Additionally we then cannot make any guarantees that the JBrowse instance will actually work (in a reduced network situation)

If we were to keep this feature, then we should make a large disclaimer every where it is an option "Your jbrowse instance may not work if a network accessible track was used" so users do not think it is an issue with our tool or jbrowse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, this was requested by users to create JBrowse2 (relocateable) archives that point to public data. In this case it was the GenomeArk, where all assembly data from VGP ends up. I think the argumentation was, the data should not be stored on Galaxy for every, so they need to move. But at the same point, there is a need to add a JBrowse2 instance to the GenomeArk that still works. I hope I recalled the feature correctly.

@bgruening
Copy link
Member Author

I just updated to the latest version of JBrowse2, no issues so far, tests are passing 🎉

@hexylena the repeat you requested in #5697 (comment) is also integrated as far as I understood.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants