
Feature request: Howto annotate to a database #1331

Closed
satra opened this issue Jul 19, 2014 · 9 comments

satra commented Jul 19, 2014

i'm trying to extract some information from publications (i've appended the list of things i'll be searching for). what i would really like is to open h.alpha as i view the pdf (which i can) and annotate using these "terms" as i go along (which i also can). what would be supercool is if all of this info could itself be turned into an RDF dataset automatically that i can then render/play with. it would also be nice if i could use a default tag as i do this, so that at least i can collect all these annotations.

right now, i'm using a spreadsheet to collect this info, and i think we can do better.

any thoughts/suggestions would be much appreciated.

- sample size (N)
- disorder 
- clinical_application (Y/N)
- used_cross_validation (Y/N/not specified)
  - type_of_cross_validation (Leave One Out, Split-Half, ...)
  - had_separate_validation_set (Y/N)
- used_nested_cross_validation
  - for feature selection
  - for parameter selection
  - for model selection
- learning_model (Linear/Logistic Regression, SVM, SVR, RandomForest, ...)
- estimated/determined_feature_importance
- loss_function 
- metric (explained_variance, Pearson correlation, ...)
- significance_testing (t-test, correlation p-value, permutation test, ...)
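A sketch of how the field list above might be encoded as a small controlled vocabulary that a dropdown UI could be driven from. The field names and allowed values are taken from the list, but the dictionary shape itself is illustrative, not an established schema:

```python
# Hypothetical controlled vocabulary for the extraction fields above.
# "values" lists an enumerated choice set; "type" marks free-form fields.
EXTRACTION_FIELDS = {
    "sample_size": {"type": "integer"},
    "disorder": {"type": "string"},
    "clinical_application": {"values": ["Y", "N"]},
    "used_cross_validation": {"values": ["Y", "N", "not specified"]},
    "type_of_cross_validation": {"values": ["Leave One Out", "Split-Half"]},
    "had_separate_validation_set": {"values": ["Y", "N"]},
    "learning_model": {"values": ["Linear Regression", "Logistic Regression",
                                  "SVM", "SVR", "RandomForest"]},
    "metric": {"values": ["explained_variance", "Pearson correlation"]},
}

def validate(field, value):
    """Check a proposed annotation value against the vocabulary."""
    spec = EXTRACTION_FIELDS[field]
    if "values" in spec:
        return value in spec["values"]
    if spec.get("type") == "integer":
        return isinstance(value, int)
    return True
```

Something like this could replace the spreadsheet columns while keeping the values machine-checkable.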
satra changed the title from "Feature request Howto -- annotate to a database" to "Feature request: Howto annotate to a database" on Jul 19, 2014
@BigBlueHat (Contributor)

@satra, providing OpenAnnotation.org Data Model support is very much in the works! @Treora showed me a promising demo on Friday that adds the necessary JSON-LD @context to map the current JSON output to Open Annotation's RDF model.

Once that's done, you'd be able to query the stream, ask it for the JSON-LD representation, and pipe that into pyld or similar.

Sound OK for starters? 😃

@BigBlueHat (Contributor)

@satra @Treora's JSON-LD annotator-store PR is up if you're interested in what the JSON output will look like if/when that gets merged. Hope it's heading in the right direction for what you're after!


satra commented Jul 28, 2014

@BigBlueHat - that's great. i think my question is a little different. here is an example:

say i have a document that i want to annotate, but what i really want to do is extract the things that current NLP would find hard to do. so i might highlight a sentence and annotate it as:

keeping with the json-ld theme:

{
  "@context": {
    "N": {
      "@id": "http://nidm.nidash.org/foo/num_participants",
      "@type": "xsd:integer"
    }
  },
  "N": 42
}

what i would now like is for this annotation not to be stored as a string, but actually hooked up as content, so that i can search for all annotations that have num_participants.
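A minimal sketch of the expansion step being asked for here, assuming the simplified flat document above. A real deployment would run a full JSON-LD processor such as pyld; `expand_terms` below only handles the flat case shown, to illustrate how the compact key becomes a queryable predicate IRI:

```python
# Simplified sketch (not a full JSON-LD processor): map each compact term
# to the IRI declared in its @context so annotations can be queried by
# predicate rather than by string matching.
def expand_terms(doc):
    ctx = doc.get("@context", {})
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        term = ctx.get(key)
        if isinstance(term, dict):
            iri = term["@id"]
        elif isinstance(term, str):
            iri = term
        else:
            iri = key  # no mapping declared; keep the key as-is
        out[iri] = value
    return out

annotations = [
    {"@context": {"N": {"@id": "http://nidm.nidash.org/foo/num_participants",
                        "@type": "xsd:integer"}},
     "N": 42},
]

# Find every annotation that recorded a number of participants.
hits = [a for a in annotations
        if "http://nidm.nidash.org/foo/num_participants" in expand_terms(a)]
```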

as an extended ui, i might take all the fields (in the original post) and put them in a dropdown ui for the h.alpha annotator. as part of the annotation i could select a field and add a value, or directly refer to the annotation as the value.

does that make sense?


tilgovi commented Jul 28, 2014

@satra you're saying you want to be able to add custom fields, I think.

That makes sense to me. There's actually no validation preventing anyone from putting custom fields into the database right now.

There isn't any index of available fields, though. And indexing of the fields themselves is known to have problems with case sensitivity and parsing, tracked at openannotation/annotator-store#77.
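As a concrete illustration of the "no validation" point, a client could already attach custom fields directly to an annotation payload; the store keeps them as-is. The field names here are hypothetical, not an established schema:

```python
import json

# Sketch: since the store does not validate extra fields, a client can
# attach structured data directly to an annotation payload today.
# "num_participants" is an illustrative custom field, not a defined one.
annotation = {
    "uri": "http://example.com/paper.pdf",
    "text": "N = 42 participants were recruited.",
    "tags": ["num_participants"],
    "num_participants": 42,   # custom field, stored as-is
}
payload = json.dumps(annotation)
```

The catch, as noted above, is that nothing indexes these fields yet, so they can be stored but not usefully searched.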


satra commented Jul 30, 2014

@tilgovi - indeed, but with the notion that people could store this set for others to use. so the ui would allow me to "push" a set of custom fields to another collaborator.

ideally we would simply extend the hypothesis ui ourselves to add these fields, a little bit like configuring/extending together.js


tilgovi commented Jul 30, 2014

@satra excellent!

It's going to take us a bit. There will need to be a few cycles of refactoring, some work on Annotator, and all of this while we're trying to fix bugs and land other features we think are high priority.

I'll open an issue in hypothesis/vision about it. It may span work on more than just this repo.

@BigBlueHat (Contributor)

@satra been pondering this a bit, and wondered if you could use tags that include both their names and their values, in the style of a URN (ex: urn:isbn:978-0596001087), but for "random" data points. 😄

An interesting and related idea I came across today, Tag URIs, may hold some promise here too: http://www.taguri.org/ They look like this: tag:[email protected],2001-06-05:Taiko

I just tested using a URN with our current tag system, and it works fine. 😸
https://hypothes.is/stream?q=tag:urn:ietf:rfc:5023

💭's welcome.
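A sketch of the name+value-in-one-tag idea, assuming a hypothetical `urn:nidm:` namespace (not a registered URN scheme), to show that a field and its value can be packed into a single tag string and parsed back out:

```python
# Pack a field name and its value into one URN-like tag string, and
# recover them later. The "urn:nidm:" prefix is illustrative only.
def make_tag(field, value):
    return f"urn:nidm:{field}:{value}"

def parse_tag(tag):
    # split at most 3 times so a value containing ":" survives intact
    scheme, ns, field, value = tag.split(":", 3)
    return field, value

tag = make_tag("num_participants", 42)
field, value = parse_tag(tag)
```

One trade-off: the parsed value comes back as a string, so any typing (e.g. xsd:integer) would have to live elsewhere, as in the JSON-LD @context discussed above.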


satra commented Jan 4, 2015

@BigBlueHat - happy new year - last year ran away in a hurry. having cleared my inbox back to last august, here is the very belated response!

i think using tags like this is fine. it's just that it would be good for the UI to display the rdfs:label, if available, for that tag and to link out from the tag to the URI when possible. the issue is that in the biomedical field a lot of the tag uris would have numeric identifiers that would be very difficult to parse.

for examples of numeric identifiers see: http://www.ontobee.org/index.php
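A sketch of the label-display idea: a small lookup that renders an rdfs:label when one is known and falls back to the raw IRI otherwise. The lookup table here is a hypothetical stand-in for querying the ontology service (Ontobee exposes these labels via SPARQL):

```python
# Numeric ontology identifiers (OBO-style IRIs) are opaque in a tag UI;
# a label lookup lets the UI show the rdfs:label instead.
# This in-memory dict stands in for a real ontology/SPARQL query.
LABELS = {
    "http://purl.obolibrary.org/obo/DOID_14330": "Parkinson's disease",
}

def display_tag(iri):
    """Show the rdfs:label when known, falling back to the raw IRI."""
    return LABELS.get(iri, iri)
```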

@nickstenning (Contributor)

Hi there! I'm going to close this as part of a clean-up of all issues currently open on this repo that represent ideas or features rather than reports of bugs or technical chores.

I want to be clear that this isn't intended to say anything at all about the content of this issue—it certainly doesn't mean we're no longer planning to do the work discussed here—just that we don't want to use GitHub issues to track feature requests or ideas, because the threads can get long and somewhat unwieldy.

If you're interested in what we are working on at the moment, you can check out our Trello board and, for a longer-term view, our roadmap.

And, if you're interested in following up on this issue, please do continue the discussion on our developer community mailing list. You might also want to check out our contributing guide.
