launch year for each Dataverse installation #7

Open
pdurbin opened this issue Jul 12, 2019 · 25 comments
@pdurbin
Member

pdurbin commented Jul 12, 2019

When I see graphs like the following from http://slides.com/mercecrosas/dataversecommunity2018#/5 ...

Screen Shot 2019-07-11 at 10 42 05 PM

... I think, "This is a great graph but it would be nice to have the actual launch date of each of the Dataverse installations." We could add a column for this in the database.

@pdurbin
Member Author

pdurbin commented Oct 2, 2019

In IRC today I announced a new spreadsheet I'm calling "Crowdsourced information about Dataverse Installations": https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit?usp=sharing

The first question I asked was this, "please add your launch_year, the year you started with Dataverse (fresh or migration from some other system)".

Here's the conversation: http://irclog.iq.harvard.edu/dataverse/2019-10-02#i_107856

@pdurbin pdurbin changed the title from "launch date for each Dataverse installation" to "launch year for each Dataverse installation" Oct 25, 2019
@pdurbin
Member Author

pdurbin commented Oct 28, 2019

On Friday @shlake and I talked about different approaches for getting this data:

  • Keep pestering the entire mailing list like I have, for example, at https://groups.google.com/d/msg/dataverse-community/xFRXZ7BAuQA/q_fLaenBBgAJ
  • Send targeted emails to people we know from this or that installation. If we don't know the person to contact, try to get the email from OAI-PMH.
  • Declare a census called "Dataverse Installation 2019 Census" and try to get launch year as part of that.

@shlake do you have a preference? Am I forgetting any other approach we talked about?

All, are there more approaches we haven't considered? @poikilotherm is starting to hack on the map. 😄

@shlake
Contributor

shlake commented Oct 28, 2019

@pdurbin I'm going to pester the list (just once - not keep doing it) AND will send targeted emails to folks we know.

Once we figure out ALL the bits of info we need, then I think we can declare a census to get what we don't have.

@pdurbin
Member Author

pdurbin commented Dec 6, 2019

I just made a couple pull requests to help try to make it more apparent that some launch years are missing.

Pull request #39 - add launch year to table

70338886-b9e1fb80-181b-11ea-9430-8382a83c569d

Pull request #40 - add script to convert data.json to a TSV file

70345238-8e660d80-1829-11ea-8e5d-7e70066847cc
70345237-8dcd7700-1829-11ea-9861-efeaa6b8d431
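
For readers who want a feel for the JSON-to-TSV step, here is a minimal sketch in Python. It is not the script from pull request #40. It assumes that data.json has a top-level `installations` list (as the `jq` commands in this thread suggest) and that each entry has `name` and `launch_year` keys; the `name` key, the function name, and the sample data are assumptions made for illustration.

```python
import csv
import io


def installations_to_tsv(data, fields=("name", "launch_year")):
    """Return TSV text with a header row and one row per installation.

    Missing or null values are written as empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for inst in data.get("installations", []):
        row = ["" if inst.get(f) is None else inst.get(f) for f in fields]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample standing in for the contents of data/data.json.
sample = {
    "installations": [
        {"name": "Example A", "launch_year": 2017},
        {"name": "Example B", "launch_year": None},
    ]
}

print(installations_to_tsv(sample), end="")
```

To try it on the real file, the sample could be swapped for `json.load(open("data/data.json"))`, provided the file really has the shape assumed above.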

shlake added a commit that referenced this issue Dec 9, 2019
@pdurbin
Member Author

pdurbin commented Feb 27, 2020

> I'm going to pester the list (just once - not keep doing it) AND will send targeted emails to folks we know.

@shlake do you still want to pester the list once or should I re-pester? 😄 I just made the following diagram, which I was planning to send to the list. Please let me know! It's a screenshot from https://dataverse.org/installations with some angry red added for the years we are missing. 😄

As you know, the fix is to ask everyone to fill in the year in the "crowdsourced spreadsheet" at https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0

Then we run python3 update-data.py and make pull requests.

I'm motivated because yesterday my friend and officemate @erikbuunk started making some killer data visualizations about the Dataverse community (that I can't wait to share!). 🎉

Oh, my other thought is that we can @ mention people here who could track down the data. People like @eugene-barsky @skasberger @umuchlish @Venki18 @dheles @sjaefulafandi @lmaylein @adam3smith @CCMumma @kaitlinnewson @jmjamison and others that aren't top of mind.

Dataverse_Installations_Around_the_World_The_Dataverse_Project_-Dataverse org-_2020-02-27_05 52 12

@pdurbin
Member Author

pdurbin commented Apr 6, 2020

This just in from @skasberger for AUSSDA Dataverse - " you once asked me, when we launched. it was the 20th of august 2017."

@CCMumma

CCMumma commented Apr 6, 2020

TDR was launched in November 2016 - sorry I missed this one earlier.

@shlake
Contributor

shlake commented Apr 9, 2020

@pdurbin just sent an email to the list asking for dates.

@pdurbin
Member Author

pdurbin commented Apr 9, 2020

@shlake THANK YOU! You're the best! 🎉 🎉 🎉

For anyone who missed it, here's a link to that email: https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ

@pdurbin
Member Author

pdurbin commented Jun 3, 2022

Thank you to @IlariaBelvedere for taking an interest in this issue, creating some sub-issues, and reviving the thread at https://groups.google.com/d/msg/dataverse-community/jqfxdU3e2FQ/qroo3eMvAAAJ !

By the way, here's a one liner for getting the counts per year:

cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c

   1 2005
   1 2012
   1 2013
   1 2014
   2 2015
   4 2016
   5 2017
   1 2018
   9 2019
  13 2020
   8 2021
   3 2022
  30 null
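
For anyone who prefers Python over `jq`, here is a minimal sketch of the same count. It assumes the structure implied by the one liner above (a top-level `installations` list whose entries have a `launch_year` that is either an integer or null); the function name and the sample data are made up for illustration.

```python
from collections import Counter


def count_by_launch_year(data):
    """Map each launch year (None when unknown) to the number of installations."""
    return Counter(inst.get("launch_year") for inst in data["installations"])


def sorted_years(counts):
    """Known years in ascending order, with None (unknown) last."""
    return sorted(counts, key=lambda y: (y is None, y or 0))


# Hypothetical sample standing in for the contents of data/data.json.
sample = {
    "installations": [
        {"launch_year": 2019},
        {"launch_year": 2019},
        {"launch_year": None},
        {"launch_year": 2005},
    ]
}

counts = count_by_launch_year(sample)
for year in sorted_years(counts):
    print(counts[year], "null" if year is None else year)
```

If the years turn out to be stored as strings rather than integers, the sort key would need adjusting.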

Here's a quick chart using Google Spreadsheets:

chart(1)

@adam3smith

I don't think the spreadsheet is still editable? I can comment only, which isn't super helpful for spreadsheets. QDR launched its Dataverse catalog in 2018

pdurbin added a commit that referenced this issue Jun 4, 2022
pdurbin added a commit that referenced this issue Jun 4, 2022
@pdurbin
Member Author

pdurbin commented Jun 4, 2022

Thank you @IlariaBelvedere for all the research! Here's how https://iqss.github.io/dataverse-installations/year.html looks now!

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 * 1
2013 * 1
2014 *** 3
2015 *** 3
2016 **** 4
2017 ******** 8
2018 ****** 6
2019 ******************** 20
2020 ************* 13
2021 ******** 8
2022 *** 3
???? ******* 7

Installations with Unknown Launch Year
CIMMYT Research Data
DataSpace@HKUST
Fudan University
Göttingen Research Online
ICRISAT
Johns Hopkins University
Maine Dataverse Network
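
A list like the one above could be produced with something along these lines. This is only a sketch and not how year.html is actually generated; it assumes each entry in the `installations` list has a `name` key and a `launch_year` that is null when unknown, and the sample names are placeholders.

```python
def missing_launch_year(data):
    """Return the names of installations with no launch year, sorted alphabetically."""
    return sorted(
        inst["name"]
        for inst in data["installations"]
        if inst.get("launch_year") is None
    )


# Hypothetical sample standing in for the contents of data/data.json.
sample = {
    "installations": [
        {"name": "Example B", "launch_year": None},
        {"name": "Example A", "launch_year": None},
        {"name": "Example C", "launch_year": 2016},
    ]
}

for name in missing_launch_year(sample):
    print(name)
```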

Here's the most recent research:

Thanks again, @IlariaBelvedere!

@pdurbin
Member Author

pdurbin commented Jun 5, 2022

Ok! I just did some research and added launch years for the last few installations. Here's how https://iqss.github.io/dataverse-installations/year.html looks now:

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3

2019 was a big year for us! 😄

I guess next we should think about what the definition of done is for this issue. 😄

@pdurbin
Member Author

pdurbin commented Jun 5, 2022

Here's a quick graph from Google Spreadsheets showing the cumulative number or running total of installations.

Screen Shot 2022-06-04 at 9 53 14 PM

The formula is from https://www.statology.org/google-sheets-cumulative-percentage/

This is how I get the year and count:

cat data/data.json | jq '.installations[].launch_year' -r | sort | uniq -c | awk '{print $1, $2}' | tr " " "\t"

@IlariaBelvedere

Hello, thank you for the graph! Could I use it in my thesis (with attribution)?

@pdurbin
Member Author

pdurbin commented Jun 5, 2022

@IlariaBelvedere absolutely! Please go ahead. And thanks again for helping so much!

I actually added a second ASCII art graph (over time) last night to https://iqss.github.io/dataverse-installations/year.html

Here's how they look as of this writing:

Dataverse Installations by Year
2005 * 1
2008 * 1
2012 ** 2
2013 ** 2
2014 **** 4
2015 *** 3
2016 ******* 7
2017 ******** 8
2018 ****** 6
2019 ********************* 21
2020 ************* 13
2021 ******** 8
2022 *** 3

Dataverse Installations Over Time
2005 * 1
2008 ** 2
2012 **** 4
2013 ****** 6
2014 ********** 10
2015 ************* 13
2016 ******************** 20
2017 **************************** 28
2018 ********************************** 34
2019 ******************************************************* 55
2020 ******************************************************************** 68
2021 **************************************************************************** 76
2022 ******************************************************************************* 79
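
The "Over Time" numbers are just a running total of the "by Year" numbers. Here is a minimal Python sketch of that arithmetic, using the per-year counts shown above; the function name is made up, and this is not necessarily how year.html is built.

```python
from itertools import accumulate

# Per-year counts copied from the "Dataverse Installations by Year" graph above.
by_year = {
    2005: 1, 2008: 1, 2012: 2, 2013: 2, 2014: 4, 2015: 3, 2016: 7,
    2017: 8, 2018: 6, 2019: 21, 2020: 13, 2021: 8, 2022: 3,
}


def running_total(counts):
    """Return (year, cumulative count) pairs in ascending year order."""
    years = sorted(counts)
    totals = accumulate(counts[y] for y in years)
    return list(zip(years, totals))


# Print one ASCII art row per year: the year, one star per installation, the total.
for year, total in running_total(by_year):
    print(year, "*" * total, total)
```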

@IlariaBelvedere

Thank you very much for the graphs, it is satisfying to look at the statistics over the years :) And thank you for thanking me ahahha, I am happy if I can help :)
I think there are other things that could be done, maybe: 1) A few descriptions of the installations could be filled in, to complete the picture XD 2) Some of the installations may no longer be active; is there a way to find out about the most recent updates by communicating with the institutions themselves?

pdurbin added a commit that referenced this issue Jun 6, 2022
pdurbin added a commit that referenced this issue Jun 6, 2022
based on oldest published Dataverse: July 2013
@pdurbin
Member Author

pdurbin commented Jun 6, 2022

  1. A few description of the installations could be filled, to complete the picture XD

@IlariaBelvedere Yes, I also noticed that some descriptions are missing. If you feel like creating new issues and reaching out to the installation contacts, please go ahead.

  1. Some of the installations could no longer be active, is there a way to find out about the most recent updates by communicating with the institutions themselves?

Probably. The way to communicate with an installation is to look at contact_email in the spreadsheet mentioned in the README. Here's a direct link: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0

@pdurbin
Member Author

pdurbin commented Jun 6, 2022

In 650911f I put together some charts at https://iqss.github.io/dataverse-installations/charts.html

Here's how they look:

by-year-and-over-time

That charts.html page isn't linked from anywhere. I'm not sure if it should be a standalone page or not.

https://iqss.github.io/dataverse-installations/ has the map and descriptions (screenshot below). Maybe for now we could add a link under the map (and before the descriptions) to the new charts.html page. Thoughts?

Screen Shot 2022-06-06 at 5 22 25 PM

@IlariaBelvedere

  1. A few description of the installations could be filled, to complete the picture XD

@IlariaBelvedere Yes, I also noticed that some descriptions are missing. If you feel like creating new issues and reaching out to the installation contacts, please go ahead.

  1. Some of the installations could no longer be active, is there a way to find out about the most recent updates by communicating with the institutions themselves?

Probably. The way to communicate with an installation is to look at contact_email in the spreadsheet mentioned in the README. Here's a direct link: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0

Thank you @pdurbin! I am going to try to contact them and I will let you know about the results. :)

@IlariaBelvedere

@pdurbin
I just noticed that only six descriptions are missing, if I am not wrong, so I think they could be found on the official sites and filled in that way. What do you think?
As for the contact emails, I think the one for Abacus has to be updated because it is not working. However, the library's email addresses are listed here: https://ask.library.ubc.ca/.

@pdurbin
Member Author

pdurbin commented Jun 7, 2022

@IlariaBelvedere before we start filling in missing descriptions, can you please create a new issue for this? Yes, copying a reasonable description from an official site sounds fine.

A new issue for the Abacus email too, please! 😄

@pdurbin
Member Author

pdurbin commented Jun 7, 2022

@pdurbin pdurbin moved this to Community Backlog (Phil) in IQSS Dataverse Project Nov 2, 2022