Learning Data Science on the Ethereum Blockchain with the Omniacs
We want to share our joy of data science with the Ethereum ecosystem through an informative set of online courses and case studies that merge data analysis, blockchain analytics and statistical programming.
We aspire to create a modular online course to help new blockchain developers understand the principles and best practices of data science. Using open data sources from across the Ethereum landscape (think State of the DApps, Uniswap, and Etherscan), the course will teach data munging, data visualization, exploratory data analysis, and machine learning, and will dabble in a bit of deep learning and artificial intelligence. As the course grows, we intend to create case studies on how to apply data science to specific DApps (think machine learning techniques for predicting markets with Numer.ai data or analyzing traits in CryptoKitties).
Our ultimate vision is to spur and inspire the next generation of developers through interesting applications of data science to emerging blockchain technologies.
Gitcoin Grant Link: https://gitcoin.co/grants/562/learning-data-science-on-the-ethereum-blockchain-
Module 1 - Basic Data Structures and Munging
- Motivating Example (Slides, Python Slides, Video)
- Reading Files (Slides, Python Slides, Video)
- Basics (Slides, Video)
- Data structures (Slides, Video)
Data Sources: CryptoPunks, Crypto Art Pulse
Module 2 - Statistical Graphics and Visualization
- Why graph?
- Visualization Principles and Practices
- Plotting Basics
- Building Plots Layer by Layer
- Polishing Your Graphs
Data Sources: Omni Analytics Group, CryptoPunks
Module 3 - Supervised and Unsupervised Machine Learning
- The ML workflow
- Supervised learning for classification
- Unsupervised learning for grouping
- Forecasting what's next
- Deep learning for sequences
Data Sources: Omni Analytics Group
Module 4 - Case Studies
- Clustering and segmenting Ethereum validator performance (R, Python, Video)
- Visualizing slashings in the Ethereum Medalla testnet (R, Python, Video)
- Reconstructing the Crypto Sentiment Investment Curve in ggplot2
- Interacting with and Analyzing Numerai Network Growth with GraphQL and ggplot2
- Tornado.Cash Initial Distribution Analysis
- Stable Coin Analysis
- Using R Shiny to explore Numerai tournament data (Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9)
- Crypto Punks NFT Value Analysis (Slides, Video)
- Hashmask Rarity Analysis (Slides, Video)
- Geocoding and Mapping Cryptojobs (Slides, Video)
- Cryptojobs Exploratory Data Analysis (Slides, Video)
- Making Graphs fun with BadgerDAO (Slides)
- An Exploratory Data Analysis of PoolTogether (Slides)
- An Exploratory Data Analysis of Yearn.Finance (Slides)
- Interacting with the Covalent API via httr (Slides)
- Analyzing the Ethereum Name Service (ENS) (Slides)
- Forecasting the Trend in Bitcoin Dominance (Slides)
- An Analysis of the Uniswap Platform (Slides)
- Forefront Social Token Analysis (Slides)
- Predicting Growth in L2 Chains (Slides)
- A Statistical Dive into the Unlock Protocol (Slides, Video)
- Filecoin Miner Index API Exploration (Slides)
- Uniswap Airdrop Liquidity Provider Analysis (Slides)
- Uniswap Governance Analysis (Slides)
Data Sources: Beaconscan, Numerai Tournament Data, Hashmasks, CryptoPunks, Cryptojobs, PoolTogether, BadgerDAO, Yearn.Finance, ENS, Uniswap
Module 5 - Understanding and Defending Gitcoin
- Quadratic Funding in the Wild: A Post-Round Analysis of Gitcoin’s Fund Matching Mechanism (Slides)
- Gitcoin Grants Analysis (Slides)
Data Sources: Gitcoin, Omni Analytics Group
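For readers who have not met quadratic funding before, the idea behind the matching mechanism can be sketched in a few lines of Python. This is a minimal illustration of the textbook formula only, in which a grant's ideal match is the square of the sum of the square roots of its contributions, minus the sum of the contributions. It is not Gitcoin's production implementation, which layers additional rules on top. The function names, grant names, and contribution amounts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def quadratic_match(contributions):
    """Ideal (uncapped) quadratic funding match for one grant.

    Textbook formula: (sum of sqrt(c_i))^2 - sum of c_i.
    """
    raised = sum(contributions)
    qf_total = sum(sqrt(c) for c in contributions) ** 2
    return qf_total - raised


def allocate_pool(grants, pool):
    """Scale each grant's ideal match so the matches sum to a fixed pool.

    `grants` maps a grant name to its list of contribution amounts.
    Proportional scaling is an assumption made for this sketch.
    """
    ideal = {name: quadratic_match(cs) for name, cs in grants.items()}
    total_ideal = sum(ideal.values())
    if total_ideal == 0:
        return {name: 0.0 for name in grants}
    return {name: pool * m / total_ideal for name, m in ideal.items()}


# Hypothetical grants: each raised a similar amount, from different numbers of donors.
example_grants = {
    "four_small_donors": [4.0, 4.0, 4.0, 4.0],
    "two_donors": [9.0, 16.0],
    "one_large_donor": [16.0],
}

if __name__ == "__main__":
    for name, match in allocate_pool(example_grants, pool=720.0).items():
        print(f"{name}: {match:.2f}")
```

In this sketch the grant backed by four small donors receives the largest share of the pool, while the grant with a single donor receives no match, which is the behavior that makes the mechanism favor broad support over large individual contributions.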
Lagniappe
Gitcoin Grant Round 8
Like everything else in the world, our course development plans were flipped upside down by 2020. Instead of building the course from the bottom up, we chose to repurpose and refactor our Medalla research into motivating case studies on how to perform data analysis on Ethereum 2.0 blockchain data. The case studies walk through, in detail, how we performed the analysis that ultimately netted us a bronze prize. We also included a tutorial on how to use R Shiny as an exploration tool for understanding the Numer.ai dApp's tournament data.
Change log
- Restructured the repository for clarity
- Added 3 case studies
Gitcoin Grant Round 9
For Round 9 we've doubled down on our "case study first" approach to teaching data science using projects on the Ethereum blockchain as examples. This update includes a look at NFTs, stablecoins, market capitalization estimation, and an introduction to GraphQL. We expanded two of our original case studies to include Python versions, so if you are interested in learning more about that language, see the Python links for the validator performance and Medalla slashings case studies in Module 4. This update also includes our first attempt at creating video lectures for the eager learners who would like to dive deeper into the concepts. We intend to use funding from this round to further expand the set of case studies we produce and improve the quality of our video content.
Change log
- Major update to the course aesthetics
- Module 1 updated
- 5 New case studies (with 2 more to be published during the active grant round)
- First video lecture published to YouTube
Gitcoin Grant Round 10
This round's update had us working with our first outside contributor. @Amelia188 gave our course a proper copy edit by fixing tons of typos, correcting grammar errors and improving the overall readability of the material. We look forward to her continued contributions and encourage others to reach out to us about opportunities to collaborate. To further expand the base content for the course, we completed the material for the second module, which focuses on statistical graphics. Other updates for this round include two NFT-related case studies and a host of new video lectures. Your support for this round will help us expand our contributor pool and further improve the quality of our content.
Change log
- Tons of copy edits
- 4 Video lectures for Module 1 have been published
- Slides for Module 2 - Statistical Graphics and Visualization have been completed
- 2 NFT-related case studies were created and videos produced
Gitcoin Grant Round 11
We've been busy! Over the last quarter we've been working with DAOs to help them understand their data, and this work inspired us to create two new case studies all about blockchain jobs using data from Cryptojobs. In addition to these two case studies, we've included another one on Yearn.Finance created by our newest collaborator @vintro. This update also includes 3 new videos to supplement the case studies. As usual, we really appreciate the support. Contributions this round will help us find and compensate additional course contributors.
Change log
- 3 Video lectures
- 3 New case studies
- Various copy edits
Gitcoin Grant Round 12
Our update for this round is on the smaller side. We have fixed a few links and added a couple of new case studies. If you can, be a little patient with us. We're going to try to come back with some big updates for the next round. Stay tuned!
Change log
- 2 New case studies
- Various link fixes and copy edits
Gitcoin Grant Round 13
In addition to making progress on Module 3, we've doubled down on our cross-platform initiative to include more Python-coded examples. We now have the first two sections of Module 1 translated into Python. A special shout-out to @JSchoonmaker! Stay tuned because we're actually going to be updating the course throughout this round.
Change log
- Added Python tutorials for Module 1
- Updates to Module 3
- Various link fixes and copy edits
Gitcoin Grant Round 14
We’re bouncing all around for this season! The new case studies for this round touch on forecasting L2 contract deployment, exploring ENS domain names, characterizing Filecoin miners, understanding Unlock Protocol contract interactions, and analyzing Uniswap with the uniswappeR package. We’ll be updating throughout the period so don’t be too surprised if a few new case studies or additional content pops up out of nowhere! Also, we don't want to forget to extend kudos to @NadiaAntony for her course content contributions!
Change log
- 5+ new Case Studies
- Various link fixes and copy edits
- Minor updates to Module 3
Gitcoin Grants Round 15 and Beta Round
This round we've decided to experiment with a little bit of alternatively styled content! In the lagniappe folder you'll find a compilation "Tweet Book" of R programming-related tweets from our Twitter. As our first foray into children's content, we created a data collection worksheet that has young, budding data scientists describing cryptocurrency logos in spreadsheet form. It should be a short, fun exercise for anyone getting used to how data is structured. We've also included two new Uniswap-centric case studies, one of which won an award for most insightful analysis of the protocol's governance. With that, we hope you enjoy this update!
Change log
- 2 new Case Studies
- Added our first kids' data collection worksheet
- Added Vol. 1 of our quick stats "Tweet book"
- Various link fixes
Gitcoin Grants Round 21
We are back and very excited to share some of the new stuff we've learned! Over the course of Gitcoin Grants Round 21 (and the rest of the year) we'll be adding case studies based on some of the analysis we've completed for Arbitrum, JokeRace, and Aave. These newer case studies will tackle interesting topics in Web3 governance and grant analysis, and will even touch on how to leverage open source AI in your analytics workflow. Check back often with us and stay tuned for more updates!
For a little over a year, this course has been inspiring new data scientists and teaching the fundamentals of statistical analysis, all while introducing students to the world of cryptocurrency through the practical application of these skills to Ethereum-based projects. As we continue to push forward on our dream of educating the next generation of blockchain-native data scientists, we are going to begin to experiment with more Web3-enabled features for the course. We're investigating the use of interactive quizzes with POAP rewards, hosting live seminars within Decentraland, and creating a Discord community of course aficionados. We'd love your continued support as we grow this course into the best open source data science educational public good on the internet!
"This course is like a rain following a drought. It kindly walks you through the process starting from the use of R to introduction of graphs and machine learning concepts with interesting case studies. I strongly recommend it not only to researchers interested in Ethereum blockchain but also to any students or professionals that have interest in learning data analysis and science." - Will Shin (Principal Economist at Klaytn)
- March 11th, 2022 - 59 Stars - 16 Forks
- June 8th, 2022 - 76 Stars - 17 Forks
- Sept 6th, 2022 - 90 Stars - 21 Forks
- Apr 17th, 2023 - 101 Stars - 25 Forks
- Aug 5th, 2024 - 119 Stars - 29 Forks
Do you all have experience in this stuff?
Why yes, we do! Omni Analytics Group is a team of PhD-level statistical consultants who have been teaching and solving difficult data science problems for nearly a decade. We are passionate about data science and blockchain technologies. Just check out our Twitter.
Do I need any prior experience before taking this course?
Our intention is to start from the beginning and build up not only your data chops, but your statistical intuition and programming knowledge. At the end of these courses, you should be able to match a statistical technique to a blockchain data problem, write a basic script to analyze it and confidently search online for more advanced knowledge.
What programming languages will the course focus on?
We'll initially focus on the statistical language R, but then expand to Python. As the course grows, we hope to include examples with contracts written in Solidity.
Can I request a topic?
Sure! Once we flesh out the initial course material, and if funding persists, we'd be more than happy to take suggestions on case studies or topics.