- Description
- Instructor Contact Information
- Course Outline and Materials
- Schedule, Location, Calendar, and Offline Discussion
- Website and Communication
- Course Activities
- Grading Information
- Attendance, Conduct, Honesty, and Accommodations
- Frequently Asked Questions
This is an advanced short (1-credit) course designed to:
- Discuss common misunderstandings & typical errors in the practice of statistical data analysis.
- Provide a mental toolkit for critically thinking about statistical methods and results.
Classes will involve lectures, discussions, hands-on exercises, and homework about concepts critical to the day-to-day use and consumption of quantitative/computational techniques. Please use this course flyer to help spread the word.
Note
Open to both undergraduate and graduate students for credit. Counts toward the CMSE minor, graduate certificates, and dual PhD. Please email Heather Johnson at [email protected] if you need an override.
Postdocs, staff scientists/specialists, and faculty members are welcome to sit in. (Please fill out the course survey.)
To get the most out of this course, it would be ideal if you have:
- Familiarity with introductory statistics and probability, and
- Basic experience with data wrangling, analysis, and visualization using R/Python.
Check out some recommended online preparatory materials that you can use to refresh these concepts.
- For introductory courses in statistics, please check out the good ones offered in STT. Statistics at the level of STT 231 is strongly recommended.
- If you would like introductory courses in programming (in R or Python) and in how to do statistical analysis in R or Python, please check out CMSE 201-202 or CMSE 890 301-304.
- Some background in introductory biology would also be nice, for instance LB 144 and 145 OR BS 161 and 162 OR BS 181H and 182H, or equivalent.
Please fill out the course survey to help me better understand your background, motivation, etc., so that I can advise you on whether this course is right for you.
Arjun Krishnan | ... |
---|---|
Affiliation | Dept. of Computational Mathematics, Science, and Engineering; Dept. of Biochemistry and Molecular Biology |
Office | 2507H Engineering Building |
Contact | Email: [email protected]; Twitter: @compbiologist; Website: https://www.thekrishnanlab.org |
[ Top ]
(subject to change)
- Estimation of error & uncertainty
- P-value & P-hacking
- Multiple hypothesis correction
- Statistical power & underpowered statistics
- Pseudoreplication
- Confounding variables & batch effects
- Circular analysis
- Regression to the mean & stopping rules
- Confirmation & survivorship bias
- Base rates & permutation tests
- Describing different distributions
- Continuity errors & model abuse
- Visualization challenges
- Researcher degrees of freedom
- Data sharing / Hiding data
- Reproducible research
- Difference in significance & significant differences
[ Top ]
Nov 6 – Dec 4, 2019
Schedule/Location | Details |
---|---|
Schedule | Mon and Wed 3:00-4:50 pm |
East Lansing Location | A158 Plant & Soil Science Bldg |
Grand Rapids Location | Room 5005, Grand Rapids Research Center (except Nov 20, when class is in Room 2005) |
Day | Date | Topic | Learning Materials |
---|---|---|---|
Day 01 | Nov 06 (W) | Welcome; Getting started with statistical data analysis | Lecture; Pre-class assignment (due before Day 02) |
Day 02 | Nov 11 (M) | Estimation of error & uncertainty; Hypothesis testing | Lecture |
Day 03 | Nov 13 (W) | P-values; P-hacking; Publication bias; Multiple hypothesis testing | Lecture; Pre-class assignment (due before Day 04) |
Day 04 | Nov 18 (M) | Statistical power & underpowered statistics | Lecture |
Day 05 | Nov 20 (W) | Pseudoreplication; Confounding variables & batch effects; Circular analysis; Regression to the mean & stopping rules | Lecture; Pre-class assignment (due before Day 06) |
Day 06 | Nov 25 (M) | Base rates; Model abuse; Biases | Lecture |
Day 07 | Nov 27 (W) | Descriptive statistics; Measuring associations; Visual inference | Lecture; Pre-class assignment (due before Day 08) |
Day 08 | Dec 02 (M) | Visualization challenges | Lecture; Notes on the Final Exam |
Day 09 | Dec 04 (W) | Researcher degrees of freedom; Data sharing/hiding; Holistic analysis; Pre-registration; Reproducible research | Lecture |
Day 10 | Dec 05 (Th) (note different day) | Final Exam Slot 1 (note different location) | 12:30–2:30 pm, Akers Hall, Classroom 135 |
Day 11 | Dec 06 (F) (note different day) | Final Exam Slot 2 (note different location) | 12:30–2:30 pm, Akers Hall, Classroom 135 |
Tue & Thu 9–9:30 am
I will block these times from my schedule and be present in my office (2507H Engineering).
A couple of things to note:
- While I'm happy to chat with you in person, a Slack message with your questions or concerns often works just as well. If you have specific questions in mind, just send me a message and we can try to resolve them then and there.
- If you would like to meet in person, please try to come during these hours. Don't worry if you can't make it during this window for some reason; just send me a message on Slack and we'll find a time that works for both of us.
[ Top ]
This GitHub repo will serve as the course website.
The primary mode of communication in this course, including major announcements, will be the course Slack workspace. All of you should have received invitations to join it at your MSU email address.
Emails
Although the bulk of the communication will take place via Slack, we will occasionally send out important course information via email. These emails are sent to your MSU email address (the one that ends in “@msu.edu”). You are responsible for all information sent to your University email account and for checking this account regularly.
[ Top ]
For each topic, you will be assigned reading materials and, occasionally, a coding assignment. Links to these materials will be posted on this page next to the topic in the Calendar, and instructions will be provided on Slack.
Each completed assignment is due before the next class.
In general:
- Do the assignments and additional readings.
- Show up to class.
- Work in groups during in-class discussion sessions.
- No one will have the perfect background: Ask questions about statistical or programming concepts.
- Contribute to the materials in class and on Slack.
- Correct me when I am wrong.
A major goal of this course is to prepare you to perform statistical data analysis with care and to present your ideas and findings effectively. The final exam will serve as a practical way to do exactly that. We will discuss and nail down the details when we meet in class. The exam is offered in two slots to accommodate everyone's schedule: Dec 05 and Dec 06, 12:30–2:30 pm, at Akers Hall, Classroom 135.
[ Top ]
Activity | Percentage |
---|---|
Assignments | ~25% |
Class participation | ~50% |
Final Exam | ~25% |
[ Top ]
This class is heavily based on material presented and worked on in class, and it is critical that you attend and participate fully every week. Therefore, class attendance is absolutely required. Arriving late, leaving early, or not showing up for a whole class without prior arrangement with the instructor counts as an unexcused absence. If you have a legitimate reason to miss class (such as a job, graduate school, or medical school interview), you must arrange this ahead of time to be excused. More than two unexcused absences will impact your grade at the discretion of the course instructor.
All conduct should serve the singular goal of sustaining an inclusive, supportive, and friendly environment where we can do our best work and have a great time doing it.
- Do work that you’re proud of, from the smallest piece of writing/code to the final exam.
- Be supportive of your classmates; respect each others' strengths, weaknesses, differences, and beliefs.
- Communicate openly & respectfully with everyone in the class.
- Ask for help; at the same time, respect and appreciate others' time and effort.
Respectful and responsible behavior is expected at all times, which includes not interrupting other students, turning your cell phone off, refraining from non-course-related use of electronic devices, and not using offensive or demeaning language in our discussions. Flagrant or repeated violations of this expectation may result in ejection from the classroom, grade-related penalties, and/or involvement of the university Ombudsperson.
I am unequivocally dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We will not tolerate harassment of colleagues in any form. Behaviors that could be considered discriminatory or harassing, or unwanted sexual attention, will not be tolerated and will be immediately reported to the appropriate MSU office (which may include the MSU Police Department).
Intellectual integrity is the foundation of the scientific enterprise. In all instances, you must do your own work and give proper credit to all sources that you use in your papers and oral presentations – any instance of submitting another person's work, ideas, or wording as your own counts as plagiarism. This includes failing to cite any direct quotations in your essays, research paper, class debate, or written presentation. The MSU College of Natural Science adheres to the policies of academic honesty as specified in the General Student Regulations 1.0, Protection of Scholarship and Grades, and in the all-University statement on Integrity of Scholarship and Grades, which are included in Spartan Life: Student Handbook and Resource Guide. Students who plagiarize will receive a 0.0 in the course. In addition, University policy requires that any cheating offense, regardless of the magnitude of the infraction or punishment decided upon by the professor, be reported immediately to the dean of the student's college.
It is important to note that plagiarism in the context of this course includes, but is not limited to: directly copying another student's solutions to in-class or homework problems; copying materials from online sources, textbooks, or other reference materials without citing those references in your source code or documentation; or having somebody else do your pre-class work, in-class work, or homework on your behalf. Any work done in collaboration with other students should state this explicitly and clearly list their names as well as yours.
More broadly, we ask that students adhere to the Spartan Code of Honor academic pledge, as written by the Associated Students of Michigan State University (ASMSU): "As a Spartan, I will strive to uphold values of the highest ethical standard. I will practice honesty in my work, foster honesty in my peers, and take pride in knowing that honor is worth more than grades. I will carry these values beyond my time as a student at Michigan State University, continuing the endeavor to build personal integrity in all that I do."
If you have a university-documented learning difficulty or require other accommodations, please provide me with your VISA as soon as possible and speak with me about how I can assist you in your learning. If you do not have a VISA but have been documented with a learning difficulty or other problems for which you may still require accommodation, please contact MSU’s Resource Center for People with Disabilities (355-9642) in order to acquire current documentation.
In any case, please come and talk to me. You are welcome in this class, and I will do everything I can to accommodate your specific needs.
[ Top ]
1. Why was this course developed? Will this course teach me concepts in statistics?
There are already plenty of existing courses at MSU that teach introductory, intermediate, and advanced statistics. (You can go to Course Descriptions and search using "statistics" under Keyword Search to get the full list.) You can also find a few recommended free online resources here. Teaching statistics will be left to these courses and it will be assumed that you have taken one of these courses (or something equivalent) to learn (≥ introductory) statistics in a traditional manner (which is important without a doubt!).
StatGaps is a non-traditional course aimed at discussing what happens (the issues that crop up and the nuances that become germane) when ideas from traditional courses are applied to actual research, messy data, and real-world problems.
2. Why is coding (in languages like R or Python) a prerequisite for this course?
The goal of this course is to create opportunities for developing a strong intuition behind many concepts in statistical data analysis; this intuition comes first and then motivates and solidifies the related formulae and terminology. Rather than starting with a question and mathematically deriving the concepts and formulae, StatGaps, whenever possible, uses programming to implement statistical ideas on real or simulated data and to observe the results, building intuition for the concepts along the way.
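To give a flavor of this simulation-first approach, here is a minimal Python sketch; it assumes numpy is installed, and the function name, sample sizes, and seed are illustrative choices rather than course material. It repeatedly draws two groups from the same normal distribution, so the null hypothesis is true by construction, and counts how often a two-sided z-test (assuming the population standard deviation is known to be 1) reports p < 0.05. The fraction should land near 0.05, which is exactly what the significance threshold promises under the null.

```python
import math

import numpy as np


def null_false_positive_rate(n_experiments=2000, n_per_group=20, alpha=0.05, seed=1):
    """Fraction of simulated experiments with p < alpha when the null is true.

    Both groups come from the same N(0, 1) distribution, so every
    'significant' result here is a false positive.
    """
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    # Standard error of the difference in means, assuming sigma = 1 is known.
    se = math.sqrt(2.0 / n_per_group)
    hits = 0
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(0.0, 1.0, n_per_group)
        z = (a.mean() - b.mean()) / se
        # Two-sided p-value for a standard normal test statistic.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        if p < alpha:
            hits += 1
    return hits / n_experiments


print(null_false_positive_rate())  # expected to be close to alpha = 0.05
```

A z-test is used only to keep the dependencies to numpy and the standard library; a t-test (e.g., `scipy.stats.ttest_ind`) or a permutation test would show the same behavior. Changing `n_per_group` or `alpha` and re-running is the kind of experiment this course encourages.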
3. What are some specific coding skills I would need for this class?
In this class, you will write code to read in datasets, wrangle them into a convenient format, call common statistical functions from standard packages/libraries, implement simulations/tests (which will involve random number generation and writing for/while loops), and make plots (scatterplots, histograms, boxplots, etc.).
This means knowing the following depending on your language of choice:
- R: tidyverse (readr, dplyr, ggplot2), calculating summary statistics (e.g., mean/median, standard deviation/variance, correlation), generating random numbers (e.g., `runif`, `rnorm`), and writing `for` & `while` loops.
- Python: pandas (data wrangling), seaborn (data visualization), numpy for calculating summary statistics (e.g., mean/median, standard deviation/variance, correlation) and generating random numbers, and writing `for` & `while` loops.
You can find a few recommended free online resources for learning these skills here.
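As a rough self-check of the Python skills listed above, the sketch below touches most of them: reading in data, per-group summary statistics, random number generation, correlation, and `for`/`while` loops. It assumes pandas and numpy are installed; the tiny dataset and its column names are made up for illustration, and plotting is left out to keep the dependencies minimal.

```python
import io

import numpy as np
import pandas as pd

# A tiny, made-up dataset standing in for a real CSV file (hypothetical values).
csv_text = """sample,group,expression
s1,control,5.1
s2,control,4.8
s3,control,5.4
s4,treated,6.0
s5,treated,6.3
s6,treated,5.7
"""

# Read in the data: pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Wrangle: per-group summary statistics (pandas' std uses the sample formula, ddof=1).
summary = df.groupby("group")["expression"].agg(["mean", "median", "std"])
print(summary)

# For loop: iterate over the groups.
for name, sub in df.groupby("group"):
    print(f"{name}: {len(sub)} samples")

# Random numbers: simulate two related variables with a reproducible generator.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation between x and y
print(f"correlation: {r:.2f}")

# While loop: keep drawing uniform numbers until one exceeds 0.9.
draws = 0
while True:
    draws += 1
    if rng.uniform() > 0.9:
        break
print(f"draws needed: {draws}")
```

If any of these steps feels unfamiliar, the recommended resources above are a good place to start before the first class.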
4. Can this course be taken remotely?
Yes! This class will be streamed live via a Zoom link. Details will be available via the class Slack account.
5. I'm a postdoc and I would like to just attend the class to learn for myself - no need for credits or anything. Is that possible?
Yes! To provide some context: this short course is part of an extensive Bioinformatics education program we run out of CMSE. Postdocs (or any MSU affiliates who are not registered students) can audit any of these Bioinformatics modules for a small fee. With StatGaps, however, since I’m still experimenting with it, especially in terms of scaling it up, I’m open to having postdocs, research/staff scientists, and faculty members audit the course for free. The only things I ask in return are your active engagement with the class and its materials and your constructive feedback.
6. Are there going to be exams?
Yes! But rest assured that the point of the exam is not to test you. The exam will give you an opportunity to revisit many of the concepts discussed throughout the class and, in the process, do something practically useful for your future statistical data analyses. We will discuss and nail down the details when we meet in class.
7. Will this short course be converted into a regular semester-long course in the future?
Maybe. There is a lot of benefit to keeping an essential course like this short and crisp. If you are interested in chatting about this and/or helping develop this course further, do get in touch!
[ Top ]