diff --git a/_quarto.yml b/_quarto.yml index f7913db..9b0be9c 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -59,11 +59,11 @@ website: # - icon: fa-github fa-lg # href: https://github.com/rudeboybert/SDS192 - sidebar: - style: docked - contents: - text: "Project 1" - href: project1.qmd + # sidebar: + # style: docked + # contents: + # text: "Project 1" + # href: project1.qmd page-footer: right: diff --git a/course-materials/Final-Project/Project.qmd b/course-materials/Final-Project/Project.qmd index ab53065..913a65e 100644 --- a/course-materials/Final-Project/Project.qmd +++ b/course-materials/Final-Project/Project.qmd @@ -1,214 +1,70 @@ --- -title: "" +title: "Final Project" +subtitle: "Due: December 16, 2024" +author: "Schwab" +editor: visual +toc: false --- -## Project Overview +## Final portfolio -As part of the coursework requirement for SDS 220, you will conduct a quantitative analysis on a real data set with the goal of using data analysis to address a research question. Your final product will be a written report describing the motivation for the project, details of the data used in the analysis, and the conclusions drawn from your analysis. The project is an opportunity to integrate what you've learned about data wrangling, data visualization, and statistical inference with statistical communication skills. +For your final project you will build a website displaying your three projects from the semester. -The core project requirements pertain to the statistical analysis you perform, and the narrative structure of your written report. Roughly speaking, your data analysis project will address a question like "How is {Variable X} related to {Variable Y}". This means that your data set must measure several variables. +This final project is a portfolio of your work, you can choose to leave it on the internet or take it down after it is graded. -Your analysis must also include at least one hypothesis test that allows you to draw an inferential conclusions about how your outcome variable is related to your explanatory variable(s). For example, you may include a hypothesis test for a difference in means, a hypothesis test for a difference in proportions, or a hypothesis test on the coefficients of a regression model. +You do not need to remake your projects, but you may want to proof read the blog posts. -When choosing which variables to use in your analysis, it is prudent to keep the hypothesis testing requirement in mind. For example, before choosing a categorical variable with 5 levels as your outcome, and a numeric explanatory variable, ask yourself: Do I know how to conduct a hypothesis test that would answer a research question about how these kinds of variables relate? If not, it is recommended that you choose a different set of variables to analyze. +You do not need to include your name on any of your work and you need to get permission from your partner to include their name. -Your report should be structured as a narrative document that introduces the topic of your analysis (including your research question(s) and the goals of your data analysis), explains the nature of the data you are analyzing in sufficient detail, explains what analyses are being performed and what can be learned from them, and ends with overall conclusions related to your research question that are supported by your data analysis. Your writing should be supplemented with illustrative figures and tables that help the reader understand how these data help answer the questions of your investigation. Your report should also cite at least one outside source of information (in addition to citing the source of your data), though that source does not necessarily have to be a peer-reviewed \`academic' article. +(If you are actually making a digital portfolio, that you plan to keep on the internet, your name would be included in the about me section.) -## Stages and Timeline -You will complete your project several stages outlined below. More details about what is required to complete each stage is described in the subsequent sections. +## Requirements: -- **Friday, November 17 - Group formation and Data Set Selection (2 points)** +1. Each of your projects must be individual files on different webpages. They must be rendered to html. Please check your live website to make sure everything looks good and the links work. - +2. You must remove the code, warnings and errors from each rendered file. - +3. Your website will have 5 pages minimum: - + i. Landing page - This must have an image of some data graphic, this could be anything you made, including work from SDS 100, 220 or a lab. From the landing page people should be able to navigate to your three projects and your about me section. - + ii. One page per project. -- **Data Analysis Plan ** + iii. An about page so people can contact you to hire you. You can choose to have genuine information here but no contact information until you are ready to graduate. You can also choose to post fake information that seems legitimate and professional. You cannot add identifying info about your partner here. Once again fake information is completely okay, I'll be able to verify your github username. -- **Project Presentation** +4. You will make use of the nav bar. -- **Saturday, December 20 - Written Report** +5. Your website should look professional. Use any of the skills from 192 or 100. Consider headers and links. -- **Saturday, December 20 - Reflection** +6. Your website must be pushed to github and turned on and working.* Please test it by visiting you site. When you are done submit the link to your site to the forum on moodle. -## Group formation and Data Set Selection +7. When I'm done grading I'll let you know and you can turn off your portfolio. -You will work in pairs to complete this project, and you will choose your partner yourself. If you do not have a good idea who you want to work with, I encourage you to solicit potential project partners in the course slack. +8. (Optional) add a 404 page. -Informing the instructor who you are working with is considered the first checkpoint of the group project assignment. You are expected to fill out the provided form before the deadline posted. In the event you are unable to join a group by the deadline, or do not inform the instructor, you will be assigned to a group by the instructor. +## Website options: -Once you have formed your group, the next step in the project is chosing your data set and planning your analysis. You are free to use any data set the meets the below requirements: +1. [Here is the website for quarto websites.](https://quarto.org/docs/websites/) -- You may not use any variable representing temporal measurement (months, days, years, etc.) as explanatory variable +2. Change the [theme of your website.](https://quarto.org/docs/output-formats/html-themes.html) -- You may not use proportions/percentages as your outcome variable +3. Customize your website using ideas from [here](https://ucsb-meds.github.io/customizing-quarto-websites/#/title-slide). -- The data set must be in a "tidy" data format to start with +## Some examples of digital portfolios -- There must be at least 10 observations per level in all categorical variables +A perfect Digital Portfolio takes time to make, yours does not need to be perfect. It is just a start, you can add work to it throughout your college and post-college career. -- You may not use any data sets used in this class, the Into to Modern Statistics \[IMS\] book, or any data used as part of example demonstrations. +[Erin Schwab's work](https://erinschwab.webflow.io/) -To simplify the process of choosing a data set, a curated list of data sets that are appropriate for this project is provided [HERE](ListOfProjectDatasets.pdf). Additionally, if you wish to use a data set not on the curated list, you must get approval. +[A website with examples](https://careerfoundry.com/en/blog/data-analytics/data-analytics-portfolio-examples/) -You will inform the instructor who you are working with, and your chosen data set using the form provided [HERE](https://forms.gle/LKV829Nu6rksmMCB9) (only one group member needs to fill out the form). This form is graded on completion provided it is submitted by the due date. +## A few things to note: -## Data Analysis Plan +- I am meant to protect your identity as students. Eventually you will want contact information on your portfolio, but leave it off for now. Don't even mention Smith College. -Once you've chosen a data set, you must form a research question, and plan an analysis suitable to address your question. One member of your group must fill out the Data Analysis Pre-Registration form. This form to submit this is HERE, and will ask you: +- After your website is graded I'll let you know and you should turn it off. -- What variables you plan to analyze +- This is meant to be a simple skill that students last Fall wished they knew how to do. It should not be very very time consuming. Make something that meet requirements then move on. You can perfect it later. -- What research question you have - -- What hypothesis you have about your variables - -- How you plan to test your hypothesis - -- How will you divide the work - -Completing the Data Analysis Pre-Registration form is considered the second checkpoint of the group project assignment. You are expected to fill out the Data Analysis Pre-Registration form before the deadline posted. If no one from your group fills out the Data Analysis Pre-Registration form by the posted deadline, your group will be assigned a data set by the instructor, and a receive a 0 for this portion of the grade. - - -Note, your plan can have errors, this is chance to have feedback. We are looking to see if you have thought carefully about a plan. You are also allowed to make minor changes your plan after the form has been submitted. - - - - - - - - - - - - - - - - - - - - - -## Data Analysis Report - -In this final phase, you will execute the project by writing a technical report that introduces your topics, describes your data and analysis, and explains the conclusions that are supported by your analysis. Many of the key features of this report have already been described in the Core Project Requirements section. In general, your report should follow this basic format: - -1. Introduction: an overview of your project. In a few paragraphs, you should explain clearly and precisely what your research question is, why it is interesting, and what contribution you have made towards answering that question. You should give an overview of the specifics of your model, but not the full details. Most readers never make it past the introduction, so this is your chance to hook the reader, and is in many ways the most important part of the paper! - -2. Data: a brief description of your data set. This section should introduce the reader to the data by answering questions such as: what variables are included? Where did they come from? What are units of measurement? What is the population that was sampled? How was the sample collected? This section should also acquaint the reader with your data set through the use of univariate and multivariate visualizations, and univariate and multivariate summaries. - -3. Results: A hypothesis test set up to address your research questions(s). You should interpret the results in the context of the data explain their relevance. If you are using a regression model in your analysis, each of it's coefficients should be clearly explained. You should include negative results, but be careful about how you interpret them. For example, you may want to say something along the lines of: 'we found no evidence that explanatory variable x is associated with response variable y,' or 'explanatory variable x did not provide any additional explanatory power above what was already conveyed by explanatory variable z'. On other hand, you probably shouldn't claim: 'there is no relationship between x and y'. - -4. Dicussion/Conclusion: a summary of your findings and a discussion of their limitations. Be sure to remind the reader of the question that you originally set out to answer, and summarize your answer to the question. Your discussion should also protect yourself against misinterpretation by being clear about what is not implied by your research. Finally, you should also discuss the limitations of your analysis and/or data, and how it could be augmented or improved with future research. This \`future questions' portion of the discussion should include a brief exploratory analysis showing how your project could be extended to include additional explanatory variables in the future. - -5. Appendix (Potentially Optional): If you have any supplemental analyses that would be of interest (e.g., examining the assumptions of your two-sample t-tets), you may wish to place those analyses in an appendix at the end of your report. - -6. Bibliography: Your report should contain at least two references (one of those references must be the source of your data). You can use any reasonable citation style (e.g., APA or MLA format), but regardless of style, your bibliography should clearly identify the primary source you are referencing. - -To complete the project, your group should submit - -1. Your written report as either a PDF document, - -2. The .qmd or .Rmd file that, when rendered, creates the self-contained PDF document for the report. - -3. Your bibtex references file (and your citation style file, if any citation style file was used) - -There is no limit to the length of the technical report, but it should not be longer than it needs to be; you should strive to express your ideas concisely and precisely in all situations. Even though you are expected to write your technical report as an Rmarkdown or Quarto document, the PDF file you submit should not display any R code, warnings, or messages. Your technical report should also be well formatted and well organized (e.g., using properly formatted headers to divide sections, using LaTeX markup for mathematical symbols where needed, tables should have bolded column and row labels, etc.). - -## Guidelines for Successful Statistical Writing - -This document should be written for peer reviewers, who comprehend statistics at least as well as you do. You should aim for a level of complexity that is more statistically sophisticated than an article in the Science section of The New York Times, but less sophisticated than an academic journal. For example, your report will use terms that that you will likely never see in the Times (e.g. a $t$-statistic), but you should not dwell or expound on technical points with no obvious ramifications for the reader (e.g., explaining that your plots were made using ggplot2, or including the definition of a p-value after reporting the p-value of your test statistic). Your goal for this paper is to convince a statistically-minded reader (e.g. a student in this class, or a student from another school who has taken an introductory statistics class) that you have addressed an interesting research question in a meaningful way. But even a reader with no background in statistics should be able to read your report and get the gist of it. - -A good example of how the writing in this report might differ from the writing you've done on a HW assignment or exam is how you describe (or rather, don't) describe your null and alternative hypotheses. In a HW assignment or exam, you might be asked to explicitly state your null and alternative hypotheses in words and symbols, and included sentences like 'The null hypothesis is that in the population of all babies, there is no difference in the average birth weight between babies born to smokers babies born to non-smokers' or equations like $\mu_A – \mu_B = 0$. But in a research paper, you would not include such literal descriptions of the null hypothesis; rather, you would explain your research question ('Does the smoking behavior of pregnant women affect their baby's birth weight?') and how you chose to address this research question with data analysis (e.g., a two-sample t-test, using a two-tailed p-value). In other words, you shouldn't explicitly write the null and alternative hypotheses, but a statistically literate reader should be able to answer the question 'What are the null and alternative hypotheses being tested?' based on your writing. - -This report is not simply a dump of all the figures, tables, and calculations that you made during this project, or a story about all the things you tried out in the beginning of the project but didn't end up keeping. Rather, the technical report should be focused and concise, and based on the minimal set of R code that is necessary to understand your results and findings in full. If you make a claim about your research question or data, it must be justified by explicit calculation. A knowledgeable reviewer should be able to run each line of code in your .qmd or .Rmd file without modification, and verify every statement that you have made. - -And even though your report may be written in a Quarto document, and rendered in RStudio, its primary purpose is to be a narrative report that describes how you addressed your research question, not a computer program or a lab notebook with technical details and no context. You should not present tables, figures, or calculations without a written explanation of the information that is supposed to be conveyed by that table or figure. Keep in mind the distinction between data and information. For example, simply displaying a table with the means and standard deviations of your variables with not explanation is not meaningful. And adding an accompanying sentence that reiterates the content of the table (e.g. 'the mean of variable x was 34.5 and the standard deviation was 2.8...') is equally meaningless. What you should strive to do is interpret these values in context (e.g. 'although variables $x_1$ and $x_2$ have similar means, the variability of $x_1$ is much larger, suggesting...'). - -## Written Report Grading - -Below is a rough list of features I will be looking for when judging the quality of the final project report. Each section will be graded out of 5 points, for a total of 25 points: (5) Exceptional, (4) very good, (3) satisfactory, (2) below expectations, (1) reasonable attempt. - -**Introduction** - -- Is the topic clearly explained? - -- Is the research question / goals for the analysis clearly explained? - -- Is there any justification or motivation for this study? - -- Are the outcome and explanatory variables of interest clearly identified? - -- Are important parts of upcoming analysis and conclusion foreshadowed? - -- Is the source of the data clearly explained and cited? - -**Exploratory Data Analysis** - -- Are any important data wrangling steps (e.g., filtering missing values, recoding variables) and any impacts of these steps clearly described? - -- Is the number of observations clearly identified? - -- Are the distributions of each outcome and explanatory variable visualized and clearly described? Are the distributions summarized with appropriate statistics? - -- Are the relationships between variables explored with visualizations? Are these relationships discussed and interpreted in writing? - -- Do all figures and tables have captions, and is the content of figures/tables sufficiently unpackage/explained for the reader? - -- Are the content of the figures and tables overly redundant (they should not be!) - -**Results** - -- Is the type of test you are using made clear? - -- Is the relationship between the research question and hypothesis test made clear? - -- Are the test statistic, p-value, degrees of freedom (t-test only) and alpha level clearly explained? - -- Are the results of the test clear? - -- Are you including a confidence interval around the statistics relevant for your hypothesis test? - -- Is your Null Hypothesis Distribution shown? - -- Are your model's coefficients clearly interpreted, and is the fitted regression equation presented (regression only) ? - -- Are your hypothesis test's assumptions, if any, investigated to make sure they are reasonable assumptions to make? (These assumptions may be assessed in an appendix section) - -**Discussion** - -- Are all the conclusions explained in a clear and straightforward manner such a that person without statistical training or domain expertise can understand them? - -- Are the results of your analysis clearly tied back to your analysis goals and research questions? - -- Is there any discussion of what the results mean (e.g., are the results and conclusion connected back the motivation for your study?) - -- Are limitations of your analysis and/or data acknowledged? - -- Does your discussion include a brief exploratory analysis showing how your project could be extended to include additional explanatory variables in the future? - -**Writing Style** - -- Overall throughout the report, are the ideas expressed precisely and concisely? - -- Overall throughout the report, is the writing and exposition in the report geared towards someone with a similar level of statistical training (e.g., the writing doesn't rely heavily on jargon, but also doesn't elaborate on minute details that don't benefit the readers understanding of the research). - -- Does the report include a citaiton for the source of the data, and reference at least one outside source, using approrpriate in-text citations and a formatted bibliography? - -- Is the R code hidden from the reader, and is the report free from messages, warning messages, and error messages? - -- Is the output from your R code appropriately styled (i.e., are tables formatted as tables?) and is the report free from 'random' outputs (e.g., a p-value being printed out in the middle of a paragraph) - -## Reflection - -You will write a short reflection piece describing what you learned from working on this project and how well you and your group navigated the collaborative learning process. - -Details coming soon! +- If you would prefer to not post a website online, you still have to make one, then you have to invite me to your repo so I can download it and make sure it works. You have to let me know you are doing this before the due date. diff --git a/docs/course-materials/Final-Project/Project.html b/docs/course-materials/Final-Project/Project.html index f0a08b9..61e14bc 100644 --- a/docs/course-materials/Final-Project/Project.html +++ b/docs/course-materials/Final-Project/Project.html @@ -2,12 +2,13 @@
- + + -