This repository is for internal administration of the course.
For the 2024/25 autumn semester the repository will be at https://github.com/sdam-elte/dslab
Course Administrator: David Visontai
The List of projects and applicants is here
Location and time of meetings: The meetings are held in seminar room 5.128 every Tuesday, starting at 12:30 and ending by 14:00 at the latest. Most of the time students give presentations and report on their progress, so attending the whole session is not obligatory, but it is highly recommended.
The goal of the course is to instil the practical skills needed for exploratory data analysis. With the acquired knowledge, students should be able to perform independent research that requires handling big data. To this end, students will explore a couple of longer-running projects inspired by data-intensive problems drawn from multiple fields such as astronomy, genomics and social networks. Students will familiarize themselves with a wide skill set, ranging from various software engineering techniques to presenting their well-distilled research in a manner that is accessible to the general public.
26/09/2023 - (Meeting) Presentation I: Description of the chosen topic and plan of action (10 minutes maximum for each presentation)
10/10/2023 - (Meeting and progress report submission) Presentation II.
24/10/2023 - (Meeting and progress report submission) Presentation III.
07/11/2023 - (Meeting and progress report submission) Presentation IV.
28/11/2023 - (Meeting and report submission deadline) Final Presentations.
12/12/2023 - (Meeting) Presentation of the reproduced works
- Final report - 10 points
- Quality of the presentations (mainly the final presentation) - 10 points
- Interactive visualization - 10 points
- Reproducibility - 5 points
There will be a maximum of 15 minutes allocated for each presenter and 5 minutes for further discussions.
- All the data and other necessary files will be accessible in the Kooplex system, in the
/v/courses/datascilab.public/
directories.
- If you'd like to access a large file that is not there yet, please notify the administrator.
- All material should be uploaded to the Kooplex system. If you use another platform for your presentation, put all the information needed to access it into a file and submit that file.
- Large data files (> 100MB) that are produced during the workflow and are necessary for obtaining the final results should also be kept in the
/v/courses/datascilab.public/
directory. Before submitting your work, please ask the administrator (David Visontai in this case) to copy them into the right directory.
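To check which of your files fall over the 100MB limit mentioned above, something like the following minimal sketch may help. The function name `find_large_files` and the default start directory are hypothetical, and the sketch assumes that "100MB" means 100 * 1024**2 bytes; adjust the threshold if a different convention is intended.

```python
from pathlib import Path

# Assumption: "100MB" is interpreted as 100 * 1024**2 bytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit_bytes=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files under `root` that are
    strictly larger than `limit_bytes`, sorted from largest to smallest."""
    large = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: scan the current working directory.
    for path, size in find_large_files("."):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

Running the script from your project directory prints the files you would need to ask the administrator to copy.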
Please upload all the notebooks and scripts that are needed to reproduce the results!
Please comment all necessary steps, functions, etc.! Put yourself in the position of another person who will try to read your code:
- what questions will they ask
- what steps are not obvious
- The notebooks and scripts must contain comments, help text and docstrings, bearing in mind that someone in the future might want to reproduce the results.
- If you have any technical problem or question, please feel free to file an issue in this GitHub repository so that anyone can answer it and everyone can see the right answer.
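
As an illustration of the commenting guidelines above, here is a minimal sketch of a documented helper. The function `moving_average` is a hypothetical example and not part of any course material; it only shows one way a docstring and inline comments can answer the questions a future reader is likely to ask.

```python
def moving_average(values, window):
    """Return the simple moving average of a sequence of numbers.

    Parameters
    ----------
    values : list of float
        Input data, in the order in which it was measured.
    window : int
        Number of consecutive points averaged together; must be positive.

    Returns
    -------
    list of float
        One average per full window, so the result has
        len(values) - window + 1 elements (empty if the window is too long).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if window > len(values):
        return []

    # Sum the first window once, then slide it: add the point that
    # enters the window and subtract the point that leaves it.
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages
```

For example, `moving_average([1, 2, 3, 4], 2)` averages the pairs (1, 2), (2, 3) and (3, 4).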