Lack of theming and exercises makes the lesson disjointed #1472

Open
smangham opened this issue Oct 30, 2024 · 2 comments

Comments

@smangham

How could the content be improved?

The lesson is introduced, conceptually, as a realistic research project analysing data files. However, it then almost immediately pivots into fairly abstract and arbitrary work on thesis.txt, extracts of Little Women, random gene sequences of fictional creatures... and these files are scattered across a bunch of subdirectories. The lesson makes very little use of the actual data.

The exercises are also quite abstract, and focus heavily on multiple-choice questions based on "Look at this example directory tree", rather than making use of the actual directory trees in the data we have them download.

I think it'd flow a lot better if:

  1. Basic shell scripts were introduced very early - possibly straight after basics like using wildcards on the command line.
  2. Then, a lot of the multiple-choice exercises could be replaced with 'write a shell script that...' exercises that use the actual data directories in the material - so people can poke around and explore to find the answer if they don't know.
  3. Tools were then introduced with use cases for the actual data - e.g.
    • Using 'find' to get a subset of files
    • Using grep to extract a particular ID/date/time of record from that file
    • Using cut to select a particular column
    • Using loops to repeat this for a particular set of parameters
    • Using shell script inputs to allow the user to specify the column
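
The kind of exercise suggested in the list above might look something like the following. This is only a minimal sketch: the directory layout, file names, CSV columns and record IDs are all hypothetical stand-ins created by the script itself, not the lesson's actual data, and `extract_column` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, created here only so the sketch runs on its own.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/2024"
printf 'id,date,value\nA01,2024-01-05,3.2\nB02,2024-01-06,4.8\n' > "$workdir/data/2024/site1.csv"
printf 'id,date,value\nA01,2024-02-01,2.9\nC03,2024-02-02,5.1\n' > "$workdir/data/2024/site2.csv"
printf 'not a data file\n' > "$workdir/data/2024/notes.txt"

# Shell script input: the column to select is the first argument (default: 3).
column=${1:-3}

# extract_column COLUMN RECORD_ID DIRECTORY (illustrative helper)
extract_column() {
    # find: pick out the subset of files we care about (only the CSVs)
    find "$3" -type f -name '*.csv' | sort | while IFS= read -r file; do
        # grep: keep records whose first field is the requested ID
        # cut: select the requested comma-separated column
        grep "^$2," "$file" | cut -d, -f"$1"
    done
}

# Loop: repeat the extraction for a set of (hypothetical) record IDs.
result=$(
    for record_id in A01 B02; do
        extract_column "$column" "$record_id" "$workdir/data"
    done
)
echo "$result"
```

Run with no arguments, this prints the third column for the two IDs; passing `2` as the first argument would print the dates instead.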

The lesson makes heavy use of wc, sort, head -n and tail -n, but I don't think they're that likely to be part of real pipelines. If selecting specific lines is required then sed -n is the realistic option, whilst head and tail should be introduced for their typical use of peeking at files.
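
To make the comparison concrete, here is a small sketch of both approaches. The file `sample.txt` and its contents are hypothetical and created by the script purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical five-line file, created only for this illustration.
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/sample.txt"

# Selecting lines 2 to 4 by chaining head and tail:
# take the first 4 lines, then the last 3 of those.
via_head_tail=$(head -n 4 "$workdir/sample.txt" | tail -n 3)

# The same selection with sed: -n suppresses default output,
# and '2,4p' prints only lines 2 through 4.
via_sed=$(sed -n '2,4p' "$workdir/sample.txt")

# head and tail used for peeking at the start and end of a file.
first_lines=$(head -n 2 "$workdir/sample.txt")
last_line=$(tail -n 1 "$workdir/sample.txt")

echo "$via_sed"
```

Both selections give the same three lines; the sed form states the line range directly rather than requiring arithmetic on the head and tail counts.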

@froggleston
Contributor

From my perspective, if a learner has never seen the shell before, let alone heard about scripting, then introducing shell scripts early would greatly increase their cognitive load. Learners are increasingly used to cloud-based solutions where the filesystem is abstracted away, so without a mental model of it, writing scripts in early exercises would be a big ask in my experience. By using simple and abstract exercises, a learner doesn't have to focus on the data itself but can focus on typing commands into the prompt and interpreting the output.

Similarly, the multiple-choice exercises attempt to provide formative assessment for the instructor and learners.

Again, I feel the goal of the lesson is to introduce novice learners to what is often a completely alien environment, not to get them writing shell scripts from the outset.

@bkmgit
Contributor

bkmgit commented Nov 12, 2024

Thanks for your feedback. A more coherent narrative could be helpful, though using data from a variety of fields is good because the Software Carpentry curriculum helps people develop software in a variety of fields. Shell scripts are useful, but transitioning to a command-line editor takes a bit of time, and so would make learning more challenging. wc and sort are helpful in processing data. sed would require more introduction to regular expressions.
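
As one illustration of wc and sort in a data-processing role, the sketch below ranks files by line count. The three text files are hypothetical and created by the script only so it can run on its own.

```shell
#!/usr/bin/env bash
# Hypothetical files of different lengths, created only for this illustration.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/long.txt"
printf 'a\n' > "$workdir/short.txt"
printf 'a\nb\n' > "$workdir/medium.txt"

# wc -l counts the lines in each file; sort -n orders the results numerically.
# Reading from stdin makes wc print only the number, and tr strips any padding.
line_counts=$(
    for file in "$workdir"/*.txt; do
        printf '%s %s\n' "$(wc -l < "$file" | tr -d '[:space:]')" "$(basename "$file")"
    done | sort -n
)
echo "$line_counts"
```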

Possibly relevant further reading is Data Science at the Command Line.
