- Understand the standard input and output of Linux
- Be able to use input and output redirection and pipes
- Be able to count, shuf, sort, find, and replace texts in a file
- Be able to download data from the internet
- Be able to transfer data between remote and local machine
- Understand Linux variables and be able to create one
- Understand and be able to use for loop
In Linux, when you enter a command as an input, you receive an output. It's the basic workflow of Linux.
By default:
- The standard input (stdin) device is the keyboard.
- The standard output (stdout) device is the screen.
The input can be redirected from a file or another command's output. To redirect input from a file, we use the <
symbol followed by the filename. For example:
cat < sample_1.fastq
works the same as cat sample_1.fastq
which prints the file contents on the screen.
We can redirect the output to a file using the >
symbol. For example:
head -n 4 > test.txt
This redirects the output of the command head
to a text file, we can use cat
to view what's in the result file.
Make sure to use correct filename, output redirect overwrite everything in an existing file.
If you want to append contents to an existing file, you can use >>
. Let's try to append something to file test.txt
:
ls >> test.txt
Pipe is another form of redirection, it can redirect the output of the first command to the input of the second command. For example:
# list only the first 5 files in /etc
ls /etc | head -n 5
We can pipe as many commands as we like. For example:
# list only the fifth file in /etc
ls /etc | head -n 5 | tail -n 1
This command is used to count the number of lines, words, and characters in a file.
wc sample_1.fastq
The output should be:
73428 165213 11164775 sample_1.fastq
It is useful when we trying to get an idea of how many data we have inside a file. For example, fastq files are formatted as 4 lines representing one sequence. So, we can know that there are 73428/4=18357 sequences in the sample fastq file.
There are options to only display one data:
wc -l
only shows the number of lines.wc -w
only shows the number of words.wc -c
only shows the number of characters.
shuf
allows you to shuffle a file by lines.
sort
allows you to sort a file by lines.
uniq
allows you to remove duplicates of presorted files.
We won't use these commands in our course but they are very useful. If you would like to learn more about them, you can read the help manual or watch tutorials online.
This command is used to search text in a given file or input, it searches for lines containing a match to given text.
First, let's create a new text file called linux_unix.txt
and input the following information for us to search later.
linux os
unix os
linux is developed from unix
macos is developed from unix
linux and macos are siblings
To search text in a file:
grep 'pattern' [filename]
Please search 'linux', 'unix', and 'macos' in file linux_unix.txt
.
You can use grep
with pipe to search things in an output of another command:
ls | grep 'fastq'
Some options to search with functionality:
grep -i # case-insensitive search
grep -c # print only the count of matching lines
Please use case-insensitive option to search 'Linux' and 'LinuX'.
Please try the -c
option.
The echo command is used to display text or output to the terminal or standard output (stdout). It simply prints the text or variables to the screen. For example, to print a sentence on the screen:
echo "Hello, world"
You can use either double quotes or single quotes to quote the sentence.
This command allows you to translate or delete characters. It cannot take a file as the argument, so we normally use tr
with pipes |
or input redirection <
.
To translate 6 to 7:
echo 'file_6' | tr 6 7
The tr
command cannot translate strings, it only works on characters. For example, if I want to replace the name Jiajia with another name:
echo 'Welcome Jiajia' | tr 'Jiajia' 'David'
It gives us:
Welcome Dddidd
Which is not something we were expecting. That is because it translates each character rather than find the whole thing and replace it. To find and replace strings, we will use another command called sed
.
The tr
command works really well when we want to :
1. Convert between lower and upper case
tr "[:lower:]" "[:upper:]" < linux_unix.txt
2. Convert between space/tab/newline characters etc
# convert space characters to newline characters
tr ' ' '\n' < linux_unix.txt
3. Delete characters
Here we use the -d
option of tr
.
echo 'Hello, world' | tr -d 'o'
Delete all the space characters:
tr -d ' ' < linux_unix.txt
Because the tr
command doesn't work on files directly, you need to use output redirect >
to save the result.
sed
can perform text manipulation on files. It reads input line by line, performs specified operations on the text, and outputs the modified result. sed
supports a variety of commands and options to perform a wide range of text transformations.
Find and replace string:
The syntax is sed s/pattern/replacement/g filename
.
sed 's/linux/unix/g' linux_unix.txt
For more functionality of sed
, you can read the help manual.
This command allows you to download files from a server using various proticols, including HTTP, HTTPS, FTP, FTPS, SCP, SFTP, and more.
Let's use it to download another fastq file from a FTP address.
curl -O ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/004/SRR2589044/SRR2589044_1.fastq.gz
Can you explain to me what the -O
means?
scp
allows you to transfer files between a remote machine to a local machine, since we only use our local machine this time we don't need to transfer. But this command is useful when you working on a remote machine.
rsync
allows you to transfer directories with everything in it while scp
only allows file transfer.
In Linux, variables are used to store data, such as text strings, numbers, or arrays, with a name that serves as a reference to that data. The syntax to create a variable is:
variable_name=value
For example, to create a variable called name
with the value John
:
name="John"
To access the value of a variable, we use the $
sign followed by the variable name:
echo "My name is $name"
Now, try using single quotes for the above command:
echo 'My name is $name'
What did you get? It printed out $name
rather than the actual value we refer to. It is because even though both single quotes ''
and double quotes ""
are used to define string literals, they have some differences in behaviours and effects.
Text enclosed in single quotes is treated as a literal string. This means that everything between the single quotes is preserved exactly as written, and no special characters are interpreted.
For example, if you have a variable $var
within single quotes, it will be treated as the literal string "$var", not its value.
Single quotes are often used when you want to ensure that the text remains unchanged, without any variable substitution or interpretation of special characters.
Text enclosed in double quotes allows for variable expansion and interpretation of special characters within the string. For example, if you have a variable $var
within the double quotes, its value will be inserted into the string.
A loop is a powerful programming tool that enables you to execute a set of commands repeatedly. There are two types of loops in Linux, for loop and while loop. Here, we will learn for
loop as it is more frequently used in data analysis.
for
loop allows you to iterate over a list of items and perform actions on each item in the list. The syntax looks like:
for thing in list_of_things
do
command_1
command_2
...
done
To print each of the elements in the list, we can run:
for i in a b c
do
echo $i
done
or
fruits="apple orange banana"
for fruit in $fruits
do
echo "I like $fruit"
done
For loop is very useful when you want to perform the same activity on a list of things.
- OpenAI - ChatGPT