Bash:Bash-Text files

From Juneday education


Meta information about this chapter

Introduction

This chapter introduces text files and common commands to operate on text and text files.

Purpose

The purpose of this chapter is to prepare the students for manipulating text and text files using common bash filters and commands. This is a central skill for most tasks involving computers, be it systems administration, databases or programming. If we take programming as an example, programming largely consists of creating and changing text files containing source code. Code checkers and compilers operate on those text files, and it is common to use build tools or setup scripts for testing and deployment. A good understanding of the text manipulating capabilities of the shell is a great help for anyone working with computers.

For database students, it might be interesting to compare the searching and filtering of text files to the use of SQL for similar tasks. Another point is that backup, import and export of databases are typically done using scripts or directly on the command line.

Systems administration and operating systems students will be exposed to a plethora of configuration files and batch scripts. Knowing how to work with text and text files is, in our belief, a great help for such students and professionals.

Goal

The goal of this chapter is to give a basic knowledge of text and text file manipulation using filters and other commands.

Instructions to the teacher

Common problems

Some students have a hard time understanding the concept of text files. Text files are of course a special case of digital files in that we can "look at them" using the built-in support for interpreting text which comes with the shell.

Some concepts regarding text and text files which are often perceived as magical or mysterious include escape sequences and special control characters such as newline and end-of-file. The difference between how Windows encodes a line ending (carriage return followed by newline) and how most other systems do it (newline only) is a common cause of confusion, e.g. when students take a source code file created on GNU/Linux and open it in Windows using, for instance, Notepad. Using cygwin and more capable editors than Notepad greatly helps with this. Another possible cause of confusion (particularly for programming students) is indentation with a mix of TABs and spaces, which look the same to the students but may very well be interpreted quite differently by various editors and other software.
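
One way to take the mystery out of these invisible characters is to show them to the students. The following is a minimal sketch using od -c, which prints every character of its input and shows control characters as escape sequences. The sample strings are made up for the demonstration:

```bash
# A line ending the UNIX way: a single newline, shown as \n
printf 'hello\n' | od -c

# The same line ending the Windows way: carriage return + newline, shown as \r \n
printf 'hello\r\n' | od -c

# Two lines that look alike in many editors: the first is indented with a TAB
# (shown as \t), the second with four spaces
printf '\tx\n    x\n' | od -c
```

Running od -c on a student's own source file is a quick way to find out which line endings and which kind of indentation the file really contains.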

The encoding of text might also be worth bringing up here. A common surprise for students is when they write some software which outputs non-ASCII characters, e.g. the Swedish letters å, ä and ö, and the output looks fine on one system but garbled on another.
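
The root of such surprises is that the same letter can be stored as different bytes depending on the encoding. As a small sketch, the commands below print the bytes (in hexadecimal) of the letter å in two common encodings. The bytes are written as octal escapes so that the example does not depend on how this page itself is encoded:

```bash
# The letter å encoded as UTF-8 takes two bytes: c3 a5
printf '\303\245' | od -An -tx1

# The same letter encoded as ISO-8859-1 (Latin-1) is the single byte e5
printf '\345' | od -An -tx1
```

A program that writes one of these byte sequences to a terminal expecting the other encoding will show the wrong character, even though the program itself did nothing wrong.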

Exercises

Ex 01 - A closer look at echo for outputting text

We suggest watching: Order and creating files (eng) before doing this exercise.

From the previous chapter, you know how to create a directory and enter it. Create a directory called Ex00-text in your home directory and enter it:

$ mkdir Ex00-text
$ cd Ex00-text

Now that we have a dedicated directory for this session, we can keep the files we create in one place and we’ll be able to find them later.

But before we create any files here, we will look at the echo command. This command is used for printing text to the standard out stream (which in a terminal is the terminal itself, unless we tell the shell that standard out should go somewhere else). It works a bit like the print family of functions in most programming languages.

Start by echoing some text to the terminal:

$ echo "hello bash"

That wasn’t so hard, was it? Let’s learn some more about echo! How can I echo a string that contains a " (double quote)? There are (at least) two ways. Either we enclose the whole string in single quotes:

$ echo 'This is a double quote: "'

The result will be:

This is a double quote: "

Or, we can escape the double quote with a backslash when inside double quotes (the output is shown on the line after the command):

$ echo "Here's another one: \""
Here's another one: "

OK, but how do I echo a single quote when inside single quotes? Well, you can’t, but you can close the single-quoted string and add an escaped single quote right after it:

$ echo 'Here is a single quote: '\'
Here is a single quote: '

Note that the above is composed of two "strings". The first one uses single quotes: 'Here is a single quote: ' and it is immediately followed by a single escaped single quote: \' .
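
The same technique works in the middle of a text. As a small sketch (the sentence is made up), here are two ways to echo a text that contains a single quote. Both commands print the same line:

```bash
# 1. Close the single-quoted string, add an escaped single quote,
#    then open a new single-quoted string for the rest of the text
echo 'It'\''s a single quote'

# 2. Use double quotes around the text; inside double quotes
#    a single quote needs no escaping
echo "It's a single quote"
```

The second form is easier to read, but remember that bash expands variables and some other special characters inside double quotes.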

What else is the difference between double and single quotes? A lot. For instance, inside double quotes, variables are expanded to some value. But inside single quotes, text is treated verbatim. Example:

$ var="Hello bash"
$ echo '$var'
$var
$ echo "$var"
Hello bash

(A variable is a named piece of memory where we can store some value - see for instance UNIX - using variables or Bash variables)

We declared a variable called var and initialized it to “Hello bash”. Then we echoed $var inside single quotes. It was echoed verbatim. Next we echoed $var inside double quotes, and the value of the variable was echoed instead.

Bash has a number of very useful expansion features as well. Try this:

$ echo Number{1,2,3,4,5}

What was printed on the screen? And what about this:

$ echo Number{1..9}

Solution to ex 01

A major takeaway from exercise 01 is that you can use either double quotes or single quotes around text you want to echo. If you don't want bash to expand (interpret or evaluate) special characters such as the dollar in $variable_name, you can use single quotes. Single quotes are mostly used when you want to express "exactly this text as I wrote it".

Regarding the last parts using {some expression} here's what the result is:

$ echo Number{1,2,3,4,5}
Number1 Number2 Number3 Number4 Number5
$ echo Number{1..9}
Number1 Number2 Number3 Number4 Number5 Number6 Number7 Number8 Number9
$

The trick is called "brace expansion" or "curly brace expansion". It is a useful and powerful technique, and you can read about it, with some examples, at Linux Journal. Imagine that you have files which follow some sequential pattern, like file1 file2 file3 file4. Then you can express that list of files using file{1..4} and apply some command or action to all of them. Very useful indeed.
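
To make the file example concrete, here is a minimal sketch that assumes you are running bash (brace expansion is not available in every shell). The file names are made up, and the commands run in a scratch directory created with mktemp so that nothing else is touched:

```bash
# Work in a scratch directory so we do not clutter the current one
cd "$(mktemp -d)"

# Create file1, file2, file3 and file4 with a single command
touch file{1..4}

# List them; bash expands the braces before ls is started
ls file{1..4}

# Braces can also sit in the middle of a word
echo report_{jan,feb,mar}.txt
```

The last command prints report_jan.txt report_feb.txt report_mar.txt. Note that echo only prints the names; brace expansion works on text and does not care whether files with those names exist.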

Ex 02 - Creating text files

We suggest watching: Order and creating files (eng) before doing this exercise.

OK, now we know a little about printing text. Let’s create a text file. A common way is of course to use an editor for that. We strongly advise against using Notepad. There are a number of superior alternatives. We recommend using the Atom editor. If you really want to become efficient, we recommend learning an editor that you can run inside the terminal, such as emacs or vim. By that, we mean editors you can start interactively inside the terminal window without opening a new window. But those editors deserve courses of their own, so we’ll let you pick an editor yourself. Be aware, though, that Notepad does not play well at all with cygwin and UNIX-like systems, because it insists on encoding newlines in a way specific to Windows, whereas most other editors on Windows work fine with text files created in cygwin and other UNIX-like environments (this is, of course, part of the reason we advise against using Notepad).

Let’s assume we have installed Atom and want to create a text file here in our Ex00-text directory. We’d do that by starting Atom from the command line (after changing directory to Ex00-text) and giving it the name of the file we want to create:

$ Atom.cmd flowers.txt

Since there is no such file, Atom is kind enough to create it. Type in some flowers, one on each line. Save the file and return to cygwin (click on the terminal to activate that window again). Then type ls to see if the file was created in the current directory:

$ ls
flowers.txt

As you see, the file is there.

Solution to ex 02

This is not really a solution but rather a comment about the exercise. It is important that you get familiar with using at least one decent text editor (like Atom), regardless of whether you plan to study or work with systems administration, databases or programming (or even web development). Spend some time getting familiar with the editor you choose, and learn how to create, open and edit files directly from the command line. This will save you a great deal of time compared to using the command line as one separate tool and some editor as another, because you are often using the files from the command line anyway. Therefore it is only natural to open or create the files directly from the terminal - you are probably already doing some work in the directory of the file or somewhere close to it.

Ex 03 - Text files - printing to screen using cat

We suggest watching: Order and creating files (eng) before doing this exercise.

Let’s see what flowers I (or you, if you put some yourself!) put in the file. But how can we inspect a text file inside the terminal? There is an app for that! The command is cat, which is a wonderful and very capable command. Let’s run cat and give it flowers.txt as the argument:

$ cat flowers.txt
Tulip
Rose
Sunflower
Daisy

So, we can use cat in order to print the contents of a file to standard output (a stream that by default is connected to our terminal window). For those of you who know Java, standard out is the stream connected to the System.out object. But, we’ll talk more about streams later on. Let’s instead see what more we can do with our text file. Let’s run cat backwards. That would be tac (cat spelled backwards):

$ tac flowers.txt
Daisy
Sunflower
Rose
Tulip

Our file is printed in reverse order. What if we wanted to print the text of each line in reverse? There is of course a command for that too! Try this:

$ rev flowers.txt
piluT
esoR
rewolfnuS
ysiaD

Solution to ex 03

Using cat to output a text file directly to the terminal is a very useful practice. If you just want to inspect some (small) text file, there is often no need to start an editor just to look at the contents. As you will learn in later chapters, you can even use cat to create brand new text files, which is a great time saver. We want to inspire you to learn many of the tools of the great toolbox which comes with cygwin and bash.
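
As a small preview of creating files with cat, here is a sketch that uses a so called here-document, a shell feature which the later chapters may present in a different way. It writes the four flowers from this exercise to flowers.txt and prints the file back. Be careful: the > overwrites flowers.txt if the file already exists in the current directory.

```bash
# Everything between the two EOF markers becomes the content of flowers.txt.
# Note: > replaces the old content if flowers.txt already exists.
cat > flowers.txt <<'EOF'
Tulip
Rose
Sunflower
Daisy
EOF

# Print the file to check the result
cat flowers.txt
```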

Ex 04 - Text files - searching for text using grep

We suggest watching: Grep (eng) before doing this exercise.

We can also search inside a text for a string of text. For this, we’ll use the grep (or egrep) command(s). To make it more interesting, I’ve added some more flowers to the file (do it yourself and try this!):

$ cat flowers.txt
Tulip
Rose
Sunflower
Daisy
Violet
Primrose
Daffodil
Forget Me Not
Edelweiss
Foxglove
Gerbera
Gladiolus
Crocus
Lily
Petunia
Narcissus
Mimosa
Lotus
Holly
Iris
Dahlia

Now, let’s say we want to search in the file and list (print to the standard out) only flowers that contain the letter ‘t’. We use the command grep for that:

$ grep t flowers.txt
Violet
Forget Me Not
Petunia
Lotus

What about if we don’t care if it’s lower or upper case ‘t’?

$ grep -i t flowers.txt
Tulip
Violet
Forget Me Not
Petunia
Lotus

The flag (or option as we also call it) -i stands for “ignore case”. It seems to work fine. What about the string “ol”?

$ grep ol flowers.txt
Violet
Gladiolus
Holly

If we hate flowers that contain the letter “i”, can we grep for all lines that do not contain “i”? Of course we can. There’s an option for that! The flag -v means “invert match”, that is, “does not match”:

$ grep -v i flowers.txt
Rose
Sunflower
Forget Me Not
Foxglove
Gerbera
Crocus
Lotus
Holly

Solution to ex 04

Once again, grep is extremely useful for getting some work done without having to leave the terminal and start an editor, if all we want to do is to filter out some lines from a file. It is actually very common that you are only interested in some lines of some file according to some search criteria. It could be as simple as "I remember that I had a file with names of flowers, which file was it?". You can then use grep to investigate a lot of files:

$ grep Tulip flowers.txt cars.txt domains.txt
flowers.txt:Tulip

As you see, only one file matched "Tulip" (even if the file name was a hint here) and grep answers with flowers.txt:Tulip. Note that the search is case sensitive unless you use the -i flag, so searching for "tulip" would not have found anything. If we search for something which is not present in any file, that can also be useful information:

$ grep monkey flowers.txt cars.txt domains.txt 
$

As you see here (try it yourself - create a bunch of files and play around with grep!), the result was nothing, meaning that the string "monkey" was not present in any of the files.

There are a lot of other uses for grep and we recommend that you play with it.
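
To get you started, here is a small sketch of three more standard grep options: -l, -c and -n. The sample files and their contents are made up, and they are created in a scratch directory so that your own flowers.txt is left alone:

```bash
# Set up two small sample files in a scratch directory
cd "$(mktemp -d)"
printf 'Tulip\nRose\nPrimrose\n' > flowers.txt
printf 'Volvo\nSaab\n' > cars.txt

# -l lists only the names of the files that contain a match
grep -l Rose flowers.txt cars.txt

# -c counts the matching lines (here: lines containing "ose", ignoring case)
grep -i -c ose flowers.txt

# -n prefixes each matching line with its line number
grep -n rose flowers.txt
```

The three commands print flowers.txt, 2 and 3:Primrose respectively. The last search only finds Primrose, since "rose" with a lower case r does not match the line Rose.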

Ex 05 - sorting text

We suggest watching: Sort (eng) before doing this exercise.

We can also sort text (lexicographically). The natural ordering of text is based on the ASCII table of characters. It is like the phone book ordering (alphabetically) for normal text, only that upper case letters come before lower case letters. The ASCII table also has many non-alphabetical characters, of course, but for text files with mostly words in them, sorting is quite intuitive. Most of the time, when we sort text, we mean alphabetically, which works as we expect. (Exactly how sort treats upper and lower case depends on the language settings, the locale, of your system.)

The command sort can do all kinds of sorting for us. Let’s ask it to sort the text in the file and print the result for us. Note that the file will not be changed by sort; it only sorts the contents for printing to standard out:

$ sort flowers.txt
Crocus
Daffodil
Dahlia
Daisy
Edelweiss
Forget Me Not
Foxglove
Gerbera
Gladiolus
Holly
Iris
Lily
Lotus
Mimosa
Narcissus
Petunia
Primrose
Rose
Sunflower
Tulip
Violet

We can ask sort to do a reverse sort as well:

$ sort -r flowers.txt
Violet
Tulip
Sunflower
Rose
Primrose
Petunia
Narcissus
Mimosa
Lotus
Lily
Iris
Holly
Gladiolus
Gerbera
Foxglove
Forget Me Not
Edelweiss
Daisy
Dahlia
Daffodil
Crocus

Solution to ex 05

Sorting text is actually also quite common. As you will see in later chapters, most of the commands from this chapter can be combined together to perform some pretty powerful text manipulation. A take-away from this small exercise is that you don't have to write your own program just to sort some text - the sort command is already created and just waiting to be put to work.
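
All the flowers in our file start with an upper case letter, so the difference between upper and lower case never showed. The sketch below uses a made-up list with mixed case. It sets LC_ALL=C for the sort command, which asks sort to compare plain ASCII codes; without it, the result may differ depending on the language settings of your system:

```bash
# With plain ASCII ordering, upper case letters come before lower case letters
printf 'rose\nTulip\ndaisy\nIris\n' | LC_ALL=C sort

# With -f, lower case letters are treated as upper case when comparing
printf 'rose\nTulip\ndaisy\nIris\n' | LC_ALL=C sort -f
```

The first command prints Iris, Tulip, daisy, rose (one per line) and the second prints daisy, Iris, rose, Tulip.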

Ex 06 - getting statistics from text and text files

For those of you who are statistics freaks, we can also ask for information about the text content of the file. How many lines (here: flowers) are there? How many characters are there in the file? How many words?

$ wc -l flowers.txt
21 flowers.txt
$ wc -c flowers.txt
164 flowers.txt
$ wc -w flowers.txt
23 flowers.txt

To get all information at the same time, you may leave out the options/flags and just run wc on the file. The name wc is short for “word count”. Strictly speaking, the -c flag counts bytes, but for a plain ASCII text file like ours, that is the same as the number of characters.

Solution to ex 06

The word counting command wc is also very useful. It can be used to answer questions like "In this huge text file of email addresses, how many addresses have a ".com" top domain?". You could first grep for ".com" and save the output in some file, and then run wc -l on that file to see how many lines were in it. Actually you would probably use pipes for this, but that's another topic for a later chapter! (Pipes are the way to combine commands in bash.)

Ex 07 - Heads and tails of text and text files

We can also inspect the beginning or the end of a file (or text stream). Let’s list the first five lines of flowers, using the head command:

$ head -5 flowers.txt
Tulip
Rose
Sunflower
Daisy
Violet

And the last seven lines:

$ tail -7 flowers.txt
Petunia
Narcissus
Mimosa
Lotus
Holly
Iris
Dahlia

Solution to ex 07

Taking the head or tail off some text or text file can also prove to be very useful. If, for instance, you are dealing with text files which all have some common header that you want to read or use, you could get the first N lines using head. Similarly, sometimes you are only interested in, say, the last lines of some files. Maybe you have some files with some number crunching and in every file, the last line contains a sum. Now, you could use tail -1 (the flag is "minus one" for "one line from the end") to only get the sum out of these files. And as said above, in particular when combined with other commands, head and tail can prove to be extremely powerful.
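
The header and sum scenario could be sketched like this. The file report.txt and its contents are made up for the example. The sketch uses -n 1, which means the same as -1 but is the spelling you will find in most current manuals:

```bash
# Made-up sample file: a header, three numbers and a sum on the last line
cd "$(mktemp -d)"
printf 'Monthly report\n10\n20\n30\nSum: 60\n' > report.txt

# The first line of the file (the header)
head -n 1 report.txt

# The last line of the file (the sum)
tail -n 1 report.txt
```

The first command prints Monthly report and the second prints Sum: 60.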

Ex 08 - Compressing text and text files

We suggest watching Zip and unzip (eng) before doing this exercise.


It is sometimes useful to compress a text file to save space. To do this one may use zip or some other compression tool. To compress the flowers.txt and save the compressed version in flowers.zip one may do this:

$ zip flowers.zip flowers.txt
adding: flowers.txt (deflated 23%)

(You may need to install zip and unzip, using the cygwin installer if you don't have them installed)

If you want to unzip (decompress) the flowers.zip later on, you just run unzip with the file name flowers.zip as the argument.

Note! To zip a directory and all its contents, do zip -r zipfilename.zip directoryname. The -r flag means "recursively" and creates an archive with the directory and all its contents inside. When unzipping a zip file with a directory like this, unzip will first create the directory and then put all the contents inside it. Use unzip -l if you want to see what's inside a zip archive without extracting it.
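
Here is a minimal sketch of zipping a directory and listing the archive. It assumes that zip and unzip are installed. The directory garden and the files inside it are made up, and everything happens in a scratch directory:

```bash
# Build a small made-up directory to archive
cd "$(mktemp -d)"
mkdir garden
printf 'Tulip\nRose\n' > garden/flowers.txt
printf 'Oak\nBirch\n' > garden/trees.txt

# Create garden.zip containing the directory and everything in it
zip -r garden.zip garden

# List the contents of the archive without extracting it
unzip -l garden.zip
```

The listing from unzip -l should include both garden/flowers.txt and garden/trees.txt.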

Solution to ex 08

This exercise was mostly to show you that as with most cases, you don't need to leave the terminal only to compress a file using the zip compression. You can zip (compress) and unzip (extract or uncompress) files directly on the command line. Note that you will probably have to install both zip and unzip using the cygwin installer in order to test these commands. Unfortunately, they are not part of a standard installation but it is very simple to add packages of software to your cygwin installation. We recommend that you install both zip and unzip, not only to test them out, but because it is so common to find zip files (particularly in a Windows environment). In the UNIX-world, there are other common archiving and compression programs such as tar, gzip, gunzip and more.

Ex 09 - Manipulating zipped files

We suggest watching Zcat and zgrep (eng) before doing this exercise.

Note to Mac OS users: On Mac OS, you can only use zcat on gzip files, and you need to redirect the file into zcat using the following syntax:

$ zcat < flowers.txt.gz

You create flowers.txt.gz this way (which will replace flowers.txt with the gzipped file flowers.txt.gz):

$ gzip flowers.txt

Several of the text tools have a version for compressed input. You may for instance cat the contents of flowers.zip using zcat (with the GNU version of zcat, this works as long as the zip archive contains only one file):

$ zcat flowers.zip

Try it! You can also try zgrep on flowers.zip. It works like grep but operates on compressed text.

Solution to ex 09

As you probably have guessed by now, this exercise was mostly to show you how to be lazy and efficient. If you have a zipped text file and you want to use grep in order to look for some string inside the file, it is not necessary to first unzip the file. If you have zgrep installed, you can grep in the zipped file directly.
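
The same idea works for gzip files, which is the form that should also work on Mac OS. The sketch below assumes that gzip, zcat and zgrep are installed. The sample file is made up and created in a scratch directory:

```bash
# Made-up sample file in a scratch directory
cd "$(mktemp -d)"
printf 'Tulip\nRose\nPrimrose\n' > flowers.txt

# Compress to flowers.txt.gz; using -c and > keeps the original flowers.txt
gzip -c flowers.txt > flowers.txt.gz

# Print the contents of the compressed file
zcat < flowers.txt.gz

# Search inside the compressed file without unpacking it first
zgrep rose flowers.txt.gz
```

The zgrep command prints Primrose, the only line that contains "rose" with a lower case r.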

Where to go next

Next chapter is about globbing.
