Files

A file is a sequence of bytes. A byte is a small chunk of information that is 8 bits long. A bit is one binary digit that is either 0 or 1. Let us create a small file to play around with.

kwrite junk

Add the following two lines to the file and save it.

I am a file.
Are you a file too?

To see the contents of the file, use the cat command:

[alice@onyx ~]$ cat junk
I am a file.
Are you a file too?
[alice@onyx ~]$

We can see a visible representation of all the bytes in the file with the od (octal dump) command:

[alice@onyx ~]$ od -c junk
0000000   I       a   m       a       f   i   l   e   .  \n   A   r   e
0000020       y   o   u       a       f   i   l   e       t   o   o   ?
0000040  \n
0000041
[alice@onyx ~]$

The option -c means "interpret bytes as characters." Adding the -b option also shows the value of each byte in octal:

[alice@onyx ~]$ od -cb junk
0000000   I       a   m       a       f   i   l   e   .  \n   A   r   e
        111 040 141 155 040 141 040 146 151 154 145 056 012 101 162 145
0000020       y   o   u       a       f   i   l   e       t   o   o   ?
        040 171 157 165 040 141 040 146 151 154 145 040 164 157 157 077
0000040  \n
        012
0000041
[alice@onyx ~]$

The 7-digit number at the start of each line is the offset of the first byte on that line, in octal (base 8). The final line, 0000041, is the total size of the file (041 in octal, or 33 bytes). Each line of the file ends with a byte whose octal code is 012, which is the ASCII code for the newline character. Note that most modern systems and languages (Java, for example) use Unicode for characters. Linux typically uses the UTF-8 encoding of Unicode, in which every ASCII character keeps its ASCII code, so any ASCII text is also valid UTF-8. There is a lot more to Unicode and character encoding, but this is sufficient to keep us going for now.
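The offsets and octal codes in the od output can be reproduced with a few lines of Python. This is only a rough sketch for illustration: the function name octal_dump is made up here, the file contents are typed in as a byte string rather than read from junk, and the layout only approximates what od -b prints.

```python
# Contents of the example file "junk" from the text, as raw bytes.
data = b"I am a file.\nAre you a file too?\n"

def octal_dump(data, width=16):
    """Return lines resembling `od -b`: an octal offset, then octal byte values."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        values = " ".join(f"{byte:03o}" for byte in chunk)
        lines.append(f"{offset:07o} {values}")
    # Like od, finish with the total length written as an offset.
    lines.append(f"{len(data):07o}")
    return lines

for line in octal_dump(data):
    print(line)

# ASCII text produces exactly the same bytes when encoded as UTF-8.
text = data.decode("ascii")
assert text.encode("utf-8") == data
```

The last line printed is 0000041, matching the od output: the file holds 33 bytes, and 33 is 41 in octal.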

Some other common special characters are backspace (\b or 010), tab (\t or 011), and carriage return (\r or 015). The codes are again given in octal. When we type a command and press Enter, a newline is generated, and only then are the typed characters passed on to be processed by the system. That means we can backspace and edit the current line as long as we don't press the Enter key.
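As a quick check, the octal codes of these special characters can be printed from Python. The helper name octal_code is an assumption made for this sketch; it simply formats the character's numeric code in base 8.

```python
def octal_code(ch):
    """Return the code of a single character as a 3-digit octal string."""
    return f"{ord(ch):03o}"

# Escape sequences for the special characters mentioned in the text.
specials = {
    "backspace": "\b",
    "tab": "\t",
    "newline": "\n",
    "carriage return": "\r",
}

for name, ch in specials.items():
    print(f"{name:16} {octal_code(ch)}")
```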

Note that there is no special character to denote the end of a file. Instead, the operating system reports that there is no more data: a program detects the end of a file when the next read from the file returns zero bytes.
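A small Python sketch can show what end-of-file detection looks like from a program's point of view. It assumes a temporary file standing in for junk and a deliberately tiny chunk size of 8 bytes; the function name read_sizes is made up for this example.

```python
import os
import tempfile

def read_sizes(path, chunk_size=8):
    """Read a file in small chunks and return the size of each chunk read."""
    sizes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # zero bytes returned: we have reached end of file
                break
            sizes.append(len(chunk))
    return sizes

# Create a temporary file with the same contents as the example file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"I am a file.\nAre you a file too?\n")
    path = tmp.name

sizes = read_sizes(path)
print(sizes, sum(sizes))
os.remove(path)
```

The loop never looks for a special end marker in the data. It stops only because a read came back empty.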

Typing Ctrl-d sends whatever we have typed so far on the current line to the program that is reading from the terminal, without waiting for Enter. So if we haven't typed anything, the program reads zero characters, which looks like the end of a file. That is why typing Ctrl-d at an empty shell prompt logs us out of the terminal.