File Types

The structure of a file is determined by the programs that use it. Linux does not impose any structure on a regular file, so the system itself cannot tell us what kind of data the file holds. However, the file command makes an educated guess. It does not rely on the file name, because name extensions are only conventions and are not reliable. Instead, it reads the first few hundred bytes of the file and looks for clues to the file type. For example, here is the file command run on some typical files.

[alice@localhost ~]$ file textFile ListFiles.java  ListFiles.class test.data
textFile:        ASCII text
ListFiles.java:  C source, ASCII text
ListFiles.class: compiled Java class data, version 52.0 (Java 1.8)
test.data:       data

[alice@localhost ~]$ file /home/alice /usr/bin /usr/bin/ls
/home/alice:     directory
/usr/bin:        directory
/usr/bin/ls:     ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, 
                 interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, 
                 BuildID[sha1]=8f8149dbcfdd68a9e7d0e8d29115d05c390522d0, stripped

Note that the file named textFile contains plain text, and file reports it as such. For the Java source file, the guess is C source stored as ASCII text, most likely because Java syntax resembles C, so in this case the guess isn't quite spot on. The type of the compiled Java class is guessed correctly. A class file is a binary file: unlike a text file, it is not organized as lines of characters separated by newlines. The next file, test.data, is reported simply as data, which means file did not recognize its format; it is a binary file whose format is determined by the program that created it. The next two are directories, which are special files that the system keeps in its own binary format. The last one is the ls program, a compiled program and therefore also a binary file, for which file correctly prints various details.
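The idea of guessing a type from the leading bytes can be illustrated with a minimal Java sketch. It is not how the file command is implemented: the real command consults a large database of signatures, while this sketch checks only two well-known ones (the bytes CA FE BA BE at the start of a Java class file and the byte 7F followed by ELF at the start of an ELF program) and otherwise tests whether the bytes look like ASCII text. The class name MagicSniffer, the method guess, and the 512-byte buffer size are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; a tiny imitation of the idea
// behind the file command, not its actual algorithm.
class MagicSniffer {

    // Classify content from its first len bytes. The file name is never used.
    static String guess(byte[] head, int len) {
        if (len == 0) {
            return "empty";
        }
        // Java class files begin with the four bytes CA FE BA BE.
        if (len >= 4 && (head[0] & 0xFF) == 0xCA && (head[1] & 0xFF) == 0xFE
                && (head[2] & 0xFF) == 0xBA && (head[3] & 0xFF) == 0xBE) {
            return "compiled Java class data";
        }
        // ELF programs begin with the byte 7F followed by the letters E, L, F.
        if (len >= 4 && head[0] == 0x7F && head[1] == 'E'
                && head[2] == 'L' && head[3] == 'F') {
            return "ELF";
        }
        // Otherwise call it text only if every byte is printable ASCII
        // or common whitespace.
        for (int i = 0; i < len; i++) {
            int b = head[i] & 0xFF;
            boolean printable = b >= 0x20 && b < 0x7F;
            boolean whitespace = b == '\n' || b == '\r' || b == '\t';
            if (!printable && !whitespace) {
                return "data";
            }
        }
        return "ASCII text";
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path path = Paths.get(name);
            if (Files.isDirectory(path)) {
                System.out.println(name + ": directory");
                continue;
            }
            byte[] head = new byte[512]; // assumed size for this sketch
            int len = 0;
            try (InputStream in = Files.newInputStream(path)) {
                int n;
                // Keep reading until the buffer is full or the file ends.
                while (len < head.length
                        && (n = in.read(head, len, head.length - len)) > 0) {
                    len += n;
                }
            }
            System.out.println(name + ": " + guess(head, len));
        }
    }
}
```

Running it as java MagicSniffer followed by a list of file names prints one guess per file. Because only two signatures are checked, any other binary format is reported simply as data.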

The lack of file formats in the operating system is, in general, an advantage. It allows any system program to work with any file, with only a few exceptions. Text files are generally more flexible than binary files, because any editor or text utility can read them, but they are somewhat slower to process because the text has to be parsed. Binary files, on the other hand, require specialized programs to access them. Databases are a common example of binary files.
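The trade-off between the two kinds of file can be made concrete with a small Java sketch that encodes the same integers both ways. The class name TextVersusBinary, its method names, and the sample values are hypothetical and chosen only for this illustration. The text form writes one decimal number per line, so its size depends on the values and a reader must parse the digits. The binary form always uses four bytes per integer, but a reader must already know that layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the same integers as text and as raw bytes.
class TextVersusBinary {

    // One decimal number per line, as a text utility would expect.
    static byte[] asText(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            sb.append(v).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Four bytes per integer, most significant byte first
    // (the default byte order of ByteBuffer).
    static byte[] asBinary(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        int[] values = {7, 42, 1000000, 1234567890}; // arbitrary sample values
        System.out.println("text bytes:   " + asText(values).length);
        System.out.println("binary bytes: " + asBinary(values).length);
    }
}
```

The text version can be inspected with any editor, while the binary version has a fixed record size that allows a program to jump directly to the n-th value.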

To show the structure of a binary file, examine the following Java program that creates a simple binary data file.

style=javanonum
\begin{lstlisting}
import java.io.DataOutputStream;
import java.io.File;
import ...
...t(2);
dout.writeInt(3);
dout.writeInt(4);
dout.close();
}
}
\end{lstlisting}

Let us compile and run the above Java program. We then check the type of the output file with the file command and examine its contents using od. The -b option of od shows each byte in octal.

[alice@localhost ~]$ javac CreateDataFile.java
[alice@localhost ~]$ java CreateDataFile

[alice@localhost ~]$ file out.data 
out.data: data

[alice@localhost ~]$ od -b out.data 
0000000 000 000 000 001 000 000 000 002 000 000 000 003 000 000 000 004
0000020

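The output shows that there are four integers in the binary data file. Each one takes four bytes, stored with the most significant byte first. The number at the start of each line is the offset in octal, so the final 0000020 means the file is 16 bytes long. There are no spaces or newline characters in the file, as it is not a text file.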
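Since a binary file needs a program that knows its layout, a matching reader is a natural companion to the writer. The following is a minimal sketch, assuming out.data holds nothing but four-byte integers written by DataOutputStream.writeInt. The class name ReadDataFile and the method readInts are hypothetical names introduced for this example.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for a file that contains only 4-byte integers.
class ReadDataFile {

    // Read integers until the stream ends. Any trailing bytes that do not
    // form a complete integer are silently ignored in this sketch.
    static List<Integer> readInts(InputStream source) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(source)) {
            while (true) {
                try {
                    values.add(in.readInt()); // 4 bytes, high byte first
                } catch (EOFException end) {
                    break; // no more complete integers
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Assumes out.data exists in the current directory.
        List<Integer> values = readInts(new FileInputStream("out.data"));
        for (int v : values) {
            System.out.println(v);
        }
    }
}
```

The reader prints one integer per line, which in effect converts the binary file into a text form that ordinary text utilities can process.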