Tuesday, November 17, 2009

A crash course in Linux file commands for the newly initiated



Although GUI desktops such as KDE and GNOME help users take advantage of Linux features without functional knowledge of the command-line interface, more power and flexibility are often required. Moreover, a basic familiarity with these commands is still essential to properly automate certain functions in a shell script.

This article is a "crash course" in Linux file commands for those who are either new to the operating system or simply in need of a refresher. It includes a brief overview of the more useful commands as well as guidance regarding their most powerful applications. Combined with a little experimentation, the information included here should lead to an easy mastery of these essential commands. (Note: When a kernel tweaked with Oracle Cluster File System (OCFS) is involved, some of these commands may behave somewhat differently. In that case, Oracle provides an OCFS toolset that can be a better alternative for file command purposes.)

Note that all the included examples were tested on SUSE Linux 8.0 Professional. While there is no reason to believe they will not work on other systems, you should check your documentation for possible variations if problems arise.

Background Concepts

Before delving into specifics, let's review some basics.

Files and Commands

Everything is treated as a file in the Linux/UNIX operating system: hardware devices, including the keyboard and the terminal, directories, the commands themselves and, of course, files. This curious convention is, in fact, the basis for the power and flexibility of Linux/UNIX.
Most commands, with few variations, take the form:
 
command [option] [source file(s)] [target file]

Getting Help

Among the most useful commands, especially for those learning Linux, are those that provide help. Two important sources of information in Linux are the on-line reference manuals, or man pages, and the whatis facility. You can display the one-line man page description of an unfamiliar command with the whatis command.
 
$ whatis echo

To learn more about that command use:
 
$ man  echo 

If you do not know the command needed for a specific task, you can generate possibilities using man -k, also known as apropos, and a topic. For example:
 
$ man -k files

One useful but often-overlooked command provides information about using man itself:
 
$ man man

You can scroll forward through any man page using the SPACEBAR; the UP ARROW will scroll backward through the file one line at a time. To quit, enter q. (CTRL-Z only suspends the pager; it does not quit it.)

Classes of Users

Remember the adage "All animals are equal but some animals are more equal than others?" In the Linux world, the root user rules.

A user logged in under another username can become root with the su command, derived from "superuser." To perform tasks such as adding a new user, printer, or file system, log in as root or change to superuser with the su command and the root password. System files, including those that control the initialization process, belong to root. While they may be available for regular users to read, for the sake of your system security, the right to edit them should be reserved for root.

The BASH shell

Other shells are available, but BASH, the Bourne Again Shell, is the Linux default. It incorporates features of its namesake, the Bourne shell, and those of the Korn, C and TCSH shells.

The BASH built-in command history remembers the last 500 commands entered by default. They can be viewed by entering history at the command prompt. A specific command is retrieved by pressing the UP ARROW or DOWN ARROW at the command prompt, or by entering its number in the history list preceded by "!", such as:
 
$ !49

You can also execute a command by its offset from the highest entry in the history list: $ !-3 would execute event number 51, if there were 53 events in the history list.

Like other shells in the UNIX/Linux world, BASH uses special environment variables to facilitate system administration. Some examples are:
  • HOME, the user's home directory
  • PATH, the search path Linux uses to search for executable images of commands you enter
  • HISTSIZE, the number of history events saved by your system

In addition to these reserved keywords, you can define your own environment variables. Oracle, for example, uses ORACLE_HOME, among other variables, that must be set in your environment for an Oracle installation to complete successfully.

Variables can be set temporarily at the prompt:
 
$ HISTSIZE=100

They can also be set permanently, either system-wide in /etc/profile (requires root privileges) or locally in .profile.
The value of an environment variable can be viewed with the echo command, using a $ to access the value.
 
$ echo $HOME
 
/home/bluher

All current environment variables can be viewed with env.
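
The difference between a variable set at the prompt and one that child processes can see is easy to miss. The following is a minimal sketch, assuming a Bourne-compatible shell; MY_APP_HOME and its value are hypothetical names chosen only for illustration.

```shell
# MY_APP_HOME is a hypothetical variable used only for this illustration.
unset MY_APP_HOME
MY_APP_HOME=/tmp/my_app        # set in this shell only, not yet exported

# A child process does not see a variable that has not been exported.
before=$(sh -c 'echo "${MY_APP_HOME:-unset}"')

export MY_APP_HOME             # now part of the environment passed to children
after=$(sh -c 'echo "${MY_APP_HOME:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```

Placing the assignment and the export in .profile would make the setting available in every new login session.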

Regular Expressions and Wildcards

Many Linux commands use the wildcards * and ? to match any number of characters or any single character, respectively; regular expressions use a period (.) to match any single character except a newline. Both use square brackets ([ ]) to match sets of characters, and both use *. The *, however, has a similar but different meaning in each case: in the shell it matches zero or more characters of any kind, whereas in a regular expression it matches zero or more instances of the preceding character. Some commands, like egrep and awk, use a wider set of special characters for pattern matching.
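
The two meanings of * can be compared side by side. This is a small sketch, assuming a Bourne-compatible shell and grep; the file names and sample words are made up for the demonstration and live in a temporary directory.

```shell
workdir=$(mktemp -d)
touch "$workdir/test" "$workdir/test1" "$workdir/test22" "$workdir/best"

# Shell wildcards: * matches zero or more characters, ? matches exactly one.
star_matches=$(cd "$workdir" && echo test*)
question_matches=$(cd "$workdir" && echo test?)

# Regular expression: * repeats the preceding character (here "e") zero or more times.
regex_count=$(printf '%s\n' tst test teest best | grep -c '^te*st$')

echo "test* expands to: $star_matches"
echo "test? expands to: $question_matches"
echo "lines matching ^te*st\$: $regex_count"

rm -r "$workdir"
```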

File Manipulation Commands

Anatomy of a File Listing

The ls command, used to view lists of files in any directory to which a user has read permission, has many interesting options. For example:
$ ls -liah *
22684 -rw-r--r--    1 bluher   users         952 Dec 28 18:43 .profile
19942 -rw-r--r--    1 scalish  users          30 Jan  3 20:00 test2.out
  925 -rwxr-xr-x    1 scalish  users         378 Sep  2  2002 test.sh

The listing above shows 8 columns:
  • The first column indicates the inode of the file, because we used the -i option. The remaining columns are normally displayed with the -l option.
  • The second column shows the file type and file access permissions.
  • The third shows the number of links, including directories.
  • The fourth and fifth columns show the owner and the group owner of the files. Here, the owner "bluher" belongs to the group "users".
  • The sixth column displays the file size with the units displayed, rather than the default number of bytes, because we used the -h option.
  • The seventh column shows the date, which looks like three columns consisting of the month, day and year or time of day.
  • The eighth column has the filenames. Use of -a in the option list causes the list of hidden files, like .profile, to be included in the listing.
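
The effect of the individual options can be checked on a scratch directory. This is a minimal sketch; the file names are placeholders, and -A (like -a, but without the . and .. entries) is used for the count so that only real files are tallied.

```shell
workdir=$(mktemp -d)
touch "$workdir/.profile" "$workdir/test.sh"

# Without -a, names that begin with a dot are left out of the listing.
visible_count=$(ls "$workdir" | wc -l)

# -A includes hidden files but omits the . and .. directory entries.
all_count=$(ls -A "$workdir" | wc -l)

# -i adds the inode column, -l the long-format columns, -a the hidden entries.
ls -lia "$workdir"

echo "visible: $visible_count, including hidden: $all_count"
rm -r "$workdir"
```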

Working with Files

Files and directories can be moved (mv), copied (cp) or removed (rm). Judicious use of the -i option to get confirmation is usually a good idea.
 
$ cp -i ls.out ls2.out
 
cp: overwrite `ls2.out'?

The mv command accepts the -b option, which makes a backup of any existing destination file before it is overwritten. Both rm and cp accept the powerful, but dangerous, -r option, which operates recursively on a directory and its files.
 
$ rm -ir Test
 
rm: descend into directory `Test'? y

Directories can be created with mkdir and removed with rmdir. However, because a directory containing files cannot be removed with rmdir, it is frequently more convenient to use rm with the -r option.
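
The difference between rmdir and rm -r can be tried safely in a temporary directory. The sketch below assumes a Bourne-compatible shell; the directory and file names are illustrative only.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/Test"
touch "$workdir/Test/ls.out"

# rmdir refuses to remove a directory that still contains files.
if rmdir "$workdir/Test" 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi

# rm -r removes the directory and everything beneath it without asking,
# so double-check the path before using it on real data.
rm -r "$workdir/Test"
if [ -e "$workdir/Test" ]; then
    rm_result=present
else
    rm_result=gone
fi

echo "rmdir: $rmdir_result, rm -r: $rm_result"
rm -r "$workdir"
```

Adding -i, as in the rm -ir example above, restores the confirmation prompt at each step.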

All files have ownership and protections for security reasons. The file access permissions, or filemode, comprise the same 10 characters described previously:
  • The first character indicates the type of file. The most common are - for a file, d for a directory, and l for a link.
  • The next nine characters are access permissions for three classes of users: the file owner (characters 2-4), the user's group (5-7) and others (8-10), where r signifies read permission, w means write permission, and x designates execute permission on a file. A dash, -, found in any of these nine positions indicates that action is prohibited by that class of user.

Access permissions can be set with character symbols or with binary masks using the chmod command. To use the binary masks, convert the character representation of the three groups of permissions into binary and then into octal format:
User class:                 Owner   Group   Others
Character representation:   rwx     r-x     r--
Binary representation:      111     101     100
Octal representation:       7       5       4

To add write permission for the group to a file that has the 754 mode shown above, you could use either of the following:
 
$ chmod g+w test.sh
$ chmod 774 test.sh
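
The conversion from permission bits to an octal mask can be worked through in the shell itself. This is a sketch only, assuming GNU stat (its -c '%a' format prints the octal mode); the file is a throwaway created in a temporary directory.

```shell
workdir=$(mktemp -d)
script="$workdir/test.sh"
touch "$script"

# Start from rwxr-xr-- (754).
chmod 754 "$script"
mode_before=$(stat -c '%a' "$script")

# Symbolic form: add write permission for the group only.
chmod g+w "$script"
mode_symbolic=$(stat -c '%a' "$script")

# Octal form: each digit is the sum of r=4, w=2 and x=1 for one class.
owner=$((4 + 2 + 1))    # rwx
group=$((4 + 2 + 1))    # rwx
others=$((4))           # r--
chmod 754 "$script"     # reset, then apply the computed mask
chmod "${owner}${group}${others}" "$script"
mode_octal=$(stat -c '%a' "$script")

echo "before: $mode_before, after g+w: $mode_symbolic, after octal: $mode_octal"
rm -r "$workdir"
```

The symbolic form changes only the bits that are named, while the octal form sets all nine bits at once.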

Default file permissions are set with the umask command, either systemwide in the /etc/profile file or locally in the .profile file. This command gives the mask whose permission bits are removed from the defaults a program requests, which are normally 777 for directories and 666 for regular files:
 
$ umask 022

This would result in default permissions of 755 for new directories and 644 for new files created by the user, because regular files are not created with execute permission.
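
The effect of a mask can be verified directly. The sketch below assumes GNU stat and runs umask in a subshell so the setting of the calling shell is left alone; the file and directory names are placeholders.

```shell
workdir=$(mktemp -d)

# The subshell keeps the umask change from leaking into the calling shell.
(
    umask 022
    touch "$workdir/newfile"
    mkdir "$workdir/newdir"
)

# Regular files start from 666 and directories from 777 before the mask is applied.
file_mode=$(stat -c '%a' "$workdir/newfile")
dir_mode=$(stat -c '%a' "$workdir/newdir")

echo "new file: $file_mode, new directory: $dir_mode"
rm -r "$workdir"
```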

A file's ownership can be changed with chown:
 
$ chown bluher ls.out

Here, bluher is the new file owner. Similarly, group membership would be changed as follows:
 
$ chgrp devgrp ls.out

Here, devgrp is the new group.

One piece of information that ls does not provide is which files are text, and which are binary. To find this information, you can use the file * command.

Renaming Files

Two popular ways to give a file more than one name are with links and the alias command. Alias can be used to rename a longer command to something more convenient such as:
 
$ alias ll='ls -l'
$ ll

Notice the use of single quotes so that BASH passes the term on to alias instead of evaluating it itself. Alias can also be used as an abbreviation for lengthy pathnames:
 
$ alias jdev9i=/jdev9i/jdev/bin/jdev

For more information on alias and its counter-command unalias, check the man page for BASH, under the subsection "SHELL BUILTIN COMMANDS". In the last example, an environment variable could have been defined to accomplish the same result.
 
$ export JDEV_HOME=/jdev9i/jdev/bin/jdev
$ echo $JDEV_HOME
/jdev9i/jdev/bin/jdev
$ $JDEV_HOME

Links allow several filenames to refer to a single source file using the following format:
 
ln [-s] fileyouwanttolinkto newname

The ln command alone creates a hard link to a file, while using the -s option creates a symbolic link. Briefly, a hard link is almost indistinguishable from the original file; the only sign is that both names share the same inode. Symbolic links are easier to distinguish because they appear in a long file listing with a -> indicating the source file and an l for the filetype.
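
The behavior of the two kinds of link shows up most clearly when the original name is removed. This is a minimal sketch with illustrative file names in a temporary directory; it assumes a shell whose test command supports -ef (true in BASH) and the readlink utility.

```shell
workdir=$(mktemp -d)
echo "This is a test." > "$workdir/test.out"

ln    "$workdir/test.out" "$workdir/hard.out"    # hard link
ln -s "$workdir/test.out" "$workdir/soft.out"    # symbolic link

# The first column of ls -li shows that test.out and hard.out share one inode.
ls -li "$workdir"

# -ef is true when both names refer to the same inode on the same device.
if [ "$workdir/test.out" -ef "$workdir/hard.out" ]; then same_inode=yes; else same_inode=no; fi
link_target=$(readlink "$workdir/soft.out")

# Remove the original name: the hard link still holds the data,
# while the symbolic link now points at nothing.
rm "$workdir/test.out"
hard_content=$(cat "$workdir/hard.out")
if [ -e "$workdir/soft.out" ]; then soft_state=valid; else soft_state=dangling; fi

echo "same inode: $same_inode, hard link content: $hard_content, symlink: $soft_state"
```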

Looking In and For Files

File Filters

Commands used to read and perform operations on file contents are sometimes referred to as filters. The sed and awk commands, already discussed at length in previous OTN articles, are two examples of filters that will not be discussed here.

Commands such as cat, more, and less let you view the contents of a text file from the command line, without having to invoke an editor. Cat is short for "concatenate" and will print the file contents to standard output (the screen) by default. One of the most interesting options available with cat is the -n option, which prints the file contents with numbered output lines.
 
$ cat -n test.out
     1  This is a test.

Because cat outputs all lines in a file at once, you may prefer to use more and less, which both output file contents one screen at a time. Less is an enhanced version of more that allows key commands from the vi text editor to enhance file viewing. For example, d scrolls forward and b scrolls backward N lines if N is entered before d or b. The value entered for N becomes the default for subsequent d commands. The man page utility uses less to display manual contents.

Redirection and Pipes

Redirection allows command output to be sent to a file instead of standard output, or command input to be read from a file instead of standard input. The standard symbol for output redirection, >, creates a new file or overwrites an existing one. The >> symbol appends output to an existing file:
 
$ more test2.out
  Another test. 
$ cat test.out >> test2.out
$ cat test2.out
 
Another test.
This is a test.

Standard input to a command can be redirected from a file with the < symbol:
 
$ cat < test2.out

Error messages are redirected and appended with 2> and 2>> using the format:
 
$ command 2> name_of_error_file
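
The redirection operators described above can be combined in one short session. This is a sketch only; the file names are placeholders in a temporary directory, and ls on a nonexistent path is used simply as a convenient way to produce an error message.

```shell
workdir=$(mktemp -d)
out="$workdir/test2.out"
err="$workdir/errors.log"

echo "Another test."   >  "$out"    # > creates (or overwrites) the file
echo "This is a test." >> "$out"    # >> appends to it
line_count=$(wc -l < "$out")        # < feeds the file to wc as standard input

# 2> captures error messages only; 2>> appends further ones to the same file.
ls "$workdir/no_such_file"  > /dev/null 2>  "$err"
ls "$workdir/also_missing"  > /dev/null 2>> "$err"
error_lines=$(wc -l < "$err")

echo "output lines: $line_count, error lines: $error_lines"
rm -r "$workdir"
```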

To avoid unintentionally overwriting an existing file, use the BASH built-in command set:
 
$ set -o noclobber

This feature can be overridden for a single command by using the >| symbol between your command and output file. To turn it off, use +o in place of -o.
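
The noclobber behavior can be confirmed with a throwaway file. The sketch below assumes BASH or another POSIX shell, where >| is the override operator; the file name is illustrative.

```shell
workdir=$(mktemp -d)
target="$workdir/ls.out"
echo "original" > "$target"

set -o noclobber

# With noclobber set, a plain > refuses to overwrite an existing file.
if { echo "overwritten" > "$target"; } 2>/dev/null; then
    blocked=no
else
    blocked=yes
fi
after_block=$(cat "$target")

# >| overrides noclobber for this one redirection.
echo "forced" >| "$target"
after_force=$(cat "$target")

set +o noclobber    # restore the default behavior

echo "blocked: $blocked, then: $after_block, after >|: $after_force"
rm -r "$workdir"
```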

Redirection works between a command, or file, and a file. One term of the redirection statement must be a file.

Pipes use the | symbol and work between commands. For instance, you could send the output of a command directly to the printer with:
 
$ ls -l * | lpr

A command in the history list can be found quickly with:
 
$ history | grep cat
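
Pipes can also be chained so that each command filters the output of the one before it. A minimal sketch, using made-up file names in a temporary directory:

```shell
workdir=$(mktemp -d)
touch "$workdir/test.out" "$workdir/test2.out" "$workdir/test.sh"

# ls lists the names, grep keeps those ending in .out, wc -l counts them.
out_count=$(ls "$workdir" | grep '\.out$' | wc -l)

echo "files ending in .out: $out_count"
rm -r "$workdir"
```

No intermediate file is created at any stage, which is the practical difference from redirection.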

More Filters

Grep, fgrep and egrep all print lines matching a pattern. All three commands search files for a specified pattern, which is helpful if you can't remember the name of a needed file. The basic format is:
grep [options] PATTERN [FILE...]
 
$ grep -r 'Subject' nsmail 

CTRL-C will terminate output of the above or any other command.
Perhaps the most useful option with grep is -s. If you search through system files as anything other than root, error messages will be generated for every file to which you do not have access permission. This option suppresses those messages.

Fgrep, also invoked as grep -F, looks only for fixed strings rather than the regular expressions that grep accepts, while egrep, also invoked as grep -E, accepts patterns containing a wider selection of special characters, such as |, which signifies alternation (a logical OR).
 
$ egrep 'Subject|mailto' *
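
The three matching modes give different results on the same input. This is a small sketch with invented sample lines written to a temporary file; grep -F and grep -E are used as the equivalents of fgrep and egrep.

```shell
workdir=$(mktemp -d)
sample="$workdir/mail.txt"
printf '%s\n' 'Subject: hello' 'mailto:foo' 'a.c' 'abc' > "$sample"

# Basic regular expression: the period matches any character, so a.c and abc both match.
regex_hits=$(grep -c 'a.c' "$sample")

# Fixed string: the period is taken literally, so only a.c matches.
fixed_hits=$(grep -F -c 'a.c' "$sample")

# Extended regular expression: | matches either alternative.
either_hits=$(grep -E -c 'Subject|mailto' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits, either: $either_hits"
rm -r "$workdir"
```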

Finding Files

The GNU version of the find command is powerful, flexible and more forgiving than classic versions found on UNIX systems. It is useful for tasks involving a directory structure, including finding and executing commands on files. The basic format of the find command is:
 
$ find startdirectory options matchcriteria [actionoptions]

If you know the name of a file, or even part of the name, but not the directory it is in, you can do this:
 
$ find . -name 'test*'
 
./test
./jdevhome/mywork/EmpWS/EmpBC4J/test

Unlike classic UNIX systems, the -print action at the end is not required in Linux, as it is assumed if no other action option is designated. A dot ( . ) in the startdirectory position causes find to begin a search in your working directory. A double dot, .., begins a search in the parent directory. You can start a search in any directory.
Note that you can use wildcards as part of the search criteria as long as you enclose the whole term in single quotes.
 
$ find . -name 'test*' -print
./test.out
./test2.out

To produce a list of files with the .out extension:
 
$ find /home -name '*.out'
Remember, however, that you will probably get numerous "Permission denied" error messages unless you run the command as superuser.

One of the most powerful search tools is the -exec action used with grep:
 
$ find . -name '*.html' -exec grep 'mailto:foo@yahoo.com' {} \;

Here we have asked find to start in the current directory (.), look for HTML files (*.html), and execute (-exec) the grep command on each file found ({}). When using the -exec action, a terminating semicolon (;) is required, as it is for a few other actions when using find. The backslash (\) and the quotes are needed to ensure that BASH passes these terms through so they are interpreted by the command rather than the shell.
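
The find and grep combination can be tried on a small directory tree first. The sketch below builds one in a temporary directory; the file names and the address are invented, and grep -l is used so that only the names of matching files are printed.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/site/docs"
echo '<a href="mailto:foo@example.com">mail</a>' > "$workdir/site/index.html"
echo '<p>no address here</p>'                    > "$workdir/site/docs/about.html"
echo 'mailto:foo@example.com'                    > "$workdir/site/notes.txt"

# find selects only the .html files; grep -l then names those containing the pattern.
# notes.txt contains the pattern but is never searched because its name does not match.
matches=$(find "$workdir" -name '*.html' -exec grep -l 'mailto:' {} \;)

echo "$matches"
```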

Now in Command
There are many more useful commands available in Linux, and powerful ways to utilize them, than can be covered here. Moreover, there is often more than one way to accomplish many tasks.

We have looked at only some of the most commonly used and instructive Linux file commands. A mastery of these basic but critical tools should move your Linux education to the fast track. With the man pages at your fingertips, and a willingness to experiment, you now have enough information to begin exploring the power of Linux file operations.
In my next article, I'll provide a similar explanation of Linux system commands.

REFERENCES
Guide to Linux File Command Mastery By Sheryl Calish
