Simple Introduction to UNIX

R.B. Lammers, updated Feb 3, 2003

This introduction assumes you already know a little UNIX: how to navigate the directory structure and how to view and manage files with ls, cd, more, pwd, cp, mv, ln -s, and chmod. See Simpler Introduction to UNIX - The Basics for more details.

The following commands will allow you to process text files.

Oct 23, 2002: See also a recent article in Linux Journal: "Dogs" of the Linux Shell by L.J. Iacona. Several of the commands in the article appear quite useful (e.g. tac, fold, and dirname).


man	get manual page on a UNIX command

	example: man uniq


cut	extract columns of data

	example: cut -f -3,5,7-9 -d ' ' infile1 > outfile1

		-f 2,4-6 	field 
		-c 35-44	character
		-d ':'	delimiter (default is a tab)
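
A minimal sketch of the -f, -c, and -d options. The file sample.txt and its colon-delimited contents are invented for illustration only.

```shell
# Hypothetical sample file: name:id:shell, one record per line.
workdir=$(mktemp -d)
printf 'root:0:/bin/sh\nalice:1001:/bin/bash\n' > "$workdir/sample.txt"

# Fields 1 and 3, using ':' as the delimiter instead of the default tab.
fields=$(cut -d ':' -f 1,3 "$workdir/sample.txt")
echo "$fields"

# Characters 1 through 4 of every line.
chars=$(cut -c 1-4 "$workdir/sample.txt")
echo "$chars"

rm -r "$workdir"
```

The output keeps the delimiter between the selected fields, so the first command prints "root:/bin/sh" and "alice:/bin/bash".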


sort	sort lines of a file  (Warning: by default, fields are separated at the transition from a non-blank character to white space; use -t to set a different delimiter)

	example: sort -nr infile1 | more

		-n	numeric sort
		-r 	reverse sort
		-k 3,5	sort key: starts at field 3 and ends at field 5
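
A small sketch of why -n matters when the key is a number. The file basins.txt and its values are made up for this example.

```shell
# Hypothetical two-column file: a label and a number.
workdir=$(mktemp -d)
printf 'BasinA 10\nBasinB 9\nBasinC 100\n' > "$workdir/basins.txt"

# Key is field 2 only (-k 2,2), compared as numbers (-n), largest first (-r).
numeric=$(sort -k 2,2 -n -r "$workdir/basins.txt")
echo "$numeric"

# Without -n the key is compared as text, so "100" sorts before "9".
# LC_ALL=C is set here only to make the text ordering predictable.
lexical=$(LC_ALL=C sort -k 2,2 "$workdir/basins.txt")
echo "$lexical"

rm -r "$workdir"
```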


wc     count lines, words, and characters in a file

        example: wc -l infile1

                -l      count lines
                -w      count words
                -c 	count bytes (same as characters for plain ASCII text)


paste	reattach columns of data

	example: paste infile1 infile2 > outfile2


cat	concatenate files together

	example: cat infile1 infile2 > outfile2

		-n	number lines
		-vet	show non-printing characters (good 
			for finding problems)


uniq	remove adjacent duplicate lines (so the input is normally sorted first)

	example: sort infile1 | uniq -c > outfile2

		-c 	show count of lines
		-d 	only show duplicate lines
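
The sketch below illustrates why uniq is normally fed sorted input: it only compares neighbouring lines. The input lines are invented. The sed step only strips the leading padding that uniq -c puts in front of each count, since the amount of padding differs between systems.

```shell
# Unsorted input: the first and third "Nile" are not adjacent,
# so uniq alone leaves two "Nile" lines in the output.
unsorted=$(printf 'Nile\nCongo\nNile\nNile\n' | uniq)
echo "$unsorted"

# Sorting first brings duplicates together; -c prefixes each line with its count.
counts=$(printf 'Nile\nCongo\nNile\nNile\n' | sort | uniq -c | sed -e 's/^ *//')
echo "$counts"

# -d prints only the lines that occur more than once.
dups=$(printf 'Nile\nCongo\nNile\nNile\n' | sort | uniq -d)
echo "$dups"
```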


join	perform a relational join on two files (both files must be sorted on their join fields)

	example: join -1 1 -2 3 infile1 infile2 > outfile1

		-1 FIELD	join field of infile1
		-2 FIELD	join field of infile2
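
A minimal sketch of a join on differently placed key fields. The files ids.txt and flows.txt and all of their values are hypothetical. Both are already sorted on their join fields, which join expects.

```shell
workdir=$(mktemp -d)

# ids.txt: key in field 1.  flows.txt: key in field 3.
printf '1 Nile\n2 Congo\n3 Amazon\n' > "$workdir/ids.txt"
printf '10 km3 1\n20 km3 3\n' > "$workdir/flows.txt"

# Each output line is the join field, then the other fields of the first
# file, then the other fields of the second file.  Key 2 has no partner
# in flows.txt, so it is dropped.
joined=$(join -1 1 -2 3 "$workdir/ids.txt" "$workdir/flows.txt")
echo "$joined"

rm -r "$workdir"
```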


cmp	compare two files

	example: cmp infile1 infile2


diff or diff3	compare 2 or 3 files - show differences

	example: diff infile1 infile2 | more
	example: diff3 infile1 infile2 infile3 > outfile1


head	extract lines from a file counting from the beginning

	example: head -n 100 infile1 > outfile1


tail	extract lines from a file counting from the end

	example: tail -n +2 infile1 > outfile1

		-n N	output the last N lines (N is an integer)
		-n +N	output from line N through the end of the file
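
A short sketch of head and tail on a made-up four-line file (table.txt is a hypothetical name). It uses the -n spelling of the options, because the bare numeric forms such as "+2" are obsolescent and may not be accepted by newer versions of these tools.

```shell
workdir=$(mktemp -d)
printf 'header\nrow1\nrow2\nrow3\n' > "$workdir/table.txt"

# First two lines of the file.
first_two=$(head -n 2 "$workdir/table.txt")
echo "$first_two"

# Last two lines of the file.
last_two=$(tail -n 2 "$workdir/table.txt")
echo "$last_two"

# Everything from line 2 onward, a common way to drop a header line.
no_header=$(tail -n +2 "$workdir/table.txt")
echo "$no_header"

rm -r "$workdir"
```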


dos2unix convert DOS line endings (carriage return + line feed) to UNIX 
		line endings (line feed only).  The file is overwritten.

	example: dos2unix infile1


tr	translate characters - example shows replacement of spaces 
		with newline character

	example: tr " "  "[\012*]" < infile1 > outfile
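
As a sketch, the same space-to-newline translation can be written with the '\n' escape, which names the same character as octal \012. The input text is invented.

```shell
# Replace every space with a newline, giving one word per line.
words=$(printf 'one two three\n' | tr ' ' '\n')
echo "$words"
```

Putting one word on each line this way is a convenient first step before sort and uniq -c when counting words.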


grep	extract lines from a file based on search strings and 
		regular expressions

	example: grep 'Basin1' infile1 > outfile2
	example: grep -E '15:20|15:01' infile1 | more
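
A minimal sketch of both forms on an invented three-line file (log.txt is a hypothetical name): a plain string search, and an extended regular expression with alternatives separated by "|".

```shell
workdir=$(mktemp -d)
printf '15:01 Basin1\n15:10 Basin1\n15:20 Basin2\n' > "$workdir/log.txt"

# Lines containing the string Basin1.
by_name=$(grep 'Basin1' "$workdir/log.txt")
echo "$by_name"

# Lines containing either 15:20 or 15:01 (-E enables the | operator).
by_time=$(grep -E '15:20|15:01' "$workdir/log.txt")
echo "$by_time"

rm -r "$workdir"
```

Matching lines are printed in file order, regardless of the order of the alternatives in the pattern.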


sed	search and replace parts of a file based on regular 
		expressions

	example: sed -e 's/450/45/g' infile1 > outfile3


Regular Expressions

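Regular expressions can be used with many programs including grep, sed, 
vi, emacs, perl, etc.  Be aware that each program has variations on usage.
The filename wildcards used with ls look similar, but they are a separate, 
simpler pattern language: the shell expands them before ls runs.

ls examples (shell wildcards, not regular expressions):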

	ls Data*.txt		list txt files beginning with Data
	ls Data4[5-9].ps	list ps files beginning with Data numbered 45-49
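
A small sketch of the bracket pattern, run in a scratch directory with empty, made-up files. The echo line shows that the pattern is expanded by the shell itself and that ls simply receives the resulting names.

```shell
workdir=$(mktemp -d)
touch "$workdir/Data44.ps" "$workdir/Data45.ps" "$workdir/Data49.ps" \
      "$workdir/Data50.ps" "$workdir/Data45.txt"

# Only Data45.ps and Data49.ps fit the pattern Data4[5-9].ps.
matched=$(cd "$workdir" && ls Data4[5-9].ps)
echo "$matched"

# echo knows nothing about files, yet prints the same names:
# the shell has already replaced the pattern with the matches.
expanded=$(cd "$workdir" && echo Data4[5-9].ps)
echo "$expanded"

rm -r "$workdir"
```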

sed examples: (these are the regex part of the sed command only)

	s/450/45/g		search for '450' replace with '45' everywhere
	s/99/-9999\.00/g	search for all '99' replace with '-9999.00' 
	s/Basin[0-9]//g		remove the word Basin followed by a single digit
	s/^12/12XX/		search for '12' at the beginning of a line, 
				insert XX after it
	s/Basin$//		remove the word Basin if it is at the end of 
				the line.
	s/^Basin$//		remove the word Basin if it is the only word on 
				the line.
	s/[cC]/100/g		search for 'c' or 'C' replace with 100

	45,$s/\([0-9][0-9]\)\.\([0-9][0-9]\)/\2\.\1/g
				on lines 45 to the end of file, search for two digits
				followed by a '.' followed by two digits.  replace
				with the digit pairs reversed.

	2,$s/,\([^,]*\),/,\"\1\",/
				on all lines except the first, search for a comma,
				followed by any text that contains no comma, followed 
				by a comma.  replace the first such match on each line 
				with the found text surrounded by double quotes. 
  
	s/\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9][0-9][0-9]\)/Year = \3, Month = \2, Day = \1/
				search for 2 digits, followed by a colon, followed by 2 digits, 
				followed by a colon, followed by 4 digits.  replace with
				text plus values in a different order.
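
To see the last two expressions at work, here is a sketch that runs them on invented input. The date string and the small comma-separated table are made up for illustration. Inside single quotes the double quotes in the replacement need no backslash.

```shell
# Reorder a Day:Month:Year string using the three remembered groups.
reordered=$(printf '23:10:2002\n' |
  sed -e 's/\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9][0-9][0-9]\)/Year = \3, Month = \2, Day = \1/')
echo "$reordered"

# Quote the second comma-separated value on every line except the first.
quoted=$(printf 'id,name,flow\n1,Nile,10\n' |
  sed -e '2,$s/,\([^,]*\),/,"\1",/')
echo "$quoted"
```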


Pipes, standard input, standard output:

Output redirection, ">", places the standard output of a command into the 
file named after the ">".  A new file will be written (an existing file with 
the same name will be overwritten).  To append to an existing file instead, 
use ">>".
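
A minimal sketch of the difference between ">" and ">>", using a scratch file whose name (out.txt) is arbitrary.

```shell
workdir=$(mktemp -d)

echo first  >  "$workdir/out.txt"   # creates the file
echo second >  "$workdir/out.txt"   # replaces its contents; "first" is gone
echo third  >> "$workdir/out.txt"   # appends to what is already there

contents=$(cat "$workdir/out.txt")
echo "$contents"

rm -r "$workdir"
```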

Pipes allow you to connect multiple commands together to form a data stream.  
For example, to count the number of times the string "Nile" occurs in the 
3rd column of a file run this:

	cut -f 3 infile1 | sort | uniq -c | grep 'Nile'

or, to get one total instead of a count for each distinct value, do this:

	cut -f 3 infile1 | grep 'Nile' | wc -l
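
The sketch below runs the counting pipeline on a tiny, invented tab-delimited file (rivers.txt is a hypothetical name). The tr step only removes the padding that some versions of wc place before the number. Note that grep matches substrings, so a value such as "Blue Nile" is counted too; grep -x, which matches whole lines only, is one way to exclude such values.

```shell
workdir=$(mktemp -d)
printf '1\tAfrica\tNile\n2\tAfrica\tCongo\n3\tAfrica\tBlue Nile\n4\tAfrica\tNile\n' \
  > "$workdir/rivers.txt"

# Lines whose third column contains "Nile" anywhere.
total=$(cut -f 3 "$workdir/rivers.txt" | grep 'Nile' | wc -l | tr -d ' ')
echo "$total"

# Lines whose third column is exactly "Nile"; -c makes grep do the counting.
exact=$(cut -f 3 "$workdir/rivers.txt" | grep -c -x 'Nile')
echo "$exact"

rm -r "$workdir"
```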


From a global STN Attributes data set (tab delimited):

  - extract all North American basins draining into the Atlantic Ocean
  - select only columns 2,3,4,5,11,12,13, and 17
  - replace all missing data values (either -99 or -999) with -9999.0
  - remove duplicate lines
  - sort by the first column
  - number all lines sequentially 
  - save to a new file

grep 'North America' STNAttributes.txt | grep 'Atlantic Ocean' \
  | cut -f 2-5,11-13,17 | sed -E 's/-999?/-9999.0/g'           \
  | sort | uniq | cat -n > NewSTNAttributes.txt
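
One caveat about the missing-value step: a sed pattern for -99 also matches inside longer numbers that merely begin with -99. The sketch below shows the effect on an invented value and, as one possible alternative that is not part of the original recipe, uses awk to replace only tab-delimited fields that are exactly -99 or -999. The sample rows are made up.

```shell
# A plain substitution also rewrites the start of -9950.
damaged=$(printf '%s\n' '-9950' | sed -e 's/-99/-9999.0/g')
echo "$damaged"

# awk alternative (an assumption, not the author's method): compare each
# tab-delimited field as a whole string and replace only exact matches.
cleaned=$(printf 'A\t-99\t-9950\nB\t-999\t12\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++)
           if ($i == "-99" || $i == "-999") $i = "-9999.0"
         print }')
echo "$cleaned"
```

Whether this matters depends on the data: if no legitimate value begins with -99, the sed form is sufficient.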


--
This page produced by

Richard Lammers
Water Systems Analysis Group
Institute for the Study of Earth, Oceans, and Space
University of New Hampshire
Durham, NH 03824

