Perl Tutorial
Bioinformatics Orientation 2008
Eric Bishop
Adapted from slides found at:
www.csd.uoc.gr/~hy439/Perl.ppt
Original author is not indicated
Why Perl?
Perl is built around regular expressions
REs are good for string processing
Therefore Perl is a good language for text-processing scripts
Perl is especially popular for CGI scripts
Perl makes full use of the power of UNIX
Perl programs can be very short
“Perl is designed to make the easy jobs easy,
without making the difficult jobs impossible.” --
Larry Wall, Programming Perl
Why not Perl?
Perl is very UNIX-oriented
Perl is available on other platforms...
...but isn’t always fully implemented there
However, Perl is often the best way to get some
UNIX capabilities on less capable platforms
Perl does not scale well to large programs
Weak subroutines, heavy use of global variables
Perl’s syntax is not particularly appealing
Perl Example 1
#!/usr/bin/perl
#
# Program to do the obvious
#
print 'Hello world.'; # Print a message
Understanding “Hello World”
Comments run from # to the end of the line
But the first line, #!/usr/bin/perl, tells the system where to find the Perl interpreter (the path may differ on your system, e.g. /usr/local/bin/perl)
Perl statements end with semicolons
Perl is case-sensitive
Running your program
Two ways to run your program:
perl hello.pl
or make the file executable and run it directly:
chmod 700 hello.pl
./hello.pl
Scalar variables
Scalar variables start with $
Scalar variables hold strings or numbers, and Perl converts between the two automatically as needed
When you first use (declare) a variable, use the my keyword to limit the variable’s scope
Not strictly necessary, but good programming practice (and required under use strict)
Examples:
my $priority = 9;
$priority = "A";   # the same variable now holds a string
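Because Perl converts between strings and numbers on demand, one scalar can be used both ways. A minimal sketch (the variable names are illustrative); use strict makes the my declarations mandatory and use warnings reports questionable code:

```perl
use strict;     # require every variable to be declared with my
use warnings;   # warn about questionable constructs

my $count = "10";                # a string that looks like a number
my $total = $count + 5;          # used as a number: 15
my $label = $total . " items";   # used as a string: "15 items"
print "$label\n";
```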
Arithmetic in Perl
$a = 1 + 2;    # Add 1 and 2 and store in $a
$a = 3 - 4;    # Subtract 4 from 3 and store in $a
$a = 5 * 6;    # Multiply 5 and 6
$a = 7 / 8;    # Divide 7 by 8 to give 0.875
$a = 9 ** 10;  # Nine to the power of 10, that is, 9^10
$a = 5 % 2;    # Remainder of 5 divided by 2
++$a;          # Increment $a and then return it
$a++;          # Return $a and then increment it
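The difference between ++$a and $a++ is easiest to see by saving the value each one returns. A small sketch; it uses $x rather than $a, because $a and $b are special variables that Perl's sort uses:

```perl
use strict;
use warnings;

my $x = 5;
my $pre  = ++$x;      # $x becomes 6, and 6 is returned
my $post = $x++;      # 6 is returned, then $x becomes 7
my $quot = 7 / 8;     # 0.875: / does floating-point division
my $rem  = 5 % 2;     # 1
my $pow  = 2 ** 10;   # 1024
print "pre=$pre post=$post x=$x quot=$quot rem=$rem pow=$pow\n";
```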
Arithmetic in Perl cont’d
Sometimes you need to group terms to control the order of evaluation
Use parentheses ()
(5-6)*2 is -2, but 5-(6*2) is -7
String and assignment operators
$a = $b . $c;   # Concatenate $b and $c
$a = $b x $c;   # $b repeated $c times
$a = $b;        # Assign $b to $a
$a += $b;       # Add $b to $a
$a -= $b;       # Subtract $b from $a
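A short sketch of the string operators and the combined assignment operators in action; the variable names are illustrative:

```perl
use strict;
use warnings;

my $word  = "na";
my $times = 4;
my $song  = "Ba" . ($word x $times);   # "Ba" followed by "nananana"
my $n = 10;
$n += 5;    # same as $n = $n + 5; $n is now 15
$n -= 3;    # same as $n = $n - 3; $n is now 12
print "$song $n\n";
```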
Single and double quotes
$a = 'apples';
$b = 'bananas';
print $a . ' and ' . $b;
prints: apples and bananas
print '$a and $b';
prints: $a and $b
print "$a and $b";
prints: apples and bananas
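Double quotes also expand backslash escapes such as \n, while single quotes keep them as literal characters. A minimal check (variable names are illustrative):

```perl
use strict;
use warnings;

my $fruit  = 'apples';
my $single = 'I like $fruit\n';   # no interpolation: a literal $fruit, a backslash and an n
my $double = "I like $fruit\n";   # $fruit is interpolated and \n becomes a newline
print $double;
```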
Perl Example 2
#!/usr/bin/perl
# program to add two numbers
my $a = 3;
my $b = 5;
my $c = "the sum of $a and $b and 9 is:";
my $d = $a + $b + 9;
print "$c $d\n";
Exercise 1
Modify Example 2 to print the value of (12 - 9) * 3
(don’t do it in your head!)
if statements
if ($a eq "")
{
print "The string is empty\n";
}
else
{
print "The string is not empty\n";
}
Tests
All of the following are false:
0, '0', "0", '', "", and undef (an undefined value)
Anything not false is true, including the strings "Zero" and "00"
Use == and != for numbers, eq and
ne for strings
&&, ||, and ! are and, or, and not,
respectively.
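A quick way to check which values count as true, and why == and eq give different answers, is sketched below. It uses Perl's built-in grep, which keeps the list elements for which the block is true; the sample values are illustrative:

```perl
use strict;
use warnings;

my @values = (0, '0', '', '00', 'Zero', '0.0', 1);
my @true   = grep { $_ } @values;   # keep only the values Perl treats as true
print "true: @true\n";              # true: 00 Zero 0.0 1

my $same_number = (10 == 10.0);       # numeric comparison: true
my $same_string = ("10" eq "10.0");   # string comparison: false
```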
if - elsif statements
if ($a eq "")
{ print "The string is empty\n"; }
elsif (length($a) == 1)
{ print "The string has one character\n"; }
elsif (length($a) == 2)
{ print "The string has two characters\n"; }
else
{ print "The string has more than two characters\n"; }
while loops
#!/usr/bin/perl
my $i = 5;
while ($i < 15)
{
print "$i\n";
$i++;
}
do..while loops
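#!/usr/bin/perl
my $i = 5;
do
{
print "$i\n";
$i++;
}
while ($i < 15 && $i != 5);
The body runs before the condition is tested, so it always runs at least once; this loop prints 5 through 14
A while loop with the same condition would print nothing, because $i != 5 is false at the start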
for loops
for (my $i = 5; $i < 15; $i++)
{
print "$i\n";
}
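The while, for and do..while loops can be compared by collecting the values each one visits instead of printing them. A minimal sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

# while loop: test first, then run the body
my $from_while = "";
my $i = 5;
while ($i < 15) {
    $from_while = $from_while . "$i,";
    $i++;
}

# for loop: the same counting, with start, test and step on one line
my $from_for = "";
for (my $j = 5; $j < 15; $j++) {
    $from_for = $from_for . "$j,";
}

# do..while: the body runs once before the test is checked
my $runs = 0;
my $k = 100;
do {
    $runs++;
} while ($k < 15);   # false immediately, but $runs is already 1

print "$from_while\n$from_for\nruns=$runs\n";
```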
last
The last statement can be used to exit a loop before it
would otherwise end
for (my $i = 5; $i < 15; $i++)
{
print "$i,";
if($i == 10)
{
last;
}
}
print "\n";
when run, this prints 5,6,7,8,9,10,
next
The next statement can be used to end the current loop iteration
early
for (my $i = 5; $i < 15; $i++)
{
if($i == 10)
{
next;
}
print "$i,";
}
print "\n";
when run, this prints 5,6,7,8,9,11,12,13,14,
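next and last can be combined in one loop. This sketch uses Perl's statement-modifier form (next if ...), which does the same as the block form shown above:

```perl
use strict;
use warnings;

my $out = "";
for (my $i = 5; $i < 15; $i++) {
    next if $i == 7;    # skip the rest of this pass when $i is 7
    last if $i == 10;   # leave the loop entirely when $i reaches 10
    $out = $out . "$i,";
}
print "$out\n";   # 5,6,8,9,
```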
Standard I/O
On the UNIX command line:
< filename means to get input from this file
> filename means to send output to this file
STDIN is standard input
To read a line from standard input use:
my $line = <STDIN>;
STDOUT is standard output
print writes to STDOUT by default
You can also name STDOUT explicitly:
print STDOUT "my output goes here";
File I/O
Often we want to read/write from specific files
In Perl, we use file handles to manipulate files
The syntax for opening a file handle for reading is different from the syntax for opening one for writing
To open a file handle for reading:
open IN, "<fileName";
To open a file handle for writing:
open OUT, ">fileName";
File handles should be closed when we are finished with them; the syntax is the same for all file handles:
close IN;
File I/O cont’d
Once a file handle is open, you may use
it just like you would use STDIN or
STDOUT
To read from an open file handle:
my $line = <IN>;
To write to an open file handle:
print OUT "my output data\n";
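A common alternative to the bareword handles above is the three-argument open with a lexical handle, plus or die so that a failed open is reported ($! holds the reason). A minimal sketch of writing a file and reading it back; it borrows tempfile from the core File::Temp module only to get a scratch file name, and uses chomp to strip each line's trailing newline:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module; used here only to get a scratch file

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);   # file is deleted when the program exits
close($tmp_fh);

# Write two lines
open(my $out, '>', $filename) or die "Cannot open $filename for writing: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read them back one line at a time
open(my $in, '<', $filename) or die "Cannot open $filename for reading: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;            # remove the trailing newline
    push @lines, $line;
}
close($in);
print scalar(@lines), " lines read\n";
```

The two-argument form used in the slides still works; the three-argument form keeps the mode separate from the file name.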
Perl Example 3
#!/usr/bin/perl
# singlespace.pl: remove blank lines from a file
# Usage: perl singlespace.pl < oldfile > newfile
while (my $line = <STDIN>)
{
if ($line eq "\n")
{
next;
}
print "$line";
}
Exercise 2
Modify Example 3 so that blank lines are removed ONLY if they occur in the first 10 lines of the original file
Arrays
my @food = ("apples", "bananas", "cherries");
A single element is accessed with $ (not @), and indices start at 0:
print $food[1];
prints "bananas"
my @morefood = ("meat", @food);
@morefood now contains:
("meat", "apples", "bananas", "cherries");
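A short sketch of indexing, list flattening and array length, using the same food list:

```perl
use strict;
use warnings;

my @food = ("apples", "bananas", "cherries");
my $first = $food[0];    # "apples": indices start at 0
my $last  = $food[-1];   # "cherries": negative indices count back from the end
my @morefood = ("meat", @food);   # @food is flattened into the new list
my $count = @morefood;            # an array used as a scalar gives its length: 4
print "$first $last $count\n";
```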
push and pop
push adds one or more things to the end of a
list
push (@food, "eggs", "bread");
push returns the new length of the list
pop removes and returns the last element
$sandwich = pop(@food);
$len = @food;   # $len gets length of @food
$#food # returns index of last element
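The return values of push and pop, and how $#food relates to the length, in one sketch:

```perl
use strict;
use warnings;

my @food = ("apples", "bananas", "cherries");
my $newlen   = push(@food, "eggs", "bread");   # 5: push returns the new length
my $sandwich = pop(@food);                     # "bread" is removed and returned
my $len      = @food;                          # 4
my $lastidx  = $#food;                         # 3: always one less than the length
print "$newlen $sandwich $len $lastidx\n";
```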
@ARGV: a special array
A special array, @ARGV, contains the
parameters you pass to a program on
the command line
If you run perl test.pl a b c, then within test.pl @ARGV will contain ("a", "b", "c")
foreach
# Visit each item in turn and call it $morsel
foreach my $morsel (@food)
{
print "$morsel\n";
print "Yum yum\n";
}
Hashes / Associative arrays
Associative arrays allow lookup by name rather than
by index
Associative array names begin with %
Example:
my %fruit = ("apples"=>"red", "bananas"=>"yellow", "cherries"=>"red");
Now, $fruit{"bananas"} returns "yellow"
To set value of a hash element:
$fruit{"bananas"} = "green";
Hashes / Associative Arrays II
To remove a hash element use delete
delete $fruit{"bananas"};
You cannot look up hash elements by numeric position, but you can loop over them with the keys and values functions (keys come back in no particular order):
foreach my $f (keys %fruit)
{
print ("The color of $f is " . $fruit{$f} . "\n");
}
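A sketch that updates, deletes and tests for keys, then prints the pairs in a predictable order by sorting the keys. exists (true if the key is present) and sort are Perl built-ins not covered on these slides:

```perl
use strict;
use warnings;

my %fruit = ("apples" => "red", "bananas" => "yellow", "cherries" => "red");
$fruit{"bananas"} = "green";          # change an existing value
delete $fruit{"cherries"};            # remove a key and its value

my $has_cherries = (exists $fruit{"cherries"}) ? 1 : 0;   # 0 after the delete
my $n = keys %fruit;                  # keys in scalar context: number of pairs, 2

# keys come back in no particular order, so sort them for predictable output
foreach my $f (sort keys %fruit) {
    print "The color of $f is $fruit{$f}\n";
}
```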
Example 4
#!/usr/bin/perl
my @names = ( "bob", "sara", "joe" );
my %likesHash = ( "bob"=>"steak", "sara"=>"chocolate",
"joe"=>"raspberries" );
foreach my $name (@names)
{
my $nextLike = $likesHash{$name};
print "$name likes $nextLike\n";
}
Exercise 3
Modify Example 4 in the following way:
Suppose we want to keep track of books
that these people like as well as food
Bob likes The Lord of the Rings
Sara likes Hitchhiker’s Guide to the Galaxy
Joe likes Thud!
Modify Example 4 to print each person’s
book preference as well as food preference
Regular Expressions
$sentence =~ /the/
True if $sentence contains "the"
$sentence = "The dog bites.";
if ($sentence =~ /the/) # is false
…because Perl is case-sensitive
!~ means "does not match"
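The case-sensitivity point and the !~ operator can be checked directly. A minimal sketch using the same sentence:

```perl
use strict;
use warnings;

my $sentence = "The dog bites.";
my $has_the = ($sentence =~ /the/) ? 1 : 0;   # 0: the capital T in "The" does not match
my $has_dog = ($sentence =~ /dog/) ? 1 : 0;   # 1
my $no_cat  = ($sentence !~ /cat/) ? 1 : 0;   # 1: !~ is true when there is no match
print "$has_the $has_dog $no_cat\n";
```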
RE special characters
.   # Any single character except a newline
^   # The beginning of the line or string
$   # The end of the line or string
*   # Zero or more of the last character
+   # One or more of the last character
RE examples
^.*$     # matches the entire string
hi.*bye  # matches from "hi" to "bye" inclusive
x +y     # matches x, one or more blanks, and y
^Dear    # matches "Dear" only at the beginning
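The anchors and repetition characters from these examples, tested against a few illustrative strings:

```perl
use strict;
use warnings;

my $letter = "Dear Sam, hi and bye";
my $starts_dear = ($letter =~ /^Dear/)   ? 1 : 0;   # 1: "Dear" at the start
my $mid_dear    = ("Oh Dear" =~ /^Dear/) ? 1 : 0;   # 0: "Dear" is not at the start
my $hi_bye      = ($letter =~ /hi.*bye/) ? 1 : 0;   # 1
my $blanks      = ("x   y" =~ /x +y/)    ? 1 : 0;   # 1: three blanks
my $no_blanks   = ("xy" =~ /x +y/)       ? 1 : 0;   # 0: + needs at least one blank
```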
Square brackets
[qjk]     # Either q or j or k
[^qjk]    # Neither q nor j nor k
[a-z]     # Anything from a to z inclusive
[^a-z]    # Any character that is not a lower case letter
[a-zA-Z]  # Any letter
[a-z]+    # Any non-zero sequence of lower case letters
More examples
[aeiou]+   # matches one or more vowels
[^aeiou]+  # matches one or more nonvowels
[0-9]+     # matches an unsigned integer
[0-9A-F]   # matches a single hex digit
[a-zA-Z]   # matches any letter
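Character classes are often combined with ^ and $ to check that a whole string has a given form. A sketch with illustrative inputs:

```perl
use strict;
use warnings;

my $int_ok   = ("2008"   =~ /^[0-9]+$/)    ? 1 : 0;   # 1: only digits
my $int_bad  = ("20.08"  =~ /^[0-9]+$/)    ? 1 : 0;   # 0: "." is not in [0-9]
my $hex_ok   = ("1F"     =~ /^[0-9A-F]+$/) ? 1 : 0;   # 1
my $no_vowel = ("rhythm" =~ /[aeiou]/)     ? 1 : 0;   # 0: no vowels at all
my $word_ok  = ("Perl"   =~ /^[a-zA-Z]+$/) ? 1 : 0;   # 1: letters only
```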
More special characters
\n  # A newline
\t  # A tab
\w  # Any word character (letter, digit or underscore); same as [a-zA-Z0-9_]
\W  # Any non-word character; same as [^a-zA-Z0-9_]
\d  # Any digit; same as [0-9]
\D  # Any non-digit; same as [^0-9]
\s  # Any whitespace character, such as a space, tab or newline
\S  # Any non-whitespace character
Quoting special characters
\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
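The shorthand classes and the backslash-quoted characters, used in a few illustrative matches:

```perl
use strict;
use warnings;

my $is_date = ("2008-09-15" =~ /^\d\d\d\d-\d\d-\d\d$/) ? 1 : 0;   # 1
my $has_tab = ("name\tvalue" =~ /\t/)                  ? 1 : 0;   # 1
my $star    = ("5 * 6" =~ /5 \* 6/)                    ? 1 : 0;   # 1: \* is a literal asterisk
my $slash   = ("/usr/bin/perl" =~ /^\/usr\/bin\//)     ? 1 : 0;   # 1: \/ is a literal slash
my $bar     = ("a|b" =~ /^a\|b$/)                      ? 1 : 0;   # 1: \| is a literal bar
```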
Alternatives and parentheses
jelly|cream # Either jelly or cream
(eg|le)gs # Either eggs or legs
(da)+         # Either da or dada or dadada or...
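A sketch of alternatives and grouped repetition. It uses the built-in grep, whose block matches each list element through $_:

```perl
use strict;
use warnings;

my @words  = ("eggs", "legs", "pegs", "cream", "dadada");
my @gs     = grep { /^(eg|le)gs$/ } @words;             # ("eggs", "legs")
my $spread = ("jam and cream" =~ /jelly|cream/) ? 1 : 0;   # 1
my $da     = ("dadada" =~ /^(da)+$/) ? 1 : 0;           # 1
my $dad    = ("dad" =~ /^(da)+$/) ? 1 : 0;              # 0: the final "d" is not part of a "da"
print "@gs\n";
```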
The $_ variable
Often we want to process one string
repeatedly
The $_ variable holds the current string
If a subject is omitted, $_ is assumed
Hence, the following are equivalent:
if ($sentence =~ /under/) …
$_ = $sentence; if (/under/) ...
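A foreach loop without a loop variable places each item in $_, so matches inside the loop can omit the subject. A minimal sketch with illustrative words:

```perl
use strict;
use warnings;

my @found;
foreach ("understand", "over", "thunder") {   # no loop variable, so each item is in $_
    if (/under/) {          # same as: if ($_ =~ /under/)
        push @found, $_;
    }
}
print "@found\n";   # understand thunder
```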
Case-insensitive substitutions
s/london/London/i
case-insensitive substitution; will replace
london, LONDON, London, LoNDoN,
etc.
You can combine global substitution (the g modifier, which replaces every match instead of only the first) with case-insensitive substitution
s/london/London/gi
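The effect of i alone versus g and i together, on an illustrative string. A substitution also returns the number of replacements it made:

```perl
use strict;
use warnings;

my $text = "london is big. LONDON is old.";

my $once = $text;
$once =~ s/london/London/i;     # i only: just the first match is replaced

my $all = $text;
my $count = ($all =~ s/london/London/gi);   # g and i: every match is replaced
print "$once\n$all\n$count\n";
```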
split
split breaks a string into parts
$info = "Caine:Michael:Actor:14, Leafy Drive";
@personal = split(/:/, $info);
@personal now contains:
("Caine", "Michael", "Actor", "14, Leafy Drive");
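A sketch of working with the fields from split: counting them, assigning the first two to named variables, and putting them back together with join (covered with subroutines later):

```perl
use strict;
use warnings;

my $info = "Caine:Michael:Actor:14, Leafy Drive";
my @personal = split(/:/, $info);
my $fields = @personal;                 # 4
my ($surname, $forename) = @personal;   # the first two fields
my $rejoined = join(":", @personal);    # join puts the ":" separators back
print "$forename $surname ($fields fields)\n";
```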
Example 5
#!/usr/bin/perl
my @lines = ( "Boston is cold.",
"I like the Boston Red Sox.",
"Boston drivers make me see red!" );
foreach my $line (@lines)
{
if ($line =~ /Boston.*red/i )
{
print "$line\n";
}
}
Exercise 4
Add the following to @lines in Example 5: “In Boston, there is a big Citgo sign that is red and white.”
Now modify Example 5 to print out only
the same two lines as before
Calling subroutines
Assume you have a subroutine printargs
that just prints out its arguments
Subroutine calls:
printargs("perly", "king");
Prints: "perly king"
printargs("frog", "and", "toad");
Prints: "frog and toad"
Defining subroutines
Here's the definition of printargs:
sub printargs
{ print join(" ", @_) . "\n"; }
Parameters for subroutines are in an array called @_
The join() function is the opposite of split()
It joins the strings in an array together into one string
The first argument is the separator that is placed between the strings in the array
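The printargs subroutine from these slides, together with a join/split round trip; the $csv and @back names are illustrative:

```perl
use strict;
use warnings;

sub printargs {
    print join(" ", @_) . "\n";   # @_ holds the arguments; join puts a space between them
}

printargs("perly", "king");         # prints: perly king
printargs("frog", "and", "toad");   # prints: frog and toad

my $csv  = join(",", "a", "b", "c");   # "a,b,c"
my @back = split(/,/, $csv);           # ("a", "b", "c"): split reverses the join
```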
Returning a result
The value of a subroutine is the value of the
last expression that was evaluated
sub maximum
{
if ($_[0] > $_[1])
{ $_[0]; }
else
{ $_[1]; }
}
$biggest = maximum(37, 24);
Returning a result (cont’d)
You can also use the “return” keyword to return a
value from a subroutine
This is better programming practice
sub maximum
{
my $max = $_[0];
if ($_[1] > $_[0])
{ $max = $_[1]; }
return $max;
}
$biggest = maximum(37, 24);
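Copying @_ into named variables makes a subroutine easier to read. A sketch of the same maximum written that way, plus a hypothetical maximum_of_list that accepts any number of arguments (the name is illustrative, not a built-in):

```perl
use strict;
use warnings;

# Two-argument version: copy @_ into named variables, then return explicitly
sub maximum {
    my ($first, $second) = @_;
    return $first > $second ? $first : $second;
}

# Hypothetical generalization: the largest of any number of arguments
sub maximum_of_list {
    my $max = shift;              # with no argument, shift takes from @_ inside a sub
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    return $max;
}

my $biggest = maximum(37, 24);            # 37
my $largest = maximum_of_list(4, 19, 7);  # 19
```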
Example 6
#!/usr/bin/perl
sub inside
{
my $a = shift @_;
my $b = shift @_;
$a =~ s/ //g;
$b =~ s/ //g;
return ($a =~ /$b/ || $b =~ /$a/);
}
if( inside("lemon", "dole money") )
{
print "\"lemon\" is in \"dole money\"\n";
}
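In Example 6 each string is used as a regular expression, so characters such as . or * act as pattern characters: inside("abc", "a.c") is true, because . matches b. A sketch of a variant, named inside_literal here for illustration, that compares the strings literally with the built-in index function (which returns -1 when the substring is absent). Writing the pattern as /\Q$b\E/ is another standard way to match a variable literally.

```perl
use strict;
use warnings;

# Hypothetical variant of inside() that compares the strings literally
sub inside_literal {
    my ($x, $y) = @_;
    $x =~ s/ //g;               # remove spaces, as in Example 6
    $y =~ s/ //g;
    # index() returns the position of the substring, or -1 if it is not found
    return (index($x, $y) >= 0 || index($y, $x) >= 0);
}

print "lemon is inside dole money\n" if inside_literal("lemon", "dole money");
```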
Exercise 5
Create a new subroutine, doesnotstart, which, given 2 strings, tests that neither string starts with the other one
doesnotstart("abc", "abcdef") will be false
doesnotstart("doggy", "dog") will be false
doesnotstart("bad dog", "dog") will be true
The End