Introduction to Perl

Perl (often expanded as "Practical Extraction and Report Language", although the name is not officially an acronym) is a dynamically typed, high-level, interpreted scripting language most comparable with PHP and Python. Perl's syntax owes a lot to ancient shell scripting tools, and it is famed for its heavy use of confusing symbols, most of which are impossible to Google for. Perl's shell scripting heritage makes it great for writing glue code: scripts which link together other scripts and programs. Perl is ideally suited for processing text data and producing more text data. Perl is widespread, popular, highly portable and well-supported. Perl was designed with the philosophy "There's More Than One Way To Do It" (TMTOWTDI); contrast this with Python, where "there should be one - and preferably only one - obvious way to do it".

Type the following lines of code into a text editor and save the file as "firstscript.pl":
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# the following line will print "Hello, World!"
print "Hello, World!", "\n";
code

Then run the script from a terminal:

$ perl firstscript.pl
==== The script starts with a shebang line: a number sign and an exclamation mark (#!) followed by the full path of the Perl interpreter. When the script is executed directly, the shebang tells the operating system which interpreter to run it with. If several versions of Perl are installed on the computer, the shebang lets you direct the script to a particular one. ====
==== After the shebang line come two statements, each ended with a semicolon: **use strict;** and **use warnings;**. They belong at the top of every script and are called "pragmas". A pragma is a signal to the Perl interpreter that takes effect during the initial syntactic validation, before the program starts running; these lines have no effect when the interpreter encounters them at run time. The pragmas **use strict** and **use warnings** detect certain types of coding errors: they catch typos, restrict unsafe constructs, and flag variable naming collisions. ====
==== **print** is a built-in Perl function; here it writes "Hello, World!" to STDOUT (the shell). "**\n**" is an escape sequence that adds a newline after "Hello, World!". **Double quotation marks, ""**, enclose data that should be interpolated before processing. The **semicolon**, **;**, is the statement terminator. The **number sign, #**, begins a comment, which helps the programmer document what a line of code means in the current context. It can also be used to comment out a line of code for debugging purposes. A comment lasts until the end of the line. ====
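
As a rough illustration of what interpolation inside double quotation marks means, the sketch below contrasts a double-quoted string with a single-quoted one. The variable `$name` and its value are made up for this example and do not appear in the script above.

```perl
use strict;
use warnings;

# $name is a hypothetical variable used only for this illustration
my $name = "World";

# Double quotes interpolate variables and escape sequences such as \n
my $double = "Hello, $name!\n";

# Single quotes keep the text literally: $name is not replaced
# and \n stays as a backslash followed by the letter n
my $single = 'Hello, $name!\n';

print $double;
print $single, "\n";
```

The first print shows the greeting with the name filled in; the second shows the characters exactly as typed between the single quotes.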

** Whitespace in Perl **
A Perl program does not care about whitespace as long as it is not inside a quoted string. The following program works perfectly fine:
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# the following line will print "Hello, World!"
print                                      "Hello, World!", "\n";
code
But spaces inside a quoted string are printed as is. For example:
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

print "Hello,          World!", "\n";
code

**Variables**

Perl variables come in three types: **scalars**, **arrays** and **hashes**. Each type has its own sigil: $ (dollar sign), @ (at sign) and % (percent sign), respectively.
==== - **Scalars** are simple variables. A scalar is either a number, a string, or a reference. A Perl reference is a scalar that holds the memory location of another variable, which can be a scalar, an array, or a hash. Because a reference is itself a scalar, it can be used anywhere a scalar can be used. ====

1. Perl is a case-sensitive programming language, so $results and $Results are two different variables.
2. Avoid variable naming collisions. Even experienced programmers make errors in variable names; a common case is forgetting to rename one instance of a variable when cleaning up or refactoring code. **use strict** forces programmers to declare each variable by putting **my** in front of it the first time it is used. The variable then remains visible (in scope) until the end of the enclosing block (delimited by **{}**) or, if it is declared outside any block, until the end of the script.
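
The scoping rule in point 2 can be sketched as follows. This is only an illustration; the variable name `$result` and its values are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $result = "outer";          # declared outside any block: visible to the end of the script

{
    my $result = "inner";      # a separate variable, visible only inside this block
    print "inside the block: $result\n";
}

# The inner variable is gone here, so this is the outer one again
print "outside the block: $result\n";

# With "use strict" in effect, using a misspelled name that was never
# declared with "my" stops the script before it starts running.
```

Declaring the inner variable with `my` is what keeps the two values apart; without the second `my`, the block would overwrite the outer variable.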

Now let's practice by creating and changing scalar variables.
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $x = 6;    # "=" looks like an equals sign, but it means "assign the value on the right to the variable on the left"
my $y = 4;

my $c = $x + $y;
print $c, "\n";

my $d = $x - $y;
print $d, "\n";

my $e = $x * $y;
print $e, "\n";

my $f = $x / $y;
print $f, "\n";

my $g = $x ** $y;    # ** is the exponentiation operator. In this case it is 6 to the power of 4
print $g, "\n";

my $h = $x % $y;     # % is the modulus operator. The value of the expression 6 % 4
                     # is the remainder when 6 is divided by 4, which is 2
print $h, "\n";

$x += 1;    # this is equivalent to $x = $x + 1
print $x, "\n";

$x -= 1;    # this is equivalent to $x = $x - 1
print $x, "\n";

$x *= 2;    # this is equivalent to $x = $x * 2
print $x, "\n";

$x /= 2;    # this is equivalent to $x = $x / 2
print $x, "\n";

my $j = $x;    # assign the value of $x to a new variable $j
$j += 2;       # increment $j by 2; $x is not affected
print $x, "\n";
print $j, "\n";

# String concatenation using the string concatenation operator . :
my $k = $x . $y;
print $k, "\n";

$k = $k . $x . $y;
print $k, "\n";

my $l = "Hello";
my $m = "World";
my $n = $l . "\t" . $m . "!";    # "\t" is an escape sequence meaning a tab
print $n, "\n";

# .= is the concatenation assignment operator, which appends a string to an existing string
$l .= "," . "\t" . "my name is Ke!";
print $l, "\n";

# to get the length of a string stored in a scalar variable, use the function length
my $o = length($l);
print $o, "\n";

# substr extracts a portion of a string. The syntax is substr(string, startPosition, length).
# This function starts counting from 0, not 1!
my $new = "Hello, my name is Ke!";
my $p = substr($new, 18, 2);
print $p, "\n";

# we can also use substr to replace a substring within the string
substr($new, 18, 2) = "Adam";
print $new, "\n";
code

An **array** is declared with an **@** sign and holds a parenthesised list of scalars indexed by integers beginning at 0.
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("print", "these", "strings", "out", "for", "me");

# In Perl, when you print an array inside double quotes,
# the array elements are printed with spaces inserted between them.
print "@array", "\n";

# Without double quotes, the elements are printed with nothing between them
print @array, "\n";

# You have to use a dollar sign to access a value from an array,
# because the value being retrieved is not an array but a scalar:
print $array[0], "\n";         # print the first item of this array
print $array[1], "\n";         # print the second item of this array
print $array[6], "\n";         # this item doesn't exist since it is outside the indexes;
                               # it is undefined, so you will get an "uninitialized value" warning
print "@array[1..3]", "\n";    # print the 2nd to 4th items in order

# You can use negative indices to retrieve entries starting from the end and working backwards
print $array[-1], "\n";        # print the last item of this array
print $array[-2], "\n";        # print the second-to-last item of this array

# how many items are in the array?
print "This array has ", scalar @array, " elements", "\n";

# print the last populated index of this array:
print "The last populated index is ", $#array, "\n";

# manipulating arrays
# push: pushes the values of a list onto the end of the array.

# push a scalar onto the end of @array
push @array, "!";
print "@array", "\n";

# push a list onto the end
push @array, ("!", "!");
print "@array", "\n";

# pop: pops off the last item of the array.
pop @array;
print "@array", "\n";

# shift: shifts the first value of the array off,
# shortening the array by 1 and moving everything down.
shift @array;
print "@array", "\n";

# unshift: prepends a list (scalars or arrays) to the front of the array
unshift @array, "print";
print "@array", "\n";

# splice: removes and returns a slice of the array.
# splice (@array, StartingIndex, NumElements);
print splice (@array, 2, 2), "\n";    # in this case it cuts out a chunk of 2 elements
                                      # starting with the element in the 3rd position
print "@array", "\n";                 # print the remaining elements of the array

# reverse: returns the elements of an array in reverse order; the original array is unchanged.
my @revarray = reverse @array;
print "@array", "\n";
print "@revarray", "\n";

# split and join
# split: breaks a string up into a list of substrings at user-defined delimiters,
# and we can place the resulting list of substrings into an array.
my $string = "perl is powerful but complicated!";

# Split this string using one or more whitespace characters as the delimiter,
# and store the list in an array called @test1.
my @test1 = split (/\s+/, $string);
print "@test1", "\n";

# Split the string using the letter "o" as the delimiter. The "o" is discarded,
# returning only what is found on either side of the delimiters.
my @test2 = split (/o/, $string);
print "@test2", "\n";

# Preserving delimiters after splitting:
# If you want to keep the delimiters, here's an example of how.
# The string is split on the letter "o", and each "o" is kept as an independent item.
# The parentheses cause the delimiters to be captured into the list assigned to
# @test3, right alongside the pieces between the delimiters.
my @test3 = split ( /(o)/, $string );
print "@test3", "\n";

# The null delimiter: the delimiter is a null string (a string of zero characters).
# If split is given a null string as a delimiter, it splits at each null position
# in the string, in other words at every character boundary. The effect is that
# split returns the individual characters of $title, so @letters
# contains a list of six letters: "d", "o", "c", "t", "o" and "r".
my $title = "doctor";
my @letters = split ( //, $title );
print "@letters\n";

# join: join is in some ways the inverse of split.
# It takes a list of strings, joins them together with a delimiter and returns the new string.
my @names = ("my", "name", "is", "adam");

# concatenate the items of the array into one new string, separated by tabs
my $joined = join ("\t", @names);
print $joined, "\n";

# concatenate the items of the array into one new string, separated by commas
my $joined2 = join (",", @names);
print $joined2, "\n";

# map: takes a list (here an array) as input and applies an operation to every item ($_) in it.
# It then constructs a new list out of the results. This list can be stored in an array.
# The operation is given as a single expression inside braces:
my @map_result = map { uc $_ } @array;
print "@map_result", "\n";

# grep: takes a list (here an array) as input and returns a filtered list as output.
# The syntax is similar to map. This time, the expression is evaluated for each scalar $_
# in the input. If it returns a true value, the scalar is put into
# the output list, which can be stored in an array; otherwise it is left out.
my @grep_result = grep { length $_ == 5 } @array;
print "@grep_result", "\n";
code

**Caution**. Some day you will put somebody's email address inside a double-quoted string, "jeff@gmail.com". Perl will then look for an array variable called @gmail to interpolate into the string and will not find it. With **use strict** in effect the script refuses to compile; without it, the missing array is silently interpolated as nothing. Interpolation can be prevented in two ways: by backslash-escaping the sigil, or by using single quotes instead of double quotes.
==== The backslash can perform one of two tasks: it either takes away the special meaning of the character following it (for instance, \@gmail stands for the literal characters @gmail, not the array @gmail), or it is the start of an escape sequence (**\n**, **\t**). ====
code format="python"
my @array = ("a", "b", "c");    # example values, so that the lines below can run on their own
print "@array", "\n";           # this prints the list stored in @array
print "\@array", "\n";          # this prints the literal string @array
code
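
The two ways of preventing interpolation can be sketched with the email address from the caution above. The variable names `$escaped` and `$single` are assumptions introduced for this illustration.

```perl
use strict;
use warnings;

# Backslash-escaping the sigil removes the special meaning of @
my $escaped = "jeff\@gmail.com";

# Single quotes never interpolate, so no escaping is needed
my $single  = 'jeff@gmail.com';

print $escaped, "\n";
print $single, "\n";
```

Both lines print the same address; which form to use is mostly a matter of whether the rest of the string needs interpolation.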

A hash is an unordered group of key-value pairs. The keys are unique strings and the values are scalars (numbers, strings, or references).
==== Some people think that hashes are like arrays (the old name "associative array" suggests this, and in some other languages, such as PHP, there is no difference between arrays and hashes), but there are two major differences. Arrays are ordered, and you access an element of an array using its numerical index. Hashes are unordered, and you access a value using a unique key, which is a string. ====
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# Some examples:
# create an empty hash
my %hash1;

# to insert key-value pairs, the basic syntax is $hash1{key} = value. The key is placed in {}.
# to access a specific value using a key, we use a $ sign (not a % sign) because the value is a scalar.

# Insert 4 key-value pairs into the hash
$hash1{"apple"} = "green";
$hash1{"banana"} = "yellow";
$hash1{"strawberry"} = "red";
$hash1{"grape"} = "purple";

# now this hash contains four key-value pairs; the values are strings.

# print the color of apple
print $hash1{"apple"}, "\n";

# print all the keys
print keys %hash1, "\n";    # produces the list of keys with no delimiter separating them

# use the function join to join these keys with tabs
print join ("\t", keys %hash1), "\n";

# print all the values
print values %hash1, "\n";

# use the function join to join these values with tabs
print join ("\t", values %hash1), "\n";

# Note: the order of keys %hash1 and values %hash1 is effectively random.
# It can differ between runs of the program.

# If the key does not exist, we get a warning about an uninitialized value.
print $hash1{"orange"}, "\n";

# We can also fill a hash in one step by assigning it a list of key-value pairs:
# => is called the fat arrow or fat comma, and it is used to indicate pairs of elements.
my %hash2 = ("blueberry" => "blue", "orange" => "orange", "cherry" => "red");

# print the size (number of key-value pairs) of the hash.
print scalar (keys %hash2), "\n";

# delete a key-value pair from the hash.
delete $hash2{"cherry"};
print scalar (keys %hash2), "\n";    # check how many key-value pairs are left in this hash
code
====__ To recap, you have to use square brackets to retrieve a value from an array, but you have to use braces to retrieve a value from a hash. The square brackets are effectively a numerical operator and the braces are effectively a string operator. __====
** Multi-dimensional arrays and hashes (nested data structures) **
==== Note: Perl arrays and hashes CANNOT contain other arrays and hashes as elements. They can only contain scalars. To manage complicated data structures like multidimensional arrays and nested hashes, Perl provides a feature called the "reference", and using references is the key to managing complicated, structured data in Perl. ====

code format="python"
# ordinary (named) arrays and hashes use ()
my @actual_array = ("a", "b", "c");
my %actual_hash = ("a" => "1", "b" => "2", "c" => "3");

# to get a reference to an array or a hash, use a backslash \. Note: references are scalars
my $arrayref = \@actual_array;
my $hashref = \%actual_hash;

# to dereference, wrap the reference in @{} for arrays and %{} for hashes
my @array_copy = @{$arrayref};
my %hash_copy = %{$hashref};

# essentially @$arrayref is exactly the same as @actual_array, and %$hashref is the same as %actual_hash

# in nested data structures we use [] to create array references and {} to create hash references
$arrayref = ["a", "b", "c"];
$hashref = {"a" => "1", "b" => "2", "c" => "3"};
code
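
To illustrate that a reference points at the original data rather than at a copy, here is a minimal sketch. The names `@letters` and `$ref` are assumptions chosen for the example.

```perl
use strict;
use warnings;

my @letters = ("a", "b", "c");
my $ref     = \@letters;       # $ref points at @letters; it is not a copy

push @{$ref}, "d";             # modifies @letters through the reference
$ref->[0] = "A";               # arrow syntax reaches a single element

print "@letters\n";            # the changes are visible through the original name
print scalar(@{$ref}), "\n";   # number of elements seen through the reference
```

Because both names lead to the same array, a change made through either one shows up in the other. This is what allows nested structures to be built from references.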

**array of arrays**: Each element can hold an internal array (strictly, a reference to an array), and each element of the internal array can hold its own internal array, and so on.
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my @aoa1 = (
            [ "one", "two", "three" ],
            [ "4", "5", "6", "7" ],
            [ "alpha", "beta" ]
           );

# in this example, the outer array holds 3 array references (written with square brackets []).
# In this case ["one","two","three"] is a reference to the array ("one","two","three").

# this prints something like ARRAY(0x7fc12382d128). As mentioned, Perl does not have multi-dimensional arrays.
# What you see here is that the second element of the @aoa1 array is a reference to an internal,
# so-called anonymous array that holds the actual values. ARRAY(0x7fc12382d128) is the address
# of that internal array in memory.
print $aoa1[1], "\n";

# to print the items of the entire inner array
print "@{$aoa1[1]}", "\n";

# To access the third element of the second array:
print $aoa1[1][2], "\n";    # or print $aoa1[1]->[2]. If you have a reference to an array
                            # or a hash, you can get at its data using the arrow operator, ->.
                            # The -> can be omitted between subscripts ([][])

# to construct an array of arrays element by element
my @aoa2;
$aoa2[0][0] = "one";
$aoa2[0][1] = "two";
$aoa2[0][2] = "three";
$aoa2[1][0] = "4";
$aoa2[1][1] = "5";
$aoa2[1][2] = "6";
$aoa2[1][3] = "7";
$aoa2[2][0] = "alpha";
$aoa2[2][1] = "beta";

# you can build arrays with more than 2 dimensions (not covered by the workshop)
code

** hash of arrays **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my %hoa = (
           "fruits" => [ "banana", "apple", "orange" ],
           "vegetables" => [ "pepper", "lettuce", "spinach" ]
          );

# print the array reference:
print $hoa{"fruits"}, "\n";

# print the entire array of the fruits category
print join (" ", @{$hoa{"fruits"}}), "\n";

# or you can simply do:
print "@{$hoa{fruits}}", "\n";    # the "" around the key can be omitted

# now let's access the second item in the vegetables category.
print $hoa{"vegetables"}[1], "\n";    # or print $hoa{"vegetables"}->[1]
code

** array of hashes **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my @aoh = (
           {
             "husband" => "adam",
             "wife" => "betty",
             "son" => "john",
           },
           {
             "husband" => "george",
             "wife" => "jane",
             "son" => "peter",
           },
           {
             "husband" => "leo",
             "wife" => "marry",
             "son" => "jeremy",
           }
          );

# print the second hash reference
print $aoh[1], "\n";

# print all key-value pairs stored in the second hash
print join (" ", %{$aoh[1]}), "\n";

# print the name of the husband of the second family
print $aoh[1]{"husband"}, "\n";

# add another hash (strictly, a reference to a hash) to this array
push @aoh, { "husband" => "fred", "wife" => "jen", "daughter" => "kate" };
code

** hash of hashes **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my %hoh = (
           "fruits" => {
                        "banana" => "yellow",
                        "apple" => "red",
                        "orange" => "orange"
                       },
           "vegetables" => {
                        "pepper" => "green",
                        "lettuce" => "green",
                        "spinach" => "green"
                       }
          );

# print the first hash reference
print $hoh{"fruits"}, "\n";

# print all key-value pairs of the first inner hash
print join (" ", %{$hoh{"fruits"}}), "\n";

# to extract the color of apple
print $hoh{"fruits"}{"apple"}, "\n";
code

** if ... elsif ... else ... **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $word = "antidisestablishmentarianism";
my $strlen = length $word;

if ($strlen >= 15) {
    print "'", $word, "' is a very long word!", "\n";
} elsif (10 <= $strlen && $strlen < 15) {
    # <= means smaller than or equal to; && (double ampersand) means "and".
    # You can also write elsif (10 <= $strlen and $strlen < 15). || means "or".
    print "'", $word, "' is a medium-length word!", "\n";
} else {
    print "'", $word, "' is a short word!", "\n";
}

# Perl provides a shorter "statement if condition" syntax
# which is highly recommended for short statements
print "'", $word, "' is actually enormous", "\n" if ($strlen >= 20);
code

** unless ... else ... **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $temperature = 20;
unless ($temperature > 30) {
    print $temperature, " degrees Celsius is not very hot!", "\n";
} else {
    print $temperature, " degrees Celsius is actually pretty hot!\n";
}

# This, by comparison, is highly recommended because it is so easy to read:
print "Oh no it's too cold!", "\n" unless ($temperature > 15);
code

** while loop **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# an example:
my $counter = 20;
while ($counter > 0) {
    print $counter, "\n";
    $counter = $counter - 2;
}

print "done!\n";
code
==== A while loop has a condition (in our case, whether the variable $counter is larger than 0) followed by a block of code wrapped in curly braces, and it keeps going until the condition is no longer true. When execution first reaches the while loop, it checks the condition. If the condition is FALSE, the block is skipped and the next statement, in our case printing 'done!', is executed. If the condition is TRUE, the block is executed and execution goes back to the condition, which is evaluated again. This goes on as long as the condition is true, or in sort-of English: while (the-condition-is-true) { do-something } ====

** for loop **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# 1. C-style for loop
# the basic syntax of a C-style for loop is
# for (initialization; condition; iterator) {
#     BODY;
# }

# example
for (my $i = 0; $i <= 10; $i++) {    # ++ means increment by 1 each time
    print $i, "\n";
}

# Inside the parentheses of a C-style for loop there are 3 components, separated by semicolons:
# the starting statement, the continuation condition, and the iterating statement.
# The starting statement is usually just an assignment. The continuation condition
# is evaluated at the beginning of every iteration; the first time it evaluates
# to false, the loop terminates. The iterating statement runs after each pass through the body.

# count down
for (my $i = 10; $i >= 0; $i--) {
    print "$i", "\n";
}

# A C-style loop can be translated into the form of a standard for loop (below)

# 2. standard for loop
for my $i (0..10) {    # .. is the range operator
    print $i, "\n";
}

# count down
for my $i (reverse 0..10) {
    print $i, "\n";
}

# Another way to iterate
# $_ is a special variable: the default input and pattern-searching space.
# What it means here is that in each iteration of the loop the current item is placed in $_,
# which is also what print uses by default when it is given no arguments.
for (0..10) {
    print $_, "\n";
}

# for and foreach can be used interchangeably, so the standard for loop is equivalent to the following:
foreach my $number (0..10) {
    print $number, "\n";
}
code

** until loop **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# An until loop repeatedly executes its block as long as the given condition is false.

# until loop execution
my $num1 = 5;
until ($num1 > 10) {
    print "Value of num1: ", $num1, "\n";
    $num1++;
}
code

** do .. until loop **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $num2 = 5;
do {
    print "Value of num2: $num2\n";
    $num2++;
} until ($num2 > 10);
code

** Loop through arrays and hashes **
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# one-dimensional array
my @numbers = ("2", "4", "6", "8", "10", "12");

# now we want to generate a new array where each item is greater than the original by 1.
my @new;
foreach my $item (@numbers) {
    push @new, $item + 1;
}

print "@new", "\n";

# nested arrays
my @aoa3 = (
            [ "one", "two", "three" ],
            [ "4", "5", "6", "7" ],
            [ "alpha", "beta" ]
           );

# print each item
foreach my $array_ref (@aoa3) {
    foreach my $item (@{$array_ref}) {
        # here the @ symbol is essentially an array dereference operator.
        # It can dereference any value which is an array reference.
        print $item, "\n";
    }
}

# one-dimensional hash
my %hash3 = ("blueberry" => "blue", "orange" => "orange", "cherry" => "red");

# print out each fruit and its color:
foreach my $fruit (keys %hash3) {
    print "the color of ", $fruit, " is ", $hash3{$fruit}, "\n";
}

# to sort the fruits (keys) alphabetically
foreach my $fruit (sort { $a cmp $b } keys %hash3) {
    # the keys to be compared are passed into the sort
    # block as the package global variables $a and $b
    print "the color of ", $fruit, " is ", $hash3{$fruit}, "\n";
}

# to sort by color (values) alphabetically
foreach my $fruit (sort { $hash3{$a} cmp $hash3{$b} } keys %hash3) {
    print "the color of ", $fruit, " is ", $hash3{$fruit}, "\n";
}

# NOTE: when sorting numbers, use sort { $a <=> $b }. <=> is also called the spaceship operator

# nested hash
my %hoh2 = (
            "fruit" => {
                        "banana" => "yellow",
                        "apple" => "red",
                        "orange" => "orange"
                       },
            "vegetables" => {
                        "pepper" => "green",
                        "lettuce" => "green",
                        "spinach" => "green"
                       }
           );

# print each item (keys of the nested hash) in each category (keys of the outer/top-level hash),
# sorting the items alphabetically
foreach my $category (sort { $a cmp $b } keys %hoh2) {
    foreach my $item (sort { $a cmp $b } keys %{$hoh2{$category}}) {    # % dereferences a hash reference
        print "the color of ", $item, " is ", $hoh2{$category}{$item}, "\n";
    }
}

# what if you only want to print the colors of the vegetables? You need to add a condition using if
foreach my $category (sort { $a cmp $b } keys %hoh2) {
    if ($category eq "vegetables") {
        foreach my $item (sort { $a cmp $b } keys %{$hoh2{$category}}) {
            print "the color of ", $item, " is ", $hoh2{$category}{$item}, "\n";
        }
    }
}
code
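
The note about the spaceship operator can be illustrated with a hash whose values are numbers. This is a sketch under stated assumptions: the hash `%count_of` and its contents are made-up data, not part of the workshop material.

```perl
use strict;
use warnings;

# %count_of is made-up data for this illustration
my %count_of = ("apple" => 10, "banana" => 9, "cherry" => 100);

# cmp compares the values as strings, so "10" and "100" sort before "9"
my @as_strings = sort { $count_of{$a} cmp $count_of{$b} } keys %count_of;

# <=> compares the values as numbers
my @as_numbers = sort { $count_of{$a} <=> $count_of{$b} } keys %count_of;

print "sorted as strings: @as_strings\n";
print "sorted as numbers: @as_numbers\n";
```

The two orders differ because string comparison goes character by character, which is rarely what is wanted for counts or measurements.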

** User-defined subroutines **
==== Perl has many built-in functions such as sort, split, shift and pop. Perl also allows users to define their own functions, called **subroutines**. Building subroutines is the simplest way to reuse code. ====
==== Subroutines are declared using the **sub** keyword. In contrast with built-in functions, user-defined subroutines always accept the same input: a list of scalars. A subroutine cannot receive an array or a hash as a single unit, because arrays and hashes passed as arguments are flattened into that list. It can, however, accept array references and hash references, which you dereference inside the subroutine to get at the actual arrays and hashes. Subroutines should be invoked using **parentheses**, even when called with no arguments. This makes it clear that a subroutine call is happening. ====

When you call a subroutine you can pass any number of arguments (scalars) to it, and the values are placed in the special array @_.
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# example: you want to write a simple function that adds two values together to get the total
my $a1 = 15;
my $a2 = 20;
my $all = getsum ($a1, $a2);    # $a1 and $a2 are stored in @_
print "The sum of ", $a1, " and ", $a2, " is ", $all, "\n";

sub getsum {                    # start the subroutine; its body is enclosed in {}
    my ($x, $y) = @_;           # create two scalar variables to receive the values from @_
    my $sum = $x + $y;          # a variable declared with my is visible only
                                # within the block in which it is declared
    return ($sum);
}
code
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# Another example: we want to print the sum of all elements in an array
my @aon = ("15", "20", "2", "9");
my $soa = getsum2 (\@aon);      # If you put a \ in front of a variable, you get a reference to that variable.
                                # Then you can pass the reference to the subroutine.
print "The sum of all elements is ", $soa, "\n";

sub getsum2 {
    my ($x) = @_;
    my @array = @{$x};          # dereference to get the actual array
    my $total = 0;
    foreach (@array) {
        $total += $_;
    }
    return ($total);
}
code
code format="python"
#!/usr/bin/perl
use warnings;
use strict;

# now let's write a script to convert meters to feet (1 meter = 3.28084 feet)
# and feet to meters (1 foot = 0.3048 meters), taking input from the command line (shell)

die "
Usage: perl converter.pl number unit
  number:  provide a number to convert
  unit:    m(meters) or f(feet)?

examples:
1. to convert 2 meters to feet
   perl converter.pl 2 m

2. to convert 2 feet to meters
   perl converter.pl 2 f

" unless (scalar @ARGV == 2);   # @ARGV is a Perl special variable that contains the arguments
                                # given to the program, in the order the shell passed them

my $result = convert ($ARGV[0], $ARGV[1]);
if ($ARGV[1] eq "m") {
    print $ARGV[0], " meters equals ", $result, " feet.", "\n";
}
if ($ARGV[1] eq "f") {
    print $ARGV[0], " feet equals ", $result, " meters.", "\n";
}

sub convert {
    my ($num, $t) = @_;
    my $out;
    if ($t eq "f") {
        $out = $num * 0.3048;
    }
    if ($t eq "m") {
        $out = $num * 3.28084;
    }
    return ($out);
}
code
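
The difference between passing arrays and passing array references to a subroutine can be sketched as follows. The helpers `count_args` and `count_refs` are hypothetical names written only for this illustration.

```perl
use strict;
use warnings;

# count_args is a hypothetical helper: it reports how many scalars arrived in @_
sub count_args {
    return scalar @_;
}

# count_refs is a hypothetical helper: it expects two array references
# and reports the size of each array separately
sub count_refs {
    my ($first, $second) = @_;
    return (scalar @{$first}, scalar @{$second});
}

my @one = (1, 2, 3);
my @two = (4, 5);

# The two arrays are flattened into a single list of five scalars
print count_args(@one, @two), "\n";

# The two references stay separate, so each array keeps its identity
print join(" ", count_refs(\@one, \@two)), "\n";
```

In the first call the subroutine cannot tell where one array ends and the next begins, which is why references are the usual choice when more than one array or hash is passed.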


** Regular expressions **

==== Perl's text processing power comes from its use of regular expressions. A regular expression (regex or regexp) is a string of characters that defines the pattern or patterns you are searching for. Regular expressions are often used in conditionals. ====
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# a simple example:
my $text1 = "Chatfield";
print "Found a hat!", "\n" if ($text1 =~ m/hat/);
code

The **match operator** (m//, abbreviated //) identifies a regular expression, in this example hat. This pattern is not a word. Instead it means "the h character, followed by the a character, followed by the t character." Each character in the pattern is an indivisible element, or atom. It matches or it doesn't.
==== The regex binding operator (=~) is an infix operator which applies the regex given as its second operand to the string given as its first operand. When evaluated in scalar context, a match evaluates to a true value if it succeeds. The negated form of the binding operator (!~) evaluates to a true value unless the match succeeds. ====
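
The two binding operators can be contrasted in a short sketch. The variable names `$has_hat` and `$no_cat` and the pattern `cat` are assumptions introduced for this illustration.

```perl
use strict;
use warnings;

my $text = "Chatfield";

# =~ is true when the pattern matches somewhere in the string
my $has_hat = ($text =~ m/hat/) ? "yes" : "no";

# !~ is true when the pattern does not match anywhere in the string
my $no_cat  = ($text !~ m/cat/) ? "yes" : "no";

print "contains 'hat': $has_hat\n";
print "lacks 'cat': $no_cat\n";
```

Matching is case-sensitive by default, so the capital C in "Chatfield" does not match the lowercase c in the second pattern.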

Some commonly used special characters in regex:
\n # a newline

[] # character class: matches any one of the characters listed inside the brackets
| # alternation: matches either the pattern on its left or the pattern on its right
\ # escape character

==== Clearly characters like **$**, **|**, **[**, **)**, **\**, **/** and so on have special meanings in regular expressions. If you want to match one of them literally, you have to precede it with a backslash. So: ====

\\ # a backslash
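
As a small illustration of escaping special characters, the sketch below pulls an amount out of a string that contains a dollar sign and a dot. The string in `$price` and the variable `$amount` are made up for this example.

```perl
use strict;
use warnings;

# The price string is a made-up example; single quotes keep the $ literal
my $price = 'Total: $5.00 (approx.)';

# \$ and \. match a literal dollar sign and a literal dot.
# Unescaped, $ would be read as an anchor or the start of a variable,
# and . would match any character.
my $amount;
if ($price =~ m/\$(\d+\.\d+)/) {
    $amount = $1;    # the text captured by the parentheses
}

print "amount: $amount\n";
```

The parentheses in the pattern are not escaped, so they keep their special meaning and capture the digits around the dot.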
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# another example:
my $str = "Usage:524/1000 messages; Usage:666/1000 messages";
if ($str =~ m/^Usage:(\d+)/) {
    my $used = $1;
    print "The first user used ", $used, " messages!", "\n";
}

# Parentheses perform sub-matches. After a successful match operation is performed,
# the sub-matches get stored into the built-in variables $1, $2, $3, ...:
my $text2 = "Hello world";
if ($text2 =~ m/(\w+)\s+(\w+)/) {
    print "success!", "\n";
    print $1, "\t";
    print $2, "\n";
}
code

Substitution is performed with the s/A/B/ operator, applied to a string with the regex binding operator (=~). Its first operand, A, is a regular expression to match. Its second operand, B, is the replacement text substituted for the matched portion of the bound string. The /g modifier replaces every match instead of only the first one.

code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# I want to replace all "o" and "e" in "Hello world" with "r".
# In this case, an =~ s///g call performs a global search/replace
my $text3 = "Hello world";
$text3 =~ s/[oe]/r/g;
print $text3, "\n";    # prints Hrllr wrrld
code

tr/ABC/abc/ performs transliteration. It is not a regular expression operator: it maps each character in the first list to the character at the same position in the second list. It is suitable for (and faster than s///) replacing single characters with other single characters.

code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $text4 = "a1ab2c3";
$text4 =~ tr/abc/123/;
print $text4, "\n";    # prints 1112233
code
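Both s/// and tr/// return a count as well as modifying the string, which is easy to miss. The following is a small sketch of that behaviour; the DNA string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample strings are made up for illustration.
my $greeting = "Hello world";

# s///g edits the string in place and returns the number of replacements.
my $n_subst = ($greeting =~ s/[oe]/r/g);

my $seq = "AACGTT";

# With an empty replacement list, tr counts the listed characters
# and leaves the string unchanged.
my $gc_count = ($seq =~ tr/GC//);

# Copy first, then transliterate the copy, to keep the original intact.
(my $complement = $seq) =~ tr/ACGT/TGCA/;

print "$greeting ($n_subst replacements)\n";
print "$seq has $gc_count G or C bases; complement is $complement\n";
```

The copy-then-modify idiom `(my $copy = $original) =~ ...` is handy whenever the original string is still needed afterwards.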


**Perl file handling: open, read, write and close files**

The basics of handling files in Perl are simple: you associate a filehandle with a file and then use Perl's operators and functions to read and update the data stream behind that filehandle. In other words, a filehandle is essentially a reference to a specific position inside a specific file. A filehandle can be opened for reading, writing, or both; the mode you pass to open determines which operations are allowed on the associated file or device.

To download the example file, run **wget http://cgrlucb.wikispaces.com/file/view/data.fastq** or **curl -O http://cgrlucb.wikispaces.com/file/view/data.fastq** in a terminal.

Read a file:
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# let's read a file saved on disk and print it line by line

# open the file and attach it to a filehandle.
# open means open a channel for your program to "talk to" the file.
# For this Perl provides the open function. "<" means read in and ">" means write out.
my $fastq = "/Users/kebi/Desktop/PerlWorkshop/data.fastq";
open(my $fh, "<", $fastq) or die "cannot open $fastq: $!\n";

# Iterate over each line in the filehandle.
# Note that in a while condition the <$fh> (angle brackets) expression reads
# one line per iteration into $_. In list context (my @lines = <$fh>) it would
# instead read the whole file in one go into an array of lines.
while (<$fh>) {
    chomp(my $line = $_);    # save the current line to the scalar variable $line
                             # and remove the trailing newline character
    print $line, "\n";
}

# Close the filehandle, and therefore disassociate it from the corresponding file.
# Until a filehandle is closed, some data may not yet have been written to disk,
# and other applications will not see that data.
# Closing a filehandle releases the filehandle resource and improves readability:
# it tells future readers "I'm done with that." Although leaving a filehandle
# open is fine in many cases, it is generally considered good practice to close
# it once you are done working with it.
close $fh;
code
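The difference between reading one line at a time and reading everything at once can be seen in a short sketch. To keep it self-contained it reads from an in-memory filehandle (opened on a reference to a scalar) rather than from data.fastq; the sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical input held in memory so the sketch needs no file on disk.
my $data = "line one\nline two\nline three\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open(my $fh, "<", \$data) or die "cannot open in-memory file: $!\n";

# Scalar context: each call to <$fh> returns only the next line.
my $first = <$fh>;
chomp $first;

# List context: <$fh> returns all remaining lines at once.
my @rest = <$fh>;
chomp @rest;

close $fh;

print "first: $first\n";
print "remaining: ", scalar(@rest), "\n";
```

For large files such as sequencing data, the line-by-line form is usually preferable because it keeps only one line in memory at a time.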

Read and write files:
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

# let's read a file saved on disk and print it line by line into an outfile
my $fastq   = "/Users/kebi/Desktop/PerlWorkshop/data.fastq";
my $outfile = "/Users/kebi/Desktop/PerlWorkshop/data.fastq_copy";

open(my $fh, "<", $fastq) or die "cannot open $fastq: $!\n";
# open for output, linking the filehandle $out to the outfile $outfile
open(my $out, ">", $outfile) or die "cannot open $outfile: $!\n";

while (<$fh>) {
    chomp(my $line = $_);
    print $out $line, "\n";    # print this line, plus a newline character,
                               # into the outfile filehandle
}
close $fh;
close $out;
code
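One thing to keep in mind is that opening with ">" truncates an existing file. The sketch below contrasts it with the append mode ">>", which is not used in the scripts above and is shown here as an addition. It writes to a temporary file created with the core File::Temp module, so the path is not one from this tutorial.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is deleted when the script exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" creates the file or truncates it, so old contents are lost.
open(my $out, ">", $path) or die "cannot open $path: $!\n";
print $out "first\n";
close $out;

# ">>" appends to whatever is already in the file.
open($out, ">>", $path) or die "cannot open $path: $!\n";
print $out "second\n";
close $out;

# Read everything back to see what the file now holds.
open(my $in, "<", $path) or die "cannot open $path: $!\n";
my @lines = <$in>;
close $in;

print "file holds ", scalar(@lines), " lines\n";
```

Had the second open used ">" instead of ">>", the file would hold only the line "second".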

Now let's write a script that converts this fastq file to fasta format:
code format="python"
#!/usr/bin/perl
use strict;
use warnings;

my $fastq = "/Users/kebi/Desktop/PerlWorkshop/data.fastq";
my $fasta = "/Users/kebi/Desktop/PerlWorkshop/data.fasta";

open(my $fh, "<", $fastq) or die "cannot open $fastq: $!\n";
open(my $out, ">", $fasta) or die "cannot open $fasta: $!\n";

while (<$fh>) {
    chomp(my $line = $_);
    # keep header lines that start with "@" and end with "/1"
    if ($line =~ m/^\@(\S+\/1)$/) {
        chomp(my $seq = <$fh>);      # the sequence is the line right after the header
        print $out ">", $1, "\n";    # print the header
        print $out $seq, "\n";
    }
}
close $fh;
close $out;
code
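A quality line in a fastq file may itself begin with "@", so matching on the first character alone can be fragile. One possible alternative, sketched here under the assumption that every record is exactly four lines long, is to read the file four lines at a time. The helper name fastq_to_fasta and the sample reads are hypothetical, and unlike the script above this variant converts every record rather than only headers ending in "/1".

```perl
use strict;
use warnings;

# Hypothetical helper: converts FASTQ text read from $in into FASTA text.
# Assumes each record is exactly four lines: header, sequence, "+", quality.
sub fastq_to_fasta {
    my ($in) = @_;
    my $fasta = "";
    while (my $header = <$in>) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;       # stop on a truncated record
        chomp($header, $seq);
        $header =~ s/^\@/>/ or next;     # skip records without an "@" header
        $fasta .= "$header\n$seq\n";
    }
    return $fasta;
}

# Made-up input; note that the first quality line starts with "@".
my $fastq = "\@read1/1\nACGT\n+\n\@III\n\@read2/1\nGGCC\n+\nIIII\n";
open(my $fh, "<", \$fastq) or die "cannot open in-memory file: $!\n";
my $result = fastq_to_fasta($fh);
close $fh;

print $result;
```

Because the quality line is consumed as the fourth line of its record, it is never mistaken for a header.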

Now let's write a script that reverse complements the DNA sequences in a fasta file, reading the file name from the command line. Hint: use @ARGV, which stores the arguments passed from the shell.
code format="python"
#!/usr/bin/perl
use warnings;
use strict;

# qq can be used instead of double quotes
die (qq/

Usage: perl revcomp.pl <fasta_file>
fasta_file: provide a sequence file in fasta format

\n/) unless (scalar @ARGV == 1);

my $fasta = $ARGV[0];
my $revcomp_fasta = $fasta . "_revcomp";

open(my $fh, "<", $fasta) or die "cannot open $fasta: $!\n";
open(my $out, ">", $revcomp_fasta) or die "cannot open $revcomp_fasta: $!\n";

while (<$fh>) {
    chomp(my $line = $_);
    if ($line =~ m/^>\S+/) {
        chomp(my $seq = <$fh>);
        my $revcomp = reverse $seq;             # reverse the DNA sequence
        $revcomp =~ tr/ACGTacgt/TGCAtgca/;      # replace each nucleotide with its complement
        print $out $line, "\n";                 # print the header
        print $out $revcomp, "\n";
    }
}
close $fh;
close $out;
code
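The reverse-complement step can be isolated into a small subroutine, which makes it easier to test on its own. This is only a sketch: the name revcomp and the test sequence are made up. It also shows a common pitfall, namely that reverse only reverses a string when it is called in scalar context.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two steps used in the script above.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;               # scalar assignment: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;        # swap each base for its complement
    return $rc;
}

my $result = revcomp("ATGC");

# Pitfall: in list context, reverse reverses the LIST of arguments,
# so a single string comes back unchanged.
my ($list_ctx) = reverse "ATGC";

# Forcing scalar context gives the reversed string.
my $scalar_ctx = scalar reverse "ATGC";

print "revcomp: $result\n";
print "list context: $list_ctx, scalar context: $scalar_ctx\n";
```

This is why `print reverse $seq;` prints the sequence unreversed, while assigning to a scalar first, as the script does, works as intended.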
**System calls**

In a Perl script, you can call any external program or script, just as you would from the command line, by making a system call with the function **system**. Now let's write another simple script called "system_call.pl" to call revcomp.pl:
code format="python"
#!/usr/bin/perl
use warnings;
use strict;

die (qq/
Usage: perl system_call.pl <fasta_file>
fasta_file: provide a sequence file in fasta format!
\n/) unless (@ARGV);

# run revcomp.pl on the file given on the command line
system("perl revcomp.pl $ARGV[0]");
code
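The script above ignores whether the external command succeeded. A minimal sketch of checking the result follows. It passes the command to system as a list, which avoids going through the shell, and uses a throwaway child command (a one-line perl program that exits with code 3) purely for illustration in place of revcomp.pl.

```perl
use strict;
use warnings;

# $^X holds the path of the perl binary running this script.
# Passing a list to system() bypasses the shell, so arguments containing
# spaces or shell metacharacters are handed over unchanged.
my @cmd = ($^X, "-e", "exit 3");    # hypothetical child command

my $status = system(@cmd);

my $exit_code;
if ($status == -1) {
    die "failed to run @cmd: $!\n";
} else {
    $exit_code = $? >> 8;           # the child's exit code is in the high byte of $?
}

print "child exited with code $exit_code\n";
```

An exit code of 0 conventionally means success, so a real script would typically die or warn when $exit_code is non-zero.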