Perl Subroutines Tutorial
The Perl language was created by Larry Wall in the late 1980s, long before the widespread use of the Internet that began in the mid-1990s. It was not aimed at the Internet, as is evident from the meaning of its name: Practical Extraction and Report Language. As a language to extract information from text files and print it as a report, it is obviously useful to system administrators working with the Unix operating system, because Unix produces a great number of log files. Indeed, Perl was created by combining features of the "C" language, the shell language "sh" and the Unix utilities "sed" and "awk."
Being strong in string handling, Perl is a good choice for processing user data entered through a form or for juggling data in a database to produce a dynamic page. A great number of Perl scripts are freely available from many Internet sites, with applications ranging from a simple guest book to a full-fledged content management system.
Perl syntax
I will briefly explain what you need to know of Perl's syntax to understand the example programs.
The names of scalar variables begin with a dollar sign: $a_var. There are also special variables. For instance, '$_' is the default variable: it holds the current line when a file is read in a loop, and many functions operate on it when no argument is given. Remember that Perl was originally created to process text files, line by line.
The names of array variables begin with an at sign: @an_array. Array example:
@elements = ("earth", "water", "air", "fire");
$one_element = $elements[2]; # note the square brackets
The names of hashes--arrays indexed by strings--begin with a percent sign: %a_hash. Hash example:
%user = ("firstname", "Herbert", "lastname", "Hoover",
"age", 55, "city", "Oklahoma");
$username = $user{"firstname"}; # note the curly brackets
An assignment is made using the equal sign. Each statement ends with a semicolon. When referring to a single element of an array or a hash, the at sign or percent sign is replaced by the dollar sign. Arithmetic operators and compound assignment operators are like those of the C language, as are the numeric relational, numeric equality, bitwise, and logical operators.
For string comparison and concatenation, the following operators are provided.
. (period) concatenation
eq equality
ne inequality
lt less than
le less than or equal to
gt greater than
ge greater than or equal to
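The following sketch shows these operators at work. The helper name compare_both and the sample values are made up for illustration. It also shows why separate string operators exist: the same pair of values can compare one way as strings and another way as numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: compares two values both as strings and as numbers.
sub compare_both {
    my ($x, $y) = @_;
    my $as_string = ($x lt $y) ? "lt" : ($x eq $y) ? "eq" : "gt";
    my $as_number = ($x <  $y) ? "<"  : ($x == $y) ? "==" : ">";
    return ($as_string, $as_number);
}

my $greeting = "Hello" . ", " . "World";   # concatenation with the period
print "$greeting\n";

# "10" sorts before "9" as a string, but 10 is greater than 9 as a number.
my ($s, $n) = compare_both("10", "9");
print "string comparison: $s, numeric comparison: $n\n";
```

Using a numeric operator on text, or a string operator on numbers, is a common source of subtle bugs, so it may help to pick the operator according to what the data means.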
Programming Perl with subroutines
Like many programming languages, Perl provides for the use of subroutines (user defined functions or UDF). They may be included in the same file as the main program, or they can be written in a separate file. In the latter case, a special statement must be used to load the contents of the auxiliary file, which may be one or more subroutines. There are other ways to deal with subroutines, but I will not explain them here. Next I will explain how to code and use subroutines, and I will give four examples of subroutines related to the CGI protocol.
If the subroutine is included within the same file where it is called (the main program or another subroutine), there is nothing special that you must do to define or declare it. You must begin writing the subroutine by using the 'sub' keyword, followed by the name of the function. The code of the function must be a block, so you must write your statements within a pair of curly brackets.
# this is an example of subroutine
sub printit {
    print "Hello, World!";
}
# main program
printit();
exit;
The subroutine code can be placed anywhere in the source file. Earlier versions of the interpreter required an ampersand when calling a function, but this has been relaxed. Now it is possible to omit the ampersand if parentheses are included after the name of the subroutine. Some languages distinguish between subroutines, which do not return a value, and functions, which do return a value. This is not the case with Perl. A value may be returned using the 'return' statement. If this statement is not used, the subroutine ends when the closing bracket is reached; the value returned is that of the last expression evaluated.
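As a small sketch of the two points above, the made-up subroutine circle_area below has no 'return' statement, so the value of its last expression is what the caller receives. It is called once with parentheses only and once with the ampersand.

```perl
use strict;
use warnings;

# Hypothetical example: no 'return' statement, so the value of the
# last expression evaluated becomes the return value.
sub circle_area {
    my ($radius) = @_;
    3.14159265 * $radius ** 2;
}

# Both call styles reach the same subroutine.
my $with_parens    = circle_area(2);
my $with_ampersand = &circle_area(2);
print "$with_parens $with_ampersand\n";
```

An explicit 'return' is often easier to read, so the implicit form is probably best kept for very short subroutines.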
sub assignit {
    my $a_var;
    $a_var = 3.15;
    return $a_var;
}
# main program
print assignit();
exit;
The preceding example shows how a subroutine may return a value. The 'my' keyword makes the $a_var variable local to the subroutine. Its value is not known to the main program unless it is returned by the routine. If 'my' had been omitted, $a_var would be a global variable and the main program could use it (after calling 'assignit') without the need for a 'return' statement, as the next example shows. I could have chosen to return a constant instead of a variable, but that would make the program too simple.
sub assignit {
    $a_var = 3.15;
}
# main program
assignit();
print $a_var;
exit;
Subroutines can be passed parameters; however, there is no mechanism to declare formal parameters within a subroutine. Instead, all passed parameters are consolidated into the array @_, from which they must be extracted if one wishes to name them separately. It is customary to include at the start of a subroutine a statement like one of the following.
my ($param1, $param2, $param3) = @_; # more than one parameter
my ($param1) = shift(@_); # just one parameter
my ($param1) = shift;
The next example shows a call with a parameter.
sub assignit {
    my ($a_var) = shift;
    return ($a_var + 1);
}
# main program
print assignit( 3.15 );
exit;
Conversely, a subroutine can return more than one value, but the values come back as a single list, which the calling program must take apart.
sub assignit {
    my $a_var = 6;
    return ($a_var, $a_var**3);
}
# main program
($key, $value) = assignit();
print "The cube of $key is $value";
exit;
If you want to return no value, you may specify only the bare 'return' statement. The subroutine will then return an empty list when it is called in list context, or an undefined value when it is called in scalar context.
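The difference between the two contexts can be seen in a short sketch. The subroutine find_city and its data are invented for the purpose; the bare 'return' is used to signal that nothing was found.

```perl
use strict;
use warnings;

# Hypothetical lookup: a bare 'return' signals "nothing found".
sub find_city {
    my ($name) = @_;
    my %cities = (Herbert => "Oklahoma");
    return unless exists $cities{$name};   # bare return
    return $cities{$name};
}

my @as_list   = find_city("Nobody");   # list context: empty list
my $as_scalar = find_city("Nobody");   # scalar context: undefined value

print "list has ", scalar(@as_list), " elements\n";
print "scalar is ", (defined $as_scalar ? "defined" : "undefined"), "\n";
```

Writing 'return undef;' instead would put one undefined element into @as_list, which is rarely what the caller expects.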
A subroutine may be written in a separate source file, called a library file. A library file in this case is nothing but a file with one or more subroutines that are included all at once. There are three ways to make use of a library: the statements (keywords) 'do', 'use', and 'require'. I will explain here the first one, as the other two are more useful when using Perl modules. For the sake of these statements, Perl keeps a list of directories where source files can be found in the array @INC; directories can be added to it through the PERL5LIB environment variable. The list is searched when a source file is mentioned without specifying its location.
The 'do' statement is the simplest way to manage subroutine libraries: all it does is read the file whose name is given and execute it as part of the current program. When the interpreter finds a line like:
do 'sublib.pl';
it searches for the file, reads it, and adds any subroutine it finds to the current program. Please note that the subroutines are not executed; they are only made available to the program where the 'do' occurred. However, any statement found in the file that is not part of a subroutine is executed as if it were part of the main program.
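The behavior just described can be tried out without creating a permanent file. The sketch below writes a throwaway library with the standard File::Temp module and then loads it with 'do'. The subroutine double_it and the variable $loaded are made up for the example, and the file is given by its full path so that no directory search is involved.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Write a throwaway library file; its name and contents are illustrative.
my ($fh, $libfile) = tempfile(SUFFIX => '.pl', UNLINK => 1);
$libfile = File::Spec->rel2abs($libfile);
print $fh <<'EOF';
sub double_it { return 2 * $_[0]; }
$main::loaded = "yes";   # outside any subroutine: runs while loading
1;                       # last expression evaluated, returned by 'do'
EOF
close $fh or die "Cannot write $libfile: $!";

# A full path is given, so @INC is not searched.
my $result = do $libfile;
die "Cannot load $libfile: ", ($@ || $!) unless defined $result;

our $loaded;
print "loaded = $loaded\n";                      # set during the load
print "double_it(21) = ", double_it(21), "\n";   # defined by the library
```

The subroutine is only defined by the load; it runs when the main program calls it, whereas the assignment to $loaded runs immediately.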
Perl modules
Modules are collections of Perl subroutines that are kept apart from the main program. Technically, we can say that each module has its own name space. They are made available by using the keywords 'use' and 'require'. For example, you may include the following line:
use CGI;
The extension ".pm" (which stands for "Perl module") does not need to be included as it is assumed.
A library file composed of many modules is also known as a module, and when you 'use' it, all of the included modules (or functions) are made available. Many people who write subroutines that they consider can be useful to somebody else make them available through the Comprehensive Perl Archive Network (CPAN).
The coding of Perl CGI scripts is eased by the use of a Perl module that bears the appropriate name of "CGI." Written by Lincoln Stein, CGI.pm provides a number of useful subroutines, accessible as function calls or as object methods using the object-oriented paradigm. It can be used to retrieve the arguments of the script (the keywords and values that are passed to your script in the query string that comes after the question mark), and to manage the "cookies" of the user.
While the CGI module allows using function calls or object methods, it is preferable to use the latter. The notation for object methods makes use of an arrow to point to the desired method. Thus, the following expression:
$some_object->some_property
means that we want to call the method named "some_property" on the object held in $some_object. In Perl, the properties of an object are normally read through such accessor methods.
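To make the arrow notation concrete without depending on any installed module, here is a minimal sketch of a class. The package name Visitor and its methods are hypothetical and merely stand in for a real module such as CGI.

```perl
use strict;
use warnings;

# Hypothetical class, standing in for a module such as CGI.
package Visitor;

sub new {                       # constructor, called as Visitor->new(...)
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

sub name {                      # accessor method
    my ($self) = @_;
    return $self->{name};
}

package main;

my $visitor = Visitor->new(name => "Bob Merrill");
print "Visitor: ", $visitor->name, "\n";
```

The arrow passes whatever is on its left side as the first argument of the subroutine named on its right, which is how 'name' receives the object in $self.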
Using Perl subroutines in CGI scripts
The Common Gateway Interface (CGI) was created to allow Web users to query databases located on a server machine. It led to the concept of dynamic pages, i.e., Web pages that are not written as an HTML file but, instead, are generated on demand by a CGI script. The query parameters are gathered by a form, and transmitted to the CGI script using one of two methods: POST or GET.
Using the GET method, the name and the value of the form elements are used to form 'name=value' pairs that are concatenated using ampersands (&). The resulting string is called a 'query string,' and is placed after a question mark (?) following the URL of the CGI script. This augmented URL can be used in an ordinary link. Its appearance may resemble the following.
mycgi.pl?name=Bob+Merrill&place=Brooklyn+N.Y.
As can be seen, these special URLs are encoded using a method called "url-encoding," which consists of replacing spaces with plus symbols, and certain characters with their hexadecimal equivalent in the ISO Latin-1 character set, preceded by a percent sign (%). For example, a slash will give "%2F", and an equal symbol will give "%3D". A hyphen, by contrast, will pass unaltered.
When the POST method is indicated in the FORM tag, the form data is placed on the standard input of the script. The Web server will set two environment variables: 'CONTENT_LENGTH' contains the input length in bytes, and 'CONTENT_TYPE' should contain the string 'application/x-www-form-urlencoded'.
Four subroutines are shown here that can be used to process the data input through a form. The first one does the decoding, the second prints an error message, and the other two call the first two to process the query string sent using either the GET or the POST method. These four subroutines can be integrated into a library file. Also, a main program is presented that acts as a CGI script calling three of the routines to print its arguments as an HTML page.
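The encoding rule can be sketched in a few lines. The subroutine url_encode below is a simplified illustration, not what any particular browser does: it assumes single-byte characters and leaves only letters, digits, hyphens, underscores, and periods untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: url-encode one form value (simplified rule set).
sub url_encode {
    my ($text) = @_;
    # Replace everything except letters, digits, '-', '_', '.', and space
    # with a percent sign followed by two hexadecimal digits.
    $text =~ s/([^A-Za-z0-9\-_. ])/sprintf("%%%02X", ord($1))/eg;
    $text =~ tr/ /+/;            # spaces become plus symbols
    return $text;
}

print url_encode("a/b=c-d e"), "\n";   # prints a%2Fb%3Dc-d+e
```

Spaces are converted last so that the plus symbols they produce are not themselves turned into "%2B".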
# decode the query string
# store the result in the global hash %in
# return 0 = OK -- 1 = error
sub decode {
    my ($querystr) = @_;
    foreach (split("&", $querystr)) {
        if (/^([^=]*)=(.*)$/) {
            # decompose name/value pair
            my ($name, $value) = ($1, $2);
            # names can be encoded too, so decode both parts
            foreach ($name, $value) {
                # replace pluses with spaces
                s/\+/ /g;
                # decode hexadecimal into its char equivalent
                # ('C' is an unsigned char, so values above 127 also work)
                s/%([0-9A-Fa-f]{2})/pack('C',hex($1))/eg;
            }
            # assign to global hash
            $in{$name} = $value;
        }
        else {
            return 1;
        }
    }
    return 0;
}
An associative array (also called a hash) is used, %in, whose keys are the names of the form fields, and whose elements are the values. This hash is global, that is, it is known by all the routines. It is used as a way to return the results without using the 'return' statement. The query string must be passed as a parameter to this subroutine. If an equal symbol is not found in one of the fields delimited by '&', a value of one is returned, indicating an error.
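A global hash is convenient, but it is not the only possible design. As an alternative sketch, the made-up subroutine parse_query below returns a reference to a hash it builds itself, and uses a bare 'return' when a field lacks an equal symbol. It is not part of the library presented here.

```perl
use strict;
use warnings;

# Hypothetical variant of 'decode' that returns a hash reference
# instead of writing to a global hash.
sub parse_query {
    my ($querystr) = @_;
    my %fields;
    foreach my $pair (split /&/, $querystr) {
        my ($name, $value) = $pair =~ /^([^=]*)=(.*)$/
            or return;                             # malformed pair
        for ($name, $value) {
            tr/+/ /;                               # pluses back to spaces
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # %XX back to a character
        }
        $fields{$name} = $value;
    }
    return \%fields;
}

my $in = parse_query("name=Bob+Merrill&place=Brooklyn+N.Y.");
print "$_ => $in->{$_}\n" for sort keys %$in;
```

Returning the data keeps the subroutine independent of its caller, at the cost of the caller having to check the result before using it.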
# prints a page giving the name of the offending module
# and the error message
# (a blank line must separate the header from the page)
sub showError {
    my ($program, $text) = @_;
    print <<"EOM";
Content-Type: text/html

<HTML>
<P>Error in '$program'.</P>
<P>$text</P>
</HTML>
EOM
}
The error subroutine receives two parameters: the name of the script where the error occurred, and a helpful error message. It then prints to the standard output a minimal HTML page.
# processing GET data...
# calls the routine 'decode' using an environment variable
sub getArgs {
    if ( $ENV{REQUEST_METHOD} eq 'GET'
         && $ENV{QUERY_STRING} ne '') {
        my $ret = decode( $ENV{QUERY_STRING} );
        if ( $ret ) {
            showError( 'decode', 'Error in query string' );
        }
    }
    else {
        showError( 'getArgs', 'Error in method or query' );
    }
}
The subroutine to process a query string sent using the GET method asks if the method is 'GET', and if the query string is not empty. Then it calls 'decode' to do the decoding. If there is some error, it calls 'showError'.
# processing POST data...
# retrieves arguments from standard input and calls
# the routine 'decode'
sub postArgs {
    if ( $ENV{REQUEST_METHOD} eq 'POST' && $ENV{CONTENT_TYPE}
         eq 'application/x-www-form-urlencoded' ) {
        read(STDIN, my $buffer, $ENV{'CONTENT_LENGTH'});
        my $ret = decode( $buffer );
        if ( $ret ) {
            showError( 'decode', 'Error in form data' );
        }
    }
    else {
        showError( 'postArgs', 'Error in method or query' );
    }
}
The subroutine to process form data sent using the POST method asks if the method is 'POST', and reads from standard input the number of bytes indicated by 'CONTENT_LENGTH'.
The four subroutines presented here could be placed in a file called, let us say, "cgi-form.pl". The layout of "cgi-form.pl" will be:
%in = ();
sub decode {
...
}
sub showError {
...
}
sub getArgs {
...
}
sub postArgs {
...
}
1;
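The first and the last statements are not part of a subroutine. The first initializes the hash '%in' within the calling script. The last supplies the value that the 'do' statement returns: 'do' hands back the value of the last expression evaluated in the file, and an undefined value if the file cannot be read or compiled. Ending the file with a true value such as one therefore lets the calling script test whether the load succeeded. The subroutines themselves are defined as soon as the file compiles, whatever its last value is. (The 'require' statement is stricter and fails outright unless the file ends with a true value.)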
The CGI script whose name constitutes the 'ACTION' attribute of the FORM tag will have among its first instructions the following one. The './' prefix names the file relative to the current directory, which is needed since Perl 5.26 because the current directory is no longer searched by default.
do './cgi-form.pl';
It will then be able to call 'getArgs' or 'postArgs', and will receive in the hash '%in' the names and values of the form fields. A sample script to test the subroutines follows.
#!/usr/bin/perl
# sample main program - receives data using the POST method
# and prints them as a set of name-value pairs
# ('or' is used because '||' would bind to the file name, not to 'do')
do './cgi-form.pl' or die "Cannot load library";
print "Content-Type: text/html\n\n";
postArgs();
print "<HTML>";
print "<H1>My website script</H1>";
@k = keys(%in);
# @k in numeric context is the number of keys
if ( @k > 0 ) {
    foreach $element ( @k ) {
        print "<p>$element ==> $in{$element}";
    }
}
else {
    print "<p>There are no elements";
}
print "</HTML>";
exit;