A general approach to command line switches and their default values in Perl
In the UNIX world you'll rarely find a program that doesn't support at least a few arguments (or command line parameters) which influence its execution.
When Perl programs require arguments (one of the simplest cases: an input filename), one could inspect the @ARGV array, an approach which works well in easy cases. Alternatively, one could turn to one of the Perl modules for argument processing, in particular if the arguments are command line switches.
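For the easy case, a minimal sketch might look like this. It assumes the program expects exactly one positional argument, an input filename. input_file() is a hypothetical helper; a real script would call it as input_file(@ARGV).

```perl
use strict;
use warnings;

# Hypothetical helper: accept exactly one argument (an input filename),
# otherwise stop with a usage message.
sub input_file {
    my @args = @_;
    die "usage: $0 inputfile\n" unless @args == 1;
    return $args[0];
}

# In a real script: my $input = input_file(@ARGV);
my $input = input_file('data.txt');
print "input file: $input\n";
```

As soon as optional switches enter the picture, this kind of hand-written checking grows quickly, which is where the modules come in.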
In this article I will discuss a few types of command line switches and the possible logic behind them.
What is a command line switch?
Just to recap: a command line switch is traditionally denoted as a hyphen followed by a letter, optionally followed by a value, e.g. -d or -d 25. Note the space between the switch and its value. Some programs require this space, others require the value to be attached to the switch, like -d25, and still others allow both. Some programs allow switches to be concatenated, like -ltr instead of -l -t -r. Others allow switches to be more than one letter long. Some programs allow a switch to appear multiple times, like -v in awk. Further complexities exist: one switch might override others, and some switches are mutually exclusive.
All these cases need to be handled properly.
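Getopt::Std, the module used later in this article, already covers several of these variants. The following sketch feeds different argument lists to getopts; parse_switches() is a hypothetical helper that localizes @ARGV so the real command line stays untouched.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: run getopts on an explicit argument list.
sub parse_switches {
    my ($spec, @args) = @_;
    local @ARGV = @args;          # getopts reads (and shifts) @ARGV
    my %opts;
    getopts($spec, \%opts) or die "bad switches: @args\n";
    return \%opts;
}

# A value may follow the switch with or without a space ...
my $spaced   = parse_switches('d:', '-d', '25');
my $attached = parse_switches('d:', '-d25');

# ... and boolean switches may be concatenated.
my $bundled  = parse_switches('ltr', '-ltr');

# A repeated switch is not collected: the last value simply wins.
my $repeated = parse_switches('v:', '-v', 'a=1', '-v', 'b=2');

print "d=$spaced->{d} d=$attached->{d}\n";
print join(',', sort keys %$bundled), "\n";
print "v=$repeated->{v}\n";
```

The last case shows a limit: Getopt::Std stores only one value per switch, so collecting repeated switches such as awk's -v needs extra work.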
On top of that, the GNU world (I think it was) introduced double-hyphen switches, usually with whole words as names, e.g. --verbose.
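Getopt::Std is designed for single-letter switches. For double-hyphen switches, the core module Getopt::Long is the usual choice. The following is only a sketch; the option names verbose and outdir are invented for illustration, and the rest of this article does not use this module.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simulated command line for this sketch (a real script would use GetOptions on @ARGV).
my @args = ('--verbose', '--outdir', '/var/tmp', 'file.txt');

my $verbose = 0;        # default: off
my $outdir  = '/tmp';   # default output directory

GetOptionsFromArray(\@args,
    'verbose'  => \$verbose,   # boolean: --verbose sets it to 1
    'outdir=s' => \$outdir,    # string value: --outdir DIR or --outdir=DIR
) or die "usage: $0 [--verbose] [--outdir DIR] file\n";

# The recognized switches have been removed from @args.
print "verbose=$verbose outdir=$outdir rest=@args\n";
```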
In the remainder of this article I will only use the simple case of single-letter switches with or without an argument. I will be using Getopt::Std, one of the core Perl modules, and its function getopts. Its basic usage is getopts('ab:', \%opts); for two switches, -a and -b foo.
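A short sketch of that basic call follows. The command line is simulated by assigning to @ARGV, which in a real script is filled by the shell.

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulated command line for this sketch: prog -a -b foo input.txt
@ARGV = ('-a', '-b', 'foo', 'input.txt');

my %opts;
# 'a' is a boolean switch; the colon in 'b:' marks a switch with an argument.
# getopts warns about unknown switches and then returns false.
getopts('ab:', \%opts) or die "usage: $0 [-a] [-b value] file ...\n";

print "a=$opts{a} b=$opts{b}\n";   # the switches found
print "remaining: @ARGV\n";        # getopts removed the switches from @ARGV
```

The non-switch arguments, here the filename, remain in @ARGV for further processing.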
Various types of command line switches with or without default values
The typical distinction between command line switches is whether they are boolean (switched on or off) or carry an additional argument. Then there is the question: if a command line switch is absent, should a default value be used in the program? The following table explains the differences and shows a few examples.
Switch (example) | Meaning | Default | Notes |
---|---|---|---|
-a | ... | false | A boolean switch by its very nature has a default value, true or false, which should be the opposite of what the switch is meant to trigger. |
-d $HOME/tmp | output directory | /tmp | Certain things in the program require a default value, e.g. the program needs to know where to store its output files. It's left to the programmer to decide which of the default values can be overridden by command line switches. |
-u joe,sandy | user list | current user | Some command line switches take more complex arguments, in this case a comma separated list of users. The absence of the switch should be covered by a reasonable default value, e.g. the current user. |
-p 1507 | process id | all processes | Some switches specify a setting which acts as a filter or restriction, but their absence does not correspond to one concrete default value; the resulting behaviour is rather vague. In the user list example above, another default behaviour could have been all users instead of the current user. |
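The -u row needs one more step than the others: the comma separated argument has to become a list. The following sketch does that and falls back to the current user. user_list() is a hypothetical helper, and the fallback 'nobody' for an unset $ENV{USER} is an assumption of this sketch.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse -u from an argument list and return the users.
sub user_list {
    my @args = @_;
    local @ARGV = @args;
    my %opts;
    getopts('u:', \%opts) or die "usage: $0 [-u user,...]\n";
    # Default: the current user (assumed fallback 'nobody' if USER is unset).
    my $list = exists $opts{u} ? $opts{u} : ($ENV{USER} // 'nobody');
    return split /,/, $list;
}

my @given   = user_list('-u', 'joe,sandy');
my @default = user_list();
print "given: @given\n";
print "default: @default\n";
```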
Rather than defining a list of variables to set the defaults, like
$OUTDIR = "/tmp"; $USERS = $ENV{USER}; ...
and later somehow associating these variables with the switches, a (in my view) cleaner approach is to define all defaults in one hash, let getopts fill a second hash, and merge both into a third hash.
The following Perl program handles the cases above. The ... ? ... : ... operator is used to set the actual variables. (Getopt::Std sets boolean switches to 1, which represents true. The opposite, and default, could be anything that evaluates to false in an if (...) clause; I chose undef rather than 0.)
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;   # to process command line arguments

# Define the defaults in a hash
my %defaults;
$defaults{"a"} = undef;
$defaults{"d"} = "/tmp";
$defaults{"u"} = $ENV{USER};
$defaults{"p"} = undef;

# Retrieve the command line switches into a hash,
# marking the ones which require an argument with :
my %opts;
getopts('ad:u:p:', \%opts)
    or die "usage: $0 [-a] [-d dir] [-u user,...] [-p pid]\n";

# Put either the default values or the command line switch arguments into a hash
my %vars;
foreach my $key (keys %defaults) {
    $vars{$key} = exists $opts{$key} ? $opts{$key} : $defaults{$key};
}

# Test output: see what is contained in %vars (undef is printed as empty)
foreach my $key (keys %vars) {
    print $key, " ", (defined $vars{$key} ? $vars{$key} : ""), "\n";
}
print "\n";

# Check decision tree for boolean and unspecified switches
print "a is set\n" if ( $vars{"a"} );
print "p: all processes\n" unless ( defined $vars{"p"} );
```
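The merge loop can also be written as a single list assignment. This is an alternative sketch, not the program above: in a list assignment to a hash, later pairs win, so values from %opts override those from %defaults. Because getopts only stores switches named in its specification, which here are exactly the keys of %defaults, the result matches the loop.

```perl
use strict;
use warnings;
use Getopt::Std;

my %defaults = (a => undef, d => '/tmp', u => $ENV{USER}, p => undef);

@ARGV = ('-a', '-d', '/var/tmp');   # simulated command line for this sketch
my %opts;
getopts('ad:u:p:', \%opts)
    or die "usage: $0 [-a] [-d dir] [-u user,...] [-p pid]\n";

# Later pairs override earlier ones: command line beats defaults.
my %vars = (%defaults, %opts);

print "$_ ", ($vars{$_} // ''), "\n" for sort keys %vars;
```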
If run without any command line switches:
```
u andreas
p
a
d /tmp

p: all processes
```
With -a and -u:
```
... -a -u joe,sandy
u joe,sandy
p
a 1
d /tmp

a is set
p: all processes
```
With -p and -d:
```
... -d $HOME/tmp -p 1507
u andreas
p 1507
a
d /export/home/andreas/tmp
```
With this general approach, one hash, %vars, contains all the information. Its contents can be used directly later in the program (like the -d output directory) or in a decision based on whether a value is defined or undefined.
Of course there are more issues, like the ones mentioned above (e.g. conflicting switches), or the validity of values (e.g. does the output directory exist and is it writable), but they need to be resolved elsewhere in the code.
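Such checks could be sketched as follows, operating on a merged hash like %vars. check_vars() is a hypothetical helper, and the rule that -a and -p exclude each other is an invented example; it is not part of the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a merged switch hash and collect error messages.
sub check_vars {
    my %vars = @_;
    my @errors;
    my $dir = $vars{d} // '';

    # Invented example of mutually exclusive switches.
    push @errors, "-a and -p cannot be combined"
        if $vars{a} && defined $vars{p};

    # The output directory must exist and be writable.
    if (!-d $dir) {
        push @errors, "output directory '$dir' does not exist";
    } elsif (!-w $dir) {
        push @errors, "output directory '$dir' is not writable";
    }

    # A process id, if given, must be numeric.
    push @errors, "process id must be numeric"
        if defined $vars{p} && $vars{p} !~ /^\d+$/;

    return @errors;
}

my @errors = check_vars(a => 1, d => '/no/such/dir', p => 'x');
print "$_\n" for @errors;
```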