diff options
Diffstat (limited to 'chromium/third_party/cygwin/lib/perl5/5.10/pods/perlop.pod')
-rw-r--r-- | chromium/third_party/cygwin/lib/perl5/5.10/pods/perlop.pod | 2497 |
1 files changed, 0 insertions, 2497 deletions
diff --git a/chromium/third_party/cygwin/lib/perl5/5.10/pods/perlop.pod b/chromium/third_party/cygwin/lib/perl5/5.10/pods/perlop.pod deleted file mode 100644 index 7cac1deddfc..00000000000 --- a/chromium/third_party/cygwin/lib/perl5/5.10/pods/perlop.pod +++ /dev/null @@ -1,2497 +0,0 @@ -=head1 NAME -X<operator> - -perlop - Perl operators and precedence - -=head1 DESCRIPTION - -=head2 Operator Precedence and Associativity -X<operator, precedence> X<precedence> X<associativity> - -Operator precedence and associativity work in Perl more or less like -they do in mathematics. - -I<Operator precedence> means some operators are evaluated before -others. For example, in C<2 + 4 * 5>, the multiplication has higher -precedence so C<4 * 5> is evaluated first yielding C<2 + 20 == -22> and not C<6 * 5 == 30>. - -I<Operator associativity> defines what happens if a sequence of the -same operators is used one after another: whether the evaluator will -evaluate the left operations first or the right. For example, in C<8 -- 4 - 2>, subtraction is left associative so Perl evaluates the -expression left to right. C<8 - 4> is evaluated first making the -expression C<4 - 2 == 2> and not C<8 - 2 == 6>. - -Perl operators have the following associativity and precedence, -listed from highest precedence to lowest. Operators borrowed from -C keep the same precedence relationship with each other, even where -C's precedence is slightly screwy. (This makes learning Perl easier -for C folks.) With very few exceptions, these all operate on scalar -values only, not array values. - - left terms and list operators (leftward) - left -> - nonassoc ++ -- - right ** - right ! ~ \ and unary + and - - left =~ !~ - left * / % x - left + - . - left << >> - nonassoc named unary operators - nonassoc < > <= >= lt gt le ge - nonassoc == != <=> eq ne cmp ~~ - left & - left | ^ - left && - left || // - nonassoc .. ... - right ?: - right = += -= *= etc. - left , => - nonassoc list operators (rightward) - right not - left and - left or xor - -In the following sections, these operators are covered in precedence order. - -Many operators can be overloaded for objects. See L<overload>. - -=head2 Terms and List Operators (Leftward) -X<list operator> X<operator, list> X<term> - -A TERM has the highest precedence in Perl. They include variables, -quote and quote-like operators, any expression in parentheses, -and any function whose arguments are parenthesized. Actually, there -aren't really functions in this sense, just list operators and unary -operators behaving as functions because you put parentheses around -the arguments. These are all documented in L<perlfunc>. - -If any list operator (print(), etc.) or any unary operator (chdir(), etc.) -is followed by a left parenthesis as the next token, the operator and -arguments within parentheses are taken to be of highest precedence, -just like a normal function call. - -In the absence of parentheses, the precedence of list operators such as -C<print>, C<sort>, or C<chmod> is either very high or very low depending on -whether you are looking at the left side or the right side of the operator. -For example, in - - @ary = (1, 3, sort 4, 2); - print @ary; # prints 1324 - -the commas on the right of the sort are evaluated before the sort, -but the commas on the left are evaluated after. In other words, -list operators tend to gobble up all arguments that follow, and -then act like a simple TERM with regard to the preceding expression. -Be careful with parentheses: - - # These evaluate exit before doing the print: - print($foo, exit); # Obviously not what you want. - print $foo, exit; # Nor is this. - - # These do the print before evaluating exit: - (print $foo), exit; # This is what you want. - print($foo), exit; # Or this. - print ($foo), exit; # Or even this. - -Also note that - - print ($foo & 255) + 1, "\n"; - -probably doesn't do what you expect at first glance. The parentheses -enclose the argument list for C<print> which is evaluated (printing -the result of C<$foo & 255>). Then one is added to the return value -of C<print> (usually 1). The result is something like this: - - 1 + 1, "\n"; # Obviously not what you meant. - -To do what you meant properly, you must write: - - print(($foo & 255) + 1, "\n"); - -See L<Named Unary Operators> for more discussion of this. - -Also parsed as terms are the C<do {}> and C<eval {}> constructs, as -well as subroutine and method calls, and the anonymous -constructors C<[]> and C<{}>. - -See also L<Quote and Quote-like Operators> toward the end of this section, -as well as L</"I/O Operators">. - -=head2 The Arrow Operator -X<arrow> X<dereference> X<< -> >> - -"C<< -> >>" is an infix dereference operator, just as it is in C -and C++. If the right side is either a C<[...]>, C<{...}>, or a -C<(...)> subscript, then the left side must be either a hard or -symbolic reference to an array, a hash, or a subroutine respectively. -(Or technically speaking, a location capable of holding a hard -reference, if it's an array or hash reference being used for -assignment.) See L<perlreftut> and L<perlref>. - -Otherwise, the right side is a method name or a simple scalar -variable containing either the method name or a subroutine reference, -and the left side must be either an object (a blessed reference) -or a class name (that is, a package name). See L<perlobj>. - -=head2 Auto-increment and Auto-decrement -X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<--> - -"++" and "--" work as in C. That is, if placed before a variable, -they increment or decrement the variable by one before returning the -value, and if placed after, increment or decrement after returning the -value. - - $i = 0; $j = 0; - print $i++; # prints 0 - print ++$j; # prints 1 - -Note that just as in C, Perl doesn't define B<when> the variable is -incremented or decremented. You just know it will be done sometime -before or after the value is returned. This also means that modifying -a variable twice in the same statement will lead to undefined behaviour. -Avoid statements like: - - $i = $i ++; - print ++ $i + $i ++; - -Perl will not guarantee what the result of the above statements is. - -The auto-increment operator has a little extra builtin magic to it. If -you increment a variable that is numeric, or that has ever been used in -a numeric context, you get a normal increment. If, however, the -variable has been used in only string contexts since it was set, and -has a value that is not the empty string and matches the pattern -C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each -character within its range, with carry: - - print ++($foo = '99'); # prints '100' - print ++($foo = 'a0'); # prints 'a1' - print ++($foo = 'Az'); # prints 'Ba' - print ++($foo = 'zz'); # prints 'aaa' - -C<undef> is always treated as numeric, and in particular is changed -to C<0> before incrementing (so that a post-increment of an undef value -will return C<0> rather than C<undef>). - -The auto-decrement operator is not magical. - -=head2 Exponentiation -X<**> X<exponentiation> X<power> - -Binary "**" is the exponentiation operator. It binds even more -tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is -implemented using C's pow(3) function, which actually works on doubles -internally.) - -=head2 Symbolic Unary Operators -X<unary operator> X<operator, unary> - -Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower -precedence version of this. -X<!> - -Unary "-" performs arithmetic negation if the operand is numeric. If -the operand is an identifier, a string consisting of a minus sign -concatenated with the identifier is returned. Otherwise, if the string -starts with a plus or minus, a string starting with the opposite sign -is returned. One effect of these rules is that -bareword is equivalent -to the string "-bareword". If, however, the string begins with a -non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert -the string to a numeric and the arithmetic negation is performed. If the -string cannot be cleanly converted to a numeric, Perl will give the warning -B<Argument "the string" isn't numeric in negation (-) at ...>. -X<-> X<negation, arithmetic> - -Unary "~" performs bitwise negation, i.e., 1's complement. For -example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and -L<Bitwise String Operators>.) Note that the width of the result is -platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64 -bits wide on a 64-bit platform, so if you are expecting a certain bit -width, remember to use the & operator to mask off the excess bits. -X<~> X<negation, binary> - -Unary "+" has no effect whatsoever, even on strings. It is useful -syntactically for separating a function name from a parenthesized expression -that would otherwise be interpreted as the complete list of function -arguments. (See examples above under L<Terms and List Operators (Leftward)>.) -X<+> - -Unary "\" creates a reference to whatever follows it. See L<perlreftut> -and L<perlref>. Do not confuse this behavior with the behavior of -backslash within a string, although both forms do convey the notion -of protecting the next thing from interpolation. -X<\> X<reference> X<backslash> - -=head2 Binding Operators -X<binding> X<operator, binding> X<=~> X<!~> - -Binary "=~" binds a scalar expression to a pattern match. Certain operations -search or modify the string $_ by default. This operator makes that kind -of operation work on some other string. The right argument is a search -pattern, substitution, or transliteration. The left argument is what is -supposed to be searched, substituted, or transliterated instead of the default -$_. When used in scalar context, the return value generally indicates the -success of the operation. Behavior in list context depends on the particular -operator. See L</"Regexp Quote-Like Operators"> for details and -L<perlretut> for examples using these operators. - -If the right argument is an expression rather than a search pattern, -substitution, or transliteration, it is interpreted as a search pattern at run -time. Note that this means that its contents will be interpolated twice, so - - '\\' =~ q'\\'; - -is not ok, as the regex engine will end up trying to compile the -pattern C<\>, which it will consider a syntax error. - -Binary "!~" is just like "=~" except the return value is negated in -the logical sense. - -=head2 Multiplicative Operators -X<operator, multiplicative> - -Binary "*" multiplies two numbers. -X<*> - -Binary "/" divides two numbers. -X</> X<slash> - -Binary "%" is the modulo operator, which computes the division -remainder of its first argument with respect to its second argument. -Given integer -operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is -C<$a> minus the largest multiple of C<$b> less than or equal to -C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the -smallest multiple of C<$b> that is not less than C<$a> (i.e. the -result will be less than or equal to zero). If the operands -C<$a> and C<$b> are floating point values and the absolute value of -C<$b> (that is C<abs($b)>) is less than C<(UV_MAX + 1)>, only -the integer portion of C<$a> and C<$b> will be used in the operation -(Note: here C<UV_MAX> means the maximum of the unsigned integer type). -If the absolute value of the right operand (C<abs($b)>) is greater than -or equal to C<(UV_MAX + 1)>, "%" computes the floating-point remainder -C<$r> in the equation C<($r = $a - $i*$b)> where C<$i> is a certain -integer that makes C<$r> have the same sign as the right operand -C<$b> (B<not> as the left operand C<$a> like C function C<fmod()>) -and the absolute value less than that of C<$b>. -Note that when C<use integer> is in scope, "%" gives you direct access -to the modulo operator as implemented by your C compiler. This -operator is not as well defined for negative operands, but it will -execute faster. -X<%> X<remainder> X<modulo> X<mod> - -Binary "x" is the repetition operator. In scalar context or if the left -operand is not enclosed in parentheses, it returns a string consisting -of the left operand repeated the number of times specified by the right -operand. In list context, if the left operand is enclosed in -parentheses or is a list formed by C<qw/STRING/>, it repeats the list. -If the right operand is zero or negative, it returns an empty string -or an empty list, depending on the context. -X<x> - - print '-' x 80; # print row of dashes - - print "\t" x ($tab/8), ' ' x ($tab%8); # tab over - - @ones = (1) x 80; # a list of 80 1's - @ones = (5) x @ones; # set all elements to 5 - - -=head2 Additive Operators -X<operator, additive> - -Binary "+" returns the sum of two numbers. -X<+> - -Binary "-" returns the difference of two numbers. -X<-> - -Binary "." concatenates two strings. -X<string, concatenation> X<concatenation> -X<cat> X<concat> X<concatenate> X<.> - -=head2 Shift Operators -X<shift operator> X<operator, shift> X<<< << >>> -X<<< >> >>> X<right shift> X<left shift> X<bitwise shift> -X<shl> X<shr> X<shift, right> X<shift, left> - -Binary "<<" returns the value of its left argument shifted left by the -number of bits specified by the right argument. Arguments should be -integers. (See also L<Integer Arithmetic>.) - -Binary ">>" returns the value of its left argument shifted right by -the number of bits specified by the right argument. Arguments should -be integers. (See also L<Integer Arithmetic>.) - -Note that both "<<" and ">>" in Perl are implemented directly using -"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is -in force then signed C integers are used, else unsigned C integers are -used. Either way, the implementation isn't going to generate results -larger than the size of the integer type Perl was built with (32 bits -or 64 bits). - -The result of overflowing the range of the integers is undefined -because it is undefined also in C. In other words, using 32-bit -integers, C<< 1 << 32 >> is undefined. Shifting by a negative number -of bits is also undefined. - -=head2 Named Unary Operators -X<operator, named unary> - -The various named unary operators are treated as functions with one -argument, with optional parentheses. - -If any list operator (print(), etc.) or any unary operator (chdir(), etc.) -is followed by a left parenthesis as the next token, the operator and -arguments within parentheses are taken to be of highest precedence, -just like a normal function call. For example, -because named unary operators are higher precedence than ||: - - chdir $foo || die; # (chdir $foo) || die - chdir($foo) || die; # (chdir $foo) || die - chdir ($foo) || die; # (chdir $foo) || die - chdir +($foo) || die; # (chdir $foo) || die - -but, because * is higher precedence than named operators: - - chdir $foo * 20; # chdir ($foo * 20) - chdir($foo) * 20; # (chdir $foo) * 20 - chdir ($foo) * 20; # (chdir $foo) * 20 - chdir +($foo) * 20; # chdir ($foo * 20) - - rand 10 * 20; # rand (10 * 20) - rand(10) * 20; # (rand 10) * 20 - rand (10) * 20; # (rand 10) * 20 - rand +(10) * 20; # rand (10 * 20) - -Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are -treated like named unary operators, but they don't follow this functional -parenthesis rule. That means, for example, that C<-f($file).".bak"> is -equivalent to C<-f "$file.bak">. -X<-X> X<filetest> X<operator, filetest> - -See also L<"Terms and List Operators (Leftward)">. - -=head2 Relational Operators -X<relational operator> X<operator, relational> - -Binary "<" returns true if the left argument is numerically less than -the right argument. -X<< < >> - -Binary ">" returns true if the left argument is numerically greater -than the right argument. -X<< > >> - -Binary "<=" returns true if the left argument is numerically less than -or equal to the right argument. -X<< <= >> - -Binary ">=" returns true if the left argument is numerically greater -than or equal to the right argument. -X<< >= >> - -Binary "lt" returns true if the left argument is stringwise less than -the right argument. -X<< lt >> - -Binary "gt" returns true if the left argument is stringwise greater -than the right argument. -X<< gt >> - -Binary "le" returns true if the left argument is stringwise less than -or equal to the right argument. -X<< le >> - -Binary "ge" returns true if the left argument is stringwise greater -than or equal to the right argument. -X<< ge >> - -=head2 Equality Operators -X<equality> X<equal> X<equals> X<operator, equality> - -Binary "==" returns true if the left argument is numerically equal to -the right argument. -X<==> - -Binary "!=" returns true if the left argument is numerically not equal -to the right argument. -X<!=> - -Binary "<=>" returns -1, 0, or 1 depending on whether the left -argument is numerically less than, equal to, or greater than the right -argument. If your platform supports NaNs (not-a-numbers) as numeric -values, using them with "<=>" returns undef. NaN is not "<", "==", ">", -"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN -returns true, as does NaN != anything else. If your platform doesn't -support NaNs then NaN is just a string with numeric value 0. -X<< <=> >> X<spaceship> - - perl -le '$a = "NaN"; print "No NaN support here" if $a == $a' - perl -le '$a = "NaN"; print "NaN support here" if $a != $a' - -Binary "eq" returns true if the left argument is stringwise equal to -the right argument. -X<eq> - -Binary "ne" returns true if the left argument is stringwise not equal -to the right argument. -X<ne> - -Binary "cmp" returns -1, 0, or 1 depending on whether the left -argument is stringwise less than, equal to, or greater than the right -argument. -X<cmp> - -Binary "~~" does a smart match between its arguments. Smart matching -is described in L<perlsyn/"Smart matching in detail">. -X<~~> - -"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified -by the current locale if C<use locale> is in effect. See L<perllocale>. - -=head2 Bitwise And -X<operator, bitwise, and> X<bitwise and> X<&> - -Binary "&" returns its operands ANDed together bit by bit. -(See also L<Integer Arithmetic> and L<Bitwise String Operators>.) - -Note that "&" has lower priority than relational operators, so for example -the brackets are essential in a test like - - print "Even\n" if ($x & 1) == 0; - -=head2 Bitwise Or and Exclusive Or -X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor> -X<bitwise xor> X<^> - -Binary "|" returns its operands ORed together bit by bit. -(See also L<Integer Arithmetic> and L<Bitwise String Operators>.) - -Binary "^" returns its operands XORed together bit by bit. -(See also L<Integer Arithmetic> and L<Bitwise String Operators>.) - -Note that "|" and "^" have lower priority than relational operators, so -for example the brackets are essential in a test like - - print "false\n" if (8 | 2) != 10; - -=head2 C-style Logical And -X<&&> X<logical and> X<operator, logical, and> - -Binary "&&" performs a short-circuit logical AND operation. That is, -if the left operand is false, the right operand is not even evaluated. -Scalar or list context propagates down to the right operand if it -is evaluated. - -=head2 C-style Logical Or -X<||> X<operator, logical, or> - -Binary "||" performs a short-circuit logical OR operation. That is, -if the left operand is true, the right operand is not even evaluated. -Scalar or list context propagates down to the right operand if it -is evaluated. - -=head2 C-style Logical Defined-Or -X<//> X<operator, logical, defined-or> - -Although it has no direct equivalent in C, Perl's C<//> operator is related -to its C-style or. In fact, it's exactly the same as C<||>, except that it -tests the left hand side's definedness instead of its truth. Thus, C<$a // $b> -is similar to C<defined($a) || $b> (except that it returns the value of C<$a> -rather than the value of C<defined($a)>) and is exactly equivalent to -C<defined($a) ? $a : $b>. This is very useful for providing default values -for variables. If you actually want to test if at least one of C<$a> and -C<$b> is defined, use C<defined($a // $b)>. - -The C<||>, C<//> and C<&&> operators return the last value evaluated -(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably -portable way to find out the home directory might be: - - $home = $ENV{'HOME'} // $ENV{'LOGDIR'} // - (getpwuid($<))[7] // die "You're homeless!\n"; - -In particular, this means that you shouldn't use this -for selecting between two aggregates for assignment: - - @a = @b || @c; # this is wrong - @a = scalar(@b) || @c; # really meant this - @a = @b ? @b : @c; # this works fine, though - -As more readable alternatives to C<&&> and C<||> when used for -control flow, Perl provides the C<and> and C<or> operators (see below). -The short-circuit behavior is identical. The precedence of "and" -and "or" is much lower, however, so that you can safely use them after a -list operator without the need for parentheses: - - unlink "alpha", "beta", "gamma" - or gripe(), next LINE; - -With the C-style operators that would have been written like this: - - unlink("alpha", "beta", "gamma") - || (gripe(), next LINE); - -Using "or" for assignment is unlikely to do what you want; see below. - -=head2 Range Operators -X<operator, range> X<range> X<..> X<...> - -Binary ".." is the range operator, which is really two different -operators depending on the context. In list context, it returns a -list of values counting (up by ones) from the left value to the right -value. If the left value is greater than the right value then it -returns the empty list. The range operator is useful for writing -C<foreach (1..10)> loops and for doing slice operations on arrays. In -the current implementation, no temporary array is created when the -range operator is used as the expression in C<foreach> loops, but older -versions of Perl might burn a lot of memory when you write something -like this: - - for (1 .. 1_000_000) { - # code - } - -The range operator also works on strings, using the magical auto-increment, -see below. - -In scalar context, ".." returns a boolean value. The operator is -bistable, like a flip-flop, and emulates the line-range (comma) operator -of B<sed>, B<awk>, and various editors. Each ".." operator maintains its -own boolean state. It is false as long as its left operand is false. -Once the left operand is true, the range operator stays true until the -right operand is true, I<AFTER> which the range operator becomes false -again. It doesn't become false till the next time the range operator is -evaluated. It can test the right operand and become false on the same -evaluation it became true (as in B<awk>), but it still returns true once. -If you don't want it to test the right operand till the next -evaluation, as in B<sed>, just use three dots ("...") instead of -two. In all other regards, "..." behaves just like ".." does. - -The right operand is not evaluated while the operator is in the -"false" state, and the left operand is not evaluated while the -operator is in the "true" state. The precedence is a little lower -than || and &&. The value returned is either the empty string for -false, or a sequence number (beginning with 1) for true. The -sequence number is reset for each range encountered. The final -sequence number in a range has the string "E0" appended to it, which -doesn't affect its numeric value, but gives you something to search -for if you want to exclude the endpoint. You can exclude the -beginning point by waiting for the sequence number to be greater -than 1. - -If either operand of scalar ".." is a constant expression, -that operand is considered true if it is equal (C<==>) to the current -input line number (the C<$.> variable). - -To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>, -but that is only an issue if you use a floating point expression; when -implicitly using C<$.> as described in the previous paragraph, the -comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.> -is set to a floating point value and you are not reading from a file. -Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what -you want in scalar context because each of the operands are evaluated -using their integer representation. - -Examples: - -As a scalar operator: - - if (101 .. 200) { print; } # print 2nd hundred lines, short for - # if ($. == 101 .. $. == 200) ... - - next LINE if (1 .. /^$/); # skip header lines, short for - # ... if ($. == 1 .. /^$/); - # (typically in a loop labeled LINE) - - s/^/> / if (/^$/ .. eof()); # quote body - - # parse mail messages - while (<>) { - $in_header = 1 .. /^$/; - $in_body = /^$/ .. eof; - if ($in_header) { - # ... - } else { # in body - # ... - } - } continue { - close ARGV if eof; # reset $. each file - } - -Here's a simple example to illustrate the difference between -the two range operators: - - @lines = (" - Foo", - "01 - Bar", - "1 - Baz", - " - Quux"); - - foreach (@lines) { - if (/0/ .. /1/) { - print "$_\n"; - } - } - -This program will print only the line containing "Bar". If -the range operator is changed to C<...>, it will also print the -"Baz" line. - -And now some examples as a list operator: - - for (101 .. 200) { print; } # print $_ 100 times - @foo = @foo[0 .. $#foo]; # an expensive no-op - @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items - -The range operator (in list context) makes use of the magical -auto-increment algorithm if the operands are strings. You -can say - - @alphabet = ('A' .. 'Z'); - -to get all normal letters of the English alphabet, or - - $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15]; - -to get a hexadecimal digit, or - - @z2 = ('01' .. '31'); print $z2[$mday]; - -to get dates with leading zeros. - -If the final value specified is not in the sequence that the magical -increment would produce, the sequence goes until the next value would -be longer than the final value specified. - -If the initial value specified isn't part of a magical increment -sequence (that is, a non-empty string matching "/^[a-zA-Z]*[0-9]*\z/"), -only the initial value will be returned. So the following will only -return an alpha: - - use charnames 'greek'; - my @greek_small = ("\N{alpha}" .. "\N{omega}"); - -To get lower-case greek letters, use this instead: - - my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") ); - -Because each operand is evaluated in integer form, C<2.18 .. 3.14> will -return two elements in list context. - - @list = (2.18 .. 3.14); # same as @list = (2 .. 3); - -=head2 Conditional Operator -X<operator, conditional> X<operator, ternary> X<ternary> X<?:> - -Ternary "?:" is the conditional operator, just as in C. It works much -like an if-then-else. If the argument before the ? is true, the -argument before the : is returned, otherwise the argument after the : -is returned. For example: - - printf "I have %d dog%s.\n", $n, - ($n == 1) ? '' : "s"; - -Scalar or list context propagates downward into the 2nd -or 3rd argument, whichever is selected. - - $a = $ok ? $b : $c; # get a scalar - @a = $ok ? @b : @c; # get an array - $a = $ok ? @b : @c; # oops, that's just a count! - -The operator may be assigned to if both the 2nd and 3rd arguments are -legal lvalues (meaning that you can assign to them): - - ($a_or_b ? $a : $b) = $c; - -Because this operator produces an assignable result, using assignments -without parentheses will get you in trouble. For example, this: - - $a % 2 ? $a += 10 : $a += 2 - -Really means this: - - (($a % 2) ? ($a += 10) : $a) += 2 - -Rather than this: - - ($a % 2) ? ($a += 10) : ($a += 2) - -That should probably be written more simply as: - - $a += ($a % 2) ? 10 : 2; - -=head2 Assignment Operators -X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=> -X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=> -X<%=> X<^=> X<x=> - -"=" is the ordinary assignment operator. - -Assignment operators work as in C. That is, - - $a += 2; - -is equivalent to - - $a = $a + 2; - -although without duplicating any side effects that dereferencing the lvalue -might trigger, such as from tie(). Other assignment operators work similarly. -The following are recognized: - - **= += *= &= <<= &&= - -= /= |= >>= ||= - .= %= ^= //= - x= - -Although these are grouped by family, they all have the precedence -of assignment. - -Unlike in C, the scalar assignment operator produces a valid lvalue. -Modifying an assignment is equivalent to doing the assignment and -then modifying the variable that was assigned to. This is useful -for modifying a copy of something, like this: - - ($tmp = $global) =~ tr [A-Z] [a-z]; - -Likewise, - - ($a += 2) *= 3; - -is equivalent to - - $a += 2; - $a *= 3; - -Similarly, a list assignment in list context produces the list of -lvalues assigned to, and a list assignment in scalar context returns -the number of elements produced by the expression on the right hand -side of the assignment. - -=head2 Comma Operator -X<comma> X<operator, comma> X<,> - -Binary "," is the comma operator. In scalar context it evaluates -its left argument, throws that value away, then evaluates its right -argument and returns that value. This is just like C's comma operator. - -In list context, it's just the list argument separator, and inserts -both its arguments into the list. These arguments are also evaluated -from left to right. - -The C<< => >> operator is a synonym for the comma, but forces any word -(consisting entirely of word characters) to its left to be interpreted -as a string (as of 5.001). This includes words that might otherwise be -considered a constant or function call. - - use constant FOO => "something"; - - my %h = ( FOO => 23 ); - -is equivalent to: - - my %h = ("FOO", 23); - -It is I<NOT>: - - my %h = ("something", 23); - -If the argument on the left is not a word, it is first interpreted as -an expression, and then the string value of that is used. - -The C<< => >> operator is helpful in documenting the correspondence -between keys and values in hashes, and other paired elements in lists. - - %hash = ( $key => $value ); - login( $username => $password ); - -=head2 List Operators (Rightward) -X<operator, list, rightward> X<list operator> - -On the right side of a list operator, it has very low precedence, -such that it controls all comma-separated expressions found there. -The only operators with lower precedence are the logical operators -"and", "or", and "not", which may be used to evaluate calls to list -operators without the need for extra parentheses: - - open HANDLE, "filename" - or die "Can't open: $!\n"; - -See also discussion of list operators in L<Terms and List Operators (Leftward)>. - -=head2 Logical Not -X<operator, logical, not> X<not> - -Unary "not" returns the logical negation of the expression to its right. -It's the equivalent of "!" except for the very low precedence. - -=head2 Logical And -X<operator, logical, and> X<and> - -Binary "and" returns the logical conjunction of the two surrounding -expressions. It's equivalent to && except for the very low -precedence. This means that it short-circuits: i.e., the right -expression is evaluated only if the left expression is true. - -=head2 Logical or, Defined or, and Exclusive Or -X<operator, logical, or> X<operator, logical, xor> -X<operator, logical, defined or> X<operator, logical, exclusive or> -X<or> X<xor> - -Binary "or" returns the logical disjunction of the two surrounding -expressions. It's equivalent to || except for the very low precedence. -This makes it useful for control flow - - print FH $data or die "Can't write to FH: $!"; - -This means that it short-circuits: i.e., the right expression is evaluated -only if the left expression is false. Due to its precedence, you should -probably avoid using this for assignment, only for control flow. - - $a = $b or $c; # bug: this is wrong - ($a = $b) or $c; # really means this - $a = $b || $c; # better written this way - -However, when it's a list-context assignment and you're trying to use -"||" for control flow, you probably need "or" so that the assignment -takes higher precedence. - - @info = stat($file) || die; # oops, scalar sense of stat! - @info = stat($file) or die; # better, now @info gets its due - -Then again, you could always use parentheses. - -Binary "xor" returns the exclusive-OR of the two surrounding expressions. -It cannot short circuit, of course. - -=head2 C Operators Missing From Perl -X<operator, missing from perl> X<&> X<*> -X<typecasting> X<(TYPE)> - -Here is what C has that Perl doesn't: - -=over 8 - -=item unary & - -Address-of operator. (But see the "\" operator for taking a reference.) - -=item unary * - -Dereference-address operator. (Perl's prefix dereferencing -operators are typed: $, @, %, and &.) - -=item (TYPE) - -Type-casting operator. - -=back - -=head2 Quote and Quote-like Operators -X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m> -X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>> -X<escape sequence> X<escape> - - -While we usually think of quotes as literal values, in Perl they -function as operators, providing various kinds of interpolating and -pattern matching capabilities. Perl provides customary quote characters -for these behaviors, but also provides a way for you to choose your -quote character for any of them. In the following table, a C<{}> represents -any pair of delimiters you choose. - - Customary Generic Meaning Interpolates - '' q{} Literal no - "" qq{} Literal yes - `` qx{} Command yes* - qw{} Word list no - // m{} Pattern match yes* - qr{} Pattern yes* - s{}{} Substitution yes* - tr{}{} Transliteration no (but see below) - <<EOF here-doc yes* - - * unless the delimiter is ''. - -Non-bracketing delimiters use the same character fore and aft, but the four -sorts of brackets (round, angle, square, curly) will all nest, which means -that - - q{foo{bar}baz} - -is the same as - - 'foo{bar}baz' - -Note, however, that this does not always work for quoting Perl code: - - $s = q{ if($a eq "}") ... }; # WRONG - -is a syntax error. The C<Text::Balanced> module (from CPAN, and -starting from Perl 5.8 part of the standard distribution) is able -to do this properly. - -There can be whitespace between the operator and the quoting -characters, except when C<#> is being used as the quoting character. -C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the -operator C<q> followed by a comment. Its argument will be taken -from the next line. This allows you to write: - - s {foo} # Replace foo - {bar} # with bar. - -The following escape sequences are available in constructs that interpolate -and in transliterations. -X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> - - \t tab (HT, TAB) - \n newline (NL) - \r return (CR) - \f form feed (FF) - \b backspace (BS) - \a alarm (bell) (BEL) - \e escape (ESC) - \033 octal char (example: ESC) - \x1b hex char (example: ESC) - \x{263a} wide hex char (example: SMILEY) - \c[ control char (example: ESC) - \N{name} named Unicode character - -The character following C<\c> is mapped to some other character by -converting letters to upper case and then (on ASCII systems) by inverting -the 7th bit (0x40). The most interesting range is from '@' to '_' -(0x40 through 0x5F), resulting in a control character from 0x00 -through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only -'@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting -in 0x00 through 0x1F and 0x7F. - -B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for -the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>. - -The following escape sequences are available in constructs that interpolate -but not in transliterations. -X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> - - \l lowercase next char - \u uppercase next char - \L lowercase till \E - \U uppercase till \E - \E end case modification - \Q quote non-word characters till \E - -If C<use locale> is in effect, the case map used by C<\l>, C<\L>, -C<\u> and C<\U> is taken from the current locale. See L<perllocale>. -If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or -beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and -C<\U> is as defined by Unicode. For documentation of C<\N{name}>, -see L<charnames>. - -All systems use the virtual C<"\n"> to represent a line terminator, -called a "newline". There is no such thing as an unvarying, physical -newline character. It is only an illusion that the operating system, -device drivers, C libraries, and Perl all conspire to preserve. Not all -systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example, -on a Mac, these are reversed, and on systems without line terminator, -printing C<"\n"> may emit no actual data. In general, use C<"\n"> when -you mean a "newline" for your system, but use the literal ASCII when you -need an exact character. For example, most networking protocols expect -and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators, -and although they often accept just C<"\012">, they seldom tolerate just -C<"\015">. If you get in the habit of using C<"\n"> for networking, -you may be burned some day. -X<newline> X<line terminator> X<eol> X<end of line> -X<\n> X<\r> X<\r\n> - -For constructs that do interpolate, variables beginning with "C<$>" -or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or -C<< $href->{key}[0] >> are also interpolated, as are array and hash slices. -But method calls such as C<< $obj->meth >> are not. - -Interpolating an array or slice interpolates the elements in order, -separated by the value of C<$">, so is equivalent to interpolating -C<join $", @array>. "Punctuation" arrays such as C<@*> are only -interpolated if the name is enclosed in braces C<@{*}>, but special -arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces. - -You cannot include a literal C<$> or C<@> within a C<\Q> sequence. -An unescaped C<$> or C<@> interpolates the corresponding variable, -while escaping will cause the literal string C<\$> to be inserted. -You'll need to write something like C<m/\Quser\E\@\Qhost/>. - -Patterns are subject to an additional level of interpretation as a -regular expression. This is done as a second pass, after variables are -interpolated, so that regular expressions may be incorporated into the -pattern from the variables. If this is not what you want, use C<\Q> to -interpolate a variable literally. - -Apart from the behavior described above, Perl does not expand -multiple levels of interpolation. In particular, contrary to the -expectations of shell programmers, back-quotes do I<NOT> interpolate -within double quotes, nor do single quotes impede evaluation of -variables when used within double quotes. - -=head2 Regexp Quote-Like Operators -X<operator, regexp> - -Here are the quote-like operators that apply to pattern -matching and related activities. - -=over 8 - -=item qr/STRING/msixpo -X<qr> X</i> X</m> X</o> X</s> X</x> X</p> - -This operator quotes (and possibly compiles) its I<STRING> as a regular -expression. I<STRING> is interpolated the same way as I<PATTERN> -in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation -is done. Returns a Perl value which may be used instead of the -corresponding C</STRING/msixpo> expression. The returned value is a -normalized version of the original pattern. It magically differs from -a string containing the same characters: C<ref(qr/x/)> returns "Regexp", -even though dereferencing the result returns undef. - -For example, - - $rex = qr/my.STRING/is; - print $rex; # prints (?si-xm:my.STRING) - s/$rex/foo/; - -is equivalent to - - s/my.STRING/foo/is; - -The result may be used as a subpattern in a match: - - $re = qr/$pattern/; - $string =~ /foo${re}bar/; # can be interpolated in other patterns - $string =~ $re; # or used standalone - $string =~ /$re/; # or this way - -Since Perl may compile the pattern at the moment of execution of qr() -operator, using qr() may have speed advantages in some situations, -notably if the result of qr() is used standalone: - - sub match { - my $patterns = shift; - my @compiled = map qr/$_/i, @$patterns; - grep { - my $success = 0; - foreach my $pat (@compiled) { - $success = 1, last if /$pat/; - } - $success; - } @_; - } - -Precompilation of the pattern into an internal representation at -the moment of qr() avoids a need to recompile the pattern every -time a match C</$pat/> is attempted. (Perl has many other internal -optimizations, but none would be triggered in the above example if -we did not use qr() operator.) - -Options are: - - m Treat string as multiple lines. - s Treat string as single line. (Make . match a newline) - i Do case-insensitive pattern matching. - x Use extended regular expressions. - p When matching preserve a copy of the matched string so - that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined. - o Compile pattern only once. - -If a precompiled pattern is embedded in a larger pattern then the effect -of 'msixp' will be propagated appropriately. The effect of the 'o' -modifier has is not propagated, being restricted to those patterns -explicitly using it. - -See L<perlre> for additional information on valid syntax for STRING, and -for a detailed look at the semantics of regular expressions. - -=item m/PATTERN/msixpogc -X<m> X<operator, match> -X<regexp, options> X<regexp> X<regex, options> X<regex> -X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> - -=item /PATTERN/msixpogc - -Searches a string for a pattern match, and in scalar context returns -true if it succeeds, false if it fails. If no string is specified -via the C<=~> or C<!~> operator, the $_ string is searched. (The -string specified with C<=~> need not be an lvalue--it may be the -result of an expression evaluation, but remember the C<=~> binds -rather tightly.) See also L<perlre>. See L<perllocale> for -discussion of additional considerations that apply when C<use locale> -is in effect. - -Options are as described in C<qr//>; in addition, the following match -process modifiers are available: - - g Match globally, i.e., find all occurrences. - c Do not reset search position on a failed match when /g is in effect. - -If "/" is the delimiter then the initial C<m> is optional. With the C<m> -you can use any pair of non-alphanumeric, non-whitespace characters -as delimiters. This is particularly useful for matching path names -that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is -the delimiter, then the match-only-once rule of C<?PATTERN?> applies. -If "'" is the delimiter, no interpolation is performed on the PATTERN. - -PATTERN may contain variables, which will be interpolated (and the -pattern recompiled) every time the pattern search is evaluated, except -for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and -C<$|> are not interpolated because they look like end-of-string tests.) -If you want such a pattern to be compiled only once, add a C</o> after -the trailing delimiter. This avoids expensive run-time recompilations, -and is useful when the value you are interpolating won't change over -the life of the script. However, mentioning C</o> constitutes a promise -that you won't change the variables in the pattern. If you change them, -Perl won't even notice. See also L<"qr/STRING/msixpo">. - -=item The empty pattern // - -If the PATTERN evaluates to the empty string, the last -I<successfully> matched regular expression is used instead. In this -case, only the C<g> and C<c> flags on the empty pattern is honoured - -the other flags are taken from the original pattern. If no match has -previously succeeded, this will (silently) act instead as a genuine -empty pattern (which will always match). - -Note that it's possible to confuse Perl into thinking C<//> (the empty -regex) is really C<//> (the defined-or operator). Perl is usually pretty -good about this, but some pathological cases might trigger this, such as -C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //> -(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl -will assume you meant defined-or. If you meant the empty regex, just -use parentheses or spaces to disambiguate, or even prefix the empty -regex with an C<m> (so C<//> becomes C<m//>). - -=item Matching in list context - -If the C</g> option is not used, C<m//> in list context returns a -list consisting of the subexpressions matched by the parentheses in the -pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are -also set, and that this differs from Perl 4's behavior.) When there are -no parentheses in the pattern, the return value is the list C<(1)> for -success. With or without parentheses, an empty list is returned upon -failure. - -Examples: - - open(TTY, '/dev/tty'); - <TTY> =~ /^y/i && foo(); # do foo if desired - - if (/Version: *([0-9.]*)/) { $version = $1; } - - next if m#^/usr/spool/uucp#; - - # poor man's grep - $arg = shift; - while (<>) { - print if /$arg/o; # compile only once - } - - if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/)) - -This last example splits $foo into the first two words and the -remainder of the line, and assigns those three fields to $F1, $F2, and -$Etc. The conditional is true if any variables were assigned, i.e., if -the pattern matched. - -The C</g> modifier specifies global pattern matching--that is, -matching as many times as possible within the string. How it behaves -depends on the context. In list context, it returns a list of the -substrings matched by any capturing parentheses in the regular -expression. If there are no parentheses, it returns a list of all -the matched strings, as if there were parentheses around the whole -pattern. - -In scalar context, each execution of C<m//g> finds the next match, -returning true if it matches, and false if there is no further match. -The position after the last match can be read or set using the pos() -function; see L<perlfunc/pos>. A failed match normally resets the -search position to the beginning of the string, but you can avoid that -by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target -string also resets the search position. - -=item \G assertion - -You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a -zero-width assertion that matches the exact position where the previous -C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion -still anchors at pos(), but the match is of course only attempted once. -Using C<\G> without C</g> on a target string that has not previously had a -C</g> match applied to it is the same as using the C<\A> assertion to match -the beginning of the string. Note also that, currently, C<\G> is only -properly supported when anchored at the very beginning of the pattern. - -Examples: - - # list context - ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g); - - # scalar context - $/ = ""; - while (defined($paragraph = <>)) { - while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) { - $sentences++; - } - } - print "$sentences\n"; - - # using m//gc with \G - $_ = "ppooqppqq"; - while ($i++ < 2) { - print "1: '"; - print $1 while /(o)/gc; print "', pos=", pos, "\n"; - print "2: '"; - print $1 if /\G(q)/gc; print "', pos=", pos, "\n"; - print "3: '"; - print $1 while /(p)/gc; print "', pos=", pos, "\n"; - } - print "Final: '$1', pos=",pos,"\n" if /\G(.)/; - -The last example should print: - - 1: 'oo', pos=4 - 2: 'q', pos=5 - 3: 'pp', pos=7 - 1: '', pos=7 - 2: 'q', pos=8 - 3: '', pos=8 - Final: 'q', pos=8 - -Notice that the final match matched C<q> instead of C<p>, which a match -without the C<\G> anchor would have done. Also note that the final match -did not update C<pos> -- C<pos> is only updated on a C</g> match. If the -final match did indeed match C<p>, it's a good bet that you're running an -older (pre-5.6.0) Perl. - -A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can -combine several regexps like this to process a string part-by-part, -doing different actions depending on which regexp matched. Each -regexp tries to match where the previous one leaves off. - - $_ = <<'EOL'; - $url = URI::URL->new( "http://www/" ); die if $url eq "xXx"; - EOL - LOOP: - { - print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc; - print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc; - print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc; - print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc; - print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc; - print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc; - print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc; - print ". That's all!\n"; - } - -Here is the output (split into several lines): - - line-noise lowercase line-noise lowercase UPPERCASE line-noise - UPPERCASE line-noise lowercase line-noise lowercase line-noise - lowercase lowercase line-noise lowercase lowercase line-noise - MiXeD line-noise. That's all! - -=item ?PATTERN? -X<?> - -This is just like the C</pattern/> search, except that it matches only -once between calls to the reset() operator. This is a useful -optimization when you want to see only the first occurrence of -something in each file of a set of files, for instance. Only C<??> -patterns local to the current package are reset. - - while (<>) { - if (?^$?) { - # blank line between header and body - } - } continue { - reset if eof; # clear ?? status for next file - } - -This usage is vaguely deprecated, which means it just might possibly -be removed in some distant future version of Perl, perhaps somewhere -around the year 2168. - -=item s/PATTERN/REPLACEMENT/msixpogce -X<substitute> X<substitution> X<replace> X<regexp, replace> -X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> - -Searches a string for a pattern, and if found, replaces that pattern -with the replacement text and returns the number of substitutions -made. Otherwise it returns false (specifically, the empty string). - -If no string is specified via the C<=~> or C<!~> operator, the C<$_> -variable is searched and modified. (The string specified with C<=~> must -be scalar variable, an array element, a hash element, or an assignment -to one of those, i.e., an lvalue.) - -If the delimiter chosen is a single quote, no interpolation is -done on either the PATTERN or the REPLACEMENT. Otherwise, if the -PATTERN contains a $ that looks like a variable rather than an -end-of-string test, the variable will be interpolated into the pattern -at run-time. If you want the pattern compiled only once the first time -the variable is interpolated, use the C</o> option. If the pattern -evaluates to the empty string, the last successfully executed regular -expression is used instead. See L<perlre> for further explanation on these. -See L<perllocale> for discussion of additional considerations that apply -when C<use locale> is in effect. - -Options are as with m// with the addition of the following replacement -specific options: - - e Evaluate the right side as an expression. - ee Evaluate the right side as a string then eval the result - -Any non-alphanumeric, non-whitespace delimiter may replace the -slashes. If single quotes are used, no interpretation is done on the -replacement string (the C</e> modifier overrides this, however). Unlike -Perl 4, Perl 5 treats backticks as normal delimiters; the replacement -text is not evaluated as a command. If the -PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own -pair of quotes, which may or may not be bracketing quotes, e.g., -C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the -replacement portion to be treated as a full-fledged Perl expression -and evaluated right then and there. It is, however, syntax checked at -compile-time. A second C<e> modifier will cause the replacement portion -to be C<eval>ed before being run as a Perl expression. - -Examples: - - s/\bgreen\b/mauve/g; # don't change wintergreen - - $path =~ s|/usr/bin|/usr/local/bin|; - - s/Login: $foo/Login: $bar/; # run-time pattern - - ($foo = $bar) =~ s/this/that/; # copy first, then change - - $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count - - $_ = 'abc123xyz'; - s/\d+/$&*2/e; # yields 'abc246xyz' - s/\d+/sprintf("%5d",$&)/e; # yields 'abc 246xyz' - s/\w/$& x 2/eg; # yields 'aabbcc 224466xxyyzz' - - s/%(.)/$percent{$1}/g; # change percent escapes; no /e - s/%(.)/$percent{$1} || $&/ge; # expr now, so /e - s/^=(\w+)/pod($1)/ge; # use function call - - # expand variables in $_, but dynamics only, using - # symbolic dereferencing - s/\$(\w+)/${$1}/g; - - # Add one to the value of any numbers in the string - s/(\d+)/1 + $1/eg; - - # This will expand any embedded scalar variable - # (including lexicals) in $_ : First $1 is interpolated - # to the variable name, and then evaluated - s/(\$\w+)/$1/eeg; - - # Delete (most) C comments. - $program =~ s { - /\* # Match the opening delimiter. - .*? # Match a minimal number of characters. - \*/ # Match the closing delimiter. - } []gsx; - - s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_, expensively - - for ($variable) { # trim whitespace in $variable, cheap - s/^\s+//; - s/\s+$//; - } - - s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields - -Note the use of $ instead of \ in the last example. Unlike -B<sed>, we use the \<I<digit>> form in only the left hand side. -Anywhere else it's $<I<digit>>. - -Occasionally, you can't use just a C</g> to get all the changes -to occur that you might want. Here are two common cases: - - # put commas in the right places in an integer - 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; - - # expand tabs to 8-column spacing - 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e; - -=back - -=head2 Quote-Like Operators -X<operator, quote-like> - -=over 4 - -=item q/STRING/ -X<q> X<quote, single> X<'> X<''> - -=item 'STRING' - -A single-quoted, literal string. A backslash represents a backslash -unless followed by the delimiter or another backslash, in which case -the delimiter or backslash is interpolated. - - $foo = q!I said, "You said, 'She said it.'"!; - $bar = q('This is it.'); - $baz = '\n'; # a two-character string - -=item qq/STRING/ -X<qq> X<quote, double> X<"> X<""> - -=item "STRING" - -A double-quoted, interpolated string. - - $_ .= qq - (*** The previous line contains the naughty word "$1".\n) - if /\b(tcl|java|python)\b/i; # :-) - $baz = "\n"; # a one-character string - -=item qx/STRING/ -X<qx> X<`> X<``> X<backtick> - -=item `STRING` - -A string which is (possibly) interpolated and then executed as a -system command with C</bin/sh> or its equivalent. Shell wildcards, -pipes, and redirections will be honored. The collected standard -output of the command is returned; standard error is unaffected. In -scalar context, it comes back as a single (potentially multi-line) -string, or undef if the command failed. In list context, returns a -list of lines (however you've defined lines with $/ or -$INPUT_RECORD_SEPARATOR), or an empty list if the command failed. - -Because backticks do not affect standard error, use shell file descriptor -syntax (assuming the shell supports this) if you care to address this. -To capture a command's STDERR and STDOUT together: - - $output = `cmd 2>&1`; - -To capture a command's STDOUT but discard its STDERR: - - $output = `cmd 2>/dev/null`; - -To capture a command's STDERR but discard its STDOUT (ordering is -important here): - - $output = `cmd 2>&1 1>/dev/null`; - -To exchange a command's STDOUT and STDERR in order to capture the STDERR -but leave its STDOUT to come out the old STDERR: - - $output = `cmd 3>&1 1>&2 2>&3 3>&-`; - -To read both a command's STDOUT and its STDERR separately, it's easiest -to redirect them separately to files, and then read from those files -when the program is done: - - system("program args 1>program.stdout 2>program.stderr"); - -The STDIN filehandle used by the command is inherited from Perl's STDIN. -For example: - - open BLAM, "blam" || die "Can't open: $!"; - open STDIN, "<&BLAM"; - print `sort`; - -will print the sorted contents of the file "blam". - -Using single-quote as a delimiter protects the command from Perl's -double-quote interpolation, passing it on to the shell instead: - - $perl_info = qx(ps $$); # that's Perl's $$ - $shell_info = qx'ps $$'; # that's the new shell's $$ - -How that string gets evaluated is entirely subject to the command -interpreter on your system. On most platforms, you will have to protect -shell metacharacters if you want them treated literally. This is in -practice difficult to do, as it's unclear how to escape which characters. -See L<perlsec> for a clean and safe example of a manual fork() and exec() -to emulate backticks safely. - -On some platforms (notably DOS-like ones), the shell may not be -capable of dealing with multiline commands, so putting newlines in -the string may not get you what you want. You may be able to evaluate -multiple commands in a single line by separating them with the command -separator character, if your shell supports that (e.g. C<;> on many Unix -shells; C<&> on the Windows NT C<cmd> shell). - -Beginning with v5.6.0, Perl will attempt to flush all files opened for -output before starting the child process, but this may not be supported -on some platforms (see L<perlport>). To be safe, you may need to set -C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of -C<IO::Handle> on any open handles. - -Beware that some command shells may place restrictions on the length -of the command line. You must ensure your strings don't exceed this -limit after any necessary interpolations. See the platform-specific -release notes for more details about your particular environment. - -Using this operator can lead to programs that are difficult to port, -because the shell commands called vary between systems, and may in -fact not be present at all. As one example, the C<type> command under -the POSIX shell is very different from the C<type> command under DOS. -That doesn't mean you should go out of your way to avoid backticks -when they're the right way to get something done. Perl was made to be -a glue language, and one of the things it glues together is commands. -Just understand what you're getting yourself into. - -See L</"I/O Operators"> for more discussion. - -=item qw/STRING/ -X<qw> X<quote, list> X<quote, words> - -Evaluates to a list of the words extracted out of STRING, using embedded -whitespace as the word delimiters. It can be understood as being roughly -equivalent to: - - split(' ', q/STRING/); - -the differences being that it generates a real list at compile time, and -in scalar context it returns the last element in the list. So -this expression: - - qw(foo bar baz) - -is semantically equivalent to the list: - - 'foo', 'bar', 'baz' - -Some frequently seen examples: - - use POSIX qw( setlocale localeconv ) - @EXPORT = qw( foo bar baz ); - -A common mistake is to try to separate the words with comma or to -put comments into a multi-line C<qw>-string. For this reason, the -C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable) -produces warnings if the STRING contains the "," or the "#" character. - - -=item tr/SEARCHLIST/REPLACEMENTLIST/cds -X<tr> X<y> X<transliterate> X</c> X</d> X</s> - -=item y/SEARCHLIST/REPLACEMENTLIST/cds - -Transliterates all occurrences of the characters found in the search list -with the corresponding character in the replacement list. It returns -the number of characters replaced or deleted. If no string is -specified via the =~ or !~ operator, the $_ string is transliterated. (The -string specified with =~ must be a scalar variable, an array element, a -hash element, or an assignment to one of those, i.e., an lvalue.) - -A character range may be specified with a hyphen, so C<tr/A-J/0-9/> -does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>. -For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the -SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has -its own pair of quotes, which may or may not be bracketing quotes, -e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>. - -Note that C<tr> does B<not> do regular expression character classes -such as C<\d> or C<[:lower:]>. The C<tr> operator is not equivalent to -the tr(1) utility. If you want to map strings between lower/upper -cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider -using the C<s> operator if you need regular expressions. - -Note also that the whole range idea is rather unportable between -character sets--and even within character sets they may cause results -you probably didn't expect. A sound principle is to use only ranges -that begin from and end at either alphabets of equal case (a-e, A-E), -or digits (0-4). Anything else is unsafe. If in doubt, spell out the -character sets in full. - -Options: - - c Complement the SEARCHLIST. - d Delete found but unreplaced characters. - s Squash duplicate replaced characters. - -If the C</c> modifier is specified, the SEARCHLIST character set -is complemented. If the C</d> modifier is specified, any characters -specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. -(Note that this is slightly more flexible than the behavior of some -B<tr> programs, which delete anything they find in the SEARCHLIST, -period.) If the C</s> modifier is specified, sequences of characters -that were transliterated to the same character are squashed down -to a single instance of the character. - -If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted -exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter -than the SEARCHLIST, the final character is replicated till it is long -enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. -This latter is useful for counting characters in a class or for -squashing character sequences in a class. - -Examples: - - $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case - - $cnt = tr/*/*/; # count the stars in $_ - - $cnt = $sky =~ tr/*/*/; # count the stars in $sky - - $cnt = tr/0-9//; # count the digits in $_ - - tr/a-zA-Z//s; # bookkeeper -> bokeper - - ($HOST = $host) =~ tr/a-z/A-Z/; - - tr/a-zA-Z/ /cs; # change non-alphas to single space - - tr [\200-\377] - [\000-\177]; # delete 8th bit - -If multiple transliterations are given for a character, only the -first one is used: - - tr/AAA/XYZ/ - -will transliterate any A to X. - -Because the transliteration table is built at compile time, neither -the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote -interpolation. That means that if you want to use variables, you -must use an eval(): - - eval "tr/$oldlist/$newlist/"; - die $@ if $@; - - eval "tr/$oldlist/$newlist/, 1" or die $@; - -=item <<EOF -X<here-doc> X<heredoc> X<here-document> X<<< << >>> - -A line-oriented form of quoting is based on the shell "here-document" -syntax. Following a C<< << >> you specify a string to terminate -the quoted material, and all lines following the current line down to -the terminating string are the value of the item. - -The terminating string may be either an identifier (a word), or some -quoted text. An unquoted identifier works like double quotes. -There may not be a space between the C<< << >> and the identifier, -unless the identifier is explicitly quoted. (If you put a space it -will be treated as a null identifier, which is valid, and matches the -first empty line.) The terminating string must appear by itself -(unquoted and with no surrounding whitespace) on the terminating line. - -If the terminating string is quoted, the type of quotes used determine -the treatment of the text. - -=over 4 - -=item Double Quotes - -Double quotes indicate that the text will be interpolated using exactly -the same rules as normal double quoted strings. - - print <<EOF; - The price is $Price. - EOF - - print << "EOF"; # same as above - The price is $Price. - EOF - - -=item Single Quotes - -Single quotes indicate the text is to be treated literally with no -interpolation of its content. This is similar to single quoted -strings except that backslashes have no special meaning, with C<\\> -being treated as two backslashes and not one as they would in every -other quoting construct. - -This is the only form of quoting in perl where there is no need -to worry about escaping content, something that code generators -can and do make good use of. - -=item Backticks - -The content of the here doc is treated just as it would be if the -string were embedded in backticks. Thus the content is interpolated -as though it were double quoted and then executed via the shell, with -the results of the execution returned. - - print << `EOC`; # execute command and get results - echo hi there - EOC - -=back - -It is possible to stack multiple here-docs in a row: - - print <<"foo", <<"bar"; # you can stack them - I said foo. - foo - I said bar. - bar - - myfunc(<< "THIS", 23, <<'THAT'); - Here's a line - or two. - THIS - and here's another. - THAT - -Just don't forget that you have to put a semicolon on the end -to finish the statement, as Perl doesn't know you're not going to -try to do this: - - print <<ABC - 179231 - ABC - + 20; - -If you want to remove the line terminator from your here-docs, -use C<chomp()>. - - chomp($string = <<'END'); - This is a string. - END - -If you want your here-docs to be indented with the rest of the code, -you'll need to remove leading whitespace from each line manually: - - ($quote = <<'FINIS') =~ s/^\s+//gm; - The Road goes ever on and on, - down from the door where it began. - FINIS - -If you use a here-doc within a delimited construct, such as in C<s///eg>, -the quoted material must come on the lines following the final delimiter. -So instead of - - s/this/<<E . 'that' - the other - E - . 'more '/eg; - -you have to write - - s/this/<<E . 'that' - . 'more '/eg; - the other - E - -If the terminating identifier is on the last line of the program, you -must be sure there is a newline after it; otherwise, Perl will give the -warning B<Can't find string terminator "END" anywhere before EOF...>. - -Additionally, the quoting rules for the end of string identifier are not -related to Perl's quoting rules -- C<q()>, C<qq()>, and the like are not -supported in place of C<''> and C<"">, and the only interpolation is for -backslashing the quoting character: - - print << "abc\"def"; - testing... - abc"def - -Finally, quoted strings cannot span multiple lines. The general rule is -that the identifier must be a string literal. Stick with that, and you -should be safe. - -=back - -=head2 Gory details of parsing quoted constructs -X<quote, gory details> - -When presented with something that might have several different -interpretations, Perl uses the B<DWIM> (that's "Do What I Mean") -principle to pick the most probable interpretation. This strategy -is so successful that Perl programmers often do not suspect the -ambivalence of what they write. But from time to time, Perl's -notions differ substantially from what the author honestly meant. - -This section hopes to clarify how Perl handles quoted constructs. -Although the most common reason to learn this is to unravel labyrinthine -regular expressions, because the initial steps of parsing are the -same for all quoting operators, they are all discussed together. - -The most important Perl parsing rule is the first one discussed -below: when processing a quoted construct, Perl first finds the end -of that construct, then interprets its contents. If you understand -this rule, you may skip the rest of this section on the first -reading. The other rules are likely to contradict the user's -expectations much less frequently than this first one. - -Some passes discussed below are performed concurrently, but because -their results are the same, we consider them individually. For different -quoting constructs, Perl performs different numbers of passes, from -one to four, but these passes are always performed in the same order. - -=over 4 - -=item Finding the end - -The first pass is finding the end of the quoted construct, where -the information about the delimiters is used in parsing. -During this search, text between the starting and ending delimiters -is copied to a safe location. The text copied gets delimiter-independent. - -If the construct is a here-doc, the ending delimiter is a line -that has a terminating string as the content. Therefore C<<<EOF> is -terminated by C<EOF> immediately followed by C<"\n"> and starting -from the first column of the terminating line. -When searching for the terminating line of a here-doc, nothing -is skipped. In other words, lines after the here-doc syntax -are compared with the terminating string line by line. - -For the constructs except here-docs, single characters are used as starting -and ending delimiters. If the starting delimiter is an opening punctuation -(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the -corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>). -If the starting delimiter is an unpaired character like C</> or a closing -punctuation, the ending delimiter is same as the starting delimiter. -Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates -C<qq[]> and C<qq]]> constructs. - -When searching for single-character delimiters, escaped delimiters -and C<\\> are skipped. For example, while searching for terminating C</>, -combinations of C<\\> and C<\/> are skipped. If the delimiters are -bracketing, nested pairs are also skipped. For example, while searching -for closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>, -and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well. -However, when backslashes are used as the delimiters (like C<qq\\> and -C<tr\\\>), nothing is skipped. -During the search for the end, backslashes that escape delimiters -are removed (exactly speaking, they are not copied to the safe location). - -For constructs with three-part delimiters (C<s///>, C<y///>, and -C<tr///>), the search is repeated once more. -If the first delimiter is not an opening punctuation, three delimiters must -be same such as C<s!!!> and C<tr)))>, in which case the second delimiter -terminates the left part and starts the right part at once. -If the left part is delimited by bracketing punctuations (that is C<()>, -C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of -delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespaces -and comments are allowed between both parts, though the comment must follow -at least one whitespace; otherwise a character expected as the start of -the comment may be regarded as the starting delimiter of the right part. - -During this search no attention is paid to the semantics of the construct. -Thus: - - "$hash{"$foo/$bar"}" - -or: - - m/ - bar # NOT a comment, this slash / terminated m//! - /x - -do not form legal quoted expressions. The quoted part ends on the -first C<"> and C</>, and the rest happens to be a syntax error. -Because the slash that terminated C<m//> was followed by a C<SPACE>, -the example above is not C<m//x>, but rather C<m//> with no C</x> -modifier. So the embedded C<#> is interpreted as a literal C<#>. - -Also no attention is paid to C<\c\> (multichar control char syntax) during -this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part -of C<\/>, and the following C</> is not recognized as a delimiter. -Instead, use C<\034> or C<\x1c> at the end of quoted constructs. - -=item Interpolation -X<interpolation> - -The next step is interpolation in the text obtained, which is now -delimiter-independent. There are multiple cases. - -=over 4 - -=item C<<<'EOF'> - -No interpolation is performed. -Note that the combination C<\\> is left intact, since escaped delimiters -are not available for here-docs. - -=item C<m''>, the pattern of C<s'''> - -No interpolation is performed at this stage. -Any backslashed sequences including C<\\> are treated at the stage -to L</"parsing regular expressions">. - -=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''> - -The only interpolation is removal of C<\> from pairs of C<\\>. -Therefore C<-> in C<tr'''> and C<y'''> is treated literally -as a hyphen and no character range is available. -C<\1> in the replacement of C<s'''> does not work as C<$1>. - -=item C<tr///>, C<y///> - -No variable interpolation occurs. String modifying combinations for -case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized. -The other escape sequences such as C<\200> and C<\t> and backslashed -characters such as C<\\> and C<\-> are converted to appropriate literals. -The character C<-> is treated specially and therefore C<\-> is treated -as a literal C<->. - -=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF"> - -C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are -converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar"> -is converted to C<$foo . (quotemeta("baz" . $bar))> internally. -The other escape sequences such as C<\200> and C<\t> and backslashed -characters such as C<\\> and C<\-> are replaced with appropriate -expansions. - -Let it be stressed that I<whatever falls between C<\Q> and C<\E>> -is interpolated in the usual way. Something like C<"\Q\\E"> has -no C<\E> inside. instead, it has C<\Q>, C<\\>, and C<E>, so the -result is the same as for C<"\\\\E">. As a general rule, backslashes -between C<\Q> and C<\E> may lead to counterintuitive results. So, -C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same -as C<"\\\t"> (since TAB is not alphanumeric). Note also that: - - $str = '\t'; - return "\Q$str"; - -may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">. - -Interpolated scalars and arrays are converted internally to the C<join> and -C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes: - - $foo . " XXX '" . (join $", @arr) . "'"; - -All operations above are performed simultaneously, left to right. - -Because the result of C<"\Q STRING \E"> has all metacharacters -quoted, there is no way to insert a literal C<$> or C<@> inside a -C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to became -C<"\\\$">; if not, it is interpreted as the start of an interpolated -scalar. - -Note also that the interpolation code needs to make a decision on -where the interpolated scalar ends. For instance, whether -C<< "a $b -> {c}" >> really means: - - "a " . $b . " -> {c}"; - -or: - - "a " . $b -> {c}; - -Most of the time, the longest possible text that does not include -spaces between components and which contains matching braces or -brackets. because the outcome may be determined by voting based -on heuristic estimators, the result is not strictly predictable. -Fortunately, it's usually correct for ambiguous cases. - -=item the replacement of C<s///> - -Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation -happens as with C<qq//> constructs. - -It is at this step that C<\1> is begrudgingly converted to C<$1> in -the replacement text of C<s///>, in order to correct the incorrigible -I<sed> hackers who haven't picked up the saner idiom yet. A warning -is emitted if the C<use warnings> pragma or the B<-w> command-line flag -(that is, the C<$^W> variable) was set. - -=item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>, - -Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>, -and interpolation happens (almost) as with C<qq//> constructs. - -However any other combinations of C<\> followed by a character -are not substituted but only skipped, in order to parse them -as regular expressions at the following step. -As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly -treated as an array symbol (for example C<@foo>), -even though the same text in C<qq//> gives interpolation of C<\c@>. - -Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and -a C<#>-comment in a C<//x>-regular expression, no processing is -performed whatsoever. This is the first step at which the presence -of the C<//x> modifier is relevant. - -Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+> -and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are -voted (by several different estimators) to be either an array element -or C<$var> followed by an RE alternative. This is where the notation -C<${arr[$bar]}> comes handy: C</${arr[0-9]}/> is interpreted as -array element C<-9>, not as a regular expression from the variable -C<$arr> followed by a digit, which would be the interpretation of -C</$arr[0-9]/>. Since voting among different estimators may occur, -the result is not predictable. - -The lack of processing of C<\\> creates specific restrictions on -the post-processed text. If the delimiter is C</>, one cannot get -the combination C<\/> into the result of this step. C</> will -finish the regular expression, C<\/> will be stripped to C</> on -the previous step, and C<\\/> will be left as is. Because C</> is -equivalent to C<\/> inside a regular expression, this does not -matter unless the delimiter happens to be character special to the -RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an -alphanumeric char, as in: - - m m ^ a \s* b mmx; - -In the RE above, which is intentionally obfuscated for illustration, the -delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the -RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one -reason you're encouraged to restrict your delimiters to non-alphanumeric, -non-whitespace choices. - -=back - -This step is the last one for all constructs except regular expressions, -which are processed further. - -=item parsing regular expressions -X<regexp, parse> - -Previous steps were performed during the compilation of Perl code, -but this one happens at run time--although it may be optimized to -be calculated at compile time if appropriate. After preprocessing -described above, and possibly after evaluation if concatenation, -joining, casing translation, or metaquoting are involved, the -resulting I<string> is passed to the RE engine for compilation. - -Whatever happens in the RE engine might be better discussed in L<perlre>, -but for the sake of continuity, we shall do so here. - -This is another step where the presence of the C<//x> modifier is -relevant. The RE engine scans the string from left to right and -converts it to a finite automaton. - -Backslashed characters are either replaced with corresponding -literal strings (as with C<\{>), or else they generate special nodes -in the finite automaton (as with C<\b>). Characters special to the -RE engine (such as C<|>) generate corresponding nodes or groups of -nodes. C<(?#...)> comments are ignored. All the rest is either -converted to literal strings to match, or else is ignored (as is -whitespace and C<#>-style comments if C<//x> is present). - -Parsing of the bracketed character class construct, C<[...]>, is -rather different than the rule used for the rest of the pattern. -The terminator of this construct is found using the same rules as -for finding the terminator of a C<{}>-delimited construct, the only -exception being that C<]> immediately following C<[> is treated as -though preceded by a backslash. Similarly, the terminator of -C<(?{...})> is found using the same rules as for finding the -terminator of a C<{}>-delimited construct. - -It is possible to inspect both the string given to RE engine and the -resulting finite automaton. See the arguments C<debug>/C<debugcolor> -in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line -switch documented in L<perlrun/"Command Switches">. - -=item Optimization of regular expressions -X<regexp, optimization> - -This step is listed for completeness only. Since it does not change -semantics, details of this step are not documented and are subject -to change without notice. This step is performed over the finite -automaton that was generated during the previous pass. - -It is at this stage that C<split()> silently optimizes C</^/> to -mean C</^/m>. - -=back - -=head2 I/O Operators -X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle> -X<< <> >> X<@ARGV> - -There are several I/O operators you should know about. - -A string enclosed by backticks (grave accents) first undergoes -double-quote interpolation. It is then interpreted as an external -command, and the output of that command is the value of the -backtick string, like in a shell. In scalar context, a single string -consisting of all output is returned. In list context, a list of -values is returned, one per line of output. (You can set C<$/> to use -a different line terminator.) The command is executed each time the -pseudo-literal is evaluated. The status value of the command is -returned in C<$?> (see L<perlvar> for the interpretation of C<$?>). -Unlike in B<csh>, no translation is done on the return data--newlines -remain newlines. Unlike in any of the shells, single quotes do not -hide variable names in the command from interpretation. To pass a -literal dollar-sign through to the shell you need to hide it with a -backslash. The generalized form of backticks is C<qx//>. (Because -backticks always undergo shell expansion as well, see L<perlsec> for -security concerns.) -X<qx> X<`> X<``> X<backtick> X<glob> - -In scalar context, evaluating a filehandle in angle brackets yields -the next line from that file (the newline, if any, included), or -C<undef> at end-of-file or on error. When C<$/> is set to C<undef> -(sometimes known as file-slurp mode) and the file is empty, it -returns C<''> the first time, followed by C<undef> subsequently. - -Ordinarily you must assign the returned value to a variable, but -there is one situation where an automatic assignment happens. If -and only if the input symbol is the only thing inside the conditional -of a C<while> statement (even if disguised as a C<for(;;)> loop), -the value is automatically assigned to the global variable $_, -destroying whatever was there previously. (This may seem like an -odd thing to you, but you'll use the construct in almost every Perl -script you write.) The $_ variable is not implicitly localized. -You'll have to put a C<local $_;> before the loop if you want that -to happen. - -The following lines are equivalent: - - while (defined($_ = <STDIN>)) { print; } - while ($_ = <STDIN>) { print; } - while (<STDIN>) { print; } - for (;<STDIN>;) { print; } - print while defined($_ = <STDIN>); - print while ($_ = <STDIN>); - print while <STDIN>; - -This also behaves similarly, but avoids $_ : - - while (my $line = <STDIN>) { print $line } - -In these loop constructs, the assigned value (whether assignment -is automatic or explicit) is then tested to see whether it is -defined. The defined test avoids problems where line has a string -value that would be treated as false by Perl, for example a "" or -a "0" with no trailing newline. If you really mean for such values -to terminate the loop, they should be tested for explicitly: - - while (($_ = <STDIN>) ne '0') { ... } - while (<STDIN>) { last unless $_; ... } - -In other boolean contexts, C<< <I<filehandle>> >> without an -explicit C<defined> test or comparison elicit a warning if the -C<use warnings> pragma or the B<-w> -command-line switch (the C<$^W> variable) is in effect. - -The filehandles STDIN, STDOUT, and STDERR are predefined. (The -filehandles C<stdin>, C<stdout>, and C<stderr> will also work except -in packages, where they would be interpreted as local identifiers -rather than global.) Additional filehandles may be created with -the open() function, amongst others. See L<perlopentut> and -L<perlfunc/open> for details on this. -X<stdin> X<stdout> X<sterr> - -If a <FILEHANDLE> is used in a context that is looking for -a list, a list comprising all input lines is returned, one line per -list element. It's easy to grow to a rather large data space this -way, so use with care. - -<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>. -See L<perlfunc/readline>. - -The null filehandle <> is special: it can be used to emulate the -behavior of B<sed> and B<awk>. Input from <> comes either from -standard input, or from each file listed on the command line. Here's -how it works: the first time <> is evaluated, the @ARGV array is -checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened -gives you standard input. The @ARGV array is then processed as a list -of filenames. The loop - - while (<>) { - ... # code for each line - } - -is equivalent to the following Perl-like pseudo code: - - unshift(@ARGV, '-') unless @ARGV; - while ($ARGV = shift) { - open(ARGV, $ARGV); - while (<ARGV>) { - ... # code for each line - } - } - -except that it isn't so cumbersome to say, and will actually work. -It really does shift the @ARGV array and put the current filename -into the $ARGV variable. It also uses filehandle I<ARGV> -internally--<> is just a synonym for <ARGV>, which -is magical. (The pseudo code above doesn't work because it treats -<ARGV> as non-magical.) - -You can modify @ARGV before the first <> as long as the array ends up -containing the list of filenames you really want. Line numbers (C<$.>) -continue as though the input were one big happy file. See the example -in L<perlfunc/eof> for how to reset line numbers on each file. - -If you want to set @ARGV to your own list of files, go right ahead. -This sets @ARGV to all plain text files if no @ARGV was given: - - @ARGV = grep { -f && -T } glob('*') unless @ARGV; - -You can even set them to pipe commands. For example, this automatically -filters compressed arguments through B<gzip>: - - @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV; - -If you want to pass switches into your script, you can use one of the -Getopts modules or put a loop on the front like this: - - while ($_ = $ARGV[0], /^-/) { - shift; - last if /^--$/; - if (/^-D(.*)/) { $debug = $1 } - if (/^-v/) { $verbose++ } - # ... # other switches - } - - while (<>) { - # ... # code for each line - } - -The <> symbol will return C<undef> for end-of-file only once. -If you call it again after this, it will assume you are processing another -@ARGV list, and if you haven't set @ARGV, will read input from STDIN. - -If what the angle brackets contain is a simple scalar variable (e.g., -<$foo>), then that variable contains the name of the -filehandle to input from, or its typeglob, or a reference to the -same. For example: - - $fh = \*STDIN; - $line = <$fh>; - -If what's within the angle brackets is neither a filehandle nor a simple -scalar variable containing a filehandle name, typeglob, or typeglob -reference, it is interpreted as a filename pattern to be globbed, and -either a list of filenames or the next filename in the list is returned, -depending on context. This distinction is determined on syntactic -grounds alone. That means C<< <$x> >> is always a readline() from -an indirect handle, but C<< <$hash{key}> >> is always a glob(). -That's because $x is a simple scalar variable, but C<$hash{key}> is -not--it's a hash element. Even C<< <$x > >> (note the extra space) -is treated as C<glob("$x ")>, not C<readline($x)>. - -One level of double-quote interpretation is done first, but you can't -say C<< <$foo> >> because that's an indirect filehandle as explained -in the previous paragraph. (In older versions of Perl, programmers -would insert curly brackets to force interpretation as a filename glob: -C<< <${foo}> >>. These days, it's considered cleaner to call the -internal function directly as C<glob($foo)>, which is probably the right -way to have done it in the first place.) For example: - - while (<*.c>) { - chmod 0644, $_; - } - -is roughly equivalent to: - - open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|"); - while (<FOO>) { - chomp; - chmod 0644, $_; - } - -except that the globbing is actually done internally using the standard -C<File::Glob> extension. Of course, the shortest way to do the above is: - - chmod 0644, <*.c>; - -A (file)glob evaluates its (embedded) argument only when it is -starting a new list. All values must be read before it will start -over. In list context, this isn't important because you automatically -get them all anyway. However, in scalar context the operator returns -the next value each time it's called, or C<undef> when the list has -run out. As with filehandle reads, an automatic C<defined> is -generated when the glob occurs in the test part of a C<while>, -because legal glob returns (e.g. a file called F<0>) would otherwise -terminate the loop. Again, C<undef> is returned only once. So if -you're expecting a single value from a glob, it is much better to -say - - ($file) = <blurch*>; - -than - - $file = <blurch*>; - -because the latter will alternate between returning a filename and -returning false. - -If you're trying to do variable interpolation, it's definitely better -to use the glob() function, because the older notation can cause people -to become confused with the indirect filehandle notation. - - @files = glob("$dir/*.[ch]"); - @files = glob($files[$i]); - -=head2 Constant Folding -X<constant folding> X<folding> - -Like C, Perl does a certain amount of expression evaluation at -compile time whenever it determines that all arguments to an -operator are static and have no side effects. In particular, string -concatenation happens at compile time between literals that don't do -variable substitution. Backslash interpolation also happens at -compile time. You can say - - 'Now is the time for all' . "\n" . - 'good men to come to.' - -and this all reduces to one string internally. Likewise, if -you say - - foreach $file (@filenames) { - if (-s $file > 5 + 100 * 2**16) { } - } - -the compiler will precompute the number which that expression -represents so that the interpreter won't have to. - -=head2 No-ops -X<no-op> X<nop> - -Perl doesn't officially have a no-op operator, but the bare constants -C<0> and C<1> are special-cased to not produce a warning in a void -context, so you can for example safely do - - 1 while foo(); - -=head2 Bitwise String Operators -X<operator, bitwise, string> - -Bitstrings of any size may be manipulated by the bitwise operators -(C<~ | & ^>). - -If the operands to a binary bitwise op are strings of different -sizes, B<|> and B<^> ops act as though the shorter operand had -additional zero bits on the right, while the B<&> op acts as though -the longer operand were truncated to the length of the shorter. -The granularity for such extension or truncation is one or more -bytes. - - # ASCII-based examples - print "j p \n" ^ " a h"; # prints "JAPH\n" - print "JA" | " ph\n"; # prints "japh\n" - print "japh\nJunk" & '_____'; # prints "JAPH\n"; - print 'p N$' ^ " E<H\n"; # prints "Perl\n"; - -If you are intending to manipulate bitstrings, be certain that -you're supplying bitstrings: If an operand is a number, that will imply -a B<numeric> bitwise operation. You may explicitly show which type of -operation you intend by using C<""> or C<0+>, as in the examples below. - - $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF) - $foo = '150' | 105; # yields 255 - $foo = 150 | '105'; # yields 255 - $foo = '150' | '105'; # yields string '155' (under ASCII) - - $baz = 0+$foo & 0+$bar; # both ops explicitly numeric - $biz = "$foo" ^ "$bar"; # both ops explicitly stringy - -See L<perlfunc/vec> for information on how to manipulate individual bits -in a bit vector. - -=head2 Integer Arithmetic -X<integer> - -By default, Perl assumes that it must do most of its arithmetic in -floating point. But by saying - - use integer; - -you may tell the compiler that it's okay to use integer operations -(if it feels like it) from here to the end of the enclosing BLOCK. -An inner BLOCK may countermand this by saying - - no integer; - -which lasts until the end of that BLOCK. Note that this doesn't -mean everything is only an integer, merely that Perl may use integer -operations if it is so inclined. For example, even under C<use -integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731> -or so. - -Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<", -and ">>") always produce integral results. (But see also -L<Bitwise String Operators>.) However, C<use integer> still has meaning for -them. By default, their results are interpreted as unsigned integers, but -if C<use integer> is in effect, their results are interpreted -as signed integers. For example, C<~0> usually evaluates to a large -integral value. However, C<use integer; ~0> is C<-1> on two's-complement -machines. - -=head2 Floating-point Arithmetic -X<floating-point> X<floating point> X<float> X<real> - -While C<use integer> provides integer-only arithmetic, there is no -analogous mechanism to provide automatic rounding or truncation to a -certain number of decimal places. For rounding to a certain number -of digits, sprintf() or printf() is usually the easiest route. -See L<perlfaq4>. - -Floating-point numbers are only approximations to what a mathematician -would call real numbers. There are infinitely more reals than floats, -so some corners must be cut. For example: - - printf "%.20g\n", 123456789123456789; - # produces 123456789123456784 - -Testing for exact equality of floating-point equality or inequality is -not a good idea. Here's a (relatively expensive) work-around to compare -whether two floating-point numbers are equal to a particular number of -decimal places. See Knuth, volume II, for a more robust treatment of -this topic. - - sub fp_equal { - my ($X, $Y, $POINTS) = @_; - my ($tX, $tY); - $tX = sprintf("%.${POINTS}g", $X); - $tY = sprintf("%.${POINTS}g", $Y); - return $tX eq $tY; - } - -The POSIX module (part of the standard perl distribution) implements -ceil(), floor(), and other mathematical and trigonometric functions. -The Math::Complex module (part of the standard perl distribution) -defines mathematical functions that work on both the reals and the -imaginary numbers. Math::Complex not as efficient as POSIX, but -POSIX can't work with complex numbers. - -Rounding in financial applications can have serious implications, and -the rounding method used should be specified precisely. In these -cases, it probably pays not to trust whichever system rounding is -being used by Perl, but to instead implement the rounding function you -need yourself. - -=head2 Bigger Numbers -X<number, arbitrary precision> - -The standard Math::BigInt and Math::BigFloat modules provide -variable-precision arithmetic and overloaded operators, although -they're currently pretty slow. At the cost of some space and -considerable speed, they avoid the normal pitfalls associated with -limited-precision representations. - - use Math::BigInt; - $x = Math::BigInt->new('123456789123456789'); - print $x * $x; - - # prints +15241578780673678515622620750190521 - -There are several modules that let you calculate with (bound only by -memory and cpu-time) unlimited or fixed precision. There are also -some non-standard modules that provide faster implementations via -external C libraries. - -Here is a short, but incomplete summary: - - Math::Fraction big, unlimited fractions like 9973 / 12967 - Math::String treat string sequences like numbers - Math::FixedPrecision calculate with a fixed precision - Math::Currency for currency calculations - Bit::Vector manipulate bit vectors fast (uses C) - Math::BigIntFast Bit::Vector wrapper for big numbers - Math::Pari provides access to the Pari C library - Math::BigInteger uses an external C library - Math::Cephes uses external Cephes C library (no big numbers) - Math::Cephes::Fraction fractions via the Cephes library - Math::GMP another one using an external C library - -Choose wisely. - -=cut |