
Alternation

Alternation lets a regular expression match one of several possible patterns. It is written with the pipe symbol |, and it works much like the logical OR operator in programming languages such as PHP.

For example, the regular expression apple|orange will match either "apple" or "orange" in a given string.
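As a minimal sketch of that pattern in PHP (the sample sentence is made up for illustration), preg_match_all collects every place where either alternative occurs:

```php
<?php
// Alternation: match either "apple" or "orange" anywhere in the string.
$regexp = '/apple|orange/';

// Hypothetical sample text, not taken from the lesson.
$str = 'I bought an apple, an orange and a pear';

preg_match_all($regexp, $str, $matches);

print_r($matches[0]); // 'apple', 'orange'
```

The word "pear" is skipped because it matches neither alternative.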

In PHP, the logical OR operator || evaluates to true if at least one of its operands is true. Alternation follows the same idea: the match succeeds if any one of the alternatives matches. The difference is that regular expressions choose between patterns, while || chooses between logical conditions.

Usage

Let's take a look at an example. We may have the following regular expression: html|php|css|java(script)?.

We could use it in PHP like so:

$regexp = "/html|php|css|java(script)?/i";
 
$str = "First HTML appeared, then CSS, then JavaScript";
 
preg_match_all($regexp, $str, $matches);
 
print_r($matches[0]); // 'HTML', 'CSS', 'JavaScript'

Sets seem to be similar to alternation. They allow choosing between multiple characters; for instance, gr[ae]y matches "gray" or "grey".

Square brackets accept only single characters or character classes, whereas alternation accepts arbitrary expressions. A regexp A|B|C means one of the expressions A, B, or C.

For instance:

  • gr(a|e)y matches exactly the same strings as gr[ae]y.
  • gra|ey means "gra" or "ey".
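The two bullets above can be checked with a short sketch. The variable names $set, $group and $bare are only illustrative:

```php
<?php
$str = 'gray grey';

// A set and a grouped alternation match the same words here.
preg_match_all('/gr[ae]y/', $str, $set);
preg_match_all('/gr(a|e)y/', $str, $group);

// Without parentheses, the alternatives are "gra" and "ey".
preg_match_all('/gra|ey/', $str, $bare);

print_r($set[0]);   // 'gray', 'grey'
print_r($group[0]); // 'gray', 'grey'
print_r($bare[0]);  // 'gra', 'ey'
```

In the last call, "gra" is found inside "gray" and "ey" is found inside "grey", so neither word is matched as a whole.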

To apply alternation to a chosen part of the pattern, we can enclose it in parentheses:

  • I love HTML|CSS matches "I love HTML" or "CSS".
  • I love (HTML|CSS) matches "I love HTML" or "I love CSS".
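A small sketch of both patterns, run against a made-up sentence, shows the difference the parentheses make:

```php
<?php
// Hypothetical sample text for illustration.
$str = 'I love HTML and I love CSS';

// Without parentheses: "I love HTML" or just "CSS".
preg_match_all('/I love HTML|CSS/', $str, $bare);

// With parentheses: the alternation covers only the last word.
preg_match_all('/I love (HTML|CSS)/', $str, $grouped);

print_r($bare[0]);    // 'I love HTML', 'CSS'
print_r($grouped[0]); // 'I love HTML', 'I love CSS'
```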

Revisiting an old example

In an earlier lesson, we talked about how to write a regular expression for finding a time of the form HH:MM in a string. We used the following regular expression: \d\d:\d\d. This works, but it also accepts invalid times, such as 25:99.

A better solution starts by restricting the hours:

  • If the first digit is 0 or 1, then the next digit can be any: [01]\d.
  • Otherwise, if the first digit is 2, then the next must be from 0 to 3: 2[0-3].
  • No other first digit is allowed.

The regular expression for the hours would look like this: [01]\d|2[0-3].

Next, we must write a regular expression to find the minutes. The minutes must be from 00 to 59. In the regular expression language, that can be written as [0-5]\d: the first digit 0-5, and then any digit.
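Before combining them, the hours and minutes patterns can be tried separately. This is only a sketch: the sample values are hypothetical and each one is a space-separated two-digit number, so every match is a whole value.

```php
<?php
// Hypothetical sample values, chosen only for illustration.
$hours   = '00 09 19 23 24 30';
$minutes = '00 30 59 60 99';

// Hours: 00-19 via [01]\d, or 20-23 via 2[0-3].
preg_match_all('/[01]\d|2[0-3]/', $hours, $h);

// Minutes: first digit 0-5, then any digit.
preg_match_all('/[0-5]\d/', $minutes, $m);

print_r($h[0]); // '00', '09', '19', '23'
print_r($m[0]); // '00', '30', '59'
```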

Putting both expressions together would give us: [01]\d|2[0-3]:[0-5]\d.

We're almost done, but there's a problem. The alternation | now happens to be between [01]\d and 2[0-3]:[0-5]\d.

That is, the minutes are attached only to the second alternative. Spacing the pattern out makes this clear:

[01]\d | 2[0-3]:[0-5]\d

That pattern looks for [01]\d or 2[0-3]:[0-5]\d.
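Running the ungrouped pattern makes the problem visible. The sketch below uses a sample string of valid and invalid times:

```php
<?php
$str = '00:00 10:10 23:59 25:99 1:2';

// Ungrouped: the alternatives are [01]\d and 2[0-3]:[0-5]\d.
preg_match_all('/[01]\d|2[0-3]:[0-5]\d/', $str, $matches);

print_r($matches[0]); // '00', '00', '10', '10', '23:59'
```

Only 23:59 is matched as a full time, because it is the only one that goes through the second alternative. For the other times, the first alternative matches two digits at a time and ignores the colon.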

But that's wrong; the alternation should apply only to the "hours" part of the regular expression, allowing [01]\d OR 2[0-3]. Let's correct that by enclosing "hours" in parentheses: ([01]\d|2[0-3]):[0-5]\d.

The final solution:

$regexp = "/([01]\d|2[0-3]):[0-5]\d/";
 
$str = "00:00 10:10 23:59 25:99 1:2";
 
preg_match_all($regexp, $str, $matches);
 
print_r($matches[0]); // 00:00, 10:10, 23:59

Exercise

Write a regular expression that finds the programming languages Java, JavaScript, PHP, C, and C++:

$regexp = "";
 
$str = "Java JavaScript PHP C++ C";
 
preg_match_all($regexp, $str, $matches);
 
print_r($matches[0]);

Key Takeaways

  • Alternation is represented by the vertical bar |, and it allows you to match one of several alternative patterns. It acts like a logical OR.
  • You can use parentheses to limit alternation to part of a pattern. For example, gr(a|e)y matches either "gray" or "grey", whereas gra|ey matches "gra" or "ey".
