Wildcards and regular expressions tutorial

tidy trax · December 6, 2004

Tutorial Begin

Wildcard & Regular Expression Tutorial

Keywords:

String: A string is just a single character or multiple characters.

Character: A character is a single letter, digit or other symbol that can be typed on the keyboard.

Wildcard: A wildcard is a special character that stands in for other characters, so you can match text without having to specify exactly which characters to match.

Note: a basic understanding of remote events and user levels is assumed.

This tutorial is intended to give a basic understanding of wildcards (*, ?, &) and regular expressions ($regex, $regml, $regsub).

Wildcards:

&

Matches exactly one word: 1 or more characters separated from the rest of the text by a space.

on *:text:hello &:#:

This will match hello world, hello person or any other 2 word sentence starting with hello.

on *:text:& hello:#:

This will match any 2 word sentence ending with hello.

*

Matches 0 or more characters, and it doesn't matter what they are separated by.

on *:text:hello*:#:

This will match helloish, hellobullo or any other text starting with hello, provided hello makes up the first 5 characters of the string.

on *:text:*hello:#:

This will match any text ending with hello, provided hello makes up the last 5 characters of the string.

?

Matches 1 character and can be used anywhere in the string.

on *:text:h?llo:#:

This will match h followed by a character followed by llo. So it will match hello, hallo, hullo, hillo, hollo, hbllo, etc.
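If you want to experiment with these wildcards outside mIRC, here is a rough Python sketch that turns a wildcard pattern into a regular expression. The helper names (wildcard_to_regex, wild_match) are made up for this example. It assumes that & only counts as a wildcard when it stands alone as a word, and that matching ignores case. It is an approximation, not mIRC's actual matching code.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a mIRC-style wildcard pattern into a compiled regex.

    Hypothetical helper for illustration only.
    """
    words = []
    for word in pattern.split(" "):
        if word == "&":
            # Assumption: & on its own stands for exactly one word
            words.append(r"\S+")
            continue
        piece = ""
        for ch in word:
            if ch == "*":
                piece += ".*"           # 0 or more of any character
            elif ch == "?":
                piece += "."            # exactly 1 character
            else:
                piece += re.escape(ch)  # everything else is literal
        words.append(piece)
    # Assumption: wildcard matching is case-insensitive
    return re.compile(" ".join(words), re.IGNORECASE)

def wild_match(pattern, text):
    """True if the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

print(wild_match("hello &", "hello world"))      # True
print(wild_match("hello &", "hello big world"))  # False
print(wild_match("hello*", "helloish"))          # True
print(wild_match("h?llo", "hallo"))              # True
```

Note that "hello &" rejects the three-word sentence, while "hello*" would accept it, which is the practical difference between & and *.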

Regular Expressions:

Modifiers:

Usage: $regex([name,]string,pattern), where [name,] is optional (used to reference the regex later using $regml([name,]N)), string is the string you want to search, and pattern is the pattern you want to find in the string.

We'll start with one of the most basic regular expressions there is.

//echo -a $regex(hello world,/hello/)

This will be "1", because it found a match, hello is in hello world.

However if you were to use a capital h in hello:

//echo -a $regex(hello world,/Hello/)

It will be "0", because regular expressions are case-sensitive by default.

There are two ways to solve this "problem" (sometimes it's useful).

Number one:

//echo -a $regex(hello world,/(?i)Hello/)

The (?i) means that everything up until (?-i) will be matched case-insensitively. If no (?-i) is specified, everything up to the end of the pattern is matched case-insensitively.

Number two:

//echo -a $regex(hello world,/Hello/i)

The /i means the whole pattern is matched case-insensitively. Now let's look at matching multiple patterns in a string.

//echo -a $regex(hello hello hello,/hello/)

Even though hello is in hello hello hello 3 times, it will only match once, because by default regular expressions will find a match and then stop searching.

Solution:

//echo -a $regex(hello hello hello,/hello/g)

This will be "3" because the /g tells it to return all matches.
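The same checks can be tried in Python as a rough comparison. This is only a sketch: Python's re module is not the regex engine mIRC uses, and it has no /i or /g suffixes, so the re.IGNORECASE flag and re.findall stand in for them here.

```python
import re

text = "hello world"

# Like /Hello/ : case-sensitive, so no match
case_sensitive = re.search(r"Hello", text) is not None

# Like /(?i)Hello/ : inline flag at the start of the pattern
inline_flag = re.search(r"(?i)Hello", text) is not None

# Like /Hello/i : flag given separately from the pattern
flag_argument = re.search(r"Hello", text, re.IGNORECASE) is not None

# Like /hello/g : findall returns every match, so its length is the count
match_count = len(re.findall(r"hello", "hello hello hello"))

print(case_sensitive, inline_flag, flag_argument, match_count)
# False True True 3
```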

| means OR.

//echo -a $regex(abc,/abd|abc/)

Is "1" because it's set to match abc or abd and abc matched.

^ and $:

^ means the string has to start with whatever follows it in the pattern.

$ means the string has to end with whatever precedes it in the pattern.

Examples

//echo -a $regex(hello,/^h/) is "1" because "hello" starts with "h".

//echo -a $regex(hello,/o$/) is "1" because "hello" ends with "o".

//echo -a $regex(hello,/^hello$/) is "1" because "hello" matched "hello" exactly.
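For comparison, here is how | and the ^ and $ anchors behave in Python. The regex_result helper is a made-up name that returns 1 or 0 the way $regex does without /g. Treat it as a sketch, since Python's re module is only similar to mIRC's regex engine.

```python
import re

def regex_result(pattern, text):
    """Return 1 if the pattern is found anywhere in text, else 0."""
    return 1 if re.search(pattern, text) else 0

alternation = regex_result(r"abd|abc", "abc")           # abc is one of the options
starts_with_h = regex_result(r"^h", "hello")            # starts with h
ends_with_o = regex_result(r"o$", "hello")              # ends with o
exact = regex_result(r"^hello$", "hello")               # the whole string is hello
not_exact = regex_result(r"^hello$", "hello world")     # extra text after hello

print(alternation, starts_with_h, ends_with_o, exact, not_exact)
# 1 1 1 1 0
```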

Parentheses:

Parentheses (( and )) are used to set what appears in $regml(). See the $regml area of this tutorial for more info.

Character classes:

Character classes generally look like this: [characters here].

//echo -a $regex(hello,/[aeiou]/)

Instead of matching aeiou it will match a or e or i or o or u. You can also use a-z for all lower case letters, A-Z for all upper case letters and 0-9 for all digits.

You can also combine several ranges and characters inside one class, e.g. [a-zA-Z] will be any letter, and [a-zA-Z0-9_] will be any letter, digit or underscore (_) (same as \w).

^ at the start of a class negates it.

[^a-z] means anything except a lower case letter, whereas if you had left the ^ out, it would be any lower case letter.
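A short Python sketch of character classes, for anyone who wants to test them outside mIRC. The variable names are just for illustration, and Python's re module is used as a stand-in for mIRC's regex engine.

```python
import re

# [aeiou] matches a single vowel; search stops at the first one it finds
first_vowel = re.search(r"[aeiou]", "hello").group(0)

# [a-zA-Z]+ with fullmatch: the whole text must be letters
only_letters = re.fullmatch(r"[a-zA-Z]+", "Hello") is not None
letters_and_digit = re.fullmatch(r"[a-zA-Z]+", "Hello1") is not None

# [^a-z] matches anything that is NOT a lower case letter
has_non_lower = re.search(r"[^a-z]", "hello") is not None
first_non_lower = re.search(r"[^a-z]", "hello World").group(0)

print(first_vowel, only_letters, letters_and_digit, has_non_lower)
# e True False False
print(repr(first_non_lower))
# ' '
```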

Switches:

\w is any word character: a letter, digit or underscore (_) ([a-zA-Z0-9_])

\W is any non-word character: anything that is not a letter, digit or underscore ([^a-zA-Z0-9_], opposite of \w)

\s is any whitespace character.

\S is any non-whitespace character.

\d is any digit ([0-9], 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9).

\D is any non-digit ([^0-9]).

\Q..\E can be used to escape a whole run of characters at once, as an alternative to putting \ in front of each one (e.g. //echo -a $regex(\\,/\Q\\E/g))

Note: \ can also be used as an "escape character", so if you wanted to match \, you would use: /\\/.
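Here is a rough Python sketch of the switches above. Two assumptions to be aware of: the re.ASCII flag is passed so that \w, \d and \s stick to the plain sets listed here, and because Python has no \Q..\E, re.escape is used as the nearest equivalent. The sample text is made up.

```python
import re

sample = "nick_42 joined #chat"

# \w+ : runs of letters, digits and underscores
words = re.findall(r"\w+", sample, re.ASCII)

# \d : single digits
digits = re.findall(r"\d", sample, re.ASCII)

# \W : everything that is not a letter, digit or underscore
non_words = re.findall(r"\W", sample, re.ASCII)

# \s : whitespace characters
space_count = len(re.findall(r"\s", sample, re.ASCII))

# Matching a literal backslash: re.escape quotes it for us,
# in the same spirit as \Q..\E (the text below is two backslashes)
backslash_count = len(re.findall(re.escape("\\"), "\\\\"))

print(words)        # ['nick_42', 'joined', 'chat']
print(digits)       # ['4', '2']
print(non_words)    # [' ', ' ', '#']
print(space_count, backslash_count)  # 2 2
```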

Quantifiers:

{1,} means "1 or more times".

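{0,1} means "1 or less times". The minimum should not be left out: PCRE, the regex library mIRC uses, has traditionally treated {,1} as literal text rather than as a quantifier.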

{1} means "1 time".

{1,3} means "1, 2, or 3 times".

Examples

\d{1} will match any single digit.

\d{0,3} will match 3 or less digits.

\d{3} will match exactly 3 digits.

\d{3,} will match 3 or more digits.

Note: {} only applies to the previous character, or to the expression inside grouping brackets. a\d{3} will match a followed by 3 digits, not a\da\da\d; however, (a\d){3} will match a\da\da\d.

* means 0 or more of the preceding item ({0,})

? means 0 or 1 of the preceding item ({0,1})

+ means 1 or more of the preceding item ({1,})

Examples

\d* means 0 or more digits.

\d+ means 1 or more digits.

\d? means 0 or 1 digits.

Note: do not mix quantifiers up with wildcards.
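The quantifiers can be checked with a small Python sketch. The whole helper is a made-up name that tests whether a pattern matches the entire text, which makes it easier to see exactly how many repetitions were allowed. Python's re module is used as an approximation of mIRC's engine.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

exactly_three = whole(r"\d{3}", "123")        # True
four_digits = whole(r"\d{3}", "1234")         # False, one digit too many
at_least_three = whole(r"\d{3,}", "12345")    # True

# {1,3} accepts 1, 2 or 3 digits and nothing else
one_to_three = [whole(r"\d{1,3}", s) for s in ("", "1", "12", "123", "1234")]

# {} applies to the previous character only, unless brackets group more
a_then_digits = whole(r"a\d{3}", "a123")      # True
not_repeated = whole(r"a\d{3}", "a1a2a3")     # False
grouped = whole(r"(a\d){3}", "a1a2a3")        # True

# *, + and ? on an empty string
star_empty = whole(r"\d*", "")                # True, 0 digits is fine
plus_empty = whole(r"\d+", "")                # False, needs at least 1
question_empty = whole(r"\d?", "")            # True, 0 digits is fine

print(one_to_three)
# [False, True, True, True, False]
```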

Grouping

Grouping is used to make quantifiers apply to a whole expression, rather than just the preceding character.

Examples:

\da? means there has to be a digit, optionally followed by an a.

(\da)? means there might be a digit followed by an a; either both are present or neither is.

Note: grouping uses the same brackets that are used to fill $regml.
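To see what the grouping brackets change, here is a minimal Python sketch comparing \da? with (\da)?. The whole helper is a made-up name, and Python's re module is only standing in for mIRC's regex engine.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# \da? : the digit is required, only the a is optional
ungrouped = {s: whole(r"\da?", s) for s in ("", "1", "1a")}

# (\da)? : the digit and the a are optional together
grouped = {s: whole(r"(\da)?", s) for s in ("", "1", "1a")}

print(ungrouped)  # {'': False, '1': True, '1a': True}
print(grouped)    # {'': True, '1': False, '1a': True}
```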

Numbered and named matches

You can use numbers and names to make a match you can reference later in the expression.

To fill a numbered match you use (expression), to fill a named match you use (?P<name>expression)

To retrieve a numbered match you use \N (where N refers to the Nth match), to retrieve a named match you use (?P=name)

Examples:

//echo -a $regex(abb,/^a(b)\1$/) will return "1" because "b" was matched in "abb", then it was matched again by the back reference.

//echo -a $regex(abc,/^a(b)\1$/) will return "0" because "b" was matched in "abc", but it wasn't matched again by the back reference.

//echo -a $regex(abb,/^a(?P<bee>b)(?P=bee)$/) will return "1" because "b" was matched in "abb", then it was matched again by the back reference.

//echo -a $regex(abc,/^a(?P<bee>b)(?P=bee)$/) will return "0" because "b" was matched in "abc", but it wasn't matched again by the back reference.

Note: when filling a named match using (?P<name>expression), the actual <> characters around the name are required.
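Back references can also be tried in Python, which accepts the same (?P<name>...) and (?P=name) syntax. This sketch uses the pattern a, then a captured b, then whatever the capture held. The last line is an extra illustration of my own: finding a doubled letter.

```python
import re

# Numbered: \1 must repeat whatever group 1 captured
numbered_abb = re.search(r"^a(b)\1$", "abb") is not None
numbered_abc = re.search(r"^a(b)\1$", "abc") is not None

# Named: (?P=bee) must repeat whatever the group called bee captured
named_abb = re.search(r"^a(?P<bee>b)(?P=bee)$", "abb") is not None
named_abc = re.search(r"^a(?P<bee>b)(?P=bee)$", "abc") is not None

# (\w)\1 finds any word character immediately followed by itself
doubled_letter = re.search(r"(\w)\1", "hello").group(1)

print(numbered_abb, numbered_abc, named_abb, named_abc, doubled_letter)
# True False True False l
```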

Greediness

Greedy quantifiers try to match as much as possible.

Ungreedy quantifiers try to match as little as possible.

Greedy

\d+ matches 1 or more digits, and takes as many as it can.

\d* matches 0 or more digits, and takes as many as it can.

\d? matches 0 or 1 digits, and takes 1 if it can.

Ungreedy

Adding a ? after a quantifier makes it ungreedy.

\d+? matches 1 or more digits, but takes as few as it can.

\d*? matches 0 or more digits, but takes as few as it can.

\d?? matches 0 or 1 digits, but takes 0 if it can.

Examples:

//echo -a $regex(ababcabababc,/(a.+c)/) $regml(1) echoes "1 ababcabababc" because the greedy .+ (1 or more of any character) took as much as it could, so the match runs up to the last c in the string.

//echo -a $regex(ababcabababc,/(a.+?c)/) $regml(1) echoes "1 ababc" because the ungreedy .+? took as little as it could, so the match stops at the first c.
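The difference between greedy and ungreedy quantifiers is easiest to see by looking at the text that was actually matched. Here is a small Python sketch, assuming the usual convention that a ? placed after a quantifier makes it ungreedy. Python's re module is used as a stand-in for mIRC's regex engine.

```python
import re

text = "ababcabababc"

# Greedy: .+ grabs as much as possible, so the match ends at the LAST c
greedy = re.search(r"a.+c", text).group(0)

# Ungreedy: .+? grabs as little as possible, so the match ends at the FIRST c
ungreedy = re.search(r"a.+?c", text).group(0)

# The same idea with digits
greedy_digits = re.search(r"\d+", "12345").group(0)
ungreedy_digits = re.search(r"\d+?", "12345").group(0)

print(greedy, ungreedy)                # ababcabababc ababc
print(greedy_digits, ungreedy_digits)  # 12345 1
```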

.:

. will match any character except a newline.

Examples

.* means 0 or more characters.

.+ means 1 or more characters.

.? means 0 or 1 characters.

Positive Lookahead, negative lookahead, positive lookback, negative lookback and non-capturing:

?= is positive lookahead, ?! is negative lookahead, ?<= is positive lookback (also called lookbehind), ?<! is negative lookback and ?: is non-capturing. All of them are written inside grouping brackets, e.g. (?=pattern).

(?=pattern) looks ahead from the current position in the string to see if pattern matches there. If it doesn't, the match fails at that position.

(?<=pattern) looks back from the current position to see if the text just before it matches pattern. If it doesn't, the match fails at that position.

(?!pattern) looks ahead from the current position to see if pattern doesn't match there. If it does match, the match fails at that position.

(?<!pattern) looks back from the current position to see if the text just before it doesn't match pattern. If it does match, the match fails at that position.

Lookaheads and lookbacks only check the text; what they look at does not become part of the match.

(?:pattern) groups without filling $regml, which saves the regex a little work.

Examples

//echo -a $regex(foobar,/foo(?=a)bar/) will be "0", because the lookahead found that "foo" isn't followed by an "a", so it won't continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?=b)bar/) will be "1", because the lookahead found that "foo" is followed by a "b", so it will continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?<=o)bar/) will be "1", because the lookback found an "o" directly before the current position (the end of "foo"), so it will continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?<=x)bar/) will be "0", because the lookback found no "x" directly before the current position, so it won't continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?!a)bar/) will be "1", because the lookahead has found that "foo" isn't followed by an "a", so it will continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?!b)bar/) will be "0", because the lookahead has found that "foo" is followed by a "b", so it won't continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?<!bar)bar/) will be "1", because the lookback found that the text directly before the current position is "foo", not "bar", so it will continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?<!foo)bar/) will be "0", because the lookback found "foo" directly before the current position, so it won't continue with the rest of the pattern.

//echo -a $regex(foobar,/foo(?:bar|bas)/) will be "1", and is exactly the same as //echo -a $regex(foobar,/foo(bar|bas)/) except using ?: will stop it filling $regml() with anything.

Note: by now you should already know what grouping brackets and | (OR) do.
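The lookahead, lookback and non-capturing examples can be reproduced in Python as a rough check. The regex_result helper is a made-up name that returns 1 or 0 like $regex, and Python's re module is only an approximation of mIRC's engine (for instance, Python only allows fixed-length lookbacks).

```python
import re

def regex_result(pattern, text):
    """Return 1 if the pattern is found anywhere in text, else 0."""
    return 1 if re.search(pattern, text) else 0

patterns = [
    r"foo(?=a)bar",    # positive lookahead, fails
    r"foo(?=b)bar",    # positive lookahead, succeeds
    r"foo(?<=o)bar",   # positive lookback, succeeds
    r"foo(?<=x)bar",   # positive lookback, fails
    r"foo(?!a)bar",    # negative lookahead, succeeds
    r"foo(?!b)bar",    # negative lookahead, fails
    r"foo(?<!bar)bar", # negative lookback, succeeds
    r"foo(?<!foo)bar", # negative lookback, fails
]
results = [regex_result(p, "foobar") for p in patterns]

# A lookahead checks the text but leaves it out of the match
matched_text = re.search(r"foo(?=bar)", "foobar").group(0)

# (bar|bas) captures, (?:bar|bas) does not
capturing = re.search(r"foo(bar|bas)", "foobar").groups()
non_capturing = re.search(r"foo(?:bar|bas)", "foobar").groups()

print(results)        # [0, 1, 1, 0, 1, 0, 1, 0]
print(matched_text)   # foo
print(capturing)      # ('bar',)
print(non_capturing)  # ()
```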

If-then-else

The basic syntax for if-then-else is (?(if)then|else)

Examples

//echo -a $regex(foobar,/foo(?(?<=foo)bar|foo)/) will return 1 because the condition (?<=foo) found that foo came directly before it, so the "then" part (bar) was tried, and it matched.

//echo -a $regex(foobar,/foo(?(?<=foo)foo|bar)/) will return 0 because the condition (?<=foo) found that foo came directly before it, so the "then" part (foo) was tried, and it didn't match "bar".

Comments

The basic syntax for a comment is (?#comment)

Examples

//echo -a $regex(abc,/ab(?#the next letter is c)c/)

//echo -a $regex(abc,/a(?#the next letter is b)bc/)

$regml

$regml is used to reference the matches captured by $regex. I will only give some basic examples of this.

//echo -a $regex(hello world,/(hello)/) $regml(1)

Will echo "1 hello" because it matched hello inside the grouping brackets.

//echo -a $regex(hello world hellor world,/(hell.*)/) $regml(1) $regml(2)

Will echo "1 hello world hellor world" because hell.* matched once inside the grouping brackets, and the greedy .* carried on to the end of the string. $regml(2) will return $null ($chr(0)) because there is no second match.
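For comparison, Python keeps captured text on the match object rather than in a separate identifier like $regml. This is only a sketch of the same idea: group(N) plays the role of $regml(N), and groups() lists everything that was captured. The two-word example at the end is an extra of my own.

```python
import re

# One pair of brackets, one captured match
simple = re.search(r"(hello)", "hello world")
first = simple.group(1)

# hell.* is greedy, so the single group swallows the rest of the string
greedy = re.search(r"(hell.*)", "hello world hellor world")
captured = greedy.group(1)
group_count = len(greedy.groups())   # only 1 group, so there is no 2nd match

# Two pairs of brackets fill two matches
pair = re.search(r"(\w+) (\w+)", "hello world").groups()

print(first)        # hello
print(captured)     # hello world hellor world
print(group_count)  # 1
print(pair)         # ('hello', 'world')
```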

$regsub

$regsub is used to replace matches with other text/matches. The syntax is $regsub(string,pattern,replace text,var). I will only give one example.

//var %x | //.echo -q $regsub(hello world,/world/,,%x) | //echo -s %x

This will match "world" in "hello world" and replace it with $null ($chr(0)), storing the result in %x.
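The closest Python equivalent of $regsub is re.sub, shown here as a rough sketch. One difference to keep in mind: re.sub replaces every match by default, so count=1 is used below to replace only the first match, which is my understanding of what a pattern without /g does.

```python
import re

# Replace "world" with nothing (note the space before it is left behind)
removed = re.sub(r"world", "", "hello world")

# Replace every match, like a pattern with /g; subn also returns the count
replaced_all, how_many = re.subn(r"hello", "bye", "hello hello hello")

# Replace the first match only
replaced_first = re.sub(r"hello", "bye", "hello hello hello", count=1)

print(repr(removed))           # 'hello '
print(replaced_all, how_many)  # bye bye bye 3
print(replaced_first)          # bye hello hello
```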

Please send any questions to tidy_trax <at> mirc <dot> net

Tutorial End

Updates (Date format: dd/mm/yyyy):

06/12/2004 Fixed a few errors, added some extra examples of greediness.

Edited December 6, 2004 by tidy trax
