Outline for November 8, 2021
Reading: §14
Due: Homework 3, due November 8, 2021
Pattern matching
Regular expressions
Atoms: letters, digits
Match any character except newline: .
Match any of a set of characters: [0123456789], [0-9]; match any character not in the set: [^0123456789]
Repetition: *, +, {m,n}; matching is greedy; put ? after a repetition operator and it matches as few characters as possible
Match start, end of string: ^, $; $ also matches just before a newline at the end of the string (and at the end of every line in multiline mode)
Grouping: (, )
Escape metacharacters: \
“Raw” string notation: backslash not handled specially; put “r” before string
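The syntax items above can be tried out with Python's re module. The following is a minimal sketch, not taken from the lecture files; the sample strings and variable names are made up for illustration.

```python
import re

# Character sets: [0-9] matches one digit, [^0-9] matches one non-digit.
digits = re.findall(r"[0-9]", "a1b22")          # ['1', '2', '2']
non_digits = re.findall(r"[^0-9]", "a1b22")     # ['a', 'b']

# The dot matches any character except a newline.
dots = re.findall(r"a.c", "abc a\nc a-c")       # ['abc', 'a-c']

# Greedy vs. non-greedy repetition.
text = "<b>bold</b>"
greedy = re.search(r"<.*>", text).group()       # '<b>bold</b>'
lazy = re.search(r"<.*?>", text).group()        # '<b>'

# {m,n}: at least m and at most n repetitions.
bounded = re.findall(r"[0-9]{2,3}", "1 22 333 4444")   # ['22', '333', '444']

# Anchors: ^ is the start of the string, $ the end.
at_start = re.search(r"^cat", "cat nap") is not None    # True
not_at_start = re.search(r"^cat", "a cat") is not None  # False
# $ also matches just before a newline that ends the string.
before_newline = re.search(r"nap$", "cat nap\n") is not None  # True
# With re.MULTILINE, ^ and $ match at the start and end of every line.
lines = re.findall(r"^[a-z]+$", "one\ntwo", re.MULTILINE)     # ['one', 'two']

# Grouping: parentheses capture parts of the match.
pages = re.search(r"([0-9]+)-([0-9]+)", "pages 10-25")
first_page, last_page = pages.group(1), pages.group(2)  # '10', '25'

# Escaping: \$ and \. match a literal dollar sign and period.
price = re.search(r"\$[0-9]+\.[0-9]{2}", "cost: $3.50").group()       # '$3.50'
# Without the r prefix, every backslash has to be doubled.
same_price = re.search("\\$[0-9]+\\.[0-9]{2}", "cost: $3.50").group()  # '$3.50'
```

The last two lines use the same pattern; the raw string is simply easier to read, because Python does not process its backslashes before the re module sees them.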
Useful functions/methods [recomp.py, renocomp.py, regroup.py]
re.compile(str) compiles the pattern str into pc (that is, pc = re.compile(str))
pc.match(str) returns None if compiled pattern pc does not match beginning of string str; otherwise it returns a match object
pc.search(str) returns None if pattern pc does not match any part of string str; otherwise it returns a match object for the first match
pc.findall(str) returns a list of substrings of the string str that match the pattern pc
m.group() returns the substring that the pattern pc matched, where m is the match object returned by pc.match(str) or pc.search(str)
m.start() returns the starting position of the match
m.end() returns the ending position of the match (the index just past its last character)
m.span() returns the tuple (start, end) of the match positions
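A short sketch of how these calls fit together is given here. It is not the contents of recomp.py, renocomp.py, or regroup.py; the pattern and sample text are assumptions chosen for illustration. Note that group, start, end, and span are called on the match object that match or search returns, so the result should be checked against None first.

```python
import re

# Compile once; the compiled pattern can be reused many times.
pc = re.compile(r"[0-9]+")

text = "room 12, floor 3"   # made-up sample text

# match() only tries at the beginning of the string.
at_start = pc.match(text)            # None, because text starts with "room"

# search() scans the string and stops at the first match.
m = pc.search(text)

# findall() returns every non-overlapping match as a list of strings.
all_numbers = pc.findall(text)       # ['12', '3']

if m is not None:
    matched = m.group()              # '12'
    first = m.start()                # 5
    last = m.end()                   # 7 (one past the last matched character)
    both = m.span()                  # (5, 7)
```

Because end is one past the last matched character, text[m.start():m.end()] gives the same string as m.group().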
Useful abbreviations
\d matches any digit; same as [0-9]
\s matches any space character; same as [\ \t\n\r\f\v]
\w matches any alphanumeric character and underscore; same as [a-zA-Z0-9_]
\D matches any character except a digit; inverse of \d
\S matches any character except a space character; inverse of \s
\W matches any character except an alphanumeric character or underscore; inverse of \w
\b matches a word boundary; a word is a sequence of alphanumeric characters or underscores (the characters \w matches)
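The abbreviations above can be checked with a few calls to re.findall and re.search. This is a small sketch with made-up sample strings, not an example from the lecture.

```python
import re

sample = "Call 555-1234 at 9 am, room_7!"   # made-up sample text

# \d+ : runs of digits.
digit_runs = re.findall(r"\d+", sample)     # ['555', '1234', '9', '7']

# \w+ : runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)
# ['Call', '555', '1234', 'at', '9', 'am', 'room_7']

# \D+ : runs of characters that are not digits.
non_digit_runs = re.findall(r"\D+", "a1b22c")    # ['a', 'b', 'c']

# \S+ : runs of characters that are not whitespace.
chunks = re.findall(r"\S+", "a  b\t\nc")         # ['a', 'b', 'c']

# \b : word boundary, so only the standalone word "cat" matches.
cats = re.findall(r"\bcat\b", "cat concat cats cat.")   # ['cat', 'cat']

# The underscore counts as a word character, so there is no
# boundary between "_" and "c" in "my_cat".
no_boundary = re.search(r"\bcat\b", "my_cat") is None   # True
```

In Python 3, these classes are Unicode-aware when the pattern is a str, so \d also matches digits outside 0-9; passing the re.ASCII flag restricts them to the ASCII sets listed above.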
Matt Bishop
Office: 2209 Watershed Sciences
Phone: +1 (530) 752-8060
Email: mabishop@ucdavis.edu
ECS 235A, Computer and Information Security
Version of November 9, 2021 at 2:21PM