Outline for November 4, 2024
Reading: text, §11
Due: Homework 3, due November 13, 2024
Pattern matching, continued
Match any of a set of characters: [0123456789], [^0123456789], [0-9]
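The three bracket forms above can be compared on a short string. This is a minimal sketch; the sample text is made up for illustration and is not taken from the course files.

```python
import re

text = "Room 2209, phone 752-8060"   # made-up sample text

# [0123456789] and [0-9] describe the same set, so they find the same characters.
digits_listed = re.findall(r"[0123456789]", text)
digits_ranged = re.findall(r"[0-9]", text)

# A leading ^ inside the brackets inverts the set: any character that is not a digit.
non_digits = re.findall(r"[^0123456789]", text)

print(digits_listed == digits_ranged)   # True
print("".join(digits_listed))           # 22097528060
print("".join(non_digits))              # Room , phone -
```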
Repetition:
*: match 0 or more of the preceding regular expression
?: match 0 or 1 of the preceding regular expression
+: match 1 or more of the preceding regular expression
{m,n}: match between m and n (inclusive) repetitions of the preceding regular expression
Greedy matching: each of these operators matches as many characters as possible
Put ? after the operator (as in *?, +?, ??, or {m,n}?) and it will match as few characters as possible
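A short sketch of the repetition operators and of greedy versus non-greedy matching follows. The sample strings are invented for illustration, and re.fullmatch is used only to make the counts easy to see.

```python
import re

# * accepts zero b's, + needs at least one, ? accepts at most one.
star_ok = re.fullmatch(r"ab*", "a") is not None        # True
plus_ok = re.fullmatch(r"ab+", "a") is not None        # False
question_ok = re.fullmatch(r"ab?", "abb") is not None  # False

# {2,3} accepts 2 or 3 repetitions, inclusive; \b keeps longer numbers out.
counts = re.findall(r"\b[0-9]{2,3}\b", "7 42 365 2024")

# Greedy .* runs to the last >, while non-greedy .*? stops at the first one.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)

print(star_ok, plus_ok, question_ok)   # True False False
print(counts)                          # ['42', '365']
print(greedy)                          # ['<b>bold</b> and <i>italic</i>']
print(lazy)                            # ['<b>', '</b>', '<i>', '</i>']
```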
^: match start of string or line
$: match end of string or line
( and ): used to group regular expressions
|: used to indicate that one of the regular expressions must be matched
\: used to escape metacharacters
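The anchors, grouping, alternation, and escaping can be tried together. This is an illustrative sketch with made-up strings; it assumes the "or line" behavior of ^ and $ is switched on with the re.MULTILINE flag, which is how Python's re module provides it.

```python
import re

lines = "cat food\nhot dog\ncatalog"

# By default ^ anchors only at the start of the whole string.
start_only = re.findall(r"^cat", lines)
# With re.MULTILINE, ^ also anchors at the start of every line.
each_line = re.findall(r"^cat", lines, re.MULTILINE)

# Parentheses group the alternatives; | chooses one of them.
pets = re.compile(r"^(cat|dog)s?$")
matched = [w for w in ["cat", "dogs", "catdog", "cow"] if pets.match(w)]

# A backslash makes $ and . ordinary characters instead of metacharacters.
prices = re.findall(r"\$[0-9]+\.[0-9]{2}", "cost: $3.50 or $12.00")

print(start_only)   # ['cat']
print(each_line)    # ['cat', 'cat']
print(matched)      # ['cat', 'dogs']
print(prices)       # ['$3.50', '$12.00']
```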
Special sequences
\b: match beginning or end of word
Useful abbreviations in patterns
\n: match the text matched by the nth group, where n is the group's number (for example, \1)
\d: match any digit; same as [0-9]
\s: match any space character; same as [ \t\n\r\f\v]
\w: match any alphanumeric character or underscore; same as [a-zA-Z0-9_]
\D: match any character except a digit; inverse of \d
\S: match any character except a space character; inverse of \s
\W: match any character except an alphanumeric character or underscore; inverse of \w
\b: match a word boundary; a word is a sequence of alphanumeric characters and underscores
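The abbreviations can be exercised on a few short strings. This is a sketch only; apart from the phone number, which is reused from the contact details in this outline, the sample strings are invented.

```python
import re

# \D removes everything that is not a digit.
digits_only = re.sub(r"\D", "", "+1 (530) 752-8060")

# \s+ splits on runs of space characters (spaces and a tab here).
fields = re.split(r"\s+", "ECS  235A\tFall")

# \w+ collects runs of letters, digits, and underscores.
names = re.findall(r"\w+", "x_1 = y2 + z")

# \b keeps "cat" from matching inside "concat" or "catalog".
whole_words = re.findall(r"\bcat\b", "cat concat catalog cat.")

# \1 must repeat whatever group 1 matched, so this finds doubled words.
doubled = re.findall(r"\b(\w+)\s+\1\b", "the the cat sat on on the mat")

print(digits_only)   # 15307528060
print(fields)        # ['ECS', '235A', 'Fall']
print(names)         # ['x_1', 'y2', 'z']
print(whole_words)   # ['cat', 'cat']
print(doubled)       # ['the', 'on']
```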
Useful functions/methods [recomp.py, renocomp.py, regroup.py]
re.compile(str): compiles the pattern str into pc (that is, pc = re.compile(str))
pc.match(str): returns None if compiled pattern pc does not match the beginning of string str; otherwise returns a match object
pc.search(str): returns None if pattern pc does not match any part of string str; otherwise returns a match object for the first match
pc.findall(str): returns a list of the substrings of the string str that match the pattern pc
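m.group(): returns the substring of the string str that the pattern pc matched, where m is the match object returned by pc.match(str) or pc.search(str); m.group(n) returns the part matched by the nth group
m.start(): returns the starting position of the match
m.end(): returns the ending position of the match (the index just past its last character)
m.span(): returns the tuple (start, end) of positions of the match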
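The calls listed above fit together as in the following sketch. The names line, m, and grouped are illustrative choices, the sample text is made up, and this is not the content of recomp.py, renocomp.py, or regroup.py.

```python
import re

line = "call 752-8060 or 555-0100"   # made-up sample text
pc = re.compile(r"\d+-\d+")

at_start = pc.match(line)        # None: the line begins with "call", not a digit
m = pc.search(line)              # match object for the first match anywhere
all_numbers = pc.findall(line)   # every non-overlapping match, as strings

if m is not None:
    print(m.group())             # 752-8060
    print(m.start(), m.end())    # 5 13
    print(m.span())              # (5, 13)

# With parentheses in the pattern, group(n) returns the nth group's text.
grouped = re.compile(r"(\d+)-(\d+)").search(line)
first_part, second_part = grouped.group(1), grouped.group(2)

print(at_start)                  # None
print(all_numbers)               # ['752-8060', '555-0100']
print(first_part, second_part)   # 752 8060
```

Checking the result of match or search against None before calling group avoids an AttributeError when nothing matches.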
“Raw” string notation: backslash not handled specially; put “r” before string
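The difference shows up as soon as a pattern contains \b, which an ordinary Python string reads as the backspace character. A small sketch, with variable names chosen for illustration:

```python
import re

plain = "\bcat\b"    # Python turns each \b into a single backspace character
raw = r"\bcat\b"     # the r prefix keeps each backslash and b as two characters

print(len(plain), len(raw))   # 5 7

# The plain string looks for backspace characters, so it finds nothing here.
plain_result = re.search(plain, "a cat")
# The raw string reaches the re module intact, so \b means a word boundary.
raw_result = re.search(raw, "a cat")

print(plain_result)           # None
print(raw_result.group())     # cat
print(raw == "\\bcat\\b")     # True: doubling the backslashes is the alternative
```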
Reading a URL [geturl.py, geturl2.py, geturl3.py]
Opening a URL
Reading the page as a string
The role of decode() [geturl-nd.py]
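One way to open a URL, read the page, and decode it uses urllib.request from the standard library. This is a sketch, not the contents of the geturl programs: read_page is a hypothetical helper, UTF-8 is an assumed encoding, and a data: URL stands in for a real web address so the example runs without a network connection.

```python
from urllib.parse import quote
from urllib.request import urlopen


def read_page(url, encoding="utf-8"):
    """Open url and return (raw bytes, decoded text). Hypothetical helper."""
    with urlopen(url) as response:
        raw = response.read()          # the page arrives as bytes
    return raw, raw.decode(encoding)   # decode() turns the bytes into a str


# Stand-in for a real address such as "https://..."; assumed for offline use.
demo_url = "data:text/plain;charset=utf-8," + quote("café au lait")
raw, text = read_page(demo_url)

print(type(raw).__name__)    # bytes
print(type(text).__name__)   # str
print(text)                  # café au lait
```

Without decode(), the result stays a bytes object, and string operations such as matching a str pattern against it raise a TypeError.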
A program to print links in web pages [urlpat.py, urlpat2.py]
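A link printer can be sketched with findall and one group. The pattern below is an assumption (an href attribute with a double-quoted value) and may differ from the one in urlpat.py or urlpat2.py; find_links and the sample HTML are hypothetical.

```python
import re

# Assumed pattern: href="..." with the address captured by the group.
link_pat = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def find_links(html):
    """Return the addresses found in href attributes of html."""
    return link_pat.findall(html)   # with one group, findall returns the group text


sample = '<p><a href="https://example.com/">home</a> and <a HREF="notes.html">notes</a></p>'
links = find_links(sample)
for link in links:
    print(link)
# https://example.com/
# notes.html
```

A regular expression like this misses single-quoted or unquoted attributes, so it is a starting point rather than a full HTML parser.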
Matt Bishop
Office: 2209 Watershed Sciences
Phone: +1 (530) 752-8060
Email: mabishop@ucdavis.edu
ECS 235A, Computer and Information Security
Version of November 5, 2024 at 9:00PM