Outline for November 4, 2024

Reading: text, §11
Due: Homework 3, November 13, 2024


  1. Pattern matching, continued
    1. Match any of a set of characters: [0123456789], [^0123456789], [0-9]
    2. Repetition:
      1. * — match 0 or more of the preceding regular expression
      2. ? — match 0 or 1 of the preceding regular expression
      3. + — match 1 or more of the preceding regular expression
      4. {m,n} — match between m and n (inclusive) of the preceding regular expression
      5. Greedy matching: by default, each repetition operator matches as many characters as possible
      6. Non-greedy matching: put ? after a repetition operator (*?, +?, ??, {m,n}?) and it matches as few characters as possible
    3. ^ — match start of string or line
    4. $ — match end of string or line
    5. (, ) — used to group regular expressions
    6. | — used to indicate one of the regular expressions must be matched
    7. \ — used to escape metacharacters
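
The operators above can be tried out directly with Python's re module. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not taken from the course programs.

```python
import re

# Character sets: [0-9] matches one digit, [^0-9] matches one non-digit
digits = re.findall("[0-9]", "a1b22")             # ['1', '2', '2']
non_digits = re.findall("[^0-9]", "a1b22")        # ['a', 'b']

# Repetition: *, ?, +, {m,n} apply to the preceding expression (here, b)
star = re.findall("ab*", "a ab abbb")             # ['a', 'ab', 'abbb']
opt = re.findall("ab?", "a ab abbb")              # ['a', 'ab', 'ab']
plus = re.findall("ab+", "a ab abbb")             # ['ab', 'abbb']
counted = re.findall("ab{2,3}", "ab abb abbbb")   # ['abb', 'abbb']

# Greedy (.*) versus non-greedy (.*?) matching
greedy = re.findall("<.*>", "<b>bold</b>")        # ['<b>bold</b>']
lazy = re.findall("<.*?>", "<b>bold</b>")         # ['<b>', '</b>']

# Anchors: ^ matches at the start, $ at the end
starts = re.search("^cat", "catalog") is not None  # True
ends = re.search("cat$", "catalog") is not None    # False

# Grouping with ( ) and alternation with |
pet = re.search("(cat|dog)s", "two dogs").group()  # 'dogs'

# Escaping: \. matches a literal period instead of "any character"
dot = re.findall(r"3\.14", "3.14 3x14")            # ['3.14']

print(star, opt, plus, counted, greedy, lazy, pet, dot)
```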

  2. Special sequences
    1. \b — match beginning or end of word
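
As a small illustration of \b, the sketch below matches "cat" only where it begins a word, or only where it stands as a whole word. The sample text is invented for this example.

```python
import re

text = "cat concat category cat."

# \b on both sides: "cat" must be a whole word
whole_words = re.findall(r"\bcat\b", text)   # ['cat', 'cat']

# \b on the left only: "cat" must begin a word (so "category" counts, "concat" does not)
word_starts = re.findall(r"\bcat", text)     # ['cat', 'cat', 'cat']

print(len(whole_words), len(word_starts))    # 2 3
```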

  3. Useful abbreviations in patterns
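    1. \n — match the same text as the nth group, where n is a number (for example, \1 for the first group); not the newline escape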
    2. \d — match any digit; same as [0-9]
    3. \s — match any whitespace character; same as [ \t\n\r\f\v]
    4. \w — match any alphanumeric character and underscore; same as [a-zA-Z0-9_]
    5. \D — match any character except a digit; inverse of \d
    6. \S — match any character except a whitespace character; inverse of \s
    7. \W — match any character except an alphanumeric character or underscore; inverse of \w
    8. \b — match a word boundary; a word is a sequence of alphanumeric characters
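
The abbreviations can be seen in action with a short sketch like the one below. The sample strings are made up, and the equivalences shown assume plain ASCII text.

```python
import re

line = "Room 2209, phone 752-8060"

numbers = re.findall(r"\d+", line)          # ['2209', '752', '8060']
words = re.findall(r"\w+", line)            # ['Room', '2209', 'phone', '752', '8060']
non_digits = re.findall(r"\D+", "ab12cd")   # ['ab', 'cd']

# \S+ picks out runs of non-whitespace; spaces, tabs, and newlines all separate them
fields = re.findall(r"\S+", "a  b\tc\nd")   # ['a', 'b', 'c', 'd']

# \1 must match the same text that group 1 matched: this finds a repeated word
repeated = re.search(r"(\w+) \1", "it is is fine").group()   # 'is is'

print(numbers, words, non_digits, fields, repeated)
```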

  4. Useful functions/methods [recomp.py, renocomp.py, regroup.py]
    1. re.compile(str) compiles the pattern into pc (that is, pc = re.compile(str))
    2. pc.match(str) returns None if compiled pattern pc does not match beginning of string str
    3. pc.search(str) returns None if pattern pc does not match any part of string str
    4. pc.findall(str) returns a list of substrings of the string str that match the pattern pc
    5. m.group() returns the substring that the pattern matched, where m is the match object returned by pc.match(str) or pc.search(str)
    6. m.start() returns the starting position of the match
    7. m.end() returns the ending position of the match (the index just past the last matched character)
    8. m.span() returns tuple (start, end) positions of match
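
A minimal sketch of how these calls fit together follows; it is not the contents of recomp.py, renocomp.py, or regroup.py, and the sample string is invented. Note that in Python's re module, group, start, end, and span are called on the match object that match or search returns, not on the compiled pattern.

```python
import re

pc = re.compile(r"\d+")        # compile the pattern once, reuse it below
s = "abc 123 def 4567"

at_start = pc.match(s)         # None: s does not begin with a digit
m = pc.search(s)               # match object for the first run of digits

if m is not None:
    print(m.group())           # 123
    print(m.start(), m.end())  # 4 7
    print(m.span())            # (4, 7)

all_numbers = pc.findall(s)    # ['123', '4567']
print(all_numbers)
```

Checking the result against None before calling group() avoids an AttributeError when the pattern does not match.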

  5. “Raw” string notation: backslashes are not treated as escape characters; put “r” before the opening quote of the string, as in r"\d+"
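
The difference matters most for sequences such as \b and \n, which Python itself interprets in an ordinary string before the re module ever sees them. A short sketch, with made-up sample strings:

```python
import re

# In an ordinary string, "\n" is one character (newline); in a raw string it is two
plain_len = len("\n")     # 1
raw_len = len(r"\n")      # 2

# Without raw notation, each backslash meant for re must be doubled
same = ("\\bcat\\b" == r"\bcat\b")                      # True

# In an ordinary string "\b" is a backspace character, so this pattern
# looks for backspace + "cat" + backspace and finds nothing
wrong = re.findall("\bcat\b", "a cat sat")              # []
right = re.findall(r"\bcat\b", "a cat sat")             # ['cat']

print(plain_len, raw_len, same, wrong, right)
```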

  6. Reading a URL [geturl.py, geturl2.py, geturl3.py]
    1. Opening a URL
    2. Reading the page as a string
    3. The role of decode() [geturl-nd.py]
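
The three steps might look like the sketch below. It is not the contents of geturl.py. The function name fetch_page and its encoding parameter are hypothetical, and the page is assumed to be UTF-8. So that the example runs without a network connection it opens a data: URL, which urllib.request also accepts; an http or https URL would be opened the same way.

```python
import urllib.request

def fetch_page(url, encoding="utf-8"):
    """Open url, read the page as bytes, and decode the bytes into a str.

    fetch_page and the UTF-8 default are assumptions made for this sketch.
    """
    with urllib.request.urlopen(url) as response:   # 1. open the URL
        raw = response.read()                       # 2. read() returns bytes
    return raw.decode(encoding)                     # 3. decode() turns bytes into str

# decode() matters because bytes and str are different types:
raw_bytes = b"caf\xc3\xa9"             # 5 bytes
text = raw_bytes.decode("utf-8")       # 4 characters
print(type(raw_bytes).__name__, len(raw_bytes))   # bytes 5
print(type(text).__name__, len(text))             # str 4

# The same bytes, delivered through a URL (percent-encoded in a data: URL)
page = fetch_page("data:text/plain;charset=utf-8,caf%C3%A9")
print(page == text)                    # True
```

Without the decode() step the result is a bytes object, so string patterns from the re module cannot be applied to it directly.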

  7. A program to print links in web pages [urlpat.py, urlpat2.py]
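
One way such a program could work is sketched below; it is not the contents of urlpat.py. The function name find_links, the pattern, and the sample HTML are all assumptions made for illustration. The pattern only handles double-quoted href attributes in lowercase a tags, so it is a teaching sketch rather than a general HTML parser.

```python
import re

# Group 1 captures whatever sits between the double quotes after href=
LINK_PATTERN = re.compile(r'<a\s+[^>]*href="([^"]*)"')

def find_links(html):
    """Return the href values of the anchor tags found in the string html."""
    # With exactly one group in the pattern, findall returns the group contents
    return LINK_PATTERN.findall(html)

# Stand-in for a page that would normally be read from a URL and decoded
sample = ('<p><a href="https://example.com/a">A</a> and '
          '<a class="x" href="/b.html">B</a></p>')

links = find_links(sample)
for link in links:
    print(link)
# https://example.com/a
# /b.html
```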

Matt Bishop
Office: 2209 Watershed Sciences
Phone: +1 (530) 752-8060
Email: mabishop@ucdavis.edu
ECS 235A, Computer and Information Security
Version of November 5, 2024 at 9:00PM
