Outline for December 1, 2020

Reading: §11, 12.4
Due: Homework 4, December 1, 2020

  1. Reading a URL [geturl.py, geturl2.py, geturl3.py]
    1. Opening a URL
    2. Reading the page as a string
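A minimal sketch of the two steps above, using the standard urllib.request module. This is not necessarily how geturl.py does it. The helper name read_page and the fallback to UTF-8 are assumptions made for illustration, and a data: URL stands in for a real web address so the example runs without a network connection.

```python
from urllib.request import urlopen


def read_page(url):
    """Open url and return the contents of the page as a string.

    Hypothetical helper for illustration only.
    """
    with urlopen(url) as response:       # step 1: open the URL
        raw = response.read()            # the page arrives as bytes
        # Use the character set the server reports; assume UTF-8 otherwise
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")  # step 2: bytes -> string


# A data: URL carries its own contents, so no network access is needed
page = read_page("data:text/plain;charset=utf-8,Hello%20from%20a%20URL")
print(page)
```

For a real page, pass an http:// or https:// address instead; the call can then raise urllib.error.URLError if the host cannot be reached.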

  2. Pattern matching
    1. Regular expressions
    2. Atoms: letters, digits
    3. Match any character except newline: .
    4. Match any of a set of characters: [0123456789], [^0123456789], [0-9]
    5. Repetition: *, +, {m,n}; greedy matching; put ? after and they match as few characters as possible
    6. Match start, end of string: ^, $; $ also matches at end of line
    7. Grouping: (, )
    8. Escape metacharacters: \
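The pattern syntax listed under item 2 can be tried out with Python's re module. The following is an illustrative sketch; the sample strings are made up for the example, and the patterns are written with the r prefix described in item 3.

```python
import re

text = "Order 66 shipped 2020-12-01.\nTotal: $15.00"

# Set of characters with repetition: one or more digits
digits = re.findall(r"[0-9]+", text)

# Negated set: runs of characters that are not digits
non_digits = re.findall(r"[^0-9]+", "a1b22c")

# . matches any character except newline, so "a\nc" is skipped
dot = re.findall(r"a.c", "abc a\nc a-c")

# {m,n}: between three and four digits in a row
year = re.findall(r"[0-9]{3,4}", text)

# Greedy * takes as much as it can; *? takes as little as it can
greedy = re.findall(r"<.*>", "<b>bold</b>")
lazy = re.findall(r"<.*?>", "<b>bold</b>")

# ^ anchors at the start of the string, $ at the end
starts_with_order = re.search(r"^Order", text) is not None
ends_with_00 = re.search(r"00$", text) is not None

# Escaped metacharacters (\$ and \.) plus grouping with ( )
price = re.search(r"\$([0-9]+)\.([0-9]+)", text)

print(digits)       # ['66', '2020', '12', '01', '15', '00']
print(non_digits)   # ['a', 'b', 'c']
print(dot)          # ['abc', 'a-c']
print(year)         # ['2020']
print(greedy)       # ['<b>bold</b>']
print(lazy)         # ['<b>', '</b>']
print(starts_with_order, ends_with_00)   # True True
print(price.group(1), price.group(2))    # 15 00
```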
  3. “Raw” string notation: backslash not handled specially; put “r” before string
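A short sketch of why raw strings matter for patterns. In an ordinary string Python itself interprets the backslash first, so a backslash meant for the regular expression has to be doubled; with the r prefix it is passed through unchanged. The sample strings are invented for the example.

```python
import re

# Same pattern written two ways: escaped backslashes vs. raw string
plain = "[0-9]+\\.[0-9]+"
raw = r"[0-9]+\.[0-9]+"
same = (plain == raw)

# In an ordinary string "\n" is one newline character;
# in a raw string it is two characters, a backslash and an n
len_plain = len("\n")
len_raw = len(r"\n")

# "\b" in an ordinary string is a backspace character, so the first
# pattern looks for backspaces; the raw version means "word boundary"
no_match = re.search("\bcat\b", "a cat")
match = re.search(r"\bcat\b", "a cat")

numbers = re.findall(raw, "pi is about 3.14, e about 2.72")

print(same)                 # True
print(len_plain, len_raw)   # 1 2
print(no_match)             # None
print(match.group())        # cat
print(numbers)              # ['3.14', '2.72']
```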
  4. Useful functions/methods [recomp.py, renocomp.py, regroup.py]
    1. re.compile(str) compiles the pattern into pc (that is, pc = re.compile(str))
    2. pc.match(str) returns None if compiled pattern pc does not match beginning of string str
    3. pc.search(str) returns None if pattern pc does not match any part of string str
    4. pc.findall(str) returns a list of substrings of the string str that match the pattern pc
    5. m.group() returns the substring that the pattern matched, where m is the match object returned by pc.match(str) or pc.search(str)
    6. m.start() returns the starting position of the match
    7. m.end() returns the ending position of the match (one past the last matched character)
    8. m.span() returns the tuple (start, end) of positions of the match
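The functions and methods in item 4 can be sketched as follows. The sample string and the names pc, pg, m, and g are chosen for illustration and are not taken from recomp.py, renocomp.py, or regroup.py. Note that group, start, end, and span are called on the match object that match or search returns, not on the compiled pattern.

```python
import re

s = "room 2209, phone 752-8060"

pc = re.compile(r"[0-9]+")        # compile the pattern once

at_start = pc.match(s)            # None: s does not begin with a digit
m = pc.search(s)                  # first match anywhere in s
all_numbers = pc.findall(s)       # every non-overlapping match, as strings

print(at_start)                   # None
print(m.group())                  # 2209
print(m.start(), m.end())         # 5 9
print(m.span())                   # (5, 9)
print(all_numbers)                # ['2209', '752', '8060']

# Groups: parentheses let you pull out parts of the match
pg = re.compile(r"([0-9]{3})-([0-9]{4})")
g = pg.search(s)
print(g.group(0))                 # 752-8060 (the whole match)
print(g.group(1), g.group(2))     # 752 8060
print(g.span())                   # (17, 25)

# Without compiling: the module-level function takes the pattern directly
uncompiled = re.search(r"[0-9]+", s)
print(uncompiled.group())         # 2209
```

Because match and search return None when nothing matches, it is safest to test the result (for example, "if m is not None:") before calling group, start, end, or span on it.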
  5. Useful abbreviations
    1. \d matches any digit; same as [0-9]
    2. \s matches any space character; same as [ \t\n\r\f\v]
    3. \w matches any alphanumeric character and underscore; same as [a-zA-Z0-9_]
    4. \D matches any character except a digit; inverse of \d
    5. \S matches any character except a space character; inverse of \s
    6. \W matches any character except an alphanumeric character or underscore; inverse of \w
    7. \b matches a word boundary — a word is a sequence of alphanumeric characters
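A brief sketch of the abbreviations in item 5, using made-up sample strings. One caveat worth knowing: with ordinary (str) patterns in Python 3, \d, \s, and \w also match non-ASCII digits, spaces, and letters, so the bracketed equivalents above hold exactly only for ASCII text or when the re.ASCII flag is given.

```python
import re

text = "MHI 289I meets\tDec 1, 2020"

digits = re.findall(r"\d+", text)        # runs of digits
non_digits = re.findall(r"\D+", "a1b22c")  # runs of non-digits
words = re.findall(r"\w+", text)         # runs of letters, digits, underscore
non_words = re.findall(r"\W+", text)     # everything between the words
chunks = re.findall(r"\S+", text)        # runs of non-space characters
fields = re.split(r"\s+", text)          # split on any run of white space

# \b matches only at a word boundary, so "concat" and "cats" are skipped
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")

print(digits)       # ['289', '1', '2020']
print(non_digits)   # ['a', 'b', 'c']
print(words)        # ['MHI', '289I', 'meets', 'Dec', '1', '2020']
print(non_words)    # [' ', ' ', '\t', ' ', ', ']
print(chunks)       # ['MHI', '289I', 'meets', 'Dec', '1,', '2020']
print(fields)       # ['MHI', '289I', 'meets', 'Dec', '1,', '2020']
print(whole_word)   # ['cat', 'cat']
```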

Matt Bishop
Office: 2209 Watershed Sciences
Phone: +1 (530) 752-8060
Email: mabishop@ucdavis.edu
MHI 289I, Programming for Health Informatics
Version of November 30, 2020 at 1:11PM
