Commit 169b152f authored by Paul Sokolovsky, committed by Damien George

docs/ure: Fully describe supported syntax subset, add example.

parent 1db55381
@@ -10,47 +10,101 @@
This module implements regular expression operations. Regular expression
syntax supported is a subset of CPython ``re`` module (and actually is
a subset of POSIX extended regular expressions).

Supported operators and special sequences are:

``.``
   Match any character.

``[...]``
   Match set of characters. Individual characters and ranges are supported,
   including negated sets (e.g. ``[^a-c]``).

``^``
   Match the start of the string.

``$``
   Match the end of the string.

``?``
   Match zero or one of the previous sub-pattern.

``*``
   Match zero or more of the previous sub-pattern.

``+``
   Match one or more of the previous sub-pattern.

``??``
   Non-greedy version of ``?``, match zero or one, with the preference
   for zero.

``*?``
   Non-greedy version of ``*``, match zero or more, with the preference
   for the shortest match.

``+?``
   Non-greedy version of ``+``, match one or more, with the preference
   for the shortest match.

``|``
   Match either the left-hand side or the right-hand side sub-patterns of
   this operator.

``(...)``
   Grouping. Each group is capturing (a substring it captures can be accessed
   with `match.group()` method).

``\d``
   Matches digit. Equivalent to ``[0-9]``.

``\D``
   Matches non-digit. Equivalent to ``[^0-9]``.

``\s``
   Matches whitespace. Equivalent to ``[ \t-\r]``.

``\S``
   Matches non-whitespace. Equivalent to ``[^ \t-\r]``.

``\w``
   Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.

``\W``
   Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.

``\``
   Escape character. Any other character following the backslash, except
   for those listed above, is taken literally. For example, ``\*`` is
   equivalent to literal ``*`` (not treated as the ``*`` operator).
   Note that ``\r``, ``\n``, etc. are not handled specially, and will be
   equivalent to literal letters ``r``, ``n``, etc. Due to this, it's
   not recommended to use raw Python strings (``r""``) for regular
   expressions. For example, ``r"\r\n"`` when used as the regular
   expression is equivalent to ``"rn"``. To match CR character followed
   by LF, use ``"\r\n"``.
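
The operators and escapes listed above can be tried out with a short sketch like the one below. It is illustrative only and not part of the commit: the sample strings are made up, and the fallback to CPython's ``re`` is an assumption so that the snippet also runs off-device. Patterns are written as ordinary (non-raw) strings, so class escapes need a doubled backslash while CR and LF come from Python's own string escaping.

```python
try:
    import ure as re  # MicroPython module described in this commit
except ImportError:
    import re  # assumption: CPython stand-in, used only so the sketch runs anywhere

text = "<a><b>"

# Greedy "*" consumes as much as possible before the final ">".
greedy = re.match("<.*>", text).group(0)

# Non-greedy "*?" stops at the first ">" that allows a match.
lazy = re.match("<.*?>", text).group(0)

# "\\w" and "\\d" reach the regex engine as \w and \d;
# "\r\n" is turned into real CR and LF characters by Python itself.
m = re.match("(\\w+)=(\\d+)\r\n", "port=8080\r\n")
key = m.group(1)
value = m.group(2)

print(greedy, lazy, key, value)
```

Each pair of parentheses creates a capturing group, which is why ``group(1)`` and ``group(2)`` return the key and the value separately.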

**NOT SUPPORTED**:

* counted repetitions (``{m,n}``)
* named groups (``(?P<name>...)``)
* non-capturing groups (``(?:...)``)
* more advanced assertions (``\b``, ``\B``)
* special character escapes like ``\r``, ``\n`` - use Python's own escaping
  instead
* etc.
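
Since counted repetitions are listed as unsupported, one possible workaround is to expand ``{m,n}`` by hand using only the supported ``?`` operator. The sketch below is an assumption-laden illustration, not something the commit provides: ``repeat`` is a hypothetical helper, and the CPython ``re`` fallback exists only so the snippet can run off-device.

```python
try:
    import ure as re  # MicroPython module described in this commit
except ImportError:
    import re  # assumption: CPython stand-in, used only so the sketch runs anywhere


def repeat(sub, m, n):
    # Hypothetical helper: emulate sub{m,n} with m mandatory copies
    # followed by (n - m) optional copies.
    # "sub" must be a single atom (one character, a class, or a group),
    # otherwise "?" would apply only to its last character.
    return sub * m + (sub + "?") * (n - m)


# Two to four digits, anchored at both ends: ^\d\d\d?\d?$
pattern = "^" + repeat("\\d", 2, 4) + "$"
regex = re.compile(pattern)

samples = ("1", "12", "1234", "12345")
results = [regex.match(s) is not None for s in samples]
print(results)
```

Wrapping a multi-character sub-pattern in parentheses makes it a single atom, at the cost of extra capturing groups, because non-capturing groups are not available.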

Example::

    import ure

    # As ure doesn't support escapes itself, use of r"" strings is not
    # recommended.
    regex = ure.compile("[\r\n]")

    regex.split("line1\rline2\nline3\r\n")

    # Result:
    # ['line1', 'line2', 'line3', '', '']
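
The result above ends with two empty strings because the trailing CR and LF are each treated as a separate delimiter. If those empty entries are unwanted, they can be filtered out afterwards. The following is a minimal sketch under that assumption; the CPython ``re`` fallback is again only a stand-in so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython module described in this commit
except ImportError:
    import re  # assumption: CPython stand-in, used only so the sketch runs anywhere

regex = re.compile("[\r\n]")
parts = regex.split("line1\rline2\nline3\r\n")

# Drop the empty strings produced by adjacent or trailing delimiters.
lines = [p for p in parts if p]
print(lines)
```

Note that this filter also discards genuinely blank lines in the input, so it only suits data where empty lines carry no meaning.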

Functions
---------
...