From: mittis
Date: Sun, 4 Jan 2026 06:55:35 +0000 (+0100)
Subject: Regex search: allow for literal search of special regex chars
X-Git-Url: https://git.sthu.org/?a=commitdiff_plain;h=f98fb6435a00b51e4321efde39dc0d5e0185a60d;p=oe1archive.git

Regex search: allow for literal search of special regex chars

Parentheses "(", ")" and curly braces "{", "}" are now escaped and no
longer interpreted as regex metacharacters. This allows searching for
broadcasts with -s "Das.*Leben (2)", which matches the title
"Das gute Leben (2)". Previously, searching for "Das gute Leben (2)"
returned no results because parentheses denote capture groups in regex.
---

diff --git a/README.markdown b/README.markdown
index 2bd52f3..9d488f3 100644
--- a/README.markdown
+++ b/README.markdown
@@ -62,16 +62,63 @@ extended search:
 # Fast search by title/subtitle
 ./oe1archive -s "Show Name"
 
+# Search for show with special characters (automatically escaped)
+./oe1archive -s "Das gute Leben (2)"
+
+# Search using regex patterns (wildcards still work)
+./oe1archive -s "Das.*Leben"
+
+# Search with alternation (OR) patterns
+./oe1archive -s "Music|Talk"
+
 # Deep search including descriptions
 ./oe1archive -s "Some Keyword" -e -r 30
 
 # Batch download all matches (fast search)
 ./oe1archive -d "Show Name" -p "prefix" -r 30
 
+# Batch download with special characters in title
+./oe1archive -d "Das gute Leben (2)" -p "prefix"
+
 # Batch download all matches (deep search)
 ./oe1archive -d "Some Keyword" -p "prefix" -e -r 30
 ```
 
+## Search Behavior and Regex Escaping
+
+The search feature uses **smart escaping** to make searching for titles with special characters intuitive, while preserving regex pattern support.
+
+### Automatically Escaped Characters
+The following characters are **automatically escaped** (treated as literals):
+- `(` `)` - parentheses
+- `{` `}` - curly braces
+
+This means you can search for titles like "Das gute Leben (2)" without needing to manually escape the parentheses.
+
+### Supported Regex Patterns
+The following regex patterns **still work** and are NOT escaped:
+- `.*` - match any characters (wildcard)
+- `.+` - match one or more of any character
+- `|` - alternation (OR operator)
+- `^` - start of string
+- `$` - end of string
+- `[...]` - character classes
+
+### Examples
+```bash
+# Literal search - parentheses are automatically escaped
+./oe1archive -s "Das gute Leben (2)"
+
+# Regex wildcard - matches "Das gute Leben", "Das schöne Leben", etc.
+./oe1archive -s "Das.*Leben"
+
+# Combine both - wildcard with escaped parentheses
+./oe1archive -s "Das.*Leben (2)"
+
+# Alternation - matches either "Music" or "Talk"
+./oe1archive -s "Music|Talk"
+```
+
 ## How It Works
 
 The OE1 API provides a rolling weekly window of current broadcasts. Streams
diff --git a/oe1archive b/oe1archive
index 4c2d33b..1f6ed08 100755
--- a/oe1archive
+++ b/oe1archive
@@ -265,17 +265,35 @@ class Archive:
             akm = ""
         return description + "
" + akm
 
+    def _smart_escape_for_regex(self, pattern):
+        """Escape literal chars () {} while preserving regex patterns like .*
+
+        Args:
+            pattern: Search pattern string
+
+        Returns:
+            Pattern with literal characters escaped for regex
+        """
+        chars_to_escape = ['(', ')', '{', '}']
+        result = pattern
+        for char in chars_to_escape:
+            result = result.replace(char, '\\' + char)
+        return result
+
     def get_broadcasts_by_regex(self, key, deep_search=False):
-        """Find broadcasts matching a regex pattern.
+        """Find broadcasts matching a search pattern (with smart escaping).
 
         Args:
-            key: Search pattern (regex)
+            key: Search pattern (literal by default, but supports regex patterns)
+                 Literal characters like () and {} are automatically escaped,
+                 while regex patterns like .* and | continue to work.
             deep_search: If True, search in title, subtitle, and description.
                          If False, search only in title and subtitle (faster).
 
         Skips placeholder entries.
         """
-        rex = re.compile(key, re.IGNORECASE)
+        escaped_key = self._smart_escape_for_regex(key)
+        rex = re.compile(escaped_key, re.IGNORECASE)
         res = []
 
         total = sum(len(djson["broadcasts"]) for djson in self.json)
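
The effect of the escaping step in this patch can be checked in isolation. The sketch below copies the replacement loop into a free function and compiles the result the same way `get_broadcasts_by_regex` does. The names `smart_escape_for_regex` and `matches` are hypothetical helpers for illustration only, and the use of `rex.search` is an assumption, since the matching call itself is not part of this hunk.

```python
import re


def smart_escape_for_regex(pattern: str) -> str:
    """Standalone copy of the patch's escaping loop (hypothetical free function)."""
    result = pattern
    for char in ("(", ")", "{", "}"):
        # Prefix each listed character with a backslash so re treats it literally.
        result = result.replace(char, "\\" + char)
    return result


def matches(key: str, title: str) -> bool:
    """Hypothetical helper: escape, compile case-insensitively, then search.

    Assumes the real method uses rex.search(); the hunk does not show that call.
    """
    rex = re.compile(smart_escape_for_regex(key), re.IGNORECASE)
    return rex.search(title) is not None


if __name__ == "__main__":
    title = "Das gute Leben (2)"
    print(smart_escape_for_regex("Das.*Leben (2)"))
    print(matches("Das gute Leben (2)", title))   # literal parentheses
    print(matches("Das.*Leben (2)", title))       # wildcard plus literal parentheses
    print(matches("Music|Talk", "Late Night Talk"))  # alternation is left untouched
```

One consequence of the plain `str.replace` approach: a key that already contains manually escaped parentheses, such as `Leben \(2\)`, gets a second backslash and no longer matches the title, and regex groups or `{n}` quantifiers are no longer available in search keys.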