API Reference

opennotebookllm.preprocessing.data_cleaners

clean_html(text)

Clean HTML text.

This function removes:

- scripts
- styles
- links
- meta tags

In addition, it calls clean_with_regex.

Examples:

>>> clean_html("<html><body><p>Hello, world! </p></body></html>")
'Hello, world!'

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The HTML text to clean. | required |

Returns:

| Type | Description |
|---|---|
| str | The cleaned text. |

Source code in src/opennotebookllm/preprocessing/data_cleaners.py (lines 36-61)
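
The removal steps listed for clean_html can be approximated with the standard library alone. This is a minimal sketch, not the package's implementation (its source is not reproduced on this page). It assumes that "links" means `<link>` elements, uses `html.parser` in place of whatever HTML library the module actually relies on, and substitutes a simplified stand-in for clean_with_regex. The helper class `_TextExtractor` is hypothetical.

```python
import re
from html.parser import HTMLParser


def clean_with_regex(text: str) -> str:
    """Simplified stand-in (assumption): drop URLs and emails, collapse whitespace."""
    text = re.sub(r"https?://\S+|\S+@\S+\.\S+", "", text)
    return re.sub(r"\s+", " ", text).strip()


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # <link> and <meta> are void elements that carry no text, so emitting
        # nothing for them is enough to remove them from the output.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(text: str) -> str:
    """Hypothetical sketch of clean_html: strip markup, then regex-clean the text."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered text
    # Join text nodes with spaces so words from adjacent tags do not merge.
    return clean_with_regex(" ".join(parser.parts))


print(clean_html("<html><body><p>Hello, world! </p></body></html>"))
```

Joining the text nodes with spaces and collapsing whitespace afterwards keeps `<p>a</p><p>b</p>` from turning into `ab`, at the cost of relying on the regex pass to tidy up the extra spaces.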

clean_markdown(text)

Clean Markdown text.

This function removes:

- Markdown images

In addition, it calls clean_with_regex.

Examples:

>>> clean_markdown('# Title with image ![alt text](image.jpg "Image Title")')
'Title with image'

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The Markdown text to clean. | required |

Returns:

| Type | Description |
|---|---|
| str | The cleaned text. |

Source code in src/opennotebookllm/preprocessing/data_cleaners.py (lines 64-84)
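
A minimal sketch of the two-step behaviour described for clean_markdown, assuming images use the inline form `![alt](target "optional title")`. The pattern `MD_IMAGE_RE` is hypothetical, and clean_with_regex is replaced by a simplified stand-in that also treats symbols such as `#` as special characters, which is what the documented example output implies.

```python
import re

# Hypothetical pattern: ![alt](target) with an optional "title" inside the parentheses.
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def clean_with_regex(text: str) -> str:
    """Simplified stand-in (assumption): drop URLs, emails and symbols, collapse spaces."""
    text = re.sub(r"https?://\S+|\S+@\S+\.\S+", "", text)
    text = re.sub(r"[^\w\s.,!?'-]", "", text)  # removes '#', '*', '[', ']' and so on
    return re.sub(r"\s+", " ", text).strip()


def clean_markdown(text: str) -> str:
    """Hypothetical sketch of clean_markdown: drop images, then regex-clean."""
    text = MD_IMAGE_RE.sub("", text)  # must run before symbols are stripped
    return clean_with_regex(text)


print(clean_markdown('# Title with image ![alt text](image.jpg "Image Title")'))
```

The order matters: if the special-character pass ran first, the brackets and parentheses would disappear and the alt text, file name and title would leak into the output as plain words.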

clean_with_regex(text)

Clean text using regular expressions.

This function removes:

- URLs
- emails
- special characters
- extra spaces

Examples:

>>> clean_with_regex(" Hello, world! http://example.com")
'Hello, world!'

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to clean. | required |

Returns:

| Type | Description |
|---|---|
| str | The cleaned text. |

Source code in src/opennotebookllm/preprocessing/data_cleaners.py (lines 5-33)
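
The four removal steps of clean_with_regex can be sketched as a chain of `re.sub` calls. The patterns below are assumptions, since the module's actual regular expressions are not shown on this page. In particular, "special characters" is taken to mean anything other than word characters, whitespace and basic punctuation, so that the documented example keeps its comma and exclamation mark.

```python
import re

# Hypothetical patterns; the module's actual regular expressions are not shown here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Assumption: keep word characters, whitespace and basic punctuation (. , ! ? ' -).
SPECIAL_RE = re.compile(r"[^\w\s.,!?'-]")
SPACES_RE = re.compile(r"\s+")


def clean_with_regex(text: str) -> str:
    """Hypothetical sketch: remove URLs, emails, special characters, extra spaces."""
    text = URL_RE.sub("", text)    # URLs first, before ':' and '/' are stripped
    text = EMAIL_RE.sub("", text)  # emails before '@' is stripped as a special character
    text = SPECIAL_RE.sub("", text)
    return SPACES_RE.sub(" ", text).strip()  # collapse whitespace runs and trim ends


print(clean_with_regex(" Hello, world! http://example.com"))
```

Removing URLs and emails before special characters is the key design choice: stripping `:`, `/` and `@` first would break those tokens apart and leave fragments such as `example.com` behind.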