Commit

Add enhancements to new version
rossarmstrong committed Oct 26, 2023
1 parent 7c03b37 commit 138ab54
Showing 3 changed files with 45 additions and 11 deletions.
26 changes: 21 additions & 5 deletions CHANGELOG.md
@@ -4,8 +4,23 @@ It is organized by version and release date followed by a list of Enhancements,
 <br /><br />


-## Version 0.0.4 (Latest)
-**Releasd:** May 4, 2023<br />
+## Version 0.0.5 (Latest)
+**Released:** October 26, 2023<br />
+**Tag:** v0.0.5
+
+### Enhancements
+
+- Removed two methods from the "normalization" module:
+- `remove_leading_trailing_spaces`: This method was used to remove leading and trailing spaces in the input text. (Method removed in this release)
+- `replace_multiple_spaces`: This method was used to convert consecutive double spaces into single spaces in the input text. (Method removed in this release)
+
+- Added a new method to the "normalization" module:
+- `remove_whitespace(text)`: This new method efficiently removes all excess spaces in the input text. It replaces consecutive sequences of spaces with a single space and removes any leading or trailing spaces, ensuring a cleaner and more consistent text output.
+
+<br /><br />
+
+## Version 0.0.4
+**Released:** May 4, 2023<br />
 **Tag:** v0.0.4

 ### Enhancements
@@ -20,7 +35,7 @@ It is organized by version and release date followed by a list of Enhancements,
 <br /><br />

 ## Version 0.0.3
-**Releasd:** May 2, 2023<br />
+**Released:** May 2, 2023<br />
 **Tag:** v0.0.3

 ### Bug Fix
@@ -30,7 +45,7 @@ It is organized by version and release date followed by a list of Enhancements,
 <br /><br />

 ## Version 0.0.2
-**Releasd:** May 1, 2023<br />
+**Released:** May 1, 2023<br />
 **Tag:** v0.0.2

 ### General Changes
@@ -43,8 +58,9 @@ It is organized by version and release date followed by a list of Enhancements,
 - Fixed an unidiomatic-typecheck (C0123) from type() to isinstance(). The idiomatic way to perform an explicit typecheck in Python is to use isinstance(x, y) rather than type(x) == Y.

 <br /><br />
+
 ## Version 0.0.1 (Initial Release)
-**Releasd:** April 28, 2023<br />
+**Released:** April 28, 2023<br />
 **Tag:** v0.0.1

 This is the initial release
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "werpy"
-version = "0.0.4"
+version = "0.0.5"
 authors = [
 { name="Ross Armstrong", email="[email protected]" },
 ]
28 changes: 23 additions & 5 deletions werpy/normalize.py
@@ -1,8 +1,9 @@
 """
 The Normalize module provides preprocessing methods for normalizing text input to be optimal for the Word Error Rate
-(WER) function. The class contains methods for removing punctuation, converting text to lowercase, replacing double
-spaces with a single space, and removing leading and trailing spaces. The apply_normalization method applies all the
-normalization methods and returns the normalized input as a numpy array.
+(WER) function. The class contains methods for removing punctuation, converting text to lowercase, and removing all
+whitespace such as leading/trailing spaces and multiple in-text spaces. The apply_normalization method applies all
+the normalization methods and returns the normalized input as a numpy array. The normalize function then vectorizes
+the apply_normalization output and produces the final normalized version of the input text as a list.
 """

 import numpy as np
@@ -28,6 +29,8 @@ class Normalize:
         Changes any instances of a double space back to a standard single space
     remove_leading_trailing_spaces
         Removes any leading and/or trailing spaces in a text string
+    remove_whitespace
+        Removes all extra whitespace including leading/trailing spaces and multiple spaces within text
     apply_normalization
         Applies all the normalization methods in this class to the input text and outputs an array datatype
     """
@@ -75,6 +78,20 @@ def remove_leading_trailing_spaces(self):
         """
         self.text = np.char.strip(self.text)

+    def remove_whitespace(self):
+        """
+        Method that removes leading/trailing spaces and multiple spaces within text
+        """
+        if isinstance(self.text, np.ndarray):
+            if self.text.ndim == 0:
+                # For scalar arrays, convert to a string, split, and join
+                self.text = ' '.join(str(self.text).split())
+            elif self.text.ndim == 1:
+                # For 1-dimensional arrays, split and join
+                self.text = ' '.join(self.text.astype(str).tolist())
+        elif isinstance(self.text, str):
+            self.text = ' '.join(self.text.split())
+
     def apply_normalization(self) -> np.ndarray:
         """
         Method that applies all the normalization methods in this class to the input text
@@ -86,8 +103,9 @@ def apply_normalization(self) -> np.ndarray:
         """
         self.remove_punctuation()
         self.convert_to_lowercase()
-        self.replace_multiple_spaces()
-        self.remove_leading_trailing_spaces()
+        #self.replace_multiple_spaces()
+        #self.remove_leading_trailing_spaces()
+        self.remove_whitespace()
         return self.text


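Note on `remove_whitespace`: the CHANGELOG entry describes collapsing runs of spaces and stripping the ends. The string and 0-dimensional branches do that via `' '.join(s.split())`, but the 1-dimensional branch as committed joins the array elements into one string without calling `split()` on each element, so whitespace inside an element is left as-is and the result is no longer one string per element. A minimal sketch of an element-wise variant follows, assuming one normalized string per input element is the desired result. `collapse_whitespace` is a hypothetical standalone helper for illustration only, not part of werpy.

```python
import numpy as np


def collapse_whitespace(text):
    """Hypothetical helper (not werpy API): collapse whitespace runs and strip the ends."""
    if isinstance(text, str):
        # str.split() with no arguments splits on any whitespace run
        # and discards leading/trailing whitespace
        return " ".join(text.split())

    arr = np.asarray(text, dtype=str)
    if arr.ndim == 0:
        # 0-dimensional array: unwrap the single element and treat it as a string
        return " ".join(arr.item().split())

    # Assumption: for n-dimensional input, normalize each element separately
    # and keep the original shape, rather than joining elements together
    cleaned = [" ".join(s.split()) for s in arr.ravel().tolist()]
    return np.array(cleaned, dtype=str).reshape(arr.shape)


if __name__ == "__main__":
    print(collapse_whitespace("  hello   world  "))
    print(collapse_whitespace(np.array(["  a  b ", "c   d"])))
```

Whether the elements should stay separate depends on how `apply_normalization` is called by the vectorized `normalize` function, which this diff does not show.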
