Releases: Genaios/TextMachina
v0.2.2: Add sentence rewriting and polish documentation.
This release adds:
- Sentence rewriting extractor and packer to generate mixcase datasets. Unlike the gap and masking extractors, a subset of each document's sentences is selected and the LLM must rewrite them in its own words.
- Argument validation in extractors.
- Removal of private methods from the documentation.
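To make the sentence rewriting idea concrete, here is a minimal sketch in plain Python. It is not TextMachina's implementation or API: the naive regex sentence splitter, the hypothetical helpers `select_sentences` and `pack_mixcase`, the `rewrite` callable standing in for an LLM call, and the `"human"`/`"generated"` span labels are all assumptions made for illustration.

```python
import random
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def select_sentences(sentences: List[str], ratio: float, seed: int = 0) -> List[int]:
    # Pick a random subset of sentence indices (at least one) to be rewritten.
    k = max(1, int(len(sentences) * ratio))
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(sentences)), k))


def pack_mixcase(
    text: str, ratio: float, rewrite: Callable[[str], str], seed: int = 0
) -> List[Tuple[str, str]]:
    # Hypothetical packer: selected sentences are replaced by their rewritten
    # version and labeled "generated"; all other sentences stay "human".
    sentences = split_sentences(text)
    chosen = set(select_sentences(sentences, ratio, seed))
    spans: List[Tuple[str, str]] = []
    for i, sentence in enumerate(sentences):
        if i in chosen:
            spans.append((rewrite(sentence), "generated"))
        else:
            spans.append((sentence, "human"))
    return spans


def fake_llm(sentence: str) -> str:
    # Stand-in for a real LLM call: it simply upper-cases the sentence.
    return sentence.upper()


if __name__ == "__main__":
    doc = "The sky is blue. Grass is green. Snow is white."
    for span, label in pack_mixcase(doc, ratio=0.34, rewrite=fake_llm):
        print(f"{label:>9}: {span}")
```

Keeping the output as labeled spans, rather than a single string, is what makes the result usable for mixcase tagging: the boundaries between human and generated text are known exactly.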
v0.2.1: Add documentation and fix naming.
This release adds:
- Documentation: https://textmachina.readthedocs.io/en/latest/
- Documentation-related extras for developers in `setup.py`.
- Fixes for some function names that had been incorrectly autocompleted.
v0.2.0: More providers, extractors, examples, and refactoring 🥳
The 0.2.0 release of TextMachina includes:
- New providers: Amazon Bedrock, AI21, Azure OpenAI, and inference servers (`vllm` and `trt`).
- Refactor the HuggingFace Remote provider to perform retries through `HTTPAdapter`.
- Two new extractors for mixcase tasks: `sentence_masking` and `word_masking`. Unlike the `sentence_gap` and `word_gap` extractors, these require LLMs to reconstruct masked spans within the whole text instead of writing text between boundaries.
- Extend the dataset generator for mixcase tasks to support masking extractors.
- Add config examples to learn about the extractors.
- Small refactors: colors in the logger, inheritance in some tokenizers, etc.
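The difference between gap and masking extractors can be sketched with two toy prompt builders. This is an illustrative assumption, not TextMachina's code: the function names `word_gap` and `word_masking` here are hypothetical stand-ins, and the `[MASK]` token is an assumed placeholder that may differ from the one the library uses.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # Hypothetical mask token; the real one may differ.


def word_gap(text: str, start: int, end: int) -> Tuple[str, str]:
    # Gap style: drop words[start:end] and keep the text on either side.
    # The LLM only has to write the missing text between the two boundaries.
    words = text.split()
    return " ".join(words[:start]), " ".join(words[end:])


def word_masking(text: str, ratio: float, seed: int = 0) -> Tuple[str, List[int]]:
    # Masking style: replace a random subset of words with the mask token.
    # The LLM must return the whole text with every mask filled in.
    words = text.split()
    k = max(1, int(len(words) * ratio))
    positions = sorted(random.Random(seed).sample(range(len(words)), k))
    for pos in positions:
        words[pos] = MASK
    return " ".join(words), positions


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    prefix, suffix = word_gap(text, 3, 6)
    print("gap prompt:    ", f"{prefix} ___ {suffix}")
    masked, positions = word_masking(text, ratio=0.3, seed=42)
    print("masking prompt:", masked)
```

In the gap case the generated text forms one contiguous span, while masking can scatter generated words across the whole document, which is why the mixcase generator has to handle both kinds of extractor.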
v0.1.0: Mixcase tasks and more 🥳
This release of TextMachina includes:
- Allow passing parameters to the extractors outside the prompt templates. Templates must now be used only to define placeholders.
- Add `MixCaseDatasetGenerator` to generate datasets for mixcase tasks (detection tagging). Other datasets, such as mixcase classification, can be built outside TextMachina from the datasets this generator produces.
- Add `sentence_gap` and `word_gap` extractors for mixcase tasks.
- Refactor interactive exploration. There is now one class per task, and each class must build its own panels.
- Add exploration for mixcase datasets.
- Add a `TokenClassificationMetric` to evaluate HF models on mixcase and boundary tasks.
- Better structured and documented examples: `examples/learning` illustrates how to use providers, tasks, and extractors, and `examples/use_cases` contains additional config files.
- Minor quality-of-life changes: `task_type` must now be passed in the CLI to prevent potential confusion, `random_sample_human` is disabled on boundary detection tasks, etc.
- Document all the new code and improve the existing documentation.
- Extend the README to cover mixcase tasks and include figures visualizing each type of task.
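As a rough picture of what token-level evaluation on mixcase data involves, the sketch below scores per-token predictions against gold labels. It is a hypothetical helper, not the actual `TokenClassificationMetric`: it assumes labels are already aligned one per token, with `0` for human-written and `1` for generated tokens, and reports accuracy plus precision, recall, and F1 for the generated class.

```python
from typing import Dict, Sequence


def token_scores(
    gold: Sequence[int], pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    # Assumed label scheme: 0 = human-written token, 1 = generated token.
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 1, 0]  # third to fifth tokens were generated
    pred = [0, 1, 1, 1, 0, 0]  # one false positive, one false negative
    print(token_scores(gold, pred))
```

Reporting F1 on the generated class alongside accuracy matters because mixcase texts are often mostly human-written, so plain accuracy can look high even when generated spans are missed.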
v0.0.10
v0.0.9
First release 🎉
The first release of TextMachina includes:
- Dataset generators for detection, attribution, and boundary detection tasks.
- Five model providers: Anthropic, Cohere, HuggingFace (local and remote), OpenAI, and Vertex AI.
- Six extractors to fill prompt templates: Auxiliary, Entities, Nouns, Sentence prefix, Word prefix, and Combined.
- One decoding constrainer: Length constrainer.
- Five metrics to assess task difficulty and dataset quality: MAUVE, Perplexity, Repetition, Diversity, and baseline models.
- Post-processing functions to improve the quality of the datasets and prevent common biases.
- CLI to generate and explore datasets.
- Configuration examples under the `etc/examples` folder to test different tasks and model providers.
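The idea of extractors filling prompt templates can be sketched as follows. Everything here is a hypothetical illustration loosely in the spirit of the Auxiliary and Sentence prefix extractors, not TextMachina's API: the template text, the placeholder names `summary` and `sentence_prefix`, and the helpers `sentence_prefix_extractor` and `fill_template` are assumptions.

```python
import re
from typing import Dict

# Hypothetical template: it only defines placeholders, while any extractor
# parameters (such as k below) live outside it.
TEMPLATE = (
    "Write a news article whose summary is '{summary}', "
    "starting with: {sentence_prefix}"
)


def sentence_prefix_extractor(human_text: str, k: int = 1) -> Dict[str, str]:
    # Hypothetical extractor: take the first k sentences of the human text
    # so the LLM continues from the same starting point.
    sentences = re.split(r"(?<=[.!?])\s+", human_text.strip())
    return {"sentence_prefix": " ".join(sentences[:k])}


def fill_template(template: str, **fields: str) -> str:
    # Fill every {placeholder}; str.format raises KeyError if one is missing.
    return template.format(**fields)


if __name__ == "__main__":
    human = "Rain fell all day. The river rose. Roads closed at night."
    # "summary" plays the role of an auxiliary column from the human dataset.
    fields = {"summary": "Floods close roads", **sentence_prefix_extractor(human)}
    print(fill_template(TEMPLATE, **fields))
```

Keeping extraction separate from template filling lets the same template be reused with different extractors, as long as each one returns values for the placeholders the template declares.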