Releases: Genaios/TextMachina

v0.2.2: Add sentence rewriting and polish documentation.

09 Feb 10:12
024f726

This release adds:

  • Sentence rewriting extractor and packer to generate mixcase datasets. Unlike the gap and masking extractors, this one selects a set of sentences from each document, and the LLM has to rewrite them in its own words.
  • Argument validation in extractors.
  • Removal of private methods from the documentation.
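
The selection step behind sentence rewriting can be pictured with a minimal sketch. This is not TextMachina's implementation: `select_and_rewrite`, the `rewrite` callable (a stand-in for an LLM call), the regex sentence splitter, and the `human`/`generated` labels are all assumptions made for illustration.

```python
import random
import re
from typing import Callable, List, Tuple


def select_and_rewrite(
    document: str,
    rewrite: Callable[[str], str],
    fraction: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Rewrite a random subset of sentences and label every sentence.

    Hypothetical helper: returns (sentence, label) pairs, where the label is
    "generated" for rewritten sentences and "human" for untouched ones.
    """
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    if not sentences:
        return []
    # Pick at least one sentence, and never more than are available.
    k = min(len(sentences), max(1, round(fraction * len(sentences))))
    chosen = set(random.Random(seed).sample(range(len(sentences)), k))
    return [
        (rewrite(s), "generated") if i in chosen else (s, "human")
        for i, s in enumerate(sentences)
    ]
```

Calling `select_and_rewrite(text, my_llm_call)` with any function that maps a sentence to its rewritten form yields sentence-level labels that a packer could then turn into a mixcase example.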

v0.2.1: Add documentation and fix naming.

26 Jan 18:22
b8d6b39

This release adds:

v.0.2.0: More providers, extractors, examples, and refactor🥳

25 Jan 12:04
10ed71b

The 0.2.0 release of TextMachina includes:

  • New providers: Amazon Bedrock, AI21, Azure OpenAI, and inference servers (vllm and trt).
  • Refactor the HuggingFace Remote provider to handle retries through HTTPAdapter.
  • Two new extractors for mixcase tasks: sentence_masking and word_masking. Unlike the sentence_gap and word_gap extractors, LLMs must reconstruct the masked spans within whole texts, instead of writing text between boundaries.
  • Extend the dataset generator for mixcase tasks to consider masking extractors.
  • Add config examples to learn about the extractors.
  • Small refactors: colors in logger, inheritance in some tokenizers, etc.
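
The difference between the gap and masking extractors can be illustrated with a small sketch. Everything here is assumed for illustration: the function names, the `<mask>` token, and the shape of the model inputs are hypothetical and are not taken from TextMachina.

```python
from typing import List, Set, Tuple

MASK = "<mask>"  # assumed placeholder token, chosen only for this sketch


def mask_words(words: List[str], positions: Set[int]) -> str:
    """Masking style: the model sees the whole text with some words hidden."""
    return " ".join(MASK if i in positions else w for i, w in enumerate(words))


def gap_boundaries(words: List[str], start: int, end: int) -> Tuple[str, str]:
    """Gap style: the model sees only the text before and after words[start:end]."""
    if not 0 <= start <= end <= len(words):
        raise ValueError("expected 0 <= start <= end <= len(words)")
    return " ".join(words[:start]), " ".join(words[end:])
```

In the masking case the model fills each hidden position in context; in the gap case it writes free text that must connect the two boundaries.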

v.0.1.0: Mixcase tasks and more 🥳

19 Jan 16:40
48c68a5

This release of TextMachina includes:

  • Allow parameters to be passed to the extractors outside of the prompt templates. The templates must be used only to define placeholders.
  • Add MixCaseDatasetGenerator to generate datasets for mixcase tasks (detection tagging). Other datasets, such as mixcase classification, can be built outside TextMachina from the datasets this generator produces.
  • Add sentence_gap and word_gap extractors for mixcase tasks.
  • Refactor interactive exploration. Now we have one class per task, and each one must build its own panels.
  • Add exploration for mixcase datasets.
  • Add a TokenClassificationMetric to evaluate HF models on mixcase and boundary tasks.
  • Better structured and documented examples. Now we have examples/learning to illustrate how to use providers/tasks/extractors and examples/use_cases with additional config files.
  • Minor quality-of-life changes: require task_type in the CLI to prevent confusion, disable random_sample_human on boundary detection tasks, etc.
  • Document all the new code and improve existing documentation.
  • Extend the README to cover mixcase tasks and include figures that visualize each type of task.
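
To make token-level evaluation of mixcase tagging concrete, here is a minimal sketch of precision, recall, and F1 computed over per-token tags. It is not the library's TokenClassificationMetric; the function name `token_prf` and the `generated`/`human` tag names are assumptions made for illustration.

```python
from typing import Dict, Sequence


def token_prf(
    gold: Sequence[str], pred: Sequence[str], positive: str = "generated"
) -> Dict[str, float]:
    """Token-level precision, recall, and F1 for one positive tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    # Fall back to 0.0 when a denominator is zero (no predictions or no positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

A boundary task could be scored the same way by tagging every token after the boundary as `generated`.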

v0.0.10

09 Jan 08:28
  • Update the arXiv citation in the README.

v0.0.9

08 Jan 16:46
e45ebfb

First release 🎉

First release of TextMachina that includes:

  • Dataset generators: for detection, attribution, and boundary detection tasks.
  • Five model providers: Anthropic, Cohere, HuggingFace (local and remote), OpenAI, and Vertex AI.
  • Six extractors to fill prompt templates: Auxiliary, Entities, Nouns, Sentence prefix, Word prefix, and Combined.
  • One decoding constrainer: Length constrainer.
  • Five metrics to assess task difficulty and dataset quality: MAUVE, Perplexity, Repetition, Diversity, and baseline models.
  • Post-processing functions to improve the quality of the datasets and prevent common biases.
  • CLI to generate and explore datasets.
  • Configuration examples, under the folder etc/examples, to test different tasks and model providers.
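
As a rough illustration of what repetition and diversity metrics measure, the sketch below uses one common n-gram formulation from the text-generation literature. It is an assumption that this matches the intent of the metrics listed above; TextMachina's own definitions may differ, and the function names are hypothetical.

```python
from typing import Sequence


def repetition_n(tokens: Sequence[str], n: int) -> float:
    """Fraction of n-grams that are duplicates: 1 - unique / total."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens: nothing can repeat
    return 1.0 - len(set(ngrams)) / len(ngrams)


def diversity(tokens: Sequence[str], ns: Sequence[int] = (2, 3, 4)) -> float:
    """Product of (1 - repetition_n) over the given n-gram sizes."""
    score = 1.0
    for n in ns:
        score *= 1.0 - repetition_n(tokens, n)
    return score
```

Under this formulation, a text with no repeated n-grams scores a diversity of 1.0, and the score drops toward 0 as repetition grows.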