Generative Adversarial Networks for Text-to-Face Synthesis & Generation: A Quantitative-Qualitative Analysis of Natural Language Processing Encoders for Spanish
This repository contains the code, models and corpus of the project "Generative Adversarial Networks for Text-to-Face Synthesis & Generation: A Quantitative-Qualitative Analysis of Natural Language Processing Encoders for Spanish", published in Information Processing and Management.
This work studies the generation of face images from textual descriptions in Spanish. A cDCGAN was used as the generator, and three text encoders were compared: RoBERTa-large-bne (RoBERTa), RoBERTa-large-bne-celebAEs-UNI (RoBERTa+CelebA, our model) and Sent2vec (Sent2vec+CelebA). The last two models were trained using a Spanish descriptive corpus of the CelebA image dataset.
RoBERTa base BNE trained with data from the descriptive text corpus of the CelebA dataset
The model can be found at the following url: RoBERTa model in Drive.
The new model, called RoBERTa-base-bne-celebAEs-UNI, is the result of fine-tuning the base model RoBERTa-large-bne on a descriptive text corpus of the CelebA dataset in Spanish. For the training, a specific corpus with 249,000 entries was prepared. Each entry consists of two sentences and their similarity value, between 0 and 1, calculated with the spaCy library on their English counterparts. You can download the corpus from this repository at the following link or from the Huggingface repository. Training with the Sentence-Transformers library took 42 days in total, using all the available GPUs of the server with exclusive dedication.
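The corpus layout and fine-tuning step described above can be sketched roughly as follows. This is a minimal sketch, not the authors' training script: the tab-separated line layout, the names `SimilarityPair`, `parse_pair` and `fine_tune`, the batch size and epoch count, and the choice of `CosineSimilarityLoss` are all assumptions. The source only states that each entry holds two sentences plus a similarity value in [0, 1] and that Sentence-Transformers was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityPair:
    """One corpus entry: two Spanish sentences and their similarity score."""
    sentence_a: str
    sentence_b: str
    score: float


def parse_pair(line: str, sep: str = "\t") -> SimilarityPair:
    """Parse 'sentence_a<sep>sentence_b<sep>score' (assumed file layout)."""
    sentence_a, sentence_b, raw_score = line.rstrip("\n").split(sep)
    score = float(raw_score)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"similarity must lie in [0, 1], got {score}")
    return SimilarityPair(sentence_a.strip(), sentence_b.strip(), score)


def fine_tune(pairs, base_model: str, output_dir: str,
              epochs: int = 1, batch_size: int = 16) -> None:
    """Fine-tune `base_model` on scored pairs (hypothetical settings)."""
    # Imported lazily so the parsing helpers work without the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer(base_model)
    examples = [InputExample(texts=[p.sentence_a, p.sentence_b], label=p.score)
                for p in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    # Assumption: fit the cosine similarity of the two embeddings to the score.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs,
              output_path=output_dir)
```

With such a layout, `fine_tune` would receive the parsed pairs together with the identifier of the base model and an output directory; both values depend on the local setup and are not specified here.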
Spearman's correlation was computed on 1000 test sentences for both the base model and our trained model. As the following table shows, our model obtains better results (correlation closer to 1).
Models | Spearman's correlation |
---|---|
RoBERTa-base-bne | 0.827176427 |
RoBERTa-celebA-Sp | 0.999913276 |
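For readers unfamiliar with the metric in the table, the following is a minimal pure-Python sketch of Spearman's rank correlation. It assumes the evaluation compares the reference similarity values with the similarities predicted by each model; the function names and the sample numbers are illustrative and are not taken from the repository.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustrative values only: reference scores vs. model-predicted similarities.
gold = [0.9, 0.2, 0.5, 0.7]
predicted = [0.8, 0.1, 0.6, 0.65]
print(spearman(gold, predicted))  # both lists have the same ordering
```

Because the coefficient depends only on the ordering of the values, a model scores close to 1 when it ranks sentence pairs in the same order as the reference similarities, even if the absolute values differ.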
- Download the model (full directory) from Drive or Huggingface repository.
- The downloaded model is in a directory named RoBERTa-base-bne-celebAEs-UNI.
- Move the downloaded directory to the same directory where the Python code that will use it is located.
- Install the Sentence-Transformers library for Python using the following command. To learn more about using the library, visit the following link.
pip install -U sentence-transformers
- Write the following code in the Python file to call the library and the model. Captions must be passed as a list of one or more strings in Spanish.
from sentence_transformers import SentenceTransformer

# Path to the downloaded model directory; it must match the directory name on disk.
model_sbert = SentenceTransformer('roberta-large-bne-celebAEs-UNI')
# Captions are passed as a list of strings, one string per face description.
captions = ['La mujer tiene pomulos altos. Su cabello es de color negro. Tiene las cejas arqueadas y la boca ligeramente abierta. La joven y atractiva mujer sonriente tiene mucho maquillaje. Lleva aretes, collar y lapiz labial.']
# encode() returns one embedding per caption; keep the embedding of the first caption.
vector = model_sbert.encode(captions)[0]
print(vector)
- As a result, the encoder will generate a numeric vector whose dimension is 1024.
>>$ print(vector)
>>$ [0.2,0.5,0.45,........0.9]
>>$ len(vector)
>>$ 1024
Sent2vec trained with data from the descriptive text corpus of the CelebA dataset
The model can be found at the following url: Sent2vec Model in Drive or HuggingFace
Sent2vec can be used directly for English texts: since most of these models were trained with English as the original language, all you have to do is download the library and enter the text to be encoded. However, since this work uses text in Spanish, it was necessary to train the model from scratch in this language. The training was carried out using the generated corpus (in this repository) with the following process:
- Generate a corpus of descriptive sentences, in Spanish, covering the characteristics of each face in the CelebA dataset. A total of 192,209 sentences are available for training.
- Apply a pre-processing step that removes accents. Stopwords and connectors were retained as part of the sentence structure during training.
- Install the libraries Sent2vec and FastText, and configure the parameters. The parameters were fixed empirically after several tests: 4,800 feature-vector dimensions, 5,000 epochs, 200 threads, 2 n-grams and a learning rate of 0.05.
With this configuration, training took 7 hours in total with all CPUs working at maximum performance. The result is a .bin file, which can be downloaded from this repository.
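The pre-processing and training steps above can be sketched as follows. This is a hedged illustration, not the authors' script: `strip_accents` and `build_training_command` are hypothetical helpers, the file names are placeholders, and the exact invocation used for the published model is not given in this document. The flags themselves (`-dim`, `-epoch`, `-thread`, `-wordNgrams`, `-lr`) are those of the `fasttext sent2vec` command that ships with the Sent2vec library, filled in with the parameter values listed above.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks (e.g. 'á' -> 'a'); stopwords are left untouched.

    Assumption: plain NFD stripping, which also maps 'ñ' to 'n'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_training_command(corpus_path: str, output_prefix: str,
                           binary: str = "./fasttext") -> list:
    """Assemble the Sent2vec training command with the listed parameters."""
    return [
        binary, "sent2vec",
        "-input", corpus_path,      # one pre-processed sentence per line
        "-output", output_prefix,   # the trainer writes <output_prefix>.bin
        "-dim", "4800",
        "-epoch", "5000",
        "-thread", "200",
        "-wordNgrams", "2",
        "-lr", "0.05",
    ]


print(strip_accents("La mujer tiene pómulos altos y lápiz labial."))
print(" ".join(build_training_command("corpus_es.txt", "sent2vec_model")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the Sent2vec binary has been compiled. Building the command as a list rather than a single string avoids shell-quoting problems with file paths.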
- Download the model from Drive or Huggingface repository.
- The downloaded model is in a file named sent2vec_celebAEs-UNI.bin.
- Move the downloaded file to the same directory where the Python code that will use it is located.
- Install the libraries Sent2vec and FastText. To learn more about the management of the libraries, visit the following link.
- Write the following code in the Python file to call the library and the model.
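import sent2vec

# Path to the downloaded .bin file, placed next to this script.
model_path = "sent2vec_celebAEs-UNI.bin"
s2vmodel = sent2vec.Sent2vecModel()
s2vmodel.load_model(model_path)
# The caption is written without accents, matching the pre-processing used for training.
caption = """El hombre luce una sombra a las 5 en punto. Su cabello es de color negro. Tiene una nariz grande con cejas tupidas. El hombre se ve atractivo"""
# embed_sentence() returns a 2D array with a single row for the caption.
vector = s2vmodel.embed_sentence(caption)
print(vector)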
- As a result, the encoder will generate a numeric vector whose dimension is 4800.
>>$ print(vector)
>>$ [[0.1,0.87,0.51,........0.7]]
>>$ len(vector[0])
>>$ 4800
All code and resources of the present work in this repository are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license.
If you use resources from this repository in your work, please cite the paper published in Information Processing and Management:
@article{YAURILOZANO2024103667,
title = {Generative Adversarial Networks for text-to-face synthesis \& generation: A quantitative–qualitative analysis of Natural Language Processing encoders for Spanish},
journal = {Information Processing \& Management},
volume = {61},
number = {3},
pages = {103667},
year = {2024},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2024.103667},
url = {https://www.sciencedirect.com/science/article/pii/S030645732400027X},
author = {Eduardo Yauri-Lozano and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro}
}