forked from karpathy/minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training, adapted to be used for tabular predictions


FelixWick/tabGPT

 
 


tabGPT

An adaptation of Andrej Karpathy's minGPT for tabular data.

Procedure:

  • create a column embedding for each column:
    • column name/description: sentence embedding (forward pass of GPT2 with language-pre-trained weights, then take the mean of the last hidden states over all tokens in the sequence)
    • categorical values: for each row, embed the value with the same procedure as for the column name, then add the two embeddings
    • numerical values: multiply each element of the column name embedding by the numerical value
  • concatenate the embeddings of the different columns along the sequence dimension
  • run the adjusted minGPT:
    • instead of tokenization and learned embeddings, take the column embeddings as input
    • without positional encoding
    • instead of next-token prediction, use a classification or regression head (same as GPT2ForSequenceClassification in Hugging Face's transformers)
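
The input construction described above can be sketched as follows. This is a minimal illustration, not the repository's code: `embed_text` is a hypothetical stand-in that returns a deterministic pseudo-random vector instead of running GPT2, and the column names, values, and the toy embedding size are assumptions made up for the example.

```python
import hashlib

import numpy as np

EMBED_DIM = 8  # toy size chosen for the sketch (assumption)


def embed_text(text: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Hypothetical stand-in for the GPT2 sentence embedding.

    In the procedure above this would be the mean of GPT2's last hidden
    states over the tokens of `text`. Here it is a deterministic
    pseudo-random vector so the sketch runs without model weights.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)


def embed_row(row: dict, categorical: list, numerical: list) -> np.ndarray:
    """Return one (num_columns, dim) sequence of column embeddings for a row."""
    column_embeddings = []
    for col in categorical:
        # categorical: column name embedding + embedding of the cell value
        column_embeddings.append(embed_text(col) + embed_text(str(row[col])))
    for col in numerical:
        # numerical: scale every element of the column name embedding by the value
        column_embeddings.append(float(row[col]) * embed_text(col))
    # concatenate along the sequence dimension: one "token" per column
    return np.stack(column_embeddings, axis=0)


# Made-up example row (hypothetical column names and values)
row = {"store type": "supermarket", "weekday": "Monday", "price": 2.5}
x = embed_row(row, categorical=["store type", "weekday"], numerical=["price"])
print(x.shape)  # (3, 8)
```

Stacking such arrays over rows gives a tensor of shape (batch, num_columns, dim). In the adjusted minGPT this takes the place of the token embeddings, with no positional encoding added, and the classification or regression head produces the prediction.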

License

MIT
