FR: when ingesting data, give all columns labelled in Stata/SPSS/etc the `haven_labelled` class #762
Background
Currently, when `haven` ingests a Stata `.dta` file, it preserves a column's Stata data attributes in different ways depending on which attributes it finds:
- If a column has both a variable label and value labels, `haven` adds the `haven_labelled` and `vctrs_vctr` classes and stores these attributes in the `label` and `labels` attributes, respectively.
- If a column has only a variable label, `haven` does not add any classes and stores the label in the `label` attribute.
Rationale
This is all well and good: `haven` does the right thing by preserving Stata data attributes.
However, the different methods for preserving those attributes sometimes matter.
The main rationale is a desirable side effect of labelled columns having additional classes: when two data frames are combined with `purrr::list_rbind()` or `vctrs::vec_rbind()` (which `purrr::list_rbind()` calls), the data attributes preserved by `haven` are kept only for columns with additional classes.
See also issues here and here.
How haven stores data attributes
Here are some example Stata files: examples_stata_files.zip
Here's the Stata code that generated them
Here's how haven captures those Stata data. Note the difference between `var1`, which has both a variable label and value labels, and `var2`/`var3`, which have only a variable label.
Created on 2024-10-10 with reprex v2.1.1
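For readers without the example files, the contrast can be sketched roughly as follows. This is not the original reprex: the data values and label text are made up, and the file is a temporary `.dta` written with `haven::write_dta()` and read back with `haven::read_dta()`.

```r
library(haven)

# Hypothetical stand-in for the attached example files.
original <- tibble::tibble(
  # var1: variable label plus value labels
  var1 = labelled(c(1, 2, 1), labels = c(Yes = 1, No = 2), label = "Variable 1"),
  # var2 and var3: variable label only
  var2 = structure(c(10, 20, 30), label = "Variable 2"),
  var3 = structure(c(0.5, 1.5, 2.5), label = "Variable 3")
)

# Round-trip through a temporary Stata file
path <- tempfile(fileext = ".dta")
write_dta(original, path)
ingested <- read_dta(path)

# Inspect how each kind of column came back
class(ingested$var1)
attributes(ingested$var2)
```

Comparing `class()` and `attributes()` across the columns shows which ones carry the extra classes and which ones carry only a `label` attribute.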
How binding data frames drops attributes of columns without additional classes
Created on 2024-10-10 with reprex v2.1.1
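A minimal sketch of the binding problem, again with made-up data rather than the original reprex: `var1` is built with `haven::labelled()` so it carries the additional classes, while `var2` is a plain double with only a `label` attribute.

```r
library(haven)
library(vctrs)

# Hypothetical data frame mirroring the two kinds of column.
df <- tibble::tibble(
  var1 = labelled(c(1, 2), labels = c(Yes = 1, No = 2), label = "Variable 1"),
  var2 = structure(c(10, 20), label = "Variable 2")
)

combined <- vec_rbind(df, df)

# Compare the variable labels before and after binding
attr(df$var1, "label")
attr(combined$var1, "label")
attr(df$var2, "label")
attr(combined$var2, "label")
```

Per the behavior described in this issue, the label on `var1` should survive the bind while the label on `var2` should not.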
(As an aside, I noticed that this behavior does not occur when there is only 1 column with a variable label but no value labels. In that corner case, that column is given the haven class.)
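To illustrate what the request would change, here is a sketch that applies the class by hand before binding. It is only a possible workaround, not something haven does on ingest; the helper name `add_labelled_class()` is hypothetical, and it only handles numeric and character columns, since those are the types `haven::labelled()` accepts.

```r
library(haven)
library(vctrs)

# Hypothetical helper: wrap every column that has a "label" attribute
# but no haven_labelled class in haven::labelled(), keeping the label.
add_labelled_class <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    lbl <- attr(col, "label", exact = TRUE)
    supported <- is.numeric(col) || is.character(col)
    if (!is.null(lbl) && supported && !is.labelled(col)) {
      df[[nm]] <- labelled(col, label = lbl)
    }
  }
  df
}

# Made-up column with a variable label only
df <- tibble::tibble(var2 = structure(c(10, 20), label = "Variable 2"))

fixed <- add_labelled_class(df)
combined <- vec_rbind(fixed, fixed)

class(combined$var2)
attr(combined$var2, "label")
```

With the class in place, the bind goes through haven's own `vctrs` methods for `haven_labelled`, which is why the label is expected to be carried along.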