| Argument | Type | Purpose |
|---|---|---|
|  |  | Create extra columns for governor word, lemma, POS and function |
| `skip_morph` | bool | Enable if you'd like to skip the parsing of morphological and extra fields |
| `v2` | bool/`'auto'` | CONLL-U version of the file. By default, detect from data |
| `drop` | list | List of column names you don't need |
| `add_meta` | bool | Add columns for sentence-level metadata |
| `categories` | bool | Convert columns to categorical format where possible |
| `file_index` | bool | Include filename in index levels |
| `extra_fields` | list/`'auto'` | Names of extra fields in the last column. By default, detect from data |
| `kwargs` | dict | Additional arguments to pass to `pandas.read_csv()` |
Configuring these arguments can speed up parsing considerably, so if speed is important to you, turn off the features you don't need, as in the sketch below.
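A rough illustration of how these arguments fit together follows. The function name `conll_df`, its import path and the file name are assumptions made for the example, not taken from this section; substitute the parser and path you actually use.

```python
from conll_df import conll_df  # assumed entry point; adjust to your install

# turn off the features you don't need for a faster parse
df = conll_df('en-ud-dev.conllu',  # hypothetical CONLL-U file
              skip_morph=True,     # skip morphological and extra fields
              add_meta=False,      # no sentence-level metadata columns
              categories=True,     # use categorical dtypes where possible
              file_index=False)    # don't add the filename as an index level
```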
## Where to from here?
If you're working with Python and CONLL-U, you might want to take a look at tücan, which provides a command-line and web-app interface for exploring CONLL-U datasets.
Alternatively, there's plenty of cool stuff you can do with Pandas by itself. Here are some toy examples:
```python
def searcher(df, column, query, inverse=False):
    """Search column for regex query"""
    bool_ix = df[column].str.contains(query)
    return df[bool_ix] if not inverse else df[~bool_ix]

pd.DataFrame.search = searcher

# get nominal subjects starting with a, b or c
df.search('f', 'nsubj').search('w', '^[abc]').head().to_html()
```
| s | i | w | l | x | p | g | f | e | type | gender | Case | Definite | Degree | Foreign | Gender | Mood | Number | Person | Poss | Reflex | Tense | Voice | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4.0 | authorities | authority | NOUN | NNS | 5 | nsubj | _ | _ | _ | _ | _ | _ | _ | _ | _ | Plur | _ | _ | _ | _ | _ | _ |
| 8 | 2.0 | cells | cell | NOUN | NNS | 4 | nsubj | _ | _ | _ | _ | _ | _ | _ | _ | _ | Plur | _ | _ | _ | _ | _ | _ |
| 9 | 3.0 | announcement | announcement | NOUN | NN | 6 | nsubj:pass | _ | _ | _ | _ | _ | _ | _ | _ | _ | Sing | _ | _ | _ | _ | _ | _ |
| 12 | 3.0 | commander | commander | NOUN | NN | 7 | nsubj | _ | _ | _ | _ | _ | _ | _ | _ | _ | Sing | _ | _ | _ | _ | _ | _ |
|  | 9.0 | bombings | bombing | NOUN | NNS | 11 | nsubj | _ | _ | _ | _ | _ | _ | _ | _ | _ | Plur | _ | _ | _ | _ | _ | _ |
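Since `searcher` is attached to `pd.DataFrame`, calls chain, and `inverse=True` flips the filter. For instance (the queries below are only illustrations, using column names from the output above):

```python
# drop punctuation by UPOS, then keep only passive subjects
df.search('x', 'PUNCT', inverse=True).search('f', 'nsubj:pass').head()
```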
### Create a concordancer
```python
def _conclines(match, df=False, column=False):
    """Apply this to each sentence"""
    s, i = match.name
    sent = df['w'].loc[s]
    match['left'] = sent.loc[:i-1].str.cat(sep=' ')
    match['right'] = sent.loc[i+1:].str.cat(sep=' ')
    formatted = match['w']
    if column != 'w':
        formatted += '/' + match[column]
    match['match'] = formatted
    return match

def conc(df, column, query):
    """Build simple concordancer"""
    # get query matches
    matches = df[df[column].str.contains(query)]
    # add left and right columns
    lines = matches.apply(_conclines, df=df, column=column, axis=1)
    return lines[['left', 'match', 'right']]

pd.DataFrame.conc = conc
lines = df.head(1000).conc('l', 'be')
lines.head(10).to_html()
```
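Because `conc` just runs a regex over whichever column you name, the same method concordances POS or wordclass values too. An illustrative call (the query is an example, not from the original):

```python
# concordance every token tagged as a plural noun in the first 1000 rows
lines = df.head(1000).conc('p', '^NNS$')
lines.head(10)
```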