The data in this dataset can be used in various ways. First of all, it's a CLDF dataset, so many of the recipes from the CLDF cookbook apply.
Arguably the most important part of the dataset is the set of "language polygons", i.e. the geographic features representing areas where particular languages are spoken. These are linked from the CLDF data (via the speakerArea property and a language's ID), with features represented as GeoJSON features.
The three sets of GeoJSON features
- ECAI shapes, i.e. cleaned up and aggregated shapes from ECAI's geo-registered GIS dataset
- language level areas, i.e. aggregated areas for Glottolog languages
- family level areas, i.e. aggregated areas for Glottolog top-level families and isolates.
can be used with GeoJSON-aware software like QGIS or https://geojson.io right away.
But the data can also be accessed programmatically, e.g. using the pycldf package.
In the following we'll show how this can be done, roughly following the example given in this blogpost, i.e. we will investigate the data for three languages spoken in the Pacific around the antimeridian:
- Akei, spoken on Espiritu Santo in Vanuatu,
- Western Viti Levu-Yasawas Fijian, spoken on Viti Levu in Fiji (there is no data on Fijian in the dataset, because the area labeled "EASTERN FIJIAN" in the Language Atlas covers the speaker areas of multiple languages of the Eastern Fijian group),
- Samoan, spoken in Samoa.
We start by using pycldf to retrieve language objects corresponding to these three languages:
>>> from pycldf import Dataset
>>> ds = Dataset.from_metadata('cldf/Generic-metadata.json')
>>> lgs = {lg.cldf.name: lg for lg in ds.objects('LanguageTable')
... if lg.cldf.name in ['Akei', 'Western Viti Levu-Yasawas Fijian', 'Samoan']}
We can now create GeoJSON data to draw a map like the one in the blogpost, with markers for the three
languages and lines connecting these markers. For the point markers, we can (roughly) compute
centroids of the speaker areas using shapely. To make sure we do not run into problems with the
antimeridian, we translate the speaker areas to pacific centered geometries using cldfgeojson
functionality:
>>> from shapely.geometry import shape
>>> from cldfgeojson import pacific_centered, feature_collection
>>> geojson = feature_collection([])
>>> centroids = {}
>>> for lname, lg in lgs.items():
...     feature = pacific_centered(lg.speaker_area_as_geojson_feature)
...     centroids[lname] = centroid = shape(feature['geometry']).centroid
...     geojson['features'].append(feature)
...     geojson['features'].append({"type": "Feature", "geometry": {"type": "Point", "coordinates": [centroid.x, centroid.y]}})
...
>>> for start, end in [('Akei', 'Western Viti Levu-Yasawas Fijian'), ('Western Viti Levu-Yasawas Fijian', 'Samoan')]:
...     geojson['features'].append({"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[centroids[start].x, centroids[start].y], [centroids[end].x, centroids[end].y]]}})
...
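To illustrate why the translation to pacific centered coordinates matters, here is a minimal sketch in plain Python. It assumes the simplest possible rule, shifting negative longitudes by 360 degrees, which may differ from what pacific_centered in cldfgeojson actually does. The helper names shift_longitude and mean_longitude and the two sample points are made up for this illustration.

```python
def shift_longitude(lon):
    """Map a longitude from the range -180..180 to 0..360 (assumed rule)."""
    return lon + 360 if lon < 0 else lon


def mean_longitude(lons):
    """Naive arithmetic mean of longitudes."""
    return sum(lons) / len(lons)


# Two hypothetical points just west and east of the antimeridian.
lons = [179.0, -179.0]

naive = mean_longitude(lons)
shifted = mean_longitude([shift_longitude(lon) for lon in lons])

print(naive)    # ends up on the prime meridian, far away from both points
print(shifted)  # ends up on the antimeridian, between both points
```

Without such a shift, any averaging of coordinates (as done when computing a centroid) can go badly wrong for areas that straddle the antimeridian.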
A quick way to visualize this GeoJSON is using GitHub flavored markdown's support for GeoJSON. We do this by including the result of
>>> import json
>>> print(json.dumps(geojson))
as GeoJSON snippet in the markdown below (view raw to see the markup and the GeoJSON data):
{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {"title": "Akei", "fill": "#CC5151", "family": "Austronesian", "cldf:languageReference": "akei1237", "fill-opacity": 0.5}, "geometry": {"type": "Polygon", "coordinates": [[[166.64193338852056, -15.416924277293104], [166.64191594264406, -15.41686367511942], [166.64197198799678, -15.416840536561892], [166.64199384093564, -15.416833278401372], [166.67203924324966, -15.407728611033487], [166.72082326159554, -15.3929455751711], [166.76172302370227, -15.38055170786603], [166.7621701720841, -15.380392673577191], [166.76260022807188, -15.380191961276218], [166.76300931714366, -15.379951379249116], [166.76339375367553, -15.37967309498194], [166.76375007414686, -15.379359615633161], [166.7640750683442, -15.379013765445826], [166.76436580828332, -15.378638660302975], [166.76461967458832, -15.37823767965558], [166.76859499968035, -15.371245028366948], [166.76859784953749, -15.371240293801407], [166.768640680433, -15.371179566806445], [166.76869312077255, -15.371210975173526], [166.76871234375847, -15.371224472817445], [166.83102948501295, -15.417547750293961], [166.83103354584983, -15.417550910080365], [166.83109087110938, -15.41760172097008], [166.83112821288444, -15.41764137367468], [166.831142135776, -15.417658197704466], [166.86474998909873, -15.460314319229443], [166.86481735999132, -15.460405794429182], [166.86487864588653, -15.460501453137084], [166.9010959082561, -15.521294714971685], [166.90109895361567, -15.521300153820087], [166.90116876197428, -15.521451398358588], [166.9011757278643, -15.52147037805465], [166.91835650983325, -15.572889272150778], [166.91835816405992, -15.572894668202178], [166.9183753552558, -15.572969864439651], [166.91832664107758, -15.573010321699538], [166.91830710738188, -15.573024274440497], [166.84944209071452, -15.619437923716859], [166.84906161420446, -15.619721512894799], [166.84870973562843, -15.62003989421379], [166.84838962450027, -15.620390199883436], 
[166.84810416419148, -15.620769274557698], [166.84785592595952, -15.621173703756373], [166.84764714578736, -15.621599844620718], [166.84747970424317, -15.622043858726162], [166.83991978802914, -15.645391316154631], [166.83991792561378, -15.645396618866684], [166.8398870095455, -15.6454678843747], [166.83982784896196, -15.645490935339902], [166.83980501338908, -15.645497996298024], [166.813949357226, -15.652739606204394], [166.81394351329342, -15.652741079377842], [166.81385904688972, -15.65275566387882], [166.81379195049018, -15.65274585482926], [166.81376689026274, -15.65274021833699], [166.79327304012327, -15.647513246505774], [166.793266387034, -15.647511335273117], [166.79309277048839, -15.647444609265154], [166.793072554246, -15.647434666421297], [166.7721516314498, -15.636280740326505], [166.772146232204, -15.636277671768248], [166.77207303168902, -15.636228100797231], [166.77203076187695, -15.636172388266614], [166.77201645326585, -15.636150089010709], [166.64516708504598, -15.425046590564198], [166.6451646999612, -15.425042431079902], [166.64510818767653, -15.424928733648896], [166.64510216786798, -15.42491449100615], [166.6419386898315, -15.41693865398237], [166.64193338852056, -15.416924277293104]]]}}, {"type": "Feature", "geometry": {"type": "Point", "coordinates": [166.78726225882488, -15.507664584936174]}}, {"type": "Feature", "properties": {"title": "Samoan", "fill": "#CC5151", "family": "Austronesian", "cldf:languageReference": "samo1305", "fill-opacity": 0.5}, "geometry": {"type": "MultiPolygon", "coordinates": [[[[191.77741209014036, -14.616373298300587], [191.77740648725154, -14.616315585244738], [191.77740589559892, -14.616293063794163], [191.77740589559892, -14.457278597817705], [191.77740603197654, -14.457273111728536], [191.7774131587848, -14.457200003975233], [191.7774731542283, -14.457207192795478], [191.77749604779135, -14.457211635425658], [191.90765334478837, -14.486135479202773], [192.04084616003504, -14.51573388259092], 
[192.04085098142792, -14.515735079172112], [192.04091004931743, -14.51575432838815], [192.04086468231014, -14.515776984523928], [192.04084522293593, -14.515785013455496], [191.77748501232506, -14.616352716586846], [191.77748001026532, -14.616354489264168], [191.77741209014036, -14.616373298300587]]], [[[190.3222530155225, -14.164611197187446], [190.3223129339139, -14.164609808978433], [190.32233617497735, -14.164610963830443], [190.51029319501185, -14.179069196140786], [190.5102986535357, -14.179069751074387], [190.5103717155113, -14.179082496631192], [190.5103730211001, -14.17914348793308], [190.51037185369455, -14.179166774800668], [190.50261581269606, -14.279995307780942], [190.50261525776273, -14.280000766302214], [190.50260251220578, -14.280073828278928], [190.50254152090417, -14.280075133867749], [190.50251823403622, -14.280073966462167], [190.34892743424533, -14.2682592895552], [190.34892203930974, -14.268258744593803], [190.3488489276718, -14.268246182042827], [190.3488282204516, -14.26818888900002], [190.34882191553115, -14.26816673183836], [190.32226538773227, -14.164688553839085], [190.32226415938692, -14.164683217233952], [190.3222530155225, -14.164611197187446]]], [[[189.16580919637028, -14.321432671819622], [189.16584519185855, -14.321383402220162], [189.16586035217148, -14.321365524118077], [189.17979639745573, -14.305802850312581], [189.17980021005914, -14.305798791828709], [189.17990886284463, -14.305698114448635], [189.17992291523535, -14.305686851815203], [189.19130096839086, -14.29704405249774], [189.19130580102262, -14.297040573479071], [189.1913817562055, -14.29699406700405], [189.1914389298276, -14.296966847339942], [189.191462585489, -14.296957681060887], [189.30727492746544, -14.255769717181426], [189.30728027797736, -14.25576795860797], [189.30742559579647, -14.255731252764466], [189.3074435373597, -14.2557280827141], [189.32487514822176, -14.253119372784047], [189.3249940380257, -14.25310572348709], [189.32511358320698, 
-14.253100242753238], [189.4210202344738, -14.251990670247908], [189.42102595959545, -14.251990748922264], [189.42110419124901, -14.251997598710961], [189.42113328089087, -14.252055046080136], [189.42114265692396, -14.252077533183888], [189.42712117598873, -14.267606962913627], [189.42712299333613, -14.267612043279913], [189.42714281641796, -14.267682607345947], [189.427092954783, -14.267716482713189], [189.4270731453324, -14.26772800946215], [189.41049619347015, -14.276804155052186], [189.4104905774804, -14.276807033591792], [189.41040323285782, -14.276843383359264], [189.41033863942437, -14.276862141602823], [189.41031225581364, -14.276867578646774], [189.35303054501532, -14.286905662006944], [189.3525375441819, -14.287018012642], [189.35205841155974, -14.2871795812569], [189.35159804718998, -14.287388715505662], [189.35116115917276, -14.287643276593347], [189.35075221551793, -14.287940661149294], [189.24519541162252, -14.373184426687759], [189.24519100468888, -14.37318781193184], [189.24512693924123, -14.37323018547162], [189.24506535566616, -14.373217967111604], [189.24504256447722, -14.37321174207758], [189.23967530456295, -14.371591926758152], [189.2396706676219, -14.371590421316382], [189.23960320737413, -14.371563912198706], [189.23955701861473, -14.371541045797553], [189.23953908525556, -14.371530771272262], [189.22262848693947, -14.361325316105695], [189.22258582003818, -14.36129855062904], [189.22252133915103, -14.361255196330877], [189.16586992205248, -14.321484257251125], [189.16586548825688, -14.321480977239638], [189.16580919637028, -14.321432671819622]]], [[[187.93580439626868, -13.870433073250515], [187.93582408197588, -13.870374406775182], [187.93583298207622, -13.87035292242051], [187.94076351159, -13.859269565584727], [187.94076620003955, -13.859263937067563], [187.94081129290572, -13.859185935968368], [187.94085823806878, -13.859134831719217], [187.94087772828942, -13.859116706794401], [187.9711684958507, -13.83256861117792], 
[187.97117252516358, -13.83256523767245], [187.97123265602255, -13.832521270368808], [187.97128786286382, -13.832500619547295], [187.97130923034595, -13.832494192722459], [188.09375530770197, -13.798932372023561], [188.09376146818394, -13.798930863299567], [188.09392891427026, -13.798903716148523], [188.09394943294524, -13.798902100783435], [188.11657406200368, -13.797798506511256], [188.11663257427662, -13.797796829096038], [188.11672287059352, -13.797797565486839], [188.1782071690452, -13.799739875781368], [188.17821266145015, -13.799740179215219], [188.17829408549764, -13.799750309946706], [188.178351014276, -13.799763067881544], [188.17837342711277, -13.79976973554953], [188.2524494905055, -13.82389930473881], [188.25247318113358, -13.823907246876418], [188.25250953440016, -13.823920072691491], [188.38133466577497, -13.870824045264905], [188.38133995359445, -13.870826117696934], [188.38147646486394, -13.870891099422535], [188.3814924353208, -13.87090018942852], [188.55630053009617, -13.976640143887977], [188.55630520721039, -13.976643127845966], [188.55636937379649, -13.97669063041203], [188.55641085348014, -13.976736035631347], [188.556425515953, -13.976754513728386], [188.56978913577433, -13.994526955179584], [188.56979271185125, -13.994531986481778], [188.56983787925122, -13.994607340886448], [188.56985274816066, -13.994675229253096], [188.5698563587336, -13.994701309767834], [188.57134650393274, -14.008402367015604], [188.57134699313636, -14.008408263332981], [188.57134771984718, -14.008493565588882], [188.57133477180943, -14.00855798473601], [188.57132798351876, -14.008582255463127], [188.55859470929337, -14.049660550258674], [188.5585929516663, -14.049665765025047], [188.55856411684385, -14.049734662497384], [188.5585030255497, -14.049728399862422], [188.5584799889278, -14.049724361670814], [188.53540754780505, -14.045042969414832], [188.535160151225, -14.044999213296265], [188.52077672406426, -14.042826405640367], [188.52037131617803, 
-14.042781984982566], [188.48041645378873, -14.040047203126662], [188.48000205819835, -14.040036064067365], [188.47958816413225, -14.040059290905521], [188.45448311075074, -14.042515464411728], [188.45444993551877, -14.042518328108608], [188.45439867214478, -14.042521673810247], [188.4117051342757, -14.044737371049086], [188.411640153573, -14.044739291672762], [188.4115398792723, -14.044738154208966], [188.352251269514, -14.042522059805462], [188.35219108339453, -14.042518562553727], [188.35209851116178, -14.042509651977461], [188.23249461083338, -14.028077458167122], [188.2324645388877, -14.028073511769408], [188.2324182685963, -14.028066539827565], [188.08950210355178, -14.004755379559475], [188.08949684831958, -14.004754397698445], [188.08942342562972, -14.00473552552351], [188.08937166986107, -14.00470520208492], [188.08935288838046, -14.004692436504275], [187.95004427345157, -13.904805378000448], [187.95003867481802, -13.904801098456787], [187.9498988901647, -13.904673072595639], [187.94988345938714, -13.904656083748582], [187.94210517568362, -13.895496328808173], [187.94210182031537, -13.895492188647161], [187.94205825995886, -13.895430446126303], [187.9420384924435, -13.895374018916788], [187.94203245057415, -13.895352205281085], [187.93581723706052, -13.870514089812644], [187.93581603502471, -13.870508751762175], [187.93580439626868, -13.870433073250515]]], [[[187.20199224340647, -13.517031067910542], [187.202014311868, -13.516973585098954], [187.20202407106885, -13.516952613609122], [187.20943752073592, -13.502040784225322], [187.20944023330415, -13.502035656847747], [187.20948338577475, -13.501966821102322], [187.20954088086356, -13.501935135095136], [187.20956332374232, -13.501924835911408], [187.2263502077542, -13.494773222915114], [187.22635552022444, -13.494771112392309], [187.22643690894662, -13.494745333339978], [187.22649632595346, -13.494732911600492], [187.22652043105282, -13.494729647986349], [187.24302750133316, -13.492943773273952], 
[187.24303279260238, -13.492943323168989], [187.24310747733998, -13.492941956656548], [187.24316373599066, -13.492962517588362], [187.24318435575395, -13.49297165304838], [187.25254777146148, -13.497402927792578], [187.25296575137597, -13.497578018434867], [187.25339786742697, -13.49771454557641], [187.2538405699723, -13.49781138770713], [187.25429022240635, -13.497867749311764], [187.32611737917915, -13.503580084809057], [187.32653163184776, -13.503595790417156], [187.3269457617474, -13.503577125856603], [187.32735692211696, -13.503524219429032], [187.38529223952827, -13.493612647732752], [187.38573830379693, -13.493515184962016], [187.43004244731415, -13.481691895977159], [187.43006810173296, -13.481685298689516], [187.43010784559124, -13.481675780809866], [187.50669771452593, -13.46418999664027], [187.50672744961787, -13.464183533670592], [187.5067735070582, -13.464174442002754], [187.5908457714488, -13.448639515338488], [187.590891741195, -13.44863177889947], [187.59096297837928, -13.448621927418523], [187.62109366874304, -13.445026443071937], [187.6210990604536, -13.445025926118786], [187.6212429483505, -13.445021860848291], [187.62126044008906, -13.445022585371303], [187.63358239830106, -13.445845430074645], [187.63358731501017, -13.445845862543415], [187.63371666718976, -13.445865265808301], [187.6337322348422, -13.445868618851867], [187.67850364430473, -13.456593043694902], [187.67850950924847, -13.45659461253368], [187.67859216999054, -13.456623638760323], [187.67864968685737, -13.456658927683234], [187.6786706493219, -13.456674107660461], [187.6941403409015, -13.468566244223272], [187.694145370789, -13.468570339987386], [187.6942706869894, -13.468690585267947], [187.694284606529, -13.468706394806421], [187.75945513654705, -13.547449041805642], [187.75949977322006, -13.547505778634415], [187.75956458089104, -13.547596330691697], [187.7789094856773, -13.57634606310904], [187.7789121233336, -13.576350164722285], [187.7789753133792, -13.576462844990731], 
[187.7789821360777, -13.576477018916048], [187.78862875262965, -13.597738252802387], [187.78864951201834, -13.597786279067503], [187.78867908229668, -13.597861381061849], [187.8136332167984, -13.665350972100564], [187.81363511063356, -13.665356520794893], [187.81365726620592, -13.6654413153807], [187.81366671393525, -13.665502907876236], [187.8136686781756, -13.665527797736118], [187.81419321016466, -13.675758044851387], [187.81419336168486, -13.67576411488324], [187.81418886201837, -13.675851619764394], [187.81417124973265, -13.67591676966025], [187.81416260596995, -13.675941168817893], [187.76968964256415, -13.790906788269675], [187.7696874164021, -13.790912134704357], [187.76965127271902, -13.790983551801116], [187.7695885157384, -13.791001992912875], [187.7695644143007, -13.791007173779779], [187.75706925781225, -13.793333099795786], [187.75706315692466, -13.793334067861547], [187.756970929202, -13.793341449005643], [187.75690517614558, -13.793339517633601], [187.75687897806395, -13.793336704182634], [187.72364160044856, -13.788786078268807], [187.7234299780231, -13.788761684672107], [187.58931221462635, -13.776190556620133], [187.58888722554946, -13.776168909343692], [187.58846193496203, -13.776183477676184], [187.58803942340236, -13.776234156093741], [187.49609420016034, -13.791255510088487], [187.4960893249452, -13.791256201787478], [187.49595925939377, -13.791266619994028], [187.495943375227, -13.791266887381267], [187.47319008746092, -13.791128655856058], [187.47318429025597, -13.791128476258294], [187.47303059238558, -13.791112543788547], [187.47301211364456, -13.791109217782802], [187.45417018963494, -13.78718696734942], [187.45416493156893, -13.787185744576828], [187.4540916969187, -13.787163406017001], [187.4540409332445, -13.787130341176558], [187.45402260472653, -13.78711655200401], [187.40764378887306, -13.750346732939047], [187.40763973286911, -13.75034336501449], [187.4075847500092, -13.750291238358805], [187.4075504182746, -13.750244037231424], 
[187.40753846982594, -13.75022510517337], [187.3922298779217, -13.724588456402218], [187.39196920092346, -13.724195313226797], [187.3916726350753, -13.723828485267807], [187.29113554161353, -13.610821012969144], [187.29065910982143, -13.61035263980284], [187.2304128811259, -13.558568343631851], [187.2303233463814, -13.558485788592513], [187.23023977898356, -13.55839719799209], [187.20925041010508, -13.534532209803372], [187.20924635028666, -13.534527329875763], [187.20919149803686, -13.534449535762386], [187.20915895171288, -13.534390200786282], [187.20914785573504, -13.534365473991418], [187.20201646827954, -13.517108309564861], [187.20201450910696, -13.517103230801066], [187.20199224340647, -13.517031067910542]]]]}}, {"type": "Feature", "geometry": {"type": "Point", "coordinates": [188.3326533995618, -13.85462506887418]}}, {"type": "Feature", "properties": {"title": "Western Viti Levu-Yasawas Fijian", "fill": "#CC5151", "family": "Austronesian", "cldf:languageReference": "west2519", "fill-opacity": 0.5}, "geometry": {"type": "Polygon", "coordinates": [[[177.25166839659624, -17.92274691772458], [177.25165709536608, -17.922682283013458], [177.25165799328525, -17.922619134406826], [177.251659896782, -17.922596071313578], [177.2611168154847, -17.835271858638837], [177.26111757283968, -17.835266189571772], [177.26113381866327, -17.835188161359667], [177.26118052637662, -17.835143336340792], [177.26119926101597, -17.83512780251281], [177.2838882510392, -17.817332404776668], [177.28389367520487, -17.817328400481976], [177.28404777639955, -17.817233674565827], [177.28406738568955, -17.81722387361284], [177.3735691275274, -17.775994642586582], [177.37398137500676, -17.77578100318184], [177.37437202709788, -17.775530043577117], [177.37473771497682, -17.775243927945432], [177.37507528510082, -17.774925123631032], [177.43315441283016, -17.714732727968343], [177.4334880440263, -17.714349810050717], [177.43378110950346, -17.71393502271635], [177.4340305855773, 
-17.71349264550712], [177.4342338982943, -17.713027242621692], [177.4343889499885, -17.712543615824675], [177.43449414092387, -17.712046754904723], [177.4345483857997, -17.71154178619271], [177.4345511249479, -17.71103391967114], [177.43450233010748, -17.710528395220486], [177.42588707778367, -17.652357787829114], [177.4257957665995, -17.651893853980397], [177.42566104000508, -17.65144061994251], [177.42548410284607, -17.651002138939376], [177.42526653745273, -17.650582332260107], [177.42501028948925, -17.650184954191307], [177.42471765055404, -17.649813558442894], [177.42439123768602, -17.64947146636765], [177.4240339699609, -17.649161737258726], [177.423649042386, -17.648887140990684], [177.3918920026864, -17.628426759691703], [177.3918872349553, -17.62842351481771], [177.3918245533869, -17.628373758858967], [177.39181217969937, -17.628309710013582], [177.391809305256, -17.628285332910522], [177.38973468148762, -17.605267609348566], [177.38973432287315, -17.605262212505977], [177.38973447748558, -17.60518717747948], [177.3897721474471, -17.605138970057368], [177.3897875027495, -17.605121874259577], [177.51771885055913, -17.469833178173424], [177.51772295062815, -17.469829054125377], [177.51778853711195, -17.469772039082873], [177.5178390717917, -17.469736484593092], [177.5178603036081, -17.469723749154383], [177.62765190783256, -17.40787742401907], [177.62775068180278, -17.40782586013496], [177.62785254044329, -17.407780694344183], [177.82720806083807, -17.326884459047484], [177.82721325277151, -17.326882500392177], [177.82728541953608, -17.326861061691854], [177.82731884882926, -17.326913152638962], [177.8273301267714, -17.32693387154678], [177.8366786453857, -17.34526627487698], [177.8369069421159, -17.345667121676346], [177.8371712099708, -17.3460452222641], [177.83746919952821, -17.34639735828504], [177.83779837433025, -17.346720532391817], [177.83815593247317, -17.347011993758063], [177.83853883045708, -17.347269261493143], [177.83894380909177, 
-17.347490145759274], [177.83936742123876, -17.347672766411204], [177.90077227268884, -17.370847620544545], [177.90077732131755, -17.370849665802847], [177.90084426253804, -17.37088239561518], [177.9008477379716, -17.370943486772838], [177.9008474343191, -17.370966629677625], [177.86722394177636, -18.226904959295183], [177.86722359767796, -18.22691034626827], [177.86721389917918, -18.22698165935019], [177.86715630781674, -18.22696790681659], [177.8671344446186, -18.22696100261884], [177.6168644607589, -18.140533957312687], [177.61644200373274, -18.140408611432107], [177.52032003025573, -18.116442058959272], [177.51986284015723, -18.11635047226054], [177.51939910642167, -18.11630187254772], [177.51893286188005, -18.11629668246513], [177.5184681611983, -18.116334947147994], [177.5180090456164, -18.116416333829925], [177.47619791364664, -18.125857915632796], [177.4761919171831, -18.12585910341222], [177.47610554778356, -18.125869379543303], [177.47603814847503, -18.12585553649545], [177.4760130963639, -18.12584833779719], [177.3346611840928, -18.080766221841724], [177.334655993116, -18.080764429435963], [177.33458484382274, -18.080734225870128], [177.33454093655828, -18.080690485021936], [177.33452557614035, -18.080672908497085], [177.3022824596152, -18.041792285723567], [177.30227835910853, -18.041787040882895], [177.30218055199438, -18.041637728472338], [177.30217030334722, -18.04161867174885], [177.2622781897116, -17.96151089202923], [177.26227577765994, -17.96150573672825], [177.26222144841805, -17.961364285175517], [177.26221612255208, -17.96134665539175], [177.25167210785912, -17.92276186675403], [177.25166839659624, -17.92274691772458]]]}}, {"type": "Feature", "geometry": {"type": "Point", "coordinates": [177.63344238964277, -17.795599827068596]}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[166.78726225882488, -15.507664584936174], [177.63344238964277, -17.795599827068596]]}}, {"type": "Feature", "geometry": {"type": "LineString", 
"coordinates": [[177.63344238964277, -17.795599827068596], [188.3326533995618, -13.85462506887418]]}}], "properties": {}}
Let's compute the lengths of the two LineStrings. Since we adjusted the geometries to be pacific-centered, we'd expect roughly equal results for both distances.
>>> centroids['Akei'].distance(centroids['Western Viti Levu-Yasawas Fijian'])
11.08486676069407
>>> centroids['Western Viti Levu-Yasawas Fijian'].distance(centroids['Samoan'])
11.401947126675251
These distances are in decimal degrees and computed in the Cartesian plane. Thus, these computations only work near the equator, where WGS 84 coordinates are close enough to a Cartesian plane with units of roughly 110 km.
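The conversion from degrees to kilometers can be sketched without shapely. The snippet below is only an illustration: the centroid coordinates are copied from the Point features in the GeoJSON above, the factor of 110 km per degree is the rough value mentioned in the text, and the function name planar_km is made up.

```python
import math

# Centroid coordinates (lon, lat), copied from the Point features above.
akei = (166.78726225882488, -15.507664584936174)
fijian = (177.63344238964277, -17.795599827068596)
samoan = (188.3326533995618, -13.85462506887418)

KM_PER_DEGREE = 110  # rough value near the equator, as assumed in the text


def planar_km(p, q):
    """Euclidean distance in degrees, scaled to kilometers."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) * KM_PER_DEGREE


akei_fijian_km = planar_km(akei, fijian)
fijian_samoan_km = planar_km(fijian, samoan)
print(round(akei_fijian_km), round(fijian_samoan_km))
```

Both values come out at a little over 1,200 km, which gives a first impression of the scale involved.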
While much of the data in this dataset meets the requirement of "being close to the equator", we can also compute "proper" distances, i.e. geodesic (aka "Great Circle") distances. This can be done for example using spatialite, a library that extends the SQLite database system, adding spatial capabilities. On Ubuntu, mod_spatialite - the SQLite extension module - can be installed running
sudo apt install libsqlite3-mod-spatialite
To load GeoJSON data from this dataset into a spatialite database we use the geojson-to-sqlite tool:
pip install geojson-to-sqlite
geojson-to-sqlite laotpa_languages.sqlite features cldf/languages.geojson --spatialite
We can now investigate the data, e.g. using SQLite's command-line program sqlite3.
First, we need to load the extension module:
$ sqlite3 laotpa_languages.sqlite
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
sqlite> SELECT load_extension('mod_spatialite');
We loaded the features of our GeoJSON into a table features (by passing "features" as TABLE argument to geojson-to-sqlite).
Let's see what the table looks like:
sqlite> pragma table_info(features);
0|title|TEXT|0||0
1|fill|TEXT|0||0
2|family|TEXT|0||0
3|cldf:languageReference|TEXT|0||0
4|fill-opacity|FLOAT|0||0
5|geometry|GEOMETRY|0||0
So each field from the properties member of features in our GeoJSON is available as a column, in particular the title
field, which holds the Glottolog name of the corresponding language. The additional geometry column is the one
spatialite operates on.
Thus, to compute geodesic distances for the two lines of interest, we'll use spatialite's
- Centroid function to compute the centroids of the respective speaker areas,
- makeline function to create the proper input for
- GeodesicLength, which computes the distance between the end points of a line.
sqlite> select GeodesicLength(makeline(Centroid(l1.geometry), Centroid(l2.geometry)))
...> from features as l1, features as l2 where l1.title = 'Samoan' and l2.title = 'Western Viti Levu-Yasawas Fijian';
1225957.11115731
sqlite> select GeodesicLength(makeline(Centroid(l1.geometry), Centroid(l2.geometry)))
...> from features as l1, features as l2 where l1.title = 'Akei' and l2.title = 'Western Viti Levu-Yasawas Fijian';
1184221.69034282
spatialite returns geodesic lengths in meters. Comparing these results to the ones above (multiplied by 110,000,
assuming 110 km for one degree), we find errors of roughly 2 to 3%.
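As a cross-check that needs no spatialite, great-circle distances can also be approximated with the haversine formula. The following is a minimal sketch, assuming a spherical Earth with a mean radius of 6,371,008.8 m. spatialite's GeodesicLength works on an ellipsoid, so small deviations from the values above are expected. The function name haversine_m is chosen for this illustration, and the centroid coordinates are copied from the GeoJSON Point features above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


akei = (166.78726225882488, -15.507664584936174)
fijian = (177.63344238964277, -17.795599827068596)
samoan = (188.3326533995618, -13.85462506887418)

for name, p, q, planar_degrees in [
    ("Akei - Fijian", akei, fijian, 11.08486676069407),
    ("Fijian - Samoan", fijian, samoan, 11.401947126675251),
]:
    great_circle = haversine_m(p, q)
    planar = planar_degrees * 110_000  # planar estimate in meters, as in the text
    error = (planar - great_circle) / great_circle
    print(f"{name}: {great_circle / 1000:.0f} km, planar estimate off by {error:.1%}")
```

The spherical results should land within a fraction of a percent of the spatialite values, which is close enough to confirm the order of magnitude of the planar error.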
In order to provide a useful dataset for language comparison, speaker areas are aggregated on
Glottolog language level. It's still possible to drill down and investigate which ECAI shapes
contributed to such a language-level aggregation and to which Glottocodes these were linked.
The relevant information is available from the dataset's ContributionTable, more specifically
from the Properties column of this table.
This column has datatype JSON, and its values are JSON objects with an optional, array-valued member Glottocodes.
Again, this data could be easily accessed using pycldf, which will read typed data, i.e. provide the JSON values as Python objects:
>>> ds = Dataset.from_metadata('cldf/Generic-metadata.json')
>>> ds.objects('ContributionTable')[100].data['Properties']
OrderedDict([('COUNTRY_NAME', 'Philippines'), ('SOVEREIGN', 'Philippines'), ('Glottocodes', ['isna1241'])])
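Because the Glottocodes member is optional, code consuming these values should not assume it is present. A minimal sketch using only the standard library follows. The first JSON string mirrors the record shown above, the second is a made-up example without the member, and the function name glottocodes is hypothetical.

```python
import json

# JSON text as it might appear in the Properties column.
rows = [
    '{"COUNTRY_NAME": "Philippines", "SOVEREIGN": "Philippines", "Glottocodes": ["isna1241"]}',
    '{"COUNTRY_NAME": "Philippines", "SOVEREIGN": "Philippines"}',  # made-up row
]


def glottocodes(properties_json):
    """Return the list of Glottocodes, or an empty list if the member is absent."""
    return json.loads(properties_json).get("Glottocodes", [])


for row in rows:
    print(glottocodes(row))
```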
But we could also exploit the fact that each CLDF dataset can be converted to a SQLite database and use SQLite's excellent JSON support. First, load the tabular data of the CLDF dataset into an SQLite database:
cldf createdb cldf laotpa.sqlite
Now we can verify that the speaker area for the language Yopno was pieced together from shapes for its four dialects (shown below in the tree generated from Glottolog data)
Nuclear Trans New Guinea [nucl1709]
   └─ Finisterre-Huon [fini1244]
      └─ Finisterre-Saruwaged [fini1245]
         └─ Yupna [yupn1242]
            └─ Kewieng-Bonkiman-Nokopo [kewi1241]
               └─ Yopno [yopn1238]
                  ├─ Isan [isan1244]
                  ├─ Kewieng [kewi1240]
                  ├─ Nokopo [noko1240]
                  └─ Wandabong [wand1268]
using the following SQL:
SELECT
    l.cldf_id AS Language_Glottocode,
    l.cldf_name AS Language_Name,
    gcs.gc AS Dialect_Glottocode,
    c.cldf_name AS Dialect_Name
FROM
    LanguageTable AS l,
    LanguageTable_ContributionTable AS lc,
    ContributionTable AS c,
    (
        SELECT c1.cldf_id, c2.value AS gc
        FROM
            ContributionTable AS c1
        JOIN
            json_each((
                SELECT json_extract(Properties, '$.Glottocodes')
                FROM ContributionTable
                WHERE cldf_id = c1.cldf_id AND json_extract(Properties, '$.Glottocodes') IS NOT NULL
            )) AS c2
    ) AS gcs
WHERE
    l.cldf_id = lc.LanguageTable_cldf_id AND
    lc.ContributionTable_cldf_id = c.cldf_id AND
    lc.ContributionTable_cldf_id = gcs.cldf_id AND
    l.cldf_glottocode = 'yopn1238'
;
Let's break this down:
- The outermost SELECT pulls together related rows from LanguageTable and ContributionTable for the language Yopno, specified by its Glottocode yopn1238. The many-to-many relation between LanguageTable and ContributionTable is mediated through the association table LanguageTable_ContributionTable.
- Since the Glottocodes member in ContributionTable.Properties is array-valued, we need to create individual rows for each Glottocode using json_each. These rows are then joined to each row of ContributionTable.
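The row expansion done by json_each can be tried in isolation with Python's sqlite3 module. The sketch below is a simplified variant, not the query above: it passes the JSON path directly to json_each instead of using a subquery, it assumes that the SQLite library bundled with Python includes the JSON functions, and the table contents (IDs and the grouping of Glottocodes) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ContributionTable (cldf_id TEXT, Properties TEXT)")
conn.executemany(
    "INSERT INTO ContributionTable VALUES (?, ?)",
    [
        # Hypothetical rows, only mimicking the structure of the real table.
        ("c1", '{"Glottocodes": ["kewi1240", "noko1240"]}'),
        ("c2", '{"COUNTRY_NAME": "Philippines"}'),  # no Glottocodes member
    ],
)

# json_each acts as a table-valued function: one output row per array element.
expanded = conn.execute(
    """
    SELECT c.cldf_id, j.value
    FROM ContributionTable AS c, json_each(c.Properties, '$.Glottocodes') AS j
    ORDER BY c.cldf_id, j.key
    """
).fetchall()
print(expanded)
```

Rows whose Properties lack a Glottocodes member contribute nothing to the result, which plays the same role as the IS NOT NULL condition in the full query.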
When stored in a file query.sql and run with sqlite3 via
sqlite3 -header laotpa.sqlite < query.sql
this will compute the following result:
| Language_Glottocode | Language_Name | Dialect_Glottocode | Dialect_Name |
|---|---|---|---|
| yopn1238 | Yopno | kewi1240 | KEWIENG |
| yopn1238 | Yopno | noko1240 | NOKOPO |
| yopn1238 | Yopno | isan1244 | ISAN |
| yopn1238 | Yopno | wand1268 | WANOABONG |
Two variants of geo-referenced Atlas leaves are available in this dataset.
- Geo-referenced images for the EPSG:4326 coordinate reference system in GeoTIFF format can be found at cldf/atlas/*/epsg4326.tif and can be used as raster layer with GIS tools such as QGIS. Below is a screenshot of QGIS, viewing a vector layer of Indonesia's administrative boundaries and a raster layer of leaf L001.
- Images re-projected to web mercator projection in JPEG format: The files can be found at cldf/atlas/*/web.jpg and can be used as image overlays on web maps, e.g. using libraries such as leaflet. Since the JPEG images do not contain any geographic metadata, the geographic bounding boxes in WGS 84 coordinates are provided in cldf/atlas/*/bounds.geojson.
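One detail is easy to get wrong when using these bounding boxes: a GeoJSON bbox lists coordinates as [west, south, east, north], i.e. longitude first, while Leaflet expects corner points as (lat, lon) pairs. The following sketch shows the reordering. The function name bbox_to_leaflet_bounds is made up, and the sample values are those of leaf L001 quoted in this section.

```python
def bbox_to_leaflet_bounds(bbox):
    """Convert a GeoJSON bbox [west, south, east, north] to Leaflet corner order.

    Leaflet expects [[south, west], [north, east]], i.e. (lat, lon) pairs.
    """
    west, south, east, north = bbox
    return [[south, west], [north, east]]


# bbox of leaf L001, as given in cldf/atlas/L001/bounds.geojson
bbox = [128.69369434997662, -9.412595810623932, 142.39506844201907, -0.25903341847583594]
bounds = bbox_to_leaflet_bounds(bbox)
print(bounds)
```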
Plugging the "bbox" (lon, lat) coordinates from cldf/atlas/L001/bounds.geojson
"bbox": [
128.69369434997662,
-9.412595810623932,
142.39506844201907,
-0.25903341847583594
],
and cldf/atlas/L001/web.jpg into minimal HTML as below
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css" integrity="sha256-p4NxAoJBhIIN+hmNHrzRCf9tD/miZyoHS5obTRR9BMY=" crossorigin=""/>
<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js" integrity="sha256-20nQCchB9co0qIjJZRGuk2/Z9VM+kNiyxNV1lvTlZBo=" crossorigin=""></script>
</head>
<body>
<div id='map' style="width: 600px; height: 400px;"></div>
<script>
const map = L.map('map').setView([-4.84, 135.54], 3);
const osm = L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
maxZoom: 19,
attribution: '© <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>'
}).addTo(map);
const imageUrl = 'cldf/atlas/L001/web.jpg';
const latLngBounds = L.latLngBounds([[-9.412595810623932, 128.69369434997662], [-0.25903341847583594, 142.39506844201907]]);
const imageOverlay = L.imageOverlay(imageUrl, latLngBounds, {
opacity: 0.8,
interactive: true
}).addTo(map);
L.rectangle(latLngBounds).addTo(map);
map.fitBounds(latLngBounds);
</script>
</body>
</html>
the resulting page - overlaying the Atlas leaf on OpenStreetMap base tiles - should display as follows in a browser:
Most of the data in this dataset has been collected, transcribed and curated by multiple people. Thus, there's no doubt it will (still) contain errors. But the dataset creation pipeline in this repository has maintenance hatches built in, so we can fix errors and recreate and release the dataset easily - and will do so periodically. Any errors should be reported at https://github.com/cldf-datasets/languageatlasofthepacificarea/issues, in particular:
- incorrect assignment of Glottolog languoids. (It should be noted, though, that "correctness" means fidelity to the source, not necessarily "reality". So if the source claims that a language that can be clearly matched to a Glottolog language is spoken in some location, but this is factually not the case, the dataset would still be correct in relaying Wurm & Hattori's claim.)
- incorrect digitization of the source, e.g. mis-labeled polygons.
A good way of investigating fidelity of the digitization to the source (Wurm & Hattori's Atlas) is browsing maps where the polygons are overlaid on geo-referenced scans. Such maps can be created by running a cldfbench subcommand provided with this repository.
After installing the commands via
pip install -e .
you can run
cldfbench laotpa.browser
to create a set of HTML pages in a subdirectory language_atlas_of_the_pacific_area that can be navigated by pointing your browser to language_atlas_of_the_pacific_area/index.html.
The index page provides a list of Atlas leaves with clickable titles leading to the individual pages, and equally clickable polygons on a map, depicting the extent of the geo-referenced area on the corresponding leaf.
Pages for individual leaves provide a list of languages on the right hand side, with clickable names to highlight the speaker area on the map. Both the polygon layer and the image overlay can be toggled using the layer control on the map. Opacity of the image overlay can be controlled with the corresponding range slider. Clicking on a polygon will open a popup window listing all language (variety) names given in the ECAI data for shapes aggregated under the given Glottolog language.