X03. London prototype: Open data assessment for London UK 2021

Polly Hudson edited this page Nov 16, 2022 · 1 revision

Currently being edited by polly@64

Introduction

This section discusses problems with the availability of open spatial data for each of the 12 core data categories, covering stock composition, operation and dynamics, that are required for inclusion in the Colouring London prototype. The Table below shows the availability of open spatial data types, at building or sub-building level, for Greater London. Information on access is divided into:

  • academic access only (facilitated by public sector mapping agreements)
  • commercial data/paid-for access
  • open access for London
  • open access for the UK

Blue cells represent open data availability and red cells, absence or restrictions. The number of datasets marked in red, in the third and fourth columns, illustrates the scale of the task faced when building open data platforms for London and other cities in the UK. Where partial access is available, this is recorded in pale blue. Where commercial access is not relevant, as data are released already, or if data access is unknown, the box is marked in grey. Sources for each data type for London, and the access problems relating to them, are discussed in the remainder of the chapter.

Chapter open availability open availability 2

1. LOCATION. Streetnames, postcodes, coordinates and addresses

Overview

Significant issues exist in terms of accessing open spatial location data for the UK. Though some progress has been made, restrictions continue to be imposed on building footprints and address data, and these continue to block progress in urban problem solving.

OS MasterMap (OSMM) is an Ordnance Survey product that provides 100% footprint coverage for Britain, geometric precision, systematic updating, and the tracking of footprint lifecycles (Ordnance Survey, 2021b). In the UK, footprints are contained within the OS MasterMap Topography Layer product, overseen by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Ordnance Survey, the UK’s national mapping agency (Ordnance Survey, 2021a). Comprehensive open footprints are not available in the UK. Since 2016, Ordnance Survey has given permission for OSMM to be used to test the Colouring London prototype. ADD more info on agreement. For this, a separate agreement was also set up with the Greater London Authority, which offered access to its OSMM licence. A similar arrangement is being explored with OS and government departments with regard to the roll-out of Colouring Britain. OSMM footprints for London are the basic building block from which Colouring London is built. Without these, comprehensive, updated spatial data on current and historical stocks cannot be collated, captured, visualised or released. Nor can spatial patterns relating to composition and change be easily understood, since point data offer only limited scope in stock visualisations.

Colouring London's collaboration with Ordnance Survey has allowed the platform to demonstrate how and why the open release of comprehensive, updated national mapping agency data at building level is essential to meet urban sustainability and resilience goals. The project also shows how the release of footprint data has the capacity to unlock vast repositories of currently untapped knowledge about the city's buildings, held by stakeholders who are willing to donate it for the common good. Though Colouring London is permitted to visualise spatial data using OSMM, and since 2020 has been allowed to offer open downloads of data with a UPRN-linked centroid coordinate, it still cannot access government-held UK address data, nor is it able to release footprint geometries, or fully exploit these using computational approaches to infer many types of building characteristic.

A comprehensive database of UK address data is held by OS, with addresses released within its restricted AddressBase commercial products (Ordnance Survey, 2021f). The database draws from the National Address Gazetteer (NAG), run by Geoplace, which is owned and managed as part of a joint venture partnership between the Local Government Association and OS (Geoplace, 2021a). Access to this is restricted, despite the fact that address data are publicly owned and, in the words of Geoplace’s strapline, 'underpins our lives' (Geoplace, 2021b). Though street geometry and street names for Greater London are available through OpenStreetMap, with open postcode data for the city available from OS Code-Point Open (Ordnance Survey, 2021e) and open addresses from OpenAddresses (OpenAddresses, 2021), these are neither comprehensive nor regularly updated.

As a result of the work of the open data movement, including Wikipedia, OpenStreetMap and The Open Data Institute, and sustained pressure from built environment stakeholders and others on the UK government to release location data and footprint geometry data, in 2017 the UK Treasury announced that OS's premium product, OS MasterMap (OSMM), would begin to be opened up for public use, owing to its potential significance to the UK economy. In 2019 The Geospatial Commission, comprising OS and other holders of large-scale spatial datasets, was also set up by the government to increase use of geospatial data. Further information on the opening up of UK location data can be found [here](https://www.ordnancesurvey.co.uk/business-government/public-sector-geospatial-agreement) and [here](https://www.geoplace.co.uk/new-freedoms-to-share-uprns-and-usrns).

In 2020, restrictions were lifted on centroid coordinate data for building footprints, which are attached to Unique Property Reference Numbers (UPRNs), allocated to local authorities by Geoplace. These provide unique identifiers allowing properties to be tracked over their lifecycle (Geoplace, 2021). UPRNs are also used in OS’s AddressBase products. In 2020, OS released Open UPRNs under an Open Government Licence, along with Open TOIDs, which provide the unique ID for OSMM polygons/footprints (Ordnance Survey, 2021g). This has meant that a centroid coordinate reference for each building in the Colouring London prototype can now be linked to the attribute data collected by the prototype using OSMM footprints, allowing these data to be captured and released.
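As a rough illustration of how an open identifier lets attribute data be attached to a building centroid, the sketch below joins two small tables on a shared UPRN. The function name, field names and all sample values (including the UPRNs) are invented for illustration and are not taken from any OS, Geoplace or Colouring London dataset.

```python
def link_by_uprn(centroids, attributes):
    """Join attribute records to centroid records that share a UPRN."""
    attrs_by_uprn = {row["uprn"]: row for row in attributes}
    linked = []
    for c in centroids:
        match = attrs_by_uprn.get(c["uprn"])
        if match is not None:
            # Merge the two records; centroid fields win on a name clash.
            linked.append({**match, **c})
    return linked


# Hypothetical sample records (all values are made up).
centroids = [
    {"uprn": "100000000001", "lat": 51.5246, "lon": -0.1340},
    {"uprn": "100000000002", "lat": 51.5301, "lon": -0.1256},
]
attributes = [
    {"uprn": "100000000001", "year_built": 1895},
    {"uprn": "100000000003", "year_built": 1932},
]

linked = link_by_uprn(centroids, attributes)
print(linked)
```

Only records whose UPRN appears in both tables are returned, which is why a shared, openly licensed identifier matters: without it, the two tables could only be matched by address text or by geometry.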

Chapter 6 Address base image

Proposed location subcategories

  • Address - Building Name, numbers, street name, town, postcode

  • Ordnance Survey TOID (UK national mapping agency Topographic Identifier/ building polygon ID)

  • Unique property reference number (UPRN) (unique UK government property ref)

  • OpenStreetmap ID

  • Latitude (Polygon centroid)

  • Longitude (Polygon centroid)

Relevant Links

  • https://colouringlondon.org/view/location
  • https://github.com/colouring-london/colouring-london/issues/565
  • https://www.pages.colouring.london/location

Examples of location data uses in relation to building attribute data provision

  • Address data and UPRNs allow datasets to be linked;
  • Building numbers are also important in crowdsourcing to help users colour in the correct building when adding data;
  • Building footprint polygons provide:
      • Spatial ‘filing cabinets’ allowing geolocated data to be captured, collated and visualised;
      • Clearer and more dramatic colour visualisations than possible with point data;
      • Ground floor outline plan data;
      • Ground floor area data;
      • Footprint geometry data;
      • Building perimeter data;
      • Data for calculating total floorspace area (when combined with data for other floors);
      • Baseline data to track footprint change over time;
      • Baseline data to assess capacity of buildings to adapt within plots;
      • Baseline data to measure accessibility from street;
      • Building adjacency and streetblock structure information (when combined with streetblock data);
      • A rough metric for counting buildings (in OSMM multiple polygons may be attached to a single address for a large estate/building);
      • Data to infer internal layout (with age data);
      • Data to infer number of storeys (with age and height data);
      • Data to infer total floorspace (with age and storeys data);
      • Data to infer dynamic tissue type (with land use and adjacency data).
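Several of the items above (ground floor area, perimeter and a centroid coordinate) can be derived directly from a footprint's vertices. The following is a minimal sketch using the standard shoelace formula, assuming a simple (non-self-intersecting) polygon without holes, with coordinates in a projected system measured in metres. The function name and the sample rectangle are hypothetical.

```python
import math


def footprint_metrics(ring):
    """Return (area, perimeter, centroid) for a simple polygon.

    `ring` is a list of (x, y) vertices in metres, listed in order
    around the outline, without repeating the first vertex at the end.
    """
    n = len(ring)
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    if signed_area == 0:
        raise ValueError("degenerate polygon with zero area")
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid


# Hypothetical 10 m x 6 m rectangular footprint.
area, perimeter, centroid = footprint_metrics([(0, 0), (10, 0), (10, 6), (0, 6)])
print(area, perimeter, centroid)
```

Real footprints with courtyards (interior rings) or multi-part geometries would need a GIS library rather than this single-ring calculation.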

2. USE. Open data for London on land/building use

Next to location data, land use data, which describe current activities in buildings, are the most commonly used building attribute data type. Land use data are required, for example, for planning, property development, property taxation, housing, employment, transportation, recreation and culture; and are critical for land use policy development, as well as forming the basis on which policy is applied to individual cases (Bibby and Shepherd, 1997).

However, despite high demand, no comprehensive, detailed, updated and address-linked open land use database at property level is provided by the UK government. Several currently restricted government sources do, however, exist. These are OS’s Addressbase, the Valuation Office Agency’s (VOA) Non-domestic rating database, and the Planning Portal (MHCLG, 2021a). Other relevant open, non-comprehensive sources for London include OSM (OpenStreetMap Wiki, 2021b); OS OpenMap – Local (see below); the Cabinet Office for government-owned properties, recently published under an OGL (Cabinet Office, 2019); and independent organisations, e.g. the Church of England (Church of England, 2021). Geomni UK also holds commercial datasets for around thirty land use classes for London (Geomni UK, 2021).

In OS’s Addressbase range, Addressbase Premium is the most detailed land use product (OS, 2021f). An illustration of ‘fourth-tier’ classification is given in Figure 6.1. For university use, only the basic Addressbase layer can be accessed immediately, with Addressbase Premium and Addressbase Plus available by request. Open land use data on hospitals, cultural facilities, schools, leisure grounds and sports buildings, emergency services, transport hubs, and some religious and retail buildings, have been available via OS OpenMap – Local since 2019 (OS, 2020). The second major UK land use data source is the VOA. 93% of VOA property tax records are classified as domestic, so where non-domestic use is known, domestic use can be inferred. Data are not, however, collected on tax-exempt properties such as welfare, religious or agricultural buildings (HM Revenue and Customs, 2021c). Non-domestic land use data at property level can be downloaded from the VOA, but only for tax-related work (HM Revenue and Customs, 2021b); otherwise these must be purchased. The VOA provides summary statistics showing the proportion of different types of non-domestic properties in London (ibid.). There are 369 VOA ‘SCat’ codes representing different land uses (HM Revenue and Customs, 2021b).

Land use data are also held within the Planning Portal. Restructuring of planning data is discussed under ‘Planning’ below. The London Building Stock Model (LBSM) represents the most comprehensive, sophisticated and detailed land use database for London, and is the only database to provide detailed information on mixed use. Methods developed within 3DStock to represent the relationship between buildings, ‘premises’ and ‘self-contained units’ (SCUs) mean that the LBSM can provide more accurate estimates of domestic, non-domestic and mixed-use floorspace available for the capital than ever before. In the image below, non-domestic, domestic and mixed-use land use data are visualised for London, by Steve Evans, at SCU level for the Greater London Authority. Blue tones indicate a greater proportion of domestic use, and red, of non-domestic use. Steadman et al. estimate there are around 250,000 non-domestic and mixed-use premises in London, and around 1.5 million houses and 1.9 million flats (Steadman et al., 2020). Mixed-use and non-domestic buildings can be seen clustering along high-accessibility, older routes and at the historic core. Land use data from the LBSM cannot, however, be accessed, as the model is built using data from restricted sources, including OS, the VOA and Geomni UK.

Chapter 6 steve

In 2019, a UK National Asset Register (NAR) was proposed to collate public-sector property and land use data with socio-economic data, and to align the work of OS, the MHCLG, the Land Registry, the Geospatial Commission and others, with data release proposed under an OGL (UK Authority, 2019). At present it is still unclear exactly what data will be released and when.

Colouring London offers the opportunity to begin to address problems with land use data access by combining the four methods of data generation summarised in Chapter 4: existing bulk uploads (referred to above); computational generation (see Chapter 7); live streaming (see ‘Planning’ below); and crowdsourcing (see Chapter 6). For crowdsourcing, the model presented by the first Land Utilisation Survey of Britain, in which land cover data were crowdsourced from schools in the 1930s (EDINA, 2021b), is also relevant.

Issues in relation to land use also exist with regard to classification systems, with a number operating simultaneously in the UK. The most significant are the ‘use class’ system used in UK planning (MHCLG, 2020a); VOA SCat codes; and OSMM Addressbase classes used in its national mapping products. In 2006, the government looked to harmonise these fragmented systems through the development of the National Land Use Database (NLUD) (LandInform, 2006). The aim was also to move towards greater compatibility with the Eurostat land-use classification system and the Land Use/Cover Area-Frame Survey (ibid.). Though NLUD classification remains unimplemented, harmonisation is a current area of focus for MHCLG. The NLUD system is tested owing to its logic and compatibility with European systems but requires cross-referencing with other UK classification systems.

3. TYPE. Open data for London on building typology

Open spatial data relating to ‘Type’ are rarely available; i.e. in relation to: base 3D form; local architectural style/period; original use; adjacency, number and type of ‘dynamic mechanisms’ and ‘parasite’ forms. In the Colouring London prototype, in line with Philip Steadman’s approach, activities are clearly differentiated from physical form, with the former placed under the main category heading ‘Land Use’ and the latter under ‘Type’. Original land use is also important to collect as it is a good indicator of 3D form especially where age, footprint and height data are also available.

Individual open datasets on building types with a specific 3D base form, or from a specific architectural period, may exist; for example, data on churches (as referred to above) or on tower blocks, as held by the ‘Tower Block UK’ database run by the University of Edinburgh. However, until an open database able to collate and visualise data is available for London, exactly how many datasets exist cannot be known. The VOA holds the largest and most detailed spatial databases in the UK for the domestic and non-domestic stock. Non-domestic properties are classified by SCat codes, which provide information on current activities, not built type and original use. Data for domestic properties are held within the VOA’s Property Details database. This was set up in the 1970s ‘to provide a simple system for understanding the main features and attributes of a property’ for taxation purposes (HM Revenue and Customs, 2014h).

Bulk uploads were carried out for England in 2003–4 (ibid.). Subsequent updates are not generated through national surveys but via local authority notification to the VOA of demolition, new build or alterations, or through the VOA’s communications with builders, developers and/or the public (ibid.). Fourteen domestic property attributes are collected, some of which are mandatory and some optional. Data on ‘group’ i.e. information on general architectural style, period and typology characteristics is mandatory (HM Revenue and Customs, 2014a, 2014b, 2014c). The VOA notes that, ‘Architectural style and characteristics are nearly always more important than size or accommodation’ (HM Revenue and Customs, 2014b). Detailed typology classes mix local architectural style with size, e.g. ‘small basic’ and ‘larger basic houses’; small and large ‘villa types’; ‘three or more storey houses’; and ‘substantial Georgian or Regency town houses’ (HM Revenue and Customs, 2014a). Each type is allocated a code. The VOA database also contains images (HM Revenue and Customs, 2014g). To facilitate rapid accommodation of future releases of VOA data, in line with property tax release in other countries, into the prototype platform, a VOA typology sub-category is also proposed.

Another mandatory ‘type’ category relates to adjacency, the main subcategories being ‘cluster’, ‘detached’, ‘semi-detached’, ‘end of terrace’, ‘mid-terrace’ and ‘bungalow’. Adjacency data are also available from Energy Performance Certificates (EPCs), which are collated by MHCLG from surveys by registered assessors. Detailed EPC data, relating to sold, rented or converted domestic properties and new build in London, began to be released from 2020, as a result of pressure from the PropTech industry (MHCLG, 2019c). Current and historical EPCs, and Display Energy Certificates (DECs) (required for buildings over 250 m² used for public services where public access is permitted), are now available from MHCLG under an OGL (MHCLG, 2020a and 2020b). Though data are not comprehensive for London, this source will, over time, provide an increasingly rich resource, owing to the number of property sales occurring. EPC data can be uploaded onto open building attribute data platforms, using addresses provided by Royal Mail to locate the data, provided they are used in such a way as to support the improvement of energy performance, which Colouring London does.

A quick visual overview of the spatial distribution of adjacency types in London may also be gained using open 2011 Census data (Office for National Statistics, 2011), available from the Office for National Statistics and released at Lower Layer Super Output Area (LSOA) level. This is the smallest geographical unit at which these are released, with each LSOA representing 400 to 1,200 households (Office for National Statistics, 2021c). These are shown in Figure 6.3. London residents are recorded by the Census as living in purpose-built flats (31%), terraced houses (26%) and semi-detached houses (23%). In Inner London, most residents (64%) live in purpose-built flats. In terms of ownership, 86% of terraced and semi-detached dwellings in London are owned privately (Office for National Statistics, 2011), meaning that for much of London’s suburban ‘static’ tissue, residents are likely to control decisions to extend and adapt. Where footprint and address data are open, adjacency can also be inferred from the number of addresses located in footprints and the number of shared sides. Related algorithms were published by Orford and Ratcliffe in 2007, building on previous findings that dwelling types have distinct geographies (Orford and Ratcliffe, 2007). For example, a footprint with no shared sides with another property, with one address, can be used to infer a detached house; with fifty addresses, a block of flats; and with one shared wall and one address, a semi-detached house. In 2010, Smith and Crooks applied Orford and Ratcliffe’s method to London using OS’s restricted OSMM and OS Addressbase products, generating striking colour-coded visualisations at building level (Smith and Crooks, 2010). They also showed that multiple addresses, when matched with terraced houses, could be used to infer conversion. This presents a potentially interesting indicator of adaptability, i.e. whether a single building can be easily split into multiple units and reformed into a single unit again. An adjacency algorithm is also used in the LBSM, with adjacency data kindly supplied by UCL Energy Institute to Colouring London, as discussed in Chapter 8.
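The inference rules described above can be sketched as a small rule-based classifier. This is an illustrative simplification only, not the published Orford and Ratcliffe algorithm or the Smith and Crooks implementation. The function name, the labels and the `flats_min_addresses` threshold are assumptions introduced here.

```python
def infer_dwelling_type(shared_sides, address_count, flats_min_addresses=10):
    """Guess a dwelling type from shared walls and address count.

    `flats_min_addresses` is a hypothetical cut-off chosen for this
    sketch; it is not a published value.
    """
    if shared_sides < 0 or address_count < 1:
        raise ValueError("expected shared_sides >= 0 and address_count >= 1")
    if address_count >= flats_min_addresses:
        return "block of flats"
    if address_count == 1:
        if shared_sides == 0:
            return "detached house"
        if shared_sides == 1:
            # Could equally be an end-of-terrace house; the count of
            # shared sides alone cannot tell the two apart.
            return "semi-detached house"
        return "terraced house"
    # A few addresses within one terraced footprint suggests conversion.
    if shared_sides >= 2:
        return "terraced house, possibly converted"
    return "unclassified"


print(infer_dwelling_type(0, 1))
print(infer_dwelling_type(0, 50))
print(infer_dwelling_type(1, 1))
print(infer_dwelling_type(2, 3))
```

In practice the number of shared sides would first have to be computed from footprint geometry, and the address count from address points falling inside each footprint, both of which currently depend on restricted data.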

Significant potential is considered to exist, beyond the scope of this study, with regard to large-scale computational generation of typology datasets as discussed in Chapter 4. Also of relevance is the application, by Law et al., of Computer Vision to Google Street View façade images (Law et al., 2019). Discussion with Law has however raised the important point that care must be taken, when deriving data from commercial sources, not only in relation to possible infringement of copyrights, but also to possible future restrictions on currently accessible data. In the next chapter, manual capture of typology data, through collection of original use data for around 20,000 buildings in London, is tested.

Chapter 6 spatial distribution types

4. AGE. Open data for London on building age

Building age data are one of the most important types of data necessary for the analysis of resilience and sustainability in urban stocks and the generation of 3D procedural models. However, until 2020, no significant open sources of age data, at building level, were available for London. In rare cases, small open spatial datasets were identified, for example within local authority GIS systems relating to conservation areas, but here data could not be collated owing to the use of different age intervals.

Since 2020, the largest source of open age data has been MHCLG, through its release of EPC data. This, however, as noted above, is not comprehensive. Data are grouped into twelve age bands (MHCLG, 2020a and 2020b). Owing to the recentness of the release, the data have not been assessed in this study and their quality is unknown. Open age data are also available for over 19,000 designated properties in London on the National Heritage List for England (Historic England, 2019a), though owing to formatting issues dates are difficult to extract, as discussed below.

Once age is known, information on building regulations and byelaws of the time, technological inventions in materials and construction, and architectural fashions can all be easily attached. The more precise the age data, the more detailed an understanding of the form of the building will therefore be. Building age collected by year allows for data to be grouped in an unlimited number of ways, and for comparison of age data across systems. However, the smaller the age interval required, the harder the task of dating becomes. When dating at year level, expert input from building historians and access to archive data are required. Age intervals also do permit the visualisation of complex age patterns.
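To illustrate why year-level dating allows data to be regrouped freely, the sketch below assigns the same set of construction years to two different banding schemes. The band edges and sample years are hypothetical and do not reproduce the VOA, EPC or English Housing Survey bands.

```python
from bisect import bisect_right
from collections import Counter


def band_label(year, edges):
    """Return the label of the age band containing `year`.

    `edges` is a sorted list of the first year of each band after the
    earliest one, e.g. [1901, 1919, 1940].
    """
    i = bisect_right(edges, year)
    if i == 0:
        return f"pre-{edges[0]}"
    if i == len(edges):
        return f"{edges[-1]} onwards"
    return f"{edges[i - 1]}-{edges[i] - 1}"


# Hypothetical construction years for six buildings.
years = [1780, 1895, 1905, 1925, 1936, 1968]

scheme_a = [1901, 1919, 1940]  # assumed band edges, for illustration
scheme_b = [1837, 1945]        # a second, coarser assumed scheme

counts_a = Counter(band_label(y, scheme_a) for y in years)
counts_b = Counter(band_label(y, scheme_b) for y in years)
print(counts_a)
print(counts_b)
```

The reverse is not possible: data collected only in bands cannot be regrouped into a scheme whose edges fall inside those bands, which is the collation problem noted above for datasets using different age intervals.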

In terms of restricted data, Geomni UK provides charged-for data for London, available for six age bands (Geomni, 2021b). The VOA also holds data on the age of most properties in London and the UK, with age a mandatory collection category (HM Revenue and Customs, 2014d, 2014f). Despite their importance, VOA age data at property level are not available to academia (with the LBSM reliant on commercial Geomni UK data). As far as can be understood, even MHCLG continues to use age data collected with small housing surveys in government housing analysis, rather than draw from the VOA’s vast database. Twelve age bands are used by the VOA (HM Revenue and Customs, 2018), compared to five bands in MHCLG’s English Housing Survey (MHCLG, 2020d).

VOA data are covered by the Commissioners for Revenue and Customs Act 2005 which allows data to be shared only where there is a link to the delivery of its functions, unless there is legislation that explicitly provides grounds for sharing the data (ONS, 2017). In 2015, following earlier correspondence with VOA (HM Revenue and Customs, 2014), a Freedom of Information request was submitted by the author requesting access to property level age data for the London Borough of Camden for academic purposes only (see Appendix). The VOA rejected the application, arguing that ‘the construction date at individual property level, may be used to deduce the specific property and from that, can be used to identify a taxpayer’. VOA also noted that it ‘does not necessarily need to hold a precise “construction date”’. Open data platforms therefore need to demonstrate to the VOA, and commercial providers, how contributing age data could benefit their business models.

The age map below uses aggregated VOA domestic age data, released at Lower Super Output Area level for London, and accessed from the Consumer Research Data Centre to provide a general overview of the spatial distribution of building age in London. Surviving pre-1900 London domestic stock (primarily Georgian and Victorian terraces), coloured seagreen, can be seen mainly located in Inner London (the main area of London’s pre-1900 growth). At the Inner-Outer London boundary (shown in black), a pale-blue ring represents Edwardian housing, built between 1901 and 1918. Beyond this, in pink and lilac, suburban semi-detached and small-plot ‘interwar’ detached housing can be seen comprising most of Outer London (1919–1939). This also corresponds with spatial distributions for domestic typologies shown in Figure 6.4. During the interwar period, London’s built-up area almost doubled in size, with around half a million suburban semi-detached and small detached houses built; creating large amounts of ‘static tissue’ in outer London within just twenty years (Greater London Authority, 2015). From 1955, clustering around the Outer London boundary and infill development, shown in red and orange, illustrates the impact of spatial constraints introduced by the Green Belt. Significant domestic infill since the late twentieth century is also visible, including large areas of regeneration of industrial areas of East London.

Chapter 6 VOA age data

VOA data (HM Revenue and Customs, 2018) show 31% of all surviving properties as built before 1919; 24% between 1919 and 1939; and 44% in the seventy or so years since 1945. Seventy per cent of surviving pre-1918 domestic properties are located in Inner London, compared with 30% in Outer London. By contrast, over 80% of surviving interwar properties are located in Outer London. For the post-war years when development was spatially constrained by the Green Belt, slightly more surviving properties are located in Outer London. The same is true for the late twentieth century (1983–1999) and for the twenty-first century.

Discussions with UCL Energy Institute also identified the need for more precise dating of major refurbishments and extensions. However, issues exist with the capture of these data using OSMM footprints, especially for small plots where later extensions are often subsumed within the main polygon. An ideal situation, in the longer term, would be to have open OSMM polygons subdivided by building stage, and 3D open procedural models with later mutations and accretions colour-coded by date.

In Section x the manual collection of age data for over 20,000 buildings in London is described. In Section x a semi-automated method of using historical network data to infer building age is tested, generating 750,000 open age data entries. Through this process the need for expert input from historians is confirmed. In Section x the way in which Colouring London's features are designed and tailored to encourage and support London’s network of local historic environment experts to add, check and enrich age data is also discussed.

5. SIZE. Open data for London on building size

Data on building size, and particularly on floorspace, are also sought after, e.g. for energy modelling, planning, housing analysis, and property valuations and sales. For an approximate measure of floorspace, data on number of floors and footprint area are needed. Some floorspace data can be extracted from EPCs and DECs, but in EPCs these may relate to flats or maisonettes, not whole buildings. Open storey data are not available, other than domestic storey data for apartment blocks, which are available from EPCs (MHCLG, 2020a).

Restricted storey and floorspace data for London are held by the VOA (Office for National Statistics, 2017), and are mandatory domestic data collection categories (MHCLG, 2014d). Both are also available commercially from Geomni UK (Geomni, 2021b). In the LBSM, storey data for domestic buildings, and for non-domestic buildings for which VOA data are not available, are generated by dividing total building height (generated using LiDAR and OS charged-for height data) by an assumed average floor-to-floor height, ranging from 2.7 m to 4.2 m depending on the land use/activity involved (based on the four towns study referred to in Chapter 4). From this, floorspace can be generated (Steadman et al., 2020). Height is a rare example of a (virtually) comprehensive open building attribute dataset available for London. This is released by the UK’s Environment Agency under an OGL (Environment Agency, 2020; 2021). The Agency’s LiDAR archive also includes point cloud data. LiDAR does not, however, provide data on storeys, attics or basements. (Basement data are collected by the VOA and by Geomni UK.) Computational generation of storeys through inference using building footprint, age and height data is considered feasible and is proposed as future work linked to procedural model generation. In Chapter 6, manual collection of storey data is tested. The estimation of floorspace and storeys using archive data and manually vectorised historical footprints is also experimented with in Chapter 7.
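The height-based approach described above can be sketched as follows. This is a minimal illustration of the arithmetic, not the LBSM implementation: the function names, the rounding rule and the sample values are assumptions, and only the 2.7 m to 4.2 m floor-to-floor range comes from the text.

```python
def estimate_storeys(height_m, floor_to_floor_m):
    """Estimate storey count by dividing height by floor-to-floor height."""
    if height_m <= 0 or floor_to_floor_m <= 0:
        raise ValueError("height and floor-to-floor height must be positive")
    # Rounding to the nearest whole storey, with a minimum of one, is an
    # assumption made for this sketch.
    return max(1, round(height_m / floor_to_floor_m))


def estimate_floorspace(footprint_area_m2, storeys):
    """Approximate total floorspace as footprint area times storey count."""
    return footprint_area_m2 * storeys


# Hypothetical house: 9.0 m high, 60 m2 footprint, 2.7 m floor-to-floor.
storeys = estimate_storeys(9.0, 2.7)
floorspace = estimate_floorspace(60.0, storeys)
print(storeys, floorspace)
```

Because the estimate uses total height, pitched roofs, attics and basements would bias the result, which is one reason storey inference using age and footprint data is proposed above as future work.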

In terms of other dimensions, as listed in Table 6.1, it may be possible to generate plot frontage using INSPIRE land parcels. Data on window and door opening sizes are considered particularly difficult to collect and, like land use, may require a combination of approaches.

6. CONSTRUCTION. Open data for London on materials and construction systems

Open datasets on building materials and construction systems for London are also not available, though partial data for London (as for other types of attribute data) may exist in restricted databases held by insurance and mortgage companies, and for new build by architectural firms. Open data on roof shape and materials are available for some properties through EPCs (MHCLG, 2020a). Commercial materials and construction data are also available for London from Geomni UK (Geomni, 2021b).

However, materials and construction systems can be relatively easily inferred for most domestic buildings built before the Second World War, which form the bulk of London’s stock (HM Revenue and Customs, 2020). The VOA, in its 2014 instruction on data collection on materials and construction for domestic properties, states that ‘unless otherwise stated in the group classification, always assume that the dwellings are constructed of standard or traditional materials, such as brick or stone with a tile or slate roof’ (HM Revenue and Customs, 2014e). In London, buildings were generally built of wood up to the Great Fire of London in 1666, other than religious buildings and grand institutions which, as shown by the National Heritage List for England, were often built of stone (Historic England, 2019). After the Fire, regulations banned the use of wood as an exposed construction material, leading to a shift from wood to brick construction (with timber concealed in floors and roofs) during the Georgian (Summerson, 1945) and Victorian periods (Muthesius, 1982). For pre-WWII buildings, computational generation of materials and construction system data based on building age, with crowdsourced verification from historic building experts, is therefore proposed. For the post-war period, when there was much more experimentation with new building materials and construction systems (White and Nowicki, 2018), inferring these attributes from age is more complicated. Where a clear relationship between age, typology and construction type is known to exist, and age and typology are known, it may be possible to auto-generate data (e.g. in relation to specific types of reinforced concrete systems and specific ages of post-war estate). However, opportunities now also exist to harness, through crowdsourcing, knowledge held by architects, surveyors and engineers to develop highly accurate datasets.
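The age-based inference proposed for pre-war buildings could start from a rule as simple as the one sketched below. The function name, the labels and the exact cut-off years are assumptions for illustration; the text only gives the Great Fire of 1666 and the Second World War as turning points, and the exceptions it mentions (such as stone-built religious buildings) mean any inferred value would still need expert verification.

```python
GREAT_FIRE_YEAR = 1666  # from the text
WW2_START_YEAR = 1939   # assumed cut-off for "before the Second World War"


def infer_main_material(year_built):
    """Return a first-guess main construction material, or None.

    None means that age alone is not considered enough to infer the
    material (post-war stock in this sketch).
    """
    if year_built <= GREAT_FIRE_YEAR:
        return "timber"
    if year_built < WW2_START_YEAR:
        return "brick"
    return None


for year in (1600, 1850, 1965):
    print(year, infer_main_material(year))
```

Returning `None` rather than a guess for post-war buildings keeps the inferred dataset honest about where typology data or crowdsourced expert knowledge is needed instead.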

7. STREETSCAPE. Open data for London on plots, streets and land ownership

Plot and land parcel geometries

Open land parcel geometries, provided by HM Land Registry in its INSPIRE Index Polygons spatial database, can be downloaded for London under an OGL (HM Land Registry, 2021c). The INSPIRE database was produced to comply with the EU INSPIRE Directive of 2007, which looked to harmonise spatial data across European borders (European Parliament, 2007). UK INSPIRE Polygons combine HM Land Registry information with spatial information from OS. INSPIRE Polygons differ from those used in European cadastral systems in that geometrical descriptions do not define ownership but are instead indicative of ownership boundaries that are recorded in text (Grover, 2008). Kropf discusses the way in which the UK Land Registry is unusual in that parcels for both freeholds and leaseholds are shown (Kropf, 2018).

Kropf notes how much ‘coarser’ Land Registry parcels are than those for physical boundaries around buildings (defined, for example, by walls or fences), as shown for the London Borough of Camden in the figure below: ‘While the outer boundary of the larger ownership may correspond to some physical features, the area includes individual physical plots and buildings with no corresponding boundary of freehold ownership’ (Kropf, 2018, p. 6). He also explains how freehold parcel boundaries can often cut across physical boundaries, whereas leasehold boundaries generally correspond to them, and how legislation giving social housing tenants the right to purchase the freehold of property in the UK can lead to ‘shotgun’ patterns of individual plots within large parcels (ibid.), as indicated in Figure 6.5 (bottom left). Measurement of building adaptability within London plots requires access both to INSPIRE open polygons and to data on physical plot boundaries.

Interestingly, in its domestic property tax guidance, the VOA highlights the importance of collecting data on plot size: ‘Whilst it is no longer mandatory to collect plot size, every effort should be made to do so. Plot size is an important factor in the value of a dwelling’ (HM Revenue and Customs, 2014d). Physical plot boundary data are not available. Even under an OSMM licence, these cannot be accessed as individually vectorised layers. In the image below (top right), physical boundary lines are shown in green, extracted from the OSMM Topography Layer using the ‘topographic line’ descriptive group and the ‘general feature’ descriptive term. Gaps can be seen in the outlines separating the plots. These represent missing segments of party wall between buildings. The missing data are held separately in OSMM under the ‘building’ descriptive term for ‘buildings’, shown in the purple outlines (bottom right). Full plot boundary lines are only revealed when these two features are combined.

Chapter 6 plot data image

It is unclear whether this boundary line data might be considered by OS for open release and discussion with OS is needed. Computer Vision approaches to feature extraction from historical maps, combined with INSPIRE parcel data, offer a possible alternative.

Open data for London on land ownership

Ownership data provide information on whether decisions on buildings, and changes to them over time, are being or have been made by private individuals, private companies, independent public institutions, charitable trusts, government or local government bodies, etc. The importance of information on ownership type (rather than actual owner) was touched on in Chapter 4 in relation to dynamic tissue types, and is further discussed in Chapter 7. While INSPIRE land parcel geometries for London are available free of charge from HM Land Registry (a non-ministerial department founded in 1862), data on owner/ownership type are charged for (HM Land Registry, 2021a). As a consequence, the spatial distribution of ownership types, and their relationship with the form of buildings and the way buildings adapt within their plots, is poorly understood. The two main websites providing open spatial data for London on landowners are Who Owns England (Shrubsole and Powell-Smith, 2021), screenshots from which are shown in Figure 6.6, and Private Eye’s website, set up in 2015, which focuses on land in London owned by offshore companies (Private Eye, 2021). As noted in section 5.8, data on government land holdings are available as open data, with freehold and leasehold information also collected in EPCs.

Chapter 6 ownership

Open data for London on street networks and pavements

Street network centrelines and widths, and pavement widths, were identified in Chapter 4 as necessary for the analysis of accessibility, density and adaptability, and to help infer 3D form. Open street centreline data are available for London and the UK from OSM along with many different types of network attribute (OpenStreetMap Wiki, 2021c). Centrelines are also released as part of OS OpenMap – Local (OS, 2019b; 2020). In Figure 6.7, street network area is shown within turquoise outlines in the OSMM Topography Layer, using the ‘topographic line’ descriptive group and the ‘streets and tracks’ descriptive term. Research by Nicolas Palominos into the computational generation of open street area, and pavement width for London is currently underway (Palominos and Smith, 2019).

Chapter 6 street networks

9. PLANNING. Open data for London on planning and short-term dynamics

Data sub-categories for new build, demolition and designation are captured under ‘Planning’ because, in the UK planning system, at least some information for each of these areas will be held within planning applications.

Open data for London on new-build

Owing to development in London being constrained, new construction can only be generated in one of three ways: as whole new buildings on greenfield sites remaining within the Green Belt boundary; as whole new buildings on brownfield sites, normally involving demolition of older stock (which may also involve the merging or splitting of land parcels); or as accretions to existing buildings. ‘New build’ generally refers to the first two types of development. Spatial records for these are captured within the UK Planning Portal (MHCLG and Terraquest Ltd, 2021a), where they can be viewed and downloaded. New build data post-1999 are also available, annually, for London from the Land Registry’s price paid registry (HM Land Registry, 2021b). Data on accretions, and incremental development in plots, will only be picked up by planning authorities where planning permission is required, or where these are captured as (unpublished) records by building control departments.

Between 2004 and 2020, approved planning applications for London involving new residential buildings, loss or gain of dwellings, 7+ bedroom care home or hotel accommodation, or large-scale non-domestic development involving change to a planning land use class were mapped by the GLA within its London Development Database (LDD) (London Development Database, 2017), illustrated below.

Chapter 6 LDD

In 2021 the LDD was replaced by the GLA’s ‘Planning Data Hub’, which was launched to provide data on all development in London and includes a live feed from all 33 London authorities (Greater London Authority, 2021a and 2021b). The GLA and MHCLG state that they ‘have prioritised this work because of the vast number of uses a public stream of planning application data can support’ (Greater London Authority, 2020). The Planning Data Hub represents planning applications as point rather than polygon data.

Chapter 6 planning

Examples of a London local authority (the London Borough of Hackney) mapping planning applications by site geometry, and mapping both current and historical planning applications, are illustrated below (London Borough of Hackney, 2020). Current planning applications are shown in red and historical applications in blue, as of August 2020 (LBH, 2020).

Chapter 6 planning2

The GLA’s creation of the Planning Data Hub has been driven by an awareness of the need for greater access to, and automation of London’s planning system. The Hub, at the time of writing, is still under development. In March 2019, the author began discussions with Peter Kemp (Hub Project Lead) and Paul Hodgson (GIS Infrastructure Data Manager) regarding the transferral of current and historical planning applications, stored as PDFs, into machine-readable files.

The idea of live-streaming structured planning data into the Colouring London platform was proposed, to help increase transparency in the planning system, accessibility of data and to promote joined-up working. This included the author’s idea of integrating an easy-to-read live streamed ‘traffic-light’ system into Colouring London, to visualise the stages of planning permission a building has reached (i.e. submission, approval, appeal, completion), influenced by Peter Larkham’s work. It was also noticed that Colouring London could be used to crowdsource actual start dates of construction (as projects may in some cases have been approved but be delayed for several years), and actual completion dates. This would allow actual change at building level – not currently captured by the planning system – to be visualised. Each OSMM footprint was also identified as able to be quickly linked with its planning portal address. This package of work was funded in 2022 by Loughborough University.
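As a rough illustration of the ‘traffic-light’ idea, the sketch below maps the four planning stages named above to display colours. The colour assignments, the fallback colour and all identifiers are hypothetical choices made for this example; the source does not specify them.

```python
from enum import Enum


class PlanningStage(Enum):
    """Stages of planning permission named in the text."""
    SUBMISSION = "submission"
    APPROVAL = "approval"
    APPEAL = "appeal"
    COMPLETION = "completion"


# Hypothetical colour scheme; the real platform may choose differently.
STAGE_COLOURS = {
    PlanningStage.SUBMISSION: "amber",
    PlanningStage.APPROVAL: "green",
    PlanningStage.APPEAL: "red",
    PlanningStage.COMPLETION: "blue",
}


def colour_for(stage_name: str) -> str:
    """Return the display colour for a stage label, or grey if unrecognised."""
    try:
        stage = PlanningStage(stage_name.strip().lower())
    except ValueError:
        return "grey"
    return STAGE_COLOURS[stage]


if __name__ == "__main__":
    for label in ("Submission", "approval", "withdrawn"):
        print(label, "->", colour_for(label))
```

Keeping the stages in an enumeration means that a live feed carrying an unexpected status label falls back to a neutral colour rather than raising an error on the map.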

Open data for London on demolition

Spatial demolition data, as previously discussed, are essential to ensure all attribute datasets are kept up to date; to enable typology loss and survival to be assessed; to quantify flows of energy and waste in London; and to provide baseline data required for lifespan assessment. Information on applications to demolish is also important for local communities looking to conserve and reuse local buildings wherever possible. Of the little demolition data currently available for London, none provides the open, up-to-date, comprehensive spatial information necessary for these purposes. (As discussed in Chapter 3, absence of spatial demolition data is common to many countries.) The most accessible type of spatial demolition data for London is for large developments, for which data are collected by the GLA through the Planning London DataHub (and since 2004 through the LDD).

Under current UK law, the demolition of buildings does not require planning consent, unless the building is designated, unsafe/uninhabitable, or (a requirement applied only recently) a pub or drinking establishment, concert hall, theatre or live music venue (MHCLG and Terraquest Ltd, 2021b; MHCLG, 2020a). Records of demolition are also known to be held by the Health and Safety Executive, though again these sources are not publicly available. Confusingly, demolition data, collected by local authority Building Control departments as part of the issuing of building permits, are not integrated with planning application information. Demolition permits are required for demolition under the Building Act of 1984. These do not record loss of typology, use or age, or quantify the volume of material removed, but inform utilities regarding cessation of supply and notify neighbours of potential dust and dirt (MHCLG and Terraquest Ltd, 2021b). Permits are unpublished and are only available on a case-by-case basis: ‘Building Control records are not public records which mean there is no public right to view or obtain information submitted under the Building Regulations’ (London Borough of Islington, 2021). If publication were mandatory, live-streamed spatial data on change to stock could be provided. If characteristics of demolished buildings were also required by permits, the types of building being lost would also be known.

The most significant recent research into the spatial tracking of demolition in London has been carried out by Kimon Krenz at UCL (Krenz, Forthcoming). Krenz’s method uses changes to OSMM TOIDs to track demolition year by year. An extract showing demolition between 2005 and 2018 in the London Borough of Camden, visualised by Krenz using this method, is illustrated below.

kimon krenz
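The general idea of tracking demolition through changes to TOIDs can be sketched as a year-on-year set comparison. This is a simplified illustration only and is not Krenz’s actual method, which is not described in detail here; the function name and the sample identifiers are invented, and a TOID that disappears may reflect re-surveying or feature splitting rather than demolition.

```python
from typing import Dict, Set


def candidate_demolitions(toids_by_year: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Flag building TOIDs that vanish between consecutive yearly snapshots.

    toids_by_year maps a snapshot year to the set of building TOIDs present
    in that year. The result maps each later year to the TOIDs that were
    present in the previous snapshot but are missing from this one.
    """
    years = sorted(toids_by_year)
    lost: Dict[int, Set[str]] = {}
    for previous, current in zip(years, years[1:]):
        lost[current] = toids_by_year[previous] - toids_by_year[current]
    return lost


if __name__ == "__main__":
    # Invented identifiers, for illustration only.
    snapshots = {
        2016: {"osgb001", "osgb002", "osgb003"},
        2017: {"osgb001", "osgb003"},
        2018: {"osgb001", "osgb003", "osgb004"},
    }
    print(candidate_demolitions(snapshots))
```

Candidates produced this way would still need checking against footprint geometry or planning records before being treated as confirmed demolitions.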

Both Krenz’s interest in collaboration on Colouring London, and the release of OSMM TOIDs as open data, now offer the potential to provide annual spatial demolition data for London within the prototype open data platform. An additional important step forward is that updates recording change to OS maps for London are now available for individual years, going forward, whereas previously these were aggregated for specific areas over several years (see also Chapter 7). Integration of spatial demolition data is proposed for the second development stage. It is hoped that at some point demolition data will be live-streamed by the GLA via the planning portal so that a colour-coded demolition map can be incorporated. A sketch of how this might work is provided in Table 6.3. The draft design allows demolition to be anticipated, tracked and verified, though it is important to stress that this approach cannot be used for the capture of partial and minor demolitions occurring as part of extension and adaptation.

Chapter 6 live stream

Open data for London on protection and designation

Designation data for London provide information on the geolocation of buildings where the rate of change is consciously slowed; on buildings classified as of national value; on the location of pre-1840 building types; and on probability of demolition. The principal designation categories of relevance to London are ‘listed buildings’ and ‘conservation areas’ (Historic England, 2019b; 2021). Additional designation categories can be viewed on the National Heritage List for England (NHLE) (Historic England, 2019a). The NHLE is the portal through which open designation data for London (and England) are released by Historic England (the government’s advisory body on England’s built heritage). Listed building data held within the NHLE have also been noted by Dominic Humphrey to represent, at present, the largest and richest open database on buildings in London, and England, with each list entry providing a description of the building’s age, size, material, and significant adaptations to it.

Chapter 6 Listed Buildings

Around 4% of London’s stock is estimated to be listed, though the exact number of listed buildings cannot as yet be calculated, as discussed below. This is relatively high for Europe, as 1–4% of buildings are estimated to be protected within individual countries (Hassler, 2009). Virtually all intact pre-1840 buildings are recorded on the NHLE. Owing to the rapid increase in the size of the stock during the nineteenth century, only the highest quality buildings from later cohorts are protected. Eligibility for statutory protection is based on the following order of criteria: period, rarity, documentation or finds, group value, survival condition, fragility or vulnerability, diversity and potential (Historic England, 2019). Buildings cannot be listed unless they are over thirty years old. This is the length of time deemed sufficient to assess the ‘value’ of a building in terms of national significance. There are three grades of listed building: Grade I (making up 2% of listed buildings), Grade II* (making up 4%) and Grade II (making up 96%). London is shown in the NHLE to have 9,251 listed building entries, the largest number of entries for any English city (Historic England, 2019a).

Listed building data are released as open, spatially located point data. ‘Entries’ may represent one building or many. Problems with mapping these data are shown in Figure 6.13. Here point data have been translated into polygons by the City of Westminster. However, buildings remain aggregated, with terraces, for example, counted as a single entry. It is important to note here that list entries are text-based and, as in the case of HM Land Registry INSPIRE land parcels, legal boundaries are based on NHLE descriptions, not on boundary geometries. As such, disclaimers will be needed within a prototype platform to ensure that users understand building outlines to be indicative only.

Since 2016, the author has worked closely with Historic England, and colleagues at UCL, to try to identify ways in which computational approaches could be employed to address the issue of entry aggregation: in 2016 with Clementine Cottineau at CASA and, between 2018 and 2020, with Dominic Humphrey and Maciej Ziarkowski, as part of Colouring London work. However, owing to the way in which information is structured in the NHLE (as free text), and in Historic England spreadsheets (where age, for example, can be described by century or year), these data cannot be easily extracted. It is understood, however, that progress has been made within work commissioned by Historic England, though details of this have yet to be released.

Conservation Areas in London represent areas where demolition is controlled but where buildings are subject to less onerous controls than listed buildings (which may also be held within them). Based on email correspondence between the author and Historic England, it is estimated that around 15% of London’s buildings are probably under conservation area controls. Data on this type of designation are extremely important, as Conservation Areas act as a brake on change and allow rates of churn, uniqueness and reuse to be monitored and calibrated. Conservation Areas in London are managed by individual London local authorities, though they are commonly proposed for designation, and largely monitored, by local amenity societies. Demolition in conservation areas is only permitted where the building is deemed not to contribute to the special historic or architectural character of the designated area (Historic England, 2019b). New building insertions are permitted provided that these are in keeping with the character of the area. Until recently, data on conservation areas had to be accessed directly from individual local authorities, or by special request from Historic England. During the search for these data, a small number of London local authority conservation departments failed to make their conservation data available despite repeated requests. In 2021, comprehensive data for conservation areas for London were released independently by Ian Hall (http://www.bedfordpark.net/leo/planning/English%20Conservation%20Areas%20-%20an%20updated%20spatial%20review.pdf). However, as local authorities have no obligation to submit datasets, or updates, to a centralised database, updating, as noted by Hall, remains a problem.

10. SUSTAINABILITY. Open data on the sustainability and quality of buildings for London

Though all data categories are designed to support sustainable development within the stock, this category looks at specific existing or potential indicators of energy efficiency, and at the potential to lock in carbon and maximise efficient use of existing resources through lifespan extension. As well as providing data for analysis and forecasting, the idea here is also to help drive up the quality and longevity of new build and to promote reuse and upgrade of existing buildings. The main type of open data available for use as an indicator of sustainability is energy performance data. Energy performance assessments for domestic and non-domestic properties are, as discussed above, now available as open data through EPCs and DECs released by MHCLG (MHCLG, 2020a and 2020b). BREEAM data collected by the Building Research Establishment (see Team above) can be used as a more precise indicator of energy efficiency, and of construction quality, but these data are not a government requirement and are only available for specific buildings where applications for certification have been made. Other quality marks do exist, and appear to be under consideration by a number of professional bodies. These have the potential to be incorporated where open, though in the current study only a cursory assessment, through conversation with prototype consultees (e.g. RICS, RIBA), has been made. No statistical data, restricted or open, have been identified on building lifespans or vulnerability of typologies. However, significant opportunities were identified in relation to the use of lifespan assessments to generate a number of new types of sustainability indicator. Three types of rating are proposed: ‘repairability’, ‘adaptability’ and ‘lifespan potential’.
These would be based on current data on age, use and construction systems and materials, plot size and demolition dates, and historical information on demolished stock relating to characteristics of demolished cohorts, and average cohort lifespans and survival rates.

Statutory lists are useful in identifying buildings that are considered by government to be of such a quality to be of national value (see 'Planning'). However data, restricted or open, on the quality of buildings as assessed by occupiers and local communities is rarely captured. This is discussed further under 'Community' below.

11. DYNAMICS. Open data on long-term dynamics for London

This final section of the chapter deals with long-term dynamics data, focusing on how long specific types of building, in specific locations, have lasted. For this, dates of construction and demolition of buildings are needed, as well as enough information to understand what type of building has been demolished (e.g. terraced house). Discussions with the Survey of London have confirmed that no large-scale statistical databases exist for London containing historical construction and demolition dates.

The main source of spatial long-term dynamics data is historical maps, from which typology can potentially be inferred. However, as these do not provide precise construction dates, nor detailed information on use/form, supplementary information from other types of historical publication (e.g. gazetteers, texts, drawings, archive photos) is needed. Of these, gazetteers, such as those published for recent volumes of the Survey of London, provide a vital, quick-to-access source of lifespan data.

London’s historical evolution is documented extensively. Sources of historical data are, however, highly dispersed. Examples of their location range from art and architectural history publications focusing on individual buildings or specific cohorts, to city gazetteers and surveys (e.g. the Survey of London) and documents compiled by preservation and amenity societies and community planning groups. Major physical archives for London are held by the British Library, the London Metropolitan Archives and the Guildhall Library, with local authorities also curating smaller local collections. Relevant image archives are also held by Historic England and Ordnance Survey. Other sources of information include conservation area reports, available from London local authority websites; academic papers, particularly micromorphology studies (though these may be paid-for access); local history society publications; historical planning records and drainage plans; and online databases such as the NHLE. Expert knowledge will also be held by historians and by bodies including Historic England, English Heritage, London Historic Buildings Trust and Heritage of London Trust, as well as national and local amenity societies, and other specialist groups. A significant amount of unpublished information on historical building lifespans will also be held within the personal archives of many amateur and professional historians.

The most significant online source of text-based data on the history of individual buildings in London is British History Online, founded by the Institute of Historical Research (IHR) in 2003. This collates information from a number of key London sources, including the Victoria County Histories (founded in 1899) and the Survey of London. The Survey of London, founded in 1894, is the longest-running building survey in the world. It is a rare example, along with the Commission du Vieux Paris, of a building-by-building survey for a city as a whole, undertaken by historians over more than 100 years (Saint and Guillery, 2011). Andrew Saint states that ‘No other city in the world can boast a publication about its urban history with the same depth and breadth as the Survey of London. It is an outstanding example of continuity and innovation in the field of descriptive and analytical urban history’ (University College London, 2013).

Survey of London volumes are written in continuous prose and include photographs, historical images and bespoke survey drawings (Survey of London, 2018). Owing to the time required to carry out primary research, geographic coverage for London has been slow, with central London areas covered by the Survey over the past century shown below in pink. However, once produced, these highly accurate records on the development of sites have an indefinite shelf life. Investment in this type of research is therefore, in the medium to long term, extremely cost effective. Records only need updating where buildings are demolished or subject to major structural alterations, or where additional primary information comes to light.

Colouring London has been developed in close collaboration with historians at the Survey of London (Survey of London, 2020). The idea of crowdsourcing building attribute data was also in large part influenced by the Survey’s Whitechapel site (Survey of London, 2021), which demonstrated that local historians and local communities could be encouraged to upload data on the characteristics and history of buildings onto VGI platforms. The next major stage in this collaboration is to extend work with the Survey, and Historic England, on the development of an historic environment data upload group for Colouring London. The ‘Age’ and ‘Lifespan’ features, and the ‘Team’ and ‘Community’ categories, have been designed and built to make the open platform as attractive and historian-friendly as possible.

Chapter 6 survey of London

The quickest way identified to estimate lifespans for geolocated demolished buildings is to use historical maps. The most relevant and comprehensive data source for London is OS’s County Series and National Grid series, used in the Batty/Stanilov study, and available at approximately 20-year intervals from the mid-19th century to the late 20th century. Though these are unable to provide a precise year of construction, they can provide a date interval of approximately 20 years. Major UK sources of these maps are listed by the National Archives (https://www.nationalarchives.gov.uk/help-with-your-research/research-guides/ordnance-survey/).
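The interval logic described above can be made concrete with a small sketch that derives lower and upper bounds on a building’s lifespan from a sequence of map surveys. The function name and input layout are assumptions made for illustration, and the sketch assumes that a footprint seen in consecutive surveys is the same building throughout.

```python
from typing import List, Optional, Tuple


def lifespan_bounds(surveys: List[Tuple[int, bool]]) -> Optional[Tuple[int, int]]:
    """Bound a demolished building's lifespan from map survey observations.

    surveys holds (survey_year, footprint_present) pairs for one location.
    Construction falls between the last survey without the footprint and the
    first survey with it; demolition falls between the last survey with it
    and the next survey without it. Returns (minimum, maximum) lifespan in
    years, or None if construction or demolition is not bracketed.
    """
    ordered = sorted(surveys)
    present = [year for year, seen in ordered if seen]
    if not present:
        return None
    first_seen, last_seen = present[0], present[-1]
    absent_before = [year for year, seen in ordered if not seen and year < first_seen]
    absent_after = [year for year, seen in ordered if not seen and year > last_seen]
    if not absent_before or not absent_after:
        return None
    minimum = last_seen - first_seen
    maximum = absent_after[0] - absent_before[-1]
    return (minimum, maximum)


if __name__ == "__main__":
    # Illustrative survey years only; actual tile dates vary.
    observations = [(1875, False), (1895, True), (1910, True), (1935, True), (1955, False)]
    print(lifespan_bounds(observations))
```

With surveys roughly 20 years apart, the gap between the two bounds is roughly 40 years, which is why supplementary sources such as gazetteers matter for narrowing construction and demolition dates.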

Chapter 6 historical

Ordnance Survey’s County Series maps for London, dating from the 1860s, are available at 1:2500 scale, and post-war National Grid maps at 1:1250 scale. These are accessible to universities (alongside OSMM) via the EDINA Digimap site, as scanned, georeferenced raster images, presented chronologically (EDINA, 2021). The EDINA site allows map tiles to be compared online or downloaded, but not released (ibid.). Access is not available without a licence. Citywide London surveys are available for c.1875, c.1895 and c.1910, though precise dates of map tiles differ owing to differences in survey dates. Coverage for the 1930s survey is patchier, as shown in Figure 6.14, probably as a result of a slowdown in map production, and a focus on newly built areas, before the Second World War. After the war the National Grid system was introduced, with building data available at the more detailed 1:1250 scale. However, during the latter half of the century citywide surveys are rare, with local surveys occurring only as and when required.

In 2017, the author began to collaborate with Robert Hecht and Hendrik Herold at IOER to explore whether machine learning methods could be applied to create large-scale, open building footprint datasets for London from OS historical maps. Significant progress in opening up OS map collections has been, and continues to be, made by the National Library of Scotland (NLS), driven by the NLS’s map curator Christopher Fleet. The NLS offered IOER samples of photographed map tiles for experimentation, which were compared with scanned map samples supplied by Historic England. IOER tests concluded not only that scanning had resulted in an additional, problematic layer of copyright restrictions, but also that the scanned OS maps were harder for computers to read than the photographed OS maps from NLS.

Problems with releasing data derived through this process had already been identified by the author in 2016. In the 1990s OS’s paper archive was scanned by Landmark Information Group, resulting in joint OS/Landmark copyright on map scans (EDINA, 2021a), to which collections held by OS itself, EDINA Digimap and Historic England are all subject. Correspondence between the author, OS and Landmark in 2016, with regard to the open release of manually vectorised footprint data for a section of Camden, resulted in permission only being granted to release data at borough level. Vectorised footprints were classified as derived data, not new artwork.

In 2019 Fleet offered the author use of NLS maps for London for the 1890s and 1950s, for integration within the London open data platform as raster layers, and from which open vectorised footprints could also be extracted. He noted that the 1890s map could be released immediately under a Creative Commons CC-BY licence but that the 1950s layer could only be released for third-party use in 2022, as it had been scanned by a third party. In 2019 the author, Hecht and Herold joined an international group set up by the Alan Turing Institute (ATI), as part of Turing’s Living with Machines project, and began to work with Katie McDonough to advance the use of Computer Vision in historical footprint extraction. Reference to the author’s contribution to this group is made at https://livingwithmachines.ac.uk/computer-vision-for-digital-heritage/. Work to advance large-scale historical footprint vectorisation and open release is ongoing. The ultimate aim is to integrate (and animate) as many historical map layers for London as possible, and to collaborate with Living with Machines and NLS to make open raster and vector data relating to 1:2500 and 1:1250 OS historical maps available for London, for as many survey dates as possible.

12. COMMUNITY. Open data on community assessment of building quality/performance

Statistical datasets on the quality and performance of buildings, in terms of how well specific buildings or types of building operate from an occupier and local community point of view, do not exist for London (i.e. do they work well when you are in them? Do they work well when you live near them or walk past them?). The largest source of public feedback on buildings is held within the Planning Portal, recorded in comments on planning applications. However, these are text-based and only relate to specific changes proposed to specific buildings. They do not indicate how well specific types of buildings with specific characteristics work.

Local lists, coordinated by local authorities and driven by the historic environment sector, are the most useful current source of either open or restricted data in terms of indicating the value placed on specific local buildings (https://historicengland.org.uk/listing/what-is-designation/local/local-designations/). Ian Hall is currently independently working on collating all local lists to produce a national spatial database of all locally listed buildings (ADD link).

However, the limited number of buildings on local lists, and their selection mainly for local uniqueness, means that data on typologies are difficult to extract. Memberships of specific amenity societies (e.g. the Victorian Society) can give some indication of the scale of public interest in specific cohorts/types, and collaboration with these memberships has therefore been identified as helpful when attempting to capture public feedback on typology quality. However, new mechanisms for capturing spatial statistics from experts with the most detailed knowledge of these buildings are now required.

Separating out the emotional response of users/local communities to buildings (owing to personal associations) from characteristics of typologies that make them work better than others in specific respects is seen as only possible through analysing very large volumes of responses on how well typologies work, in order to identify patterns. For this, geodata on typologies (see 'Type') and data on public responses to these are both required. The open data availability assessment for London also identified that questions to the public need to be framed so that users are discouraged from adding data on what they feel about buildings in a way that could make other members of local communities feel excluded, bullied or upset.
