- Python 3.6+ (no support for Python 2)
- numpy
- scipy
- pyproj
For flash sorting and plotting:
- scikit-learn
- pytables
- matplotlib
See examples/README for examples of how to:
- Turn LMA ASCII data into HDF5 flash files.
- Turn the HDF5 flash data into 2D and 3D grids of flash properties.
- Make quick-look plots of the grids.
- Use an interactive interface to manually identify cells.
- Use a log of the cells identified above to calculate flash statistics.
Before VHF sources are grouped into flashes, noise sources are filtered out using the reduced chi-squared value and the number of contributing stations for each source. A reduced chi-squared of less than 5.0 is typical; 1.0 is a threshold in wide use. For all but the worst-performing networks, a minimum of six stations should be required; where available, seven or more will give the most noise-free results. These values should be determined case by case by examining the source data for the time periods of interest. This step is especially important for networks with frequent station drop-outs, asymmetric station geometry, or poor sensitivity.
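As a sketch of this quality filtering (not lmatools' exact interface; the array names and values are illustrative):

```python
import numpy as np

# Hypothetical per-source metadata parsed from LMA ASCII data; the names
# and values here are illustrative, not lmatools' API.
chi2 = np.array([0.4, 1.2, 6.3, 0.9, 2.1])   # reduced chi-squared
stations = np.array([7, 6, 5, 8, 6])          # contributing stations

# Keep sources with acceptable solution quality: a chi-squared ceiling of
# 1.0 is widely used, and six or more stations is a reasonable floor.
good = (chi2 <= 1.0) & (stations >= 6)
print(good)  # [ True False False  True False]
```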
Sources are grouped into flashes using space and time criteria. These default to 3 km spatial and 0.15 s temporal thresholds, with a maximum flash duration of 3 s. The only fully integrated algorithm in this codebase uses the DBSCAN clustering implementation in the scikit-learn Python package. The flash sorting process is described in Fuchs et al. (2016):
Fuchs, B. R., E. C. Bruning, S. A. Rutledge, L. D. Carey, P. R. Krehbiel, and W. Rison, 2016: Climatological analyses of LMA data with an open-source lightning flash-clustering algorithm. J. Geophys. Res. Atmos., 121, 8625–8648, doi:10.1002/2015JD024663.
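A minimal sketch of the space-time clustering idea, assuming the default thresholds above (the source arrays are illustrative, and enforcement of the 3 s maximum duration is a separate step not shown):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative source locations (km) and times (s); in practice these come
# from the filtered VHF sources after coordinate transformation.
x = np.array([0.0, 0.5, 1.0, 40.0, 40.2])
y = np.array([0.0, 0.2, 0.4, 10.0, 10.1])
z = np.array([5.0, 5.1, 5.3, 7.0, 7.1])
t = np.array([0.00, 0.05, 0.11, 0.50, 0.56])

# Normalize each dimension by its clustering threshold (3 km, 0.15 s) so a
# single DBSCAN eps of 1.0 applies the space and time criteria together.
D_SPACE, D_TIME = 3.0, 0.15
features = np.column_stack([x / D_SPACE, y / D_SPACE, z / D_SPACE, t / D_TIME])

labels = DBSCAN(eps=1.0, min_samples=1).fit_predict(features)
print(labels)  # two flashes: [0 0 0 1 1]
```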
The McCaul et al. (2009, Weather and Forecasting) flash sorting algorithm was supported historically, but that code is not included here. The code architecture supports the addition of other algorithms.
Flashes are turned into gridded products by taking each flash and its associated VHF source points, filtering to select certain kinds of flashes, performing the necessary coordinate transformations, and then determining how the flash contributes to a gridded product field of interest. A minimum number of sources per flash is required, usually ten. The processing is implemented as a coroutine-style pipeline in which flashes are pushed down the pipe before landing on a grid, as sketched below.
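A minimal sketch of the coroutine-push pattern, with hypothetical stage names and a simplified flash representation (not lmatools' actual pipeline stages):

```python
def coroutine(func):
    # Prime a generator-based coroutine so it is ready to accept .send().
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start

@coroutine
def filter_small_flashes(target, min_sources=10):
    # Drop flashes with too few VHF sources; pass the rest downstream.
    while True:
        flash = (yield)
        if len(flash['sources']) >= min_sources:
            target.send(flash)

@coroutine
def accumulate_on_grid(grid):
    # Terminal stage: each flash contributes to the gridded field.
    while True:
        flash = (yield)
        for (i, j) in flash['cells']:
            grid[i][j] += 1

# Hypothetical usage: push flashes down the pipe; they land on the grid.
grid = [[0] * 3 for _ in range(3)]
pipe = filter_small_flashes(accumulate_on_grid(grid), min_sources=10)
pipe.send({'sources': range(12), 'cells': [(0, 0), (0, 1)]})
pipe.send({'sources': range(4),  'cells': [(2, 2)]})   # filtered out
print(grid)  # [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Because each stage holds its own state and simply forwards flashes downstream, new filters or gridded products can be spliced into the pipe without restructuring the rest of the processing.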
Grids may be specified in any of the many map projections available through pyproj, or a regular lat/lon grid may be used. The gridding process allows for simultaneous processing of multiple sequential time windows of controllable duration.
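For example, a projected grid might be set up along these lines (a sketch assuming pyproj 2+; the projection choice and network coordinates are illustrative):

```python
from pyproj import Proj, Transformer

# Illustrative choice: an azimuthal equidistant projection centered on the
# network, so grid coordinates are meters from the network center.
ctr_lat, ctr_lon = 33.6, -101.8
proj = Proj(proj='aeqd', lat_0=ctr_lat, lon_0=ctr_lon, datum='WGS84')
to_xy = Transformer.from_proj(Proj('epsg:4326'), proj, always_xy=True)

# Transform a source location (lon, lat) into grid coordinates (m).
x, y = to_xy.transform(-101.5, 33.9)
```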
For each of the products described below, except perhaps flash initiation density, a logarithmic color mapping usually works best.
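With matplotlib, such a mapping can be applied with LogNorm; zero-valued cells have no logarithm and are masked in this sketch (the sample grid is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Illustrative grid; mask zero-valued cells, which have no logarithm.
density = np.array([[0, 1, 5], [2, 40, 300], [0, 0, 12]], dtype=float)
masked = np.ma.masked_equal(density, 0)

plt.pcolormesh(masked, norm=LogNorm(vmin=1, vmax=masked.max()))
plt.colorbar(label='flash extent density')
plt.show()
```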
The source density product is a simple count of the VHF sources in each grid cell. It is the most basic form of gridded data that can be derived from the LMA data. Only those sources that contributed to a valid flash are counted.
For each flash, the first VHF source point in time is taken as the flash initiation location, so each flash is represented by a single grid cell. A sum of all values on this grid gives the total number of flashes in the domain. Because flashes initiate in regions of large electric field, larger flash initiation rates indicate regions where the electric field rebuilds relatively rapidly.
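A minimal numpy sketch of this product, assuming flat per-source arrays keyed by flash ID (the names and values are illustrative, not lmatools' API):

```python
import numpy as np

# Hypothetical flat arrays of sources belonging to valid flashes.
flash_id = np.array([1, 1, 1, 2, 2])
t = np.array([0.02, 0.01, 0.03, 0.50, 0.49])
x = np.array([1.0, 1.1, 1.5, 8.0, 8.2])
y = np.array([2.0, 2.1, 2.4, 9.0, 9.1])

# For each flash, the earliest source marks the initiation location.
init_x, init_y = [], []
for fid in np.unique(flash_id):
    sel = flash_id == fid
    first = np.argmin(t[sel])
    init_x.append(x[sel][first])
    init_y.append(y[sel][first])

# One count per flash; summing the grid recovers the total flash count.
init_density, _, _ = np.histogram2d(init_x, init_y,
                                    bins=10, range=[[0, 10], [0, 10]])
assert init_density.sum() == len(np.unique(flash_id))
```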
Flash extent density gives a column-local flash rate; it is a count of how many flashes passed through each grid cell. This product highlights a key advantage of the LMA: it shows regions where it was energetically favorable (a local maximum in electric potential) for flashes to revisit frequently.
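The key detail is that a flash counts at most once per grid cell, no matter how many of its sources fall there. A sketch, with illustrative coordinates and grid spacing:

```python
import numpy as np

# One hypothetical flash's source coordinates (km); 1 km grid spacing.
x = np.array([1.2, 1.4, 2.7])
y = np.array([3.1, 3.2, 3.9])
dx = 1.0

# A flash counts once per cell: take the unique cells it touched.
cells = np.unique(np.column_stack([x // dx, y // dx]).astype(int), axis=0)

extent_density = np.zeros((10, 10))
for i, j in cells:
    extent_density[i, j] += 1   # repeat for every flash in the window
```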
The average flash area product calculates the average area of all flashes that passed through a grid cell. It is the sum of the areas of all flashes that passed through a grid cell divided by the flash extent density. This product is good at highlighting regions with infrequent but very extensive flashes, such as MCS stratiform regions and supercell anvils. It can also highlight convective regions that have unusually small, frequent flashes.
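In grid form this is a simple elementwise ratio, guarded against cells that no flash passed through (a sketch with illustrative values):

```python
import numpy as np

# Per-cell summed area (km^2) of every flash that passed through the
# cell, alongside the flash extent density (counts).
area_sum = np.array([[0.0, 120.0], [45.0, 0.0]])
extent_density = np.array([[0, 4], [9, 0]])

# Average flash area = summed area / flash count; cells no flash passed
# through are left as 0.
avg_area = np.divide(area_sum, extent_density,
                     out=np.zeros_like(area_sum),
                     where=extent_density > 0)
print(avg_area)  # [[ 0. 30.] [ 5.  0.]]
```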
Grids are produced in a Climate and Forecast metadata (CF) compliant NetCDF format. The filename convention is as follows:
WTLMA_20120609_235000_3600_10src_0.0323deg-dx_source.nc
- WTLMA: user-selectable prefix, often the network name
- 20120609: year, month, day
- 235000: hour, minute, second (start time)
- 3600: duration, seconds
- 10src: minimum points per flash
- 0.0323deg-dx: grid size in the x direction, in this case 0.0323 deg longitude
- source: product type (may be source, flash_init, flash_extent, or footprint)
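A hypothetical parse of this convention (it assumes the user-selected prefix contains no underscores, since product names such as flash_extent do):

```python
# Split on '_' at most six times so the product name keeps its underscore.
name = 'WTLMA_20120609_235000_3600_10src_0.0323deg-dx_flash_extent.nc'
prefix, date, start, duration, minsrc, dx, product = name.split('_', 6)
product = product[:-3]       # strip '.nc'  -> 'flash_extent'
duration_s = int(duration)   # 3600
n_src = int(minsrc[:-3])     # strip 'src'  -> 10
```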
The flash sorting process creates an intermediate HDF5 file that contains the LMA source data in an "events" table, as well as a "flash" table with flash metadata, such as center location, area, and start time. There is a common flash_id key in both tables that allows the sources for each flash to be retrieved.
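A minimal sketch of retrieving the sources for one flash with pytables; the file name and node paths are hypothetical:

```python
import tables

with tables.open_file('LMA_flashes.h5') as h5:
    events = h5.root.events      # hypothetical node path
    flashes = h5.root.flashes    # hypothetical node path
    # Retrieve all VHF sources belonging to one flash via the shared key.
    fid = flashes[0]['flash_id']
    sources = events.read_where('flash_id == fid')
```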
The repository also contains code to take the NetCDF grid files and produce multi-panel summary images of the grids, with one panel per time increment. Each file is turned into a single PDF, with control over the number of columns. The scripts that make these plots can serve as the basis for additional plot types customized to other analysis or visualization needs.