1. Introduction
Emissions inventories have been fundamental tools within air quality management regarding emissions generated mainly by human activities and other sources (Stella, 2002) and are relevant to understanding air pollution sources and designing effective emission mitigation actions (Day et al., 2019). Historically, they have originated from different needs within environmental management and science: criteria pollutants (EEA, 2016; EPA, 1999; INE, 2005), greenhouse gases (GHG) (IPCC, 2006), toxic compounds (RC, 2011), or short-lived climate pollutants (SLCP) (Shoemaker et al., 2013; Klimont et al., 2017). They have in common several processes that co-emit these chemical species, which are accounted for in these seemingly different emissions inventories. In many cases, they are compiled using different methodologies and resolution approaches, either top-down or bottom-up (Thunis et al., 2016). This is the case for national criteria pollutants (SINEA, 2015), SLCPs (UNEP-CCAC, 2016) and the GHG national emissions inventories of Mexico (INECC, 2015). Most of the national and local capacities are supported by the same teams involved in compiling different emissions inventories, also following in many cases different methodologies, software or toolkits (UNEP, 2013).
Some emissions inventories are compiled as part of international commitments and conventions (UNFCC, 2012; UNEP, 2004). Another challenging scientific task is to build, compile and maintain a harmonized emissions inventory with high spatial and temporal resolution and low uncertainty, which is needed for modeling the chemistry of the atmosphere at local, regional and global scales (Russell and Dennis, 2000; Vendrenne et al., 2012; Borge et al., 2014). The emissions inventory is an important source of information for air quality studies (EPA, 2002; Bang et al., 2019), environmental management, and policies, and it is widely used in chemistry transport models (CTM). Its contribution to air quality modeling is relevant for evaluating the impact of source emissions, the transport and fate of pollutants (Pan et al., 2008), and air quality scenarios (Vijayaraghavan et al., 2016). It is also considered an important tool for generating sustainable national and local policies that allow urban development with the lowest impact on population and ecosystems.
Currently, there is a lack of appropriate methodology and of semantic and ontological structures (Ortega, 2009) to ensure and manage emissions inventory information. Emissions inventories have three major sources of uncertainty: activity data, emission factors and geolocation data. A markup language can facilitate and automate the consistency analysis, partially resolving some of the most common sources of error and uncertainty that arise when the emissions inventory is integrated. Also, there are few emissions inventory models for direct application in atmospheric models, where the main target is to convert source-level emissions into temporally and spatially resolved emissions on the cells of the modeling grid.
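As a rough illustration of what converting source-level emissions into gridded emissions involves, the following Python sketch assigns point sources to the cells of a regular latitude-longitude grid and derives a flat hourly rate. The grid origin, the cell size, the sample sources and the uniform temporal profile are all assumptions made for this example; operational emission models apply sector-specific temporal and spatial profiles.

```python
import math

def grid_cell(lat, lon, lat0, lon0, cell_deg):
    """Return the (row, col) index of the regular grid cell containing a point."""
    row = math.floor((lat - lat0) / cell_deg)
    col = math.floor((lon - lon0) / cell_deg)
    return row, col

def grid_emissions(sources, lat0, lon0, cell_deg):
    """Sum annual emissions (t/yr) of all point sources falling in each cell."""
    grid = {}
    for lat, lon, annual_t in sources:
        cell = grid_cell(lat, lon, lat0, lon0, cell_deg)
        grid[cell] = grid.get(cell, 0.0) + annual_t
    return grid

def hourly_rate(annual_t, hours_per_year=8760):
    """Flat temporal profile (an assumption): spread annual total evenly over the year."""
    return annual_t / hours_per_year

# Hypothetical sources: (latitude, longitude, annual emissions in t/yr)
sources = [(19.40, -99.10, 100.0), (19.45, -99.15, 50.0), (25.67, -100.31, 10.0)]
grid = grid_emissions(sources, lat0=14.0, lon0=-119.0, cell_deg=0.5)
for cell, total in sorted(grid.items()):
    print(cell, total, hourly_rate(total))
```

The dictionary keyed by cell index keeps the sketch sparse, so only cells that actually contain a source are stored.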
For modeling community users (MCU), there is the Sparse Matrix Operator Kernel Emissions (SMOKE) model developed by US-EPA (UNC, 2013). It is one of the most widely used emissions models worldwide (Houyoux et al., 2000). Another emission model used with CTMs is the High-Elective Resolution Modelling Emission System version 3 (HERMESv3), which estimates anthropogenic emissions at high spatial resolution for use in multiple air quality models (e.g., the Community Multiscale Air Quality Modeling System [CMAQ], the Weather Research and Forecasting model coupled with Chemistry [WRF-CHEM] and the Multiscale Online Nonhydrostatic Atmosphere Chemistry model [MONARCH]) (Guevara et al., 2020). Nowadays, inventories are facing a crucial scientific challenge (Frost et al., 2013): how to quantify emissions and display them temporally and spatially, which has been addressed by the creation of emission models. However, these methodologies and tools are shared only within the MCU and are not easily accessible to other users in charge of air quality management and policy.
This paper presents an emissions inventory methodology based on the KML standard of Google Earth, combined with HYSPLIT-NOAA (Draxler et al., 2000, 2012). This approach allows the KML emissions inventory model and the trajectory model to be used and coupled on the same platform. The methodology developed could be useful in the atmospheric sciences research field and for other potential users to be able to visualize information with any kind of computer system and mobile devices such as smartphones.
2. How emissions inventories work
National emissions inventories contain information from several emissions source categories and include emitted pollutants, emission factors or functions, activities levels, the time period over which they are estimated (Gkatzoflias et al., 2013), emission location at different levels of geographical detail (García-Reynoso et al., 2019), and total emissions (Parra et al., 2006). In order to construct an emissions inventory with a high level of spatial and temporal resolution to provide consistent data for modelling and environmental policy issues (van Aardenne, 2002), data are collected from other information sources, mostly provided by federal, state, and local agencies (EEA, 2016). All these information items need to be integrated in databases on computing systems (Symeonidis et al., 2004). Most of the data bases are from individual source sectors (e.g., transport, energy, household) (EPA, 2016).
2.1 National emissions inventory
In Mexico, the legal framework for emissions inventories comprises two general laws: the General Law of Ecological Equilibrium and Protection of the Environment (LGEEPA, Spanish acronym) (SEMARNAT, 2016), which mandates the atmospheric pollutants emissions inventory (INEM), and the General Law on Climate Change (LGCC) (SEMARNAT, 2012a), which mandates the GHGs emissions inventory (INEGEI). The responsibility for compiling the national emissions inventory falls to the Ministry of Environment and Natural Resources (SEMARNAT) and the National Institute of Ecology and Climate Change (INECC), with information provided by federal and state agencies. In most cases, the same working teams are responsible for the national compilation of these emissions inventories.
INEM was first released in 2002 for the base year 1999 and focused mostly on criteria pollutants. It was then updated for base years 2005, 2008, 2013 and 2016. INEM generally follows the EPA calculations and estimations model, as do other countries (e.g., Colombia [MINAMBIENTE, 2017], Chile [CONAMA, 2009], and Korea [Kim et al., 2008]). Substantial changes in compilation methods were applied in each of these versions, which makes them inconsistent for comparisons or trend analysis; in addition, the spatial resolution of INEM is at the municipal level.
For GHGs, the first release of INEGEI was in 1995 for the year 1990 (Gay, 1995). The updates in all six national communications followed the IPCC guidelines with a mixture of tier 1 and tier 2 levels, which allowed yearly estimates, recalculations and trend analysis. In the Fifth National Communication, black carbon was reported as an annex to INEGEI (SEMARNAT, 2012b). This emissions inventory was later updated to include organic carbon (MCE2-INECC, 2018). The Sixth National Communication (SEMARNAT-INECC, 2018) involved a policy decision to move stepwise to a tier 3 national GHGs and SLCPs emissions inventory. This meant a decision to compile a unified emissions inventory for criteria pollutants, GHGs, SLCPs and toxics, given that, in many cases, these local or global pollutants are co-emitted in the same processes.
2.2 Main challenges of emission inventories
The task to reduce processing time to compile the large volume of datasets, estimate annual emissions, and manage the high temporal and spatial resolution of the emissions inventory for CTM has become ever more challenging. Emission information requires accuracy and consistency. Currently, some platforms can provide a simple, flexible, and easy model to standardize and visualize the emissions inventory. For that purpose, the Earth sciences community has integrated the above features and developed on-line visualization and evaluation tools (GEIA, 2019) with data exchange protocols, metadata and conventions (Husar et al., 2008).
2.3 Improving emissions inventory visualization
Another scientific challenge is to have useful and effective data visualization (Jeong et al., 2006). In recent years, earth scientists (de Paor et al., 2008) have found a universal platform in KML for managing, visualizing, and integrating geospatial information (Zhu et al., 2014). KML, which is maintained by the Open Geospatial Consortium (OGC), is the current standardized format to display geographic information (OGC, 2019). KML is an application of the Extensible Markup Language (XML) used for visualizing geographic information, and it complements other OGC standards, including the Geography Markup Language (GML). KML uses geometry elements derived from GML, such as point, line string, linear ring and polygon. Another format compatible with KML is the Collaborative Design Activity (COLLADA), which is used to display 3D objects such as buildings and textures (de Paor and Whitmeyer, 2011).
The standardization of the emissions inventory model based on international standards such as KML, GML (Ron, 2005), and the Network Common Data Form (NetCDF) (UCAR, 2019), under the OGC and other implementations of XML, such as the Chemical Markup Language (CML) (Murray-Rust and Rzepa, 1999), is the way forward to build a robust, unified and interoperable emissions inventory. One long-term goal has been a hyphenated online emissions inventory model in CTM (Jacobson et al., 1996). Biogenic emissions are already being calculated online with CTM in a straightforward mode (Pierce et al., 2002). However, online integration of other emission sources has faced stronger challenges.
3. Methodology
3.1 Emissions inventory model
Although an emissions inventory model is an obvious candidate application, until recently there was no conceptual model for managing national emissions inventories based on XML and related applications such as GML and CML. The first such application is the EPA emissions inventory model, which is based primarily upon emissions estimates and emissions model inputs provided by state, local and tribal air agencies, and supplemented by data developed by EPA (EPA, 2008, 2009). The objective of the Consolidated Emissions Reporting Schema (EPA, 2019) is to develop a common air emissions reporting schema in XML that can be used for sharing and reporting air pollution emissions data.
Currently, emissions from point sources under the Mexican federal government jurisdiction are reported to SEMARNAT using a web-based system. Before that, the information was collected, archived, and transferred on hard copy documents and records. Ortega (2009) proposed an integrated metadata model based on the Machine-Readable Cataloging Standard (MARC21) mapped onto Federal Geographic Data Committee (FGDC) and International Organization for Standardization (ISO) standards, to be used as the backbone for Mexican environmental agencies in emissions inventories systems and other environmental applications, in a way similar to Vardakosta and Kapidakis (2013). The subset of metadata categories proposed by Ortega (2009) originated the FGDC_ISO1900 standards (https://www.fgdc.gov/metadata/iso-standards), which provided the basic framework to describe INEM. We proposed developing an emissions inventory model based on standards and decided to develop the point source category as the first building block of such proposed model, following the Ortega (2009) subset.
Due to the strong economic integration between the Mexican, Canadian and US economies, the general structure of INEM has followed the EPA National Emissions Inventory (NEI) design. Specifically, the point source definition is the same for both inventories, but the thresholds are completely different. For point sources, INEM consists of state and federal point sources. The latter are regulated by the federal government and are selected according to the most important sectors of the economy, intensity of emissions, or key toxic compounds.
Federal point sources report emissions to the National Pollutant Release and Transfer Register (RETC, Spanish acronym) (SEMARNAT, 2018). State point sources report in in-house formats defined by each federal state. Although much more structured than state emissions inventories, even the federal emissions inventory is built and calculated on commercial spreadsheets. Currently, RETC reports are based on standard electronic formats (SEMARNAT, 2015). The report structure lacks the ontology, semantic content, and protocols needed to systematize data compilation, quality control, and queries of inventory information (Boone and McKenzie, 2003).
3.2 Developed point source inventory emissions model
The developed point source model can be divided into two main blocks of categories, one deals with the identity of the source and the other deals with emissions from that source. In terms of intended interoperability, our model takes the tags used by the Emissions Inventory System from the EPA (2008). The identity metadata can be split into two sub-categories: identification and geolocation. The metadata category dealing with emissions can be divided into process, properties, equipment, and emissions (Fig. 1).
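The two blocks described above can be sketched as a nested data structure. The following Python dataclasses are only an illustration: the class names, field names and sample values are hypothetical and are not the tags of the EPA Emissions Inventory System or of the model in Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Identification:
    source_id: str   # hypothetical facility identifier
    name: str

@dataclass
class Geolocation:
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees

@dataclass
class Emission:
    pollutant: str
    amount: float
    unit: str = "t/yr"

@dataclass
class EmissionsBlock:
    process: str
    properties: Dict[str, str] = field(default_factory=dict)
    equipment: str = ""
    emissions: List[Emission] = field(default_factory=list)

@dataclass
class PointSource:
    # Identity block: identification and geolocation
    identification: Identification
    geolocation: Geolocation
    # Emissions block: process, properties, equipment and emissions
    emissions_block: EmissionsBlock

    def total(self, pollutant: str) -> float:
        """Sum the reported amounts for one pollutant."""
        return sum(e.amount for e in self.emissions_block.emissions
                   if e.pollutant == pollutant)

# Hypothetical record used only to exercise the structure
source = PointSource(
    Identification("PS-001", "Example plant A"),
    Geolocation(19.40, -99.10),
    EmissionsBlock(
        process="combustion",
        properties={"fuel": "natural gas"},
        equipment="boiler",
        emissions=[Emission("NOx", 10.0), Emission("NOx", 2.5), Emission("SO2", 1.0)],
    ),
)
print(source.total("NOx"))
```

Keeping identity and emissions in separate blocks means that the geolocation can be validated or mapped without touching the emissions records.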
3.3 Keyhole Markup Language (KML) as the main standard platform of emissions
The emissions inventory for air quality modeling needs to be temporally and spatially disaggregated. All the point sources in the emissions inventory should include geospatial information and the release coordinates. Therefore, KML is a useful standard for the identity component of the emissions inventory model (Fig. 2). The basic data representation level in KML is the placemark (de Paor and Whitmeyer, 2011), which uses virtual globes containing the coordinates and the rest of the identity information. There are different ways of adding the metadata to a <Placemark> tag: using applications directly on the Google Earth platform or using a text editor that complies with the KML format. To incorporate the point sources, a Python script was developed to migrate the databases of the national emissions inventory to the model developed and to map the point sources onto the Google Earth platform.
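A minimal sketch of how metadata can be attached to a <Placemark> is given below, using the Python standard library and the KML <ExtendedData> element. It is not the script developed in this work; the function name, the metadata keys and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def make_placemark(name, lon, lat, metadata):
    """Build a <Placemark> with a <Point> and name/value metadata pairs."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    # <ExtendedData> carries arbitrary name/value pairs for the feature
    ext = ET.SubElement(pm, "ExtendedData")
    for key, value in metadata.items():
        data = ET.SubElement(ext, "Data", {"name": key})
        ET.SubElement(data, "value").text = str(value)
    # KML coordinates are written as longitude,latitude,altitude
    point = ET.SubElement(pm, "Point")
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return pm

# Hypothetical source; the keys below are illustrative, not EPA tags
pm = make_placemark("Example plant A", -99.1, 19.4,
                    {"source_id": "PS-001", "pollutant": "NOx"})
kml = ET.Element("kml", {"xmlns": KML_NS})
ET.SubElement(kml, "Document").append(pm)
print(ET.tostring(kml, encoding="unicode"))
```

Note that KML orders coordinates as longitude first and latitude second, which is a frequent source of misplaced points when migrating tabular data.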
3.4 Migration process of source emissions inventory to KML
Currently, there is no process that complies with standards and methods to compile and integrate the national emissions inventory. Compilation of the point sources emissions inventory is developed and stored in an Excel sheet where each row is a record that corresponds to a fixed source without any semantic or ontological rule. The methodological proposal is based on two main sources of geographic information and emissions metadata: identification and release location (Fig. 2); the rest of the “Types” are important but not relevant for aspects of air quality modeling. However, they are especially useful for applications in evaluating the source and establishing air quality and climate change mitigation actions, management, and policies.
Point source emissions data are generated directly by the industrial sector and fed into the main interface system, which holds the information content of the point source conceptual model. Once obtained, the information is hosted on a federal database and cross-referenced with the previous database information for the source. Unfortunately, the geographic information does not undergo a proper quality control process before the database reaches the MCU, by which point most of the information is unsuitable for use. The proposed model (Fig. 3) allows the geographic information to be evaluated on the Google Earth platform at two points of the process flow, so that the industrial sector and governments can reduce geodata uncertainties and update the information in less time.
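Part of this geographic quality control can be automated before the data are displayed. The sketch below flags coordinates that fall outside an expected region and tries two common repairs (swapped latitude and longitude, and a missing negative sign on the longitude). The bounding box is an approximate, assumed envelope for Mexico chosen only for illustration, and the function names are hypothetical.

```python
# Approximate bounding box for Mexico (assumed values, for illustration only)
LAT_MIN, LAT_MAX = 14.0, 33.0
LON_MIN, LON_MAX = -119.0, -86.0

def in_box(lat, lon):
    """True if the point lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

def check_coordinates(lat, lon):
    """Classify a reported coordinate pair and name the likely error, if any."""
    if in_box(lat, lon):
        return "ok"
    if in_box(lon, lat):
        return "swapped lat/lon"
    if in_box(lat, -abs(lon)):
        return "missing negative longitude sign"
    return "outside expected region"

# Hypothetical reported coordinates (latitude, longitude)
reports = [(19.4, -99.1), (-99.1, 19.4), (19.4, 99.1), (48.8, 2.3)]
for lat, lon in reports:
    print(lat, lon, check_coordinates(lat, lon))
```

A check of this kind only catches gross errors; a point that lies inside the box but at the wrong site still requires visual inspection on the virtual globe.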
3.5 Transformation of emissions inventory algorithm to KML
The point sources emissions inventory is compiled and stored in an Excel sheet, where each row is a record that corresponds to a fixed source and the columns hold all the metadata of the point source. The proposed transformation starts from inventory information in that spreadsheet format: each file or database should contain the proposed information for each point source as standardized identification tags. These tags will be visible within the metadata of the virtual globes in KML, which brings together two important methodological contributions: the standard tags from EPA and a standardized file in KML.
The information obtained from the point source inventory is structured through a base Python script (Marks, 2008) that reads the documents or databases, structured in columns and previously exported to comma-separated values (CSV) format, and generates a KML file that is well-formed and valid according to the KML standard. The header annex and part of the code body developed in Python to create the KML file are shown in Fig. 4.
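The general shape of such a CSV-to-KML conversion can be sketched as follows. This is not the script shown in Fig. 4: the column names, the sample rows and the function name are hypothetical, and the sketch only verifies that the output is well-formed XML, not that it validates against the KML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical inventory rows; real column names will differ
SAMPLE_CSV = """source_id,name,latitude,longitude,pollutant,emissions_t_yr
PS-001,Example plant A,19.40,-99.10,NOx,120.5
PS-002,Example plant B,25.67,-100.31,SO2,310.0
"""

def csv_to_kml(csv_text):
    """Convert point source rows in CSV text into a KML document string."""
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = row["name"]
        ext = ET.SubElement(pm, "ExtendedData")
        for key in ("source_id", "pollutant", "emissions_t_yr"):
            data = ET.SubElement(ext, "Data", {"name": key})
            ET.SubElement(data, "value").text = row[key]
        lon, lat = float(row["longitude"]), float(row["latitude"])
        point = ET.SubElement(pm, "Point")
        # KML expects longitude,latitude,altitude
        ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    body = ET.tostring(kml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

kml_text = csv_to_kml(SAMPLE_CSV)
# Parsing the result back confirms that the file is well-formed
root = ET.fromstring(kml_text.encode("utf-8"))
print(kml_text)
```

Converting the coordinates with float() makes the script fail early on non-numeric cells, which is one simple way to surface spreadsheet errors during migration.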
4. Visualization
Two contributions to the development of the national emissions inventory are the ability to integrate with the Google Earth platform to visualize locations (Fig. 5 shows a small sample of the total national point sources) and the potential capability to interoperate with dispersion models. In this regard, we selected the HYSPLIT-NOAA model (Draxler et al., 2000, 2012), since it is currently the only open model able to generate KML output files that can be displayed in Google Earth and overlaid on our model outputs in KML (Ortínez et al., 2017).
The main metadata must comply with the general model characteristics of the “Point Sources” and be associated with universal identifiers that can be used by other platforms or programs for managing emissions inventories. There is a proposal developed by the EPA for the systematic management of its inventories and reports. The Consolidated Emissions Reporting Schema (CERS-EPA, 2019) is meant to develop a common air emissions reporting schema for sharing and reporting emissions data that include criteria air pollutants, toxic air emissions, and GHGs, as well as SLCP. The CERS-EPA (2019) is fully coupled with the integrated emissions inventory effort. Our model consists of an adapted subset of the data blocks defined by the EPA implementation of XML schemas.
According to the CERS-EPA (2019) tags standard and the conceptual emissions model developed in this project, each point source includes several descriptions in standardized tags to describe the emissions source in detail. The objective of these tags is to standardize emissions information and integrate several items of description data from the source, such as emissions, identification, geographic information, process, properties, and equipment. The metadata could be accessed through virtual globe information displaying the metadata information.
5. Application
5.1 Virtual globe point source emissions tag on Google Earth
KML, which represents and visualizes geospatial information on virtual globes (Bailey and Chen, 2011), has been widely used by the Earth science communities in most of the popular virtual globe systems, such as Google Earth (Ballagh et al., 2011). In its current state, the approach lets us visualize in Google Earth the geolocation of each point source in the KML emissions inventory, together with the dispersion and deposition of its emissions, by following a checklist procedure. The user selects and displays the KML tag of the source in Google Earth (Fig. 6); then, through the HYSPLIT-NOAA server, the user enters the modeling parameters, and the KML balloon information is transferred from the point source to HYSPLIT-NOAA. The model is run, and the output is requested in KML format so that it can be read by Google Earth and overlaid on the emissions inventory KML layer. These operations may be automated in future developments for HYSPLIT-NOAA and other models that can interact with XML formats.
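One step toward automating this procedure is to read the release coordinates from the inventory KML programmatically instead of copying them from the balloon. The sketch below does only that: it does not call HYSPLIT-NOAA or reproduce any of its interfaces. The sample KML, the function name, the output keys and the interpretation of the KML altitude as a release height are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical inventory file content with one point source
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example plant A</name>
      <Point><coordinates>-99.1,19.4,30</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

def read_release_points(kml_text):
    """Extract name, latitude, longitude and altitude of every point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        coords = pm.findtext("kml:Point/kml:coordinates",
                             default="", namespaces=KML_NS).strip()
        if not coords:
            continue  # skip placemarks without a point geometry
        parts = coords.split(",")
        points.append({
            "name": pm.findtext("kml:name", default="", namespaces=KML_NS),
            "latitude": float(parts[1]),   # KML stores longitude first
            "longitude": float(parts[0]),
            # Altitude in metres, read here as a release height (assumption)
            "height_m": float(parts[2]) if len(parts) > 2 else 0.0,
        })
    return points

release_points = read_release_points(SAMPLE_KML)
print(release_points)
```

The resulting list could then be used to fill in the starting location of a trajectory or dispersion run by whatever interface the chosen model provides.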
5.2 Smart phone visualization
Several smartphone applications have been created that allow data to be analyzed in the field of geosciences. For example, geologists use tools that integrate multiple smartphone functions, such as the keyboard or virtual keyboard, camera, recorder, digital compass, GPS receiver, and accelerometer, so that a single smartphone can cover many needs in the field. Nowadays, storage capacity, connectivity and portability make it possible for smartphones to replace personal computers in the field (Weng et al., 2012).
In that sense, for the atmospheric sciences, this work shows that Google Earth, or any smartphone app that reads and displays KML data files, allows users to visualize the point source location and its emission puff releases or trajectories (Fig. 7). It also serves air quality management, as a way of consulting and receiving information for the evaluation of any source, and it could include data from atmospheric monitoring networks and air quality research monitoring sites (e.g., there is a site description of black carbon measurements used by Peralta [2019]). Streamlining this process requires software development that is beyond the scope of this paper.
6. Conclusions
The spatial emissions inventory model was constructed by combining Keyhole Markup Language with the CERS-EPA (2019) emissions standard and adapting it to INEM point sources categories. A Python script was adapted to transfer the INEM current format in Excel files to the KML standard. The development achieved opens the possibility for extending the model to other types of emission sources such as mobile and area sources. The application allows us to visualize and consult emissions information on Google Earth and other applications capable of reading KML formats on any type of computing device.
The implementation of markup languages in standard schemes such as KML and CERS-EPA (2019) (all of them referring to FGDC-ISO-1900) for the development of emissions inventory models allows progress in interoperability with air quality and pollutant dispersion models, in order to evaluate public policies and reinforce the monitoring of environmental or emergency cases. For this last case, we show how an output from the inventory model in KML format can be fed into the HYSPLIT-NOAA model.
One great advantage of the KML standard is the ability to visualize and deploy it on any Geographic Information System (GIS), computing platform, and smartphone, expanding general use regardless of the type of computer system that is being worked on. Likewise, the compatibility of KML with other standards allows the exchange of metadata with GISs under protocols and standards approved by the Open Geospatial Consortium, such as GML, CML, and NetCDF.
Methodological development for other emissions sources should be continued under the same geometries established in the KML standard to geo-represent area sources as polygons and mobile sources on the road as lines. Standards and protocols should be established for atmospheric monitoring data, either of a scientific nature or to inform atmospheric monitoring networks.
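As a sketch of how these geometries could be generated, the Python fragment below builds a <LineString> placemark for a road segment and a <Polygon> placemark for an area source. The function names and coordinates are hypothetical, and no emissions metadata are attached in this minimal version.

```python
import xml.etree.ElementTree as ET

def coords_text(points):
    """Format (lon, lat) pairs as a KML coordinate list."""
    return " ".join(f"{lon},{lat},0" for lon, lat in points)

def line_placemark(name, points):
    """Mobile source on a road, represented as a <LineString>."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    line = ET.SubElement(pm, "LineString")
    ET.SubElement(line, "coordinates").text = coords_text(points)
    return pm

def polygon_placemark(name, ring):
    """Area source, represented as a <Polygon> with one outer boundary."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a KML LinearRing must be closed
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    poly = ET.SubElement(pm, "Polygon")
    outer = ET.SubElement(poly, "outerBoundaryIs")
    linear_ring = ET.SubElement(outer, "LinearRing")
    ET.SubElement(linear_ring, "coordinates").text = coords_text(ring)
    return pm

# Hypothetical geometries given as (longitude, latitude) pairs
road = line_placemark("Example road segment", [(-99.2, 19.4), (-99.1, 19.45)])
area = polygon_placemark("Example area source",
                         [(-99.2, 19.3), (-99.1, 19.3), (-99.1, 19.4), (-99.2, 19.4)])
print(ET.tostring(road, encoding="unicode"))
print(ET.tostring(area, encoding="unicode"))
```

Closing the ring inside the helper avoids a common cause of polygons that fail to render on virtual globes.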
Finally, even if our emissions inventory model in the KML standard is a simplified version of the EPA model, due to its structured construction following ISO standards, there is no limitation on its future harmonization with this and other ISO-compliant international emissions inventories.