1 Introduction
Large-magnitude earthquakes may induce strong ground shaking, surface ruptures, landslides, liquefaction and tsunamis, any of which can directly or indirectly cause human deaths and severe economic losses. Fortunately, earthquake frequency decreases as magnitude increases, so events of large and moderate magnitude are scarce in seismic catalogs compared to low-magnitude earthquakes. However, human or automatic detection by traditional techniques may fail for low-magnitude events when traces have poor signal-to-noise ratios or record overlapping events. Even though small-magnitude earthquakes are not usually harmful to lives and civil structures, they must also be recorded in seismic catalogs. These catalogs are used in seismic hazard assessment to map the expected ground motion under a given earthquake magnitude, and must inform building codes. In addition, the small earthquakes detected in the aftershock sequence left by a big event are used to study the distribution of the fault system, the total rupture area, and the evolution of the stress configuration. This information, in combination with additional geological studies, allows assessing the destructive potential of a given fault.
The detection problem of local earthquake events consists of estimating the arrival times of the primary (P) and secondary (S) waves at a recording seismic station. These onset times feed subsequent processing analyses to estimate hypocenter location, focal source mechanism and important spectral properties, and this information is afterward recorded in earthquake catalogs. Detection and processing analyses are routinely performed by a human analyst, who must estimate arrival times based on personal knowledge and experience. This estimation requires a significant amount of pattern recognition, and is tied to the problem of identifying individual P and S arrivals from their amplitude, propagation speed, and induced signal polarization. Nowadays, the need for automatic processing tools grows rapidly with the amount of seismic data delivered by expanding monitoring networks, so that detection and identification of P and S waves can be performed in a faster, more robust and more objective way.
The automation of seismic phase detection and identification must deal with characteristic features of seismic records. As is common in data analysis, signals are processed in time windows. These windows may contain a lot of noise and high redundancy, and true P and S waves can be located anywhere in the time span and contaminated by other signals. For instance, wave conversions that depend on underground structures and site conditions, or even surface waves, may also be present in long time windows. These windows may also contain P and S waves triggered by simultaneous or overlapping seismic events. In many cases, seismic stations record data in several channels, allowing either separate or combined processing of the horizontal, vertical and transverse components of ground motion.
All these features make it difficult and laborious to recognize similar patterns in seismic signals, and to detect and identify P and S waves from one or several events. Filtering, polarization analysis, and additional signal processing techniques have been of primary help for automation.
Artificial Neural Networks (ANN) represent a natural implementation framework for P and S wave detection and identification because of the huge amount of pattern recognition involved. The pioneering applications in [16] and [17] have been followed by multiple and diverse networks, using supervised, semisupervised or completely unsupervised learning. In the supervised family, we find several generic and convolutional networks, and even some scattered cases of recurrent and residual networks. In the unsupervised family, there are applications based on generative adversarial networks and others exploiting autoencoders.
Only a few cases are built upon ensemble learning, but their potential is worth mentioning. All these ANN applications must deal with the difficulties of processing time windows of seismic records mentioned above. These difficulties translate into more complex heuristic rules to estimate the number of neurons in each network layer and the size of the learning set, if one is required. This paper reviews all these implementation aspects, as well as the detection and identification performance achieved by ANN.
Alternative machine learning (ML) techniques can also offer advanced pattern-matching abilities and serve well as automated earthquake monitoring algorithms. Moreover, the expanding ML applications to seismology have reached areas such as earthquake early warning, ground motion estimation, and seismic tomography and inversion, and for the sake of completeness we also briefly comment on prominent works here. For additional insights, we refer the reader to the complementary survey papers [18,23,29] and Kong2019.
2 Picking of Seismic Phases by a Trained Analyst
Modern data acquisition systems use sampling rates of more than 100 Hz, and the sampling rate limits the accuracy of estimated P and S arrival times, together with other factors such as the signal-to-noise conditions at monitoring stations [11]. In addition, hand-picked data are analyst dependent and their errors are difficult to estimate, since picking is performed by several analysts. Different trained analysts may give different onset times for the same signal phase, and sometimes the same analyst may give a different interpretation after some time has passed.
For events at epicentral distances not larger than 10° (~1000 km), recorded seismic waves have propagated through the crust and along the Moho discontinuity. At those stations, these waves are called crustal waves, and they are triggered by local and regional earthquakes. In contrast, teleseismic events are earthquakes that occur at distances larger than 10° from the measurement site. The following description follows [32] and focuses on crustal waves.
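As a quick check of the 10° convention, the angular epicentral distance can be converted to an arc length along the surface. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function names are illustrative and are not taken from [32].

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def epicentral_degrees_to_km(delta_deg, radius_km=EARTH_RADIUS_KM):
    """Arc length along the surface for an angular epicentral distance."""
    return math.radians(delta_deg) * radius_km


def is_teleseismic(delta_deg, limit_deg=10.0):
    """Apply the 10 degree convention quoted in the text."""
    return delta_deg > limit_deg


print(round(epicentral_degrees_to_km(10.0)))  # 1112
```

Under this assumption, 10° corresponds to about 1112 km, which is consistent with the rounded value of roughly 1000 km.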
In general, a seismic phase is identified by two main features. The first is an increase in signal amplitude that exceeds the background noise in the case of P waves, or exceeds the coda of earlier phases in the case of S and later waves. The second is a change of the dominant frequency, which is often much more difficult to visualize and quantify [21]. Once the phase is identified, we have to determine the precise wavelet onset or arrival time, and the precise phase type: for example, a Pg phase or direct P wave, a Pn phase, which is the P wave refracted along the uppermost mantle, or an Sg phase or direct S wave, among other possibilities. The visual examination of amplitude and frequency changes can be rather subjective and depends on the chosen width of the time windows. As illustrative examples, figures 1 and 2 show the P and S phases identified at the CUMV and CACV stations of the Venezuelan Funvisis seismic network (2019/01/16 13:42, 4.0 Mw). In figure 1, both P and S phases are identified through amplitude changes of the recorded seismogram. In figure 2, identification of the P wave is straightforward from a clear change of the frequency content after its arrival, with respect to the background noise.
Signal filtering can help to identify the phase arrivals by improving the signal-to-noise ratio. However, filtering can also change the shape of the waveform, and even introduce a phase shift. This time shift could modify the phase arrivals on the order of tenths of a second [21]. Figure 3 shows the vertical component of ground acceleration recorded at the BAUV station of the Funvisis network, in Cojedes, Venezuela. This station is located 198 km away from the epicenter of the event (2018/04/02 16:24, 2.0 Mw). This seismogram is then band-pass filtered (5-10 Hz) and the resulting signal is depicted in figure 4. Here, we can see how the P wave is enhanced, and it can be easily identified by the first noticeable change in amplitude.
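The phase shift introduced by filtering can be illustrated with a minimal sketch. It uses a simple recursive low-pass filter rather than the 5-10 Hz band-pass filter applied to the BAUV record, and the synthetic pulse and the smoothing constant are assumptions chosen only for illustration. A causal filter delays the pulse, while running the same filter forward and then backward cancels the delay.

```python
import math


def one_pole_lowpass(x, alpha):
    """Causal recursive low-pass: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_lowpass(x, alpha):
    """Filter forward, then backward, so that the two delays cancel."""
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical test signal: a smooth pulse centered at sample 100.
pulse = [math.exp(-((n - 100) ** 2) / (2 * 5.0 ** 2)) for n in range(200)]

causal_peak = argmax(one_pole_lowpass(pulse, alpha=0.2))
zero_phase_peak = argmax(zero_phase_lowpass(pulse, alpha=0.2))
print(causal_peak, zero_phase_peak)  # the causal peak arrives later than sample 100
```

The delay in seconds is the shift in samples divided by the sampling rate. Forward-backward filtering needs the whole record, so it suits offline analysis; because it is not causal, it can also spread some energy ahead of an impulsive onset.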
Software is available to assist the human analyst, and could even be used for automatic processing; most packages were created by university groups or seismological institutions. Some widely used packages are SHM, SEISAN, SeisComp3, Earthworm and ObsPy. These packages were developed and published under a free, or partially free, license. Software libraries for routine analysis and additional research work can be downloaded from the web sites of the United States Geological Survey (USGS: https://earthquake.usgs.gov/research/software/), and of the Incorporated Research Institutions for Seismology (https://www.iris.edu/hq/data_and_software/).
3 Machine Learning Algorithms and Deep Neural Networks
The fundamentals and pivotal developments of automatic pattern recognition and ML techniques can be reviewed in reference textbooks, such as [10], [41], and [25], as well as in some comprehensive online courses, for instance "Neural Networks for Machine Learning" by G. Hinton and "Machine Learning" by A. Ng. This section does not aim at such generality, and only presents a basic ML classification based on the learning strategy, which later serves to introduce ML and ANN applications to geophysical data in the next sections. Thus, we recommend the aforementioned books and materials for a broader presentation.
The rich variety of ML algorithms can be grouped according to the type of data used during training, a stage where optimization strategies tune the algorithm parameters, usually by minimizing a cost function. For this purpose, the training dataset can be fully or partly labelled, or may even present no labels at all. Roughly speaking, this dataset property frames the learning algorithm and also defines its potential applications. Supervised learning employs labelled datasets and develops models used for either quantitative prediction or categorical classification. In this case, the training and evaluation stages are time consuming because of the labelling and data preparation. Alternatively, unsupervised learning works on unlabelled datasets, aiming at clustering the input data into groups based on similarity measures, or at reducing the input data dimensions. Because labelling is not required, the preparation of the training and evaluation datasets takes less effort, which allows using larger amounts of data. Finally, semisupervised learning uses a hybrid dataset for training and evaluation, typically with a small fraction of labelled data compared to a larger amount of unlabelled data. These different machine learning algorithms are shown in the categorization of figure 5, along with their typical applications, which range from data classification and quantitative regression to grouping or clustering and reduction of data dimensionality. An additional application, which pursues data grouping by following implicit association rules to discover relations among variables in large databases, is also considered in this figure. The specific algorithms referred to in the technical literature are also shown for each category at the bottom.
Figure 5 places ANN into the classification hierarchy of supervised learning algorithms, also delineating their possible applications when processing earthquake data. Specific ANN implementations for earthquake detection are mentioned in the next sections, along with interesting additional works based on autoencoder algorithms, whose applications also facilitate this interpretation task.
To emphasize the focus of our study on these ML algorithms, we use grey boxes in figure 5. Most of those ANN applications are based on deep learning, as a way to achieve high accuracy by incorporating several neuron layers and, therefore, heuristic rules. The ANN prediction accuracy depends partly on good prior training, but also on the number of network layers, which allow a parallel and multi-level detection of important data features (also referred to as abstraction levels).
The presence of two or more neuron layers is defined as deep learning, while very deep learning refers to using more than ten layers. Nowadays, applications based on very deep learning are scarce (because of the complexity of the network structure), as are those using a shallow ANN with only one hidden layer (because of its limited learning capability).
It is worth mentioning that deep architectures could first use unsupervised learning, and then resort to fine-tuning in a final stage of supervised training, aiming at superior performance [49].
4 Machine Learning Applications to Earthquake Data
The physics of earthquakes is very complex, and seismologists rely on processing and interpreting massive datasets in search of insights. ANN, among other ML strategies, have the potential to find unseen patterns and new features, both with physical consistency, in available datasets. Thus, seismologists and computer scientists have been actively developing ML tools to process earthquake data and assist interpretation. Numerous applications have focused on earthquake detection and phase picking, exploiting the ML proficiency in data classification and regression when processing real seismic traces.
Recent and interesting ML applications to earthquake detection are the classification Markov model proposed in [6], the image-based search for waveform matching presented in [56] and the seismic arrival picking based on fuzzy clustering developed in [14]. In the next section, we review additional and prominent works in this area, but focus on contributions based on ANN. Before that, we comment on three main additional ML applications to seismology that are currently expanding, and relate to earthquake early warning, ground motion estimation, and seismic tomography and inversion.
Earthquake early-warning systems aim to identify the onset of a seismic event at remote stations, and exploit the faster propagation of P waves compared to the more destructive S and surface waves, to trigger alerts at local sites seconds or a few minutes before the strongest shaking happens. Earthquake early warning requires tools for earthquake detection, and some works based on ANN can be found in [26,13,30]. Similar applications have also been addressed by the ML approaches in [15,12,30].
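The time margin exploited by early warning can be estimated with simple arithmetic. The sketch below assumes straight ray paths, uniform velocities of 6.0 km/s for P waves and 3.5 km/s for S waves, and a fixed processing delay; these values and the function name are illustrative assumptions, not parameters taken from the cited works.

```python
def warning_time_s(station_km, site_km, vp_km_s=6.0, vs_km_s=3.5, delay_s=0.0):
    """Seconds between the alert and the S-wave arrival at the protected site.

    station_km: source-to-detecting-station distance (P wave detected there)
    site_km:    source-to-protected-site distance (S wave arrives there)
    delay_s:    assumed processing and transmission delay
    """
    if not (vp_km_s > vs_km_s > 0.0):
        raise ValueError("expected vp > vs > 0")
    p_detection = station_km / vp_km_s
    s_arrival = site_km / vs_km_s
    return max(0.0, s_arrival - p_detection - delay_s)


# Detecting station 20 km from the source, city 100 km away, 3 s of delay.
print(round(warning_time_s(20.0, 100.0, delay_s=3.0), 1))  # 22.2
```

With these assumed numbers the alert precedes the S wave by roughly 22 s. When the site is as close to the source as the detecting station, the margin shrinks to a few seconds or vanishes.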
During an earthquake, the ground motion experienced at a given surface location depends on several parameters of the seismic source, on material properties along wave propagation paths and on local site conditions. The statistical prediction of ground motion is based on regression models that combine these parameters, and strives to reduce uncertainties by increasing observational data and model complexity. Ground motion estimates are key inputs to probabilistic analyses of seismic hazard, which are sensitive to these prediction uncertainties [4].
Alternatively, the authors of [19,20,28] employ ANN for ground motion estimation, using key parameters of the seismic source and the velocity model as input. In particular, results in [28] are more accurate than those provided by the Central and Eastern North America attenuation model (CENA) developed in [8]. Alternative ML approaches have also been used in similar applications, as discussed in [45,48].
Travel-time tomography and full-waveform inversion are subsurface imaging procedures based on the minimization of certain discrepancies between seismic data and simulations. Given a reference model, the former seeks to reduce source-receiver travel time differences by relocating material interfaces, while the latter updates medium properties to reduce mismatches between data and synthetic waveforms. The convergence of both methods to a realistic structural model is time consuming. Tomography may require the continuous intervention of a human expert, while full-waveform inversion is computationally demanding and requires a good initial model.
Recent ANN applications to subsurface imaging claim to overcome these deficiencies, using seismic data as input to identify important structures. In particular, [39] uses earthquake data to accurately predict 1-D velocity models, and applications in [5,34,27,53] employ data collected in seismic surveys for structural model building with interest in hydrocarbon exploration. In addition, the study in [37] applies an ANN to infer the prior distribution of acoustic properties of a geological model, which is later improved by full-waveform inversion. This work includes promising applications to a synthetic reservoir-scale dataset of channel bodies. Lastly, we would like to note alternative ML approaches that use data-driven dictionaries to adaptively capture complex profiles of geophysical parameters, and are used to improve the convergence of seismic tomography [9] or inversion [58].
5 Neural Networks for Earthquake Detection
According to figure 5, the identification of P and S waves on seismic traces is a classification problem, while the estimation of onset arrival times, as required for complete detection, corresponds to a regression problem. Either processing objective will be highlighted below when reviewing ANN solution approaches developed during the last thirty years, and this discussion will also include the ANN features suited to achieving it. The network input data can be the seismograms themselves (probably processed), or some features (primarily statistical) previously extracted from these signals.
Full traces provide the ANN with the whole information carried by the seismic signal, but due to their high dimensionality, the mapping of learning examples to the feature space might be highly sparse, making the ANN prone to wrong convergence at the production stage. Alternatively, a more compact feature space results from a network learning process that uses only relevant signal features as input data, which translates into high operational accuracy. However, if some important features are omitted, the ANN might perform poorly. In the following, we focus only on earthquake detection ANN under supervised and unsupervised learning, given their very active development and enough contributions to motivate this survey paper.
5.1 Supervised Networks
Supervised ANN fall into four main categories: feedforward (FFNN), recurrent (RcNN), convolutional (CNN) and residual (RsNN). FFNN present a simple structure and are commonly used for data classification. RcNN are usually applied to time series analysis to identify sequence patterns, while CNN are typically employed for image recognition. Finally, among all supervised ANN, RsNN present a more complex structure with skip connections or shortcuts, which enable fast learning and give better performance than a plain network.
5.1.1 FFNN
For picking seismic phases on local earthquake data, deep FFNN using the full waveforms directly as input are employed in [16,17,22,57]. The network in [16] uses the amplitude of the three-component input seismograms, and achieves successful results on more than 90% of the testing data for both P and S phases. The same authors in [17] employ a single component of the input data, and attain P and S detection performances mostly above 80% in tested cases, with strong variations according to the chosen signal component, given the influences of the ray path and source position. The input traces to the three-layer ANN in [22] were band-pass filtered for a better arrival time prediction of P waves. The detection performance is assessed for different noise levels, which consistently affect prediction accuracy, as shown graphically by the authors. The technique in [57] successfully identifies the P-wave phase on 95% of a testing set that comprises 1254 seismograms of the IRIS network. The estimated onset times present an error of less than 0.5 sec in 80% of the cases.
The parallel application in [51] for P wave detection implements two FFNN, which employ trace features as input data. The first one uses an extension of the standard STA/LTA triggering algorithm, developed in [2,3], where the inputs are given by the ratios of short- and long-term averages (STA and LTA, respectively) of normalized trace windows. The rate of correct phase identification was nearly 92%. The second network operates with spectrograms of moving trace windows, which typically behave differently from those of non-earthquake signals. For the same testing data set, 98% of correct detection is observed. The same authors in [50] develop an S-phase identification and picking FFNN whose input attributes include autoregressive model coefficients and measures of the signal power and polarization angles. On testing results, an 86% correct rate of phase identification was achieved, and 74% of these phases were picked with onset time errors of less than 0.1 sec.
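For reference, the standard STA/LTA ratio that these works extend can be sketched as follows. This is the textbook form computed on squared amplitudes with both windows ending at the current sample. It is not the extended algorithm of [2,3] nor the exact network input of [51], and the window lengths and threshold are assumed values.

```python
def sta_lta(trace, n_sta, n_lta):
    """Ratio of short-term to long-term average of the squared amplitude.

    Both windows end at the current sample. The ratio is left at 0.0 until
    the long window is full.
    """
    if not 0 < n_sta <= n_lta:
        raise ValueError("expected 0 < n_sta <= n_lta")
    # Cumulative sum of energy gives each window average in constant time.
    csum = [0.0]
    for value in trace:
        csum.append(csum[-1] + value * value)
    ratio = [0.0] * len(trace)
    for i in range(n_lta - 1, len(trace)):
        sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
        lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / lta if lta > 0.0 else 0.0
    return ratio


def first_trigger(ratio, threshold):
    """Index of the first sample whose ratio reaches the threshold, or None."""
    for i, value in enumerate(ratio):
        if value >= threshold:
            return i
    return None


# Synthetic trace: low-amplitude noise, then a tenfold amplitude jump at sample 300.
trace = [0.1 * (-1) ** n for n in range(300)] + [1.0 * (-1) ** n for n in range(100)]
print(first_trigger(sta_lta(trace, n_sta=10, n_lta=100), threshold=4.0))  # 300
```

A short STA window reacts quickly to an onset, while a long LTA window tracks the background level, so the ratio rises sharply when a phase arrives.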
The networks developed in all the aforementioned works have been trained on and applied to local and regional earthquake data. The FFNN introduced in [46] use a training database comprising P-wave signals of 193 teleseismic events, and operate on input data consisting of STA/LTA values computed in seven frequency bands. After training, these ANN detectors found 25% more events from the official event bulletin than the reference Murdock-Hutt detector [40]. Appropriate ANN weight tuning led to a reduction of the false alarm rate. In addition, [46] explores the detection capabilities of Elman-Jordan RcNN, but found poor performance.
An important contribution to the automatic onset time picking of P and S waves is the FFNN proposed in [24], a neural tree called IUANT2 that presents a problem-adaptive structure. The best structure of IUANT2 is inferred during the training phase, while allowing for noise filtering. The three-component input seismograms are preprocessed by removing the trace mean and band-pass filtering, followed by the calculation of the vector magnitude of both horizontal components.
The estimation of P and S onset times is performed on statistical features, such as variance, absolute skewness, kurtosis and combinations of these statistics, which are all computed on time windows of preprocessed traces. The testing dataset consists of more than 300 local earthquakes recorded by 23 different stations, and the picking accuracy, quantified through the normal distribution of time differences relative to manual picks, has standard deviations of 0.06 sec and 0.1 sec for P and S waves, respectively. The accuracy of IUANT2 is then measured as the inverse of these deviation values, even though manual picks could also be erroneous.
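Window statistics of this kind can be computed as in the following sketch. It assumes population (biased) moments and non-excess kurtosis, and the window width and step are free parameters; the exact estimators and feature combinations used by IUANT2 in [24] are not reproduced here.

```python
def window_features(window):
    """Variance, absolute skewness and kurtosis from central moments."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((v - mean) ** 2 for v in window) / n
    if m2 == 0.0:  # constant window: higher-order ratios are undefined
        return {"variance": 0.0, "abs_skewness": 0.0, "kurtosis": 0.0}
    m3 = sum((v - mean) ** 3 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    return {
        "variance": m2,
        "abs_skewness": abs(m3) / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def sliding_features(trace, width, step):
    """Feature dictionaries for consecutive windows of a preprocessed trace."""
    return [
        window_features(trace[start:start + width])
        for start in range(0, len(trace) - width + 1, step)
    ]


print(window_features([1.0, 2.0, 3.0, 4.0, 5.0])["variance"])  # 2.0
```

Higher-order statistics such as kurtosis tend to change abruptly when an impulsive arrival enters the window, which is why they are useful onset indicators.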
5.1.2 RcNN
An RcNN for the real-time detection of small-magnitude (below 2.5 Mw) earthquakes is presented in [52]. In this case, the distribution of seismic stations in populated areas yields data with significant levels of noise. This requires a special detection method capable of recognizing small events without using preprocessing based on standard band-pass filtering. Instead, [52] applies a filter bank of STA/LTA ratios of the vertical component of seismic waveforms, with an elaborate design to keep all important frequencies of the signal, including the highest range, where disturbances are less significant. Thus, this RcNN is less prone to false detections. The initial training set comprises about 170 events including regional and teleseismic earthquakes, but full operability was achieved after progressive tuning for real-time P and S phase detection on data from various monitoring stations. Testing was carried out during different periods of time in 2009, and relative to the standard STA/LTA method, this RcNN misses a small percentage of real events, but is much less sensitive to false detections, as expected. The authors found that the number of triggering stations is an important parameter to adjust, so it is crucial for potential applications in new regions.
Recent applications of RcNN to earthquake data are given in [26,7] with the purpose of earthquake prediction. The common architecture of an RcNN presents a set of Long Short-Term Memory (LSTM) layers, combined with dropout layers to avoid overfitting, and supplies an efficient framework for detection and inference of temporal patterns. In both works, a series of past earthquakes is used for network training, and the trained network is then used to predict a subsequent trend of known events, to allow testing. Results are encouraging, although earthquake forecasting remains a highly controversial subject.
5.1.3 CNN
Among the state-of-the-art earthquake detection methods, the most popular deep CNN are probably ConvNetQuake [44], the generalized phase detection (GPD) [42], and PhaseNet [59]. However, we also comment next on a few additional and prominent works. Figure 6 illustrates the schematic structure of an automatic seismic detection system based on a fully connected ANN for classification and regression.
The raw seismic traces are usually subjected to waveform normalization (mean removal and amplitude scaling), band-pass filtering to isolate seismic phases, time splitting and windowing. Then, preprocessed data is used for network learning and validation, probably preceded by data labeling to allow supervised learning. In many cases, the network outputs reduce to class probabilities that associate window traces with P waves, S waves, and simple noise. The actual architectures of the CNN mentioned below are somewhat more complex, and some of them are also capable of locating earthquakes.
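The normalization and windowing steps of this generic pipeline can be sketched as follows. Band-pass filtering and labeling are omitted, scaling by the absolute peak is only one of the possible amplitude scalings, and the window width and step are assumed values.

```python
def demean_and_scale(trace):
    """Remove the mean, then divide by the absolute peak amplitude."""
    mean = sum(trace) / len(trace)
    centered = [v - mean for v in trace]
    peak = max(abs(v) for v in centered)
    if peak == 0.0:  # flat trace: nothing to scale
        return centered
    return [v / peak for v in centered]


def split_windows(trace, width, step):
    """Fixed-width windows; consecutive windows overlap when step < width."""
    return [
        trace[start:start + width]
        for start in range(0, len(trace) - width + 1, step)
    ]


normalized = demean_and_scale([1.0, 2.0, 3.0, 6.0])
windows = split_windows(list(range(10)), width=4, step=2)
print(len(windows), windows[-1])  # 4 [6, 7, 8, 9]
```

Each window would then be labeled (for instance as P, S or noise) before supervised training, or passed directly to a trained network for classification.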
ConvNetQuake is based on a collection of nonlinear filters, and operates on time windows of single-station three-channel seismograms for P-wave detection and event location. Its main application has been the detection of natural or human-induced (related to waste water injection) low-magnitude earthquakes in Central Oklahoma, USA. Data preprocessing splits traces into monthly streams, applies mean subtraction and divides each trace component by the absolute peak amplitude, and finally defines appropriate windows. For training, nearly two thousand cataloged seismic events that occurred during 2014-2016 were employed, and 209 were later used for testing.
Here, ConvNetQuake displays a precision of 94.8%, measured as the fraction of detected events that are true events, with 74.5% of them being correctly located on a coarse Voronoi geographic partition of the study region. Also, the recall was 100%, defined as the fraction of true events correctly detected. On an independent testing set comprising 21 earthquakes with magnitude below 4.1 Mw that occurred in Northern California, ConvNetQuake achieves 100% detection, confirmed by autocorrelation, and 74.6% location correctness.
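The two metrics defined above follow directly from detection counts. In the sketch below the counts are hypothetical and are not the ConvNetQuake results; they only illustrate how a detector can reach full recall while precision stays below 100%.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: fraction of detections that are true events.
    Recall: fraction of true events that were detected."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 true detections, 10 false alarms, no missed events.
print(precision_recall(90, 10, 0))  # (0.9, 1.0)
```

Raising the detection threshold usually trades recall for precision, which is why both values are reported together in the works reviewed here.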
In seismology, nonetheless, an earthquake's hypocenter is the physical location of the starting point of the rupture process, where stored strain energy in the rock is first released. Commonly, hypocenter coordinates are given in terms of longitude, latitude and depth below the surface, and these define the earthquake location. Mapping the hypocenters of foreshocks, main event, and aftershocks allows picturing the three-dimensional movement of the fault system. Thus, the association of an event with one of a set of geographical areas, as performed by ConvNetQuake, provides incomplete information to seismologists. Recent CNN-based approaches aim at actually solving the location problem.
The network in [31] handles three-component seismic records of multiple stations, and after training, the first convolutional layer becomes sensitive to characteristic features of seismic waveforms. Thus, this layer can behave as an event detector by itself. Training employs an earthquake swarm of 2000 events recorded by nine local stations. Later, during testing, this network successfully locates 908 earthquakes with standard deviations of nearly 56 m, 124 m and 136 m along the east-west, north-south and vertical directions, respectively. Alternatively, the network in [54] operates on single-station waveforms recorded in Oklahoma, USA, where earthquakes have been induced by hydrocarbon (oil and gas) production.
The training dataset consists of 1,013 historic events, and the output is a 3D probability volume of location likelihood inside the Earth. Testing was carried out using 194 earthquakes, and results present average errors of approximately 4.9 km in the epicenter and 1.0 km in the hypocentral depth, based on data from 30 network stations.
GPD employs a system that extracts features from seismic data, which are later used as input to a fully connected ANN that finally outputs a classification as P waves, S waves, or just noise. The feature extraction system is a collection of preprocessing layers where seismic data is sequentially convolved with a set of digital filters for characteristic recognition, and then decimated for down-sampling and trace evaluation at different length scales.
Both processes, combined with an activation function, allow the identification of seismic phases anywhere in a seismogram, regardless of their duration and amplitude. The final stage of ANN classification outputs the probability of each possible class (P, S or noise). For training and validation, GPD employs 4.5 million four-second windows, where P and S waves correspond to events of magnitude below 5.7 Mw recorded by the Southern California Seismic Network. The validation precision is nearly 99% for both phases and various detection probability thresholds, and recall is somewhat lower, between 96% and 99% for most threshold choices, suggesting a minor number of misclassifications of seismic phases as noise. GPD detection has also been validated on the 2016 Bombay Beach, California swarm of small and moderate (≤ 4.8 Mw) earthquakes, and with data of the 2016 Mw 7.0 Kumamoto earthquake recorded at multiple stations within 100 km of the hypocenter. Highly successful results were shown for both cases.
PhaseNet operates on three-component unfiltered seismograms, and estimates P- and S-wave arrival times by generating probability time series that quantify the likelihood of P- and S-wave onsets. Its exhaustive training and validation make use of the extremely large dataset of analyst-labelled P and S arrival times from the Northern California Earthquake Data Center. Specifically, the authors employ stratified sampling based on stations and divide the data into training, validation and test sets, with more than 600K, 77K and 78K waveform samples, respectively.
Minimal preprocessing is applied to the training data, where each data component is normalized by removing its mean and dividing it by the standard deviation. Testing results are compared to those obtained by the standard AR picker algorithm [1], and PhaseNet outperforms this scheme with significant improvements on both phases, particularly for the S waves.
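One generic way to turn a probability time series into arrival times is to keep the local maxima that reach a threshold. The sketch below is an assumed post-processing step shown for illustration; the threshold, the minimum separation and the function names are hypothetical and are not the documented PhaseNet procedure.

```python
def pick_onsets(prob, threshold=0.5, min_separation=1):
    """Indices of local maxima of prob that reach the threshold.

    When two candidate peaks are closer than min_separation samples,
    only the one with the higher probability is kept.
    """
    candidates = [
        i for i in range(1, len(prob) - 1)
        if prob[i] >= threshold and prob[i] > prob[i - 1] and prob[i] >= prob[i + 1]
    ]
    picks = []
    for i in sorted(candidates, key=lambda k: prob[k], reverse=True):
        if all(abs(i - j) >= min_separation for j in picks):
            picks.append(i)
    return sorted(picks)


def to_seconds(index, sampling_rate_hz, start_time_s=0.0):
    """Convert a sample index into a time relative to the window start."""
    return start_time_s + index / sampling_rate_hz


p_prob = [0.0, 0.1, 0.7, 0.9, 0.6, 0.1, 0.0, 0.2, 0.55, 0.3, 0.0]
print(pick_onsets(p_prob, threshold=0.5, min_separation=3))  # [3, 8]
```

Running the same routine separately on the P and S probability traces gives one list of candidate picks per phase, with a timing resolution limited by the sampling interval.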
A densely connected CNN to capture laboratory slip events of different durations is given in [53]. This network presents a cascaded architecture to generate multi-scale slip proposals and detect events with various lengths. It also exploits atrous convolutions with different dilation rates to enrich feature extraction.
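An atrous (dilated) convolution samples its input with gaps between the kernel taps, which widens the receptive field without adding weights. The following one-dimensional sketch uses the cross-correlation form common in deep learning libraries, without padding; the kernel and dilation rates are arbitrary illustrative values, not those of [53].

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D atrous convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation  # samples covered beyond the first tap
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input samples seen by one output sample."""
    return (kernel_size - 1) * dilation + 1


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=3))  # [-6, -6]
```

With the same three weights, a dilation rate of 3 covers seven input samples instead of three, which is how several dilation rates let a network respond to events of different durations.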
Training, validation and testing use acoustic data acquired at the Rock and Sediment Mechanics Laboratory of Penn State University, which comprise a thousand manually picked events, of which 800 are taken for training. Relative to a well-established template matching algorithm, the authors report a significantly higher detection accuracy, especially for subsequent events that differ strongly from one another.
5.1.4 RsNN
Residual networks are usually a combination of the three previous types of supervised ANN, and have achieved higher accuracy when solving same problems already tackled by one of them, or even dealing with more complex problems. For earthquake detection, the RsNN proposed in [38] and named CRED, first employs a CNN for feature extraction, that are later used by a RcNN for recognition of temporal patterns. The next and last stage of this network is a fully connected FFNN used for phase identification. Training and validation is based on a dataset of 550K three component seismograms recorded by 889 broadband and short-period stations in North California, and P-wave and S-wave arrivals have been labelled by manual picking. Traces of seismic noise are also records from the same network stations.
A first test on 50K waveform samples yields a detection precision higher than 96% and a recall above 99%, regardless of the threshold chosen on the output probabilities. Further testing for microseismic detection was undertaken in Central Arkansas, a region with a substantially different crustal structure and with events of lower magnitude at shorter epicentral distances. Without additional training, CRED finds 3 orders of magnitude more events than STA/LTA. Comparisons against the FAST algorithm [55] reveal that the CRED detection rate is much lower, but that CRED takes less than a hundredth of the computation time spent by FAST (non-parallel version).
5.2 Unsupervised
In this final section, we discuss a few applications of unsupervised ANN to seismic event detection. In general, ANN of this kind are classified as autoencoders (AE) and generative adversarial networks (GAN).
5.2.1 AE
An autoencoder is a deep learning approach with noticeable advantages over most other ML techniques, especially when processing unlabelled data. This unsupervised technique can process raw input data directly, with no need for preprocessing or prior labeling. A main application of AE is data compression, which aims at extracting representative features of the input and removing redundancies, so that the overall size of the data may be greatly reduced [33], [36]. This concept is called dimensionality reduction. Denoising AE (DAE) are trained locally to mitigate noise on corrupted versions of the input data and then reconstruct a cleaner version of the data, even in the presence of high noise levels. Networks built by stacking such DAE are known as stacked DAE (SDAE) [49]. In [43], a SDAE is developed to mitigate background noise on seismic traces and then pick the P-wave onset times on the cleaner waveforms. For evaluation, both synthetic and field seismic data are employed, and the SDAE algorithm accurately picks the onset time for 94.1% of 407 field seismograms, with a standard deviation error of 0.10 s. Results also indicate that this algorithm can make precise arrival time inferences on data with SNR as low as -14 dB.
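To give a sense of what an SNR of -14 dB implies, the short Python sketch below uses the common power-ratio definition, SNR = 10 log10(P_signal / P_noise). This definition is an assumption, since the way SNR is computed in [43] is not stated here; the 5 Hz sine, the 100 Hz sampling rate and the Gaussian noise are likewise synthetic choices for illustration.

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in decibels from the mean power of signal and noise samples."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)

rng = np.random.default_rng(1)
# Synthetic 5 Hz sine sampled at an assumed 100 Hz, plus Gaussian noise
signal = np.sin(2 * np.pi * 5.0 * np.arange(2000) / 100.0)
noise = rng.normal(size=2000)

# Scale the noise so that the resulting SNR equals the target value
target_db = -14.0
scale = np.sqrt(np.mean(signal**2) / (np.mean(noise**2) * 10 ** (target_db / 10)))
noisy = signal + scale * noise
achieved = snr_db(signal, scale * noise)
power_ratio = 10 ** (-target_db / 10)  # noise power relative to signal power
```

Under this definition, -14 dB corresponds to a noise power roughly 25 times larger than the signal power, so the onset is deeply buried in the trace before denoising.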
A hybrid network that combines a SAE and a deep-belief ANN has been applied in [47] to classify seven different classes of volcano-seismic events. Classification results confirm that this new network efficiently captures complex relationships in volcano-seismic data and, compared with other ML algorithms, it achieves higher performance with faster convergence.
5.2.2 GANs
The GAN proposed in [35] is trained to learn the characteristics of both earthquake first-arrival P waves and background noise, resulting in a discriminator that mitigates false triggering. The network is trained on 300K seismic records from southern California and Japan so that it acts as an automatic feature extractor, and a Random Forest classifier then undergoes secondary training on a large set of earthquake signals and noise waveforms. When tested, the recognition performance was about 99.2% for P waves and 98.4% for pure noise signals. This excellent performance is accompanied by a very low sensitivity to false triggers from local impulsive noise.
6 Discussion and Conclusions
Detection of earthquake signals is fundamental for observational seismology. The main features of a reliable detection algorithm for seismic waves must include: high sensitivity to small-magnitude events with variable trace patterns, low sensitivity to ambient noise and non-earthquake signals, enough flexibility to account for data from one or multiple seismic stations, and high computational efficiency for fast real-time assistance or large dataset processing. These features are naturally fulfilled by Artificial Neural Networks (ANN). ANN, among a few other machine learning techniques, offer advanced pattern recognition abilities that allow them to match the detection performance of experienced human analysts, once appropriately trained.
During the mid-to-late 1990s, pioneering ANN were trained on subsets of hundreds or a few thousand seismic traces, but subsequent advances in computational platforms and ANN implementation frameworks have allowed learning, in recent years, from region-wide earthquake catalogs with hundreds of thousands or even millions of traces. Thus, the training of state-of-the-art ANN accounts for multiple-station information, seismic and noise signals with different shapes, local and non-local earthquake data, and labeling (when required) from various human experts. Because different analysts may pick seismic phases in dissimilar ways, such a modern, well-trained ANN becomes practically free of any human bias or likely errors.
Inspired by the basic time and frequency processing of seismic traces undertaken by human analysts, usually assisted by interpretation software, the input data to most detection ANN are not raw seismograms. A wide variety of ANN operate on preprocessed traces, after standard normalization (with optional filtering) and windowing, but others only process previously extracted statistical features, or even more specialized data representations such as spectrograms.
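The windowing and spectrogram representations mentioned above can be sketched with NumPy alone. The helper names `sliding_windows` and `spectrogram`, the window length of 256 samples, the step of 128 samples, the Hann taper and the 100 Hz sampling rate are all assumptions chosen for illustration; the surveyed works use their own settings.

```python
import numpy as np

def sliding_windows(trace, window_len, step):
    """Split a 1-D trace into overlapping windows of `window_len` samples."""
    trace = np.asarray(trace, dtype=float)
    if len(trace) < window_len:
        raise ValueError("trace shorter than one window")
    n_windows = 1 + (len(trace) - window_len) // step
    starts = step * np.arange(n_windows)
    return np.stack([trace[s : s + window_len] for s in starts])

def spectrogram(trace, window_len=256, step=128):
    """Magnitude spectrogram: one Hann-tapered real FFT per window."""
    windows = sliding_windows(trace, window_len, step)
    taper = np.hanning(window_len)
    return np.abs(np.fft.rfft(windows * taper, axis=1))

fs = 100.0                                 # assumed sampling rate in Hz
t = np.arange(3000) / fs
trace = np.sin(2 * np.pi * 10.0 * t)       # synthetic 10 Hz test signal
spec = spectrogram(trace)                  # rows: time windows, columns: frequencies
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
peak_freq = freqs[spec[0].argmax()]
```

Each row of `spec` is the spectrum of one time window, so the array can be fed to a network as a two-dimensional image, whereas the output of `sliding_windows` corresponds to the windowed time-domain input.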
Developments before 2010 were mainly FFNN, mostly applied to local and regional earthquake data, although a few cases target teleseismic events. For historical reasons, this survey cites only a few prominent works from that period, but the contribution in [24] advanced feature-based trace processing, detection from single- or multi-station information, and performance assessment as well. In the current decade, RcNN, with their inherent efficiency for recognition and inference of temporal patterns, have been used for real-time small-event monitoring under poor signal-to-noise ratios, and have even found successful applications in early warning systems and earthquake forecasts. During the last two years, CNN and RsNN have taken automatic seismic phase identification and onset estimation to a new level of region-wide training and operation, with a few cases also capable of hypocentral location. We have discussed with special emphasis the contributions in [38], [44], [42] and [59].
A performance comparison among different algorithms for automatic phase detection and picking is rather difficult because of the specialized target application, the variety of accuracy metrics used, and the case-dependent testing dataset employed by each algorithm.
For a systematic evaluation of the accuracy of phase onset picks, the authors of [24], for instance, use the standard deviation of the time differences between the automatic and manual picks. This metric tends to be insensitive to the size of the testing dataset. They also employ the metrics of recall and precision to measure the ability of the system to detect the correct onset of seismic waves and to reject false alarms, considering the total number of reference picks. These metrics were rapidly adopted by other studies, and a few others emerged. Because ANN, among other ML and general algorithms, are under highly active development in this area, there is a pressing need for establishing standardized benchmark datasets, of several sizes and SNR levels, to facilitate full assessment with clear quality measures.
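The three metrics mentioned above can be computed as in the following minimal Python sketch. The matching rule is an assumption of this example, not the procedure of [24]: each manual pick is greedily paired with the closest unused automatic pick, and the pair counts as a correct detection only if the two differ by at most a hypothetical tolerance of 0.5 s.

```python
import numpy as np

def pick_metrics(auto_picks, manual_picks, tolerance=0.5):
    """Precision, recall and residual standard deviation for onset picks.

    Greedy one-to-one matching within `tolerance` seconds (assumed rule).
    """
    auto = sorted(auto_picks)
    manual = sorted(manual_picks)
    residuals = []
    used = set()
    for m in manual:
        # Closest automatic pick that has not been matched yet
        candidates = [(abs(a - m), i) for i, a in enumerate(auto) if i not in used]
        if not candidates:
            break
        dist, i = min(candidates)
        if dist <= tolerance:
            used.add(i)
            residuals.append(auto[i] - m)
    tp = len(residuals)                      # correctly detected onsets
    precision = tp / len(auto) if auto else 0.0
    recall = tp / len(manual) if manual else 0.0
    sigma = float(np.std(residuals)) if residuals else float("nan")
    return precision, recall, sigma

manual = [10.0, 25.0, 40.0, 55.0]            # reference (analyst) picks, in s
auto = [10.1, 24.9, 47.0, 55.2]              # automatic picks, in s
precision, recall, sigma = pick_metrics(auto, manual)
```

In this toy case three of the four automatic picks fall within the tolerance, so precision and recall are both 0.75, while the pick at 47.0 s counts as a false alarm and the reference pick at 40.0 s as a miss. The reported values clearly depend on the chosen tolerance, which is one reason why results from different studies are hard to compare.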