1. Introduction
The logistics operations that originate at commercial ports around the world play an important role in, and reflect, the economic growth and development of the region, country, and even continent where the ports are located (Chan et al., 2019; Le et al., 2014; Zhang et al., 2013). Between 2000 and 2015, world maritime trade volume increased from 3.17 to 9.02 million metric tons for container cargo and from 6.90 to 16.69 million metric tons for bulk cargo (grain, coal, iron ore, phosphate rock, and bauxite/alumina), growth of 185% and 142%, respectively, that helped to expand the worldwide economy (United Nations Conference on Trade and Development, 2015). In this light, studying the demand for cargo volumes is of great importance for the economic development of any country. Forecasting models make it possible to analyze the demand for cargo volumes. This helps governments and policy makers plan decisions and investments aimed at enhancing economic growth, and it supports the analysis of logistics operations (e.g., a port's cargo volume capacity), because these are the factors that must be estimated to determine growth in port cargo capacity and infrastructure while reducing the costs associated with expansion (Gaur et al., 2011; Thill & Lim, 2010). Port authorities and government representatives therefore benefit from understanding the factors that determine future demand for cargo volumes when expanding and modernizing ports (Hales et al., 2017). The central problem in forecasting cargo volumes is the accuracy of the predictions delivered by the forecasting models. In the literature, different forecasting methods have been developed or used for different types of cargo data, which makes selecting the appropriate method an important problem, because forecasting accuracy depends on many factors, including the time available to forecast (Nieto & Carmona-Benitez, 2018).
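The growth rates quoted above are the relative changes between the 2000 and 2015 volumes:

```latex
\frac{9.02 - 3.17}{3.17} \approx 1.845 \ (\approx 185\%,\ \text{container}),
\qquad
\frac{16.69 - 6.90}{6.90} \approx 1.419 \ (\approx 142\%,\ \text{bulk})
```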
Consequently, the aim of this paper is to improve forecast accuracy by comparing the predictive performance of a univariate forecasting model (ARIMA + GARCH + Bootstrap (AGB)), a robust univariate forecasting model (multiplicative Holt-Winters (HWM)), a multivariate forecasting model (support vector regression (SVR)), and an autoregressive integrated moving average model with explanatory variables (ARIMAX). To the best of our knowledge, no previous work has analyzed the accuracy of the AGB model for forecasting the demand for cargo volume. The case study is the ports of San Pedro, California, formed by the port of Los Angeles and the port of Long Beach. These ports are studied because they represent 30% of the United States market share, they rank ninth in world trade volume, and they now handle more containers per ship call than any other port in the world (The Port of Los Angeles, 2020). The database contains imports and exports of bulk, container, reefer, and ro-ro cargo. Results show that the HWM model is the best method to forecast imports and exports of bulk cargo, while the SVR model is the best method to forecast imports and exports of container, reefer, and ro-ro cargo. The Diebold-Mariano test and the RMSE and MAPE metrics support these results.
This paper is organized as follows: Section 2 presents a literature review about forecasting models applied to analyze the maritime port transportation industry; Section 3 presents some insights about the forecasting models applied in this research; Section 4 details the case study; Section 5 shows an analysis and discussion of the results; and Section 6 concludes this paper and discusses potential future research.
2. Literature review
Several univariate time-series models have been applied to forecast cargo volumes at different ports. Peng and Chu (2009) consider six univariate forecasting models (the time series decomposition model, the trigonometric regression model, the regression model with seasonal dummy variables, the grey model, the hybrid grey model, and the SARIMA model) to predict container throughput in ports, with the goal of determining the most accurate one. They test these models using data from three Taiwanese ports. The mean absolute deviation (MAD), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE) indicate that the classical time series decomposition model outperforms the other five models.
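A minimal sketch of the three accuracy measures named above, assuming equal-length lists of actual values and forecasts and no zero actual values (MAPE is undefined otherwise). The function names are ours, and MAPE is returned as a fraction rather than a percentage.

```python
import math


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.05 = 5%); actuals must be nonzero."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual = [100.0, 120.0, 80.0]
forecast = [110.0, 114.0, 80.0]
print(mad(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```

Because RMSE squares the errors, it penalizes a single large miss more heavily than MAD, while MAPE makes series of different sizes comparable.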
Additionally, Dragan et al. (2014) forecast the volume of container cargo using quarterly data for the Adriatic port of Koper (Slovenia) and the ports of Venice, Trieste, and Ravenna (Italy), applying an exponential smoothing Holt-Winters model, a classical time series decomposition model, and an autoregressive integrated moving average (ARIMA) model. The MAD, MAPE, and RMSE metrics show that the ARIMA model provides the most accurate forecasts.
Zhang et al. (2013) combine the grey forecasting model with the logistic growth curve model to develop a new model for forecasting port cargo throughput. Their results indicate that the combined model performs better than either the grey forecasting model or the logistic growth curve model alone. However, the model has a drawback: the data must follow an S-shaped trend.
Jansen (2014) studies the factors that determine the demand for port cargo throughput and that must be included when developing forecasting methods. According to this study, forecasting the cargo throughput of a port requires a four-step approach: production, distribution, modal split, and assignment to the network. In the first step, input-output (I/O) models or origin-destination (O/D) tables are the most accurate way to forecast production. In the second step, gravity models are usually applied to forecast distribution demand; these models consider macroeconomic variables such as gross domestic product (GDP), population (Pop), and income per capita. In the third step, discrete choice models such as multinomial network models are applied to account for competition between ports. Finally, qualitative assessments by a group of experts must be performed to consider factors that cannot be obtained or quantified, such as port focus, location, and the strategic decisions of shipping lines.
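For illustration, a commonly used form of the gravity model (a generic specification, not necessarily the one used by Jansen, 2014) relates the flow $T_{ij}$ between regions $i$ and $j$ to their economic masses $M_i$ and $M_j$ (e.g., GDP or population) and the distance $D_{ij}$ between them, where $G$, $\alpha$, $\beta$, and $\gamma$ are parameters and $u_{ij}$ is an error term. Taking logarithms yields a linear model that can be estimated by least squares:

```latex
T_{ij} = G \, \frac{M_i^{\alpha} \, M_j^{\beta}}{D_{ij}^{\gamma}}
\quad\Longrightarrow\quad
\ln T_{ij} = \ln G + \alpha \ln M_i + \beta \ln M_j - \gamma \ln D_{ij} + u_{ij}
```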
Although many time-series methods can be used to forecast a port's container throughput, few studies have compared their accuracy. To narrow this gap, Chan et al. (2019) compare four regression-based forecasting methods (moving average (MA), multivariate adaptive regression splines, ARIMA, and the grey model) with two machine learning methods (artificial neural network and SVR) for forecasting a port's container throughput. Their results show that SVR, a machine learning approach, is the most accurate of all. This indicates that machine learning approaches can be used to build forecasting methods, although the nature of the data may affect their accuracy, and socioeconomic data should be included to improve it.
To the best of our knowledge, no previous work has analyzed the accuracy of the AGB model for forecasting cargo volume demand. This forecasting model was proposed by Nieto and Carmona-Benítez (2018) to forecast demand in the air transportation industry. They compare the AGB model with the ARIMA model, the additive Holt-Winters model, the HWM model, and the Damp Trend Grey model, and find that the AGB model produces the most accurate forecasts.
3. Methodology
In this paper, four methodologies are applied to forecast the demand of different types of cargo at ports: the AGB model, the HWM model, the SVR model, and the ARIMAX model.
3.1. ARIMA + GARCH + Bootstrap model
This method divides cargo demand into three components: trend, variability, and distribution. The dynamic method therefore combines an ARIMA model to estimate and forecast the trend, a GARCH model to estimate and forecast the variability, and bootstrap methods to estimate the distribution. The objective is to build a method that captures both the trend and its variations, in order to eliminate their detrimental effects on forecasting.
The ARMA model analyzes the trend and seasonality based on the probabilistic properties of the data; for a complete description, see Box et al. (2016). Because cargo demand is not stationary, d differences are applied to make the data stationary, which turns the ARMA model into an ARIMA model.
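The differencing step (the "I" in ARIMA) can be illustrated with a short sketch; the function names are ours. `difference` applies d first differences and keeps the initial values needed to undo them, and `undifference` maps a differenced series, possibly extended with forecasts, back to levels.

```python
def difference(series, d=1):
    """Apply d first differences; also return the initial values needed to invert them."""
    heads = []
    for _ in range(d):
        heads.append(series[0])
        series = [b - a for a, b in zip(series, series[1:])]
    return series, heads


def undifference(diffed, heads):
    """Invert difference(): rebuild levels from a (possibly extended) differenced series."""
    series = list(diffed)
    for head in reversed(heads):  # undo the last differencing first
        rebuilt = [head]
        for step in series:
            rebuilt.append(rebuilt[-1] + step)
        series = rebuilt
    return series


levels = [1, 4, 9, 16, 25]
diffed, heads = difference(levels, d=2)
print(diffed, undifference(diffed + [2], heads))
```

Appending forecasts made on the differenced scale before calling `undifference` is how a forecast for stationary differences is turned back into a forecast of cargo volumes.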
The GARCH model analyzes the variability of the time series. In the GARCH(p, q) model of Bollerslev (1986), p is the number of lagged conditional variance terms and q is the number of lagged squared error terms. This paper uses the GARCH(1,1) model, in which this month's conditional variance depends on last month's squared error and last month's conditional variance. There is a vast literature on GARCH models; for a review, see Tsay (2005).
Finally, bootstrap methods are used to simulate the distribution of cargo demand over time, and they offer a better approximation in finite samples.
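Assuming cargo demand $y_t$ follows an ARIMA model with GARCH(1,1) errors:

$$\phi(B)(1-B)^{d}\, y_t = c + \theta(B)\, \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t,$$

where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi(B)$ and $\theta(B)$ are the autoregressive and moving-average polynomials, $c$ is a constant, $d$ is the number of differences, and the innovations $z_t$ are independent and identically distributed with zero mean and unit variance.

For the conditional variance $\sigma_t^{2}$, the GARCH(1,1) model gives

$$\sigma_t^{2} = \omega + \alpha\, \varepsilon_{t-1}^{2} + \beta\, \sigma_{t-1}^{2}, \qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0.$$

Finally, drawing random samples (with replacement) from the standardized residuals $\hat{z}_t = \hat{\varepsilon}_t / \hat{\sigma}_t$ and passing them through the two equations above generates simulated future paths of cargo demand, whose empirical distribution gives the forecast.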
For a complete description of the AGB model, see Nieto and Carmona-Benitez (2018).
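A minimal sketch of the three AGB components (trend, variability, distribution) follows. It is not the implementation of Nieto and Carmona-Benitez (2018): it assumes d = 1, an AR(1) with intercept on the differenced series fitted by least squares (seasonal terms omitted for brevity), and a GARCH(1,1) with fixed, illustrative alpha and beta and omega set by variance targeting instead of maximum-likelihood estimation. The function name and default values are hypothetical.

```python
import random
import statistics


def agb_forecast(series, horizon, n_paths=1000, alpha=0.1, beta=0.8, seed=0):
    """Sketch of an ARIMA(1,1,0) + GARCH(1,1) + bootstrap forecast.

    Assumptions: d = 1; AR(1) with intercept on the differenced series
    fitted by least squares; GARCH(1,1) with fixed alpha and beta
    (alpha + beta < 1) and omega chosen by variance targeting.
    Returns the mean simulated path (in levels) and all simulated paths.
    """
    rng = random.Random(seed)
    x = [b - a for a, b in zip(series, series[1:])]  # first differences (trend step)
    lagged, current = x[:-1], x[1:]
    mx, my = statistics.fmean(lagged), statistics.fmean(current)
    phi = sum((u - mx) * (v - my) for u, v in zip(lagged, current)) / sum(
        (u - mx) ** 2 for u in lagged)
    c = my - phi * mx
    resid = [v - (c + phi * u) for u, v in zip(lagged, current)]

    # Variability step: GARCH(1,1) conditional variances, started at the residual variance
    var_e = statistics.fmean(e * e for e in resid)
    omega = (1 - alpha - beta) * var_e
    sig2 = [var_e]
    for e in resid[:-1]:
        sig2.append(omega + alpha * e * e + beta * sig2[-1])
    z = [e / s2 ** 0.5 for e, s2 in zip(resid, sig2)]  # standardized residuals

    # Distribution step: bootstrap the standardized residuals along simulated paths
    paths = []
    for _ in range(n_paths):
        level, x_prev, e_prev, s2_prev = series[-1], x[-1], resid[-1], sig2[-1]
        path = []
        for _ in range(horizon):
            s2 = omega + alpha * e_prev ** 2 + beta * s2_prev  # variance forecast
            e = s2 ** 0.5 * rng.choice(z)                      # bootstrap draw
            x_new = c + phi * x_prev + e                       # AR(1) on differences
            level += x_new                                     # undo the differencing
            path.append(level)
            x_prev, e_prev, s2_prev = x_new, e, s2
        paths.append(path)
    mean_path = [statistics.fmean(p[h] for p in paths) for h in range(horizon)]
    return mean_path, paths
```

Percentiles of `paths` at each step give prediction intervals; the mean path is one possible point forecast.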
3.2. Multiplicative Holt-Winters model
This model is among the most widely used forecasting methods. It captures the trend and seasonal effects of a time series, and it is multiplicative because the seasonal index multiplies the level-plus-trend component to produce the forecast. The model consists of three smoothing equations, for the level (Eq. 6), the trend (Eq. 7), and the seasonal component (Eq. 8), plus the forecast equation (Eq. 9). For a complete description of the HWM model, see Winters (1960).
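A minimal sketch of the multiplicative Holt-Winters recursions (level, trend, seasonal index, and forecast), in the spirit of Eqs. 6 to 9. The initialization scheme, the function name, and the smoothing parameters in the usage example are our assumptions.

```python
def holt_winters_multiplicative(y, m, alpha, beta, gamma, horizon):
    """Multiplicative Holt-Winters smoothing and forecasting with season length m.

    Simple initialization (an assumption): level = mean of the first season,
    trend = average per-period change between the first two seasons, and
    seasonal indices = first-season values divided by that level.
    Requires len(y) >= 2 * m and strictly positive data.
    """
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [v / level for v in y[:m]]  # season[t % m] holds the latest index for that position
    for t in range(m, len(y)):
        s_old = season[t % m]  # seasonal index from one season ago, S_{t-m}
        prev_level = level
        # level: deseasonalized observation blended with the previous level plus trend
        level = alpha * (y[t] / s_old) + (1 - alpha) * (prev_level + trend)
        # trend: change in level blended with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # seasonal index: observation relative to the level, blended with the old index
        season[t % m] = gamma * (y[t] / level) + (1 - gamma) * s_old
    n = len(y)
    # forecast: (L_n + h * T_n) times the most recent index for the target season
    return [(level + h * trend) * season[(n + h - 1) % m] for h in range(1, horizon + 1)]


quarterly = [(100 + 2 * t) * [0.8, 1.2, 1.1, 0.9][t % 4] for t in range(40)]
print(holt_winters_multiplicative(quarterly, m=4, alpha=0.5, beta=0.3, gamma=0.3, horizon=8))
```

For the monthly cargo series in this paper, m would be 12, and the smoothing parameters would typically be chosen by minimizing in-sample forecast errors.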
3.3. Support vector regression model
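The support vector machine (SVM) algorithm was proposed by Vapnik and Lerner (1963) and Vapnik and Chervonenkis (1964). SVM can be applied not only to classification problems but also to regression problems, in which a variable is estimated from the behavior of a set of explanatory variables. In this paper, we apply the SVM algorithm through the SVR model. The idea behind SVR is to find a function that deviates from the observed response by at most a small fixed quantity ε; that is, errors are allowed as long as they are smaller than ε.
The objective function of SVR is the following:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)$$

s.t.

$$y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0,$$

where $x_i$ is the vector of explanatory variables for observation $i$, $y_i$ is the response (the cargo volume), $\phi(\cdot)$ maps the inputs into a feature space, $w$ and $b$ define the regression function $f(x) = \langle w, \phi(x) \rangle + b$, $\xi_i$ and $\xi_i^{*}$ are slack variables that measure deviations larger than $\varepsilon$, and $C > 0$ controls the trade-off between the flatness of $f$ and the tolerance for such deviations.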
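The SVR optimization problem can be illustrated with a minimal linear ε-SVR trained by subgradient descent on the equivalent unconstrained primal (ε-insensitive loss plus an L2 penalty). This sketch assumes a linear kernel (φ(x) = x), a decaying step size, and illustrative values of ε and C; the paper does not report its kernel or hyperparameters, and libraries such as LIBSVM solve the dual problem instead. The function names are hypothetical.

```python
def fit_linear_svr(X, y, epsilon=0.1, C=10.0, lr0=0.01, epochs=3000):
    """Linear epsilon-SVR fitted by subgradient descent on the primal objective
    0.5 * ||w||^2 + C * sum_i max(0, |y_i - (w . x_i + b)| - epsilon).
    Teaching sketch: full-batch updates with step size lr0 / sqrt(t + 1).
    """
    p = len(X[0])
    w, b = [0.0] * p, 0.0
    for t in range(epochs):
        lr = lr0 / (t + 1) ** 0.5
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5 * ||w||^2 term is w
        for xi, yi in zip(X, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)  # residual
            if abs(r) > epsilon:  # outside the epsilon-tube: the linear penalty applies
                sign = 1.0 if r > 0 else -1.0
                for j in range(p):
                    grad_w[j] -= C * sign * xi[j]
                grad_b -= C * sign
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svr(w, b, X):
    """Evaluate f(x) = w . x + b for each row of X."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


X = [[i / 9] for i in range(10)]
y = [2 * row[0] + 1 for row in X]
w, b = fit_linear_svr(X, y)
print(w, b, predict_svr(w, b, X))
```

In the setting of this paper, each row of X would hold the economic variables for one month (e.g., labor force and unemployment figures) and y the corresponding cargo volume; inputs are usually standardized before fitting.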
3.4. Autoregressive integrated moving average with explanatory variables
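An ARIMAX model is an extension of the ARIMA model described in Eq. 12 that includes explanatory variables (Stock & Watson, 1999). The ARIMAX model can be represented as follows:

$$\phi(B)(1-B)^{d}\, y_t = c + \sum_{j=0}^{r} \delta_j\, x_{t-j} + \theta(B)\, \varepsilon_t,$$

where $B$ is the backshift operator, $\phi(B)$ and $\theta(B)$ are the autoregressive and moving-average polynomials, $d$ is the number of differences, $c$ is a constant, $x_t$ is the explanatory variable with coefficients $\delta_j$, and r is the lag degree of the explanatory variable. For a detailed description of the ARIMAX model, see Stock and Watson (1999).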
4. Case study
California is the most populous state in the United States, home to more than 40 million residents. According to the Bureau of Economic Analysis, California's GDP makes it the fifth largest economy in the world. Los Angeles County is by far the most populous county in California, with more than 10 million inhabitants, and its GDP is equivalent to that of Saudi Arabia, the 18th largest economy in the world. The port of Los Angeles and the port of Long Beach are adjacent to each other; together, they are known as the San Pedro Bay Ports and have been in operation for more than 100 years. The San Pedro port complex includes over 35 cargo terminals that handle all cargo types, including containerized, break-bulk, dry bulk, liquid bulk, and ro-ro cargo. Commodities handled by the ports include crude oil and other liquid bulk petroleum products, petroleum coke, manufactured products, electrical machinery, and pulp and wastepaper. Container terminals have on-dock rail with access to Class I railroads via short-line rail. The ports also have access to the Alameda Corridor, a 20-mile-long rail line connecting the port of Los Angeles and the port of Long Beach with the national rail network (U.S. Department of Transportation, 2017; 2018). The San Pedro Bay Ports rank ninth in world trade volume and now handle more containers per ship call than any other port in the world. They also account for 73% of the West Coast market share and 30% of the United States market share. In 2020, the ports of San Pedro handled more than 17 million containers (The Port of Los Angeles, 2020). Along with other West Coast ports (Oakland, Seattle, and Tacoma), the ports of San Pedro can handle the largest container vessels in the United States, including mega-ships carrying 18,000 containers (The Port of Los Angeles, 2020).
This paper analyzes exports and imports of bulk, container, reefer, and ro-ro cargo at the ports of San Pedro. The dataset contains monthly data from January 2008 to December 2016 and is divided into two groups: in-sample data from January 2008 to December 2015 (96 data points) and out-of-sample data from January 2016 to December 2016 (12 data points). The in-sample data are used to estimate the HWM, SVR, AGB, and ARIMAX models, and the out-of-sample data are used to evaluate their forecasts.
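Assuming one observation per calendar month, the sample split can be reproduced with a short sketch (names are ours) that counts the months in each period:

```python
def month_range(start, end):
    """Inclusive list of (year, month) pairs from start to end."""
    months, (year, month) = [], start
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


months = month_range((2008, 1), (2016, 12))
in_sample = [ym for ym in months if ym <= (2015, 12)]      # estimation period
out_of_sample = [ym for ym in months if ym > (2015, 12)]   # evaluation period
print(len(months), len(in_sample), len(out_of_sample))
```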
For the SVR and ARIMAX models, economic data for Los Angeles County and for the state of California are also used, since, as mentioned before, trade volume reflects the economic conditions of a region. Accordingly, the dataset is divided into three sets: the first set includes data for the port of Los Angeles and the port of Long Beach separately (bulk, container, reefer, and ro-ro cargo); the second set includes economic data for the state of California (initial and continuing unemployment (UNEMP) claims, covered UNEMP, and the insured UNEMP rate); and the third set includes economic data for both the state of California and Los Angeles County (civilian labor force, number of employed and unemployed workers, and UNEMP rates).
Table 1 presents summary statistics for the volume of each type of cargo and for some of the variables included in the SVR model. The economic variables in Table 1 are model inputs that help predict the target output, in this case the volume of cargo at the San Pedro Ports.
Exports | bulk* | container | reefer | ro-ro |
Average | 795.20 | 2,386.80 | 154.60 | 29.10 |
Median | 797.10 | 2,390.60 | 153.60 | 28.70 |
Std Dev | 165.50 | 261.60 | 24.40 | 5.30 |
Min | 243.50 | 1,530.60 | 107.50 | 19.00 |
Max | 1,135.30 | 2,812.00 | 214.30 | 42.10 |
Kurtosis | 0.20 | 1.00 | -0.40 | -0.40 |
Skewness | -0.20 | -0.80 | 0.30 | 0.30 |
Imports | bulk | container | reefer | ro-ro |
Average | 2,171.60 | 3,489.60 | 155.10 | 87.80 |
Median | 2,161.80 | 3,573.60 | 151.90 | 88.90 |
Std Dev | 460.80 | 415.80 | 19.60 | 19.50 |
Min | 1,101.30 | 2,034.00 | 119.90 | 34.50 |
Max | 3,614.40 | 4,213.40 | 214.30 | 133.10 |
Kurtosis | 0.60 | 0.30 | -0.20 | - |
Skewness | 0.30 | -0.70 | 0.50 | -0.20 |
Los Angeles | Labor Force** | Employed | Unemployed | UNEMP Rate |
Average | 4,952.80 | 4,487.60 | 465.2 | 9.40% |
Std Dev | 45.9 | 156.2 | 123 | 2.50% |
California | Labor Force | Employed | Unemployed | UNEMP Rate |
Average | 18,539.80 | 16,877.80 | 1,662.00 | 9.00% |
Std Dev | 280.7 | 644.3 | 434.4 | 2.40% |
Notes: *Cargo figures in 1,000's metric tons. **Employment numbers in 1,000's.
Consistent with the historical U.S. trade deficit, the volume of cargo for all types is larger for imports than for exports. Considering the average volume of cargo, imports are 75 percent higher than exports.
As shown in Table 1, container cargo is the main type of cargo at the ports of San Pedro, and ro-ro is the least common type for both exports and imports. Relative to its mean, container cargo also shows the lowest variation for both imports and exports, while bulk cargo shows the highest variation for exports and ro-ro cargo the highest variation for imports.
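The comparisons in this paragraph and the previous one can be recomputed from the Table 1 averages and standard deviations. Measuring variation relative to the mean with the coefficient of variation (standard deviation divided by the mean) is our assumption about what "variation" refers to.

```python
# Averages and standard deviations from Table 1 (1,000's metric tons)
exports = {"bulk": (795.20, 165.50), "container": (2386.80, 261.60),
           "reefer": (154.60, 24.40), "ro-ro": (29.10, 5.30)}
imports = {"bulk": (2171.60, 460.80), "container": (3489.60, 415.80),
           "reefer": (155.10, 19.60), "ro-ro": (87.80, 19.50)}

total_exp = sum(mean for mean, _ in exports.values())
total_imp = sum(mean for mean, _ in imports.values())
ratio = total_imp / total_exp  # about 1.75, i.e., imports roughly 75% above exports


def coefficient_of_variation(table):
    """Standard deviation divided by the mean, per cargo type."""
    return {cargo: sd / mean for cargo, (mean, sd) in table.items()}


cv_exp = coefficient_of_variation(exports)
cv_imp = coefficient_of_variation(imports)
print(round(ratio, 3),
      min(cv_exp, key=cv_exp.get), max(cv_exp, key=cv_exp.get),
      min(cv_imp, key=cv_imp.get), max(cv_imp, key=cv_imp.get))
```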
5. Empirical results and models’ accuracy
5.1. Empirical results
Figure 1 shows the forecasts for exports of bulk (upper left), container (upper right), reefer (lower left), and ro-ro cargo (lower right). In Figure 1, the continuous line shows the actual data, the dashed line the AGB model forecast, the line with asterisk markers the SVR model forecast, the dotted line the HWM model forecast, and the line with circle markers the ARIMAX model forecast.
In Figure 1 (upper left), the HWM model is the best method to forecast exports of bulk cargo and the AGB model is the worst, mainly because the AGB model significantly underestimates the actual value for May 2016. The SVR and ARIMAX models are also relatively good at forecasting exports of bulk cargo, since their forecasts stay close to the actual data. The upper right graph shows that the SVR model is the best method to forecast exports of container cargo and that the AGB model is again the worst. The HWM and ARIMAX models are relatively good, although the HWM model tends to overestimate the actual data. The lower left graph shows that the SVR model is slightly better than the other methods at forecasting exports of reefer cargo and that the AGB model is the worst, although the AGB forecasts are not far from the SVR and HWM forecasts or from the actual data. Finally, the lower right graph shows that the SVR and HWM models are the best methods to forecast exports of ro-ro cargo; their forecasts are close to one another and to the actual data. The ARIMAX model is relatively good, because its forecasts are not far in absolute value from the actual data. The AGB model is the worst, as its forecasts tend to overestimate the changes in the actual data.
Figure 2 shows the forecasts for imports of bulk (upper left), container (upper right), reefer (lower left), and ro-ro (lower right) cargo, using the same line styles as Figure 1. As with exports, Figure 2 (upper left) shows that the HWM model is the best method to forecast imports of bulk cargo and the AGB model is the worst: the AGB model significantly overestimates the actual value for the fifth month (May 2016) and underestimates it for the seventh month (July 2016). The SVR model is also good at forecasting imports of bulk cargo, because its forecasts are not far in absolute value from the actual data, whereas the ARIMAX model underestimates the actual data. The SVR model is the best method to forecast imports of container cargo (Figure 2, upper right). The AGB and HWM models overestimate the second month (February 2016), but the forecasts of all four models are close to the actual data from April 2016 to December 2016, so their accuracy for container imports improves from April 2016 onward. The SVR model is slightly better than the others at forecasting imports of reefer cargo, and the AGB model is the worst (Figure 2, lower left). The AGB forecasts are not far from the SVR, ARIMAX, and HWM forecasts or from the actual data, but both the HWM and AGB forecasts significantly overestimate reefer imports for November and December 2016. Finally, the SVR and HWM models are the best methods to forecast imports of ro-ro cargo (Figure 2, lower right), while the AGB model significantly overestimates the actual data for March and August 2016 and underestimates it for April and October 2016.
5.2. Models’ accuracy
Table 2 presents the RMSE and MAPE of each model, separately for imports and exports. The HWM model is the most accurate method to forecast exports and imports of bulk cargo according to the RMSE, and exports of bulk cargo according to the MAPE; for imports of bulk cargo, the SVR model has a lower MAPE (0.0826 versus 0.103). The SVR model is the most accurate method to forecast exports and imports of container, reefer, and ro-ro cargo according to both metrics. The AGB model is the least accurate method for all series according to the RMSE, and for all series except exports of reefer cargo according to the MAPE, where the ARIMAX model is marginally worse (0.1022 versus 0.1001). These metrics are consistent with the results shown in Figure 1 and Figure 2.
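The rankings discussed above can be recomputed directly from Table 2. This short sketch (names are ours) returns, for each series and metric, the model with the lowest and the highest error.

```python
MODELS = ("AGB", "SVR", "HWM", "ARIMAX")

# Values transcribed from Table 2, in the order of MODELS
rmse_table = {
    "export bulk": (223422.51, 132838.32, 77194.10, 205004.39),
    "export container": (185712.63, 76431.40, 160819.51, 140915.85),
    "export reefer": (16978.70, 11433.81, 14026.50, 16525.90),
    "export ro-ro": (2450.72, 1205.67, 1237.11, 1567.60),
    "import bulk": (766584.59, 305003.23, 276256.34, 511282.30),
    "import container": (603086.88, 151636.42, 293427.15, 299687.78),
    "import reefer": (19648.34, 7948.46, 18359.19, 17540.00),
    "import ro-ro": (20982.09, 2760.96, 7252.10, 9695.91),
}
mape_table = {
    "export bulk": (0.2774, 0.159, 0.0961, 0.2747),
    "export container": (0.0647, 0.021, 0.0636, 0.0467),
    "export reefer": (0.1001, 0.0655, 0.0907, 0.1022),
    "export ro-ro": (0.0962, 0.0421, 0.0467, 0.0573),
    "import bulk": (0.2958, 0.0826, 0.103, 0.1886),
    "import container": (0.1258, 0.0286, 0.0646, 0.0704),
    "import reefer": (0.1028, 0.0334, 0.0997, 0.0891),
    "import ro-ro": (0.214, 0.0259, 0.0641, 0.0975),
}


def best_and_worst(table):
    """Map each series to (model with the lowest value, model with the highest value)."""
    result = {}
    for series, values in table.items():
        ranked = sorted(zip(values, MODELS))  # lower error ranks first
        result[series] = (ranked[0][1], ranked[-1][1])
    return result


for name, table in (("RMSE", rmse_table), ("MAPE", mape_table)):
    for series, (best, worst) in best_and_worst(table).items():
        print(f"{name} {series}: best {best}, worst {worst}")
```

Running it shows the two exceptions to a uniform ranking: SVR has the lowest MAPE for imports of bulk cargo, and ARIMAX has the highest MAPE for exports of reefer cargo.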
RMSE | Export bulk | Export container | Export reefer | Export ro-ro |
AGB | 223,422.51 | 185,712.63 | 16,978.70 | 2,450.72 |
SVR | 132,838.32 | 76,431.40 | 11,433.81 | 1,205.67 |
HWM | 77,194.10 | 160,819.51 | 14,026.50 | 1,237.11 |
ARIMAX | 205,004.39 | 140,915.85 | 16,525.90 | 1,567.60 |
RMSE | Import bulk | Import container | Import reefer | Import ro-ro |
AGB | 766,584.59 | 603,086.88 | 19,648.34 | 20,982.09 |
SVR | 305,003.23 | 151,636.42 | 7,948.46 | 2,760.96 |
HWM | 276,256.34 | 293,427.15 | 18,359.19 | 7,252.10 |
ARIMAX | 511,282.30 | 299,687.78 | 17,540.00 | 9,695.91 |
MAPE | Export bulk | Export container | Export reefer | Export ro-ro |
AGB | 0.2774 | 0.0647 | 0.1001 | 0.0962 |
SVR | 0.159 | 0.021 | 0.0655 | 0.0421 |
HWM | 0.0961 | 0.0636 | 0.0907 | 0.0467 |
ARIMAX | 0.2747 | 0.0467 | 0.1022 | 0.0573 |
MAPE | Import bulk | Import container | Import reefer | Import ro-ro |
AGB | 0.2958 | 0.1258 | 0.1028 | 0.214 |
SVR | 0.0826 | 0.0286 | 0.0334 | 0.0259 |
HWM | 0.103 | 0.0646 | 0.0997 | 0.0641 |
ARIMAX | 0.1886 | 0.0704 | 0.0891 | 0.0975 |
Notes: MAPE is expressed as a fraction (e.g., 0.0961 = 9.61%).
Table 3 presents the results of the Diebold-Mariano test (Diebold & Mariano, 1995), which compares the forecast accuracy of the four models pairwise. Each p-value refers to a one-sided test of the null hypothesis of equal accuracy against the alternative that the first model in the pair is more accurate; a small p-value therefore favors the first model, and a p-value close to one favors the second. At the 10% significance level, the first row (SVR - AGB) shows that the SVR model is more accurate than the AGB model for exports and imports of bulk, container, reefer, and ro-ro cargo. The second row (SVR - HWM) shows that the SVR model is more accurate than the HWM model for exports of container cargo and imports of container, reefer, and ro-ro cargo, whereas the HWM model is more accurate for exports of bulk cargo (p = 0.9076); the differences for exports of reefer and ro-ro cargo and imports of bulk cargo are not significant. The third row (SVR - ARIMAX) shows that the SVR model is more accurate than the ARIMAX model for exports of bulk, container, and reefer cargo and imports of all four cargo types; the difference for exports of ro-ro cargo is not significant. The fourth row (AGB - HWM) shows that the HWM model is more accurate than the AGB model for exports of bulk and ro-ro cargo and imports of bulk, container, and ro-ro cargo, with no significant difference for the other series. The fifth row (AGB - ARIMAX) shows that the ARIMAX model is more accurate than the AGB model for exports of ro-ro cargo and imports of bulk, container, and ro-ro cargo, with no significant difference for the other series. Finally, the sixth row (HWM - ARIMAX) shows that the HWM model is more accurate than the ARIMAX model for exports and imports of bulk cargo, while the differences for container, reefer, and ro-ro cargo are not significant.
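A minimal sketch of a one-sided Diebold-Mariano test, assuming squared-error loss, one-step-ahead forecasts, and the asymptotic normal approximation. The paper does not report which loss function, horizon correction, or small-sample adjustment was used, so this is illustrative only. Its sign convention appears to match Table 3, where a small p-value favors the first model in each pair.

```python
import math


def diebold_mariano(actual, f1, f2):
    """One-sided Diebold-Mariano test with squared-error loss.

    H0: equal expected loss; H1: forecast f1 is more accurate than f2.
    Sketch for one-step-ahead forecasts: the variance of the loss differential
    uses lag 0 only, and the statistic is compared with a standard normal.
    Returns (statistic, p-value); a small p-value favors f1.
    """
    d = [(a - p1) ** 2 - (a - p2) ** 2 for a, p1, p2 in zip(actual, f1, f2)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / n  # must be > 0
    stat = mean_d / math.sqrt(var_d / n)
    p_value = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF at stat
    return stat, p_value
```

For multi-step forecasts, the variance of the loss differential would also need autocovariance terms up to lag h - 1, which this sketch omits.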
D-M test p-values | Export bulk | Export container | Export reefer | Export ro-ro |
SVR - AGB | 0.0855 | 0.0423 | 0.0923 | 0.0148 |
SVR - HWM | 0.9076 | 0.0304 | 0.2803 | 0.4651 |
SVR - ARIMAX | 0.0078 | 0.0206 | 0.0663 | 0.1839 |
AGB - HWM | 0.9725 | 0.7539 | 0.7268 | 0.9865 |
AGB - ARIMAX | 0.624 | 0.8265 | 0.5356 | 0.9629 |
HWM - ARIMAX | 0.0178 | 0.6705 | 0.3062 | 0.1961 |
D-M test p-values | Import bulk | Import container | Import reefer | Import ro-ro |
SVR - AGB | 0.0243 | 0.0348 | 0.0096 | 0.005 |
SVR - HWM | 0.5998 | 0.0786 | 0.0223 | 0.0652 |
SVR - ARIMAX | 0.001 | 0.0246 | 0.0373 | 0.0113 |
AGB - HWM | 0.9733 | 0.9564 | 0.6968 | 0.9868 |
AGB - ARIMAX | 0.9287 | 0.9213 | 0.6739 | 0.9832 |
HWM - ARIMAX | 0.0366 | 0.4725 | 0.5701 | 0.1905 |
6. Conclusions
The aim of this paper is to compare the performance of four forecasting models for the volume of exports and imports of bulk, container, reefer, and ro-ro cargo: the univariate AGB model, the robust univariate HWM model, the multivariate machine learning SVR model, and the ARIMAX time-series model with explanatory variables. The data under analysis are for the ports of San Pedro (the port of Los Angeles and the port of Long Beach) in California. According to the RMSE and MAPE metrics, the machine learning SVR model shows superior predictive ability for exports and imports of container, reefer, and ro-ro cargo, and the Diebold-Mariano test supports this in most pairwise comparisons. For bulk cargo, the HWM model achieves the lowest RMSE for both exports and imports and outperforms the AGB and ARIMAX models in the Diebold-Mariano test. The results suggest that cargo demand is strongly affected by economic indicators, whose impact the AGB and HWM models cannot capture: these models only use the information contained in the time series itself (trend, seasonality, and variability), whereas the SVR and ARIMAX models also use explanatory variables.
As future work, the database of this study should be extended to lengthen the out-of-sample horizon, which would make the Diebold-Mariano test, the RMSE metric, and the MAPE metric more reliable. This matters because the size of the out-of-sample period depends on the length of the time series and on how far ahead the forecast is made. Moreover, the literature on the AGB model reports that it is more accurate with long time series, so the results might change with more data. Another line of future work is to replicate the study for other port systems, if data are available.