Association between Crop Acreage and Warming Temperature in US

1. Introduction
2. Data Exploration
3. Fit a model
4. Model Diagnostics
- Corn
- Soybean
5. Conclusion
6. Sources

1. Introduction

U.S. agricultural sector is facing a changing production environment due to global climatic changes. V Our planet’s average surface temperature has risen about 2.12 degrees Fahrenheit (1.18 degrees Celsius) since the late 19th century. These changes has broader impacts on crop yields, agricultural water supply and consumer welfare. Farmers and policy makers worked together to use different adaptive strategies for climate variability such as adjusting preseason planting, switching to crop types and implementing more efficient irrigation practices.
Corn and soybean are two primary crop types in U.S.. Corn is the most widely produced feed grain in the United States (U.S.), accounting for more than 95 percent of total production and use. Soybeans comprise about 90 percent of U.S. oilseed production. In this study, I am interested in how corn&soybean acreage is changing across years under the impact of rising temperature.
The corn&soybean data are from National Agricultural Statistics Service (NASS). This database contains official published aggregate estimates related to U.S. agricultural production. Among the planted, harvested, and glazed acreage, I chose to use harvested acreage. The measuring unit is acre. The reason for using the harvested measurement is that harvested acreage could indicate not only how much grains are harvested but also how much areage farmers choose to grow crops. As for the temperature data, I used the global temperature anomalies in degree Celcius from datahub. The original data is from GISS Surface Temperature.
In general, my specific objectives are (1) how the acreage of these two types are changing differently to warming temperature (2) how to use time series model to describe the relationship between corn acreage and temperature & between soybean acreage and temperature.

2. Data Exploration

Our temperature data and acreage data has different time ranges. The first step is to intersect the overlapping years. This gives us yearly data from 1924-2016.
Firstly, I plot the corn acreage, soybean acreage and temperature anomalies against years.We could see from the plot that the corn acreage firstly decrease then increase. There are two sharp fluctuations in around 1935 and in around 1982. The 1930s fluctuation can be resulted from the Great Depression coupled with featured historic heat happened almost every years. The other fluctuation are from the 1980’s intense heat wave and drought. The soybean acreage increase steadily since 1924. This is because since early 1900, soybean is exported from east to U.S. and during 1930s, soybean productions start to increase. The temperature plot shows the global average temperature is rising with slight fluctuations.

We can tell there is a obvious trend in corn acreage and soybean acreage. These trends can be caused by the technology advancement in agricultural practices like automatic tillage tools. This trend can also be explained by land cover change. There is evidence from research that more and more forest areas are converted into agricultural lands.
In this case, I want to study the acreage growth rate rather than acreage because we’ll definitely see a increasing trend of crop production. If we want to analyze how temperature is associated, we might want to study whether growth rate is stagnated by warming temperature.
From this plot, we could see soybean acreage growth rate (red) and corn acreage growth rate (green) concord with temperature anomaly(blue) in small fluctuations. When there is a temperature peak, there is usually a peak in soybean. This might result from these two types are both heat-tolerant crop types.

3. Fit a model

In the above plot, we could see a slight decreasing trend of corn as well as acreage growth rate. I think this could associate with the warming temperature. Before, fitting a time series model, I firstly analyze the time series structure of the residuals to determine if they have an AR structure.I plot the ACF and PACF of usual linear model. In the ACF and PACF of corn acreage growth rate model, the residuals spike and then cut off and we should compare AR(1), MA(1), and ARMA(1,1) for error. For soybean, there is no spike and no seasonality in residuals, it might be the case that we can use a simple linear model without any ARMA error.

##       [,1]  [,2]  [,3]  [,4] [,5]  [,6] [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.36  0.00 -0.12  0.04 0.08 -0.03 0.01 -0.01 -0.12  0.00  0.11 -0.08
## PACF -0.36 -0.15 -0.21 -0.11 0.03  0.00 0.02  0.03 -0.14 -0.13  0.03 -0.08
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF   0.00 -0.02  0.03  0.01 -0.06  0.05  0.02 -0.09
## PACF -0.06 -0.02 -0.01  0.00 -0.06 -0.02  0.02 -0.08

##       [,1]  [,2]  [,3] [,4] [,5]  [,6] [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  -0.04 -0.14 -0.01 0.21 0.13 -0.17 0.08 0.21 -0.10 -0.14  0.09  0.19  0.00
## PACF -0.04 -0.15 -0.02 0.19 0.15 -0.11 0.11 0.16 -0.14 -0.10  0.08  0.08  0.03
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.14  0.11  0.01 -0.03 -0.13  0.05  0.03
## PACF  0.00  0.08 -0.11 -0.01 -0.09 -0.05 -0.06

After looking at residuals, we then fit the model with ARMA error. The linear model with ARMA error can be written in the following format. \[ Y = X\beta + \eta\] In this equation, Y stands for corn/soybean acreage growth rate and X stands for temperature anomalies. And \(\eta\) is a is a stationary, causal, invertible ARMA(p,q) process with mean zero. The trend \(\mu_n\) has a linear specification: \[\mu_n = \sum_{k=1}^K X_{k}\beta_k\] In order to find the best models for corn and soybean, we plot the AIC table. From the AIC table, we can see the ARMA(0,1) has the lowest AIC value for corn. I also check the roots to make sure it satisfied the invertibility. As for soybean, ARMA(0,0) has the lowest AIC value for soybean and this coincides with our residual plot. Then I keep the simple linear model for soybean.

Corn
	MA0	MA1	MA2
AR0	696.63	679.10	680.30
AR1	685.32	678.38	677.11
AR2	684.84	677.51	679.09

Soybean
	MA0	MA1	MA2
AR0	700.26	702.07	702.69
AR1	702.12	703.69	704.66
AR2	702.11	704.09	694.84

## 
## Call:
## arima(x = df$cornpdt.rate, order = c(0, 0, 1), xreg = df$temp.Mean)
## 
## Coefficients:
##           ma1  intercept  df$temp.Mean
##       -0.5443    -0.9837        3.0961
## s.e.   0.1015     0.5088        1.6326
## 
## sigma^2 estimated as 85.88:  log likelihood = -335.55,  aic = 679.1

## [1] 1.837222

We also add a ARMA model without any temperature trend to compare the performance of our trend-with-error model. We got a p-value of 0.06981809, which is slightly higher than 0.05. If we relax the confidence interval of 95%, we believe we cannot reject the null hypothesis that our trend-with-error performs better.

## 
## Call:
## arima(x = df$cornpdt.rate, order = c(0, 0, 1))
## 
## Coefficients:
##           ma1  intercept
##       -0.4979    -0.5157
## s.e.   0.0951     0.4995
## 
## sigma^2 estimated as 89.07:  log likelihood = -337.2,  aic = 680.39

The spectrum analysis indicates there is no clear seasonality in the dataset. Thus, there is no need to use SARIMA model.

Finally, for corn, we have the model in this format: \[Y_n = -1.249 + 3.0961 x_n + -0.9837 + -0.5443 \epsilon_{n-1}\] where \(Y_n\) represents the corn acreage growth rate at time \(n,\) \(x_n\) represents mean temperature anomaly. This indicates for an increase in 1 degree Celcius of mean temperature anomaly from the mean, we expect that the US corn acreage growth rate to increase by 3.0961%.
For soybean, we have the model: \[Y_n = 6.462 + -10.910 x_n\] where \(Y_n\) represents the corn acreage growth rate at time \(n,\) \(x_n\) represents mean temperature anomaly. This indicates for an increase in 1 degree celcius of mean temperature anomaly from the mean, we expect that the US corn acreage growth rate to decrease by 10.910%.

4. Model Diagnostics

Corn

In ACF plot, the horizontal dashed lines are all within the band, indicating that we cannot reject the null hypothesis of iid residuals at the 95% confidence intervals. The residual plots show there are several outliers in the data, which are exactly the extreme hear events as I mentioned in the introduction part. The QQ plot suggest that the error generally shows normal distribution, maybe a little “heavy tails” which means there are more data in the tails of the histogram.

Soybean

As for the simple linear regression model, the residual plot and QQ plot show some outlier points from year 1935, 1936 and 1942, which is after 1936 heat wave. Apart from these outliers, the residuals generally have non-linear patterns and follow normal distributions. The scale-location plot shows the residuals appear randomly spread. As for the residuals-leverage plot, it is a typical look when there is no influential case since we can barely see Cook’s distance lines in the plot.

5. Conclusion

From our above models, we could see corn acreage growth rate could be associated with temperature anomaly by a linear regression model with ARMA(0,1) errors and soybean acreage growth rate can be associated with temperature anomaly by a simple linear regression model. Both crop types are heat-tolerant. However, the soybean acreage will increase more slowly from linear model while corn acreage will increase more quickly if global temperature is increasing. I think this might result from corn is a feed grain, which it often provides energy for human and animal diets. Farmers may substitute corn for other feed grains. In order to study how temperature is influencing corn&soybean acreage, I think future work includes studying the ratio of corn/soybean acreage to all the agricultural acreage. Also, if crop yield data is available, then we could look at the relationship of crop yield growth and warming temperature.

6. Sources

[1] ‘Soybeans & Oil Crops’ https://www.ers.usda.gov/topics/crops/soybeans-oil-crops/
[2] https://ionides.github.io/531w21/
[3] Marshall, Elizabeth, et al. Climate change, water scarcity, and adaptation in the US fieldcrop sector. No. 1477-2017-3969. 2015.
[4] ‘Lesson 8: Regression with ARIMA errors, Cross correlation functions, and Relationships between 2 Time Series’. https://online.stat.psu.edu/stat510/book/export/html/669
[5] Schnitkey, G. “Acre Changes in Crops from 1991 to 2016.” farmdoc daily (7):76, Department of Agricultural and Consumer Economics, University of Illinois at Urbana-Champaign, April 25, 2017.
[6] ‘Climate Change: How Do We Know?’ https://climate.nasa.gov/evidence/