# Stationarity Analysis in Time Series Data

Hey Geeks! In this blog, we'll dive into the concept of stationarity in time series data. We'll first cover what time-series data is, what stationarity means, and why and when data should be stationary.

We’ll use the dataset I created specifically for this blog to analyze whether the data is stationary or not. We’ll also see how to convert the non-stationary data to stationary.

## Index

1. Introduction
2. Import Libraries and Dependencies
3. Define TimeSeriesData Class
4. Import Dataset
5. Accumulating Number of Sales by Month
6. Create Object
7. Stationarity Tests
   1. Graphical Test
   2. Rolling-Statistics Test
   3. Augmented Dickey-Fuller Test (ADF)
   4. Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS)
   5. Zivot-Andrews Test
8. Conclusion
9. Convert Data to Stationary
   1. Derivatives
   2. Transformation using Logarithmic Function
10. Conclusion

## 1. Introduction

1.1 What is time-series data?
A time-series data is a dataset that tracks the movement of data points over a period of time, recorded at regular intervals.
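As a minimal illustration (the values and dates here are made up for demonstration), a time series in pandas is simply a set of values indexed by timestamps at a regular frequency:

```
import pandas as pd

# Monthly observations recorded at a regular interval (illustrative values)
index = pd.date_range(start="2018-01-01", periods=6, freq="MS")  # 'MS' = month start
sales = pd.Series([101, 99, 101, 89, 99, 105], index=index, name="Number of Sales")

print(sales)
print(sales.index.freqstr)  # the data points are equally spaced in time
```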

1.2 What is data stationarity?
Time series data are said to be stationary if they do not have any seasonal effects or any trends.

A stationary data has the property that the mean, variance, and autocorrelation remain almost the same over various time intervals.
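A quick way to see this property is to compare white noise (stationary by construction) with its cumulative sum, a random walk (non-stationary). This is a sketch on simulated data, not the blog's dataset:

```
import numpy as np

rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)  # white noise: stationary
walk = np.cumsum(noise)                            # random walk: non-stationary

# Compare the means of the first and second halves of each series.
# For the noise they are both near 0; for the walk they differ substantially.
for name, series in [("noise", noise), ("walk", walk)]:
    half = len(series) // 2
    print(f"{name}: mean1={series[:half].mean():.2f}, mean2={series[half:].mean():.2f}")
```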

1.3 Why is stationary data necessary for forecasting?
When forecasting or predicting the future, most traditional time series models assume that the statistical properties of the series (its mean, variance, and autocorrelation) remain constant over time.

Therefore, stationary time series data is necessary in order to obtain acceptable forecasting results.

1.4 What will happen if data is not stationary?
In non-stationary data, summary statistics like the mean and variance change over time, introducing a drift in the patterns a model tries to capture.

Therefore, the data cannot be forecasted using traditional time series models if the data is not stationary.

## 2. Import Libraries and Dependencies

Let’s import the packages required and install the dependencies needed in our code.

```
!pip install statsmodels --upgrade
!pip install openpyxl==3.0.0

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss, zivot_andrews
```

## 3. Define TimeSeriesData Class

Let’s define a class `TimeSeriesData` that contains necessary methods which are defined below.

We'll look at the method definitions and explanations of the code in this section.
Their usage is demonstrated in the upcoming sections.

##### 1. Constructor

First of all, a constructor is defined that takes a dataset and the name of the target column as arguments.

```
def __init__(self, dataset, target_column): #constructor
    #dataset - time series dataset
    #target_column - name of target column in the dataset
    self.dataset = dataset
    self.target_column = target_column
```
• self.dataset – Contains the dataset
• self.target_column – Contains the target column specific to each dataset
##### 2. Display Column Names

A method to display column names of the dataset.

```
def display_columns(self): #display column names present in the dataset
    print(self.dataset.columns)
```
##### 3. Display random samples

A method to display random samples using random_state = 42 as default.

```
def display_samples(self, random_state=42): #display random samples from the dataset
    display(self.dataset.sample(10, random_state=random_state))
```
##### 4. Drop Columns

A method to drop the columns given as arguments.

```
def drop_columns(self, columns): # drop columns inplace from the dataset
    # columns - list of columns to drop from the dataset
    self.dataset.drop(columns, axis=1, inplace=True)
```
##### 5. Graphical Analysis

Now, let’s see some mathematical/statistical functions needed to analyze stationarity in the dataset.

```
def graphical_analysis(self): #analyse the stationarity by histogram
    self.dataset[[self.target_column]].hist()
```

The above method just plots the histogram of the target variable of the dataset. The interpretation from the graph will be explained in later sections.

##### 6. Distribution Plot

Next, the distribution plot of the target variable with respect to time can be plotted by using the below method.

```
def distribution_plot(self, column_name): #plot graph
    plt.figure(figsize=(22,8))
    plt.title(self.target_column)
    plt.xlabel('Date')
    plt.ylabel(self.target_column)
    plt.plot(self.dataset[column_name])
    plt.show()
```
##### 7. Mean Variance Stationarity Analysis

As we already know, stationary data has the property that the mean and variance remain almost the same over various time intervals.

So let's define a method that splits the data into two contiguous sequences, computes the mean and variance of each split, and then compares the corresponding means and variances to check whether they are close to each other.

From the comparison, we can conclude that if the corresponding measures for the two time intervals are close to each other (within some chosen significance level), the data is stationary; otherwise it is non-stationary.

```
def mean_variance_stationary_analysis(self, column_name):
    # Split the time series into two contiguous sequences and compare
    # the means and variances of the two halves.
    #column_name - name of the column to be analysed

    X = self.dataset[[column_name]].values
    split = round(len(X) / 2)
    X1, X2 = X[0:split], X[split:]

    mean1, mean2 = X1.mean(), X2.mean()
    var1, var2 = X1.var(), X2.var()

    print("Mean :", mean1, mean2)
    print("Variance :", var1, var2)
```
##### 8. Rolling Statistics Test

Next, we'll define a method, which I call the Rolling Statistics Test, that plots the rolling mean and standard deviation of the target variable over a window of x observations (here, x = 12).

From the graph, we can then infer whether the mean and standard deviation stay almost constant throughout the data.

```
def rolling_statistics_test(self, column_name):
    # Give a visual representation of the data to judge its stationarity.
    #column_name - name of the column to be tested

    X = self.dataset[column_name]
    rolling_mean = X.rolling(window=12).mean()
    rolling_std = X.rolling(window=12).std()

    plt.figure(figsize=(20,8))
    plt.plot(X, color='black', label='Original')  #original data
    plt.plot(rolling_mean, color='red', label='Rolling Mean')  #rolling mean
    plt.plot(rolling_std, color='blue', label='Rolling Standard Deviation')  #rolling SD
    plt.legend(loc='best')
    plt.title("Rolling Mean and Standard Deviation")
    plt.show(block=False)
```

##### 9. Augmented Dickey-Fuller Test

Next comes the Augmented Dickey-Fuller (ADF) Test, which analyzes stationarity using hypothesis testing.

The Augmented Dickey-Fuller test is one of the more widely used statistical tests (a so-called unit root test); it determines how strongly a time series is defined by a trend.

• The null hypothesis of the test is that the series is not stationary (it has some time-dependent structure).
• The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary.

We interpret this result using the p-value from the test.

• p-value > 0.05: Fail to reject the null hypothesis (H0), the data is non-stationary.
• p-value <= 0.05: Reject the null hypothesis (H0), the data is stationary.
```
def augmented_dickey_fuller_test(self, column_name):
    #The Augmented Dickey-Fuller test is a widely used statistical test (a unit root test)
    #that determines how strongly a time series is defined by a trend.
    #column_name - name of the column to be tested

    X = self.dataset[column_name].dropna()
    adf_test = adfuller(X)

    print(f'ADF Statistic: {adf_test[0]}')
    print(f'p-value: {adf_test[1]}')
    print('Critical Values:')
    for key, value in adf_test[4].items():
        print(f'   {key}, {value}')

    if adf_test[1] <= 0.05:
        print("\nThe Data is Stationary")
    else:
        print("\nThe Data is Non-Stationary")
```
##### 10. KPSS Test

Next comes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test.

In this test, if the p-value is < 0.05, the series is non-stationary.

• Null Hypothesis (H0): Series is trend stationary.
• Alternate Hypothesis (HA): Series is non-stationary.

Note: The hypotheses are reversed in the KPSS test compared to the ADF test.

If we fail to reject the null hypothesis, the test provides evidence that the series is trend stationary.

```
def kwiatkowski_phillips_schmidt_shin_test(self, column_name):
    # The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test figures out if a time series is stationary around
    # a mean or linear trend, or is non-stationary due to a unit root.
    #column_name - name of the column to be tested

    X = self.dataset[column_name].dropna()
    print('Results of KPSS Test:')
    kpss_test = kpss(X, regression='c')
    kpss_test_output = pd.Series(kpss_test[0:3], index=['Test Statistic','p-value','#Lags Used'])

    for key, value in kpss_test[3].items():  # critical values dictionary
        kpss_test_output['Critical Value (%s)'%key] = value
    print(kpss_test_output)

    if kpss_test[1] > 0.05:  # hypotheses are reversed compared to ADF
        print("\nThe Data is Stationary\n\n")
    else:
        print("\nThe Data is Non-Stationary\n\n")
```
##### 11. Zivot-Andrews Test

The final test we will use in our analysis is the Zivot-Andrews Test, a unit root test that allows for a single structural break in the series.

In this test, if the p-value is <= 0.05, our data is stationary; otherwise it is non-stationary.

```
def zivot_andrews_test(self, column_name):
    #column_name - name of the column to be tested

    X = self.dataset[column_name].dropna()
    t_stat, p_value, critical_values, _, _ = zivot_andrews(X)
    print(f'Zivot-Andrews Statistic: {t_stat:.2f}')

    print('Critical Values:')
    for key, value in critical_values.items():
        print(f'   {key}, {value:.2f}')
    print(f'\np-value: {p_value:.6f}')

    if p_value <= 0.05:
        print("Stationary")
    else:
        print("Non-Stationary")
```

The methods for stationarity analysis end here. We can now analyze the stationarity of our data using the above tests.

Now, as we saw earlier, in order to perform forecasting most time series models assume that the statistical properties of the data do not change over time, which means the data should be stationary.

The tests above conclude whether our data is stationary or not. If the data is stationary, it can be used straight away for predictive modeling with time series models (if necessary).

On the contrary, if the data we considered is not stationary, then it is our task to transform it to stationary.

So, let's discuss the methods of our class `TimeSeriesData` that transform the data to stationary if it is not.

We will be following two methodologies :

• Calculating three orders of derivatives (differences)
• Transformation using a logarithmic function, followed by the first derivative
##### 12. Calculating Derivatives

The method to calculate three orders of derivatives is defined below.

```
def calculating_derivatives(self): #calculating three orders of differences
    # note: diff(periods=n) is the lag-n difference y_t - y_{t-n}
    self.dataset['diff_1'] = self.dataset[self.target_column].diff(periods=1)
    self.dataset['diff_2'] = self.dataset[self.target_column].diff(periods=2)
    self.dataset['diff_3'] = self.dataset[self.target_column].diff(periods=3)
```
##### 13. Logarithmic Transformation

The method to perform a logarithmic transformation followed by the first derivative is defined below.

```
def log_transform_derivative_1(self): #log-transform the column, then take the first-order derivative
    self.dataset['log_diff_1'] = np.log(self.dataset[self.target_column]).diff()
```
##### Class TimeSeriesData

So that’s all our methods of the class. Now the `TimeSeriesData` class is defined as below,

```
class TimeSeriesData:
    def __init__(self, dataset, target_column): #constructor
        #dataset - time series dataset
        #target_column - name of target column in the dataset
        self.dataset = dataset
        self.target_column = target_column

    def display_columns(self): #display column names present in the dataset
        print(self.dataset.columns)

    def display_samples(self, random_state=42): #display random samples from the dataset
        display(self.dataset.sample(10, random_state=random_state))

    def drop_columns(self, columns): # drop columns inplace from the dataset
        # columns - list of columns to drop from the dataset
        self.dataset.drop(columns, axis=1, inplace=True)

    def graphical_analysis(self): #analyse the stationarity by histogram
        self.dataset[[self.target_column]].hist()

    def distribution_plot(self, column_name): #plot graph
        plt.figure(figsize=(22,8))
        plt.title(self.target_column)
        plt.xlabel('Date')
        plt.ylabel(self.target_column)
        plt.plot(self.dataset[column_name])
        plt.show()

    def mean_variance_stationary_analysis(self, column_name):
        # Split the time series into two contiguous sequences and compare
        # the means and variances of the two halves.
        #column_name - name of the column to be analysed

        X = self.dataset[[column_name]].values
        split = round(len(X) / 2)
        X1, X2 = X[0:split], X[split:]

        mean1, mean2 = X1.mean(), X2.mean()
        var1, var2 = X1.var(), X2.var()

        print("Mean :", mean1, mean2)
        print("Variance :", var1, var2)

    def rolling_statistics_test(self, column_name):
        # Give a visual representation of the data to judge its stationarity.
        #column_name - name of the column to be tested

        X = self.dataset[column_name]
        rolling_mean = X.rolling(window=12).mean()
        rolling_std = X.rolling(window=12).std()

        plt.figure(figsize=(20,8))
        plt.plot(X, color='black', label='Original')  #original data
        plt.plot(rolling_mean, color='red', label='Rolling Mean')  #rolling mean
        plt.plot(rolling_std, color='blue', label='Rolling Standard Deviation')  #rolling SD
        plt.legend(loc='best')
        plt.title("Rolling Mean and Standard Deviation")
        plt.show(block=False)

    def augmented_dickey_fuller_test(self, column_name):
        #The Augmented Dickey-Fuller test is a widely used statistical test (a unit root test)
        #that determines how strongly a time series is defined by a trend.
        #column_name - name of the column to be tested

        X = self.dataset[column_name].dropna()
        adf_test = adfuller(X)

        print(f'ADF Statistic: {adf_test[0]}')
        print(f'p-value: {adf_test[1]}')
        print('Critical Values:')
        for key, value in adf_test[4].items():
            print(f'   {key}, {value}')

        if adf_test[1] <= 0.05:
            print("\nThe Data is Stationary")
        else:
            print("\nThe Data is Non-Stationary")

    def kwiatkowski_phillips_schmidt_shin_test(self, column_name):
        # The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test figures out if a time series is stationary around
        # a mean or linear trend, or is non-stationary due to a unit root.
        #column_name - name of the column to be tested

        X = self.dataset[column_name].dropna()
        print('Results of KPSS Test:')
        kpss_test = kpss(X, regression='c')
        kpss_test_output = pd.Series(kpss_test[0:3], index=['Test Statistic','p-value','#Lags Used'])

        for key, value in kpss_test[3].items():  # critical values dictionary
            kpss_test_output['Critical Value (%s)'%key] = value
        print(kpss_test_output)

        if kpss_test[1] > 0.05:  # hypotheses are reversed compared to ADF
            print("\nThe Data is Stationary\n\n")
        else:
            print("\nThe Data is Non-Stationary\n\n")

    def zivot_andrews_test(self, column_name):
        #column_name - name of the column to be tested

        X = self.dataset[column_name].dropna()
        t_stat, p_value, critical_values, _, _ = zivot_andrews(X)
        print(f'Zivot-Andrews Statistic: {t_stat:.2f}')

        print('Critical Values:')
        for key, value in critical_values.items():
            print(f'   {key}, {value:.2f}')
        print(f'\np-value: {p_value:.6f}')

        if p_value <= 0.05:
            print("Stationary")
        else:
            print("Non-Stationary")

    def calculating_derivatives(self): #calculating three orders of differences
        # note: diff(periods=n) is the lag-n difference y_t - y_{t-n}
        self.dataset['diff_1'] = self.dataset[self.target_column].diff(periods=1)
        self.dataset['diff_2'] = self.dataset[self.target_column].diff(periods=2)
        self.dataset['diff_3'] = self.dataset[self.target_column].diff(periods=3)

    def log_transform_derivative_1(self): #log-transform the column, then take the first-order derivative
        self.dataset['log_diff_1'] = np.log(self.dataset[self.target_column]).diff()
```

## 4. Import Dataset

The dataset we will be using contains the sales of Maruti Suzuki cars for every month from 2018 to 2021 in a particular location.
Note that the dataset is synthetic (not from an original source); since the aim of this blog is only to perform analysis and transformation, that is not a concern.

The dataset contains 3 columns,

• Cars – Type of car
• Date – Date in yyyy-mm-dd format
• Number of Sales – Number of car sales in the location (the location is not specified)

Let’s import our dataset

```
cars_data = pd.read_excel("/content/drive/MyDrive/Datasets/Car_Sales.xlsx")
cars_data
```
```
	Cars	        Date	        Number of Sales
0	Swift Dzire	2018-01-01	101.0
1	Swift Dzire	2018-02-01	99.0
2	Swift Dzire	2018-03-01	101.0
3	Swift Dzire	2018-04-01	89.0
4	Swift Dzire	2018-05-01	99.0
...	...	...	...
283	Celerio	        2021-08-01	118.0
284	Celerio	        2021-09-01	124.0
285	Celerio	        2021-10-01	108.0
286	Celerio	        2021-11-01	106.0
287	Celerio	        2021-12-01	67.0

288 rows × 3 columns
```

## 5. Accumulating Number of Sales by month

Let's accumulate the number of sales across all cars for each date.

```
cars_data = cars_data.groupby('Date').sum()
cars_data.sample(10)
```
```
Date	        Number of Sales
2020-04-01	786.0
2021-11-01	610.0
2018-11-01	518.0
2019-10-01	602.0
2018-12-01	593.0
2020-03-01	798.0
2018-06-01	557.0
2021-02-01	663.0
2020-05-01	814.0
2019-04-01	630.0
```

## 6. Create Object

```
cars_data = TimeSeriesData(cars_data, target_column='Number of Sales')
cars_data.display_columns()
```
`Index(['Number of Sales'], dtype='object')`
```
cars_data.display_samples()
```
```
Date	        Number of Sales
2020-04-01	786.0
2021-05-01	640.0
2020-03-01	798.0
2021-08-01	589.0
2020-01-01	791.0
2021-02-01	663.0
2019-01-01	566.0
2019-08-01	647.0
2018-05-01	676.0
2020-02-01	789.0
```

## 7. Stationarity Tests

Below are the techniques we’ll follow in this blog to analyze the stationarity of our dataset.

• Graphical
• Rolling-Statistics Test
• KPSS test
• Zivot-Andrews Test

### 1. Graphical

The plot below depicts the number of sales from January 2018 to December 2021.

From the plot, it can be seen that there is a trend in the data from January 2020 to December 2020. This suggests that the data is non-stationary.

But still, let us perform some tests to verify mathematically/statistically whether the data is stationary or non-stationary.

```
cars_data.distribution_plot('Number of Sales')
```
```
cars_data.graphical_analysis()
```

Let’s split the time series data into two contiguous sequences and calculate the mean and variance for each split. Then we’ll compare the corresponding means and variances of the two splits.

If the means are close to each other and the variances are close to each other, then the data can be said to be stationary.

```
cars_data.mean_variance_stationary_analysis('Number of Sales')
```
```
Mean : 608.625 701.4583333333334
Variance : 2126.0677083333335 12394.164930555555
```

From the output above, the mean varies significantly between the two halves (and so does the variance). This again suggests that there is a trend in the data, i.e. the data is non-stationary.

### 2. Rolling-Statistics Test

Now, let's do some calculations and interpret the results visually, building on Section 7.1.
In that section, we split the data into two parts and compared the means and variances of the splits.

To generalize this across the entire series, we'll use rolling_statistics_test(), which plots the rolling mean and standard deviation throughout the data so that we can visually check for any trend.

```
cars_data.rolling_statistics_test('Number of Sales')
```

The rolling mean and rolling standard deviation are not constant across time intervals, which shows that the dataset might be non-stationary.

### 3. Augmented Dickey-Fuller (ADF) Test

Enough tests by visualization; now let's perform some statistical tests on our data to analyze stationarity, starting with the ADF test.

```
cars_data.augmented_dickey_fuller_test('Number of Sales')
```
```
ADF Statistic: -2.160524410739529
p-value: 0.2208833697150417
Critical Values:
   1%, -3.596635636000432
   5%, -2.933297331821618
   10%, -2.6049909750566895

The Data is Non-Stationary
```

We can see that our test statistic of -2.16 is greater than the critical value of -3.60 at the 1% level. This suggests that we cannot reject the null hypothesis.

Failing to reject the null hypothesis means that the time series is non-stationary, i.e. it has a time-dependent structure.

### 4. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

```
cars_data.kwiatkowski_phillips_schmidt_shin_test('Number of Sales')
```
```
Results of KPSS Test:
Test Statistic           0.210379
p-value                  0.100000
#Lags Used               4.000000
Critical Value (10%)     0.347000
Critical Value (5%)      0.463000
Critical Value (2.5%)    0.574000
Critical Value (1%)      0.739000
dtype: float64

The Data is Stationary
```

The p-value of 0.1 is > 0.05, so we fail to reject the null hypothesis that the series is trend stationary. Therefore, the data is stationary according to the KPSS test.

### 5. Zivot-Andrews Test

```
cars_data.zivot_andrews_test('Number of Sales')
```
```
Zivot-Andrews Statistic: -4.21
Critical Values:
   1%, -5.28
   5%, -4.81
   10%, -4.57

p-value: 0.235844
Non-Stationary
```

Since the p-value 0.235844 > 0.05, the data is non-stationary.

## 8. Conclusion

1. The mean and variance of the split data indicate that the data is non-stationary.
2. The Rolling-Statistics test indicates that the data is non-stationary.
3. The ADF test indicates that the data is non-stationary.
4. The KPSS test indicates that the data is stationary.
5. The Zivot-Andrews test indicates that the data is non-stationary.

Therefore, the majority of the tests indicate that the data is non-stationary.

## 9. Convert Data to Stationary

Now, we have seen that non-stationary time series data is not suitable for traditional predictive modeling. Hence, converting the data to stationary is a necessary step before performing any predictive modeling.

Note: It is not necessary to follow this section if the data is stationary. But since the scope of this blog also covers the conversion, the data is generated in such a way.

The Conversion techniques we’ll be following are,

1. Derivatives
2. Logarithmic transformation

### 1. Derivatives

Let us compute three orders of differences on the 'Number of Sales' column.

```
cars_data.calculating_derivatives()
cars_data.display_samples()
```
```
Date		Number of Sales	diff_1	diff_2	diff_3
2020-04-01	786.0	-12.0	-3.0	-5.0
2021-05-01	640.0	43.0	62.0	-23.0
2020-03-01	798.0	9.0	7.0	145.0
2021-08-01	589.0	53.0	90.0	-51.0
2020-01-01	791.0	138.0	272.0	189.0
2021-02-01	663.0	54.0	-122.0	-185.0
2019-01-01	566.0	-27.0	48.0	-61.0
2019-08-01	647.0	1.0	18.0	-26.0
2018-05-01	676.0	116.0	64.0	57.0
2020-02-01	789.0	-2.0	136.0	270.0
```

Let’s perform an ADF test on the three orders of derivatives to check for stationarity.

#### 1. First Derivative

```
cars_data.augmented_dickey_fuller_test('diff_1')
cars_data.distribution_plot('diff_1')
```
```
ADF Statistic: -2.607621497021281
p-value: 0.09143226802145005
Critical Values:
   1%, -3.596635636000432
   5%, -2.933297331821618
   10%, -2.6049909750566895

The Data is Non-Stationary
```

We can see that our test statistic of -2.607 is greater than the critical value of -3.60 at the 1% level. This suggests that we cannot reject the null hypothesis.

Failing to reject the null hypothesis means that the time series is non-stationary, i.e. it has a time-dependent structure.

#### 2. Second Derivative

```
cars_data.augmented_dickey_fuller_test('diff_2')
cars_data.distribution_plot('diff_2')
```
```
ADF Statistic: -2.737569268981533
p-value: 0.0677596974413701
Critical Values:
   1%, -3.6055648906249997
   5%, -2.937069375
   10%, -2.606985625

The Data is Non-Stationary
```

We can see that our test statistic of -2.74 is greater than the critical value of -3.61 at the 1% level. This suggests that we cannot reject the null hypothesis.

Failing to reject the null hypothesis means that the time series is non-stationary, i.e. it has a time-dependent structure.

#### 3. Third Derivative

```
cars_data.augmented_dickey_fuller_test('diff_3')
cars_data.distribution_plot('diff_3')
```
```
ADF Statistic: -2.9231663307597175
p-value: 0.04271963058833072
Critical Values:
   1%, -3.610399601308181
   5%, -2.939108945868946
   10%, -2.6080629651545038

The Data is Non-Stationary
```

We can see that our test statistic of -2.92 is greater than the critical value of -3.61 at the 1% level. This suggests that we cannot reject the null hypothesis at the 1% significance level.

Failing to reject the null hypothesis means that the time series is non-stationary, i.e. it has a time-dependent structure.

### 2. Transformation using Logarithmic Function

Now, we will transform the data with a logarithmic function and then take the first-order difference to check for stationarity.

```
cars_data.log_transform_derivative_1()
cars_data.display_samples()
```
```
Date		Number of Sales	diff_1	diff_2	diff_3	log_diff_1
2020-04-01	786.0	-12.0	-3.0	-5.0	-0.015152
2021-05-01	640.0	43.0	62.0	-23.0	0.069551
2020-03-01	798.0	9.0	7.0	145.0	0.011342
2021-08-01	589.0	53.0	90.0	-51.0	0.094292
2020-01-01	791.0	138.0	272.0	189.0	0.191721
2021-02-01	663.0	54.0	-122.0	-185.0	0.084957
2019-01-01	566.0	-27.0	48.0	-61.0	-0.046600
2019-08-01	647.0	1.0	18.0	-26.0	0.001547
2018-05-01	676.0	116.0	64.0	57.0	0.188256
2020-02-01	789.0	-2.0	136.0	270.0	-0.002532
```

```
cars_data.augmented_dickey_fuller_test('log_diff_1')
cars_data.distribution_plot('log_diff_1')
```
```
ADF Statistic: -9.510294279497844
p-value: 3.263135273864606e-16
Critical Values:
   1%, -3.5812576580093696
   5%, -2.9267849124681518
   10%, -2.6015409829867675

The Data is Stationary
```

We can see that our test statistic of -9.51 is less than the critical value of -3.58 at the 1% level. This suggests that we can reject the null hypothesis at a significance level of less than 1%.

Rejecting the null hypothesis means that the time series is stationary, i.e. it does not have a time-dependent structure.
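One practical point worth noting (not covered in the blog): forecasts made on the log-differenced scale have to be mapped back to the original scale. A minimal sketch, assuming a hypothetical positive-valued series `y`; the recovered values should match the original:

```
import numpy as np
import pandas as pd

y = pd.Series([100.0, 110.0, 105.0, 120.0])  # illustrative positive series
log_diff = np.log(y).diff()                  # the transform used above

# Invert: cumulatively sum the log-differences, anchor at the first
# observed value, and exponentiate back to the original scale.
recovered = np.exp(np.log(y.iloc[0]) + log_diff.cumsum())
recovered.iloc[0] = y.iloc[0]                # first value has no difference

print(recovered.tolist())
```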

#### 2. KPSS Test

```
cars_data.kwiatkowski_phillips_schmidt_shin_test('log_diff_1')
```
```
Results of KPSS Test:
Test Statistic           0.077049
p-value                  0.100000
#Lags Used               3.000000
Critical Value (10%)     0.347000
Critical Value (5%)      0.463000
Critical Value (2.5%)    0.574000
Critical Value (1%)      0.739000
dtype: float64

The Data is Stationary
```

The p-value of 0.1 is > 0.05, so we fail to reject the null hypothesis that the series is trend stationary. Therefore, the data is stationary according to the KPSS test.

#### 3. Zivot-Andrews Test

```
cars_data.zivot_andrews_test('log_diff_1')
```
```
Zivot-Andrews Statistic: -9.82
Critical Values:
   1%, -5.28
   5%, -4.81
   10%, -4.57

p-value: 0.000010
Stationary
```

Since the p-value <= 0.05, the data is stationary.

#### 4. Rolling-Statistics Test

```
cars_data.rolling_statistics_test('log_diff_1')
```

The rolling mean and rolling standard deviation are almost constant across time intervals, therefore the dataset might be stationary.

## 10. Conclusion

• The Rolling-Statistics test suggests that the data is stationary.
• The ADF test indicates that the data is stationary.
• The KPSS test indicates that the data is stationary.
• The Zivot-Andrews test indicates that the data is stationary.

Therefore, the tests agree that the data is stationary.

Hence, the non-stationary data has been transformed into stationary data.
