Hey Geeks!!! In this blog, we'll dive into the concept of stationarity using time series data. We'll first understand what time-series data is, what stationarity is, and why and when data should be stationary.
We'll use a dataset I created specifically for this blog to analyze whether the data is stationary or not. We'll also see how to convert non-stationary data to stationary.
Index
- Introduction
- Import Libraries and Dependencies
- Define TimeSeriesData Class
- Import Dataset
- Accumulating Number of Sales by month
- Create object
- Stationarity Tests
- Graphical Test
- Rolling-Statistics Test
- Augmented Dickey-Fuller Test (ADF)
- Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS)
- Zivot-Andrews Test
- Conclusion
- Convert data to Stationary
- Derivatives
- Transformation using Logarithmic Function
- ADF Test
- KPSS Test
- Zivot-Andrews Test
- Rolling-Statistics Test
- Conclusion
1. Introduction
1.1 What is time-series data?
Time-series data is a dataset that tracks the movement of data points over a period of time, recorded at regular intervals.
1.2 What is data stationarity?
Time series data are said to be stationary if they do not have any seasonal effects or any trends.
A stationary data has the property that the mean, variance, and autocorrelation remain almost the same over various time intervals.
1.3 Why is stationary data necessary for forecasting?
When forecasting or predicting the future, most time series models assume that the statistical properties of the series stay constant over time.
Therefore, stationary time series data is necessary for forecasting in order to obtain acceptable results.
1.4 What will happen if data is not stationary?
In non-stationary data, summary statistics like the mean and variance change over time, introducing drift in the patterns a model tries to capture.
Therefore, the data cannot be forecasted using traditional time series models if the data is not stationary.
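To make this concrete, here is a small illustrative sketch (using synthetic data, not the blog's dataset) contrasting a stationary white-noise series with a non-stationary random walk:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

white_noise = rng.normal(0, 1, n)     # stationary: constant mean and variance
random_walk = np.cumsum(white_noise)  # non-stationary: variance grows with time

# Compare summary statistics of the first and second halves of each series
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    first, second = series[: n // 2], series[n // 2:]
    print(f"{name}: mean {first.mean():.2f} vs {second.mean():.2f}, "
          f"variance {first.var():.2f} vs {second.var():.2f}")
```

For the white noise, both halves have nearly identical means and variances; for the random walk, they drift apart, which is the behavior that breaks traditional forecasting models.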
2. Import Libraries and Dependencies
Let’s import the packages required and install the dependencies needed in our code.
!pip install statsmodels --upgrade
!pip install openpyxl==3.0.0
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.stattools import zivot_andrews
3. Define TimeSeriesData Class
Let's define a class TimeSeriesData
that contains the necessary methods, defined below.
We'll look into the function definitions and explanation of the code in this section.
The implementation cases are explained in the upcoming sections.
1. Constructor
First of all, a constructor is defined that takes a dataset and the target column specific to that dataset as arguments.
def __init__(self, dataset, target_column): #constructor
    #dataset - time series dataset
    #target_column - name of target column in the dataset
    self.dataset = dataset
    self.target_column = target_column
- self.dataset – Contains the dataset
- self.target_column – Contains the target column specific to each dataset
2. Display Column Names
A method to display column names of the dataset.
def display_columns(self): #display column names present in the dataset
    print(self.dataset.columns)
3. Display random samples
A method to display random samples using random_state = 42 as default.
def display_samples(self, random_state=42): #display random samples from the dataset
    display(self.dataset.sample(10, random_state=random_state))
4. Drop Columns
Method to drop the columns given as arguments
def drop_columns(self, columns): #drop columns inplace from the dataset
    #columns - list of columns to drop from the dataset
    self.dataset.drop(columns, axis=1, inplace=True)
5. Graphical Analysis
Now, let’s see some mathematical/statistical functions needed to analyze stationarity in the dataset.
def graphical_analysis(self): #analyse the stationarity by histogram
    self.dataset[[self.target_column]].hist()
The above method just plots the histogram of the target variable of the dataset. The interpretation from the graph will be explained in later sections.
6. Distribution Plot
Next, the distribution plot of the target variable with respect to time can be plotted by using the below method.
def distribution_plot(self, column_name): #plot graph
    plt.figure(figsize=(22,8))
    plt.title(self.target_column)
    plt.xlabel('Date')
    plt.ylabel(self.target_column)
    plt.plot(self.dataset[column_name])
    plt.show()
7. Mean Variance Stationarity Analysis
As we already know, stationary data has the property that the mean and variance remain almost the same over various time intervals.
So let's define a method that splits the data into two contiguous sequences, computes the mean and variance for each split, and compares the two means (and, likewise, the two variances) to check how close they are.
From the comparisons, we can conclude that if the corresponding measures for the different time intervals are close to each other (within some chosen significance level), the data is stationary; otherwise it is non-stationary.
def mean_variance_stationary_analysis(self, column_name):
    #Split the time series into two contiguous sequences and compare the means and variances of the two halves.
    #column_name - name of the column to be analysed
    X = self.dataset[[column_name]].values
    split = round(len(X) / 2)
    X1, X2 = X[0:split], X[split:]
    mean1, mean2 = X1.mean(), X2.mean()
    var1, var2 = X1.var(), X2.var()
    print("Mean :", mean1, mean2)
    print("Variance :", var1, var2)
8. Rolling Statistics Test
Next, we'll define a method, which I call the Rolling Statistics Test, that plots the rolling mean and standard deviation of the target variable over a window of x time intervals (here, x = 12).
From the resulting graph, we can infer whether the mean and standard deviation stay almost constant throughout the data.
def rolling_statistics_test(self, column_name):
    #Give a visual representation of the data to assess its stationarity.
    #column_name - name of the column to be tested
    X = self.dataset[column_name]
    rolling_mean = X.rolling(window=12).mean()
    rolling_std = X.rolling(window=12).std()
    plt.figure(figsize=(20,8))
    plt.plot(X, color='black', label='Original') #original data
    plt.plot(rolling_mean, color='red', label='Rolling Mean') #rolling mean
    plt.plot(rolling_std, color='blue', label='Rolling Standard Deviation') #rolling SD
    plt.legend(loc='best')
    plt.title("Rolling Mean and Standard Deviation")
    plt.show(block=False)
9. ADF Test
Next comes the Augmented Dickey-Fuller Test to analyze stationarity using hypothesis.
The Augmented Dickey-Fuller test is one of the more widely used statistical tests (called a unit root test); it determines how strongly a time series is defined by a trend.
- The null hypothesis of the test is that the series is not stationary (it has some time-dependent structure).
- The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary.
We interpret this result using the p-value from the test.
- p-value > 0.05: Fail to reject the null hypothesis (H0), the data is non-stationary.
- p-value <= 0.05: Reject the null hypothesis (H0), the data is stationary.
def augmented_dickey_fuller_test(self, column_name):
    #The Augmented Dickey-Fuller test is a widely used statistical test (a unit root test)
    #that determines how strongly a time series is defined by a trend.
    #column_name - name of the column to be tested
    X = self.dataset[column_name].dropna()
    adf_test_result = adfuller(X)
    print(f'ADF Statistic: {adf_test_result[0]}')
    print(f'p-value: {adf_test_result[1]}')
    print('Critical Values:')
    for key, value in adf_test_result[4].items():
        print(f' {key}, {value}')
    #compare the test statistic against the 1% critical value
    if adf_test_result[0] < adf_test_result[4]['1%']:
        print("\nThe Data is Stationary")
    else:
        print("\nThe Data is Non-Stationary")
10. KPSS Test
Next comes the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
If the p-value < 0.05, then the series is non-stationary.
- Null Hypothesis (H0): the series is stationary (around a constant level, since we use regression='c').
- Alternate Hypothesis (HA): the series is non-stationary.
Note: the hypotheses are reversed in the KPSS test compared to the ADF test.
If we fail to reject the null hypothesis, the test provides evidence that the series is stationary.
def kwiatkowski_phillips_schmidt_shin_test(self, column_name):
    #The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test figures out if a time series is stationary
    #around a mean or linear trend, or is non-stationary due to a unit root.
    #column_name - name of the column to be tested
    X = self.dataset[column_name].dropna()
    print('Results of KPSS Test:')
    kpss_test = kpss(X, regression='c')
    kpss_test_output = pd.Series(kpss_test[0:3], index=['Test Statistic', 'p-value', '#Lags Used'])
    for key, value in kpss_test[3].items():
        kpss_test_output['Critical Value (%s)' % key] = value
    print(kpss_test_output)
    if kpss_test[1] > 0.05:
        print("\nThe Data is Stationary\n\n")
    else:
        print("\nThe Data is Non-Stationary\n\n")
11. Zivot-Andrews Test
The final test we will use for our analysis is the Zivot-Andrews test, which allows for a single structural break in the series.
In this test, if the p-value is <= 0.05, our data is stationary; otherwise it is non-stationary.
def zivot_andrews_test(self, column_name):
    #column_name - name of the column to be tested
    X = self.dataset[column_name].dropna()
    t_stat, p_value, critical_values, _, _ = zivot_andrews(X)
    print(f'Zivot-Andrews Statistic: {t_stat:.2f}')
    print('Critical Values:')
    for key, value in critical_values.items():
        print(f' {key}, {value:.2f}')
    print(f'\np-value: {p_value:.6f}')
    if p_value <= 0.05:
        print("Stationary")
    else:
        print("Non-Stationary")
The methods for stationarity analysis end here. Using the above tests, we can analyze the stationarity of our data.
Now, as we saw earlier, in order to perform forecasting, most time series models assume that the statistical properties of the series stay constant over time, which means the data should be stationary.
So the tests above tell us whether our data is stationary or not. If it is, the data can straight away be used for predictive modeling using time series models (if necessary).
On the contrary, if the data is not stationary, it is our task to transform it to stationary.
So, let's discuss some methods of our class TimeSeriesData
that transform the data to stationary.
We will be following two methodologies :
- Calculating lag-1, lag-2, and lag-3 differences
- Transformation using a logarithmic function and then calculating the first difference
12. Calculating Derivatives
The method to calculate the three lag differences is defined below. Note that pandas' diff(periods=k) computes the lag-k difference X_t - X_{t-k} (rather than applying the first difference k times).
def calculating_derivatives(self): #calculating lag-1, lag-2 and lag-3 differences
    self.dataset['diff_1'] = self.dataset[self.target_column].diff(periods=1)
    self.dataset['diff_2'] = self.dataset[self.target_column].diff(periods=2)
    self.dataset['diff_3'] = self.dataset[self.target_column].diff(periods=3)
13. Logarithmic Transformation
The method to perform the logarithmic transformation followed by the first difference is defined below.
def log_transform_derivative_1(self): #log-transform the column, then take the first difference
    self.dataset['log_diff_1'] = np.log(self.dataset[self.target_column]).diff()
Class TimeSeriesData
So that's all the methods of the class. The full TimeSeriesData
class is defined as below:
class TimeSeriesData:

    def __init__(self, dataset, target_column): #constructor
        #dataset - time series dataset
        #target_column - name of target column in the dataset
        self.dataset = dataset
        self.target_column = target_column

    def display_columns(self): #display column names present in the dataset
        print(self.dataset.columns)

    def display_samples(self, random_state=42): #display random samples from the dataset
        display(self.dataset.sample(10, random_state=random_state))

    def drop_columns(self, columns): #drop columns inplace from the dataset
        #columns - list of columns to drop from the dataset
        self.dataset.drop(columns, axis=1, inplace=True)

    def graphical_analysis(self): #analyse the stationarity by histogram
        self.dataset[[self.target_column]].hist()

    def distribution_plot(self, column_name): #plot graph
        plt.figure(figsize=(22,8))
        plt.title(self.target_column)
        plt.xlabel('Date')
        plt.ylabel(self.target_column)
        plt.plot(self.dataset[column_name])
        plt.show()

    def mean_variance_stationary_analysis(self, column_name):
        #Split the time series into two contiguous sequences and compare the means and variances of the two halves.
        #column_name - name of the column to be analysed
        X = self.dataset[[column_name]].values
        split = round(len(X) / 2)
        X1, X2 = X[0:split], X[split:]
        mean1, mean2 = X1.mean(), X2.mean()
        var1, var2 = X1.var(), X2.var()
        print("Mean :", mean1, mean2)
        print("Variance :", var1, var2)

    def rolling_statistics_test(self, column_name):
        #Give a visual representation of the data to assess its stationarity.
        #column_name - name of the column to be tested
        X = self.dataset[column_name]
        rolling_mean = X.rolling(window=12).mean()
        rolling_std = X.rolling(window=12).std()
        plt.figure(figsize=(20,8))
        plt.plot(X, color='black', label='Original') #original data
        plt.plot(rolling_mean, color='red', label='Rolling Mean') #rolling mean
        plt.plot(rolling_std, color='blue', label='Rolling Standard Deviation') #rolling SD
        plt.legend(loc='best')
        plt.title("Rolling Mean and Standard Deviation")
        plt.show(block=False)

    def augmented_dickey_fuller_test(self, column_name):
        #The Augmented Dickey-Fuller test is a widely used statistical test (a unit root test)
        #that determines how strongly a time series is defined by a trend.
        #column_name - name of the column to be tested
        X = self.dataset[column_name].dropna()
        adf_test_result = adfuller(X)
        print(f'ADF Statistic: {adf_test_result[0]}')
        print(f'p-value: {adf_test_result[1]}')
        print('Critical Values:')
        for key, value in adf_test_result[4].items():
            print(f' {key}, {value}')
        #compare the test statistic against the 1% critical value
        if adf_test_result[0] < adf_test_result[4]['1%']:
            print("\nThe Data is Stationary")
        else:
            print("\nThe Data is Non-Stationary")

    def kwiatkowski_phillips_schmidt_shin_test(self, column_name):
        #The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test figures out if a time series is stationary
        #around a mean or linear trend, or is non-stationary due to a unit root.
        #column_name - name of the column to be tested
        X = self.dataset[column_name].dropna()
        print('Results of KPSS Test:')
        kpss_test = kpss(X, regression='c')
        kpss_test_output = pd.Series(kpss_test[0:3], index=['Test Statistic', 'p-value', '#Lags Used'])
        for key, value in kpss_test[3].items():
            kpss_test_output['Critical Value (%s)' % key] = value
        print(kpss_test_output)
        if kpss_test[1] > 0.05:
            print("\nThe Data is Stationary\n\n")
        else:
            print("\nThe Data is Non-Stationary\n\n")

    def zivot_andrews_test(self, column_name):
        #column_name - name of the column to be tested
        X = self.dataset[column_name].dropna()
        t_stat, p_value, critical_values, _, _ = zivot_andrews(X)
        print(f'Zivot-Andrews Statistic: {t_stat:.2f}')
        print('Critical Values:')
        for key, value in critical_values.items():
            print(f' {key}, {value:.2f}')
        print(f'\np-value: {p_value:.6f}')
        if p_value <= 0.05:
            print("Stationary")
        else:
            print("Non-Stationary")

    def calculating_derivatives(self): #calculating lag-1, lag-2 and lag-3 differences
        self.dataset['diff_1'] = self.dataset[self.target_column].diff(periods=1)
        self.dataset['diff_2'] = self.dataset[self.target_column].diff(periods=2)
        self.dataset['diff_3'] = self.dataset[self.target_column].diff(periods=3)

    def log_transform_derivative_1(self): #log-transform the column, then take the first difference
        self.dataset['log_diff_1'] = np.log(self.dataset[self.target_column]).diff()
4. Import Dataset
The dataset we will be using contains the monthly sales of Maruti Suzuki cars from 2018 to 2021 in a particular location.
The dataset is not from an original source, and since the aim of this blog is only to perform analysis and transformation, the exact data values are not important.
Dataset Link : https://docs.google.com/spreadsheets/d/1LuMrs8IONus2wT_JgvbdhrlJ7SPNveR-/edit?usp=sharing&ouid=101071717239207047296&rtpof=true&sd=true
The dataset contains 3 columns,
- Cars – Type of car
- Date – Date in yyyy-mm-dd
- Number of Sales – Number of car sales in the location. (Location is not specified)
Let’s import our dataset
cars_data = pd.read_excel("/content/drive/MyDrive/Datasets/Car_Sales.xlsx")
cars_data
     Cars         Date        Number of Sales
0    Swift Dzire  2018-01-01  101.0
1    Swift Dzire  2018-02-01  99.0
2    Swift Dzire  2018-03-01  101.0
3    Swift Dzire  2018-04-01  89.0
4    Swift Dzire  2018-05-01  99.0
...  ...          ...         ...
283  Celerio      2021-08-01  118.0
284  Celerio      2021-09-01  124.0
285  Celerio      2021-10-01  108.0
286  Celerio      2021-11-01  106.0
287  Celerio      2021-12-01  67.0
288 rows × 3 columns
5. Accumulating Number of Sales by month
Let’s accumulate the number of sales for all cars with respect to date
cars_data = cars_data.groupby('Date').sum()
cars_data.sample(10)
Date        Number of Sales
2020-04-01  786.0
2021-11-01  610.0
2018-11-01  518.0
2019-10-01  602.0
2018-12-01  593.0
2020-03-01  798.0
2018-06-01  557.0
2021-02-01  663.0
2020-05-01  814.0
2019-04-01  630.0
6. Create Object
cars_data = TimeSeriesData(cars_data, target_column='Number of Sales')
cars_data.display_columns()
Index(['Number of Sales'], dtype='object')
cars_data.display_samples()
Date        Number of Sales
2020-04-01  786.0
2021-05-01  640.0
2020-03-01  798.0
2021-08-01  589.0
2020-01-01  791.0
2021-02-01  663.0
2019-01-01  566.0
2019-08-01  647.0
2018-05-01  676.0
2020-02-01  789.0
7. Stationarity Tests
Below are the techniques we’ll follow in this blog to analyze the stationarity of our dataset.
- Graphical
- Rolling-Statistics Test
- ADF test
- KPSS test
- Zivot-Andrews Test
1. Graphical
The plot below depicts the number of sales from January 2018 to December 2021.
From the plot, it can be seen that there is a trend in the data from January 2020 until December 2020, which suggests that the data is non-stationary.
But still, let us perform some tests to prove mathematically / statistically whether the data is stationary or non-stationary.
cars_data.distribution_plot('Number of Sales')

cars_data.graphical_analysis()

Let’s split the time series data into two contiguous sequences and calculate the mean and variance for each split. Then we’ll compare the corresponding means and variances of the two splits.
If the means are near to each other and the variances are near to each other, then the data can be said to be stationary.
cars_data.mean_variance_stationary_analysis('Number of Sales')
Mean : 608.625 701.4583333333334
Variance : 2126.0677083333335 12394.164930555555
From the output above, the mean varies significantly between the two halves (and so does the variance). Therefore, once again, there is evidence of a trend in the data we have taken (non-stationary data).
2. Rolling-Statistics Test
Now, let's do some calculations and interpret the results visually, extending the analysis in section 7.1 above.
In that section, we split the data into two halves and compared the means and variances of the splits.
To generalize across the entire series, we'll use rolling_statistics_test(), which plots the rolling mean and standard deviation throughout the data so that we can visually check for any trend.
cars_data.rolling_statistics_test('Number of Sales')

The rolling mean and rolling standard deviation are not constant across time intervals, which shows that the dataset might be non-stationary.
3. Augmented Dickey-Fuller (ADF) Test
Enough of tests by visualization, now let’s perform some statistical tests on our data to analyze the stationarity, starting with ADF test.
cars_data.augmented_dickey_fuller_test('Number of Sales')
ADF Statistic: -2.160524410739529
p-value: 0.2208833697150417
Critical Values:
 1%, -3.596635636000432
 5%, -2.933297331821618
 10%, -2.6049909750566895

The Data is Non-Stationary
We can see that our statistic value of -2.16 is greater than the 1% critical value of -3.60. This suggests that we cannot reject the null hypothesis at the 1% significance level.
Failing to reject the null hypothesis means that the time series is non-stationary, i.e., it has a time-dependent structure.
4. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
cars_data.kwiatkowski_phillips_schmidt_shin_test('Number of Sales')
Results of KPSS Test:
Test Statistic          0.210379
p-value                 0.100000
#Lags Used              4.000000
Critical Value (10%)    0.347000
Critical Value (5%)     0.463000
Critical Value (2.5%)   0.574000
Critical Value (1%)     0.739000
dtype: float64

The Data is Stationary
The p-value of 0.1 is > 0.05, so we fail to reject the null hypothesis. Therefore, according to the KPSS test, the data is stationary.
5. Zivot-Andrews Test
cars_data.zivot_andrews_test('Number of Sales')
Zivot-Andrews Statistic: -4.21
Critical Values:
 1%, -5.28
 5%, -4.81
 10%, -4.57

p-value: 0.235844
Non-Stationary
Since the p-value of 0.235844 > 0.05, the data is non-stationary.
8. Conclusion
- The mean and variance of the split data indicate that the data is non-stationary.
- The Rolling-Statistics test indicates that the data is non-stationary.
- The ADF test indicates that the data is non-stationary.
- The KPSS test indicates that the data is stationary.
- The Zivot-Andrews test indicates that the data is non-stationary.
Therefore, the majority of the tests indicate that the data is non-stationary.
9. Convert data into Stationary
Now, we have seen that non-stationary time series data is not suitable for predictive modeling. Hence, converting the data to stationary is a mandatory step before performing any predictive modeling.
Note: It is not necessary to follow this section if the data is stationary. But since the scope of this blog also covers the conversion, the data is generated in such a way.
The Conversion techniques we’ll be following are,
- Derivatives
- Logarithmic transformation
1. Derivatives
Let us compute the lag-1, lag-2, and lag-3 differences of the 'Number of Sales' column.
cars_data.calculating_derivatives()
cars_data.display_samples()
Date        Number of Sales  diff_1  diff_2  diff_3
2020-04-01  786.0            -12.0   -3.0    -5.0
2021-05-01  640.0            43.0    62.0    -23.0
2020-03-01  798.0            9.0     7.0     145.0
2021-08-01  589.0            53.0    90.0    -51.0
2020-01-01  791.0            138.0   272.0   189.0
2021-02-01  663.0            54.0    -122.0  -185.0
2019-01-01  566.0            -27.0   48.0    -61.0
2019-08-01  647.0            1.0     18.0    -26.0
2018-05-01  676.0            116.0   64.0    57.0
2020-02-01  789.0            -2.0    136.0   270.0
Let’s perform an ADF test on the three orders of derivatives to check for stationarity.
1. First Derivative
cars_data.augmented_dickey_fuller_test('diff_1')
cars_data.distribution_plot('diff_1')
ADF Statistic: -2.607621497021281
p-value: 0.09143226802145005
Critical Values:
 1%, -3.596635636000432
 5%, -2.933297331821618
 10%, -2.6049909750566895

The Data is Non-Stationary

We can see that our statistic value of -2.607 is greater than the 1% critical value of -3.60. This suggests that we cannot reject the null hypothesis at the 1% significance level.
Failing to reject the null hypothesis means that the time series is non-stationary, i.e., it has a time-dependent structure.
2. Second Derivative
cars_data.augmented_dickey_fuller_test('diff_2')
cars_data.distribution_plot('diff_2')
ADF Statistic: -2.737569268981533
p-value: 0.0677596974413701
Critical Values:
 1%, -3.6055648906249997
 5%, -2.937069375
 10%, -2.606985625

The Data is Non-Stationary

We can see that our statistic value of -2.74 is greater than the 1% critical value of -3.61. This suggests that we cannot reject the null hypothesis at the 1% significance level.
Failing to reject the null hypothesis means that the time series is non-stationary, i.e., it has a time-dependent structure.
3. Third Derivative
cars_data.augmented_dickey_fuller_test('diff_3')
cars_data.distribution_plot('diff_3')
ADF Statistic: -2.9231663307597175
p-value: 0.04271963058833072
Critical Values:
 1%, -3.610399601308181
 5%, -2.939108945868946
 10%, -2.6080629651545038

The Data is Non-Stationary

We can see that our statistic value of -2.92 is greater than the 1% critical value of -3.61, so we cannot reject the null hypothesis at the 1% significance level. (Note that the p-value of 0.043 would pass a 5% threshold, but our method uses the stricter 1% criterion.)
Failing to reject the null hypothesis means that the time series is treated as non-stationary, i.e., it has a time-dependent structure.
2. Transformation using Logarithmic Function
Now, we will transform the data with a logarithmic function and then take the first-order difference to check for stationarity.
cars_data.log_transform_derivative_1()
cars_data.display_samples()
Date        Number of Sales  diff_1  diff_2  diff_3  log_diff_1
2020-04-01  786.0            -12.0   -3.0    -5.0    -0.015152
2021-05-01  640.0            43.0    62.0    -23.0   0.069551
2020-03-01  798.0            9.0     7.0     145.0   0.011342
2021-08-01  589.0            53.0    90.0    -51.0   0.094292
2020-01-01  791.0            138.0   272.0   189.0   0.191721
2021-02-01  663.0            54.0    -122.0  -185.0  0.084957
2019-01-01  566.0            -27.0   48.0    -61.0   -0.046600
2019-08-01  647.0            1.0     18.0    -26.0   0.001547
2018-05-01  676.0            116.0   64.0    57.0    0.188256
2020-02-01  789.0            -2.0    136.0   270.0   -0.002532
1. ADF Test
cars_data.augmented_dickey_fuller_test('log_diff_1')
cars_data.distribution_plot('log_diff_1')
ADF Statistic: -9.510294279497844
p-value: 3.263135273864606e-16
Critical Values:
 1%, -3.5812576580093696
 5%, -2.9267849124681518
 10%, -2.6015409829867675

The Data is Stationary

We can see that our statistic value of -9.51 is less than the 1% critical value of -3.58. This suggests that we can reject the null hypothesis at the 1% significance level.
Rejecting the null hypothesis means that the time series is stationary, i.e., it does not have a time-dependent structure.
2. KPSS Test
cars_data.kwiatkowski_phillips_schmidt_shin_test('log_diff_1')
Results of KPSS Test:
Test Statistic          0.077049
p-value                 0.100000
#Lags Used              3.000000
Critical Value (10%)    0.347000
Critical Value (5%)     0.463000
Critical Value (2.5%)   0.574000
Critical Value (1%)     0.739000
dtype: float64

The Data is Stationary
The p-value of 0.1 is > 0.05, so we fail to reject the null hypothesis. Therefore, the data is stationary according to the KPSS test.
3. Zivot-Andrews Test
cars_data.zivot_andrews_test('log_diff_1')
Zivot-Andrews Statistic: -9.82
Critical Values:
 1%, -5.28
 5%, -4.81
 10%, -4.57

p-value: 0.000010
Stationary
Since the p-value <= 0.05, the data is stationary.
4. Rolling-Statistics Test
cars_data.rolling_statistics_test('log_diff_1')

The rolling mean and rolling standard deviation are almost constant across time intervals, so the dataset appears to be stationary.
10. Conclusion
- The Rolling-Statistics test suggests that the data is stationary.
- The ADF test indicates that the data is stationary.
- The KPSS test indicates that the data is stationary.
- The Zivot-Andrews test indicates that the data is stationary.
Therefore, the majority of the tests indicate that the data is stationary.
Hence, the non-stationary data is transformed into stationary data.
About the Author :
Hi, I'm Avinash, pursuing a Bachelor of Engineering in Computer Science and Engineering at Mepco Schlenk Engineering College, Sivakasi.
I’m currently working as a Data Science Intern at @DeepSphere.AI.
I’m an AI enthusiast and Open-Source contributor.
Feel free to correct me !! 🙂
Thank you folks for reading. Happy Learning !!! 😊