Study tips: Linear regression part 2 – Regression focus

This series looks at linear regression for the Professional Diploma in Accounting qualification from AAT. We start off with the high low technique, and follow up with a focus on regression.




In part one of this article on linear regression, we looked at some sales figures that were increasing in a consistent way over the course of six months. We produced a graph showing that the figures plotted as a straight line, and were therefore able to forecast future sales figures, based on the assumption that data will behave in the future as it did in the past.

We also used the mathematical equation of a straight line* to show that, when data behaves in a consistent way, the assumption that it will continue to do so is correct.

This is great when all the data is available and behaves predictably! 

Unfortunately, we know that real life is rarely that straightforward. We therefore looked at the equation y = a + bx in terms of cost behaviour, where:

  • ‘a’ is the fixed point or element that the rest of the data changes in relation to
  • ‘b’ is the variable amount per unit that will change in proportion to the number of time periods
  • ‘x’ is the forecast time period (in relation to the start of the data set)
  • ‘y’ is the forecast.
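
As a minimal sketch of how the four components fit together, the equation can be written as a small Python function. The function name and the figures used (a = 100, b = 20) are hypothetical and chosen only for illustration.

```python
def straight_line(a: float, b: float, x: float) -> float:
    """Return y = a + bx: the fixed point plus the variable amount times the period."""
    return a + b * x


# Hypothetical figures for illustration only: a = 100, b = 20.
for period in (1, 2, 3):
    print(period, straight_line(100, 20, period))
```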

Using this understanding, we applied the high low technique to some incomplete information and were able to forecast some future costs. If this doesn’t sound familiar, then it would be a good idea to read part one before continuing.

So, what happens when the data is not only incomplete but isn’t linear either?

Linear regression with incomplete & non-linear data

This is when we need to incorporate time series analysis more fully into our thinking. 

So far, we have examined how data has changed over a given time period, and the trend has been the actual data, which we’ve extended to forecast future figures.

The process of time series analysis, however, involves:

  • calculating moving averages to determine the underlying trend
  • calculating the average change over the period
  • adjusting for seasonal variation
  • and then forecasting. 
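
The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed method: the sales figures are hypothetical, a four-quarter centred moving average is assumed, and seasonal variation is treated as additive (actual minus trend).

```python
from statistics import mean

# Hypothetical quarterly sales volumes for two years (illustration only).
sales = [120, 90, 80, 130, 136, 106, 96, 146]


def centred_moving_averages(data, period=4):
    """Centred moving averages for an even-length period such as 4 quarters."""
    plain = [mean(data[i:i + period]) for i in range(len(data) - period + 1)]
    # Average each neighbouring pair so the result lines up with a real quarter.
    return [mean(plain[i:i + 2]) for i in range(len(plain) - 1)]


# Step 1: moving averages give the underlying trend.
trend = centred_moving_averages(sales)

# Step 2: average change in the trend per period.
average_change = (trend[-1] - trend[0]) / (len(trend) - 1)

# Step 3: seasonal variation = actual - trend (additive model assumed).
# The first centred average lines up with the third quarter in the data.
offset = 2
seasonal = [sales[i + offset] - t for i, t in enumerate(trend)]

# Step 4: extend the trend by one period and adjust for the season.
next_trend = trend[-1] + average_change
# The next period is a quarter 3, the same season as seasonal[0].
forecast = next_trend + seasonal[0]

print(trend, average_change, seasonal, forecast)
```

With real data there would usually be several years of seasonal variations to average for each quarter; the sketch uses a single observation per quarter to stay short.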

Again, if this doesn’t sound familiar, then Matthew Pickering’s article on trend analysis is worth reading before you go any further.

Depending on what information we have, we’ll need to rearrange the order of calculations so that we can determine the trend. Because the underlying trend will be a straight line, we will then be able to apply the equation.

Forecasting with linear regression

Let’s return to our imaginary role as the management accountant for a company that manufactures reusable bamboo products. Say we’re forecasting 2020’s sales of a crockery set that has been manufactured since January 2016.   

Over the years, the linear regression equation y = 1,725 + 450x has been established, where:

  • ‘y’ equals the sales trend
  • and ‘x’ equals the time period.

Time series analysis has been used to identify the quarterly seasonal variation in sales volumes:

Let’s think about the equation in relation to the first quarter of 2020. 

We know that ‘y’ is going to be the forecast. If the data varied in a linear way, then the trend could just be extended to give the forecast. However, the seasonal variations tell us the actual data is likely to be above the trend line for one half of the year and below for the other. 

So, when using linear regression in this way, we need to be clear that the ‘forecast’ is the extension of the trend line as opposed to the forecast sales figures.

The position of 1,725 in the formula tells us that it is ‘a’. Again, we need to be careful to recognise that, in this context, it’s the fixed point of the data set, which is different to it being a fixed element of a total cost, as it was when we applied the high low technique.

In this case, it will be the first figure of the data set, in other words quarter 1 of 2016, when the sales volume was 1,725 units.

We also know that 450 is the variable amount per unit or ‘b’. 

In the context of average annual change, this means that the underlying trend of the data is a 450-unit increase in the sales volume every quarter. This variable amount per unit, ‘b’, needs to be multiplied by ‘x’, the time period of the relevant quarter of 2020 counted from quarter 1 of 2016, in order to calculate the total variable element of the trend.

This is the same process as calculating the average change over the time period in time series analysis, and would be calculated as the difference between the first and last moving averages divided by the number of moving averages, less one.
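
Written as a formula, and assuming n stands for the number of moving averages (a symbol introduced here for illustration), the average change is:

```latex
b = \frac{\text{last moving average} - \text{first moving average}}{n - 1}
```

The divisor is n − 1 because n moving averages are separated by n − 1 steps.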

We add the value 450x to the fixed point of ‘a’, in this case 1,725, in order to extend the trend for 2020. 

Again, this is the same as extrapolating the trend by adding the average change to the last moving average, when using time series analysis to forecast.

Let’s put the information we know into a table:

Calculating the trend figures

In order to calculate ‘y’, the trend figures, we first need to work out how many time periods the actual data set covers.

In this case it’s 16 quarters, as the product has been manufactured since January 2016 so that’s four quarters a year for four years, to get to the end of 2019. This therefore makes ‘x’ 17 for quarter 1 2020 (16 + 1) and 18 (16 + 2) for quarter 2 because ‘x’ is the forecast time period in relation to the start of the data set.
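
The counting of time periods can be sketched in Python. The helper name `period_index` is hypothetical; it assumes quarterly periods numbered from 1 at quarter 1 of the start year, which matches the counting described above.

```python
def period_index(year: int, quarter: int, start_year: int = 2016) -> int:
    """Return 'x': the quarter's position counted from quarter 1 of start_year."""
    return (year - start_year) * 4 + quarter


print(period_index(2019, 4))  # 16, the last quarter of actual data
print(period_index(2020, 1))  # 17
print(period_index(2020, 2))  # 18
```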

As we said before, when using linear regression in this way, the ‘forecast’ is the extension of the trend line as opposed to the forecast sales figures. Therefore, we can now use the linear regression equation to forecast the trend ‘y’:
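
Working the equation through for each quarter of 2020 can be sketched in Python. The values of ‘a’, ‘b’ and ‘x’ come from the text above; the function and variable names are illustrative.

```python
A = 1_725  # 'a', the fixed point of the data set
B = 450    # 'b', the change in the trend per quarter


def trend(x: int) -> int:
    """Return the trend figure y = a + bx for time period x."""
    return A + B * x


# Quarters 1 to 4 of 2020 are time periods 17 to 20.
for quarter, x in enumerate(range(17, 21), start=1):
    print(f"Quarter {quarter} 2020: x = {x}, trend y = {trend(x):,}")

# Quarter 1 2020: x = 17, trend y = 9,375
# Quarter 2 2020: x = 18, trend y = 9,825
# Quarter 3 2020: x = 19, trend y = 10,275
# Quarter 4 2020: x = 20, trend y = 10,725
```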

Our use of the linear regression equation is now complete. However, we still need to forecast the sales volumes, and the seasonal variations tell us the actual data is likely to be above the trend line in quarters 1 and 4, and below it in quarters 2 and 3.

Therefore, the forecast sales volumes are:
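
The arithmetic of this final step can be sketched in Python. The seasonal variation figures below are hypothetical placeholders, chosen only to match the signs described above (positive in quarters 1 and 4, negative in quarters 2 and 3), and an additive model (forecast = trend + seasonal variation) is assumed. They are not the article's figures.

```python
A, B = 1_725, 450  # from the regression equation y = 1,725 + 450x

# HYPOTHETICAL seasonal variations by quarter (placeholders, not real data).
seasonal_variation = {1: 200, 2: -100, 3: -250, 4: 150}


def forecast_sales(quarter: int, x: int) -> int:
    """Forecast = trend (a + bx) plus the seasonal variation for that quarter."""
    trend = A + B * x
    return trend + seasonal_variation[quarter]


# Quarters 1 to 4 of 2020 are time periods 17 to 20.
for quarter, x in zip(range(1, 5), range(17, 21)):
    print(f"Quarter {quarter} 2020 forecast: {forecast_sales(quarter, x):,}")
```

Substituting the actual seasonal variations for the placeholders gives the forecast sales volumes.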

In summary

Using linear regression techniques successfully requires an element of thinking flexibly about the information you have and the information that’s required. 

  • If the information is complete, you may be able to use time series analysis.
  • If the information is incomplete, but varies in a linear manner, you might be able to use the high low technique.
  • But if the information fluctuates over time, you may need to apply the average annual change. 

* The formula of a straight line is y = mx + c; however, it can also be written as y = a + bx, and this is the form used by AAT. The component parts are the same: the ‘fixed’ point/element is represented by ‘a’ or ‘c’ and the ‘variable amount per unit’ by ‘b’ or ‘m’.


Gill Myers is a self-employed accounts consultant. She has taught AAT qualifications since 2005 and written numerous articles and e-learning resources.
