# Correlation and Regression: Key Points, Notes and Questions

## What is a Correlation?

Correlation is a statistical concept that measures the degree to which two or more variables are related to each other. It quantifies the strength and direction of the linear relationship between variables. In the A-level syllabus, correlation is used to measure the strength of the linear association between two variables. *If a linear relationship exists, you will be required to find the line of best fit through the data and use it to predict the value of the dependent variable.*

It is measured by the correlation coefficient, r, which takes values from -1 to 1, where:

A correlation coefficient of 1 indicates a perfect positive correlation, meaning that the points lie exactly on a straight line with positive gradient, so as one variable increases, the other increases at a constant rate.

A correlation coefficient of -1 indicates a perfect negative correlation, meaning that the points lie exactly on a straight line with negative gradient, so as one variable increases, the other decreases at a constant rate.

A correlation coefficient of 0 indicates no linear correlation, meaning that there is no linear relationship between the variables.
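The two extreme cases can be checked with a short sketch. The helper `pearson_r` and the data below are made up for illustration; the function uses the standard product moment definition rather than any particular calculator routine.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])   # points on an increasing line
r_neg = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a decreasing line
print(r_pos, r_neg)
```

Any data lying exactly on a rising line gives r = 1 and any data lying exactly on a falling line gives r = -1, regardless of how steep the line is.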

Correlation is an important tool for analysing data sets and drawing inferences from them. You might recall a sentence that goes along the lines of “**correlation does not imply causation.**” This is meant to explain that even if two variables are correlated, it does not necessarily mean that one variable causes the other.

Other important notes on correlation:

Contrary to what the value might suggest, zero correlation **does not necessarily imply “no relationship”,** but rather “no linear relationship”. A value of r close to zero indicates only a lack of linear correlation; it does not imply that the variables are independent.
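A quick sketch of this point, using made-up data in which y is exactly the square of x. The variables are completely dependent, yet the correlation coefficient is zero because the relationship is not linear. The helper `pearson_r` is an illustrative name, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient, computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / sqrt(sxx * syy)

# y depends entirely on x, but through a curve that is symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r = pearson_r(xs, ys)
print(r)  # prints 0.0
```

This is why a scatter diagram should be inspected alongside the value of r.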

#### Correlation Coefficient Formula
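
For n pairs of values $(x_i, y_i)$, the product moment correlation coefficient is usually written in the standard textbook form below. It is given here on the assumption that this is the formula the heading refers to; $\bar{x}$ and $\bar{y}$ denote the sample means.

$$
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
  = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right)\left(\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right)}}
$$

The second form is often more convenient when only the summary totals of the data are given.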

## Interpolation and Extrapolation

The least squares regression line of y on x can be used to predict values of y for given values of x. When using a regression line, it is important to realise that values of y can only be predicted with a degree of confidence for values of x within the range of the data. This is “interpolation”.

Outside the range of the data, extreme care must be exercised, since attempting to predict values of y for which no data has been collected may lead to incorrect conclusions. This process of predicting values outside the range of the data collected is known as extrapolation and should, in general, be avoided.

### Linear Regression

Regression is a statistical method used for modeling and analyzing the relationship between a dependent variable (also known as the response variable) and one or more independent variables (predictor variables or covariates).

In the A-level Math syllabus, linear regression is used to find the line of best fit.
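As a minimal sketch of how a line of best fit can be obtained, the function below computes the least squares regression line of y on x in the form y = mx + c. The name `fit_line` and the data are hypothetical and chosen only for illustration; in an exam, the graphing calculator would normally do this step.

```python
def fit_line(xs, ys):
    """Least squares regression line of y on x; returns (m, c) for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, c

# Hypothetical data, made up for this illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, c = fit_line(xs, ys)
print(round(m, 4), round(c, 4))  # m = 0.6, c = 2.2
```

A useful check is that the fitted line always passes through the point of means, here (3, 4).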

**Key notes when using regression lines to make predictions:**

#### Interpolation

If the value of x or y lies within the given range of the data, the prediction can be made with a certain degree of confidence (it is likely to be accurate).

#### Extrapolation

If the value of x or y lies outside the given range of the data, the prediction may be inaccurate and extreme care must be exercised. Such predictions should generally be avoided.
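The distinction can be sketched as a simple range check before predicting. The line y = 0.6x + 2.2 and the data range of 1 to 5 are hypothetical values chosen for illustration, and `predict` is an illustrative helper rather than a standard function.

```python
def predict(m, c, x, x_min, x_max):
    """Return (estimate, kind), where kind says whether x is inside the data range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return m * x + c, kind

# Hypothetical regression line fitted to x values ranging from 1 to 5
m, c = 0.6, 2.2
inside = predict(m, c, 3, 1, 5)    # x = 3 lies within the data range
outside = predict(m, c, 12, 1, 5)  # x = 12 lies well outside the data range
print(inside[1], outside[1])
```

The arithmetic is identical in both cases. Only the reliability of the estimate differs, because nothing in the data confirms that the linear trend continues beyond the observed range.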

## Questions on Correlation from Past Year Papers

### 1)

(a) Explain why it is advisable to plot a scatter diagram before interpreting a correlation coefficient calculated for a sample drawn from a bivariate distribution.

(b) Sketch two scatter diagrams indicating the following:

(i) two variables having a strong, negative linear correlation

(ii) two variables having a weak, positive linear correlation

*JJC Prelim 2009*

### 2)

(a) Eight pairs of values of variables x and y are measured. Draw a sketch of a possible scatter diagram of the data for each of the following cases:

(i) the product moment correlation coefficient is approximately zero,

(ii) the product moment correlation coefficient is approximately -0.8.

(b) The monthly earnings, y thousand dollars, of 7 workers of different ages, x years, in a particular company are given in the table above.

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator.

(ii) Find the product moment correlation coefficient.

(iii) Find the equation of the regression line of y on x in the form of y = mx + c, with the values of m and c corrected to 4 decimal places.

(iv) Calculate an estimate of the monthly earnings of a 40-year-old worker. State why you would expect this to be a reliable estimate.

(v) All workers are given an increase of N thousand dollars per month. Without any further calculations, state any change you would expect in the values of your constants m and c found in part (iii).
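
The effect of adding a constant to every y value can be explored numerically with the sketch below. The ages, earnings and the value of N are entirely hypothetical (they are not the data from the question's table), and `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Least squares regression line of y on x; returns (m, c) for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical ages (years) and monthly earnings (thousand dollars)
ages = [25, 30, 35, 40, 45, 50, 55]
earnings = [2.1, 2.8, 3.0, 3.9, 4.1, 4.8, 5.2]
N = 0.5  # hypothetical increase given to every worker

m1, c1 = fit_line(ages, earnings)
m2, c2 = fit_line(ages, [y + N for y in earnings])
print("change in m:", m2 - m1)
print("change in c:", c2 - c1)
```

Up to rounding error, the gradient is unchanged and the intercept rises by N, because shifting every point upwards by the same amount moves the whole line up without tilting it.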
