Covariance & Correlation

Introduction :

Covariance and correlation are two mathematical concepts that are frequently used in statistics and probability. Both have a common goal: to describe the linear relationship between two variables or data samples.

Covariance :

Covariance describes how two random variables or samples change together. In other words, it is a measure of how much two random variables fluctuate together.

Covariance is closely related to correlation, but it denotes only the direction of the linear relationship between two variables, not its strength. From the direction we can tell whether the selected variables tend to increase together or move in opposite directions. Covariance can take any value from -∞ to +∞: a negative value indicates a negative relationship, whereas a positive value indicates a positive relationship between the variables.

So, there are three main cases of covariance:

  1. Positive covariance: indicates that two variables tend to move in the same direction.
  2. Negative covariance: indicates that two variables tend to move in opposite directions.
  3. Zero covariance: indicates that the variables have no linear relationship; they do not vary together linearly.
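The three cases can be illustrated with a small sketch in plain Python. The helper name `population_covariance` and all of the numbers are made up for illustration; they are not taken from the article's own table.

```python
def population_covariance(xs, ys):
    """Average product of the deviations from each mean (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]      # increases as x increases
falling = [10, 8, 6, 4, 2]     # decreases as x increases
symmetric = [4, 1, 0, 1, 4]    # depends on x, but not linearly

cov_rising = population_covariance(x, rising)        # positive
cov_falling = population_covariance(x, falling)      # negative
cov_symmetric = population_covariance(x, symmetric)  # zero

print(cov_rising, cov_falling, cov_symmetric)
```

The last series shows why zero covariance only rules out a linear relationship: the values clearly depend on x, yet the positive and negative products cancel out.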

Covariance can be calculated in two ways:

  1. Population covariance formula: Cov(X, Y) = [ Σ (Xi - x̄)(Yi - ȳ) ] / n
  2. Sample covariance formula: Cov(X, Y) = [ Σ (Xi - x̄)(Yi - ȳ) ] / (n - 1)

where,

  • Xi — the values of the X-variable.
  • Yi — the values of the Y-variable.
  • x̄ — mean of variable X.
  • ȳ — mean of variable Y.
  • n — number of data values.
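As a minimal sketch, the population and sample versions differ only in the divisor (n versus n - 1). The function name `covariance`, its `sample` flag, and the data below are assumptions made for illustration.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equal-length sequences.

    sample=False divides by n (population covariance);
    sample=True divides by n - 1 (sample covariance).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical values, not the article's table.
customers = [10, 20, 30, 40]
sales = [150, 260, 330, 460]

pop_cov = covariance(customers, sales)                # divides by n
samp_cov = covariance(customers, sales, sample=True)  # divides by n - 1
print(pop_cov, samp_cov)
```

For the same data the sample covariance is always larger in magnitude than the population covariance, by a factor of n / (n - 1).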

Consider a table with three columns: Salesman, Number of Customers, and Net Sales.

Here, Number of Customers is the X variable and Net Sales is the Y variable. Calculating the covariance in Excel using the Data Analysis tool gives 323684.5561 for this table. We can also calculate covariance in Excel with =COVARIANCE.P() for the population formula or =COVARIANCE.S() for the sample formula.

Correlation:

Correlation is a statistical measure that indicates how strongly two variables are linearly related. In other words, it indicates the extent to which two or more variables fluctuate together.

It provides both the direction and the strength of the relationship between variables. The concept can also be applied to more than two variables by examining them pair by pair. A major advantage of the concept is that correlation measures exist for any combination of data types: continuous with continuous, categorical with categorical, and continuous with categorical.

For example, height and weight are related: taller people tend to weigh more than shorter people.
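One common way to handle more than two variables is to compute a correlation for every pair. The sketch below does this with the Pearson coefficient, which is covered later in this article. The names `pearson` and `pairwise_correlations` and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(sq_dev_x * sq_dev_y)


def pairwise_correlations(columns):
    """Map each unordered pair of column names to its correlation."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Made-up data for illustration only.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [52, 58, 69, 77, 88],
    "age_years": [34, 21, 45, 29, 38],
}

matrix = pairwise_correlations(data)
for pair, r in matrix.items():
    print(pair, round(r, 3))
```

Three variables give three pairs; in general, k variables give k(k - 1) / 2 pairs.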

Pearson Correlation Coefficient

The Pearson correlation coefficient is used frequently when analyzing data. It is denoted by ρ and is used to measure the linear relationship between two continuous variables X and Y. The Pearson correlation returns a value between -1 and +1.

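Let's look at the different types of correlation: positive (a value above 0), negative (a value below 0), and no linear correlation (a value of 0).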

Correlation can be calculated as:

  ρ(X, Y) = COV(X, Y) / (σx σy)

Where,

  • COV(X,Y) — Covariance of variable X and Variable Y.
  • σx and σy — are Standard deviations of X and Y.
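The calculation can be sketched directly from these definitions: compute the covariance, compute each standard deviation, and divide. The function name `pearson_correlation` and the data are assumptions for illustration. Population formulas (dividing by n) are used throughout; using n - 1 everywhere would give the same ratio.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation as COV(X, Y) / (sigma_x * sigma_y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population covariance of X and Y.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of X and Y.
    sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    return cov_xy / (sigma_x * sigma_y)


# Hypothetical values, not the article's table.
customers = [10, 20, 30, 40]
sales = [150, 260, 330, 460]

r = pearson_correlation(customers, sales)
print(round(r, 4))
```

Dividing by the standard deviations removes the units of X and Y, which is why the result always falls between -1 and +1.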

Consider the same table with three columns: Salesman, Number of Customers, and Net Sales.

Here, Number of Customers is the X variable and Net Sales is the Y variable. Calculating the correlation in Excel using the Data Analysis tool gives 0.955980727 for this data. Number of Customers and Net Sales have a very strong positive correlation, since the value is close to +1. We can also calculate correlation in Excel with =CORREL().

Overview :

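This article covers the basics of covariance and correlation, with a small example of each.

Correlation and covariance are closely related, yet they differ in an important way: covariance indicates only the direction of a linear relationship and can take any value, whereas correlation indicates both direction and strength on a fixed scale from -1 to +1.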
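One concrete difference is that covariance depends on the units of measurement while correlation does not. The sketch below, with made-up data and hypothetical helper names, expresses the same heights in metres and in centimetres and compares both measures.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation; covariance(v, v) is the variance of v."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: the same heights in two different units.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [52, 58, 69, 77, 88]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)     # covariance grows by a factor of 100
print(corr_m, corr_cm)   # correlation is unchanged, up to rounding
```

This is why a raw covariance value is hard to interpret on its own, while a correlation close to +1 or -1 can be read directly as a strong linear relationship.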
