Combining multiple data sets to improve the accuracy of your results, or to expand the number of things you can compare. The data sets must have at least one point in common, for example a date range. Also known as aggregating data.
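As a rough illustration, the sketch below combines two small data sets on a shared date. The data, the variable names and the combine_on_date helper are all made up for this example, and keeping only the dates present in both sets is one possible choice rather than the only one.

```python
# Hypothetical data: daily sales and daily temperature, both keyed by date.
sales = {"2024-01-01": 120, "2024-01-02": 95, "2024-01-03": 140}
temperature_c = {"2024-01-02": 4.5, "2024-01-03": 7.0, "2024-01-04": 6.0}


def combine_on_date(left, right):
    """Return (date, left value, right value) for dates found in both sets."""
    common_dates = sorted(left.keys() & right.keys())
    return [(date, left[date], right[date]) for date in common_dates]


combined = combine_on_date(sales, temperature_c)
for row in combined:
    print(row)
```

Dates that appear in only one data set are dropped here, because there is nothing to compare them against.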
The technique used to show if and how strongly variables are related. For example, taller people tend to be heavier than shorter people, but there will always be exceptions, such as a shorter person who is very overweight.
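One common way to put a number on this is the Pearson correlation coefficient, which runs from -1 to +1. The sketch below is a minimal version written out by hand; the heights and weights are invented, and the third person is deliberately an exception to the trend.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up sample: the 170 cm person is heavier than the trend suggests.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [55, 62, 95, 80, 88]

r = pearson(heights_cm, weights_kg)
print(round(r, 2))
```

The result is positive but below 1, which matches the idea of a general trend with exceptions.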
A data set that contains values that are incorrect.
For example, a user has entered their date of birth incorrectly as 1902 instead of 2002.
Such values can skew the data set and may make any analysis based on it invalid.
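A simple range check can flag values like this for review. The sketch below is only an illustration: the records, the find_suspect_birth_years helper and the year limits are assumptions chosen for the example, and a flagged value still needs a person to confirm whether it is really wrong.

```python
def find_suspect_birth_years(records, earliest=1910, latest=2024):
    """Return records whose birth year falls outside the expected range.

    The default limits are assumptions for this example only.
    """
    return [r for r in records if not (earliest <= r["birth_year"] <= latest)]


# Hypothetical user records.
users = [
    {"name": "Asha", "birth_year": 2002},
    {"name": "Ben", "birth_year": 1902},  # likely a typo for 2002
    {"name": "Cara", "birth_year": 1987},
]

suspect = find_suspect_birth_years(users)
for record in suspect:
    print(record["name"], record["birth_year"])
```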
Using ranks to describe the relationship between data. Each set of values is sorted from the smallest to the largest and given a rank, and the ranks are compared to produce a correlation coefficient between -1 and +1. A separate figure, known as a p-value, is used to determine if the relationship is significant.
Data that increases whilst any data related to it also increases gives a positive coefficient; data that shrinks while the other related data grows gives a negative coefficient; and data that has no relationship at all gives a coefficient of 0.
P-values below 0.05 are conventionally considered significant, meaning the relationship is unlikely to be due to chance alone. The strength of the relationship is shown by how close the coefficient is to -1 or +1, not by the p-value.
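The ranking step and the resulting value between -1 and +1 can be sketched as follows. This is a minimal hand-written version of a rank-based correlation (often called Spearman's); the function names and sample data are made up, tied values are assumed to share the average of their ranks, and the p-value is not calculated here. A statistics library such as SciPy, through scipy.stats.spearmanr, reports both figures.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (1) to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def rank_correlation(xs, ys):
    """Correlation of the ranks of xs and ys, between -1 and +1."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


# Made-up data: one series rises with hours, the other falls.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 8, 16, 32]
falling = [50, 40, 30, 20, 10]

print(rank_correlation(hours, rising))   # close to +1
print(rank_correlation(hours, falling))  # close to -1
```

Because only the ranks are compared, the rising series scores close to +1 even though it does not grow in a straight line.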
The process of converting any data that is recorded in a different unit from the rest of your data, so that every value uses the same unit.
When aggregating data, some records may have different units of measurement even though they represent the same thing.
For example, temperature measured in both Celsius and Fahrenheit.
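A minimal sketch of that conversion is shown below. It assumes each reading is stored with a unit label of "C" or "F", which is a made-up layout for this example, and it converts everything to Celsius.

```python
def to_celsius(value, unit):
    """Convert a temperature reading to Celsius based on its unit label."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"Unknown temperature unit: {unit!r}")


# Hypothetical readings gathered from two sources with different units.
readings = [(20.0, "C"), (68.0, "F"), (32.0, "F")]

normalised = [round(to_celsius(value, unit), 1) for value, unit in readings]
print(normalised)  # [20.0, 20.0, 0.0]
```

Raising an error for an unknown unit is a deliberate choice here, so that an unexpected label is noticed rather than silently passed through.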