Thursday, 11 April 2013

Statistics: Making sense of data: scatter plots & correlations

Scatter plot


Statisticians and quality control technicians gather data to determine correlations (relationships) between events.  Scatter plots will often show at a glance whether a relationship exists between two sets of data (in this lecture, quantitative variables).
Let's decide if studying longer will affect Regents grades based upon a specific set of data.  Given the data below, a scatter plot has been prepared to represent the data.  Remember when making a scatter plot, do NOT connect the dots.
Study Hours    Regents Score
3              80
5              90
2              75
6              80
7              90
1              50
2              65
7              85
1              40
7              100
Notice:  Certain values may have more than one result,
such as (7,90) and (7,85) and (7,100).
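
To see which study times have more than one score, the pairs can be grouped by hours. This is only a small sketch: the data are copied from the table above, and the names `data` and `scores_by_hours` are made up for illustration.

```python
from collections import defaultdict

# (study hours, Regents score) pairs copied from the table above
data = [(3, 80), (5, 90), (2, 75), (6, 80), (7, 90),
        (1, 50), (2, 65), (7, 85), (1, 40), (7, 100)]

# Collect every score recorded for each number of study hours
scores_by_hours = defaultdict(list)
for hours, score in data:
    scores_by_hours[hours].append(score)

# Print the scores for each study time, smallest number of hours first
for hours in sorted(scores_by_hours):
    print(hours, scores_by_hours[hours])
```

Each repeated x-value simply becomes several dots stacked above the same spot on the horizontal axis.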

The data displayed on the graph resemble a line rising from left to right.  Since the slope of the line is positive, there is a positive correlation between the two sets of data.  This means that, according to this set of data, the longer I study, the better the grade I will get on my Regents examination.
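
One way to check the sign of the slope without drawing anything is to fit a line to the points. The sketch below assumes an ordinary least-squares line, which is a common choice but not one the lesson itself names; the function name `least_squares_slope` is hypothetical.

```python
# Study hours and Regents scores copied from the table above
hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, divided by sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

slope = least_squares_slope(hours, scores)
print(f"slope = {slope:.2f} points per extra hour of study")
```

For this data the slope comes out at roughly 6.3, a positive number, which agrees with the rising pattern in the plot.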
Note:  Just because this set of data showed a positive correlation does not mean that the relationship is positive for all sets of data concerning study time and Regents scores.  There may be sets of data that show that there is NOT a positive correlation between hours studying and better Regents scores.
It all depends on the data being examined.

If the slope of the line had been negative (falling from left to right), a negative correlation would exist.  Under a negative correlation, the longer I study, the worse the grade I would get on my Regents examination.  YEEK!!

If the plot on the graph is scattered in such a way that it does not approximate a line (it does not appear to rise or fall), there is no correlation between the sets of data.  No correlation means that the data just don't show whether studying longer has any effect on Regents examination scores.
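
The three cases (positive, negative, none) can also be told apart with a number. The sketch below uses the Pearson correlation coefficient r, which this lesson does not introduce, and a cutoff of 0.3 that is purely an assumption made for illustration. The names `pearson_r` and `describe_correlation` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_correlation(xs, ys, cutoff=0.3):
    """Label the relationship; the cutoff is an arbitrary illustrative choice."""
    r = pearson_r(xs, ys)
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "none"

# Study data from the table earlier in this post
hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]
print(describe_correlation(hours, scores))
```

A value of r near +1 or -1 means the points hug a rising or falling line, and a value near 0 means they do not approximate a line at all.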

Check out these graphs for visual interpretations of types of correlations:

The points are clustered so as to resemble a rising straight line with a positive slope.

While the points "tend" to be rising, it is not a clearly positive relationship, since the points are not clustered tightly enough to show a clear straight line.


The points are clustered so as to resemble a falling straight line with a negative slope.

While the points "tend" to be falling, it is not a clearly negative relationship, since the points are not clustered tightly enough to show a clear straight line.


There is no way of determining from these points whether the pattern is rising or falling.  There is no evidence of a straight line.


Warning!! 
Correlation does not necessarily mean causation.
 Just because there is a strong correlation between two sets of data does not necessarily mean that one set of data is causing the effect that is occurring in the other set of data.
 
During the months of February and March, the weekly number of jars of strawberry jam sold at a local market in New York was recorded. For the same time frame, the number of copies of a popular classical music CD sold in Florida was recorded.  The data were examined and plotted.
     From looking at the graph, it can be seen that there is a high positive correlation between these two sets of data.

So, this must mean that the number of jars of strawberry jam sold in New York was causing an increase in the number of classical music CDs sold in Florida.  Of course this is not true!
     Always be careful what you infer from your statistical analyses.  Be sure the relationship makes sense.  Also keep in mind that other factors may be involved in a cause-effect relationship.
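
The strength of the jam and CD relationship can be checked numerically. The sketch below uses the weekly figures recorded for this example and the Pearson correlation coefficient r, which is an assumed measure that the lesson does not name; `pearson_r` is a hypothetical helper.

```python
import math

# Weekly sales recorded for this example
jam_jars_ny = [5, 7, 9, 10, 11, 11, 12]     # jars of strawberry jam, New York
cds_florida = [25, 30, 35, 42, 48, 52, 56]  # classical music CDs, Florida

def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(jam_jars_ny, cds_florida)
print(f"r = {r:.2f}")
```

The result is close to +1, a very strong positive correlation, yet nothing in the calculation says anything about why the two numbers rose together. The arithmetic measures association only.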













resource: http://www.regentsprep.org/Regents/math/ALGEBRA/AD4/scatter.htm
Weekly Data Collection
Jars of strawberry jam sold in New York    CDs sold in Florida
5                                          25
7                                          30
9                                          35
10                                         42
11                                         48
11                                         52
12                                         56


