Unit 11: Scatterplots

The SAT tests three basic skills that you need to succeed with scatterplots.

  1. Understand how scatter plots work, including each individual point and the line of best fit
  2. Understand how to find a difference from the line of best fit, or use one variable to find the other variable using the line of best fit
  3. Construct your own scatter plot

Let’s start with the first one.  Scatterplots are a kind of graph used to represent numerical data.  They have an x-axis (the horizontal axis) and a y-axis (the vertical axis).  Typically, they are used to show how one variable is related to another; that relationship is called correlation.  You need a large number of measurements to make a scatterplot, and each individual point is a single measurement.  Here’s an example:

On this plot, we have average daily temperature on the x-axis, and number of visitors to the beach on the y-axis.  By plotting out how many people come to the beach at each temperature, we can clearly see a relationship in the data – as it gets warmer, more people come to the beach.  There is a positive correlation (as one goes up the other also goes up).  However, look closely at the graph when it is 87 degrees out.  Notice that only about 125 people came to the beach then, when we might have expected many more to come.  How many more?  That’s what a line of best fit can tell us.

The line of best fit is a statistical term for the straight line that comes closest to all the points; you can’t draw another straight line that fits them better.  Note that this may mean it doesn’t actually pass through any points at all.  Computers are very good at drawing lines of best fit; people just have to eyeball them.  Fortunately for us, the SAT will only ever give you scatter plots that are easy to eyeball, and will only ever ask questions where “close enough” will suffice.
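To make “comes closest to all the points” concrete, here is a minimal sketch.  It assumes the usual least-squares definition (the best line is the one with the smallest total of squared vertical distances to the points), which this unit does not spell out.  The temperatures and visitor counts are made up for illustration; they are not read from the graph.

```python
# Hypothetical data, invented for illustration (not read from the unit's graph).
temps = [78, 80, 83, 85, 88, 90, 93]           # x: average daily temperature (F)
visitors = [60, 110, 170, 230, 310, 370, 450]  # y: number of beach visitors


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def total_squared_miss(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


best_slope, best_intercept = least_squares_line(temps, visitors)
best_miss = total_squared_miss(temps, visitors, best_slope, best_intercept)

# A line with a slightly different slope misses the points by at least as much.
eyeballed_miss = total_squared_miss(temps, visitors, best_slope + 2, best_intercept)
print(best_miss <= eyeballed_miss)
```

You will never have to do this calculation on the test.  The point is only that “best fit” has a precise meaning, and that an eyeballed line is usually close to it.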

If you have a scatter plot question, and there is not a line of best fit given, draw one in.  It does not need to be perfect.  If the question does require precision, the test will provide you with a line of best fit already on the plot.  Note that the line of best fit may be a straight line or a curve, such as a parabola.

Here is the same graph, with a line of best fit on it.  Now suppose we were given this question:

At a certain beach, the number of visitors was recorded along with average daily temperature.  When it was 87°F, how many fewer visitors came than expected?

A) 20
B) 0
C) 160
D) 600

Strategy:

To answer this type of question, we look at the difference between the data point and the line of best fit.  It is purely a subtraction problem.

In our example, we need to subtract 125 (the actual number of people) from the height of the line of best fit at 87° (what we would have expected based on the relationship in the data).

Reading across to the y-axis, the line is at about 275.  Subtracting 125 gives us 150; of our answer choices, only C) 160 is close enough to be right.  Note that if you are given a graph like this with fairly low precision, there will be only one possible answer.  They won’t make you choose between, say, 152 and 154 for a problem like this.
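The arithmetic in this example can be sketched in a few lines.  The two readings (275 and 125) and the answer choices come from the example itself; picking the choice nearest to our estimate is just one way to express “close enough.”

```python
# Values read by eye from the graph, as in the worked example.
expected_visitors = 275   # height of the line of best fit at 87 degrees
actual_visitors = 125     # height of the data point at 87 degrees

# How many fewer visitors came than the line of best fit predicts.
shortfall = expected_visitors - actual_visitors

# Answer choices from the question.
choices = {"A": 20, "B": 0, "C": 160, "D": 600}

# Because the readings are approximate, pick the choice nearest the estimate.
best_letter = min(choices, key=lambda letter: abs(choices[letter] - shortfall))
print(shortfall, best_letter)
```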

You might also have to work in the other direction, starting from the y-axis and going across to the line of best fit.  Suppose that, instead of the previous question, they had asked us:

At a certain beach, the number of visitors was recorded along with average daily temperature.  One day the temperature wasn’t recorded, but 375 people came.  How hot would we expect it to have been that day?

A) 84°
B) 86°
C) 90°
D) 92°

To solve, we start at the y-axis value we are given, go to the line of best fit, and then drop down to the temperature on the x-axis.  The correct answer is C).

Strategy:

If they give you a value for one of the variables, you will almost always have to report what we would expect the other variable to be.  Just find the value you are given on the correct axis, draw a straight line to the line of best fit, and then go straight down or across to the other axis to read off the answer.
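Here is a rough sketch of the same idea with an equation instead of a pencil.  It assumes the line of best fit passes exactly through the two readings used in this unit (about 275 visitors at 87°, and 375 visitors at about 90°).  Those readings were eyeballed, so the slope and intercept below are approximations, not the actual equation of the line on the graph.

```python
# Two readings taken from the worked examples in this unit. Treating them as
# exact points on the line of best fit is an assumption; they were eyeballed.
x1, y1 = 87, 275   # at 87 degrees the line is at about 275 visitors
x2, y2 = 90, 375   # 375 visitors corresponds to about 90 degrees

slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1


def expected_visitors(temp):
    """Start on the x-axis, go up to the line, read across to the y-axis."""
    return slope * temp + intercept


def expected_temp(visitors):
    """Start on the y-axis, go across to the line, read down to the x-axis."""
    return (visitors - intercept) / slope


print(round(expected_visitors(87)), round(expected_temp(375)))
```

Both functions use the same line; they just start from different axes, which is exactly what the two beach questions ask you to do.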

Finally, two other skills you may need are the ability to create your own scatterplot from a set of values, and the ability to recognize that a scatter plot’s line of best fit may not be linear.  Here, you can try drawing your own scatterplot and line of best fit on the graph below, using the values in the table.

Size    Age
50      20
80      25
88      25
40      16
90      24
100     28
44      18
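If you want to check your hand-drawn plot, the sketch below prints a rough text version.  It reads each row of the table as a Size value followed by an Age value (for example, 50 and 20).  Putting Age on the x-axis and Size on the y-axis is an assumption, since the table does not say which variable goes on which axis.

```python
# Rows of the table as (size, age) pairs.
rows = [(50, 20), (80, 25), (88, 25), (40, 16), (90, 24), (100, 28), (44, 18)]


def text_scatter(points, y_step=5):
    """Return a rough text scatterplot: one row per y band, one column per x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    # Round the y range outward to multiples of y_step.
    y_low = (min(ys) // y_step) * y_step
    y_high = -(-max(ys) // y_step) * y_step
    lines = []
    for band in range(y_high, y_low - 1, -y_step):
        cells = []
        for x in range(x_min, x_max + 1):
            # Mark the cell if any point has this x and a y inside this band.
            hit = any(px == x and band <= py < band + y_step for px, py in points)
            cells.append("*" if hit else ".")
        lines.append(f"{band:>4} | " + "".join(cells))
    lines.append("     +-" + "-" * (x_max - x_min + 1))
    lines.append(f"       {x_min} to {x_max} (age)")
    return "\n".join(lines)


# Age goes on the x-axis and Size on the y-axis (an assumed choice).
plot = text_scatter([(age, size) for size, age in rows])
print(plot)
```

Each `*` is one row of the table.  The points rise from the lower left to the upper right, so a line of best fit drawn through them should slope upward.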