Scatterplots: Complete Guide to Correlation, Lines of Best Fit & Predictions with 8 Examples

Master SAT scatterplot questions with this comprehensive guide. Learn positive/negative correlation, interpret slope and y-intercept, make predictions, and avoid causation errors with 8 fully worked examples.

SAT Math – Problem Solving & Data Analysis

Scatterplots

Analyzing relationships, correlation, and lines of best fit

Scatterplots are the visual language of relationships between variables. On the SAT, you'll analyze these coordinate graphs to identify correlations, interpret lines of best fit, make predictions, and distinguish between correlation and causation—skills that form the foundation of data science and statistical reasoning.

Success requires understanding positive and negative correlation, reading trend lines, interpreting slope and y-intercept in context, identifying outliers, and using linear models to make predictions. These aren't abstract concepts—they're the tools researchers use to discover relationships, from medical studies linking variables to economic analyses predicting trends.

Understanding Scatterplots

What is a Scatterplot?

A scatterplot displays the relationship between two quantitative variables by plotting ordered pairs \((x, y)\) as points on a coordinate plane.

Purpose: Reveal patterns, trends, and correlations between variables
Each point: Represents one observation with an x-value and y-value
Pattern: Shows if and how the variables are related

Types of Correlation

Correlation describes the direction and strength of the linear relationship between variables.

Positive correlation: As x increases, y tends to increase (upward trend)
Negative correlation: As x increases, y tends to decrease (downward trend)
No correlation: No clear linear pattern between variables
Strength: How closely points cluster around a line (strong vs. weak)
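
Direction and strength can also be summarized numerically. The sketch below computes the Pearson correlation coefficient r, a standard statistic that this guide does not otherwise use, and labels the result. The data and the cutoffs 0.7 and 0.2 are illustrative assumptions, not official SAT thresholds.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, strong_cutoff=0.7, none_cutoff=0.2):
    """Label r using illustrative cutoffs (assumptions, not SAT rules)."""
    if abs(r) < none_cutoff:
        return "no clear linear correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

# Made-up data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 62, 60, 71, 75, 80]
r = pearson_r(hours, scores)
print(round(r, 2), describe(r))
```

On the SAT you judge direction and strength by eye; a calculation like this is only a way to check that intuition against numbers.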

Line of Best Fit (Trend Line)

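The line of best fit is the straight line that best represents the trend in the data. It is chosen to keep the overall distance between the line and the data points as small as possible (in standard least-squares regression, the sum of the squared vertical distances).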

Equation: \(y = mx + b\) (linear equation)
Slope (m): Rate of change in y for each unit increase in x
Y-intercept (b): Predicted y-value when x = 0
Use: Make predictions for values within or near the data range
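
The SAT gives you the line of best fit, so you never compute it by hand. For readers curious about where m and b come from, here is a minimal sketch of an ordinary least-squares fit. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of deviation products divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Made-up observations that roughly follow an upward trend.
xs = [0, 1, 2, 3, 4]
ys = [10, 14, 15, 19, 22]
m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```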

Outliers

An outlier is a data point that falls far from the overall pattern of the scatterplot.

Impact: Can pull the line of best fit toward itself, changing its slope and intercept
Interpretation: May represent measurement error, unusual case, or important exception
SAT tip: Questions often ask you to identify or interpret outliers

Essential Concepts & Formulas

Line of Best Fit Equation

\(y = mx + b\)

m (slope): Change in y per unit change in x

b (y-intercept): Value of y when x = 0

Context matters: Always interpret slope and intercept in terms of the variables

Interpreting Slope

If \(y = 3x + 10\) models temperature (°F) over time (hours):

Slope = 3: Temperature increases 3°F per hour

Positive slope: Both variables increase together

Negative slope: As one increases, the other decreases

Making Predictions

To predict y for a given x:

1. Substitute the x-value into the equation

2. Calculate the corresponding y-value

⚠️ Predictions are most reliable within the data range (interpolation)
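
The two steps, together with the range check, can be sketched as follows. The model is the temperature line \(y = 3x + 10\) from the section above; the observed range of 0 to 8 hours and the helper name `predict` are assumptions made for illustration.

```python
def predict(m, b, x, x_min, x_max):
    """Return (predicted y, whether x lies inside the observed data range)."""
    y = m * x + b
    is_interpolation = x_min <= x <= x_max
    return y, is_interpolation

# Temperature model from the text: y = 3x + 10 (degrees F after x hours).
# The observed range of 0 to 8 hours is an assumption for illustration.
print(predict(3, 10, 2, 0, 8))    # inside the assumed range
print(predict(3, 10, 24, 0, 8))   # far outside it: treat with caution
```

The second call still returns a number, which is the point: the equation will always give an answer, so it is up to you to decide whether the input is close enough to the data to trust it.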

Correlation vs. Causation

Correlation: Two variables are related (move together)

Causation: One variable directly causes changes in the other

⚠️ Correlation does NOT prove causation! Both may be influenced by a third variable.

Common Pitfalls & Expert Tips

❌ Confusing correlation with causation

Just because ice cream sales and drownings both increase in summer doesn't mean ice cream causes drowning! Both are caused by hot weather.

❌ Misinterpreting slope sign

Negative slope means an inverse relationship—as x increases, y decreases. Don't confuse the direction!

❌ Extrapolating too far beyond data

Predictions far outside the data range (extrapolation) are unreliable. Relationships may not continue in the same pattern.

❌ Forgetting to check axis labels

Always verify which variable is on which axis. Swapping x and y completely changes interpretation!

✓ Expert Tip: Look at the overall pattern

Don't focus on individual points. Step back and see the general trend—is it upward, downward, or scattered?

✓ Expert Tip: Context is everything

Always interpret slope and intercept using the variable names. "The slope is 5" is incomplete; "Temperature increases 5°F per hour" is correct.

✓ Expert Tip: Outliers tell stories

When you spot an outlier, think about what it might represent—measurement error, exceptional case, or data entry mistake.

Fully Worked SAT-Style Examples

Example 1: Identifying Correlation Type

A scatterplot shows the relationship between hours studied (x-axis) and test scores (y-axis). The points generally rise from left to right, clustering loosely around an upward-sloping line. Which best describes the relationship?

A) Strong negative correlation

B) Weak negative correlation

C) Weak positive correlation

D) No correlation

Solution:

Step 1: Identify direction

Points rise from left to right = upward trend

As hours studied (x) increases, test scores (y) increase

This is POSITIVE correlation

Step 2: Assess strength

Points cluster "loosely" around the line

Not tightly packed → weak correlation

But still showing a clear pattern → not "no correlation"

Correlation Guide:

Positive: Upward slope (both increase)

Negative: Downward slope (one increases, other decreases)

Strong: Points tightly clustered around line

Weak: Points loosely scattered around line

Answer: C) Weak positive correlation

Example 2: Interpreting Slope

A scatterplot shows the relationship between years of experience (x) and annual salary in thousands (y). The line of best fit is \(y = 5x + 40\). What does the slope represent?

Solution:

Identify the slope:

In \(y = 5x + 40\), the slope \(m = 5\)

Interpret in context:

Slope = change in y per unit change in x

x = years of experience

y = salary in thousands of dollars

Interpretation: For each additional year of experience, the predicted salary increases by $5,000

Answer: The slope of 5 means the predicted salary increases by $5,000 per year of experience
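
One way to see why the slope is a per-year change is to compare the model's predictions for \(x\) and \(x + 1\) years of experience:

\[
\bigl(5(x + 1) + 40\bigr) - \bigl(5x + 40\bigr) = 5
\]

The difference is 5 (thousand dollars) no matter which \(x\) you start from, which is what a constant rate of change means for a linear model.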

Example 3: Interpreting Y-Intercept

Using the same equation from Example 2: \(y = 5x + 40\), where x is years of experience and y is salary in thousands. What does the y-intercept represent?

Solution:

Identify the y-intercept:

In \(y = 5x + 40\), the y-intercept \(b = 40\)

This is the y-value when x = 0

Interpret in context:

When x = 0 (zero years of experience)

y = 40 (salary in thousands)

Interpretation: The predicted salary with zero years of experience (the starting salary) is $40,000

Y-intercept meaning:

Always represents the predicted y-value when x = 0

Context determines if this makes practical sense

Sometimes x = 0 isn't realistic (e.g., age = 0 in a study of adults)

Answer: The y-intercept of 40 means the predicted starting salary is $40,000

Example 4: Making Predictions

A scatterplot relates study time (x, in hours) to quiz scores (y). The line of best fit is \(y = 8x + 50\). According to this model, what score would be predicted for a student who studies 4 hours?

Solution:

Step 1: Identify the equation and value

Equation: \(y = 8x + 50\)

Given: \(x = 4\) hours

Step 2: Substitute and calculate

\(y = 8(4) + 50\)

\(y = 32 + 50\)

\(y = 82\)

Reality Check:

For 0 hours: \(y = 50\) (base score)

Each hour adds 8 points

4 hours: \(50 + (4 \times 8) = 82\) ✓

Answer: 82 points

Example 5: Negative Correlation

A scatterplot shows car value (y, in thousands) versus age (x, in years). The line of best fit is \(y = -2x + 25\). What does the slope tell you?

Solution:

Identify the slope:

Slope \(m = -2\)

The negative sign is crucial!

Interpret the negative slope:

Negative slope = inverse relationship

As x (age) increases, y (value) decreases

Interpretation: For each additional year of age, the car's predicted value drops by $2,000

Example calculation:

New car (age 0): \(y = -2(0) + 25 = 25\), or $25,000

After 5 years: \(y = -2(5) + 25 = 15\), or $15,000

Lost \(25 - 15 = 10\), or $10,000, over 5 years

Answer: The car depreciates (loses) $2,000 in value per year
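
The same model also shows why extrapolation needs care. The sketch below evaluates \(y = -2x + 25\) at a few ages; the example does not state the data range, so the ages chosen here are assumptions for illustration.

```python
def car_value(age_years):
    """Predicted value in thousands of dollars from y = -2x + 25."""
    return -2 * age_years + 25

for age in (0, 5, 10, 15):
    print(age, car_value(age))

# Age at which the model predicts a value of zero: solve -2x + 25 = 0.
zero_age = 25 / 2
print(zero_age)
```

Past 12.5 years the line predicts a negative value, which makes no sense for a car. The equation keeps producing numbers, but the linear pattern cannot be expected to hold that far out.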

Example 6: Identifying Outliers

A scatterplot shows height (x, in inches) versus weight (y, in pounds) for 20 people. Most points cluster around the line of best fit. One point at (70, 110) falls far below the line. What does this outlier suggest?

Solution:

Understanding outliers:

An outlier is a point far from the general pattern

Point (70, 110): Height = 70 inches, Weight = 110 lbs

Falls BELOW the line of best fit

Interpret the outlier:

Point below line = actual y is less than predicted y

This person weighs LESS than expected for their height

Possible explanations:

• Unusually low body weight

• Measurement error

• Data entry mistake

Answer: This person weighs significantly less than predicted for their height
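
"Below the line" can be made precise with a residual: the actual y minus the predicted y. The example does not give the line's equation, so the line used below (weight = 4 × height − 130) is purely hypothetical and chosen only to illustrate the calculation.

```python
def residual(x, y, m, b):
    """Actual y minus the y predicted by the line y = m*x + b."""
    return y - (m * x + b)

# Hypothetical line of best fit (NOT given in the example).
M, B = 4, -130

# The outlier from the example: 70 inches tall, 110 pounds.
print(residual(70, 110, M, B))
```

A negative residual means the point sits below the line; an unusually large residual compared with the other points is what marks it as an outlier.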

Example 7: Correlation vs. Causation

A scatterplot shows a strong positive correlation between number of firefighters (x) and amount of fire damage (y) across different fires. Which conclusion is valid?

A) More firefighters cause more damage

B) Larger fires attract more firefighters and cause more damage

C) There is no relationship

D) Fewer firefighters would reduce damage

Solution:

Analyze each choice:

A) Implies causation—firefighters CAUSE damage (obviously wrong!)

B) Recognizes a lurking variable (fire size) affects both

C) Contradicts the stated positive correlation

D) Also assumes causation: it treats the number of firefighters as the cause of the damage

Critical Thinking:

Correlation does NOT mean causation!

Both variables may be caused by a third factor (fire size)

Bigger fires → more firefighters AND more damage

Answer: B) Larger fires attract more firefighters and cause more damage
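
A small simulation can make the lurking-variable idea concrete. In the sketch below, all numbers are made up: fire size determines both the number of firefighters and the damage, and the number of firefighters never appears in the damage formula. The two quantities still come out strongly correlated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: fire size (the lurking variable) drives both quantities.
fire_size = list(range(1, 11))
crew_noise = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1]
damage_noise = [-3, 2, 0, 3, -2, 1, -1, 0, 2, -2]

# Neither formula depends on the other quantity, only on fire size.
firefighters = [3 * s + e for s, e in zip(fire_size, crew_noise)]
damage = [10 * s + e for s, e in zip(fire_size, damage_noise)]

r = pearson_r(firefighters, damage)
print(round(r, 3))
```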

Example 8: Finding the Equation from Context

A scatterplot shows temperature (°C) versus altitude (meters). The line of best fit passes through points (0, 20) and (1000, 10). What is the equation of this line?

Solution:

Step 1: Find the y-intercept

Point (0, 20) means when \(x = 0\), \(y = 20\)

Y-intercept \(b = 20\)

Step 2: Calculate slope

\(m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{10 - 20}{1000 - 0} = \frac{-10}{1000} = -0.01\)

Step 3: Write equation

\(y = mx + b\)

\(y = -0.01x + 20\)

Interpretation:

Slope = -0.01°C per meter

Temperature decreases 0.01°C for each meter of altitude gained

Or: decreases 1°C per 100 meters

Answer: \(y = -0.01x + 20\)
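
The three steps generalize to any two points on a line. Here is a minimal sketch; the helper name `line_through` is made up, and it assumes the two points have different x-values (a vertical line has no slope).

```python
def line_through(p1, p2):
    """Return (m, b) for the line through two points with different x-values."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # slope: rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

# Points from Example 8: (altitude in meters, temperature in degrees C).
m, b = line_through((0, 20), (1000, 10))
print(m, b)

# Predicted temperature at 500 meters, inside the range of the two points.
print(m * 500 + b)
```

When neither point has \(x = 0\), the same formula for b still works, which is why it is written as `y1 - m * x1` rather than read directly off a point.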

Scatterplot Quick Reference

Pattern | Correlation Type | What It Means
Points rise left to right | Positive | As x ↑, y ↑
Points fall left to right | Negative | As x ↑, y ↓
Tight cluster around line | Strong | Relationship is consistent
Loose scatter around line | Weak | Relationship is variable
Random scatter, no pattern | No correlation | Variables are not linearly related

SAT Scatterplot Checklist

Reading the Plot

  • Check which variable is on each axis
  • Look at overall pattern, not individual points
  • Identify correlation type and strength
  • Note any outliers

Interpreting Equations

  • Slope = rate of change (with units)
  • Y-intercept = value when x = 0
  • Always use context/variable names
  • Check if slope sign makes sense

Making Predictions

  • Substitute x-value into equation
  • Calculate y-value carefully
  • Check if prediction is reasonable
  • Be cautious with extrapolation

Critical Thinking

  • Correlation ≠ causation
  • Consider lurking variables
  • Outliers may have meaning
  • Context drives interpretation

Scatterplots: Seeing Relationships in Data

Scatterplots are the fundamental tool for exploring relationships between variables across every scientific and social discipline. Whether studying how temperature affects reaction rates, how education correlates with income, how advertising spending relates to sales, or how practice time predicts performance, researchers plot data points to reveal patterns invisible in tables of numbers. The SAT tests these interpretation skills because they represent genuine data literacy: recognizing correlation types, understanding what slope and intercept mean in context, making evidence-based predictions, and most critically, distinguishing between correlation and causation. This last skill—knowing that relationship doesn't imply cause—is perhaps the most important statistical concept for informed citizenship. Master scatterplot analysis not just for test success, but to become someone who can read research, evaluate claims, and draw sound conclusions from data in an age where visualizations influence everything from medical decisions to policy debates.