SAT Math – Problem Solving & Data Analysis
Scatterplots
Analyzing relationships, correlation, and lines of best fit
Scatterplots are the visual language of relationships between variables. On the SAT, you'll analyze these coordinate graphs to identify correlations, interpret lines of best fit, make predictions, and distinguish between correlation and causation—skills that form the foundation of data science and statistical reasoning.
Success requires understanding positive and negative correlation, reading trend lines, interpreting slope and y-intercept in context, identifying outliers, and using linear models to make predictions. These aren't abstract concepts; they're the tools researchers use to discover relationships, from medical studies linking risk factors to outcomes to economic analyses predicting trends.
Understanding Scatterplots
What is a Scatterplot?
A scatterplot displays the relationship between two quantitative variables by plotting ordered pairs \((x, y)\) as points on a coordinate plane.
Each point: Represents one observation with an x-value and y-value
Pattern: Shows if and how the variables are related
Types of Correlation
Correlation describes the direction and strength of the linear relationship between variables.
Positive correlation: As x increases, y increases (upward trend)
Negative correlation: As x increases, y decreases (downward trend)
No correlation: No clear linear pattern between variables
Strength: How closely points cluster around a line (strong vs. weak)
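Direction and strength can also be summarized with a single number. The sketch below computes the Pearson correlation coefficient r, a standard statistic that the text above does not mention, and it is offered only as an illustration. The data and the name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y deviate from their means together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 72, 85]

r = pearson_r(hours, scores)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({direction})")
```

The sign of r gives the direction, and values close to 1 or -1 correspond to points clustered tightly around a line.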
Line of Best Fit (Trend Line)
The line of best fit is the straight line that best represents the trend in the data, keeping the overall distance between the line and the data points as small as possible.
Slope (m): Rate of change in y for each unit increase in x
Y-intercept (b): Predicted y-value when x = 0
Use: Make predictions for values within or near the data range
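As a rough illustration of where a trend line comes from, the sketch below fits a line by ordinary least squares. This assumes "best fit" means least squares, which is the usual convention but is not stated in the text above. The data and the function name `least_squares_line` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx               # slope: change in y per unit change in x
    b = mean_y - m * mean_x     # intercept: predicted y when x = 0
    return m, b

# Hypothetical data: study hours vs. quiz score, with a little scatter.
xs = [1, 2, 3, 4, 5]
ys = [57, 67, 74, 81, 91]

m, b = least_squares_line(xs, ys)
print(f"y = {m:.1f}x + {b:.1f}")
```

On the SAT the equation is given or read from the graph, so this calculation is background rather than something to perform by hand.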
Outliers
An outlier is a data point that falls far from the overall pattern of the scatterplot.
Interpretation: May represent a measurement error, an unusual case, or an important exception
SAT tip: Questions often ask you to identify or interpret outliers
Essential Concepts & Formulas
Line of Best Fit Equation
\(y = mx + b\)
m (slope): Change in y per unit change in x
b (y-intercept): Value of y when x = 0
Context matters: Always interpret slope and intercept in terms of the variables
Interpreting Slope
If \(y = 3x + 10\) models temperature (°F) over time (hours):
Slope = 3: Temperature increases 3°F per hour
Positive slope: Both variables increase together
Negative slope: As one increases, the other decreases
Making Predictions
To predict y for a given x:
1. Substitute the x-value into the equation
2. Calculate the corresponding y-value
⚠️ Predictions are most reliable within the data range (interpolation)
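The two steps above, plus the range check in the warning, can be sketched as follows. The model is the temperature example \(y = 3x + 10\) from this section. The list of observed hours is hypothetical, since the original data are not given.

```python
def predict(m, b, x):
    """Substitute x into y = mx + b and return the predicted y."""
    return m * x + b

def is_interpolation(x, observed_xs):
    """True if x lies inside the range of x-values that were actually observed."""
    return min(observed_xs) <= x <= max(observed_xs)

# Model from the text: temperature (°F) after x hours.
m, b = 3, 10

# Hypothetical: hours at which temperatures were recorded.
observed_hours = [0, 1, 2, 3, 4, 5]

for hour in (2, 12):
    kind = "interpolation" if is_interpolation(hour, observed_hours) else "extrapolation"
    print(f"x = {hour}: predicted y = {predict(m, b, hour)} ({kind})")
```

The arithmetic is the same in both cases. Only the trust placed in the result changes once x leaves the observed range.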
Correlation vs. Causation
Correlation: Two variables are related (move together)
Causation: One variable directly causes changes in the other
⚠️ Correlation does NOT prove causation! Both may be influenced by a third variable.
Common Pitfalls & Expert Tips
❌ Confusing correlation with causation
Just because ice cream sales and drownings both increase in summer doesn't mean ice cream causes drowning! Hot weather drives both: more people buy ice cream and more people swim.
❌ Misinterpreting slope sign
Negative slope means an inverse relationship—as x increases, y decreases. Don't confuse the direction!
❌ Extrapolating too far beyond data
Predictions far outside the data range (extrapolation) are unreliable. Relationships may not continue in the same pattern.
❌ Forgetting to check axis labels
Always verify which variable is on which axis. Swapping x and y completely changes interpretation!
✓ Expert Tip: Look at the overall pattern
Don't focus on individual points. Step back and see the general trend—is it upward, downward, or scattered?
✓ Expert Tip: Context is everything
Always interpret slope and intercept using the variable names. "The slope is 5" is incomplete; "Temperature increases 5°F per hour" is correct.
✓ Expert Tip: Outliers tell stories
When you spot an outlier, think about what it might represent: a measurement error, an exceptional case, or a data entry mistake.
Fully Worked SAT-Style Examples
A scatterplot shows the relationship between hours studied (x-axis) and test scores (y-axis). The points generally rise from left to right, clustering loosely around an upward-sloping line. Which best describes the relationship?
A) Strong negative correlation
B) Weak negative correlation
C) Weak positive correlation
D) No correlation
Solution:
Step 1: Identify direction
Points rise from left to right = upward trend
As hours studied (x) increases, test scores (y) increase
This is POSITIVE correlation
Step 2: Assess strength
Points cluster "loosely" around the line
Not tightly packed → weak correlation
But still showing a clear pattern → not "no correlation"
Correlation Guide:
Positive: Upward slope (both increase)
Negative: Downward slope (one increases, other decreases)
Strong: Points tightly clustered around line
Weak: Points loosely scattered around line
Answer: C) Weak positive correlation
A scatterplot shows the relationship between years of experience (x) and annual salary in thousands (y). The line of best fit is \(y = 5x + 40\). What does the slope represent?
Solution:
Identify the slope:
In \(y = 5x + 40\), the slope \(m = 5\)
Interpret in context:
Slope = change in y per unit change in x
x = years of experience
y = salary in thousands of dollars
Interpretation: For each additional year of experience, salary increases by $5,000
Answer: The slope of 5 means salary increases by $5,000 per year of experience
Using the same equation as the previous example, \(y = 5x + 40\), where x is years of experience and y is salary in thousands of dollars. What does the y-intercept represent?
Solution:
Identify the y-intercept:
In \(y = 5x + 40\), the y-intercept \(b = 40\)
This is the y-value when x = 0
Interpret in context:
When x = 0 (zero years of experience)
y = 40 (salary in thousands)
Interpretation: The predicted starting salary with no experience is $40,000
Y-intercept meaning:
Always represents the predicted y-value when x = 0
Context determines if this makes practical sense
Sometimes x = 0 isn't realistic (e.g., age = 0 in a study of adults)
Answer: The y-intercept of 40 means the predicted starting salary is $40,000
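A quick way to check both interpretations of \(y = 5x + 40\) is to evaluate the model directly. In this minimal sketch, the function name `salary_thousands` is made up for illustration. The step that is easy to miss is the conversion from thousands of dollars to dollars.

```python
def salary_thousands(years):
    """Salary model from the problem: y = 5x + 40, with y in thousands of dollars."""
    return 5 * years + 40

# Slope: the difference between predictions one year apart.
increase_per_year = salary_thousands(4) - salary_thousands(3)

# Y-intercept: the prediction at zero years of experience.
starting_salary = salary_thousands(0)

# Convert from thousands of dollars to dollars before stating the answer.
print(f"Increase per year: ${increase_per_year * 1000:,}")
print(f"Starting salary:   ${starting_salary * 1000:,}")
```

Any two consecutive years give the same difference, which is what a constant slope means.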
A scatterplot relates study time (x, in hours) to quiz scores (y). The line of best fit is \(y = 8x + 50\). According to this model, what score would be predicted for a student who studies 4 hours?
Solution:
Step 1: Identify the equation and value
Equation: \(y = 8x + 50\)
Given: \(x = 4\) hours
Step 2: Substitute and calculate
\(y = 8(4) + 50\)
\(y = 32 + 50\)
\(y = 82\)
Reality Check:
For 0 hours: \(y = 50\) (base score)
Each hour adds 8 points
4 hours: \(50 + (4 \times 8) = 82\) ✓
Answer: 82 points
A scatterplot shows car value (y, in thousands) versus age (x, in years). The line of best fit is \(y = -2x + 25\). What does the slope tell you?
Solution:
Identify the slope:
Slope \(m = -2\)
The negative sign is crucial!
Interpret the negative slope:
Negative slope = inverse relationship
As x (age) increases, y (value) decreases
Interpretation: For each year older, the car loses $2,000 in value
Example calculation:
New car (age 0): \(y = -2(0) + 25 = 25\), or $25,000
After 5 years: \(y = -2(5) + 25 = 15\), or $15,000
Lost \(25 - 15 = 10\) thousand, or $10,000, over 5 years
Answer: The car depreciates (loses) $2,000 in value per year
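The depreciation model can be tabulated to confirm the constant $2,000 drop per year. In this sketch, the function name `car_value_thousands` is made up for illustration. It also shows why the model should not be pushed too far: the line reaches zero at 12.5 years and predicts negative values after that.

```python
def car_value_thousands(age_years):
    """Model from the problem: y = -2x + 25, with y in thousands of dollars."""
    return -2 * age_years + 25

# Each additional year lowers the predicted value by 2 (thousand dollars).
for age in range(0, 6):
    print(f"age {age}: ${car_value_thousands(age) * 1000:,}")

loss_over_5_years = car_value_thousands(0) - car_value_thousands(5)
print(f"Loss over 5 years: ${loss_over_5_years * 1000:,}")

# Far outside any plausible data range the model gives a negative price.
print(f"age 13: {car_value_thousands(13)} thousand dollars")
```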
A scatterplot shows height (x, in inches) versus weight (y, in pounds) for 20 people. Most points cluster around the line of best fit. One point at (70, 110) falls far below the line. What does this outlier suggest?
Solution:
Understanding outliers:
An outlier is a point far from the general pattern
Point (70, 110): Height = 70 inches, Weight = 110 lbs
Falls BELOW the line of best fit
Interpret the outlier:
Point below line = actual y is less than predicted y
This person weighs LESS than expected for their height
Possible explanations:
• Unusually low body weight
• Measurement error
• Data entry mistake
Answer: This person weighs significantly less than predicted for their height
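"Far below the line" can be made concrete with residuals (actual y minus predicted y). The problem does not state the line's equation, so the sketch below assumes a hypothetical line \(y = 5x - 190\). The other data points and the cutoff of 20 pounds are also hypothetical.

```python
def residual(point, m, b):
    """Actual y minus the y predicted by the line y = mx + b."""
    x, y = point
    return y - (m * x + b)

def flag_outliers(points, m, b, threshold):
    """Return the points whose residual exceeds the threshold in absolute value."""
    return [p for p in points if abs(residual(p, m, b)) > threshold]

# Hypothetical line of best fit: weight = 5 * height - 190.
m, b = 5, -190

# Hypothetical (height in inches, weight in pounds) data, including (70, 110).
people = [(62, 122), (65, 133), (68, 152), (70, 110), (72, 168)]

for p in people:
    print(p, "residual:", residual(p, m, b))

print("Flagged:", flag_outliers(people, m, b, threshold=20))
```

A negative residual means the point sits below the line, which matches the interpretation that this person weighs less than predicted.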
A scatterplot shows a strong positive correlation between number of firefighters (x) and amount of fire damage (y) across different fires. Which conclusion is valid?
A) More firefighters cause more damage
B) Larger fires attract more firefighters and cause more damage
C) There is no relationship
D) Fewer firefighters would reduce damage
Solution:
Analyze each choice:
A) Claims causation: firefighters CAUSE damage. Correlation alone cannot support this.
B) Recognizes that a lurking variable (fire size) affects both
C) Contradicts the stated positive correlation
D) Makes the same causal assumption as A: that changing the number of firefighters would change the damage
Critical Thinking:
Correlation does NOT mean causation!
Both variables may be caused by a third factor (fire size)
Bigger fires → more firefighters AND more damage
Answer: B) Larger fires attract more firefighters and cause more damage
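The lurking-variable idea can be demonstrated with made-up numbers. In the sketch below, both firefighter counts and damage are generated from fire size alone, and neither is computed from the other, yet they still correlate strongly. All values, and the helper `pearson_r`, are hypothetical illustrations.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical fire-size scores: the lurking variable.
fire_size = [1, 2, 3, 4, 5, 6, 7, 8]

# Each quantity depends only on fire size plus a small fixed offset.
firefighters = [4 * s + e for s, e in zip(fire_size, [1, 0, 2, 1, 0, 2, 1, 0])]
damage = [10 * s + e for s, e in zip(fire_size, [3, 0, 5, 1, 4, 0, 2, 6])]

r = pearson_r(firefighters, damage)
print(f"r between firefighters and damage: {r:.2f}")
```

The strong correlation appears even though nothing in the code makes damage depend on firefighters, which is the situation choice B describes.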
A scatterplot shows temperature (y, in °C) versus altitude (x, in meters). The line of best fit passes through the points (0, 20) and (1000, 10). What is the equation of this line?
Solution:
Step 1: Find the y-intercept
Point (0, 20) means when \(x = 0\), \(y = 20\)
Y-intercept \(b = 20\)
Step 2: Calculate slope
\(m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{10 - 20}{1000 - 0} = \frac{-10}{1000} = -0.01\)
Step 3: Write equation
\(y = mx + b\)
\(y = -0.01x + 20\)
Interpretation:
Slope = -0.01°C per meter
Temperature decreases 0.01°C for each meter of altitude gained
Or: decreases 1°C per 100 meters
Answer: \(y = -0.01x + 20\)
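The same three steps work for any two points on a line. Here is a minimal sketch, with a hypothetical helper name `line_through`, that also handles the case where neither point has \(x = 0\) by solving \(b = y_1 - m x_1\).

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

# Points from the problem: (altitude in meters, temperature in °C).
m, b = line_through((0, 20), (1000, 10))
print(f"y = {m}x + {b}")

# Use the model for an altitude inside the given range.
temp_at_500 = m * 500 + b
print(f"Predicted temperature at 500 m: {temp_at_500:.1f} °C")
```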
Scatterplot Quick Reference
Pattern | Correlation Type | What It Means |
---|---|---|
Points rise left to right | Positive | As x ↑, y ↑ |
Points fall left to right | Negative | As x ↑, y ↓ |
Tight cluster around line | Strong | Relationship is consistent |
Loose scatter around line | Weak | Relationship is variable |
Random scatter, no pattern | No correlation | Variables not linearly related |
SAT Scatterplot Checklist
Reading the Plot
- Check which variable is on each axis
- Look at overall pattern, not individual points
- Identify correlation type and strength
- Note any outliers
Interpreting Equations
- Slope = rate of change (with units)
- Y-intercept = value when x = 0
- Always use context/variable names
- Check if slope sign makes sense
Making Predictions
- Substitute x-value into equation
- Calculate y-value carefully
- Check if prediction is reasonable
- Be cautious with extrapolation
Critical Thinking
- Correlation ≠ causation
- Consider lurking variables
- Outliers may have meaning
- Context drives interpretation
Scatterplots: Seeing Relationships in Data
Scatterplots are the fundamental tool for exploring relationships between variables across every scientific and social discipline. Whether studying how temperature affects reaction rates, how education correlates with income, how advertising spending relates to sales, or how practice time predicts performance, researchers plot data points to reveal patterns invisible in tables of numbers. The SAT tests these interpretation skills because they represent genuine data literacy: recognizing correlation types, understanding what slope and intercept mean in context, making evidence-based predictions, and most critically, distinguishing between correlation and causation. This last skill—knowing that relationship doesn't imply cause—is perhaps the most important statistical concept for informed citizenship. Master scatterplot analysis not just for test success, but to become someone who can read research, evaluate claims, and draw sound conclusions from data in an age where visualizations influence everything from medical decisions to policy debates.