Relationship
Scatter Plot
Plots two numeric variables as points, revealing correlations, clusters, and outliers.
Study Hours vs Test Score
50 students, midterm exam
| Hours | Score |
|---|---|
| 1 | 35 |
| 1 | 41 |
| 1 | 38 |
| 2 | 42 |
| 2 | 45 |
| 2 | 47 |
| 2 | 40 |
| 3 | 46 |
| 3 | 50 |
| 3 | 53 |
| 3 | 48 |
| 4 | 51 |
| 4 | 55 |
| 4 | 58 |
| 4 | 53 |
| 5 | 56 |
| 5 | 60 |
| 5 | 62 |
| 5 | 58 |
| 5 | 55 |
| 6 | 60 |
| 6 | 64 |
| 6 | 67 |
| 6 | 62 |
| 6 | 65 |
| 7 | 65 |
| 7 | 69 |
| 7 | 72 |
| 7 | 67 |
| 7 | 70 |
| 8 | 71 |
| 8 | 75 |
| 8 | 78 |
| 8 | 72 |
| 8 | 74 |
| 9 | 76 |
| 9 | 80 |
| 9 | 82 |
| 9 | 78 |
| 9 | 79 |
| 10 | 81 |
| 10 | 84 |
| 10 | 88 |
| 10 | 83 |
| 10 | 86 |
| 11 | 87 |
| 11 | 90 |
| 11 | 92 |
| 12 | 93 |
| 12 | 95 |
Use a scatter plot when…
- Exploring correlation between two variables
- Spotting outliers and clusters
- Regression analysis
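As a minimal sketch of the regression use case, the snippet below fits an ordinary least-squares line to the study-hours data from the table above. The function name `least_squares` and the grouped dictionary layout are illustrative choices, not something the page prescribes; only the Python standard library is used.

```python
# Study hours -> test scores, copied from the data table above.
SCORES_BY_HOURS = {
    1: [35, 41, 38],
    2: [42, 45, 47, 40],
    3: [46, 50, 53, 48],
    4: [51, 55, 58, 53],
    5: [56, 60, 62, 58, 55],
    6: [60, 64, 67, 62, 65],
    7: [65, 69, 72, 67, 70],
    8: [71, 75, 78, 72, 74],
    9: [76, 80, 82, 78, 79],
    10: [81, 84, 88, 83, 86],
    11: [87, 90, 92],
    12: [93, 95],
}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the common 1/n factor cancels in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Flatten the grouped table into (hours, score) pairs.
pairs = [(h, s) for h, group in SCORES_BY_HOURS.items() for s in group]
hours = [h for h, _ in pairs]
scores = [s for _, s in pairs]

slope, intercept = least_squares(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

On this data the slope comes out at roughly five points per additional hour of study, which matches the upward band of points the scatter plot shows.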
Avoid when…
- Categorical data
- Evenly-sampled time series (a line chart is usually clearer)
- Too many overlapping points (more than about 1,000): use a hexbin or 2D contour plot instead
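To illustrate why binning helps with overplotting, here is a rough sketch that aggregates points into a grid of cells and counts how many fall in each. It uses square cells rather than hexagons to stay short, so it shows the idea behind a hexbin plot rather than a hexbin itself. The 5,000 points are synthetic, and `bin_points` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def bin_points(points, x_range, y_range, nx=20, ny=20):
    """Count points per cell of an nx-by-ny grid over the given ranges."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map each coordinate to a cell index; clamp so the maximum
        # value lands in the last cell instead of one past it.
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


# Synthetic data for illustration only: 5,000 normally distributed points.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5000)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

counts = bin_points(points, (min(xs), max(xs)), (min(ys), max(ys)))
print(len(points), "points reduced to", len(counts), "non-empty cells")
```

Each cell's count would then drive a color scale, so dense regions stay readable where individual markers would pile on top of each other.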
Data it needs
| Property | Value |
|---|---|
| Min Rows | 10 |
| Min Columns | 2 |
| Column Types | number, number |
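The requirements in this table can be checked before plotting. The sketch below assumes the data arrives as a list of row tuples and that the first two columns are the ones to plot; `check_scatter_data` is a hypothetical helper, not part of any tool described here.

```python
from numbers import Real

MIN_ROWS = 10     # "Min Rows" from the table above
MIN_COLUMNS = 2   # "Min Columns" from the table above


def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def check_scatter_data(rows):
    """Return a list of problems; an empty list means the data qualifies."""
    problems = []
    if len(rows) < MIN_ROWS:
        problems.append(f"need at least {MIN_ROWS} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        if len(row) < MIN_COLUMNS:
            problems.append(f"row {index}: need at least {MIN_COLUMNS} columns")
        elif not all(is_number(v) for v in row[:MIN_COLUMNS]):
            problems.append(f"row {index}: first two columns must be numbers")
    return problems


good_rows = [(h, 30 + 5 * h) for h in range(1, 11)]  # made-up numeric rows
print(check_scatter_data(good_rows))       # no problems expected
print(check_scatter_data(good_rows[:9]))   # too few rows
```

Returning a list of messages rather than a single boolean makes it easier to tell the user which requirement failed.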
Visual anatomy
Marks
circle
Channels
- position-x
- position-y
- color-hue (optional)
- size (optional, for bubble variants)
Axes
- x-quantitative
- y-quantitative
Guiding principles
Consider instead
Common mistakes
Assuming correlation implies causation
Overplotting without transparency
Ignoring outliers
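One way to avoid overlooking outliers is to flag points that sit far from the fitted trend line. The sketch below is only an illustration: it takes the study-hours data from the table above, adds one hypothetical point (2 hours, score 90) that is not in the original data, and flags any point whose residual exceeds three standard deviations of the residuals. The cutoff of 3 and the name `flag_outliers` are assumptions.

```python
import statistics

# Study hours -> test scores, copied from the data table above.
SCORES_BY_HOURS = {
    1: [35, 41, 38],
    2: [42, 45, 47, 40],
    3: [46, 50, 53, 48],
    4: [51, 55, 58, 53],
    5: [56, 60, 62, 58, 55],
    6: [60, 64, 67, 62, 65],
    7: [65, 69, 72, 67, 70],
    8: [71, 75, 78, 72, 74],
    9: [76, 80, 82, 78, 79],
    10: [81, 84, 88, 83, 86],
    11: [87, 90, 92],
    12: [93, 95],
}


def flag_outliers(points, k=3.0):
    """Return points whose residual from the least-squares line exceeds k sigma."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distance of each point from the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in points]
    spread = statistics.pstdev(residuals)
    return [p for p, r in zip(points, residuals) if abs(r) > k * spread]


points = [(h, s) for h, group in SCORES_BY_HOURS.items() for s in group]
points.append((2, 90))  # hypothetical outlier, NOT part of the original data

print(flag_outliers(points))
```

A flagged point is a prompt to investigate (data entry error, unusual case), not an automatic reason to delete it.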
History
Pioneered by John Herschel in the 1830s; formalized by Francis Galton in the 1880s.
Accessibility notes
Report the correlation coefficient and describe the trend as positive, negative, or none. When using color to encode a third variable, pick a colorblind-safe palette and pair color with shape or label so the encoding survives a grayscale print.
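A short sketch of how such a text description could be generated: it computes Pearson's correlation coefficient for the study-hours data from the table above and turns it into a one-sentence summary. The 0.1 cutoff separating "none" from a positive or negative trend is an assumed threshold, and `describe_trend` is a hypothetical helper.

```python
import math

# Study hours -> test scores, copied from the data table above.
SCORES_BY_HOURS = {
    1: [35, 41, 38],
    2: [42, 45, 47, 40],
    3: [46, 50, 53, 48],
    4: [51, 55, 58, 53],
    5: [56, 60, 62, 58, 55],
    6: [60, 64, 67, 62, 65],
    7: [65, 69, 72, 67, 70],
    8: [71, 75, 78, 72, 74],
    9: [76, 80, 82, 78, 79],
    10: [81, 84, 88, 83, 86],
    11: [87, 90, 92],
    12: [93, 95],
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_trend(r, threshold=0.1):
    """Classify r as 'positive', 'negative', or 'none' (threshold is assumed)."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"


pairs = [(h, s) for h, group in SCORES_BY_HOURS.items() for s in group]
r = pearson_r([h for h, _ in pairs], [s for _, s in pairs])
alt_text = (
    f"Scatter plot of study hours vs test score for {len(pairs)} students; "
    f"r = {r:.2f}, {describe_trend(r)} trend."
)
print(alt_text)
```

The resulting sentence can serve as alt text or a caption so that readers who cannot see the chart still get the main finding.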
Related reading