---
title: "Types of Statistical Tests: A Comprehensive Guide"
format:
  html:
    toc: true
    toc-location: right
    toc-depth: 3
    theme: cosmo
    embed-resources: true
    mainfont: "Helvetica Neue"
    include-in-header:
      - text: |
          <style>
            h1 { font-size: 2rem; }
            h2 { font-size: 1.5rem; }
            h3 { font-size: 1.25rem; }
            h4 { font-size: 1.1rem; }
            table { font-size: 0.85rem; }
            table th, table td { padding: 6px 10px; }
            table th { font-size: 0.8rem; }
          </style>
---

Imagine a carpenter with only a hammer. Every problem becomes a nail, every solution involves pounding. The results would be disastrous—stripped screws, shattered glass, splintered wood. Statistics works the same way. Armed with only one test, researchers force every question into the same mold, producing unreliable answers and misleading conclusions. Mastering the diverse toolkit of statistical tests transforms you from a one-trick amateur into a skilled craftsman of data analysis.
## The Foundation: Why Different Tests Exist
Statistical tests are not interchangeable. Each is designed for specific data types, research questions, and assumptions. Using the wrong test is like measuring temperature with a ruler—the tool simply doesn't match the task.
Three fundamental questions guide test selection: What type of data do you have? What relationship are you investigating? What assumptions can your data satisfy? The answers to these questions narrow the field from dozens of potential tests to the one or two that fit your situation precisely.
Data types form the first filter. **Continuous data** (height, weight, temperature) can take any value within a range. **Categorical data** falls into distinct groups (gender, treatment type, survey responses). **Ordinal data** has categories with a meaningful order but no consistent intervals (satisfaction ratings, education levels). Each data type requires tests designed to handle its unique properties.
## Normality Tests: Checking Your Assumptions
Before selecting a statistical test, you must understand your data's distribution. Many powerful tests assume data follows a normal (bell-shaped) distribution. Violating this assumption can invalidate results entirely.
The **Shapiro-Wilk test** stands as the gold standard for normality testing with small to medium samples (n < 5000). It compares your data's distribution against a theoretical normal distribution, producing a W statistic between 0 and 1. Values close to 1 suggest normality; significantly lower values indicate departure from normality. Its power to detect non-normality exceeds most alternatives, making it the default choice for most applications.
The **D'Agostino-Pearson test** takes a different approach, examining two specific properties: skewness (asymmetry) and kurtosis (tail heaviness). By combining these measures, it identifies not just whether data is non-normal, but why. Is the distribution lopsided? Are the tails too heavy or too light? This diagnostic information guides decisions about data transformation or alternative test selection.
The **Kolmogorov-Smirnov test** offers flexibility that others lack. While typically used for normality testing, it can compare data against any theoretical distribution—exponential, uniform, Poisson, or custom distributions. This generality comes at a cost: it's less powerful than Shapiro-Wilk for detecting non-normality specifically.
The **Anderson-Darling test** improves upon Kolmogorov-Smirnov by weighting tail observations more heavily. Since many important phenomena manifest in distribution tails (extreme events, outliers), this sensitivity often proves valuable. It's particularly useful when tail behavior matters for your analysis.
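
Here is a minimal sketch of all four tests using `scipy.stats`; the sample is simulated, so treat the numbers purely as illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=200)  # simulated, roughly normal

# Shapiro-Wilk: W close to 1 suggests normality
w, p_sw = stats.shapiro(data)

# D'Agostino-Pearson: combines skewness and kurtosis into one statistic
k2, p_dp = stats.normaltest(data)

# Kolmogorov-Smirnov against a normal with parameters estimated from the data
# (estimating parameters this way makes the standard p-value conservative)
d, p_ks = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))

# Anderson-Darling: compare the statistic against critical values
ad = stats.anderson(data, dist="norm")

print(f"Shapiro-Wilk:       W = {w:.3f}, p = {p_sw:.3f}")
print(f"D'Agostino-Pearson: K2 = {k2:.3f}, p = {p_dp:.3f}")
print(f"Kolmogorov-Smirnov: D = {d:.3f}, p = {p_ks:.3f}")
print(f"Anderson-Darling:   A2 = {ad.statistic:.3f} "
      f"(5% critical value: {ad.critical_values[2]:.3f})")
```
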
Visual methods complement these formal tests. **Histograms** reveal distribution shape at a glance. **Q-Q plots** compare data quantiles against theoretical quantiles—points falling along a diagonal line indicate normality. **Box plots** display median, quartiles, and outliers compactly. No single method suffices; combining visual inspection with formal testing provides the most complete picture.
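
A matplotlib sketch of the three visual checks, on the same kind of simulated sample:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].hist(data, bins=20)                      # distribution shape at a glance
axes[0].set_title("Histogram")

stats.probplot(data, dist="norm", plot=axes[1])  # points along the diagonal
axes[1].set_title("Q-Q Plot")                    # suggest normality

axes[2].boxplot(data)                            # median, quartiles, outliers
axes[2].set_title("Box Plot")

plt.tight_layout()
plt.show()
```
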
## Parametric Tests: Power Through Assumptions
Parametric tests assume data follows a specific distribution (usually normal) and estimate population parameters like means and variances. When assumptions hold, these tests offer maximum statistical power—the ability to detect real effects.
### T-Tests: Comparing Means
The **one-sample t-test** addresses a simple question: does my sample mean differ from a known or hypothesized value? A manufacturer might test whether average product weight equals the target specification. A teacher might assess whether class performance differs from the national average. The test calculates how many standard errors separate the sample mean from the hypothesized value, translating this distance into a probability.
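
In scipy, the manufacturer's question becomes a single call; the weights below are simulated around a hypothetical 500 g target:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
weights = rng.normal(loc=502, scale=5, size=30)  # simulated weights in grams

t_stat, p_value = stats.ttest_1samp(weights, popmean=500)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p suggests the mean weight drifts from the 500 g specification.
```
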
The **two-sample independent t-test** compares means between two unrelated groups. Do men and women differ in height? Do treatment and control groups show different outcomes? The test assumes both groups are normally distributed with equal variances. When the equal variance assumption fails, **Welch's t-test** provides a robust alternative, adjusting degrees of freedom to account for variance differences. Many statisticians now recommend Welch's test as the default, since it performs well even when variances are equal.
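
Both variants are one call in `scipy.stats`, with `equal_var=False` switching to Welch's test. The groups here are simulated, and the control group is deliberately given a wider spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treatment = rng.normal(loc=55, scale=8, size=40)
control = rng.normal(loc=50, scale=12, size=40)  # deliberately wider spread

t_s, p_s = stats.ttest_ind(treatment, control)                   # Student's
t_w, p_w = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's

print(f"Student's: t = {t_s:.2f}, p = {p_s:.3f}")
print(f"Welch's:   t = {t_w:.2f}, p = {p_w:.3f}")
```
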
The **paired t-test** handles related measurements—the same subjects measured twice, or naturally matched pairs. Before-and-after studies, twin comparisons, and left-right eye measurements all call for paired analysis. By focusing on within-pair differences rather than raw values, this test eliminates between-subject variability, dramatically increasing statistical power. An effect invisible to independent comparison often emerges clearly with paired analysis.
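
A sketch of a before-and-after design; the simulated improvement is small relative to the between-subject spread, which is exactly where pairing pays off:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.normal(loc=70, scale=10, size=25)
after = before + rng.normal(loc=3, scale=4, size=25)  # small, consistent gain

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# The 3-point shift is detectable because pairing removes the 10-point
# between-subject variability from the comparison.
```
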
### ANOVA: Comparing Multiple Groups
When comparing three or more groups, multiple t-tests create problems. Each test carries a 5% false positive risk, and the risk compounds: with ten comparisons, the chance of at least one false positive climbs to about 40% (1 - 0.95^10). **Analysis of Variance (ANOVA)** solves this by testing all groups simultaneously.
**One-way ANOVA** compares means across multiple groups for a single factor. Do students from different schools perform differently? Does crop yield vary across fertilizer types? ANOVA partitions total variability into between-group and within-group components, asking whether between-group differences exceed what within-group variability would predict by chance.
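
The fertilizer example as a scipy sketch, with simulated yields:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
fert_a = rng.normal(loc=100, scale=10, size=30)
fert_b = rng.normal(loc=105, scale=10, size=30)
fert_c = rng.normal(loc=112, scale=10, size=30)

f_stat, p_value = stats.f_oneway(fert_a, fert_b, fert_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p says at least one group mean differs, but not which one.
```
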
A significant ANOVA result indicates that at least one group differs—but not which one. **Post-hoc tests** like Tukey's HSD, Bonferroni correction, or Scheffé's method identify specific group differences while controlling overall error rate.
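
SciPy 1.8+ ships Tukey's HSD directly as `scipy.stats.tukey_hsd`; statsmodels' `pairwise_tukeyhsd` is an equivalent route. A sketch continuing the fertilizer example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
fert_a = rng.normal(loc=100, scale=10, size=30)
fert_b = rng.normal(loc=105, scale=10, size=30)
fert_c = rng.normal(loc=112, scale=10, size=30)

# Pairwise comparisons with the family-wise error rate held at 5%
result = stats.tukey_hsd(fert_a, fert_b, fert_c)
print(result)  # mean differences and adjusted p-values for each pair
```
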
**Levene's test** checks the equal variance assumption critical to ANOVA. When variances differ substantially, **Welch's ANOVA** provides a robust alternative that doesn't require this assumption.
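
Levene's test is `scipy.stats.levene`; Welch's ANOVA is not in core scipy, though third-party packages such as pingouin provide it (`welch_anova`), an assumption worth verifying in your environment. A sketch of the variance check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(100, 5, 30)
g2 = rng.normal(100, 15, 30)   # spreads differ on purpose
g3 = rng.normal(100, 25, 30)

w_stat, p_value = stats.levene(g1, g2, g3)
print(f"Levene: W = {w_stat:.2f}, p = {p_value:.4f}")
# A small p means unequal variances: prefer Welch's ANOVA over classic ANOVA.
```
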
## Non-Parametric Tests: Distribution-Free Alternatives
When data violates normality assumptions or consists of ranks and ratings, non-parametric tests provide reliable alternatives. These tests make fewer assumptions, trading some statistical power for broader applicability.
The **Mann-Whitney U test** (also called Wilcoxon rank-sum) serves as the non-parametric counterpart to the independent t-test. Rather than comparing means, it compares rank distributions between two groups. After combining and ranking all observations, it tests whether one group's ranks are systematically higher than the other's. This approach handles skewed distributions, ordinal data, and outliers gracefully.
The **Wilcoxon signed-rank test** parallels the paired t-test for non-normal data. It ranks the absolute differences between paired observations, then compares positive and negative rank sums. If treatment has no effect, positive and negative differences should balance; systematic imbalance suggests a real effect.
The **Kruskal-Wallis test** extends Mann-Whitney to three or more groups, serving as the non-parametric alternative to one-way ANOVA. It ranks all observations regardless of group membership, then tests whether mean ranks differ across groups. Like ANOVA, a significant result requires follow-up tests (typically Dunn's test) to identify which specific groups differ.
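
All three rank-based tests live in `scipy.stats`; the sketch below runs them on deliberately skewed simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Mann-Whitney U: two independent, skewed groups
group_a = rng.exponential(scale=2.0, size=30)
group_b = rng.exponential(scale=3.0, size=30)
u, p_u = stats.mannwhitneyu(group_a, group_b)

# Wilcoxon signed-rank: paired measurements on the same subjects
before = rng.exponential(scale=2.0, size=30)
after = before * rng.uniform(0.5, 1.1, size=30)
w, p_w = stats.wilcoxon(before, after)

# Kruskal-Wallis: three or more independent groups
group_c = rng.exponential(scale=3.5, size=30)
h, p_h = stats.kruskal(group_a, group_b, group_c)

print(f"Mann-Whitney:   U = {u:.0f}, p = {p_u:.3f}")
print(f"Wilcoxon:       W = {w:.0f}, p = {p_w:.3f}")
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p_h:.3f}")
```
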
### Choosing Between Parametric and Non-Parametric
The decision isn't always straightforward. Parametric tests offer more power when assumptions hold, but non-parametric tests provide protection when they don't. Consider these guidelines:
Use parametric tests when data is continuous, approximately normal (or n > 30 per group), and variances are roughly equal. Use non-parametric tests when data is ordinal, clearly non-normal, contains significant outliers, or sample sizes are small and distribution unknown.
When uncertain, running both types of tests provides insight. If they agree, report the parametric result for its greater power. If they disagree, the non-parametric result is typically more trustworthy.
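
One way to encode these guidelines is a small helper that screens for normality and then picks the test. This is a simplification (it ignores sample size, outliers, and ties), so treat it as a sketch of the decision logic, not a recipe:

```python
from scipy import stats

def compare_two_groups(x, y, alpha=0.05):
    """Pick Welch's t-test or Mann-Whitney U based on a normality screen."""
    both_normal = (stats.shapiro(x).pvalue > alpha
                   and stats.shapiro(y).pvalue > alpha)
    if both_normal:
        result = stats.ttest_ind(x, y, equal_var=False)
        return "Welch's t-test", result.statistic, result.pvalue
    result = stats.mannwhitneyu(x, y)
    return "Mann-Whitney U", result.statistic, result.pvalue
```

Calling `compare_two_groups(treatment, control)` returns the chosen test's name, statistic, and p-value, making the selection rule explicit and reproducible.
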
## Categorical Tests: Analyzing Frequencies
When both variables are categorical, entirely different tests apply. These analyze counts and proportions rather than means.
The **Chi-square test of independence** assesses whether two categorical variables are related. Is survival associated with passenger class? Does political affiliation relate to geographic region? The test compares observed cell frequencies in a contingency table against frequencies expected under independence. Large discrepancies suggest association.
Chi-square requires adequate sample sizes—expected frequencies should exceed 5 in each cell. When this condition fails, **Fisher's exact test** provides an exact probability rather than an approximation. Originally designed for 2×2 tables, extensions now handle larger tables, though computational demands increase rapidly.
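
Both tests take a contingency table of counts; the tables below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Rows: treatment / control; columns: improved / not improved
table = np.array([[30, 10],
                  [18, 22]])

chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.2f}, p = {p_chi2:.4f}, dof = {dof}")
print("Expected counts:\n", expected)  # verify all cells are at least ~5

# The same question with counts too small for the chi-square approximation
small = np.array([[7, 3],
                  [2, 8]])
odds_ratio, p_fisher = stats.fisher_exact(small)
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p_fisher:.4f}")
```
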
## Correlation Tests: Measuring Relationships
Correlation quantifies the strength and direction of association between two continuous variables.
**Pearson correlation** measures linear relationships, producing the familiar r coefficient ranging from -1 to +1. Perfect positive correlation (r = 1) means variables move together proportionally; perfect negative correlation (r = -1) means they move oppositely; zero correlation indicates no linear relationship. Pearson assumes both variables are normally distributed and related linearly.
**Spearman correlation** measures monotonic relationships using ranks rather than raw values. It captures associations where variables consistently move together (or oppositely) without requiring a linear pattern. Robust to outliers and applicable to ordinal data, Spearman serves as the non-parametric alternative to Pearson.
**Kendall's tau** also measures monotonic association but uses a different approach: counting concordant versus discordant pairs of observations. More robust than Spearman with small samples or many tied values, Kendall's coefficient tends toward smaller absolute values than Spearman's, complicating direct comparison.
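
A sketch contrasting the three coefficients on simulated data whose relationship is monotonic but not linear:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
y = x**2 + rng.normal(0, 5, 100)  # increases monotonically, but curved

r, _ = stats.pearsonr(x, y)       # linear association
rho, _ = stats.spearmanr(x, y)    # monotonic association via ranks
tau, _ = stats.kendalltau(x, y)   # concordant vs. discordant pairs

print(f"Pearson r    = {r:.3f}")    # high, but understates the curved pattern
print(f"Spearman rho = {rho:.3f}")  # near 1 for a monotonic relationship
print(f"Kendall tau  = {tau:.3f}")  # smaller magnitude, as noted above
```
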
## Making the Right Choice: A Decision Framework
The following interactive fishbone diagram organizes statistical tests by category. **Click on any test** to see its description and when to use it.
```{ojs}
//| echo: false
// Test data with descriptions
testDescriptions = ({
t1: { name: 'One-Sample T-Test', description: 'Compares a sample mean to a known or hypothesized population value.', use: 'When you have one group and want to test if its mean differs from a specific value.', example: 'Testing if average product weight equals the target specification.', parametric: true },
t2: { name: 'Wilcoxon Signed-Rank', description: 'Non-parametric test comparing a sample to a hypothesized value using ranks.', use: 'When data is not normally distributed but you want to compare to a standard value.', example: 'Testing if median customer satisfaction differs from neutral.', parametric: false },
t3: { name: 'Mann-Whitney U', description: 'Compares distributions of two independent groups using ranks.', use: 'When comparing two groups with non-normal data or ordinal measurements.', example: 'Comparing pain ratings between treatment and placebo groups.', parametric: false },
t4: { name: 'Independent T-Test', description: 'Compares means of two independent groups assuming equal variances.', use: 'When comparing two unrelated groups with normal data and similar spreads.', example: 'Comparing test scores between two different teaching methods.', parametric: true },
t5: { name: "Welch's T-Test", description: 'Compares means of two independent groups without assuming equal variances.', use: 'When comparing two groups that may have different variability.', example: 'Comparing reaction times between young and elderly participants.', parametric: true },
t6: { name: 'Paired T-Test', description: 'Compares means of two related measurements on the same subjects.', use: 'When you have before-after measurements or matched pairs.', example: 'Testing if a training program improved employee performance.', parametric: true },
t7: { name: 'Wilcoxon Paired', description: 'Non-parametric test for paired data using ranks of differences.', use: 'When paired data is not normally distributed.', example: 'Comparing patient pain levels before and after treatment with skewed data.', parametric: false },
t8: { name: 'Kruskal-Wallis', description: 'Non-parametric comparison of three or more independent groups.', use: 'When comparing multiple groups with non-normal or ordinal data.', example: 'Comparing satisfaction ratings across four different product versions.', parametric: false },
t9: { name: 'One-Way ANOVA', description: 'Compares means across three or more groups simultaneously.', use: 'When comparing multiple groups with normal data and equal variances.', example: 'Testing if crop yields differ across three fertilizer types.', parametric: true },
t10: { name: "Welch's ANOVA", description: 'Robust ANOVA that does not assume equal variances across groups.', use: 'When comparing multiple groups that may have different variability.', example: 'Comparing salaries across departments with different spreads.', parametric: true },
t11: { name: 'Pearson Correlation', description: 'Measures the linear relationship between two continuous variables.', use: 'When assessing how strongly two variables move together linearly.', example: 'Examining the relationship between study hours and exam scores.', parametric: true },
t12: { name: 'Spearman Correlation', description: 'Measures monotonic relationships using ranks, robust to outliers.', use: 'When data is ordinal or the relationship is not linear.', example: 'Correlating education level with income category.', parametric: false },
t13: { name: 'Chi-Square Test', description: 'Tests association between two categorical variables.', use: 'When examining if two categorical variables are related.', example: 'Testing if smoking status is associated with disease incidence.', parametric: false },
t14: { name: "Fisher's Exact", description: 'Exact test for association in small sample contingency tables.', use: 'When expected cell counts are too small for chi-square.', example: 'Testing treatment effectiveness with only 20 patients total.', parametric: false }
})
mutable selectedTest = null
width = 1100
height = 580
// Category structure for fishbone
categories = [
{
name: "1 Sample vs Value",
color: "#e91e63",
tests: [
{ id: 't1', label: 'One-Sample T-Test', condition: 'Normal data' },
{ id: 't2', label: 'Wilcoxon Signed-Rank', condition: 'Non-normal' }
]
},
{
name: "2 Groups Independent",
color: "#9c27b0",
tests: [
{ id: 't4', label: 'Independent T-Test', condition: 'Normal, equal var' },
{ id: 't5', label: "Welch's T-Test", condition: 'Normal, unequal var' },
{ id: 't3', label: 'Mann-Whitney U', condition: 'Non-normal' }
]
},
{
name: "2 Groups Paired",
color: "#673ab7",
tests: [
{ id: 't6', label: 'Paired T-Test', condition: 'Normal data' },
{ id: 't7', label: 'Wilcoxon Paired', condition: 'Non-normal' }
]
},
{
name: "3+ Groups",
color: "#3f51b5",
tests: [
{ id: 't9', label: 'One-Way ANOVA', condition: 'Normal, equal var' },
{ id: 't10', label: "Welch's ANOVA", condition: 'Normal, unequal var' },
{ id: 't8', label: 'Kruskal-Wallis', condition: 'Non-normal' }
]
},
{
name: "Correlation",
color: "#00796b",
tests: [
{ id: 't11', label: 'Pearson', condition: 'Linear, normal' },
{ id: 't12', label: 'Spearman', condition: 'Monotonic/ordinal' }
]
},
{
name: "Categorical",
color: "#ff5722",
tests: [
{ id: 't13', label: 'Chi-Square', condition: 'Expected counts > 5' },
{ id: 't14', label: "Fisher's Exact", condition: 'Small sample' }
]
}
]
{
// Create wrapper div for controls + SVG
const wrapper = d3.create("div")
.style("position", "relative");
// Zoom controls
const controls = wrapper.append("div")
.style("display", "flex")
.style("gap", "8px")
.style("margin-bottom", "10px")
.style("justify-content", "center");
const buttonStyle = `
padding: 8px 16px;
border: none;
border-radius: 6px;
background: #1565c0;
color: white;
font-size: 14px;
font-weight: 600;
cursor: pointer;
transition: background 0.2s;
`;
const zoomInBtn = controls.append("button")
.attr("style", buttonStyle)
.text("+ Zoom In")
.on("mouseover", function() { d3.select(this).style("background", "#1976d2"); })
.on("mouseout", function() { d3.select(this).style("background", "#1565c0"); });
const zoomOutBtn = controls.append("button")
.attr("style", buttonStyle)
.text(" Zoom Out")
.on("mouseover", function() { d3.select(this).style("background", "#1976d2"); })
.on("mouseout", function() { d3.select(this).style("background", "#1565c0"); });
const resetBtn = controls.append("button")
.attr("style", buttonStyle.replace("#1565c0", "#546e7a").replace("#1976d2", "#607d8b"))
.text("Reset View")
.on("mouseover", function() { d3.select(this).style("background", "#607d8b"); })
.on("mouseout", function() { d3.select(this).style("background", "#546e7a"); });
const svg = wrapper.append("svg")
.attr("viewBox", [0, 0, width, height])
.attr("width", "100%")
.attr("height", 520)
.style("font-family", "-apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif")
.style("background", "linear-gradient(180deg, #fafbfc 0%, #f0f2f5 100%)")
.style("border-radius", "12px")
.style("box-shadow", "0 4px 12px rgba(0,0,0,0.08)")
.style("cursor", "grab");
const defs = svg.append("defs");
// Drop shadow
const filter = defs.append("filter")
.attr("id", "shadow")
.attr("x", "-20%").attr("y", "-20%")
.attr("width", "140%").attr("height", "140%");
filter.append("feDropShadow")
.attr("dx", "1").attr("dy", "2")
.attr("stdDeviation", "3")
.attr("flood-opacity", "0.12");
// Create container group for zoom/pan
const container = svg.append("g");
// Zoom behavior
const zoom = d3.zoom()
.scaleExtent([0.5, 3])
.on("zoom", (event) => {
container.attr("transform", event.transform);
});
svg.call(zoom);
// Button handlers
zoomInBtn.on("click", () => {
svg.transition().duration(300).call(zoom.scaleBy, 1.3);
});
zoomOutBtn.on("click", () => {
svg.transition().duration(300).call(zoom.scaleBy, 0.7);
});
resetBtn.on("click", () => {
svg.transition().duration(300).call(zoom.transform, d3.zoomIdentity);
});
// Main spine
const spineY = height / 2;
const spineStartX = 80;
const spineEndX = width - 80;
// Draw main spine (thick line)
container.append("line")
.attr("x1", spineStartX)
.attr("y1", spineY)
.attr("x2", spineEndX)
.attr("y2", spineY)
.attr("stroke", "#37474f")
.attr("stroke-width", 4)
.attr("stroke-linecap", "round");
// Head (effect) - "Which Statistical Test?"
const headGroup = container.append("g")
.attr("transform", `translate(${spineEndX + 10}, ${spineY})`);
headGroup.append("polygon")
.attr("points", "0,-35 100,-35 120,0 100,35 0,35")
.attr("fill", "#1565c0")
.attr("filter", "url(#shadow)");
headGroup.append("text")
.attr("x", 55)
.attr("y", -5)
.attr("text-anchor", "middle")
.attr("fill", "white")
.attr("font-size", "12px")
.attr("font-weight", "600")
.text("Which");
headGroup.append("text")
.attr("x", 55)
.attr("y", 12)
.attr("text-anchor", "middle")
.attr("fill", "white")
.attr("font-size", "12px")
.attr("font-weight", "600")
.text("Statistical Test?");
// Calculate positions for branches
const branchSpacing = (spineEndX - spineStartX - 40) / (categories.length);
// Draw branches
categories.forEach((cat, i) => {
const isTop = i % 2 === 0;
const branchX = spineStartX + 60 + (i * branchSpacing);
const branchEndY = isTop ? spineY - 180 : spineY + 180;
const direction = isTop ? -1 : 1;
// Main branch line
container.append("line")
.attr("x1", branchX)
.attr("y1", spineY)
.attr("x2", branchX)
.attr("y2", branchEndY)
.attr("stroke", cat.color)
.attr("stroke-width", 3)
.attr("stroke-linecap", "round");
// Category label box
const labelGroup = container.append("g")
.attr("transform", `translate(${branchX}, ${branchEndY + direction * 25})`);
labelGroup.append("rect")
.attr("x", -65)
.attr("y", -14)
.attr("width", 130)
.attr("height", 28)
.attr("rx", 14)
.attr("fill", cat.color)
.attr("filter", "url(#shadow)");
labelGroup.append("text")
.attr("text-anchor", "middle")
.attr("dominant-baseline", "middle")
.attr("fill", "white")
.attr("font-size", "11px")
.attr("font-weight", "600")
.text(cat.name);
// Draw test nodes along the branch
const testSpacing = 140 / (cat.tests.length + 1);
cat.tests.forEach((test, j) => {
const testY = spineY + direction * (40 + (j + 1) * testSpacing);
const testX = branchX + (isTop ? 70 : -70);
// Small branch to test
container.append("line")
.attr("x1", branchX)
.attr("y1", testY)
.attr("x2", testX - (isTop ? 5 : -5))
.attr("y2", testY)
.attr("stroke", cat.color)
.attr("stroke-width", 2)
.attr("opacity", 0.7);
// Test node group
const testGroup = container.append("g")
.attr("transform", `translate(${testX}, ${testY})`)
.style("cursor", "pointer")
.on("click", () => {
mutable selectedTest = testDescriptions[test.id];
})
.on("mouseover", function() {
d3.select(this).select("rect")
.transition().duration(150)
.attr("transform", "scale(1.05)")
.attr("stroke-width", 3);
})
.on("mouseout", function() {
d3.select(this).select("rect")
.transition().duration(150)
.attr("transform", "scale(1)")
.attr("stroke-width", 2);
});
// Test box
const boxWidth = 115;
testGroup.append("rect")
.attr("x", isTop ? 0 : -boxWidth)
.attr("y", -20)
.attr("width", boxWidth)
.attr("height", 40)
.attr("rx", 6)
.attr("fill", "white")
.attr("stroke", cat.color)
.attr("stroke-width", 2)
.attr("filter", "url(#shadow)");
// Test name
testGroup.append("text")
.attr("x", isTop ? boxWidth/2 : -boxWidth/2)
.attr("y", -4)
.attr("text-anchor", "middle")
.attr("fill", "#37474f")
.attr("font-size", "10px")
.attr("font-weight", "600")
.text(test.label);
// Condition (when to use)
testGroup.append("text")
.attr("x", isTop ? boxWidth/2 : -boxWidth/2)
.attr("y", 10)
.attr("text-anchor", "middle")
.attr("fill", "#455a64")
.attr("font-size", "9px")
.attr("font-weight", "500")
.text(test.condition);
});
});
// Title at the start
container.append("text")
.attr("x", 40)
.attr("y", spineY - 8)
.attr("text-anchor", "start")
.attr("fill", "#37474f")
.attr("font-size", "13px")
.attr("font-weight", "600")
.text("Research");
container.append("text")
.attr("x", 40)
.attr("y", spineY + 10)
.attr("text-anchor", "start")
.attr("fill", "#37474f")
.attr("font-size", "13px")
.attr("font-weight", "600")
.text("Goal");
// Legend
const legendY = height - 35;
container.append("text")
.attr("x", width / 2)
.attr("y", legendY)
.attr("text-anchor", "middle")
.attr("fill", "#546e7a")
.attr("font-size", "12px")
.attr("font-weight", "500")
.text("Click any test • Scroll to zoom • Drag to pan");
return wrapper.node();
}
```
```{ojs}
//| echo: false
{
if (selectedTest) {
return html`
<div style="background: linear-gradient(135deg, #ffffff, #f8f9fa); border: 1px solid #e3f2fd; border-radius: 10px; padding: 14px 16px; margin-top: 14px; box-shadow: 0 2px 8px rgba(0,0,0,0.06);">
<div style="display: flex; align-items: center; gap: 10px; margin-bottom: 10px;">
<h4 style="margin: 0; color: #1565c0; font-size: 16px; font-weight: 600;">${selectedTest.name}</h4>
<span style="display: inline-block; padding: 2px 8px; border-radius: 12px; font-size: 10px; font-weight: 600; background: ${selectedTest.parametric ? '#e8f5e9' : '#fce4ec'}; color: ${selectedTest.parametric ? '#2e7d32' : '#c2185b'};">
${selectedTest.parametric ? 'Parametric' : 'Non-Parametric'}
</span>
</div>
<div style="display: grid; gap: 8px;">
<div style="background: white; padding: 8px 12px; border-radius: 6px; border-left: 3px solid #1976d2;">
<strong style="color: #455a64; font-size: 10px; text-transform: uppercase; letter-spacing: 0.5px;">Description</strong>
<p style="margin: 2px 0 0 0; color: #37474f; font-size: 13px; line-height: 1.4;">${selectedTest.description}</p>
</div>
<div style="background: white; padding: 8px 12px; border-radius: 6px; border-left: 3px solid #43a047;">
<strong style="color: #455a64; font-size: 10px; text-transform: uppercase; letter-spacing: 0.5px;">When to Use</strong>
<p style="margin: 2px 0 0 0; color: #37474f; font-size: 13px; line-height: 1.4;">${selectedTest.use}</p>
</div>
<div style="background: white; padding: 8px 12px; border-radius: 6px; border-left: 3px solid #fb8c00;">
<strong style="color: #455a64; font-size: 10px; text-transform: uppercase; letter-spacing: 0.5px;">Example</strong>
<p style="margin: 2px 0 0 0; color: #37474f; font-size: 13px; line-height: 1.4;">${selectedTest.example}</p>
</div>
</div>
</div>
`;
} else {
return html`
<div style="background: linear-gradient(135deg, #e3f2fd, #f5f5f5); border: 1px dashed #90caf9; border-radius: 10px; padding: 20px; margin-top: 14px; text-align: center;">
<p style="margin: 0; color: #546e7a; font-size: 13px;">Click on any <span style="color: #1565c0; font-weight: 600;">test box</span> in the diagram above to see detailed information</p>
</div>
`;
}
}
```
### Quick Reference Table

| Scenario | Default Test | Alternative (non-normal or small samples) |
|----------|--------------|-------------------------------------------|
| 1 sample vs known value | One-Sample T-Test | Wilcoxon Signed-Rank |
| 2 independent groups | Independent T-Test / Welch's | Mann-Whitney U |
| 2 paired/matched groups | Paired T-Test | Wilcoxon Signed-Rank |
| 3+ independent groups | One-Way ANOVA / Welch's | Kruskal-Wallis |
| Correlation (continuous) | Pearson | Spearman / Kendall |
| Association (categorical) | Chi-Square | Fisher's Exact |

Selecting the appropriate test follows a logical sequence:
First, identify your research question. Are you comparing groups, measuring association, or testing against a known value? Are you examining one variable, two variables, or more?
Second, characterize your data. Is the outcome continuous, ordinal, or categorical? How many groups or variables are involved? Is the design independent or paired/related?
Third, check assumptions. Is the data approximately normal? Are variances equal across groups? Are expected frequencies sufficient for chi-square?
Fourth, select the test that matches your question, data type, and satisfied assumptions. When assumptions are violated, choose robust alternatives.
Finally, remember that statistical tests answer narrow questions. They indicate whether effects exist, not whether they matter. Always supplement significance tests with effect sizes, confidence intervals, and practical interpretation.
## Conclusion
The diversity of statistical tests reflects the diversity of research questions and data types we encounter. No single test serves all purposes; no universal approach handles all situations. The skilled analyst matches tools to tasks, selecting tests whose assumptions align with data characteristics and whose outputs address research questions.
This matching process requires both technical knowledge and practical judgment. Knowing what each test does, what it assumes, and when it fails empowers researchers to extract valid insights from data while avoiding the pitfalls of misapplied methods.
Statistical tests are not arbitrary rituals but carefully designed tools, each optimized for specific purposes. Understanding their logic—not just their mechanics—transforms test selection from cookbook following to principled reasoning. And principled reasoning, ultimately, is what separates meaningful analysis from statistical theater.