The Kirkpatrick Model: Evaluating Training at Four Levels

The Kirkpatrick Model is a four-level framework for evaluating the effectiveness of training programs, developed by Donald Kirkpatrick in 1959 and refined through subsequent editions of his work. It remains the most widely cited evaluation model in the organizational learning and development field, structuring how practitioners and researchers measure outcomes from learner reaction through demonstrable business results. The framework applies across corporate, government, and nonprofit training contexts and serves as a foundational reference for measuring training effectiveness within modern L&D practice.


Definition and scope

The Kirkpatrick Model defines four sequential levels of training evaluation: Reaction, Learning, Behavior, and Results. Each level builds on the one below it, forming a causal chain that connects immediate learner experience to long-term organizational outcomes. The model was first formally published by Donald Kirkpatrick in a 1959 series in the Journal of the American Society of Training Directors (now TD Magazine, published by the Association for Talent Development).

The scope of the model covers any structured training intervention — from a single-session compliance course to a multi-month leadership development program — and applies equally to instructor-led, digital (eLearning), and blended formats. The model does not prescribe specific data collection instruments; it defines the categories of evidence that constitute a complete evaluation.

A 2016 update by Jim and Wendy Kirkpatrick introduced the "New World Kirkpatrick Model," which resequenced program design to begin at Level 4 and work backward to Level 1, emphasizing that evaluation planning must precede training design rather than follow it (Kirkpatrick Partners).


How it works

The four levels operate as a structured hierarchy:

  1. Level 1 — Reaction: Measures how participants respond to the training experience immediately after delivery. Common instruments include post-training surveys, Net Promoter Score-style ratings, and facilitator observation. The Association for Talent Development notes that Level 1 data is the most frequently collected but the least predictive of downstream outcomes on its own.

  2. Level 2 — Learning: Assesses the degree to which participants acquired the intended knowledge, skills, attitudes, confidence, or commitment. Pre/post testing, skills demonstrations, and simulations are standard instruments. This level aligns with the evidence requirements in competency frameworks used to certify proficiency against defined standards.

  3. Level 3 — Behavior: Evaluates whether participants apply what they learned on the job, typically measured 30 to 90 days post-training through supervisor observation, 360-degree feedback, or structured performance reviews. Transfer to behavior is contingent on environmental factors — management reinforcement, opportunity to practice, and removal of structural barriers — that the training itself cannot guarantee.

  4. Level 4 — Results: Connects training outcomes to organizational metrics such as error rate reduction, productivity increases, revenue growth, customer satisfaction scores, or regulatory compliance rates. This level provides the evidentiary basis for calculating return on investment in training.

The causal chain is important: positive Level 1 scores do not guarantee Level 2 acquisition; Level 2 acquisition does not guarantee Level 3 transfer; and Level 3 transfer does not automatically produce Level 4 results. Each transition depends on factors beyond the training design itself.
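The gating logic in that chain can be sketched as a minimal evidence check. The record fields and thresholds below (a 3.0 reaction cutoff, a positive learning gain) are illustrative assumptions for the sketch, not part of the model itself:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluationRecord:
    """One participant's evidence at each Kirkpatrick level (illustrative fields)."""
    reaction_score: Optional[float] = None    # Level 1: survey rating, 0-5 scale
    learning_gain: Optional[float] = None     # Level 2: post-test minus pre-test score
    behavior_observed: Optional[bool] = None  # Level 3: supervisor-confirmed transfer
    result_delta: Optional[float] = None      # Level 4: change in the target metric

def highest_supported_level(rec: EvaluationRecord) -> int:
    """Return the highest level with positive evidence.

    Each check is gated on the one below it, mirroring the causal chain:
    a later level counts only if every earlier level is also satisfied.
    """
    level = 0
    if rec.reaction_score is not None and rec.reaction_score >= 3.0:
        level = 1
        if rec.learning_gain is not None and rec.learning_gain > 0:
            level = 2
            if rec.behavior_observed:
                level = 3
                if rec.result_delta is not None and rec.result_delta > 0:
                    level = 4
    return level

# A participant who liked the course and learned, but never changed behavior,
# supports claims only through Level 2:
rec = EvaluationRecord(reaction_score=4.5, learning_gain=0.2, behavior_observed=False)
print(highest_supported_level(rec))  # 2
```

The nested gating makes the point structurally: a strong Level 4 result metric cannot rescue an evaluation in which the intermediate levels were never evidenced.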


Common scenarios

The Kirkpatrick Model appears across distinct training contexts, each emphasizing different levels:

Compliance training: Organizations subject to regulatory mandates — such as OSHA safety training or HIPAA privacy requirements — typically prioritize Level 2 (demonstrated knowledge acquisition) and Level 3 (verified behavioral compliance) over Level 1 satisfaction data. A passing score on a post-test satisfies the documentation requirement; whether behavior changed on the floor determines actual risk exposure. The compliance training sector uses the model to structure audit-ready evaluation records.

Onboarding programs: New-hire onboarding programs frequently run all four levels over a 90-day window. Level 1 data from week one identifies immediate gaps in content delivery; Level 2 assessments verify role-specific knowledge; Level 3 observations at 30 and 60 days track behavioral integration; Level 4 metrics such as time-to-productivity and 90-day retention rates close the loop.
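The two closing metrics reduce to simple arithmetic. This sketch assumes time-to-productivity is tracked as calendar days from hire to a role-specific milestone; the function names and figures are hypothetical:

```python
from datetime import date

def days_to_productivity(hire_date: date, productive_date: date) -> int:
    """Time-to-productivity: days from hire until the new hire meets the role bar."""
    return (productive_date - hire_date).days

def retention_rate_90d(hired: int, still_employed_at_90d: int) -> float:
    """Share of a hiring cohort still employed 90 days after their start date."""
    if hired == 0:
        raise ValueError("empty cohort")
    return still_employed_at_90d / hired

# Hypothetical cohort: hired Jan 8, productive Feb 19; 17 of 20 hires retained.
print(days_to_productivity(date(2024, 1, 8), date(2024, 2, 19)))  # 42
print(retention_rate_90d(hired=20, still_employed_at_90d=17))     # 0.85
```

In practice both figures are compared against a pre-training baseline cohort; the raw numbers alone do not attribute the change to the onboarding program.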

Leadership development: Extended programs covering soft skills training, decision-making, and team management face the highest measurement challenge at Level 3 and Level 4 because leadership behavior changes slowly and results are multi-causal. Practitioners often supplement Kirkpatrick Level 4 data with the 70-20-10 learning model to attribute outcome contributions across formal training, social learning, and on-the-job experience.

Technical skills programs: Programs tied to skills gap analysis findings tend to show cleaner Level 2 to Level 4 correlations because the associated performance metrics are more discrete and traceable.


Decision boundaries

The Kirkpatrick Model is not the appropriate evaluation tool in every context, and practitioners recognize its limits within the broader learning and development strategy landscape.

Kirkpatrick vs. Phillips ROI Model: Jack Phillips extended the Kirkpatrick framework by adding a fifth level — Return on Investment — that converts Level 4 results into a percentage ROI figure using an isolation methodology and benefit-cost ratio. The Phillips model is more resource-intensive and is typically reserved for high-cost or strategically critical programs, whereas Kirkpatrick Levels 1–3 serve routine evaluation needs.
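The Level 5 arithmetic itself is straightforward once benefits have been isolated and monetized, which is the resource-intensive part. A minimal sketch, assuming those figures are already in hand (the dollar amounts are illustrative):

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """BCR: monetized program benefits divided by fully loaded program costs."""
    return benefits / costs

def roi_percent(benefits: float, costs: float) -> float:
    """Phillips ROI (%): net benefits divided by costs, times 100."""
    return (benefits - costs) / costs * 100

# A program costing $50,000 that produced $120,000 in isolated, monetized benefits:
print(benefit_cost_ratio(120_000, 50_000))  # 2.4
print(roi_percent(120_000, 50_000))         # 140.0
```

Note the asymmetry: a BCR of 1.0 (break-even) corresponds to an ROI of 0%, so the two figures answer the same question on different scales.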

When Level 4 data is not feasible: For short-duration programs, externally delivered training, or contexts where organizational metrics are unavailable, practitioners often cap evaluation at Level 3. A training needs assessment conducted before program design determines whether the organizational infrastructure exists to capture Level 4 evidence.

Structural prerequisites: Level 3 and Level 4 measurement require managerial involvement, performance data access, and pre-established baselines. Without these, evaluation defaults to Level 1 and Level 2 by necessity — not by design. An organization's learning management system (LMS) infrastructure directly determines which data collection methods are operationally viable.

The model's four-level structure is also echoed in federal workforce development contexts. The Workforce Innovation and Opportunity Act (WIOA, 29 U.S.C. § 3101 et seq.) directs state agencies to measure skill gains and employment outcomes — a framework conceptually aligned with Kirkpatrick Levels 2 and 4 — as conditions of federal funding accountability.

Across the learning and development field, practitioners use the Kirkpatrick Model as a baseline reference against which more specialized evaluation frameworks are calibrated.

