Item response theory

Item Response Theory (IRT)

Item response theory or IRT is one of the psychometric theories. It can be used

IRT is a family of mathematical models describing the probability that examinees with different knowledge levels will give a correct (or incorrect) answer to an assessment item. It is based on two postulates2):

Although the principle of adapting the test to the examinee was recognized as good practice in psychological measurement as early as the middle of the 20th century (for example in Alfred Binet's test, later the Stanford-Binet IQ Test3)), it was an unscalable, time-consuming task. Computers made it possible to implement and automate IRT procedures.

Item Characteristic Curve (ICC)

Every test item is described by its item characteristic curve (ICC), defined by the function P(θ). This function gives the probability that a subject with knowledge level θ will answer the given test item correctly. It is defined on the set of real numbers, but is often plotted only on the interval [-3,3]. There are various kinds of item characteristic curves, depending on how many parameters are included: most commonly one, two or three.

The One-Parameter Model

The one-parameter model, also called the Rasch model, was introduced in the 1960s by Georg Rasch.4) The equation describing the Rasch model is:
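The formula itself is not reproduced in this text. As a hedged reconstruction, the Rasch model is commonly written in the following logistic form, where the symbol b (an assumption here, not taken from the original page) denotes the difficulty of the item:

```latex
P(\theta) = \frac{1}{1 + e^{-(\theta - b)}}
```

In this form an examinee whose knowledge level equals the item difficulty (θ = b) answers correctly with probability 0.5.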

The Two-Parameter Model

The two-parameter model uses the logistic function, which was first used to describe the item characteristic curve in the 1950s. It is defined as:
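The formula is not reproduced in this text. A common way of writing the two-parameter logistic model, given here as a sketch with assumed symbols (a for item discrimination, b for item difficulty), is:

```latex
P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}}
```

Setting a = 1 gives back the usual one-parameter form. Some authors additionally multiply the exponent by a scaling constant; that variant is not shown here.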

The Three-Parameter Model

A component that accounts for the probability of guessing the correct answer was added to the two-parameter model in 1968 by Birnbaum, creating the three-parameter model, which is described by the following formula:
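The formula is not reproduced in this text. A common form of the three-parameter logistic model is P(θ) = c + (1 - c) / (1 + e^(-a(θ - b))). The sketch below evaluates it in Python. The function name `icc_3pl` and the parameter names a (discrimination), b (difficulty) and c (pseudo-guessing) are assumptions made for illustration.

```python
import math


def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct answer under a three-parameter logistic ICC.

    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    With c=0 this is the two-parameter form; with c=0 and a=1 it is the
    one-parameter form. All names are illustrative assumptions.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Tabulate the curve on the interval that is usually plotted.
    for theta in (-3, -2, -1, 0, 1, 2, 3):
        print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.2), 3))
```

With c > 0 the curve never drops below c, which models an examinee who guesses, and at θ = b the probability is c + (1 - c) / 2 rather than 0.5.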

Often, a further extension of the formula is used5)6):

Example of an item characteristic curve. Image borrowed from: Orr, Cornelia. The ABC’s of Pattern Scoring. Florida Department of Education.

The Item Information Function (IIF)

Example of a few item information functions. Image borrowed from: Weiss, David J. University of Minnesota. IRT-Based CAT.

The item information function (IIF), or simply information function, measures the amount of information an assessment item provides at a given ability level. It is usually calculated as:
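The formula is not reproduced in this text. One common expression for the information of an item under the three-parameter logistic model is I(θ) = a² · (Q/P) · ((P - c)/(1 - c))², with P = P(θ) and Q = 1 - P. For c = 0 it simplifies to a²·P·Q. The sketch below assumes that form; the function and parameter names are illustrative and not taken from the original page.

```python
import math


def icc_3pl(theta, a, b, c=0.0):
    """Three-parameter logistic ICC (assumed form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c=0.0):
    """Item information at ability theta for a 3PL item (assumed form).

    Reduces to a**2 * P * Q when the pseudo-guessing parameter c is 0.
    """
    p = icc_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    for theta in (-2, -1, 0, 1, 2):
        print(theta, round(item_information(theta, a=1.5, b=0.0, c=0.2), 4))
```

For an item without guessing the information peaks at θ = b with height a²/4, so a more discriminating item gives a taller, narrower curve.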

On the IIF graph, the location of the center of the IIF reflects the difficulty of the item, the height of the IIF reflects the item discrimination, and its asymmetry reflects the magnitude of the pseudo-guessing parameter.7)

In the usual CAT approach, at any point of the assessment, the next item presented is the one that provides the most information at the currently assessed ability level.8)
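This selection rule can be sketched as a maximum search over the items not yet presented. The example assumes a hypothetical item bank stored as a list of (a, b, c) tuples and the three-parameter information formula; neither the data layout nor the function names come from the original page.

```python
import math


def item_information(theta, a, b, c=0.0):
    """Information of a 3PL item at ability theta (assumed formula)."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta, item_bank, administered):
    """Return the index of the unused item with maximal information.

    item_bank: list of (a, b, c) tuples (hypothetical layout).
    administered: set of indices already presented.
    Raises ValueError if every item has already been presented.
    """
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *item_bank[i]))


if __name__ == "__main__":
    bank = [(1.0, -2.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    print(select_next_item(0.0, bank, set()))
```

For items that differ only in difficulty and have no guessing parameter, the rule simply picks the unused item whose difficulty is closest to the current ability estimate.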

In CAT tests

Computer adaptive testing (CAT)

Computer adaptive testing or CAT refers to

IRT-based CAT algorithm

The development of IRT models provided a theoretical basis for the principles and procedures of CAT and for their further improvement. A simple CAT algorithm runs as follows:11)

  1. Unless known otherwise, the initial ability level θ is assumed to be in the middle of the scale; θ=0.
  2. Define step size S, for example 3 (the larger the value, the sooner the following loop will finish).
  3. Repeat
    1. Based on the IIF, select the item providing the most information for the current level θ and present it to the examinee.
    2. If the answer is correct, θ=θ+S; otherwise θ=θ-S.
  4. Until a correct and an incorrect answer have each been obtained at least once.
  5. Repeat
    1. Estimate the a posteriori probability distribution of the examinee's knowledge using Bayes' rule.
    2. Estimate the knowledge level as the median of the calculated distribution.
    3. Present the item providing the most information for the estimated knowledge level (based on the IIF) to the examinee and obtain an answer.
  6. Until a stopping rule is satisfied (for example, the maximum number of presented items is reached or the estimated knowledge distribution has a small enough variance).
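The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not part of the original description: a three-parameter logistic ICC, an item bank stored as (a, b, c) tuples, a uniform prior evaluated on a grid over [-4, 4], all responses (including those from the first loop) entering the posterior, and a hypothetical callback `answer(i)` that reports whether item i was answered correctly.

```python
import math


def icc(theta, a, b, c):
    """Three-parameter logistic ICC (assumed model for this sketch)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b, c):
    """Item information for the assumed 3PL model."""
    p = icc(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def pick_item(theta, bank, used):
    """Index of the unused item with the most information at theta."""
    free = [i for i in range(len(bank)) if i not in used]
    return max(free, key=lambda i: information(theta, *bank[i]))


def posterior(grid, bank, responses):
    """Bayes' rule on a grid, starting from a uniform prior (assumption)."""
    weights = []
    for t in grid:
        like = 1.0
        for i, correct in responses:
            p = icc(t, *bank[i])
            like *= p if correct else 1.0 - p
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]


def median_and_variance(grid, probs):
    """Median and variance of a discrete distribution on the grid."""
    cum, median = 0.0, grid[-1]
    for t, p in zip(grid, probs):
        cum += p
        if cum >= 0.5:
            median = t
            break
    mean = sum(t * p for t, p in zip(grid, probs))
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, probs))
    return median, var


def run_cat(bank, answer, step=3.0, max_items=20, var_limit=0.1):
    """Run the two-phase procedure; returns (theta, variance, items used)."""
    grid = [x / 10.0 for x in range(-40, 41)]
    limit = min(max_items, len(bank))
    theta, responses, used = 0.0, [], set()      # step 1: start at theta = 0
    # Steps 3-4: fixed-step search until both answer types have occurred.
    while len(used) < limit:
        i = pick_item(theta, bank, used)
        used.add(i)
        correct = bool(answer(i))
        responses.append((i, correct))
        theta += step if correct else -step
        if len({r for _, r in responses}) == 2:
            break
    # Steps 5-6: Bayesian updating until a stopping rule is satisfied.
    while True:
        probs = posterior(grid, bank, responses)
        theta, var = median_and_variance(grid, probs)
        if len(used) >= limit or var <= var_limit:
            return theta, var, len(used)
        i = pick_item(theta, bank, used)
        used.add(i)
        responses.append((i, bool(answer(i))))
```

A caller supplies the item bank and the `answer` callback, for example `run_cat(bank, lambda i: ask_examinee(i))`, where `ask_examinee` stands for whatever hypothetical routine presents the item. The grid range, the prior and the default thresholds are arbitrary choices for the sketch and would need to be tuned for a real test.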