Executive summary
AILAT is an adaptive assessment that measures AI literacy across four dimensions: conceptual knowledge, practical application, critical evaluation, and ethics. The test is 20 items long and takes roughly 20 minutes to complete.
Questions are selected using a two-parameter logistic Item Response Theory model. After each answer, the test updates a proficiency estimate and chooses the next question to maximize measurement information at that proficiency. Open-ended responses are scored by a language-model rubric calibrated against the same dimension framework.
The result is a dimension-level profile, an AI Literacy Level from 1 to 5, and a set of learning recommendations tied to the user's weakest dimensions and industry context.
AILAT is designed to measure, not sort. The score is a starting point for a learning path — not a gate, not a rank, not a hiring signal.
Four dimensions of AI literacy
AI literacy is not one skill. AILAT treats it as four, following the competency structure in Long & Magerko (2020) and later refinements by Ng et al. (2021) and Carolus et al. (2022).
- Conceptual knowledge: how AI systems work — terminology, capabilities, limits, and the mechanisms behind modern models.
- Use and application: working with AI tools in practice — prompting, integration, and fitting AI into existing workflows.
- Evaluation and creation: judging AI output for accuracy, fit, and failure modes; building AI-assisted solutions on top of that judgment.
- Ethics: bias, privacy, transparency, and accountability — the responsible-use layer that sits across the other three.
Psychometric foundation
AILAT is scored using Item Response Theory (IRT), not raw counts. In an IRT model, each question has known parameters and each respondent has an unknown proficiency θ. The model predicts the probability that a given respondent will answer a given question correctly.
AILAT uses the two-parameter logistic (2PL) model, which assigns each item a difficulty (b, ranging −3 to +3) and a discrimination (a, ranging 0.5 to 2.5). Discrimination controls how sharply the question separates high- and low-proficiency respondents.
P(θ) = 1 / (1 + e^(−a(θ − b)))
Unlike classical test theory, IRT gives us a natural measure of how much information a question provides at a specific proficiency level. That information function is what makes adaptive testing possible — we can pick the next question to ask by maximizing information at the current θ.
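Both the response probability and the information function are small enough to state directly. A minimal sketch in plain JavaScript, mirroring the 2PL definitions above (function names are illustrative):

```javascript
// 2PL response probability: P(θ) = 1 / (1 + e^(−a(θ − b)))
function probability(a, b, theta) {
  return 1 / (1 + Math.exp(-a * (theta - b)));
}

// Fisher information for a 2PL item: I(θ) = a² · P(θ) · (1 − P(θ))
function information(a, b, theta) {
  const p = probability(a, b, theta);
  return a ** 2 * p * (1 - p);
}

// At θ = b the item is a coin flip, and information peaks at a²/4.
probability(2.0, 0.5, 0.5); // → 0.5
information(2.0, 0.5, 0.5); // → 1.0
```

Because I(θ) is maximized where P(θ) = 0.5, high-discrimination items whose difficulty sits near the current estimate dominate adaptive selection.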
Assessment flow
The test runs in three phases. Each phase has a distinct job, and the phase boundaries let us combine breadth, depth, and contextual reasoning without making the whole test adaptive from the first question.
Phase 1 · Calibration
5 questions · fixed, mixed difficulty
One question from each dimension plus a fifth balancer, used to establish an initial proficiency estimate before adaptation begins.
Phase 2 · Adaptive
10 questions · 8 IRT-selected MC + 2 open-ended
Each multiple-choice item is chosen to maximize information at the running proficiency estimate. Two open-ended prompts probe reasoning that multiple-choice cannot capture.
Phase 3 · Scenario
5 questions · industry-contextualized
Four multiple-choice items and one open-ended capstone, framed around a scenario chosen from the respondent's stated industry.
Proficiency is re-estimated after every answer, and respondents are assigned to one of three adaptive tracks. Foundational (θ ≤ −1.0) emphasizes easy items and fuller explanations. Standard (−1.0 < θ < 1.0) mixes difficulty evenly. Advanced (θ ≥ 1.0) prioritizes medium and difficult items to tighten the estimate at the top of the scale.
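The track thresholds translate directly into code. A sketch under the boundaries stated above (the function name `assignTrack` is illustrative, not from the AILAT codebase):

```javascript
// Map the running proficiency estimate to an adaptive track.
// Boundary cases follow the text: θ ≤ −1.0 is Foundational,
// θ ≥ 1.0 is Advanced, everything strictly between is Standard.
function assignTrack(theta) {
  if (theta <= -1.0) return "foundational";
  if (theta >= 1.0) return "advanced";
  return "standard";
}

assignTrack(-1.0); // → "foundational"
assignTrack(0.3);  // → "standard"
assignTrack(1.2);  // → "advanced"
```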
Adaptive algorithms
Two small algorithms do most of the work: selecting the next question, and updating the proficiency estimate after each answer.
Question selection. From the pool of candidate items matching the current track and the weakest dimension, pick the one that maximizes Fisher information at the current θ.
Listing 1 — Maximum-information question selection
function findMaxInfoQuestion(questions, proficiency) {
  // Scan the candidate pool and keep the item with the highest
  // Fisher information at the current proficiency estimate.
  const best = questions.reduce((best, question) => {
    const info = calculateInformation(
      question.IRT_parameters.discrimination,
      question.IRT_parameters.difficulty,
      proficiency,
    );
    return !best || info > best.info ? { question, info } : best;
  }, null);
  return best ? best.question : null; // null if the pool is empty
}

function calculateInformation(a, b, theta) {
  // 2PL item information: I(θ) = a² · P(θ) · (1 − P(θ)).
  const p = calculateProbability(a, b, theta);
  return a ** 2 * p * (1 - p);
}

function calculateProbability(a, b, theta) {
  // 2PL response probability.
  return 1 / (1 + Math.exp(-a * (theta - b)));
}

Proficiency update. After each response, update θ using a bounded gradient step toward the maximum-likelihood direction. The step size is intentionally small (0.4) so that a single lucky or unlucky answer doesn't swing the estimate.
Listing 2 — Gradient-step proficiency update
function updateProficiency(theta, a, b, isCorrect) {
  const stepSize = 0.4; // small step so one answer can't swing θ
  const p = calculateProbability(a, b, theta);
  // Gradient of the 2PL log-likelihood with respect to θ.
  const gradient = a * ((isCorrect ? 1 : 0) - p);
  // Clamp to the same [−3, 3] range as the difficulty scale.
  return Math.max(-3, Math.min(3, theta + stepSize * gradient));
}

Open-ended evaluation. Each open-ended response is scored by an LLM against dimension-specific guidance combined with per-item criteria — conceptual accuracy for knowledge items, practical application for use-and-application items, and critical analysis for evaluation items. The per-item rubric is not published, which prevents respondents from optimizing for the scorer rather than demonstrating understanding.
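Listings 1 and 2 compose into a simple adaptive loop. A self-contained sketch simulating one respondent (the helper definitions are repeated from the listings; the item pool and true θ are illustrative, and the deterministic answer rule is a simplification to keep the run reproducible):

```javascript
// Repeated from Listings 1–2 so the sketch runs standalone.
function calculateProbability(a, b, theta) {
  return 1 / (1 + Math.exp(-a * (theta - b)));
}
function updateProficiency(theta, a, b, isCorrect) {
  const stepSize = 0.4;
  const p = calculateProbability(a, b, theta);
  const gradient = a * ((isCorrect ? 1 : 0) - p);
  return Math.max(-3, Math.min(3, theta + stepSize * gradient));
}

// Simulate a respondent with true proficiency 1.5 answering ten
// items of increasing difficulty. Answers are deterministic here:
// correct whenever the true θ clears the item difficulty.
const trueTheta = 1.5;
let estimate = 0; // neutral starting estimate
for (let b = -2; b < 3; b += 0.5) {
  const a = 1.5; // fixed discrimination for the sketch
  estimate = updateProficiency(estimate, a, b, trueTheta > b);
}
// The bounded 0.4 step keeps the estimate inside [−3, 3] and walks
// it toward the respondent's true level without letting any single
// answer dominate.
```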
Scoring and learning paths
The final profile combines three signals: dimension-level multiple-choice scores weighted by item information, rubric-scored open-ended responses, and performance on the scenario capstone. These feed a single AI Literacy Level on a 1–5 scale.
- Level 1, Baseline awareness: Recognizes AI as a category. Can use a tool with step-by-step guidance. Limited ability to judge output.
- Level 2, Informed user: Uses AI tools for practical tasks. Identifies obvious errors and hallucinations.
- Level 3, Capable practitioner: Integrates AI into workflows. Evaluates output systematically. Understands core concepts like tokens, context, and fine-tuning.
- Level 4, Critical evaluator: Reasons about bias, failure modes, and appropriate use. Designs AI solutions with safeguards.
- Level 5, AI champion: Shapes strategy around AI capability. Can teach others. Engages with the frontier of the field.
Each level has a matching recommendation template: resource types, depth, and progression strategy calibrated for where the respondent starts. The path is further filtered by dimension weakness and stated industry.
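The document does not publish the exact aggregation formula; one reading consistent with "weighted by item information" is an information-weighted average per dimension. A sketch under that assumption (the `dimensionScore` name and response shape are illustrative):

```javascript
// Weight each answered item by the Fisher information it carried at
// the final θ, so high-information items dominate the dimension score.
// Each response is assumed to look like { a, b, isCorrect }.
function dimensionScore(responses, theta) {
  let weighted = 0;
  let totalInfo = 0;
  for (const { a, b, isCorrect } of responses) {
    const p = 1 / (1 + Math.exp(-a * (theta - b)));
    const info = a ** 2 * p * (1 - p);
    weighted += info * (isCorrect ? 1 : 0);
    totalInfo += info;
  }
  return totalInfo > 0 ? weighted / totalInfo : 0;
}
```

Under this scheme an all-correct dimension scores 1, an all-wrong dimension scores 0, and partial credit tilts toward the items that were most informative at the respondent's final estimate.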
Ethics and reliability
Questions are reviewed for cultural and demographic bias before entering the bank. Personal data collection is minimal — the test records industry, role, and motivation because they materially change the scenario phase, and nothing beyond that.
Session state persists across interruptions so respondents can resume without losing progress. If the LLM scoring service is unavailable, the open-ended evaluator falls back to a rule-based scorer that preserves comparability at a coarser grain; the fallback is flagged in the result.
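The rule-based fallback is not described in detail. A minimal sketch of what a coarser criteria-matching scorer could look like — the criteria lists, the hit-rate thresholds, and the 0–2 scale are all assumptions, not AILAT's actual rubric:

```javascript
// Hypothetical coarse fallback: count which expected criteria the
// response mentions and map the hit rate onto a 0–2 score, flagging
// in the result that the LLM scorer was bypassed.
function fallbackScore(response, criteria) {
  const text = response.toLowerCase();
  const hits = criteria.filter((c) => text.includes(c.toLowerCase())).length;
  const ratio = criteria.length > 0 ? hits / criteria.length : 0;
  const score = ratio >= 0.75 ? 2 : ratio >= 0.35 ? 1 : 0;
  return { score, fallback: true };
}
```

Keyword matching cannot judge reasoning quality, which is why the text describes the fallback as comparable only "at a coarser grain" and why the flag must surface in the result.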
The AI Literacy Level is a measurement of current understanding. It is not a hiring signal, not a proxy for job fitness, and not a ranking of respondents against each other.
Limitations
AILAT measures knowledge and reasoning about AI, not hands-on execution with any specific tool. A respondent who scores at Level 4 can reason about model limitations; they may or may not be fluent in any particular API.
The field moves faster than the test. The item bank is reviewed quarterly, and older items that reference deprecated model capabilities are retired rather than patched.
Open-ended scoring depends on current LLM capability. Rubric drift is audited by comparing a held-out set of scored responses against human reviewers each quarter.
References
- Carolus, A., Koch, M., Straka, S., Latoschik, M. E., & Wienrich, C. (2022). MAILS — Meta AI Literacy Scale: Development and Testing of an AI Literacy Questionnaire.
- Cetindamar, D., Kitto, K., Wu, M., Zhang, Y., Abedin, B., & Knight, S. (2022). Explicating AI literacy of employees at digital workplaces. IEEE Transactions on Engineering Management.
- Ding, L., Kim, S., & Allday, R. A. (2024). Development of an AI literacy assessment for non-technical individuals. Contemporary Educational Technology, 16(3), ep512.
- Hornberger, M., Bewersdorff, A., & Nerdel, C. (2023). What do university students know about Artificial Intelligence? Computers and Education: Artificial Intelligence, 5.
- Long, D., & Magerko, B. (2020). What is AI Literacy? Competencies and Design Considerations. CHI 2020, 1–16.
- Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2.
- Wang, B., Rau, P. L. P., & Yuan, T. (2022). Measuring user competence in using artificial intelligence. Behaviour & Information Technology.