AI Governance & Regulatory Compliance Resource

AI Safety Level

Tiered classification systems for AI risk governance -- assigning graduated safety requirements based on assessed system capability, deployment context, and potential for harm

Tiered Safety Classification as a Governance Mechanism

Why Systems Get Sorted Into Safety Levels

Assigning safety levels to systems based on their risk profile is one of the oldest and most universal patterns in governance. The underlying logic is straightforward: not all systems pose equivalent danger, and governance resources are finite, so regulators and institutions sort systems into tiers and calibrate oversight intensity accordingly. A laboratory handling common soil bacteria requires different containment from one working with hemorrhagic fever viruses. A consumer product containing trace amounts of a mild irritant requires different labeling from one containing a lethal toxin. An AI system that recommends restaurant listings requires different oversight from one that screens patients for cancer.

The concept of "AI safety level" applies this tiered classification approach to artificial intelligence systems. As AI capabilities expand across domains with varying potential consequences -- from content recommendation to autonomous vehicle operation to biomedical research assistance -- the governance challenge becomes sorting these systems into meaningful categories and attaching proportionate safety requirements to each tier. Multiple frameworks now implement this concept, from statutory regulatory classifications to voluntary industry governance structures, each defining their own tiers, thresholds, and corresponding obligations.

Biosafety Levels: The Originating Analogy

The biosafety level system maintained by the U.S. Centers for Disease Control and Prevention and the National Institutes of Health provides the most direct precedent for tiered AI safety classification. The four biosafety levels (BSL-1 through BSL-4) define escalating containment requirements matched to the risk posed by biological agents under study.

BSL-1 facilities handle well-characterized agents not known to cause disease in immunocompetent adults, requiring only standard microbiological practices, bench-top work surfaces, and basic personal protective equipment. BSL-2 facilities work with agents posing moderate hazard, adding restricted access, biohazard warning signs, sharps precautions, and biosafety cabinets for aerosol-generating procedures. BSL-3 laboratories handle indigenous or exotic agents with potential for aerosol transmission and serious or lethal infection, requiring respiratory protection, HEPA-filtered exhaust, sealed penetrations, and dual-door access with directional airflow. BSL-4 facilities -- fewer than fifty worldwide -- manage the most dangerous agents for which no vaccines or treatments exist, demanding full positive-pressure personnel suits, chemical shower decontamination on exit, dedicated air supply, and complete physical isolation from all other laboratory functions.
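The structural logic of the BSL system -- a lookup from assigned level to required controls -- can be sketched directly. The abbreviated requirement lists in the following sketch are a simplification of the description above, not a substitute for the CDC/NIH guidance, and the function name is illustrative.

```python
# Illustrative sketch: the BSL system as a lookup from tier to required controls.
# The requirement lists are abbreviated summaries, not the authoritative CDC/NIH criteria.

BIOSAFETY_LEVELS = {
    1: ["standard microbiological practices", "bench-top work surfaces", "basic PPE"],
    2: ["BSL-1 controls", "restricted access", "biohazard signage", "sharps precautions",
        "biosafety cabinets for aerosol-generating procedures"],
    3: ["BSL-2 controls", "respiratory protection", "HEPA-filtered exhaust",
        "sealed penetrations", "dual-door access with directional airflow"],
    4: ["BSL-3 controls", "positive-pressure personnel suits", "chemical shower on exit",
        "dedicated air supply", "complete physical isolation"],
}

def required_controls(level: int) -> list[str]:
    """Return the abbreviated control set for a given biosafety level."""
    if level not in BIOSAFETY_LEVELS:
        raise ValueError(f"unknown biosafety level: {level}")
    return BIOSAFETY_LEVELS[level]

print(required_controls(3))
```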

The BSL system demonstrates several properties essential to effective safety level classification: each tier has objective, assessable criteria for agent assignment; the corresponding safeguards at each level are specific and verifiable rather than aspirational; transitions between levels require concrete operational changes; and the system has operated with international consistency for decades. These properties -- objectivity, specificity, verifiability, and interoperability -- define what any AI safety level system must achieve to function as effective governance.

Chemical Hazard Classification: The Globally Harmonized System

The Globally Harmonized System of Classification and Labelling of Chemicals (GHS), adopted through United Nations coordination and implemented in over sixty-five countries, assigns hazard categories and signal words based on quantitative toxicity thresholds. Acute toxicity classifications range from Category 1 (fatal in very small doses) through Category 5 (may be harmful in large doses), with each category triggering specific labeling requirements, pictograms, precautionary statements, and handling protocols. The GHS demonstrates that tiered safety classification can operate at global scale across jurisdictions with different regulatory cultures when the underlying criteria are sufficiently objective and the corresponding requirements sufficiently concrete.
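To make the threshold logic concrete, the sketch below assigns an acute oral toxicity category from an LD50 value using the standard oral-route cutoffs in mg/kg body weight. The function name is an assumption made for the example, and the full GHS also covers dermal and inhalation routes and many other hazard classes.

```python
# Illustrative sketch: assigning a GHS acute oral toxicity category from an LD50
# value (mg/kg body weight). Cutoffs shown are the standard oral-route boundaries.

ORAL_LD50_CUTOFFS = [  # (upper bound in mg/kg, category)
    (5, 1),
    (50, 2),
    (300, 3),
    (2000, 4),
    (5000, 5),
]

def ghs_acute_oral_category(ld50_mg_per_kg: float) -> int | None:
    """Return the GHS acute oral toxicity category, or None if not classified."""
    for upper_bound, category in ORAL_LD50_CUTOFFS:
        if ld50_mg_per_kg <= upper_bound:
            return category
    return None  # above the Category 5 cutoff: not classified for acute oral toxicity

print(ghs_acute_oral_category(120))   # Category 3
print(ghs_acute_oral_category(8000))  # None
```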

Nuclear Event Classification: The INES Scale

The International Nuclear and Radiological Event Scale (INES), maintained by the International Atomic Energy Agency, classifies nuclear events on a seven-level scale from Level 1 (anomaly) through Level 7 (major accident). Events at Levels 1-3 are classified as incidents; Levels 4-7 are classified as accidents. Each level corresponds to defined criteria regarding radiological consequences, degradation of defence-in-depth, and impact on people and the environment. The Fukushima Daiichi disaster was classified Level 7; the Three Mile Island partial meltdown was classified Level 5. The INES scale serves both operational governance and public communication functions, translating complex technical assessments into standardized severity categories that inform regulatory response, public notification, and international coordination.
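Because the level criteria are defined and public, the scale's basic structure can be sketched as a simple mapping; the descriptors below follow the IAEA level names, while the helper function is purely illustrative.

```python
# Illustrative sketch: INES levels, their standard descriptors, and the
# incident/accident split (Levels 1-3 are incidents, Levels 4-7 are accidents).

INES_LEVELS = {
    1: "anomaly",
    2: "incident",
    3: "serious incident",
    4: "accident with local consequences",
    5: "accident with wider consequences",
    6: "serious accident",
    7: "major accident",
}

def ines_class(level: int) -> str:
    """Return 'incident' for Levels 1-3 and 'accident' for Levels 4-7."""
    if level not in INES_LEVELS:
        raise ValueError(f"level {level} is outside the 1-7 INES scale")
    return "incident" if level <= 3 else "accident"

print(INES_LEVELS[5], "->", ines_class(5))  # e.g. Three Mile Island
print(INES_LEVELS[7], "->", ines_class(7))  # e.g. Fukushima Daiichi
```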

AI Safety Level Frameworks in Practice

EU AI Act: Statutory Risk Tiers with Binding Obligations

The EU AI Act (Regulation 2024/1689) establishes the most comprehensive legally binding AI safety level system currently in force. The Act classifies AI systems into four risk tiers, each carrying distinct regulatory obligations: unacceptable-risk systems (such as social scoring by public authorities) are prohibited outright; high-risk systems (including AI used in critical infrastructure, medical devices, employment, credit, and law enforcement contexts) must meet mandatory requirements covering risk management, data governance, technical documentation, human oversight, and conformity assessment; limited-risk systems face targeted transparency obligations, such as disclosing that a user is interacting with an AI system; and minimal-risk systems carry no additional obligations under the Act.

The Act further implements safety levels for general-purpose AI models, distinguishing between standard GPAI obligations and enhanced obligations for models posing systemic risk. The systemic risk threshold -- currently defined at a cumulative training compute exceeding 10^25 floating point operations -- creates a quantitative boundary between safety levels, with systemic risk classification triggering additional requirements for model evaluation, adversarial testing, incident tracking, and cybersecurity protections. Enforcement begins with prohibited practices from February 2025, GPAI obligations from August 2025, and full high-risk system requirements from August 2026.
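The compute threshold is one of the few purely quantitative boundaries in the Act, and a provider-side check against it reduces to a single comparison. The sketch below is a minimal illustration: the 10^25 FLOP presumption comes from the Act, but the function name and the abbreviated obligation labels are assumptions made for the example.

```python
# Minimal sketch: checking the EU AI Act's systemic-risk compute threshold for
# general-purpose AI models. The 1e25 FLOP figure is the presumption written into
# the Act; the obligation labels below are abbreviated and illustrative.

SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25

BASELINE_GPAI_OBLIGATIONS = [
    "technical documentation",
    "information for downstream providers",
    "copyright policy",
    "training-data summary",
]

SYSTEMIC_RISK_OBLIGATIONS = BASELINE_GPAI_OBLIGATIONS + [
    "model evaluation",
    "adversarial testing",
    "incident tracking and reporting",
    "cybersecurity protections",
]

def gpai_obligations(training_compute_flop: float) -> list[str]:
    """Return the applicable obligation set based on cumulative training compute."""
    if training_compute_flop > SYSTEMIC_RISK_FLOP_THRESHOLD:
        return SYSTEMIC_RISK_OBLIGATIONS
    return BASELINE_GPAI_OBLIGATIONS

print(gpai_obligations(3e25))  # the systemic-risk set applies above the threshold
```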

Frontier AI Developer Classification Systems

Multiple frontier AI developers have implemented internal classification systems that assign safety levels to models based on assessed capabilities. These frameworks share the structural logic of the BSL system: evaluate the system against defined criteria, assign a level, and apply the safeguards prescribed for that level before proceeding to the next stage of development or deployment.

Google DeepMind's Frontier Safety Framework, published in 2024 and updated in February 2025, defines Critical Capability Levels across domains including autonomy, cybersecurity, biosecurity, and machine learning research and development. Each domain has graduated severity thresholds, and models are evaluated against these thresholds through structured capability assessments. Reaching a defined capability level triggers corresponding security and deployment mitigation requirements that must be implemented before the model can advance.

OpenAI's Preparedness Framework evaluates models across tracked risk categories using a scorecard methodology, assigning risk levels from low through critical. The framework specifies governance thresholds: models assessed at certain risk levels cannot be deployed without explicit authorization from designated decision-makers, and models exceeding critical thresholds cannot be deployed at all. The governance structure links safety level assessment directly to organizational decision rights.
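The gating logic these frameworks describe can be sketched as a rule over per-category risk levels, where the worst category determines the required governance step. The category names, level ordering, and decision labels below are hypothetical stand-ins, not any developer's actual scorecard schema.

```python
# Hypothetical sketch of scorecard-style deployment gating: each tracked risk
# category gets a level, and the highest level determines the governance step.
# Category names and decision labels are illustrative only.

from enum import IntEnum

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

def deployment_decision(scorecard: dict[str, RiskLevel]) -> str:
    """Map the worst per-category risk level to a deployment decision."""
    worst = max(scorecard.values())
    if worst >= RiskLevel.CRITICAL:
        return "do not deploy"
    if worst >= RiskLevel.HIGH:
        return "deployment requires sign-off by designated decision-makers"
    return "deployment permitted under standard review"

example = {
    "cyber_offense": RiskLevel.MEDIUM,
    "biological_uplift": RiskLevel.HIGH,
    "autonomy": RiskLevel.LOW,
}
print(deployment_decision(example))
```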

Anthropic, Meta, and other developers have published or implemented analogous tiered classification systems that assign safety levels based on capability evaluations and prescribe graduated governance requirements at each tier. The convergence across independently designed frameworks -- each arriving at a tiered safety level architecture -- suggests that this classification approach addresses a genuine structural governance need rather than reflecting any single organization's methodology. Academic researchers and policy analysts increasingly reference "AI safety levels" as a generic category describing the full range of tiered classification systems operating across the industry.

National AI Safety Institutes and Evaluation Infrastructure

Government AI safety evaluation bodies have emerged as key infrastructure for AI safety level assessment. The UK AI Safety Institute, established following the November 2023 Bletchley Park summit, conducts pre-deployment evaluations of frontier AI models. These evaluations assess model capabilities against risk-relevant benchmarks, producing structured assessments that inform both developers and policymakers about where systems fall along capability spectrums. The US AI Safety Institute, housed within NIST, pursues complementary evaluation work including development of standardized measurement approaches for AI capabilities and risks.

These institutions address a fundamental requirement of any safety level system: credible, independent evaluation. The BSL system relies on institutional biosafety committees and federal oversight for agent classification. The GHS relies on standardized testing protocols and regulatory agency review. AI safety level systems require analogous evaluation infrastructure -- organizations capable of assessing AI capabilities reliably enough to assign classification levels that govern downstream safety requirements. The buildout of national and international AI evaluation capacity directly enables the operational viability of tiered AI safety classification.

Technical Foundations and Open Challenges

Capability Evaluation and Threshold Design

The technical integrity of any AI safety level system depends on the reliability of its capability assessments. In biosafety, the properties of biological agents are reasonably stable and well-characterized: a pathogen's transmission route, mortality rate, and treatment availability do not change between evaluations. AI systems present a fundamentally different assessment challenge. Model capabilities can emerge unpredictably during training, may be elicited through novel prompting techniques discovered after initial evaluation, and can shift when models are fine-tuned or combined with external tools and data sources.

Benchmark saturation -- where leading models achieve near-ceiling performance on established tests -- reduces the discriminative power of evaluations used to distinguish safety levels. Evaluation gaming, where systems optimize for benchmark performance without corresponding real-world capability, further complicates assessment. The AI safety research community continues developing more robust evaluation methodologies including red-teaming protocols, uplift studies measuring whether models provide meaningful assistance on dangerous tasks, and agent-based evaluations testing autonomous behavior in realistic environments. These methodological advances are essential for AI safety level systems to maintain their classification validity as AI capabilities advance.
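At its core, an uplift study compares task success rates with and without model assistance. The sketch below computes that difference for hypothetical outcome data; real studies add randomization, control conditions, careful task design, and statistical testing.

```python
# Minimal sketch of an uplift metric: the difference in task success rates between
# participants working with model assistance and a control group working without it.
# The outcome data are invented for illustration.

def success_rate(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)

def uplift(assisted: list[bool], control: list[bool]) -> float:
    """Absolute uplift: assisted success rate minus control success rate."""
    return success_rate(assisted) - success_rate(control)

# Hypothetical per-participant task outcomes (True = task completed).
assisted_outcomes = [True, True, False, True, True, False, True, True]
control_outcomes = [False, True, False, False, True, False, False, True]

print(f"uplift: {uplift(assisted_outcomes, control_outcomes):+.2f}")
```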

Dynamic Classification and Continuous Monitoring

Unlike biological agents whose risk properties remain constant, AI systems can change in capability through fine-tuning, tool integration, or deployment in novel contexts. This means AI safety level assignment cannot be a one-time classification event. Effective governance requires continuous or periodic reassessment, with mechanisms to reclassify systems whose risk profile has shifted. The EU AI Act addresses this partially through post-market monitoring obligations requiring providers to maintain surveillance systems that detect changes in system behavior or risk profile after deployment.
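A reassessment loop of the kind post-market monitoring implies can be sketched as periodic re-evaluation that flags a system for reclassification when its measured capability crosses the threshold used at its last classification. The tier names, thresholds, and capability score below are hypothetical.

```python
# Hypothetical sketch of continuous reassessment: a deployed system is periodically
# re-evaluated, and a reclassification review is triggered when its measured
# capability implies a different tier than the one currently assigned.

from dataclasses import dataclass

TIER_FLOORS = {  # minimum capability score for each illustrative tier
    "limited": 0.0,
    "elevated": 0.5,
    "high": 0.8,
}

def assign_tier(capability_score: float) -> str:
    """Return the highest tier whose floor the score meets."""
    eligible = (t for t, floor in TIER_FLOORS.items() if capability_score >= floor)
    return max(eligible, key=lambda t: TIER_FLOORS[t])

@dataclass
class MonitoredSystem:
    name: str
    assigned_tier: str

    def reassess(self, new_capability_score: float) -> bool:
        """Return True if the new score implies a different tier than assigned."""
        return assign_tier(new_capability_score) != self.assigned_tier

system = MonitoredSystem(name="triage-assistant", assigned_tier="elevated")
if system.reassess(new_capability_score=0.83):
    print("capability drift detected: escalate for reclassification review")
```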

The monitoring challenge also applies at the population level. As AI systems proliferate, the cumulative risk posed by many individually moderate-risk systems interacting may exceed the sum of individual risk assessments. No current AI safety level framework systematically addresses this systemic interaction dimension, though the EU AI Act's systemic risk category for general-purpose AI models represents an initial approach to classification beyond the individual system level.

International Harmonization

Multiple jurisdictions are developing AI classification systems with different tier structures, threshold definitions, and compliance requirements. The EU AI Act's four-tier risk classification does not map directly onto developer-defined capability level systems, national evaluation frameworks, or sector-specific regulatory classifications such as the FDA's risk categories for AI-enabled medical devices. Organizations deploying AI systems across jurisdictions face the governance overhead of mapping between classification systems that use different criteria to define different numbers of levels with different corresponding obligations.
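In practice, this mapping overhead often takes the form of a crosswalk relating an organization's internal tiers to each external framework's categories. The sketch below is entirely hypothetical: real crosswalks are rarely one-to-one and typically require legal review of every cell.

```python
# Hypothetical crosswalk between an internal capability-tier system and external
# classification frameworks. Tier names and mappings are illustrative only.

CROSSWALK = {
    # internal tier -> {framework: closest external category}
    "tier-1": {"eu_ai_act": "minimal risk", "developer_framework": "low"},
    "tier-2": {"eu_ai_act": "limited risk", "developer_framework": "medium"},
    "tier-3": {"eu_ai_act": "high risk", "developer_framework": "high"},
    "tier-4": {"eu_ai_act": "unacceptable risk / prohibited", "developer_framework": "critical"},
}

def external_category(internal_tier: str, framework: str) -> str:
    """Look up the closest external category for an internal tier, if one is mapped."""
    try:
        return CROSSWALK[internal_tier][framework]
    except KeyError:
        return "no mapping defined; manual review required"

print(external_category("tier-3", "eu_ai_act"))
print(external_category("tier-3", "fda_samd"))  # unmapped framework
```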

ISO/IEC 42001:2023 provides partial harmonization through its AI management system standard, which has been adopted by over forty Fortune 500 organizations. The standard establishes common governance processes without prescribing specific safety level definitions, allowing organizations to implement consistent management systems across jurisdictions while adapting to local classification requirements. Efforts toward deeper harmonization continue through bilateral regulatory dialogues, multilateral bodies including the OECD and the Global Partnership on AI, and technical standards development within ISO and IEC working groups.

Accountability at Classification Boundaries

Safety level systems create governance clarity within tiers but raise difficult questions at tier boundaries. When a system falls near the threshold between two safety levels, classification decisions carry significant consequences: the difference between high-risk and limited-risk classification under the EU AI Act determines whether a system faces comprehensive mandatory requirements or merely targeted transparency obligations. Similar boundary effects operate in all tiered systems -- a pathogen near the BSL-2/BSL-3 threshold, a chemical near a GHS category boundary, a nuclear event near the incident/accident classification line.

Mature safety level systems address boundary cases through conservative classification practices, additional review procedures for borderline cases, and appeal mechanisms. AI safety level governance will need equivalent institutional infrastructure: clear protocols for handling systems near classification boundaries, accountability for classification decisions, and mechanisms for challenging or revising assignments as evidence accumulates. These institutional requirements extend well beyond technical capability evaluation into organizational governance, legal liability, and regulatory process design.
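One common way to implement conservative classification is a margin rule: a score that lands within a defined distance below a tier boundary is assigned the higher tier and flagged for additional review. The thresholds and margin in the sketch below are invented for illustration.

```python
# Hypothetical sketch of conservative boundary handling: a score within a review
# margin below a tier threshold is assigned the higher tier and flagged for review.
# Thresholds and margin are illustrative.

TIER_BOUNDARIES = [  # (threshold, tier assigned at or above that threshold)
    (0.8, "high"),
    (0.5, "elevated"),
    (0.0, "limited"),
]
REVIEW_MARGIN = 0.05

def classify_conservatively(score: float) -> tuple[str, bool]:
    """Return (tier, needs_review); borderline scores are bumped to the higher tier."""
    for threshold, tier in TIER_BOUNDARIES:
        if score >= threshold:
            return tier, False
        if threshold - score <= REVIEW_MARGIN:
            # Within the margin below a boundary: assign the higher tier, flag for review.
            return tier, True
    return "limited", False

print(classify_conservatively(0.77))  # ('high', True): borderline, bumped up and flagged
print(classify_conservatively(0.60))  # ('elevated', False)
```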


External References

CDC/NIH, Biosafety in Microbiological and Biomedical Laboratories (BMBL)
United Nations, Globally Harmonized System of Classification and Labelling of Chemicals (GHS), Rev. 10
IAEA, International Nuclear and Radiological Event Scale (INES)
NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0)
ISO/IEC 42001:2023, Artificial intelligence management system