How Spectra Assure scans AI models
As AI models, including LLMs, are increasingly adopted across industries, ensuring their safety, reliability, and compliance is critical. Before a model is integrated into organizational workflows, it should be analyzed to uncover potential vulnerabilities and design flaws. Evaluating the model early lets organizations proactively detect risks and remediate issues.
This is where the Spectra Assure platform plays a key role. During analysis, Spectra Assure detects AI models in a variety of data formats - both standard and solution-specific - by their signature. This allows integrated models to be discovered in the analyzed software and improves risk visibility by revealing how machine learning services are structured and used. Once identified, these models are listed as components in the ML-BOM.
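For illustration only, a detected model could be represented in the ML-BOM roughly as follows. The field names in this sketch are hypothetical and do not reflect the exact SAFE report schema:

```python
from dataclasses import dataclass

@dataclass
class MLBOMComponent:
    """Illustrative (hypothetical) ML-BOM entry for a detected AI model."""
    name: str          # model name as detected in the analyzed software
    model_format: str  # serialization format, e.g. a standard one such as "safetensors"
    purl: str          # package URL identifying the model's origin
    sha256: str        # content hash used to identify the model artifact

# Example entry for a Hugging Face model discovered during analysis
component = MLBOMComponent(
    name="Llama-3.1-8B-Instruct",
    model_format="safetensors",
    purl="pkg:huggingface/meta-llama/Llama-3.1-8B-Instruct",
    sha256="<content hash>",
)
print(component.purl)
```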
Clearly documented components can help identify risks related to model security, including any potential vulnerabilities or insecure elements that could be exploited. This offers organizations clear visibility into the structure and potential risks of each model before implementation.
This enables the next critical step: thorough testing, to ensure that the model is safe to use, produces reliable outputs, and aligns with the goals and values of your organization.
AI model testing
AI safety is the guiding principle behind the testing process that serves as the foundation for trustworthy AI. It defines the expected behaviors and limitations of a model to avoid any harmful, unsafe, or biased outputs. By incorporating safety principles into testing, organizations can assess whether a model operates responsibly and complies with ethical and organizational standards.
Testing not only evaluates whether a model meets these safety expectations before implementation, but also identifies its weaknesses, performance gaps, and scenarios where it may behave unpredictably.
Red-teaming takes standard model testing to a higher level by stress-testing models and simulating targeted attacks to uncover hidden weaknesses that standard tests might miss. This approach helps ensure models are resilient to misuse, can handle unexpected input, and can withstand real-world risks.
Together, AI safety, model testing, and automated red-teaming form a comprehensive evaluation framework that ensures AI models are responsible, reliable, and safe in practical use.
Spectra Assure supports this process by combining malware scanning with SPLX red-teaming data to quickly assess which models are safe to use and to identify hidden risks.
SPLX report
The SPLX report is developed by SPLX. It provides a detailed assessment of each AI model, offering clear insight into its safety, reliability, and overall risk profile. By evaluating the model across key areas - business alignment, hallucination and trustworthiness, safety, and security - it helps users understand the model's potential weaknesses and how it behaves in various situations.
In addition to malware scanning, the enhanced SAFE report captures results from simulated attacks, edge-case scenarios, and adversarial prompts, showing how the model performs under stress. Each key area and attack is scored for severity and likelihood, helping teams quantify and prioritize risks. This approach gives security teams and developers full testing coverage.
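As a rough illustration of how severity and likelihood ratings can be combined to prioritize findings, consider the generic risk-matrix sketch below. The ratings, weights, and thresholds are illustrative and are not the SPLX scoring formula:

```python
# Generic risk-matrix sketch: combine severity and likelihood into a priority.
# The rating scales and thresholds here are illustrative, not the SPLX formula.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}
LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3, "frequent": 4}

def priority(severity: str, likelihood: str) -> str:
    score = SEVERITY[severity] * LIKELIHOOD[likelihood]
    if score >= 12:
        return "address immediately"
    if score >= 6:
        return "prioritize mitigation"
    return "monitor"

print(priority("high", "likely"))  # -> "prioritize mitigation"
```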
By documenting model components, data sources, and test results, the report helps teams stay transparent and accountable. It also provides practical recommendations to improve safety, leading to more trustworthy and reliable AI models.
How does the SPLX report relate to the SAFE report?
The purpose of the ReversingLabs SAFE report is to:
- provide a comprehensive inventory of all building blocks of a software product throughout its development lifecycle in the form of an xBOM
- capture the software risk assessment
- give improvement guidance through Levels and mitigation guidance through Issues
While the SAFE report gives organizations visibility into the overall risk landscape of a software product, the SPLX report focuses specifically on AI model behavior and performance.
Enhancing the ML-BOM within the SAFE report with SPLX testing data provides a deeper understanding of the behavior and security profile of each AI model. In this form, the ML-BOM is no longer only a list of model components and their metadata. It also incorporates assessments based on safety evaluations and red-team testing of the models used in the analyzed software.
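Conceptually, the enrichment can be thought of as attaching an assessment record to each model component. The structure below is a simplified sketch with illustrative keys and values, not the actual SAFE report format:

```python
# Simplified sketch of enriching an ML-BOM component with SPLX assessment data.
# Keys and values are illustrative; the real SAFE report structure may differ.
ml_bom_component = {
    "name": "Llama-3.1-8B-Instruct",
    "purl": "pkg:huggingface/meta-llama/Llama-3.1-8B-Instruct",
}

splx_assessment = {
    "risk_score": 72.4,  # hypothetical value on the 0-100 scale described below
    "categories": {
        "Business Alignment": {"performance_score": 81.0},
        "Hallucination & Trustworthiness": {"performance_score": 77.5},
        "Safety": {"performance_score": 69.2},
        "Security": {"performance_score": 62.8},
    },
}

# The enriched entry carries both the component metadata and its assessment.
enriched = {**ml_bom_component, "ai_security": splx_assessment}
print(enriched["ai_security"]["risk_score"])
```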
Having all critical information in one place builds trust with users and stakeholders, as it ensures the actions, decisions, and processes of an AI can be traced, explained, and held to specific standards, supporting transparent and secure AI management.
SPLX data in the SAFE report
In the context of Spectra Assure, the SPLX data is included in the SAFE report in the form of an AI Security card.
This card can be found in the expanded Info row on the ML-BOM page.
Currently, this information is displayed only for the following models from Hugging Face:
Models with SPLX data
- google/gemma-3-12b-it
- google/gemma-3-27b-it
- google/gemma-3-4b-it
- google/gemma-3n-E4B-it
- meta-llama/Llama-3.1-70B-Instruct
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-11B-Vision-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-90B-Vision-Instruct
- meta-llama/Llama-3.3-70B-Instruct
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- openai/gpt-oss-120b
- openai/gpt-oss-20b
- Qwen/QwQ-32B
- Qwen/Qwen2.5-72B-Instruct
If SPLX data is not available for the selected model from Hugging Face (a model that has a pkg:huggingface purl), users can contact ReversingLabs Support to request testing.
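For example, a team could check locally whether a component's Hugging Face purl matches one of the models listed above before requesting testing. The parsing below is deliberately simplified and ignores version qualifiers a purl may carry:

```python
# Simplified check of a pkg:huggingface purl against the supported model list.
# Real purls may carry versions or qualifiers that this sketch ignores.
SUPPORTED = {
    "meta-llama/Llama-3.1-8B-Instruct",
    "openai/gpt-oss-20b",
    "Qwen/QwQ-32B",
    # ... remaining models from the list above
}

def has_splx_data(purl: str) -> bool:
    if not purl.startswith("pkg:huggingface/"):
        return False
    model_id = purl[len("pkg:huggingface/"):].split("@")[0]
    return model_id in SUPPORTED

print(has_splx_data("pkg:huggingface/meta-llama/Llama-3.1-8B-Instruct"))  # True
```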
AI Security card
The AI Security card in the SAFE report contains the following information:
Risk score, summarizing model performance based on the number of simulated attacks and successful attacks across different categories. It ranges from 0 to 100, where lower values indicate higher risk (a minimal classification sketch follows this list):
- Critical risk (< 30) - immediate action is required
- High risk (30 - 59.9) - prioritize mitigation
- Medium risk (60 - 79.9) - monitor closely and address issues
- Low risk (>= 80) - only routine monitoring and improvements are needed
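A minimal sketch of these thresholds, assuming only the band boundaries listed above:

```python
def risk_level(risk_score: float) -> str:
    """Map a 0-100 risk score to the bands listed above (lower score = higher risk)."""
    if risk_score < 30:
        return "Critical risk"
    if risk_score < 60:
        return "High risk"
    if risk_score < 80:
        return "Medium risk"
    return "Low risk"

print(risk_level(72.4))  # -> "Medium risk"
```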
Categories, evaluation areas that reflect key qualities of AI models:
- Business Alignment, measures how well the model supports the goals and values of an organization
- Hallucination & Trustworthiness, assesses the accuracy and reliability of the model's outputs
- Safety, checks the model for any harmful or biased behavior
- Security, tests the resilience of the model to external threats and manipulation
Category-specific data:
Attacks, simulations intended for exposing and identifying particular weaknesses or risks for each category:
- Business Alignment - Intentional Misuse, Off Topic, Legally Binding, Competitor Check
- Hallucination & Trustworthiness - URL Check
- Safety - Fake News, Bias, Profanity, Harmful Content, PII, Privacy Violation, Cyber Threats, Fraudulent Activities, Illegal Activities
- Security - Data Exfiltration, Jailbreak, Manipulation, Phishing, Context Leakage
Performance score, a category-specific score calculated from the number of failed test cases, their severity, and their likelihood, factoring in the risk priority of each attack. Higher scores indicate better model performance.
When expanded, the card provides the following information for each attack (a simplified scoring sketch follows this list):
- the total number of failed attacks
- the total number of successful attacks, where a higher count corresponds to a lower score and higher risk
- the performance score, based on the number of successful attacks. For example, when 170 out of 198 attempted attacks succeed, the performance score for the attack is 0.5 and the attack is labelled as Critical risk
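The sketch below only illustrates the general idea that a higher share of successful attacks drives the performance score down. The actual SPLX formula also weighs severity, likelihood, and the risk priority of each attack, so real report values (such as the worked example above) will differ from this simplified ratio:

```python
# Intuition-building sketch only: the real SPLX score also factors in the
# severity, likelihood, and risk priority of each attack, which this omits.
def attack_performance(successful_attacks: int, attempted_attacks: int) -> float:
    """Higher share of successful attacks -> lower performance score (0-100)."""
    if attempted_attacks == 0:
        return 100.0
    resisted = attempted_attacks - successful_attacks
    return round(100.0 * resisted / attempted_attacks, 1)

print(attack_performance(successful_attacks=4, attempted_attacks=20))  # -> 80.0
```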
To better understand how the SPLX data is visualized in the SAFE report, you can download the example RL-SAFE archive by clicking on the following card:
To work with the RL-SAFE archives, follow the instructions in the SAFE Viewer guide.