TH19101

Detected presence of serialized data formats that can execute code.

priority	CI/CD status	severity	effort	SAFE level	SAFE assessment
	fail	high	high	1	tampering: fail Reason: unsafe AI models detected

About the issue

An AI (artificial intelligence) model is a mathematical representation of a process that uses algorithms to learn patterns and make predictions based on provided data. After a model is trained, its mathematical representation is stored in one of several data serialization formats. Stored AI models can be shared and reused without additional training.

Pickle is a popular Python module that many data scientists use for serializing and deserializing AI model data. Pickle is considered an unsafe data format because it allows Python code to be executed during AI model deserialization. Attackers commonly abuse Pickle and other unsafe data serialization formats to hide malicious payloads.

The detected serialized data includes Python code that can invoke external scripts and execute arbitrary commands on the computer system that attempts to deserialize the AI model data. While the presence of Python code within serialized data does not always imply malicious intent, its use in an AI model should be documented and approved. Any custom actions needed to load the AI model should be kept separate from the serialized model data.
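To illustrate why Pickle is treated as unsafe, the following minimal sketch shows a callable being invoked during deserialization. The `Payload` class and the `demo` helper are hypothetical names introduced for this example, and the harmless built-in `print` stands in for whatever function an attacker might choose.

```python
import contextlib
import io
import pickle


class Payload:
    """Hypothetical object that tells pickle what to call when it is loaded."""

    def __reduce__(self):
        # pickle stores a reference to the callable plus its arguments.
        # On load, pickle calls print("...") instead of rebuilding Payload.
        return (print, ("code ran during pickle.loads",))


def demo():
    """Serialize Payload, load it back, and capture what the load printed."""
    blob = pickle.dumps(Payload())
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        result = pickle.loads(blob)  # the call to print happens here
    return result, captured.getvalue()


if __name__ == "__main__":
    result, output = demo()
    print("loaded object:", result)
    print("side effect:", output.strip())
```

The loaded value is whatever the callable returned, not a `Payload` instance, which is why the code never has to be visible to the program that loads the file.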

How to resolve the issue

Investigate reported detections.
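One possible starting point for an investigation is to list the callables a pickle stream references without loading it. The sketch below uses the standard `pickletools.genops` function to walk the opcodes; `list_imports` is a hypothetical helper name, and its handling of `STACK_GLOBAL` is a simple heuristic (it assumes the module and name are the two most recent string arguments), so results should be treated as a hint rather than a verdict.

```python
import pickle
import pickletools


def list_imports(data: bytes) -> list:
    """Return 'module.name' references found in a pickle stream.

    The stream is only parsed, never deserialized, so no embedded
    code is executed by this function.
    """
    found = []
    strings = []  # recent string arguments, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" as one space-separated argument.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ pushes module and name as strings just before this opcode.
            found.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


if __name__ == "__main__":
    plain = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
    with_callable = pickle.dumps(print, protocol=4)
    print(list_imports(plain))
    print(list_imports(with_callable))
```

Any reference to modules such as `os`, `subprocess`, or `builtins` in the output would be worth a closer look before the file is loaded anywhere.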

Delay the software release until the investigation is completed or the risk is formally accepted.

Consider replacing the selected data serialization format with a safer alternative.
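As a rough illustration of a data-only alternative, the sketch below stores model weights as JSON and keeps the loading logic in ordinary source code, separate from the stored data. JSON is used here only as a stand-in for any format that cannot carry executable code; the `dump_weights` and `load_weights` helpers and the `format_version` field are assumptions made for this example, not part of any particular product or library.

```python
import json


def dump_weights(weights: dict) -> str:
    """Serialize layer weights to a JSON string (data only, no code)."""
    return json.dumps({"format_version": 1, "weights": weights})


def load_weights(text: str) -> dict:
    """Parse and validate weights; parsing JSON never executes stored code."""
    doc = json.loads(text)
    if doc.get("format_version") != 1:
        raise ValueError("unsupported format version")
    weights = doc["weights"]
    for name, values in weights.items():
        # Reject anything that is not a plain number.
        if not all(isinstance(v, (int, float)) for v in values):
            raise ValueError(f"non-numeric value in layer {name!r}")
    return weights


if __name__ == "__main__":
    stored = dump_weights({"layer1": [0.5, -1.25], "layer2": [2.0]})
    print(load_weights(stored))
```

Because any custom loading steps live in reviewed source code rather than inside the model file, they can be documented and approved like the rest of the codebase.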

Incidence statistics

ReversingLabs periodically collects and analyzes the contents of popular software package repositories for threat research purposes. Analysis results are used to calculate incidence statistics for issues (policy violations) that Spectra Assure can detect in software packages.

This section is updated when new data becomes available.

Total number of packages analyzed

Linux: 562K
npm: 5.12M
NuGet: 735K
PS Gallery: 17K
PyPI: 838K
RubyGems: 203K
VS Code: 113K
Windows: 3.7K

Total detections per repository

For every repository, the chart shows the number of packages that triggered the software assurance policy. In other words, it shows how many packages in each package repository were found to have the specific issue described on this page. This information helps you understand how common the issue is across different software communities.

If a repository is absent from the chart, that means none of the packages in that repository triggered this policy during analysis, or the policy was not used during analysis.

Distribution of total detections by project popularity

For every repository, the chart shows how many of the total detections belong to the Top 100 (1-100), Top 1000 (101-1000) and Top 10 000 (1001-10 000) most downloaded projects. This information helps you understand the impact of the issue within each community, making it clearer when the issue affects the most popular projects.

If the chart shows zero values for all of the top project groups, that means all detections were in unranked projects (lower than 10 000 on the list of most downloaded projects).
