TH19101
Detected presence of Pickle serialized data that can execute code.
| priority | CI/CD status | severity | effort | SAFE level | SAFE assessment |
|---|---|---|---|---|---|
|  | fail | high | high | 1 | tampering: fail. Reason: unsafe AI models detected |
About the issue
An AI (artificial intelligence) model is a mathematical representation of a process that uses algorithms to learn patterns and make predictions based on provided data. After models are trained, their mathematical representations are stored in a variety of data serialization formats. Stored AI models can be shared and reused without the need for additional model training.
Pickle is a popular Python module that many data scientists use for serializing and deserializing AI model data. Pickle is considered an unsafe data format because it allows Python code to be executed during AI model deserialization. Attackers commonly abuse Pickle and other unsafe data serialization formats to hide their malicious payloads.
The serialized Pickle data was found to include Python code that can invoke external scripts and execute arbitrary commands on the computer system that attempts to deserialize the AI model data. While the presence of Python code within Pickle serialized data does not always imply malicious intent, its use in an AI model should be documented and approved. It is recommended that any custom actions needed to load the AI model be kept separate from the serialized model data.
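The deserialization behavior described above can be illustrated with a minimal sketch. The `DemoModel` class and the sample weights are hypothetical, and `os.getcwd` is used only as a harmless stand-in for a dangerous callable such as `os.system`. The second half contrasts this with a data-only format (JSON here, chosen for illustration), where loading only rebuilds plain data and any custom load logic stays in ordinary source code.

```python
import json
import os
import pickle


class DemoModel:
    """Hypothetical model object whose pickle embeds a callable."""

    def __reduce__(self):
        # Instructs pickle to call os.getcwd() when the data is loaded.
        # os.getcwd is a harmless stand-in; an attacker could name os.system.
        return (os.getcwd, ())


payload = pickle.dumps(DemoModel())
restored = pickle.loads(payload)  # os.getcwd() runs here, during deserialization

# The "model" that comes back is whatever the embedded call returned.
print(type(restored).__name__)  # str, not DemoModel

# Data-only alternative: by default, json.loads only builds dicts, lists,
# strings, and numbers, so loading does not call anything named in the data.
weights_text = json.dumps({"layer1": [0.1, 0.2, 0.3]})
weights = json.loads(weights_text)
print(weights["layer1"])  # [0.1, 0.2, 0.3]
```

Note that the call happens inside `pickle.loads` itself, before the application has a chance to inspect or use the loaded object.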
How to resolve the issue
- Investigate reported detections.
- You should delay the software release until the investigation is completed, or until the risk has been formally accepted.
- Consider replacing the Pickle data serialization format with a safer alternative.
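As a starting point for the investigation step above, the imports referenced by a Pickle stream can be listed without loading it. The following is a rough sketch using the standard library `pickletools.genops` function; the helper name `list_pickle_imports`, the `DemoModel` class, and the sample data are hypothetical. The handling of `STACK_GLOBAL` is a simple heuristic (it takes the two most recently pushed strings) and may miss references that a crafted stream retrieves from the memo, so it does not replace a dedicated scanner.

```python
import pickle
import pickletools
import platform

# Opcodes that push a text string onto the pickle stack.
STRING_OPCODES = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}


def list_pickle_imports(data: bytes) -> list[str]:
    """List 'module.name' references in a pickle stream without loading it."""
    imports = []
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name were pushed as separate strings.
            module, name = recent_strings[-2:]
            imports.append(f"{module}.{name}")
    return imports


class DemoModel:
    """Hypothetical object whose pickle references a callable."""

    def __reduce__(self):
        # Harmless stand-in for a callable an attacker might choose.
        return (platform.python_version, ())


plain = pickle.dumps({"weights": [0.1, 0.2]})
suspicious = pickle.dumps(DemoModel())

print(list_pickle_imports(plain))       # []
print(list_pickle_imports(suspicious))  # ['platform.python_version']
```

A stream of plain containers and numbers references no imports, so any entry in the returned list is a candidate for review, documentation, and approval.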
Incidence statistics
ReversingLabs periodically collects and analyzes the contents of popular software package repositories for threat research purposes. Analysis results are used to calculate incidence statistics for issues (policy violations) that Spectra Assure can detect in software packages.
This section is updated when new data becomes available.
Total number of packages analyzed
- RubyGems: 183K
- NuGet: 644K
- PyPI: 628K
- npm: 3.72M
Total detections per repository
For every repository, the chart shows the number of packages that triggered the software assurance policy. In other words, it shows how many packages in each package repository were found to have the specific issue described on this page. This information helps you understand how common the issue is across different software communities.
If a repository is absent from the chart, that means none of the packages in that repository triggered this policy during analysis, or the policy was not used during analysis.
Distribution of total detections by project popularity
For every repository, the chart shows how many of the total detections belong to the Top 100 (1-100), Top 1000 (101-1000) and Top 10 000 (1001-10 000) most downloaded projects. This information helps you understand the impact of the issue within each community, making it clearer when the issue affects the most popular projects.
If the chart shows zero values for all of the top project groups, that means all detections were in unranked projects (lower than 10 000 on the list of most downloaded projects).
Recommended reading
- pickle - Python object serialization (External resource - Python documentation)
- OWASP Top 10 for LLMs and Generative AI Apps (External resource - OWASP.org)
- Paws in the Pickle Jar: Risk & Vulnerability in the Model-sharing Ecosystem (External resource - Splunk)
- Guidelines for secure AI system development (External resource - UK National Cyber Security Centre (NCSC))