Reproducibility checks
Reproducibility checks help users create and distribute secure software by ensuring their software packages have not been tampered with.
With the reproducibility checks feature, users can compare different build artifacts for any version of their software. The comparison is based on detected software behaviors and functionalities. If a reproducibility check fails, detailed analysis reports warn about suspicious and potentially dangerous differences between analyzed build artifacts.
Software integrity is one of the cornerstones of software supply chain security. Every software publisher needs to be able to ensure the code they developed is the same as the code that is later compiled and distributed. The origin of every build artifact must be traceable to allow for authenticity validation, but the challenges of software provenance do not end there. Software publishers must also ensure the software they release and deliver to end-users has not been tampered with along the way.
The more elements there are in the process, the bigger the attack surface and the greater the risk of compromise. Safeguards against software tampering can be implemented at every stage of the software development lifecycle (SDLC) with varying levels of success and reliability. At the very least, developer accounts, source code, build systems, and build artifacts must be protected in some way. From improving code practices and management controls to build system hardening and monitoring, ultimately it's all about establishing trust in the software.
However, trusting the source code does not imply trusting its executable counterparts. Perfectly safe code can be compiled on compromised servers, resulting in malicious build artifacts being distributed and infecting other systems. The problem is that software changes made by compromised build infrastructure generally cannot be detected through source code scanning or composition analysis. The solution is to incorporate binary analysis into the build stage itself, so that every artifact is checked for potential tampering.
The Spectra Assure platform can check build artifacts for reproducibility to reveal if any of the build systems have been targeted by a software supply chain attack. This capability to perform reproducibility checks is ideal for organizations and teams that already use reproducible builds as part of their process.
What are reproducible builds?
The concept of reproducible builds refers to the ability to always create the same binary output regardless of where and when the source code is compiled. The goal of this software development approach is to ensure an independently verifiable path from source to binary code.
Depending on the industry and the context, the term "reproducible builds" may have slightly different definitions and interpretations. One of those interpretations states that a build is reproducible only if anyone can create bit-by-bit identical copies of artifacts from the same source code and build instructions. Terms like "deterministic builds", "hermetic builds", and "binary reproducible builds" are also used to describe reproducible builds in this sense.
Another interpretation uses the term "repeatable builds" and focuses more on the build process itself. The point here is that the same build steps always happen regardless of the build environment, but the resulting artifacts do not need to be bit-by-bit identical.
In practice, "perfect reproducibility" - completely identical build artifacts with identical file hashes - is rare and not fully supported by most programming languages and frameworks. When using multiple build systems, non-deterministic elements can occur even if all build inputs are accounted for. For example, there may be differences in timestamps due to time zone settings or configuration that uses the latest commit time instead of build time. Locale settings, file ordering and file paths can also cause differences resulting in build artifacts that are not strictly reproducible.
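The timestamp effect described above can be illustrated with a minimal Python sketch. It is not part of Spectra Assure. The `build_artifact` function is a hypothetical stand-in for a build step, and gzip is used only because its header stores a modification time, which makes the effect easy to see.

```python
import gzip
import hashlib


def build_artifact(source: bytes, build_time: int) -> bytes:
    """Hypothetical build step: package the source and embed a timestamp."""
    return gzip.compress(source, mtime=build_time)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


source = b"print('hello')\n"

# Same source, "built" one hour apart: the embedded timestamps differ.
first = build_artifact(source, 1_700_000_000)
second = build_artifact(source, 1_700_003_600)

# The file hashes differ, although the packaged content is the same.
bit_identical = sha256(first) == sha256(second)
same_payload = gzip.decompress(first) == gzip.decompress(second)

# Pinning the timestamp removes this particular source of non-determinism.
pinned_identical = sha256(build_artifact(source, 0)) == sha256(build_artifact(source, 0))

print(bit_identical, same_payload, pinned_identical)
```

The two artifacts are not bit-by-bit identical, yet they carry the same content. This is the kind of difference that a check focused on equivalence rather than on file hashes is meant to tolerate.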
Instead of expecting bit-by-bit identical build artifacts, organizations and teams that use reproducible builds usually look for a degree of equivalence. This aligns with the Spectra Assure approach to reproducible builds, which is focused on functional equivalence.
The goal of Spectra Assure reproducibility checks is not necessarily to confirm that a build produces the exact same output when repeated. Their purpose is more specific and security-oriented: confirming that the same code compiled during separate builds produces the same software behaviors. This approach automates tampering detection and build system integrity validation without having to tinker with CI/CD tooling to create bit-by-bit reproducible builds.
How do reproducibility checks work?
Reproducibility checks rely on the existing file diffing (file comparison) capability of the Spectra Assure platform and extend it to accommodate the reproducible builds use case, where the compared files are expected to be functionally identical.
More specifically, reproducibility checks are an optional step in the Spectra Assure binary analysis that focuses on differences in software behaviors between compiled artifacts of a software version.
To perform a reproducibility check, all Spectra Assure products require exactly two files to compare against each other. Those files should be created by building the same version of source code twice, on different build systems. In this documentation, we refer to those files as artifacts or build artifacts.
Artifacts
Every software version added to a Spectra Assure project and package must have at least one artifact. When working with Spectra Assure products, you will usually scan a file and add it to a project and package as a new version. If you want to use reproducibility checks, you have to scan another file built from the same code and associate it with the previously added version as its reproducible build artifact.
In this context, the file you scanned first is the main software version artifact. The main artifact is usually your "representative" software package for that particular version, or even the one you would release and distribute.
The second file you scanned - the reproducible build artifact - is compared against the main artifact. This is why the main artifact is sometimes also called "base", "reference", or "referential". The reproducible build artifact is sometimes referred to as "modified", especially when many changes that may indicate tampering are detected.
Behaviors and functional changes
In the context of Spectra Assure analysis, "software behaviors" refers to static behavioral indicators - human-readable descriptions that translate underlying code actions into effects that those actions could have on the machine that runs them.
A few examples of software behaviors as identified and described by Spectra Assure are:
- "Writes to files in Windows system directories."
- "Deletes a file/directory."
- "Tampers with keyboard/mouse status."
Although behaviors can account for the majority of substantial code changes, software development processes and build systems may still have gaps that facilitate code injection. To detect this type of issue, reproducibility checks rely on functional similarity hashes to examine build artifacts for structural and functional changes.
Detecting software behavior differences is an effective way to uncover new malware or stealthy software supply chain attacks on CI/CD environments and build systems. Testing for functional equivalence makes it easier to confirm the absence of malicious code injections, regardless of where the build is performed (a dedicated production machine, a CI instance, or even a developer's laptop).
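As a rough illustration of comparing behaviors between two artifacts, the following Python sketch treats each artifact as a set of behavior descriptions and reports which ones appear on only one side. The data model and the `behavior_diff` function are assumptions made for this example. They do not reflect how Spectra Assure stores or compares behaviors internally.

```python
from typing import Dict, Iterable, List


def behavior_diff(reference: Iterable[str], repro: Iterable[str]) -> Dict[str, List[str]]:
    """Return behaviors that appear in only one of the two artifacts."""
    reference_set, repro_set = set(reference), set(repro)
    return {
        # Present only in the reproducible build artifact.
        "added": sorted(repro_set - reference_set),
        # Present only in the main (reference) artifact.
        "removed": sorted(reference_set - repro_set),
    }


# Illustrative input that reuses the behavior descriptions quoted earlier.
main_behaviors = ["Deletes a file/directory."]
repro_behaviors = [
    "Deletes a file/directory.",
    "Writes to files in Windows system directories.",
]

diff = behavior_diff(main_behaviors, repro_behaviors)
functionally_equivalent = not diff["added"] and not diff["removed"]
print(diff, functionally_equivalent)
```

In this made-up example, the second build gained a behavior that the main artifact does not have, so the two artifacts would not be considered equivalent.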
Changes listed in the following table are detected and reported as part of the reproducibility check.
Change type | Meaning of "Added" | Meaning of "Removed" | Meaning of "Changed" |
---|---|---|---|
Classification | Reproducible build artifact contains a malicious file | Reference artifact contains a malicious file | Classification for a file changed between artifacts |
Format | File in a specific format appears only in the reproducible build artifact | File in a specific format appears only in the reference artifact | File has changed format from X in the reference artifact to Y in the reproducible build artifact |
Functionality change | N/A | N/A | File changed its low-level code signature |
Hash | File appears only in the reproducible build artifact | File appears only in the reference artifact | File content changed between reproducible build artifact and reference artifact |
Behavior | Behavior appears only in the reproducible build artifact | Behavior appears only in the reference artifact | N/A |
Issue | Issue affects only the reproducible build artifact | Issue appears only in the reference artifact | N/A |
Tag | Tag example-tag appears only in the reproducible build artifact | Tag example-tag appears only in the reference artifact | N/A |
Vulnerability | Vulnerability CVE-EXAMPLE affects only the reproducible build artifact | Vulnerability CVE-EXAMPLE affects only the reference artifact | N/A |
Entropy | N/A | N/A | File entropy change exceeded the allowed maximum |
While file size differences are detected and included in analysis reports, they do not impact the reproducibility check outcome.
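To give an intuition for the "Entropy" row in the table, the sketch below computes Shannon entropy in bits per byte and flags a change that exceeds a threshold. Both the formula choice and the `MAX_ENTROPY_DELTA` value are assumptions for illustration. This documentation does not state how Spectra Assure measures entropy or what the allowed maximum is.

```python
import math
from collections import Counter

# Hypothetical threshold, chosen only for this example.
MAX_ENTROPY_DELTA = 1.0


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def entropy_changed(reference: bytes, repro: bytes, max_delta: float = MAX_ENTROPY_DELTA) -> bool:
    """True if the entropy difference between the two files exceeds max_delta."""
    return abs(shannon_entropy(reference) - shannon_entropy(repro)) > max_delta


plain = b"A" * 1024                # highly repetitive content, very low entropy
packed_like = bytes(range(256)) * 4  # uniform byte distribution, maximum entropy

print(entropy_changed(plain, plain), entropy_changed(plain, packed_like))
```

A large jump in entropy between two builds of the same code can hint that part of a file was compressed, packed, or encrypted in one of them, which is why it is worth flagging.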
Analysis steps and reports
When you use reproducibility checks, Spectra Assure products perform several analysis steps:
- Software package analysis - scan of the main package version artifact.
- Software package analysis: repro - scan of the reproducible build artifact.
- Diff with: repro - comparison of the main artifact and its reproducible build artifact.
- Reproducible build check - functional and behavioral similarity check between the main and the reproducible build artifacts.
Results of those steps are visible in analysis reports and in the output of rl-secure commands that support it.
The following rl-secure commands can retrieve information about reproducibility checks:
The following report types include details about reproducibility checks:
- rl-checks - provides a summary of reproducibility checks as a JSON file structured according to the rl-checks report schema
- SAFE report (rl-html) - shows all detected differences in the Reproducibility page on the left-hand side of the report
Because the reproducible build artifact is analyzed like any other standalone file, you can export and view the analysis reports for it separately from the reports for the main artifact.
The following report types can be exported for the reproducible build artifact:
- CycloneDX
- rl-cve
- rl-html
- rl-json
- rl-uri
- SARIF
- SPDX
You can only create the rl-checks report for the main artifact, and it will include a summary of the reproducible build check.
What causes a reproducibility check failure?
When performing reproducibility checks on build artifacts, Spectra Assure relies on universal supply chain attack detection heuristics.
Any behavior differences between two supposedly equivalent artifacts of a software version will be reported as a failed reproducibility check (RB:FAIL).
Specifically, the following types of unwanted or unexpected changes are identified as suspicious and cause the reproducibility check to fail:
- Behavior changes
- Hash changes on scripts and source files
- Low-level code signature changes
- File format and classification changes
- Vulnerabilities
- Policy violations (issues)
- Entropy changes
Generally, the reproducibility check passes when the main artifact and the reproducible build artifact are functionally identical.
In practice, some changes between artifacts are allowed, and the reproducibility check passes despite those changes.
For example, if the artifacts have the same behaviors, but one was recompiled at some point, the only differences between the two are in their timestamps and hashes. In this case, the reproducibility check passes, as those differences are not considered critical.
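The pass and fail rules listed above can be summarized in a simplified Python model. This is only a sketch of the rules as they are described in this section, not the actual detection heuristics. The `Change` class, its fields, and the change type names are hypothetical, and treating every unlisted change type as informational is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

# Change types that this section lists as causes of a failed check.
FAILING_TYPES = {
    "behavior",
    "functionality",   # low-level code signature change
    "format",
    "classification",
    "vulnerability",
    "issue",           # policy violation
    "entropy",
}


@dataclass(frozen=True)
class Change:
    change_type: str
    is_script_or_source: bool = False  # only relevant for hash changes


def fails_check(change: Change) -> bool:
    if change.change_type == "hash":
        # Hash changes fail the check only on scripts and source files.
        return change.is_script_or_source
    # Anything else not listed (for example, file size) is informational here.
    return change.change_type in FAILING_TYPES


def reproducibility_status(changes: Iterable[Change]) -> str:
    return "FAIL" if any(fails_check(c) for c in changes) else "PASS"


# A recompiled binary: only its hash and size differ, so the check passes.
recompiled = [Change("hash"), Change("size")]
# The same differences plus a new behavior: the check fails.
tampered = recompiled + [Change("behavior")]

print(reproducibility_status(recompiled), reproducibility_status(tampered))
```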
How are diffs and reproducibility checks different?
The primary difference between reproducibility checks and diffs is their purpose.
Diff compares two artifacts of different software versions to highlight their differences and help you understand what has changed in the newer version.
Reproducibility check compares two artifacts of the same software version to ensure there aren't any significant, unexpected differences between them.
Another difference is that some types of changes are treated as problematic in reproducibility checks, but not in diffs. They can cause the reproducibility check to fail, while the diff will only report them as any other change.
Why should your organization use reproducibility checks?
To gain the most value from Spectra Assure reproducibility checks, your organization or team should first implement reproducible builds into the SDLC.
Although reproducible builds introduce an extra step in your build process (the concept does require building your code at least twice), it's a step worth taking because of the following advantages:
Verify the build step wherever it happens - whether it's on the developer's machine or the CI system; on different build systems or on different configurations of the same system.
Speed up testing and improve quality assurance processes - with a deeper insight into differences between build artifacts, your teams can focus on parts of code that changed while preserving confidence in the unchanged ones.
Increase trust in signatures - binaries can be signed during the build step even if they contain malware, which makes tools that identify tampering by relying primarily on signatures inadequate. Reproducibility checks can make sure the binaries haven't been tampered with during the build, prior to signing.
Facilitate certification and business partnerships - the ability to verify software integrity across versions can reduce the need for code audits and improve your organization's score in certification processes. It also provides additional assurance to partners and customers when preparing to deploy your software in their business or in safety-critical environments.
Fortify your brand image - software integrity is not only a security matter. It also affects your reputation. Compromise in your software supply chain can endanger partnerships, investments and customer relations. On the other hand, consistently releasing secure software positively impacts the public perception of your products and your organization in general.
Save resources and reduce costs - reproducibility checks do not require preserving build artifacts for long periods of time, and they're fully compatible with ephemeral, short-lived build systems. This allows for modern, flexible pipelines that can be deployed and dismissed on demand instead of continually consuming resources and remaining open to risk of tampering and persistent threats.
It's important to remember that reproducibility checks alone cannot protect your software from all types of supply chain attacks. To detect and prevent security issues where the human factor plays a major role, you will have to rely on other Spectra Assure capabilities or additional third-party solutions. Examples of such issues include:
- Insider attacks on source code
- External breaches of source code repositories due to insecure settings
- Typosquatted or malicious open source components
- Misconfigured infrastructure
- Hijacked automated update processes
Even the safest build system won't achieve much if the rest of the tools and processes around it are vulnerable or compromised. Improvements should always circulate and propagate through all parts of your organization to maximize your software security and quality.
The following checklist offers quick advice to help you audit your current build systems and prepare the ground for better practices with reproducibility checks.
Best practices for build system security
Build systems should be separate and isolated from each other to reduce the chance of attackers gaining access to them.
The build process should be transparent so that all relevant parties have insight into what exactly happens during each run.
Every new build should start in a fresh environment to prevent potentially compromised environments from persisting and affecting future builds.
Organization policies and team SOPs (Standard Operating Procedures) should be established and enforced to prevent accidental or unauthorized changes to branch protection rules, build configurations, CI/CD settings and other Infrastructure-as-Code components.
How to use reproducibility checks
Reproducibility checks are supported in the following Spectra Assure products:
- CLI - starting with rl-secure 1.4.0
- Portal - only in Projects
- API - in endpoints: "Upload and scan a version", "Show analysis status", "Export analysis report", "Delete a version"
Reproducibility checks do not count towards the monthly analysis capacity.
To run reproducibility checks on your software, you have to:
- create two build artifacts for a version of your code
- scan the artifacts with Spectra Assure and associate them with the same project, package, and version
Different scenarios are supported - you may want to add a reproducible build artifact for a version that you scanned some time ago, or you may want to scan a new version and its reproducible build right away.
Every version can have only one reproducible build artifact at a time.
If a version already has a reproducible build artifact and you want to scan another one, you must first remove the existing reproducible build artifact.
Basic workflow
The following workflow lists the basic steps for performing reproducibility checks across the Spectra Assure platform.
1. Scan the main artifact for a software version
In this step, you're scanning the first artifact with Spectra Assure. The artifact is added to a project and package as a new version.
Product | Workflow step |
---|---|
CLI | Use the rl-secure scan command. Example: rl-secure scan /home/armando/my-package.jar pkg:rl/my-project/my-package@1.0.1 |
Portal | Select or create a project and package in your Portal organization and group, then upload a version |
API | Send a POST request to the Upload and scan a version endpoint |
2. Scan the reproducible build artifact for the same software version
In this step, you're scanning the second artifact for a previously created package version.
You must use the same package URL (project, package, and version) as in the previous step and add the reproducibility check parameter.
The build=repro parameter is used to identify the scanned file as a reproducible artifact of the package version.
The reproducible build artifact is automatically compared against the main artifact. Any significant functional differences between the artifacts fail the reproducibility check and are listed in the analysis reports.
If the package version already has a reproducible build artifact, it won't be possible to scan another one. Remove it first, or use the parameters available in your Spectra Assure product for replacing an artifact during scanning.
Product | Workflow step |
---|---|
CLI | Use the rl-secure scan command to add the artifact to the version with the ?build=repro parameter. Example: rl-secure scan /home/armando/my-package-repro.jar pkg:rl/my-project/my-package@1.0.1?build=repro |
Portal | In the Releases table, access the version uploaded in the previous step. Expand the Info field, then select Upload Reproducible Build |
API | Send a POST request to the Upload and scan a version endpoint with the build=repro query parameter |
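
For teams that script the first two steps, the following Python sketch assembles the two rl-secure scan commands shown in the CLI examples. The `with_repro` and `scan_commands` helpers are hypothetical names introduced for this example. The sketch only builds and prints the commands. It does not run rl-secure, and it assumes the package URL has no other query parameters.

```python
import shlex
from typing import List


def with_repro(purl: str) -> str:
    """Append the reproducibility parameter to a package URL."""
    return f"{purl}?build=repro"


def scan_commands(main_path: str, repro_path: str, purl: str) -> List[List[str]]:
    """Build argument lists for scanning the main and reproducible build artifacts."""
    return [
        ["rl-secure", "scan", main_path, purl],
        ["rl-secure", "scan", repro_path, with_repro(purl)],
    ]


commands = scan_commands(
    "/home/armando/my-package.jar",
    "/home/armando/my-package-repro.jar",
    "pkg:rl/my-project/my-package@1.0.1",
)

for argv in commands:
    # shlex.join quotes arguments so the printed line is safe to paste into a shell.
    print(shlex.join(argv))
```

Both commands use the same project, package, and version. Only the second one carries the build=repro parameter. To run the scans from Python, each argument list could be passed to `subprocess.run`, which requires rl-secure to be installed and configured.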
3. View reproducibility check results
In this step, you're confirming that the reproducibility check completed successfully by viewing the status of the main artifact.
The reproducibility check is listed in your Spectra Assure product as one of the checks performed during analysis, and its status is either PASS or FAIL.
The PASS status doesn't necessarily mean the artifacts are bit-by-bit identical.
The reproducibility check can pass even if some differences between the artifacts remain.
Product | Workflow step |
---|---|
CLI | Use rl-secure checks to view only checks performed for the main artifact: rl-secure checks my-project/my-package@1.0.1 or use rl-secure list to view more detailed analysis results alongside checks: rl-secure list pkg:rl/my-project/my-package@1.0.1 --risks --checks |
Portal | In the Releases table, access the version to which you uploaded the reproducible build artifact in the previous step. Then expand the Info field for a summary of all performed checks |
API | Send a GET request to the Show performed checks endpoint. In the response, look for values of properties in analysis.report.info.summary and analysis.report.scans.scan-repro.checks objects |
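
When reading the API response in a script, a small helper can look up the two objects named in the API row without failing if one of them is missing. The Python sketch below is an illustration only. The `dig` helper is a hypothetical name, and the sample payload is invented to mirror the property paths mentioned above. The contents of those objects are placeholders, not the real response format.

```python
from typing import Any


def dig(payload: Any, dotted_path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested dictionaries."""
    current = payload
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Invented payload: only the nesting follows the paths named in this step.
response = {
    "analysis": {
        "report": {
            "info": {"summary": {"placeholder": "summary values"}},
            "scans": {"scan-repro": {"checks": {"placeholder": "check values"}}},
        }
    }
}

summary = dig(response, "analysis.report.info.summary")
repro_checks = dig(response, "analysis.report.scans.scan-repro.checks")
missing = dig(response, "analysis.report.scans.other.checks", default={})

print(summary, repro_checks, missing)
```

Returning a default instead of raising makes it easier to handle versions that do not have a reproducible build artifact yet.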
4. Examine the analysis report for reproducibility differences
Depending on the reproducibility check status, you may want to examine the results in more detail.
The best way to do this is to export the analysis report for the main artifact in the SAFE report (rl-html format) and access the Reproducibility page on the left-hand side of the report.
The Reproducibility page indicates the reproducibility check status and shows a summary of differences between the reproducible build artifact and the main artifact ("Reference Version" in the report).
If the reproducibility check failed, the changes that caused the failure are listed in the report and can be filtered by type. If any files have been modified between the main artifact and the reproducible build artifact, they are listed in the report and can be filtered by file change type (added, removed, changed, behavior change only).
Product | Workflow step |
---|---|
CLI | Use the rl-secure report command to export the SAFE report with the ?build=repro parameter. Example: rl-secure report rl-html pkg:rl/my-project/my-package@1.0.1?build=repro In the SAFE report, access the Reproducibility page from the sidebar on the left-hand side |
Portal | In the Releases table, expand the Info field for the version. Find Reproducible build check in the summary of checks and select Details to open the report |
API | The SAFE report cannot be downloaded with the API. Send a GET request to the Export analysis report endpoint with the build=repro query parameter and the report_type parameter set to rl-checks |
After reviewing the analysis report, consider investigating the root cause in your development pipeline. If you make some adjustments to your build configuration or want to test another build artifact of the same software version, remove the previously added reproducible build artifact and scan a new one. You can also look into automating reproducibility checks and adding them as a mandatory step in your CI/CD process.
Recommended reading
The Reproducible Builds project (External resource)
Best practices for securing your build system (External resource - GitHub)
Chris Lamb, Stefano Zacchiroli - Reproducible Builds: Increasing the Integrity of Software Supply Chains (External resource - PDF document)
Butler, S., Gamalielsson, J., Lundell, B. et al. - On business adoption and use of reproducible builds for open and closed source software (External resource)