Scaling Up Supply Chain Security: Implementing Sigstore for Seamless Container Image Signing
Note: Verizon Media is now known as Yahoo.
On an average day, Yahoo’s internal container registry builds and pushes around five thousand new container images. Those images represent roughly 60,000 daily builds — comprising more than 700 clusters and 100,000 pods.
Unfortunately, container images, like physical packages in the real world, can be tampered with during transit.
And the solution, while obvious, is complex to implement. Digital signatures provide a way to verify the image's authenticity. The use of those signatures is — in many industries, such as healthcare — often bound by strict regulatory requirements.
But getting that done often requires internally built or vendor tooling. Tooling that is neither transparent nor easy to use. And most importantly, tooling that is expensive to maintain.
With that in mind, the Paranoids have taken a different path. The group’s engineering outfit spent the past several months contributing to an open-source project for digitally signing software components: Sigstore.
Sigstore is an OpenSSF graduated project backed by the Linux Foundation. In 2021, in concert with the project announcement, Google described Sigstore like this:
The mission of Sigstore is to make it easy for developers to sign releases and for users to verify them. You can think of it like Let’s Encrypt for Code Signing. Just like how Let’s Encrypt provides free certificates and automation tooling for HTTPS, sigstore provides free certificates and tooling to automate and verify signatures of source code. Sigstore also has the added benefit of being backed by transparency logs, which means that all the certificates and attestations are globally visible, discoverable, and auditable.
In this blog post, we will explore how Yahoo uses Sigstore to sign and verify container images, with Athenz, an open source platform for X.509 certificate-based service authentication and authorization, acting as the internal Certificate Authority.
And, most importantly, we will show how you can implement this within your organization using open source tools.
What is Sigstore?
As mentioned in the Sigstore docs, “Sigstore combines several different technologies that focus on automatic key management and transparency logs” such as Cosign, Fulcio and Rekor.
Cosign is used to sign OCI containers and other artifacts. In a typical signing flow, the user or workload invoking cosign uses a key or code signing certificate to sign an artifact. The signature and associated metadata are then published to a transparency log.
Fulcio is a free code signing Certificate Authority built to issue short-lived certificates. The diagram below shows a typical flow where a user or a workload requests a code signing certificate, associated with a private key, from Fulcio. The flow uses SPIFFE, which is one of the ways a workload identity can be expressed to Fulcio.
Rekor is a transparency log that aims to provide an immutable, tamper-resistant ledger of metadata generated within a software project’s supply chain. The diagram below shows cosign submitting the code signing certificate and an artifact signature to Rekor, which returns a signed bundle containing everything in the request along with a timestamp.
We were exploring ways to move away from static long-lived keys for signing container images. The Cosign 2.0 release in February this year made keyless signing for container images a fully supported feature, which made it a promising solution for us to dig into.
The core property that makes the solution “keyless” is that the key used to sign the artifact is extremely short-lived, think minutes. The private key is meant to be discarded right after the artifact is signed.
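To make the “keyless” property concrete, here is a minimal sketch in Go, using only the standard library, of signing with a key that exists only for the duration of one call. It is an illustration under stated assumptions, not how cosign is implemented: the function names and the payload are hypothetical, and in the real flow the public key would be bound to an identity by a short-lived certificate rather than returned bare.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// signWithEphemeralKey (hypothetical helper) generates a fresh ECDSA P-256
// key, signs the SHA-256 digest of payload, and returns only the signature
// and the public key. The private key is never stored or returned, so it is
// unreachable once this function returns.
func signWithEphemeralKey(payload []byte) ([]byte, *ecdsa.PublicKey, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	digest := sha256.Sum256(payload)
	sig, err := ecdsa.SignASN1(rand.Reader, priv, digest[:])
	if err != nil {
		return nil, nil, err
	}
	return sig, &priv.PublicKey, nil
}

// verifyPayload checks sig against the SHA-256 digest of payload.
func verifyPayload(pub *ecdsa.PublicKey, payload, sig []byte) bool {
	digest := sha256.Sum256(payload)
	return ecdsa.VerifyASN1(pub, digest[:], sig)
}

func main() {
	// Placeholder payload; a real flow signs a reference to the image digest.
	payload := []byte("sha256:hypothetical-image-digest")
	sig, pub, err := signWithEphemeralKey(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("signature verifies:", verifyPayload(pub, payload, sig))
}
```

Because the private key is gone after signing, a verifier later needs independent evidence of when the signature was made, which is where the timestamp discussed below comes in.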
There were some challenges in using cosign’s keyless signing capability as is, so we made some adjustments to deploy it at Yahoo. We had to figure out how to handle interactions with Fulcio and Rekor: setting up both tools internally, and then configuring the signing and verification processes to use them in place of the public instances. The initial setup is expected to be straightforward; the harder problems come afterwards. Most companies already run an established Certificate Authority, so integrating Fulcio with the existing internal CA is complex. Managing Rekor at scale is also difficult, because demand on the Relational Database Service (RDS) instance backing it keeps growing, while transparency of artifact production and consumption may be a lower priority within a company. Moving to internal Fulcio and Rekor instances therefore requires careful consideration and possibly custom solutions.
At Yahoo, we already use Athenz, which acts as an internal CA. Introducing Fulcio as another CA and having it co-exist with Athenz seemed unnecessary. The container images built internally are used only within Yahoo infrastructure. Hence, a transparency log of artifact signatures like Rekor is a great property to have, but not a core requirement for enabling keyless signing.
We then came across this excellent talk by Nathan Smith at Chainguard, "Keyless" Code Signing Without Fulcio, which demonstrated that we primarily need an identity provider, a Certificate Authority and a Timestamp Authority. This gave us a blueprint for integrating cosign with existing Yahoo infrastructure in a short span of time.
Running cosign internally
Based on Nathan’s talk and our investigation, we came up with this four-step approach to enable keyless signing without using Fulcio and Rekor.
Step 1: Deploy a Timestamp Authority
A Timestamp Authority (TSA) is a service for issuing RFC 3161 timestamps. It provides a timestamped record of when a document was created or modified. Rekor provides signature transparency for artifacts signed with cosign.
With keyless signing, since the key is short-lived, we need something to attest that the key (and the certificate associated with it) was valid at the time the artifact was signed. Rekor does provide this timestamping capability, but for keyless signing the timestamp is the only part that is essential. A signature transparency log provides marginal benefits over using just a TSA, and for us it did not justify the additional operational complexity.
Sigstore provides an open-source implementation of TSA.
https://github.com/sigstore/timestamp-authority/#cloud-kms includes instructions on deploying a TSA on AWS or GCP. If you have your own internal root CA with which you need to sign the TSA CA certificate, there are a couple of one-time manual steps needed:
1.1 Generate a CSR
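While generating a CSR, note that according to RFC 3161 the TSA certificate must contain exactly one instance of the Extended Key Usage (EKU) extension (as defined in RFC 2459), with the value id-kp-timeStamping, and the extension must be marked critical. A sample CSR is as shown below.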
Go code to set the EKU:
Full example can be found here.
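Independently of the linked example, the following is a minimal sketch of one way to request a critical, timestamping-only EKU in a CSR with the Go standard library. It is not the authors’ code. The common name is a placeholder, and the key is generated in memory purely for illustration; with a cloud KMS, a `crypto.Signer` backed by the KMS key would be passed to `x509.CreateCertificateRequest` instead.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/asn1"
	"encoding/pem"
	"fmt"
)

var (
	oidExtKeyUsage  = asn1.ObjectIdentifier{2, 5, 29, 37}              // id-ce-extKeyUsage
	oidTimeStamping = asn1.ObjectIdentifier{1, 3, 6, 1, 5, 5, 7, 3, 8} // id-kp-timeStamping
)

// newTSACSR (hypothetical helper) returns a DER-encoded CSR that requests
// exactly one EKU, id-kp-timeStamping, marked critical.
func newTSACSR(commonName string) ([]byte, error) {
	// Illustration only: an in-memory key that is discarded afterwards.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	// The EKU extension value is a SEQUENCE OF KeyPurposeId.
	ekuValue, err := asn1.Marshal([]asn1.ObjectIdentifier{oidTimeStamping})
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject: pkix.Name{CommonName: commonName},
		ExtraExtensions: []pkix.Extension{
			{Id: oidExtKeyUsage, Critical: true, Value: ekuValue},
		},
	}
	return x509.CreateCertificateRequest(rand.Reader, tmpl, key)
}

// hasCriticalTimestampingEKU parses a CSR and reports whether it requests a
// critical EKU extension containing only id-kp-timeStamping.
func hasCriticalTimestampingEKU(csrDER []byte) (bool, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return false, err
	}
	for _, ext := range csr.Extensions {
		if !ext.Id.Equal(oidExtKeyUsage) || !ext.Critical {
			continue
		}
		var oids []asn1.ObjectIdentifier
		if _, err := asn1.Unmarshal(ext.Value, &oids); err != nil {
			return false, err
		}
		return len(oids) == 1 && oids[0].Equal(oidTimeStamping), nil
	}
	return false, nil
}

func main() {
	der, err := newTSACSR("hypothetical-tsa.example.com")
	if err != nil {
		panic(err)
	}
	ok, err := hasCriticalTimestampingEKU(der)
	if err != nil {
		panic(err)
	}
	fmt.Println("critical timestamping EKU requested:", ok)
	fmt.Print(string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})))
}
```

The extension is built by hand through `ExtraExtensions` because `x509.CertificateRequest` has no dedicated EKU field. The issuing CA still decides whether to honor the requested extension.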
1.2 Issue a CA certificate signed by the internal root CA
A sample CA certificate is as shown below
While deploying the TSA, we discovered an issue with the /ping endpoint when enforcing mTLS, which Dmitry promptly resolved and got merged upstream.
While trying to issue a TSA CA certificate, we also discovered and fixed a bug in crypki around setting the EKU attribute.
Step 2: Enable build infrastructure to fetch ephemeral code signing certificates
This might be the most challenging step unless you already have a CA which can authenticate the build job or workload and issue a certificate associated with it.
At Yahoo, we use Screwdriver as the primary platform for building, testing, publishing to artifact registry and deploying software. All the container images are built and published via a Screwdriver job. We use Athenz as the platform for various authentication and authorization needs. Moreover, each Screwdriver job gets an Athenz identity certificate associated with itself.
Introducing Fulcio as a second CA alongside Athenz, which already acts as our internal CA, would unnecessarily complicate the system.
Similar to how GitHub acts as an OIDC provider for Actions, we added an equivalent capability to Athenz and Screwdriver. Screwdriver requests an ID token from Athenz by presenting its identity certificate. The ID token is of the form:
The ID token is sent along with the request for a code signing certificate and is used to authenticate the requesting Screwdriver job. Similar to Fulcio, the Athenz CA then issues a code signing certificate.
Key attributes of the code signing certificate are:
- Validity is short (15 minutes)
- The value for “Extended Key Usage” attribute is set to “Code Signing”
- The SPIFFE URI in the certificate maps to the identity of the Screwdriver job.
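The three attributes above can be sketched with the Go standard library as follows. This is an illustration under stated assumptions, not the Athenz implementation: the throwaway CA, the subject names and the SPIFFE ID are all hypothetical, and a real CA would sign with its own protected key after authenticating the requester.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// issueDemoCodeSigningCert (hypothetical helper) issues a leaf certificate
// with a 15-minute validity, the Code Signing EKU, and the given SPIFFE ID
// as a URI SAN, signed by a throwaway in-memory CA.
func issueDemoCodeSigningCert(spiffeID string) (*x509.Certificate, error) {
	id, err := url.Parse(spiffeID)
	if err != nil {
		return nil, err
	}
	now := time.Now().UTC().Truncate(time.Second)

	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	// For brevity the CA template is used directly as the parent;
	// CreateCertificate only needs its subject and the signing key.
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "hypothetical-demo-ca"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}

	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "hypothetical-build-job"},
		NotBefore:    now,
		NotAfter:     now.Add(15 * time.Minute), // short validity
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageCodeSigning},
		URIs:         []*url.URL{id}, // SPIFFE ID carried as a URI SAN
	}
	der, err := x509.CreateCertificate(rand.Reader, leafTmpl, caTmpl, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// isCodeSigningOnly reports whether Code Signing is the certificate's only EKU.
func isCodeSigningOnly(cert *x509.Certificate) bool {
	return len(cert.ExtKeyUsage) == 1 && cert.ExtKeyUsage[0] == x509.ExtKeyUsageCodeSigning
}

func main() {
	cert, err := issueDemoCodeSigningCert("spiffe://example.org/hypothetical/build-job")
	if err != nil {
		panic(err)
	}
	fmt.Println("validity:", cert.NotAfter.Sub(cert.NotBefore))
	fmt.Println("code signing only:", isCodeSigningOnly(cert))
	fmt.Println("URI SAN:", cert.URIs[0])
}
```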
To summarize, the build system should be able to communicate with an OIDC provider which can issue ID tokens. The internal CA should then be able to validate the sender’s token and issue a short-lived certificate with the appropriate attributes set.
Step 3: Add cosign sign in the container build and publish flow
Thanks to the amazing work done by the sigstore community, this is the easiest step. All we need to do is add the “cosign sign” command right after the container image is built and published to the registry.
Since we are using keyless signing, prior to executing “cosign sign” the build job needs to use the ID token received from the OIDC provider to fetch a code signing certificate from the CA.
A sample “cosign sign” invocation would look like this:
Note the parameters set for specifying certificate chain for code signing CA and TSA.
Also note that --tlog-upload is set to false since we are NOT using Rekor.
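As a hedged sketch, this is how a build tool written in Go might assemble such an invocation. The flag names follow the cosign 2.x CLI (`--key`, `--certificate`, `--certificate-chain`, `--timestamp-server-url`, `--tlog-upload`, `--yes`); confirm them against `cosign sign --help` for your version. Every path, URL and image reference is a placeholder, the `signParams` type is hypothetical, and the mTLS options for the TSA connection are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// signParams holds hypothetical inputs for one signing run.
type signParams struct {
	KeyPath   string // ephemeral private key generated by the build job
	CertPath  string // short-lived code signing certificate from the internal CA
	ChainPath string // CA chain for the code signing certificate
	TSAURL    string // RFC 3161 timestamp endpoint of the internal TSA
	ImageRef  string // image to sign, ideally pinned by digest
}

// cosignSignArgs builds the argument list for "cosign sign" without Rekor.
func cosignSignArgs(p signParams) []string {
	return []string{
		"sign",
		"--key", p.KeyPath,
		"--certificate", p.CertPath,
		"--certificate-chain", p.ChainPath,
		"--timestamp-server-url", p.TSAURL,
		"--tlog-upload=false", // no transparency log entry
		"--yes",               // skip interactive confirmation in CI
		p.ImageRef,
	}
}

func main() {
	args := cosignSignArgs(signParams{
		KeyPath:   "/tmp/hypothetical/ephemeral.key",
		CertPath:  "/tmp/hypothetical/codesign.crt",
		ChainPath: "/etc/hypothetical/codesign-ca-chain.pem",
		TSAURL:    "https://tsa.example.com/api/v1/timestamp",
		ImageRef:  "registry.example.com/hypothetical/app@sha256:<digest>",
	})
	// A build tool would pass args to exec.Command("cosign", args...);
	// here the command line is only printed.
	fmt.Println("cosign " + strings.Join(args, " "))
}
```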
The build system typically authenticates or logs in to the OCI registry prior to uploading the image. If signing happens immediately after uploading the container image in the same workflow, no extra authentication is required. However, if the signing step is executed in a separate workflow, you would have to arrange the authentication to the registry prior to executing the cosign sign command.
The overall design for the signing part is demonstrated below.
If you are using a different build platform like GitHub Actions, the cosign sign command remains the same. You just need to set the CA and TSA related parameters correctly.
cosign sign was missing the ability to specify mTLS connection attributes, which again Dmitry was able to fix and contribute upstream. Thanks Dmitry!
Step 4: Add signature verification step in container image pull and deployment flows
Having a robust signing mechanism solves only part of the problem. Ensuring the verification process is simple and ubiquitous is equally important.
For signature verification, we typically need to ship different kinds of data, possibly from different sources, to the verifier. This data includes the timestamped signature, the signing key, metadata associated with the artifact, and the artifact itself. For a container image signed with cosign, however, the signature and certificate are attached to the container image and stored in the OCI registry itself.
How does cosign verification currently work with Rekor?
In the verification process, the first step is to confirm the signature on the bundle using Rekor’s public key. If that succeeds, the bundle is trusted, which means the timestamp it carries is authentic. Next, the code signing certificate is checked for validity, ensuring it chains to the designated certificate authority (CA). Notably, the certificate’s validity window is checked against the bundle’s timestamp rather than the current time, since the short-lived certificate will usually have expired by the time of verification. The artifact’s signature is then verified using the public key extracted from the code signing certificate. After peeling all these layers, the signer’s identity, represented by the Screwdriver pipeline ID, is revealed and compared with the expected pipeline ID to complete the verification.
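The later steps of this process can be sketched with the Go standard library. This is a simplified illustration, not cosign’s implementation. It assumes the signing time has already been taken from a verified timestamp (that first step is not shown), and the throwaway CA, subject names and SPIFFE-style identity are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// verifyAt checks the certificate chain at signing time, then the artifact
// signature, then the signer identity. signedAt must come from a trusted timestamp.
func verifyAt(leaf *x509.Certificate, roots *x509.CertPool, signedAt time.Time,
	payload, sig []byte, wantIdentity string) error {
	opts := x509.VerifyOptions{
		Roots:       roots,
		CurrentTime: signedAt, // signing time, not the current time
		KeyUsages:   []x509.ExtKeyUsage{x509.ExtKeyUsageCodeSigning},
	}
	if _, err := leaf.Verify(opts); err != nil {
		return fmt.Errorf("certificate not valid at signing time: %w", err)
	}
	if err := leaf.CheckSignature(x509.ECDSAWithSHA256, payload, sig); err != nil {
		return fmt.Errorf("artifact signature does not verify: %w", err)
	}
	for _, u := range leaf.URIs {
		if u.String() == wantIdentity {
			return nil
		}
	}
	return errors.New("signer identity does not match the expected identity")
}

// demo builds a throwaway CA, a 15-minute leaf and a signature, then verifies
// them as if the signature were timestamped the given number of minutes after issuance.
func demo(minutesAfterIssuance int, wantIdentity string) error {
	now := time.Now().UTC().Truncate(time.Second)
	caKey, err1 := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	leafKey, err2 := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err := errors.Join(err1, err2); err != nil {
		return err
	}
	caTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: "hypothetical-demo-ca"},
		NotBefore: now.Add(-time.Hour), NotAfter: now.Add(24 * time.Hour),
		IsCA: true, BasicConstraintsValid: true, KeyUsage: x509.KeyUsageCertSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	ca, err := x509.ParseCertificate(caDER)
	if err != nil {
		return err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2), Subject: pkix.Name{CommonName: "hypothetical-build-job"},
		NotBefore: now, NotAfter: now.Add(15 * time.Minute),
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageCodeSigning},
		URIs:        []*url.URL{{Scheme: "spiffe", Host: "example.org", Path: "/hypothetical/pipeline/123"}},
	}
	leafDER, err := x509.CreateCertificate(rand.Reader, leafTmpl, ca, &leafKey.PublicKey, caKey)
	if err != nil {
		return err
	}
	leaf, err := x509.ParseCertificate(leafDER)
	if err != nil {
		return err
	}
	payload := []byte("sha256:hypothetical-image-digest")
	digest := sha256.Sum256(payload)
	sig, err := ecdsa.SignASN1(rand.Reader, leafKey, digest[:])
	if err != nil {
		return err
	}
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	signedAt := now.Add(time.Duration(minutesAfterIssuance) * time.Minute)
	return verifyAt(leaf, roots, signedAt, payload, sig, wantIdentity)
}

func main() {
	id := "spiffe://example.org/hypothetical/pipeline/123"
	fmt.Println("timestamp inside validity window:", demo(5, id))
	fmt.Println("timestamp after certificate expiry:", demo(60, id))
}
```

Setting `CurrentTime` to the signing time is what allows a certificate that has long since expired to still vouch for a signature made while it was valid.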
How does cosign verification work with TSA?
The verification remains quite similar to what is currently done when verifying with Rekor with a small caveat. Since we are using keyless signing with TSA, the “cosign verify” command needs to use the TSA certificate & code signing CA certificates for verification.
A sample “cosign verify” invocation would look like this:
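As a hedged sketch, a Go tool that shells out to cosign might assemble the verification arguments as follows. The flag names follow the cosign 2.x CLI; confirm them against `cosign verify --help` for your version. All paths, identities and the image reference are placeholders, the `verifyParams` type is hypothetical, and whether an exact OIDC issuer match works depends on what your internal CA puts into the certificate, so treat that flag in particular as an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// verifyParams holds hypothetical inputs for one verification run.
type verifyParams struct {
	Identity     string // expected signer identity (e.g. a SPIFFE URI)
	OIDCIssuer   string // expected OIDC issuer recorded in the certificate
	CAChainPath  string // CA chain for the code signing certificate
	TSAChainPath string // certificate chain of the Timestamp Authority
	ImageRef     string // image to verify
}

// cosignVerifyArgs builds the argument list for "cosign verify" when
// signatures carry an RFC 3161 timestamp instead of a Rekor entry.
func cosignVerifyArgs(p verifyParams) []string {
	return []string{
		"verify",
		"--certificate-identity", p.Identity,
		"--certificate-oidc-issuer", p.OIDCIssuer,
		"--certificate-chain", p.CAChainPath,
		"--timestamp-certificate-chain", p.TSAChainPath,
		"--insecure-ignore-tlog=true", // no transparency log to consult
		"--insecure-ignore-sct=true",  // internal CA issues no SCT
		p.ImageRef,
	}
}

func main() {
	args := cosignVerifyArgs(verifyParams{
		Identity:     "spiffe://example.org/hypothetical/pipeline/123",
		OIDCIssuer:   "https://oidc.example.com",
		CAChainPath:  "/etc/hypothetical/codesign-ca-chain.pem",
		TSAChainPath: "/etc/hypothetical/tsa-chain.pem",
		ImageRef:     "registry.example.com/hypothetical/app@sha256:<digest>",
	})
	// A caller would pass args to exec.Command("cosign", args...);
	// here the command line is only printed.
	fmt.Println("cosign " + strings.Join(args, " "))
}
```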
The signature of a container image should be verified before the image is used. If the container image is used in Kubernetes, verification can be done with a native Kubernetes validating admission webhook, which can accept or reject the image deployment based on the verification result.
How to verify container images in K8s admission webhook?
You can use the official sigstore/policy-controller webhook if starting out.
TrustRoot can be configured as shown below:
ClusterImagePolicy can be configured as shown below:
If you are writing your own admission webhook like us to verify the cosign signature, you need to properly set the CheckOpts struct in the Cosign Go library. Some caveats:
- Ensure the admission webhook has permission to read the manifest & image signature from the OCI registry, i.e. set RegistryClientOpts properly.
- Set appropriate TSACertificate, TSARootCertificates and TSAIntermediateCertificates.
- Since there is no transparency log, set IgnoreTlog flag to true.
After setting the CheckOpts correctly, you can invoke VerifyImageSignatures to verify the signature.
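The shape of such a webhook can be sketched with the Go standard library alone. This is not Yahoo’s webhook: it handles only Pod objects and their `spec.containers`, models just the AdmissionReview fields it needs, and takes verification as a pluggable function. `stubVerify` is a hypothetical stand-in for the place where a real webhook would call cosign’s VerifyImageSignatures with the CheckOpts described above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Minimal subset of the admission.k8s.io/v1 AdmissionReview JSON.
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID    string `json:"uid"`
	Object struct {
		Spec struct {
			Containers []struct {
				Image string `json:"image"`
			} `json:"containers"`
		} `json:"spec"`
	} `json:"object"`
}

type admissionResponse struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}

type status struct {
	Message string `json:"message"`
}

// verifyFunc is a hypothetical hook for signature verification.
type verifyFunc func(image string) error

// stubVerify accepts exactly one hard-coded placeholder image reference.
func stubVerify(image string) error {
	if image != "registry.example.com/hypothetical/app@sha256:signed" {
		return errors.New("no valid signature found")
	}
	return nil
}

// admitHandler rejects the Pod if any container image fails verification.
func admitHandler(verify verifyFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var review admissionReview
		if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
			http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
			return
		}
		resp := &admissionResponse{UID: review.Request.UID, Allowed: true}
		for _, c := range review.Request.Object.Spec.Containers {
			if err := verify(c.Image); err != nil {
				resp.Allowed = false
				resp.Status = &status{Message: fmt.Sprintf("image %s rejected: %v", c.Image, err)}
				break
			}
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(admissionReview{
			APIVersion: review.APIVersion, Kind: review.Kind, Response: resp,
		})
	}
}

// decide sends a single-container Pod review through the handler in memory.
func decide(verify verifyFunc, image string) (bool, error) {
	body := fmt.Sprintf(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview",`+
		`"request":{"uid":"demo-uid","object":{"spec":{"containers":[{"image":%q}]}}}}`, image)
	req := httptest.NewRequest(http.MethodPost, "/validate", strings.NewReader(body))
	rec := httptest.NewRecorder()
	admitHandler(verify)(rec, req)
	var out admissionReview
	if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil {
		return false, err
	}
	if out.Response == nil {
		return false, errors.New("no response in AdmissionReview")
	}
	return out.Response.Allowed, nil
}

func main() {
	for _, img := range []string{
		"registry.example.com/hypothetical/app@sha256:signed",
		"registry.example.com/hypothetical/app:latest",
	} {
		allowed, err := decide(stubVerify, img)
		fmt.Println(img, "allowed:", allowed, "error:", err)
	}
}
```

A production webhook would also cover init and ephemeral containers and workload types that embed Pod templates, and would serve over TLS as the Kubernetes API server requires.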
If you use the official sigstore/policy-controller webhook instead, set the appropriate parameters for the TSA and the code signing certificate. (Note: since we have a custom admission webhook, we have not tried the policy-controller webhook. If you do find any issues, please reach out to the community.)
Similar to the Kubernetes webhook, we can add signature verification at appropriate hooks in other ecosystems where containers are deployed, such as serverless platforms or ECS.
While the Paranoids have made strides with open source technologies — such as Cosign and Sigstore — there is more work to be done to fully realize a widely shared vision of end-to-end integrity and provenance.
To that end, we’re grateful to everyone who made the Paranoids’ implementation of Sigstore possible, and we look forward to collaborating with the Sigstore community on future efforts such as signing non-container artifacts.
We’d like to thank:
- Our colleagues: Abhijeet Vaidya, Dean Sutherland, Dmitry Savintsev, Nate Burton, Rohit Chaudhary, Shiva Kodityala, and Yonghe Zhao for their contributions and support towards this initiative.
- As well as Hayden Blauzvern, Nathan Smith and Zachary Newman from the Sigstore community for their guidance and feedback.
If you’re interested in pursuing supply chain security initiatives at your company and would like to learn more about our experience, please reach out to email@example.com, or find us in the #private-sigstore-users channel on the Sigstore Slack.
About the Authors
Aditya Mahendrakar is an Engineering Director in the Paranoids group at Yahoo. In his current role, he leads a team that researches and builds capabilities to disrupt the attacker lifecycle.
Hemil Kadakia is an Engineering Manager in the Paranoids group at Yahoo. He is currently leading the supply chain security initiative at Yahoo and likes developing tools for making developers’ lives easier.