xmlsec signing broken on Red Hat 8

Thu, 10 Nov 2022

I recently spent a long day and night debugging a SAML login issue over at Squirro. Eventually we found that the xmlsec version shipped with Red Hat Enterprise Linux 8 is broken for signing.

Error Introduction

As part of our Single Sign-On implementation we use pysaml2. That library in turn depends on xmlsec, a library supporting XML signature and encryption. In the deployment we were debugging, we saw that the xmlsec1 call was erroring out:

$ /usr/bin/xmlsec1 --sign --privkey-pem privkey_file --id-attr:ID urn:oasis:names:tc:SAML:2.0:protocol:AuthnRequest --node-id id-qu56UdhaX0ojT90eL --output /tmp/tmpg48_9_ls.xml /tmp/tmpqusxu0my.xml
func=xmlSecOpenSSLEvpSignatureExecute:file=evp_signatures.c:line=466:obj=rsa-sha1:subj=EVP_SignFinal:error=4:crypto library function failed:openssl error: 67526835: rsa routines: rsa_ossl_private_encrypt missing private key
func=xmlSecTransformDefaultPushBin:file=transforms.c:line=1921:obj=rsa-sha1:subj=xmlSecTransformExecute:error=1:xmlsec library function failed:final=1
func=xmlSecTransformIOBufferClose:file=transforms.c:line=2547:obj=rsa-sha1:subj=xmlSecTransformPushBin:error=1:xmlsec library function failed:
func=xmlSecTransformC14NPushXml:file=c14n.c:line=243:obj=exc-c14n:subj=xmlOutputBufferClose:error=5:libxml2 library function failed:xml error: 0: NULL
func=xmlSecTransformCtxXmlExecute:file=transforms.c:line=1037:obj=exc-c14n:subj=xmlSecTransformPushXml:error=1:xmlsec library function failed:
func=xmlSecDSigCtxProcessSignatureNode:file=xmldsig.c:line=550:obj=unknown:subj=xmlSecTransformCtxXmlExecute:error=1:xmlsec library function failed:
func=xmlSecDSigCtxSign:file=xmldsig.c:line=286:obj=unknown:subj=xmlSecDSigCtxSignatureProcessNode:error=1:xmlsec library function failed:
Error: signature failed
Error: failed to sign file "/tmp/tmpqusxu0my.xml"
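
The error lines above follow a regular `func=…:file=…:line=…:obj=…:subj=…:error=…:message` layout, so they are easy to pick apart when they are buried in application logs. The following is a minimal Python sketch for doing so, not part of pysaml2 or xmlsec; the names `XMLSEC_ERROR_RE` and `parse_xmlsec_errors` are made up for illustration, and the pattern is only derived from the lines shown in this post.

```python
import re

# Leading fields of an xmlsec error line, as seen in the output above.
# The trailing message can itself contain colons, so it is captured as
# "everything after error=<n>:". This layout is inferred from the log
# lines in this post, not taken from xmlsec documentation.
XMLSEC_ERROR_RE = re.compile(
    r"^func=(?P<func>[^:]*):"
    r"file=(?P<file>[^:]*):"
    r"line=(?P<line>\d+):"
    r"obj=(?P<obj>[^:]*):"
    r"subj=(?P<subj>[^:]*):"
    r"error=(?P<error>\d+):"
    r"(?P<message>.*)$"
)


def parse_xmlsec_errors(stderr_text):
    """Return one dict per xmlsec error line, in the order they were printed.

    Lines that do not match the layout (for example the final
    "Error: signature failed") are skipped.
    """
    entries = []
    for raw in stderr_text.splitlines():
        match = XMLSEC_ERROR_RE.match(raw.strip())
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])
            entry["error"] = int(entry["error"])
            entries.append(entry)
    return entries
```

In the output above, the first parsed entry is the innermost failure (the OpenSSL call in evp_signatures.c), which is usually the most useful one to start from.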

Summary of Findings

We eventually found that xmlsec 1.2.25, the version shipped with Red Hat Enterprise Linux 8, has a bug when signing documents. The bug only manifests when the private key being used has a corresponding public key in the system’s certificate registry (usually /etc/pki/tls/cert.pem).

The bug was fixed as part of the 1.2.26 package. The actual patch is a one-line change.

Fixes

There are currently two known fixes for this problem:

  1. Install an updated version. Unfortunately this is currently not readily available in Red Hat Enterprise Linux 8, so a patched package has to be built manually.
  2. Remove the matching public keys from the local certificate file. This will result in error messages (“certificate verification failed”) but the signing succeeds regardless.
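
To find out whether a deployment meets the triggering condition (and what the second fix would have to remove), it can help to check whether the signing certificate is present in the system bundle. The following is a rough Python sketch under one loud assumption: it compares whole certificates byte for byte, which is narrower than "same public key", so a different certificate carrying the same key would be missed. The function names and the `sp_cert.pem` path in the comment are hypothetical.

```python
import base64
import hashlib
import re

# Matches each PEM certificate block and captures its base64 body.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(pem_text):
    """Return the set of SHA-256 fingerprints of all certificates in pem_text."""
    fingerprints = set()
    for body in PEM_CERT_RE.findall(pem_text):
        # Drop line breaks and other whitespace before decoding to DER bytes.
        der = base64.b64decode("".join(body.split()))
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def bundle_contains(cert_pem_text, bundle_pem_text):
    """True if any certificate in cert_pem_text also appears in the bundle.

    Assumption: identical certificates only; this does not extract or
    compare the public keys themselves.
    """
    return bool(cert_fingerprints(cert_pem_text) & cert_fingerprints(bundle_pem_text))


# Example usage (paths are illustrative; sp_cert.pem is hypothetical):
#   with open("sp_cert.pem") as c, open("/etc/pki/tls/cert.pem") as b:
#       print(bundle_contains(c.read(), b.read()))
```

Only the standard library is used, so the check can run on a locked-down host without installing anything.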

Investigation Details

This section repeats some of the information above, but in more detail.

When signing, an xmlsec1 command is executed as follows:

/usr/bin/xmlsec1 --sign --privkey-pem privkey_file --id-attr:ID urn:oasis:names:tc:SAML:2.0:protocol:AuthnRequest --node-id id-qu56UdhaX0ojT90eL --output /tmp/tmpg48_9_ls.xml /tmp/tmpqusxu0my.xml
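
To reproduce the failure outside of the application, the same call can be rebuilt and run by hand. The following is a minimal Python sketch that only uses the flags visible in the command above; `build_sign_command` and `run_sign` are hypothetical helper names, not pysaml2 functions, and the file paths are whatever you pass in.

```python
import subprocess


def build_sign_command(key_file, node_name, node_id, output_file, input_file,
                       xmlsec_binary="/usr/bin/xmlsec1"):
    """Assemble the xmlsec1 signing call shown above as an argument list."""
    return [
        xmlsec_binary,
        "--sign",
        "--privkey-pem", key_file,
        "--id-attr:ID", node_name,
        "--node-id", node_id,
        "--output", output_file,
        input_file,
    ]


def run_sign(command):
    """Run the command and return (exit code, stderr text).

    xmlsec1 writes its error trace to stderr, so that is what gets captured.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Passing the arguments as a list avoids shell quoting issues, and pointing `xmlsec_binary` at a different build makes it easy to compare xmlsec versions against the same input.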

In our initial investigation we suspected security hardening around crypto policies.
This was supported by the problem not appearing when we tested in a different environment running Rocky 8.
However, setting the crypto policies to LEGACY did not solve the issue.
Additionally, upon further debugging we were able to reproduce the issue on Rocky 8 as well, without any hardening applied.

The next clue was that the problem did not appear when the corresponding public key for the provided private key could not be found.
In that case the following error would appear but the signing actually took place (some internal information redacted):

func=xmlSecOpenSSLX509StoreVerify:file=x509vfy.c:line=341:obj=x509-store:subj=unknown:error=71:certificate verification failed:X509_verify_cert: subject=…; issuer=…; err=20; msg=unable to get local issuer certificate
func=xmlSecOpenSSLX509StoreVerify:file=x509vfy.c:line=380:obj=x509-store:subj=unknown:error=71:certificate verification failed:subject=…; issuer=…; err=20; msg=unable to get local issuer certificate

These combined findings allowed me to narrow down the problem by reviewing the xmlsec and OpenSSL source code (versions 1.2.25 and 1.1.1k respectively, as used in Red Hat Enterprise Linux 8).
Not having previously been exposed to the source code of either of those libraries, this made for an interesting experience.
The problem was eventually pinpointed to an xmlsec function called xmlSecOpenSSLKeyDataX509VerifyAndExtractKey, which seems to assign a public certificate into a structure that later expects a private key.

Armed with this information, I then tested other versions of the library and quickly found that xmlsec 1.2.26 was no longer affected by this problem.
A few steps of git bisect later I finally found the patch for this problem, which had been authored two days after the release of 1.2.25.

“select” isn’t broken – except when it is

As a debugging project this one was a lot of fun. Throughout my work I have always kept this Pragmatic Programmer rule in mind:

Remember, if you see hoof prints, think horses not zebras. The OS is probably not broken. And select is probably just fine.

I always assume my own code to be the culprit of any bug until proven otherwise. In this particular instance it turned out differently for once. I could definitely have saved a lot of time by testing other xmlsec versions earlier, rather than homing in on the exact dependencies I was dealing with. And initially I was a bit upset with myself for not doing exactly this.

But upon further reflection I had to remind myself that this mindset has certainly saved me much more debugging time in all the instances where it was indeed my code that was broken.