Machine Learning Security in the age of Supply Chain Attacks
As the recent discovery of the “xz attack” shows, nation states have realized that the software supply chain is likely the “best” vector for impacting large-scale systems in big organizations. With cloud providers being the “source of computing” for most large corporations today, we should anticipate that a larger share of attacks will fall into this category. And just like “sleeper cells” in traditional espionage, such “sleepers” may exist in numerous OSS projects. Does that mean we should stop using open source? Hell no. It just means we need to be careful. Can we detect these attacks? It’s tough, but yes, we can, with good ol’ telemetry and observability.
But that’s not what this blog post is about. For me, the most interesting lesson from the xz attack was that libraries that are harder to debug and decode make much juicier targets. Why does that matter? Super popular ML libraries like PyTorch, TensorFlow and others are quite hard to compile from scratch out of band. Such libraries can have interesting attack vectors that allow nice pickle compromises. I say “nice” because the insecure deserialization family of weaknesses has been in CWE since 2006! It’s older than many other issues and will continue to exist.
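To make the pickle point concrete, here is a minimal sketch of why loading an untrusted pickle is equivalent to running untrusted code, plus one common mitigation. The names `Payload`, `SafeUnpickler` and `safe_loads` are hypothetical and made up for this illustration, and the payload is deliberately harmless (it just evaluates `6 * 7`). A real attacker would call something far nastier.

```python
import io
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" in the byte stream.
        # Whoever loads the stream executes that call.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs eval(); we get its return value back, not a Payload instance.
result = pickle.loads(blob)


class SafeUnpickler(pickle.Unpickler):
    """Allow-list unpickler: refuses every global that is not explicitly allowed."""

    ALLOWED = set()  # empty: only plain containers and primitives can load

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Load a pickle while rejecting references to arbitrary callables."""
    return SafeUnpickler(io.BytesIO(data)).load()
```

With this in place, `safe_loads(blob)` raises `pickle.UnpicklingError` instead of calling `eval`, while plain data such as `safe_loads(pickle.dumps({"a": [1, 2]}))` still loads. As far as I know, the `weights_only` option of `torch.load` in PyTorch is built on the same allow-list idea, but treat that as a pointer to check against the PyTorch docs rather than a guarantee.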
My only hope is that maintainers of core ML projects such as PyTorch, TensorFlow, Keras and others start showing a slightly higher level of paranoia and invest in build reproducibility, so that supply chain attacks on such harder-to-debug libraries can be avoided.
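One practical payoff of reproducible builds is that anyone can rebuild an artifact and compare it bit-for-bit against what was published. Below is a rough sketch of the consumer side of that check, assuming you already have a trusted SHA-256 digest for the file (for example, from an independent rebuild). The function names `sha256_of` and `matches_pinned_digest` are mine, not part of any library.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path, expected_hex):
    """True only if the file's digest equals the pinned (trusted) digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

In a deployment pipeline you would refuse to install or load the wheel or model file when `matches_pinned_digest` returns `False`. For Python packages, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) enforces the same kind of pinning at install time.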