In 2016, Microsoft released a prototype chatbot on Twitter. The automated program, dubbed Tay, responded to tweets and incorporated the content of those tweets into its knowledge base, riffing off the topics to carry on conversations.
In less than 24 hours, Microsoft had to yank the program and issue an apology after the software started spewing vile comments, including “I f**king hate feminists” and a tweet saying that it agreed with Hitler. Online attackers had used crafted comments to pollute the machine-learning algorithm, exploited a specific vulnerability in the program, and taken advantage of the bot's tendency to simply repeat comments, a major design flaw.
Microsoft apologized, and Tay has not returned. “Although we had prepared for many types of abuses of the system, we had made a critical oversight for this specific attack,” the company stated in its apology. “As a result, Tay tweeted wildly inappropriate and reprehensible words and images.”
AI is increasingly finding its way into software, including software used in security operations centers. Attacks on artificial-intelligence and machine-learning (ML) systems are often thought to be theoretical, but significant attacks have already happened. Software developers who use ML models in their programs, and data scientists who develop the models, need to take such malicious behavior into account when creating ML-powered software.
Unfortunately, there is no hard-and-fast defensive playbook yet, said Jonathan Spring, a senior member of the technical staff of the CERT Division at Carnegie Mellon University’s Software Engineering Institute.
“I’m not sure there is ‘type this on the keyboard’ level advice yet. But there is operational-level advice that you can follow to make the systems more robust against failure.”
Here are five recommendations that experts focused on AI threats say should guide developers and application security professionals.
1. Build awareness of the threat with your team
While adversarial ML attacks are the focus of a great deal of academic study, they are not just theoretical or an exercise using “spherical cows.”
In 2019, for example, researchers found a way to use targeted emails to discover the characteristics of the algorithm that messaging-security firm Proofpoint used to block malicious emails. The attack allowed two security researchers to build a system that replicated the AI model and then use that reconstituted model, dubbed Proof Pudding, to find ways to avoid spam detection.
As this attack and others cited in this article show, meaningful adversarial ML attacks are already here. Developers need to build awareness on their teams and find ways to make deployed models more robust, said Mikel Rodriguez, director of the decision-science research programs at MITRE, a not-for-profit government research contractor.
“People sometimes see cute articles about turning a stop sign into a yield sign, and they think these attacks are not relevant to them. We first need to convince the community that this is a credible threat.”
2. Use tools to eliminate the low-hanging fruit
While practical attacks generally require a lot of research, a trio of open-source Python libraries can get DevSecOps practitioners started testing their models. CleverHans, Foolbox, and the Adversarial Robustness Toolbox allow developers to create adversarial examples, inputs deliberately crafted to make a model give the wrong answer.
Like static analysis and automated vulnerability scanning, these libraries address the low-hanging fruit: they are tools that every developer should run against an AI model to make sure it is robust and not prone to attack or failure.
“If you have an ML system, those tools have relatively simple—but still complicated—APIs that let you test your model against known common attack patterns. If you use those, and try to make the system more robust against what you find, you will be doing better than most.”
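To make the idea of an adversarial example concrete, here is a minimal sketch of one well-known attack pattern, the fast gradient sign method, applied to a tiny logistic-regression model. It deliberately uses only the Python standard library and does not call CleverHans, Foolbox, or the Adversarial Robustness Toolbox. The weights, the input, and the function names are hypothetical values chosen for illustration.

```python
import math

def predict_proba(weights, bias, x):
    """Probability that x belongs to class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def _sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """One fast-gradient-sign step: move every feature by at most epsilon
    in the direction that increases the model's log loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, d(log loss)/d(x_i) = (p - y_true) * w_i.
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions, not from any real system).
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.25
x_clean = [0.6, 0.3, 0.2]

x_adv = fgsm_perturb(WEIGHTS, BIAS, x_clean, y_true=1, epsilon=0.3)
p_clean = predict_proba(WEIGHTS, BIAS, x_clean)
p_adv = predict_proba(WEIGHTS, BIAS, x_adv)

print("clean input:", x_clean, "P(class 1) =", round(p_clean, 3))
print("adversarial input:", x_adv, "P(class 1) =", round(p_adv, 3))
```

In this toy setup, a change of at most 0.3 per feature is enough to push the prediction across the 0.5 decision threshold. The open-source toolkits automate this kind of probing, and many stronger attacks, against real models.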
3. Model threats and attack your AI system
Companies should attack their own systems to find weaknesses and potential information leakage. Microsoft, for example, used a red team to analyze its service for running AI workloads at the edge, first gathering information on the product and then querying a public version to infer the responses of the ML model. The knowledge and responses were enough for the red team to build an automated system to modify images in a way that caused the system to misclassify the data without being noticeable to humans.
In other cases, researchers have shown that you can extract data from the model using targeted attacks and inference. An ML model that is made available through an API can be used to recreate the model—essentially allowing the intellectual property to be stolen.
“If you are a company with a production machine-learning system, you want to have a red team, even if it is small. So you need to look at the system level to see what you can do in order to raise the cost to the attacker.”
MITRE has collaborated with Microsoft and other organizations to create the Adversarial ML Threat Matrix, which is meant to establish a common language for describing potential threats.
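The model-theft risk mentioned in this section can be illustrated with a deliberately simplified sketch. It assumes the prediction API returns a raw numeric score and that the hidden model is linear; real services and real extraction attacks are considerably more involved. `query_api`, the secret weights, and `extract_linear_model` are all hypothetical names invented for this example.

```python
# Hidden parameters of the "production" model. The attacker never reads
# these directly; they are only reachable through query_api().
_SECRET_WEIGHTS = [0.5, -2.0, 4.0]
_SECRET_BIAS = 1.0

def query_api(features):
    """Stand-in for a remote prediction endpoint (hypothetical)."""
    return sum(w * x for w, x in zip(_SECRET_WEIGHTS, features)) + _SECRET_BIAS

def extract_linear_model(query, n_features):
    """Recover a linear model's parameters using n_features + 1 queries."""
    zero = [0.0] * n_features
    bias = query(zero)  # an all-zero input reveals the bias term
    weights = []
    for i in range(n_features):
        probe = list(zero)
        probe[i] = 1.0  # a unit input isolates a single weight
        weights.append(query(probe) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(query_api, n_features=3)
print("recovered weights:", stolen_weights)
print("recovered bias:", stolen_bias)
```

A red team exercise can start from this kind of question: how many queries, and how much detail in each response, would someone need to rebuild the model?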
4. Use multiple models or algorithms as a check
Production anti-spam engines use aggregate scores from several algorithms to create a more robust way of classifying messages. Similarly, combining different models—or even the same model trained on different datasets, a technique known as ensemble learning—can produce more robust results.
In addition, ongoing checks of results, such as users identifying faces in a photo on Facebook, can help improve classification or performance.
“If there is an immediate test of efficacy of the expert system—whether AI or human—that is very easily conducted and very correct, it does not matter what the system is doing. This is the case with anti-spam systems. The user gives feedback to the system by marking messages as spam or not spam.”
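The ensemble idea from this section can be sketched as a simple majority vote across several independent classifiers. The three filters below are hypothetical toy rules, not how any production anti-spam engine works; they only show why an input built to evade one detector may still be caught by the others.

```python
from collections import Counter

def keyword_filter(msg):
    """Flags messages containing a known spam phrase."""
    return "spam" if "free money" in msg.lower() else "ham"

def link_filter(msg):
    """Flags messages containing two or more links."""
    return "spam" if msg.lower().count("http") >= 2 else "ham"

def shouting_filter(msg):
    """Flags messages whose letters are mostly uppercase."""
    letters = [c for c in msg if c.isalpha()]
    if not letters:
        return "ham"
    upper = sum(1 for c in letters if c.isupper())
    return "spam" if upper / len(letters) > 0.5 else "ham"

def majority_vote(classifiers, msg):
    """Return the label chosen by most classifiers."""
    votes = [clf(msg) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label

FILTERS = [keyword_filter, link_filter, shouting_filter]

# Crafted to slip past the keyword filter by swapping letters for digits.
evasive = "FR3E M0NEY!!! CLICK HERE NOW http://a.io http://b.io"
legit = "Lunch at noon? Menu: http://cafe.example"

print("keyword filter alone:", keyword_filter(evasive))
print("ensemble verdict:", majority_vote(FILTERS, evasive))
print("legitimate message:", majority_vote(FILTERS, legit))
```

Using an odd number of two-class voters avoids ties. Weighted or score-based aggregation is a common alternative when the individual models report confidence values.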
5. Beware of bias
No matter how good the ML model may be, if the developer trains the system on a biased dataset, it will not be robust to attack or errors. Facial-recognition models that perform poorly at identifying the faces of Black or Brown subjects, and the inherent bias of social media toward divisive speech or “clickbait” headlines, are examples of bias in the training datasets or in the algorithms.
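One simple way to start looking for this kind of problem is to break evaluation results down by subgroup instead of reporting a single overall accuracy figure. The sketch below assumes you already have labeled evaluation records; the group names and records are made-up placeholders, not measurements from any real system.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}

# Made-up evaluation records: (group, true label, predicted label).
records = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_b", "match", "no_match"),
    ("group_b", "no_match", "no_match"),
    ("group_b", "match", "match"),
    ("group_b", "match", "no_match"),
]

overall = sum(1 for _, truth, pred in records if truth == pred) / len(records)
per_group = accuracy_by_group(records)

print("overall accuracy:", overall)
print("accuracy by group:", per_group)
```

An overall score can look acceptable while one group fares much worse. A gap like the one in this toy data is a signal to revisit the training data before deployment.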
“There is a lot of research that needs to be done. People may want to focus on only using ML systems in applications where they can immediately check the result.”
The entire field of search-engine optimization is essentially about finding ways to attack algorithms for the benefit of the attacker.
Follow the money
The flourishing of that field underscores that when there is money involved, attacks will proliferate, said MITRE’s Rodriguez: