Security and Safety, Volume 3, 2024: Security and Safety in Artificial Intelligence
Article Number: 2024011
Number of page(s): 27
Section: Information Network
DOI: https://doi.org/10.1051/sands/2024011
Published online: 20 October 2024
Harnessing dynamic heterogeneous redundancy to empower deep learning safety and security
1 National Digital Switching System And Engineering Technological Research Center (NDSC), Zhengzhou, 450002, China
2 PLA Information Engineering University, Zhengzhou, 450002, China
3 Purple Mountain Laboratories, Nanjing, 211111, China
4 Fudan University, Shanghai, 200433, China
5 Southeast University, Nanjing, 210096, China
6 Zhengzhou University, Zhengzhou, 450001, China
* Corresponding authors (email: huangwei@pmlabs.com.cn)
Received: 7 June 2024
Revised: 9 September 2024
Accepted: 9 September 2024
The rapid development of deep learning (DL) models has been accompanied by various safety and security challenges, such as adversarial attacks and backdoor attacks. By analyzing the current literature on attacks and defenses in DL, we find that the ongoing adaptation between attack and defense makes it impossible to completely resolve these issues. In this paper, we propose that this situation is caused by the inherent flaws of DL models, namely non-interpretability, non-recognizability, and non-identifiability. We refer to these issues as the Endogenous Safety and Security (ESS) problems. To mitigate the ESS problems in DL, we propose using the Dynamic Heterogeneous Redundant (DHR) architecture. We believe that introducing diversity is crucial for resolving the ESS problems. To validate the effectiveness of this approach, we conduct various case studies across multiple application domains of DL. Our experimental results confirm that constructing DL systems based on the DHR architecture is more effective than existing DL defense strategies.
Key words: Deep learning / Endogenous security / Dynamic heterogeneous redundancy / AI safety
Citation: Zhang F, Chen X and Huang W et al. Harnessing dynamic heterogeneous redundancy to empower deep learning safety and security. Security and Safety 2024; 3: 2024011. https://doi.org/10.1051/sands/2024011
© The Author(s) 2024. Published by EDP Sciences and China Science Publishing & Media Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Deep learning (DL) models have experienced remarkable advancements in recent years, transforming various industries such as autonomous driving [1] and robotic surgery [2] with their powerful capabilities. Despite these breakthroughs, DL models are not without vulnerabilities. They are susceptible to sophisticated threats that can compromise their integrity and effectiveness, including adversarial attacks, backdoor attacks, and poisoning attacks.
Adversarial attacks [3, 4] involve deliberately altering input data to deceive DL models into making incorrect decisions. Backdoor attacks [5, 6] introduce specific trigger patterns during the training process, causing the model to produce incorrect outputs when these triggers are present during its operation. Poisoning attacks [7, 8] compromise the training set by introducing malicious data, thereby degrading the model’s overall performance or causing it to behave erratically. These attacks significantly threaten the safety and security of DL models.
To counteract these threats, several defense strategies have been developed. These include adversarial training [9] and input preprocessing [10] to protect against adversarial attacks, knowledge distillation [11] to prevent backdoor attacks, and data sanitization [12] techniques to defend against poisoning attacks. While these methods have proven effective in mitigating specific threats for DL models, the ongoing development of new attack methods means that security incidents can still occur, even in models fortified against previous vulnerabilities.
The interaction between attack and defense in cybersecurity is similar to the dynamics described in game theory. It is hoped that this evolution will lead to a state similar to Nash equilibrium [13, 14], which could help solve safety and security issues. However, in practice, attack and defense strategies continuously evolve, similar to the saying, “The higher the wall, the taller the ladder”. This indicates that fully resolving safety and security challenges may be unattainable, as it involves ongoing adaptation between adversarial tactics and defensive responses. This ongoing challenge is partly due to the incomplete development of theoretical and technical frameworks in DL, which inherently possess flaws that could lead to safety and security risks. These issues are termed Endogenous Safety and Security (ESS) [15] problems.
This paper introduces a novel categorization of ESS problems, employing first-principle thinking to deconstruct the complex and diverse safety and security challenges in DL into their most basic and fundamental components. Unlike existing categorizations based on specific risks, we divide ESS problems into two types: common and individual problems. Common problems are inherent in the operational environment, such as malicious intrusions common across all software. In contrast, DL models exhibit unique individual problems due to their specific characteristics, which we identify as non-interpretability, non-recognizability, and non-identifiability.
These three limitations underscore that simply enhancing the safety and security of DL models is insufficient to address their inherent vulnerabilities. Consequently, we propose to mitigate emergent ESS problems in DL through the development of ensemble models, drawing inspiration from the Dynamic Heterogeneous Redundant (DHR) [16] architecture. The principal concept involves converting uncertain vulnerabilities in a single model into a probabilistic evaluation across multiple diverse models. The final decision is made by analyzing and leveraging the differences in these models’ outputs. This approach significantly reduces the likelihood of errors in any single model affecting the overall result, emphasizing the role of model diversity in diminishing the potential for attacks to propagate. To demonstrate the effectiveness of the DHR architecture in combating a range of ESS challenges in DL, we conduct case studies in four areas: adversarial defense, backdoor defense, poisoning defense, and real-world applications. Our experimental findings confirm the efficacy and superiority of the DHR method over existing DL defense strategies. The contributions of this paper are summarized as follows:
- (1) We analyze the ESS problems in DL, define them, and propose a novel categorization of these problems into individual and common problems.
- (2) We analyze the intrinsic challenges in resolving ESS individual problems and identify that the Non-Interpretability, Non-Recognizability, and Non-Identifiability characteristics of deep learning make it difficult to address security issues through single-model enhancements.
- (3) We propose utilizing the DHR architecture to address the endogenous security issues in DL and analyze its feasibility. The DHR architecture transformation has the capability to simultaneously address both individual and common problems.
- (4) We conduct case studies across various common deep-learning scenarios, such as image classification, sentiment analysis, and object detection. Using the DHR architecture, we design methods to construct heterogeneous models and apply them in DL model development. These models demonstrate exceptional robustness in computer vision, natural language processing, and graph neural network applications, and exhibit resilience against security issues such as adversarial and backdoor attacks.
- (5) We apply the DHR model to the designated task of the 6th “QiangWang” Mimic Defense International Elite Challenge, where the effectiveness of our method is confirmed in practical scenarios.
2. Related work
AI is a technology that enables computers to mimic human cognitive abilities and possess similar learning, reasoning, problem-solving, and decision-making capabilities. In particular, with the introduction of deep learning, AI entered a rapid development phase, and deep-learning-based AI applications have proliferated, including natural language processing, machine translation, computer vision, and large-scale language generation models. However, as deep learning technology is widely applied in the AI field, it also faces security threats, the primary ones being adversarial attacks and backdoor attacks. Addressing the security issues of deep learning models is a highly challenging task.
2.1. Adversarial attack
Adversarial attacks involve introducing subtle perturbations to the inputs of machine learning models, causing them to produce incorrect predictions with high confidence. The concept of adversarial samples, which are inputs intentionally altered to deceive the model, was first proposed by Szegedy et al. [17]. These adversarial samples, which are created by making minor yet significant changes to clean data, pose a serious challenge to learning-based classifiers [18, 19], especially in security-sensitive environments where model robustness is crucial. As research into adversarial attacks has advanced, a variety of attack methods have been identified. Attacks are generally classified into white-box and black-box categories based on the attacker’s knowledge of the target model. White-box attacks, which assume complete access to the model’s architecture and parameters, include techniques such as L-BFGS [20], which uses quasi-Newton optimization, FGSM [21], which employs gradient sign information, and iterative methods like BIM [19]. The C&W attack [22] utilizes different optimization objectives to craft adversarial examples, while PGD [23] uses projected gradient descent, and JSMA [24] relies on the Jacobian matrix. In contrast, black-box attacks occur when the attacker lacks detailed knowledge of the target model and typically involve generating adversarial samples using substitute models or estimating decision boundaries without gradient information [25, 26]. As adversarial sample generation techniques become more sophisticated and harder to detect, the development of effective defense mechanisms has become increasingly important. Integrated attacks, such as AutoAttack [27], which combine multiple strategies, further complicate the defense landscape, highlighting the urgent need for robust and comprehensive security solutions in machine learning.
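To make the gradient-sign idea behind FGSM concrete, the following is a minimal PyTorch sketch; the model, loss, and perturbation budget are illustrative placeholders rather than the exact setups used in the cited works.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft an FGSM-style adversarial example: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Single step along the sign of the input gradient, then clamp to the valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```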
2.2. Backdoor attack
A backdoor attack involves implanting a hidden entry point into a deep learning model, allowing it to function normally on clean samples but exhibit specific abnormal behaviors when the trigger is present. The first backdoor attack, known as BadNets [28], works by adding triggers to part of the training data and altering the corresponding labels to the target category. This causes the model to learn an incorrect association between the trigger and the target category, so that any sample containing the trigger is misclassified into the target category while the model still behaves normally on clean samples. This approach has become the baseline for backdoor attacks in the field of computer vision. Later, more advanced backdoor attack methods were proposed. To evade human recognition, optical triggers imperceptible to the human eye are employed [29, 30]. Researchers also demonstrated that the success rate of backdoor attacks is closely related to the form of the backdoor trigger; therefore, finding a trigger that is easy for a specific model to learn is also crucial. In light of this, Li et al. [31] proposed a bilevel optimization based on the Lp norm to optimize a trigger; this type of trigger is not only visually difficult to detect but also significantly enhances the effectiveness of the attack. Given that backdoor attacks based on data poisoning typically involve altering labels, inconsistencies between image content and labels are easy for human eyes to detect. To address this issue, Turner et al. proposed a method called the label-consistency attack [32]. Because backdoor attacks behave normally on clean samples, it is difficult to discover that a deep learning model has been implanted with a backdoor, which poses a significant threat to the security of deep learning model deployment.
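The data-poisoning mechanism behind BadNets-style attacks can be sketched as follows; the trigger shape, location, target class, and poisoning rate are illustrative assumptions, not the settings of the original work.

```python
import torch

def poison_dataset(images, labels, target_class=0, poison_rate=0.1, trigger_value=1.0):
    """Stamp a small trigger patch onto a fraction of the images and relabel them.

    images: float tensor of shape (N, C, H, W) in [0, 1]; labels: long tensor of shape (N,).
    """
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_rate)
    idx = torch.randperm(len(images))[:n_poison]
    # A 3x3 bright square in the bottom-right corner acts as the trigger.
    images[idx, :, -3:, -3:] = trigger_value
    labels[idx] = target_class  # labels flipped to the attacker's target class
    return images, labels
```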
2.3. Endogenous safety and security
Endogenous safety and security refers to structures or algorithms and their institutional mechanisms that have endogenous effects or endogenous safety or security effects [33]. Endogeneity delineates an inherent effect engendered within a system, stemming autonomously rather than being contingent upon exogenous influences. Consequently, ESS denotes the safety or security attributes derived through internal mechanisms encompassing system architecture, algorithms, mechanisms, or scenarios [34]. Any software, hardware, or algorithm inevitably harbors invisible side effects beyond its fundamental functions. Once triggered, these side effects can negatively impact the normal operation of the basic functions. Such effects are defined as ESS problems in cyberspace. To address ESS problems, current solutions involve the redesign of network architectures and incremental patching, such as ESS solutions based on mimic defense [35], trusted-network ESS solutions based on trusted computing, and ESS solutions based on zero-trust architectures. Among these, the mimic defense-based ESS solution normalizes the ESS threats of target objects into unknown disturbances that can be managed by reliability and robustness control theories and methods. Mimic defense employs conditional evasion methods to prevent attackers from forming effective attacks, ensuring that inevitable ESS problems do not escalate into systemic security threats [36]. Mimic defense, as a versatile security technology, is gradually being applied and commercialized. There is a plethora of applications of mimic defense, including cloud infrastructure [37, 38], network slicing protection schemes, and blockchain security enhancement solutions. Mimic-based domain name servers, web servers, and other systems have already been deployed and put into operation. In the field of deep learning (DL) security, there have been several studies on ensemble models, emphasizing that model diversity enhances their robustness. Examples include the GAL [39] method, which is predicated on gradient diversity; the ADP [40] method, which relies on model behavior diversity; and the PDD [41] method, which is based on differentiated dropout. These methods are conceptually similar to ESS and have significantly improved model robustness.
As previously outlined, deep learning, despite its widespread applications, is presently confronted with substantial security challenges. Several methods proposed from the ESS perspective have achieved significant improvements in security performance across various fields. In the emerging domain of deep learning, existing research has demonstrated that introducing diversity has a notable effect on enhancing robustness. This approach aligns closely with the strategies used by ESS to address security issues. In this paper, we attempt to introduce ESS into the DL field by constructing a DL model system with an endogenous security architecture to address the issue of low robustness in single models.
3. ESS problems in DL
The safety and security of DL models are crucial to their widespread application, especially in safety-critical systems. This paper focuses on the ESS properties of DL, the definition of which is given below.
Definition 1. Endogenous Safety and Security (ESS) of Deep Learning (DL) refers to the safety and security functions or properties that DL models obtain through their inherent factors, such as model architecture, learning algorithms, and processing mechanisms.
As defined in Definition 1, ESS in DL concerns only the safety and security that stem directly from the DL itself, not those that arise from the environment, such as the application in which the model is used. For example, a DL model might be robust to adversarial perturbations added to the input. The robustness is considered an ESS property of DL, as it can be enhanced by advanced training strategies like adversarial training [9]. In contrast, the legality of DL is not an ESS property, as legality depends on whether the use of DL models complies with laws and regulations, which are extrinsic factors derived from human morality and ethics. Correspondingly, ESS problems in DL have the following definition.
Definition 2. Endogenous Safety and Security (ESS) problems in Deep Learning (DL) refer to the fact that the theoretical and technical architectures of DL have not yet been perfected, and their inherent “genetic defects” may give rise to safety and security risks.
Figure 1. Existing classification of safety and security problems in DL
According to Definition 2, we can categorize all safety and security problems in DL as either ESS or non-ESS problems, based on their root causes. This categorization employs first-principle thinking to deconstruct the complex and diverse safety and security challenges in DL into their most basic and fundamental components. This provides a clear framework for subsequently developing effective defense methods to address ESS problems in DL. In contrast, existing studies on classifying DL safety and security problems tend to be relatively trivial and overly specific. As shown in Figure 1, the existing safety and security threats to DL that have been identified include adversarial attacks, backdoor attacks, DeepFakes, poisoning attacks, and privacy disclosures, among others. Current research typically focuses on one specific safety or security problem but overlooks the relationships and distinctions between them.
Figure 2. New classification of safety and security problems in DL based on the ESS theory
To bridge this gap, in addition to the categorization of ESS and non-ESS problems, this paper proposes to further divide the ESS problems into individual and common problems. As shown in Figure 2, individual problems pertain to problems within DL algorithms, whereas common problems relate to the operational environment of DL.
3.1. Individual problems
The ESS individual problems are attributed to the ‘genetic defects’ of DL algorithms. By delving into current research on DL safety and security, we have identified that these genetic defects manifest as ‘three inabilities’ in DL algorithms. These inabilities, which represent structural contradictions within DL models, include Non-Interpretability, Non-Recognizability, and Non-Identifiability.
Non-Interpretability. Due to the black-box nature of the learning and inference processes of DL, DL models are often considered non-interpretable. This has given rise to Explainable Artificial Intelligence (XAI) research studies [42] focused on DL. Nevertheless, to date, the process by which DL learns knowledge and rules from training data remains unclear. The internal learning process of DL is considered a black box, difficult to accurately describe, and even harder for humans to understand. This complexity makes it challenging to locate safety and security problems in DL, as the models are based on data-driven training and fitting mechanisms. Problems may arise concerning the authenticity and completeness of the data, the robustness and generalizability of the model, among other aspects, but pinpointing these problems can be quite challenging.
Non-Recognizability. The non-recognizability of DL models is attributed to the data-driven learning framework of current DL techniques. DL models make predictions based on the training data on which they are fitted. Consequently, the quality and source of the training data can greatly influence the models’ outputs. DL models lack the ability to recognize whether an output is correct or incorrect, or whether it is fair or biased. This misalignment between the ethical standards of DL models and those of human beings complicates the assessment of models’ output contents. For example, shortly after going online, Microsoft’s chatbot was fed a large amount of inappropriate data and became ‘corrupted’ within 16 hours, continuously emitting profanities. This incident highlights the challenge of ensuring that DL model outputs align with human values across diverse cultural, ethnic, educational, and cognitive backgrounds. Currently, there is no effective technical solution to this problem.
Non-Inferability. DL models excel at inductive reasoning, deriving patterns from known data. However, they typically struggle to understand and make judgments about unfamiliar, never-experienced phenomena, and they are even less capable of predicting and reasoning about medium- to long-term future changes. The reason is that DL models receive more information and knowledge than we do, but do not actually generate new knowledge; DL remains limited to extracting knowledge patterns from known data. With respect to dynamic knowledge and unknowns, there is still a gap compared to the human ability to draw inferences from a single instance. This also indicates that the safety and security problems of DL cannot be addressed by DL alone, as such deduction contains a logical contradiction: DL cannot foresee or infer the existence of uncertain security threats based on existing knowledge.
3.2. Common problems
Common problems refer to issues in DL systems that, like those in other information systems, arise from external dependencies on devices and environments. Like other application systems, an AI application system relies on physical information systems, so its algorithm-model “base” is bound to face common ESS problems. Research reports from China and abroad show that there are widespread security vulnerabilities in the software and hardware environments on which mainstream deep learning frameworks rely. Once these vulnerabilities are exploited by attackers, AI systems face the risk of destruction, tampering, and information theft.
In terms of software, current foreign platforms such as TensorFlow, Torch, and Caffe have all been reported to have security vulnerabilities. According to data from GitHub, an open-source software community, TensorFlow has been exposed to more than a hundred security vulnerabilities since 2020, which can lead to system instability, data leakage, memory corruption, and other problems. In 2021, the “360 Company” conducted a security evaluation of mainstream open-source AI frameworks in China and abroad, and found more than 150 vulnerabilities in 7 machine learning frameworks (including the most widely used TensorFlow, PyTorch, etc.), and more than 200 vulnerabilities in the framework supply chain. This finding is consistent with the DoS attacks, evasion attacks, and system downtime exposed in TensorFlow, Caffe, and Torch in 2017. A Tencent security team also found major vulnerabilities in a TensorFlow component that could allow bot programs written by developers on top of this component to be easily controlled remotely by hackers.
In terms of hardware, the GPU hardware products on which AI systems mainly rely also have security vulnerabilities. Among them, the most severe are the “Meltdown” and “Spectre” vulnerabilities exposed in 2018, which affected multiple series of products including GeForce, Tesla, Grid, NVS, and Quadro, essentially covering most of the product lines of NVIDIA, a mainstream GPU manufacturer. In the same year, researchers at the University of California, Riverside, targeted security vulnerabilities in NVIDIA GPUs [27] and discovered three methods that could be used by hackers to breach user security and privacy. In addition, research shows that neural network models can be destroyed through GPU/CPU overflow, which can invalidate the model or turn it into a backdoored network.
4. Addressing ESS problems in DL
4.1. Diversity promoting ESS
Due to the uncertainty inherent in deep learning systems, no single deep learning model can be easily trusted. Furthermore, this inherent uncertainty within deep learning models renders security enhancements targeted solely at a single model unreliable. This situation prompts us to consider whether security can instead be achieved through the construction of the application system itself. Without relying on (or incorporating) prior knowledge (libraries), a model or constructive mechanism can transform uncertain disturbances within the target object and its environment into ESS events with controllable probabilities of differential-mode or common-mode behavior. This implies that such a system can operate securely even when it incorporates unreliable models. To realize such an application system, we advocate integrating diversity into the system design. By leveraging diverse models, we can use their differential-mode outputs to mask the errors inherent in any single model.
Diversity refers to the simultaneous integration of various models within the system. The diverse models integrated within the system exhibit identical functionalities during normal input-output operations. However, when faced with anomalies or attacks, they can generate a differential-mode response, leading to noticeable disparities in the results. The application system uses fusion algorithms designed to arbitrate the final output. This ensures that even if a single model experiences a security incident resulting in abnormal output, the overall system output remains correct.
To analyze the significance of diversity, we introduce the notion of a safe space. Using classification tasks as an example, the objective in enhancing a model’s robustness is to improve its capability to accurately classify perturbed samples. Define the “safety space” Sf for a model f and an input x as follows:

Sf = {ρ : f(x + ρ) = l}

This space represents the range of perturbations ρ that can be applied to x such that the model’s output remains consistent with the true label l. In other words, if a perturbation ρ ∈ Sf, then for the perturbed sample x + ρ, the model still outputs the correct label, i.e., f(x + ρ) = l. However, perturbations that fall outside of this safety space can lead to misclassifications. The robustness of a model is therefore tied to the size and integrity of this safety space, with larger spaces indicating greater robustness to perturbations.
The safety space for ensemble models, SF, is jointly determined by its sub-models. Let F be an ensemble model with n sub-models. The outputs through output summation averaging can be described as below:

F(x) = argmax_l (1/n) ∑_{i=1}^n P_{fi}(l | x)    (1)
According to Equation (1), the safety space for the ensemble model F can be defined as follows:

SF = {ρ : ∑_{i=1}^n P_{fi}(l | x + ρ) > ∑_{i=1}^n P_{fi}(lf | x + ρ)}    (2)

where lf denotes the incorrect output with the highest sum of probabilities, and P(⋅) signifies the prediction probabilities. In essence, if an attack ρ does not lead a majority of models to converge on the same incorrect result, the integrated system is unlikely to produce an error.
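As an illustration of the averaging rule in Equation (1), a minimal PyTorch sketch (assuming each sub-model returns class logits) could look as follows:

```python
import torch

def ensemble_predict(models, x):
    """Average the class probabilities of all sub-models and return the arg-max label (Eq. 1)."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models], dim=0)
    avg = probs.mean(dim=0)          # shape (batch, num_classes)
    return avg.argmax(dim=-1)        # an attack must dominate the averaged probabilities to succeed
```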
For the same input, ensembling multiple models amounts to overlaying the safety spaces of those models. As illustrated in Figure 3, if there are three models with safety spaces Sf1, Sf2, and Sf3, respectively, the overlay of their safety spaces yields distinct overlapping regions S1, S2, and S3.
The safety space SF of the ensemble model is jointly formed from the safety spaces S1, S2, and S3 of the sub-models. As depicted in Figure 3, combining S1, S2, and S3 notably expands the secure space beyond that of any single model. Under this analysis, multiple models can complement each other to adapt to attacks, thereby enhancing robustness. The core question is how to ensure that the combined coverage of the models’ safety spaces is sufficiently large and that the results within this coverage satisfy Equation (2). Based on this analysis, we propose introducing the necessary diversity by constructing a Dynamic Heterogeneous Redundancy (DHR) architecture to enhance the system’s robustness.
4.2. Enhancing DL models through DHR
AI application systems face ESS problems at two levels: the “base” of software/hardware environments and the “ontology” of model algorithms, so the threats and challenges they face are more severe. Regarding common problems, through the practical development of ESS in cloud platforms, storage systems, routing and switching, and other network devices in recent years, the “base” environments of AI application systems, such as information communication networks, clouds, and data centers, have acquired the ESS attributes of trusted services, which provides a feasible solution to the common ESS problems of AI. With respect to individual problems, we take the DNN, a current research hotspot, as an example for research and discussion.
4.2.1. Motivation for the enhancement
According to the analysis above, AI neural networks perform feature engineering through gradient-based optimization and fitting. This process makes existing models pay attention to all features rather than ignoring subtle features as human beings do. Currently, both black-box and white-box adversarial attacks fundamentally rely on the idea of crafting deceptive samples to fool models. This is done by employing various optimization strategies, such as gradient descent on the model loss, while ensuring minimal interference with microscopic features. Just as the vulnerabilities and backdoors in the software/hardware of an information system cannot be predicted or exhausted in advance, the optimization methods adopted by current neural networks can only try their best to approach, but can never achieve, the perfect goal of “understanding” everything when the training set is limited. This problem is an architectural defect of the neural network itself and can be regarded as a vulnerability in the neural network model algorithm. From this point of view, adversarial attacks, as the most representative individual ESS problem of AI, are similar to common ESS problems in terms of root causes, presentation forms, and exploitation methods, and it seems that they can be defended against in an integrated manner from the perspective of ESS. From the perspective of the ESS defense paradigm, the feasibility of implementing security protection at the algorithm level of the DNN model, based on the dynamics, variety, and redundancy of the DHR architecture and the SR-FC mechanism, mainly rests on the following two aspects:
(1) The adversarial performance of each functional equivalent conforms to the relatively-true axiom. It has been found that the correct classification boundaries of networks with different structures are similar when searching for gradients in different directions at the same decision point [34, 35]; as a result, the same disturbance can make different models go wrong, that is, adversarial attacks are transferable. At the same time, however, the direction of gradient descent in each model is highly random, which leads to different wrong results, so it is difficult to achieve a transferable attack toward a specific target. Therefore, the effectiveness of the diversity and heterogeneity of the neural network sub-models in DHR is guaranteed.
(2) The input and output interfaces of functionally equivalent reconfigurable executors can be normalized or standardized. For a homogeneous neural network, the input data to be identified or classified is its normalized input, and the result of identification or classification is its normalized output. On this interface, under the excitation of a given input sequence, functionally equivalent neural network sub-model executors have the same probability of producing the various output vectors or states, which makes it possible to judge and ensure the equivalence among sub-model executors through consistency testing of a given function or performance. For heterogeneous models, normalization methods need to be studied further in the future. However, based on the final conclusions derived from inputs and outputs, the output targets can theoretically be normalized through the transformation of outputs. Based on the above, we believe that it is also feasible for AI application systems to construct architectures with ESS characteristics. This paper explores the use of the DHR architecture to transform AI application systems. By leveraging diversity to generate endogenous security effects, the approach aims to enhance the robustness of the overall system. The ultimate goal is to create a robust system that maintains resilience even when the robustness of individual models is insufficient.
4.2.2. A design of the ESS AI defense framework
The DHR architecture has been proven in practice to be an effective approach to in-scope security. Figure 4 shows the DHR-based ESS defense framework of AI, in which multiple functionally equivalent neural network sub-models are used to construct a heterogeneous redundant operating environment, the input agent distributes samples to each sub-model for independent processing, and the identification or classification results enter the strategic ruling. For normal samples, each sub-model gives the same or similar results. For adversarial samples, the sub-models are triggered to produce differential-mode outputs, so they are very likely to be discovered by the ruling module; the error-correction output link and the system scheduling module are then activated, and the algorithm model is dynamically replaced according to certain rules, thus defeating the current adversarial attack.
Figure 4. DHR-based ESS defense framework of AI
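The workflow just described can be sketched as follows; the majority-vote ruling and the random replacement policy are simplifying assumptions made for illustration, not the exact strategic ruling and scheduling rules of the framework.

```python
import random
from collections import Counter

class DHRSystem:
    """Toy DHR pipeline: input agent -> heterogeneous sub-models -> strategic ruling -> scheduling.

    Each sub-model is a callable that maps an input to a discrete label (e.g., an int)."""

    def __init__(self, online_models, reserve_models, k=3):
        self.online = list(online_models[:k])   # functionally equivalent executors in service
        self.reserve = list(reserve_models)     # pool used for dynamic replacement

    def predict(self, x):
        votes = [m(x) for m in self.online]     # input agent distributes x to every sub-model
        label, count = Counter(votes).most_common(1)[0]
        if count < len(votes):                  # differential-mode output detected
            self._reschedule(votes, label)
        return label                            # majority ruling masks single-model errors

    def _reschedule(self, votes, majority_label):
        # Replace one disagreeing executor with a randomly chosen reserve model, if available.
        for i, v in enumerate(votes):
            if v != majority_label and self.reserve:
                self.online[i] = self.reserve.pop(random.randrange(len(self.reserve)))
                break
```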
Under the above defense framework, how to mine and construct effective diversity for neural networks becomes the key. The core elements of neural networks include the dataset, the network model, and the training method, all of which can serve as entry points for constructing heterogeneous sub-models. It should be noted that, due to the similarity in the internal mechanisms of the trained models and methods, the characteristics and methods of learning remain similar even when the model structures differ, so the same adversarial sample can make different models go wrong; that is, the problem of attack transferability still exists. Therefore, to achieve system-level robustness from dynamics and unknowns, in-depth research and experiments are needed to further explore how to obtain differentiated neural network models. The core component of constructing the DHR (Dynamic Heterogeneous Redundancy) architecture is the heterogeneity of its sub-executors, i.e., the introduction of necessary diversity. To validate the feasibility of this architecture, we subsequently conducted various case studies on diversity verification, which are described in more detail in Section 5.
5. Case study
To validate the feasibility of our approach, we attempt various methods of diversity construction. We set up four scenarios for testing: adversarial defense, backdoor defense, poisoning defense, and real-world application scenarios. For each different scenario, we select typical tasks within each scenario, such as image classification, object detection, etc. By comparing the performance of individual models, we find that model diversity can significantly enhance the robustness of systems.
5.1. Adversarial defence
An adversarial attack is a technique employed to deliberately manipulate or deceive machine learning models by introducing carefully crafted perturbations into input data. These perturbations are often imperceptible to humans but can cause the model to misclassify or produce erroneous outputs. Adversarial attacks can be categorized into various types, such as white-box attacks, where the attacker has full knowledge of the target model, and black-box attacks, where the attacker has limited or no information about the target model. Adversarial attacks pose significant challenges to the robustness and security of machine learning systems, as they can undermine the reliability of models in real-world applications.
Adversarial defense refers to methods and techniques aimed at protecting machine learning models from adversarial attacks. These defenses seek to enhance the robustness and resilience of models against adversarial perturbations introduced into input data. Adversarial defense methods can be broadly classified into two categories:
Adaptive defenses: Adaptive defenses dynamically adjust the model’s parameters or architecture in response to detected adversarial attacks, such as adversarial training and gradient masking. Adversarial training is a powerful technique that enhances a model’s robustness by exposing it to both pristine and adversarially perturbed data (a minimal training-loop sketch follows these two categories). This method helps the model learn to generalize and resist potential attacks. Another effective strategy is gradient masking, which conceals gradient information to thwart attackers from crafting potent adversarial examples. By obfuscating the gradients, models become less vulnerable to adversarial manipulations.
Detective defenses: Detective defenses are designed to identify and neutralize adversarial examples before they inflict damage. These techniques employ a variety of strategies to ensure the integrity of the model’s predictions. Anomaly detection is one such method, which scrutinizes data for anomalies that deviate from typical patterns, thereby flagging them as potential adversarial threats. Another approach is adversarial sample detection, which meticulously analyzes input samples to identify those that provoke atypical or unpredictable responses from the model. By implementing these proactive measures, detective defenses fortify the system against malicious attempts to compromise its performance.
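As a concrete illustration of the adversarial training idea above, the following PyTorch sketch performs one training step on a mixture of clean and FGSM-perturbed inputs; the perturbation budget and the 50/50 loss weighting are illustrative choices, not prescribed by the cited defenses.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on clean plus FGSM-perturbed inputs (adversarial training in its simplest form)."""
    # Craft the adversarial batch with a single gradient-sign step.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Train on a mixture of clean and adversarial samples to preserve clean accuracy.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```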
As previously outlined, the predominant strategy for thwarting adversarial attacks has been to apply supplementary defenses and strengthen individual models. However, we hold the opinion that exclusive reliance on these supplementary measures and on a model’s inherent defenses does not provide a foolproof security solution. Our objective is to develop the DHR architecture, which is intended to significantly improve the model’s security posture. We have made attempts from the following perspectives.
5.1.1. Defending against adversarial attack through preprocessing diversity
Methods. In this approach, we enhance the robustness of machine learning models by promoting diversity in data representation. By applying different transformation techniques to the same dataset, each model is exposed to uniquely represented data, fostering a variety of insights and capabilities. This method makes it challenging for adversarial perturbations to transfer effectively between models, thereby increasing system resilience. Through different transformers, multiple TF data sets are derived from raw data, which are then used to train various TF models, thereby constructing an ensemble model system. The specific process is shown in Figure 5.
Figure 5. System architecture design
Experimental setup. Emphasizing both representativeness and simplicity, we focused on a classic image classification task for our evaluation. We transformed data from the same source dataset in various ways to ensure that the task objectives remained consistent, maintaining label-sample correspondence. This uniformity aids in achieving comparable classification goals. We selected different image processing techniques to highlight unique features, employing methods such as Canny edge detection [44], LBP [45], and GLCM [46], the last of which uses seven different statistical measures for image transformation. The images, drawn from a subset of the ImageNet dataset, cover five categories totaling around 10 000 images. We conducted training using the standard ResNet-18 model with images normalized to 224 × 224 pixels.
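A minimal sketch of such preprocessing diversity is shown below, using OpenCV and scikit-image for the Canny and LBP representations; the GLCM statistics are omitted for brevity, and the parameter values are illustrative rather than the settings used in the experiments.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def to_canny(img):
    """img: HxWx3 uint8 RGB -> Canny edge map replicated to 3 channels for a standard ResNet input."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return np.repeat(edges[..., None], 3, axis=-1)

def to_lbp(img):
    """Local binary pattern texture representation of the same image, rescaled to uint8."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    lbp = local_binary_pattern(gray, P=8, R=1.0, method="uniform")
    lbp = (255 * lbp / max(lbp.max(), 1)).astype(np.uint8)
    return np.repeat(lbp[..., None], 3, axis=-1)

# Each transform yields its own training set; one ResNet-18 is trained per representation,
# and at inference time the softmax outputs of the per-representation models are averaged.
```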
Figure 6. Comparison of attack results between different models under transfer attacks: (a) LBP is the attacked model; (b) Canny is the attacked model; (c) Mean is the attacked model; (d) Max is the attacked model
Results. Figure 6 illustrates that as the perturbation size increases, the accuracy of the attacked model declines sharply, while models using different transformations (LBP, Canny, GLCM) respond differently to the same perturbations. Notably, Table 1 shows that when one model’s accuracy dips below 0.01, the others can still maintain over 0.9 accuracy, highlighting resilience against transferred perturbations. This indicates that by using different transformation methods, we create diversity in data processing, thereby constructing a DHR-like architecture that significantly enhances the system’s resistance to adversarial attacks.
Table 1. Attack transfer statistics for each model
5.1.2. Defending against adversarial attack through weight diversity
Methods. By increasing the diversity of training gradients, we can reduce the transferability of adversarial examples. Considering the complexity of gradient computation, we decided to enhance diversity from the perspective of weights. Since gradients are closely related to the model’s weights, increasing the diversity of weights is, to some extent, equivalent to increasing the diversity of gradients. To promote diversified training, we defined two metrics to quantify the diversity of model weights and regulated them during training: Strengthening Weight Concentration (SWC) and Penalizing Weight Correlation (PWC), which encourage greater differentiation among sub-models. The weight distribution of the same layer of the same model under different training methods is shown in Figure 7.
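The exact formulations of SWC and PWC are not reproduced here; the sketch below shows one plausible instantiation of a weight-correlation penalty, measuring pairwise cosine similarity between the flattened weight vectors of sub-models that share one architecture and adding it to the joint training loss.

```python
import itertools
import torch
import torch.nn.functional as F

def flat_weights(model):
    """Concatenate all trainable parameters of a sub-model into a single vector."""
    return torch.cat([p.reshape(-1) for p in model.parameters() if p.requires_grad])

def weight_correlation_penalty(models):
    """Mean pairwise cosine similarity of sub-model weight vectors (lower means more diverse weights)."""
    vecs = [flat_weights(m) for m in models]
    sims = [F.cosine_similarity(a, b, dim=0) for a, b in itertools.combinations(vecs, 2)]
    return torch.stack(sims).mean()

# During joint training, the penalty is added to the task loss, e.g.
#   loss = sum(F.cross_entropy(m(x), y) for m in models) + lam * weight_correlation_penalty(models)
```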
Figure 7. Distribution of weight values for an identical layer across ensemble model sub-models: (a) BASELINE; (b) ADP; (c) WC+SWC
Experimental Setup. We utilized the ImageNet100 dataset to train a ResNet-18 network to classify images into 100 distinct classes. Concurrently, we conducted a comparative analysis with existing research on ensemble model diversity. Specifically, we examined the ADP method, which leverages behavioral diversity, and the DEG method, which is rooted in gradient diversity. Our evaluation encompassed three attack methodologies, comprising two white-box attack strategies and one black-box approach, to assess transferability. During testing, we subjected one of the models to attacks and evaluated accuracy by aggregating the outputs of all three models. The constraints applied to the attacks are denoted as ‘Para.’ in Table 2. Notably, the PGD and BIM methods underwent 10 iterations each, while the SPSA method was iterated 5 times.
Table 2. Recognition accuracy (%) under grey-box attacks with control parameter (Para.) as ϵ in L∞ for BIM, PGD and SPSA. The ensemble size is K = 3. The best performance is marked in bold.
Results. Figure 7 compares weight distributions across sub-models within an ensemble model, showcasing varying degrees of concentration. While the ADP method exhibits a slight tendency towards concentration, our proposed method demonstrates a more pronounced divergence, with primary weight values concentrated on a minority of nodes, underscoring its distinctive impact on model weight distribution.
Table 2 demonstrates the accuracy of adversarial examples generated by attacking a single model under simultaneous recognition by multiple models on the ImageNet100 dataset.
As Table 2 shows, adversarial examples tailored to individual models severely impair their performance, rendering them nearly ineffective. In contrast, our proposed method demonstrates superior recognition capability compared to the DEG and ADP methods. All diversity-enhancing techniques show a significant improvement in recognition accuracy. By leveraging diversity in weights, we successfully enhance the robustness of the model, constructing a DHR system.
5.1.3. Defending against adversarial attack through data diversity
Methods. We propose a diversity training method for large datasets to regulate learning data diversity without incurring additional training costs. This method [43] leverages model feedback to control diversity, with four regularization operations tailored to specific categories: enhancing model performance (EMP), enhancing model divergence (EMD), enhancing single individuals (ESI), and enhancing error disagreement (EED). Our empirical evaluation demonstrates that this approach significantly boosts model robustness while having minimal impact on performance. We refer to this method as enhancing adversarial robustness through diversity-supporting robustness (EADSR).
Experimental Setup. We still utilized the ImageNet100 dataset and selected two different configuration parameters: 0.03 and 0.1, representing the proportion of perturbation introduced by additional data. We tested five attack methods: FGSM, BIM, PGD, APGD, and Auto Attack (AA). We employed a stricter attack strategy by treating multiple models as a single entity for simultaneous attack, causing gradients of all models to decrease simultaneously. We also compared our approach with previous methods that attempted to use multi-model defense against attacks, namely ADP and PDD. The ADP method enhances the robustness of ensemble models by increasing the behavioral diversity among the individual models, while the PDD method improves diversity by differentiating the dropout strategies applied to the final layer of each model.
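The “single entity” attack setting described above can be approximated by running PGD against the summed loss of all sub-models, as in the hedged sketch below; the step size, budget, and iteration count are illustrative values.

```python
import torch
import torch.nn.functional as F

def pgd_on_ensemble(models, x, y, epsilon=0.03, alpha=0.007, steps=10):
    """PGD that treats the whole ensemble as one target by summing the sub-model losses."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the epsilon ball around the clean input and the valid pixel range.
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon).clamp(0.0, 1.0)
    return x_adv
```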
Results. As depicted in Table 3, even reduced perturbation intensities drastically degrade the performance of the attacked models on smaller datasets. EADSR also shows decreased performance under attack but maintains significant robustness gains without a substantial performance decline. Compared to existing diversity methods, EADSR exhibits notable accuracy improvements against iterative attacks, albeit with a slight performance impact (a 4% decrease) when the perturbation strength is increased to 0.1. By leveraging differences in data and training methods, the DHR system we built significantly enhances the robustness of the model.
5.2. Backdoor defence
A backdoor attack implants a backdoor into a model, making the model particularly sensitive to a certain trigger and causing it to exhibit specific abnormal behaviors when the trigger is present. In the field of image classification, the purpose of backdoor attacks is to make the model perform normally on clean samples but classify any sample containing the trigger into a specified wrong class. Since backdoored models behave normally on clean samples, backdoor attacks possess excellent stealthiness, and there is significant flexibility in the selection of attack targets, triggers, and attack methods. Backdoor attacks on DNNs therefore raise significant security concerns for the application of artificial intelligence.
Researchers are actively developing defense mechanisms to protect models against such attacks. The defense strategies against backdoors mainly fall into three categories:
Data-level: To defend against backdoor attacks based on data poisoning, data filtering is conducted first to remove any abnormal data, thus preventing the model from being implanted with a backdoor. A typical method is PatchSearch [47], which utilizes Grad-CAM [48] to locate the suspicious patch; however, this method can only be applied to patch-form triggers and cannot detect triggers such as optical triggers.
Train-level: To defend against backdoor attacks based on data poisoning, the main idea is to train a clean model on the poisoned dataset by manipulating the training process. One typical method is ABL [49], which employs a strategy of gradient ascent on poisoned samples, thereby preventing the implantation of a backdoor. The shortcoming of this method is also evident: it may misjudge some high-quality normal samples as poisoned, thereby affecting the model’s performance.
Model-level: The goal is to remove a backdoor from a trained model that may have been implanted with one. The general process can be further divided into backdoor detection and backdoor removal. In this scenario, the defender often has only a limited amount of training data. A typical method is SSL_Cleanse, which first decides whether the model has been attacked and identifies the attack target, and then fine-tunes the model to remove the backdoor. However, this method can only address single-target attacks and cannot handle multi-target attacks.
Existing methods such as data filtering, backdoor detection, and backdoor removal cannot completely protect the model from backdoor attacks. Thus, we turn to DHR and aim to defend against backdoor attacks by adopting a DHR architecture, whose main idea is diversity. We have made attempts from the following perspectives.
Table 4. Our defense method ATA on different attack methods
5.2.1. Defending against backdoor attacks through weight diversity
Method. Self-supervised learning [50–52] aims to train an image encoder on an unlabeled dataset, which is then utilized to train a downstream classifier. A backdoored encoder produces features for trigger-bearing samples that are close to those of the target class [53], which results in any sample embedded with the trigger being classified into the target class. Determining whether a model has been attacked, or identifying the category of the attack, is very challenging and prone to misjudgment. In a self-supervised learning model, certain neurons exhibit heightened activity when confronted with triggers [54], which is what enables backdoor attacks. If we can adjust the weights of these neurons, there is a significant possibility of removing the backdoor.
Therefore, we propose the defense method of Anti-backdoor Through Active Attack (ATA), which first uses custom triggers to attack all classes so as to implant a backdoor in the model, and then conducts desensitization training to reduce the model’s sensitivity to the custom triggers. Through this process, the custom trigger is rendered ineffective and the model’s weights are effectively adjusted. Since the effectiveness of the attacker’s triggers depends on specific weights of the model, these adjustments have a high probability of disrupting the weights associated with the attacker’s triggers. By training multiple models in this way, we obtain a set of models with diverse weights. By integrating this set of models into an ensemble, we can effectively defend against backdoor attacks using the DHR architecture.
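Only the desensitization stage is sketched below; the preceding active-attack stage (implanting a custom trigger for every class, e.g. with a BadEncoder-style objective) is omitted, and the trigger shape and the cosine-similarity loss are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def stamp_trigger(x, value=1.0):
    """Apply a custom 3x3 corner trigger to a batch of images of shape (N, C, H, W)."""
    x = x.clone()
    x[:, :, -3:, -3:] = value
    return x

def desensitize(encoder, optimizer, loader, epochs=1):
    """Desensitization stage (sketch): pull features of triggered images back toward the features
    of their clean counterparts, so the custom trigger, and with high probability the weights
    exploited by the attacker's trigger, loses its effect."""
    encoder.train()
    for _ in range(epochs):
        for x, _ in loader:
            f_clean = encoder(x).detach()        # target features from clean inputs
            f_trig = encoder(stamp_trigger(x))   # features of the same inputs with the custom trigger
            loss = 1.0 - F.cosine_similarity(f_trig, f_clean, dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```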
Experimental setup. The experiment utilizes the CIFAR-10 dataset and employs the BadEncoder [53] attack method, known for its strong effectiveness, during the active attack stage. The testing evaluates the defensive effectiveness against various attack methods, including SSL_Backdoor [55], ctrl [56] and BadEncoder [53]. The experimental results are shown in Table 4. In the experiment, we employ three metrics: ACC represents the classification accuracy on clean datasets, ASR represents the success rate of backdoor attacks (the poisoned datasets classified into the specified class), and PACC represents the classification accuracy on the poisoned dataset.
Table 5. Sentences generated during data diversification. SR: synonym replacement. RI: random insertion. RS: random swap. RD: random deletion
Table 6. The results of data diversification, where “train” and “test” respectively represent the number of samples in the training set and the testing set.
Results. The experimental results demonstrate that our method ATA based on DHR performs excellently across various attack methods, reducing the ASR to a nearly negligible level while maintaining the model’s classification accuracy on clean samples. Additionally, the PACC is close to ACC, indicating that the integrated model of the DHR architecture is completely desensitized to triggers.
5.2.2. Defending against backdoor attacks through data diversity
Method. Backdoor erasing methods are a category of backdoor defense methods. Fine-tuning is the simplest backdoor-erasing method, which erases the backdoor by training the backdoored model on a small subset of clean data. However, training on a small dataset alone can lead the model to overfit the subset, significantly reducing the model’s performance on clean samples even as it reduces the success rate on poisoned samples. Data augmentation changes input data in certain ways to generate more training samples, thereby improving the model’s generalization ability and effectiveness. In this experiment, the data diversification component improves the easy data augmentation (EDA) method [57] by using synonym replacement, random word insertion, random word deletion, and random word swapping, while ensuring that specific target entities in the sentence remain unchanged during augmentation. Examples of data diversification are provided in Table 5, and the results are shown in Table 6. The input text is denoted S_in = {Tok_0, Tok_1, …, Tok_i, …, Tok_{n-1}, Tok_n}, where Tok_i represents a specific word. The specific words to protect are selected using the TF-IDF algorithm, which picks the most important words in the sentence.
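A simplified, hedged sketch of such entity-preserving augmentation is given below; the operation set mirrors EDA, while the synonym dictionary and the TF-IDF-selected protected words are passed in as assumptions rather than computed here.

```python
import random

def eda_keep_entities(sentence, protected, n_ops=2, synonyms=None):
    """Simplified EDA-style augmentation that never modifies the protected target words.

    protected: set of lower-cased TF-IDF-selected words to keep unchanged;
    synonyms: dict mapping a lower-cased word to a list of replacement words."""
    tokens = sentence.split()
    synonyms = synonyms or {}
    for _ in range(n_ops):
        editable = [i for i, t in enumerate(tokens) if t.lower() not in protected]
        if not editable:
            break
        op = random.choice(["sr", "ri", "rs", "rd"])
        i = random.choice(editable)
        if op == "sr" and tokens[i].lower() in synonyms:          # synonym replacement
            tokens[i] = random.choice(synonyms[tokens[i].lower()])
        elif op == "ri" and synonyms:                             # random insertion
            word = random.choice(list(synonyms))
            tokens.insert(random.randrange(len(tokens) + 1), random.choice(synonyms[word]))
        elif op == "rs" and len(editable) > 1:                    # random swap
            j = random.choice([k for k in editable if k != i])
            tokens[i], tokens[j] = tokens[j], tokens[i]
        elif op == "rd" and len(tokens) > 1:                      # random deletion
            tokens.pop(i)
    return " ".join(tokens)
```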
Experimental Setup. Considering the representativeness and ease of operation of the experiments, we used three real-world datasets corresponding to three different tasks: (1) Stanford Sentiment Treebank (SST-2) [58] for binary sentiment analysis task; (2) Offensive Language Identification (OLID2) [59] for binary offensive language detection task; (3) AG News [60] for four-class news classification task. The attack method adopted BadNets [28], the victim model used BERT-base [61], and the defense method employed fine-tuning. In the data diversification part, we computed the defense effectiveness using only 1, 2, 3, or 4 data augmentation methods, with the initial data set comprising 5% of the original data. We used two metrics to evaluate the effectiveness of backdoor defense: (1) Attack Success Rate (ASR), which is the proportion of poisoned samples classified by the poisoned model as the attacker’s specified label, evaluating whether the poisoned samples can activate the backdoor in the poisoned model; (2) Clean Accuracy (CA), which is the classification accuracy of the poisoned model on clean samples, assesses whether the backdoor samples degrade the model performance.
Results. Table 7 presents the effectiveness of the attack method on the three datasets and evaluates the defense effectiveness using different types of data augmentation methods under the fine-tuning defense method. We observed that with the increase in clean data used by the fine-tuning method, the average ASR decreased by around 3%, while the average CA increased by 20%. These results indicate that fine-tuning with data diversification effectively erased the backdoor in the poisoned model and improved its performance on clean samples by around 20%, effectively addressing the overfitting phenomenon in fine-tuning.
Table 7. The performance of BadNets on three datasets and the defense effect of fine-tuning with different data augmentation methods
5.2.3. Defending against backdoor attacks through training diversity
Method. Ensemble methods are commonly used in the field of adversarial defense. Attackers need to deceive the majority of sub-models in the ensemble to achieve their attack goals, where each sub-model has a different network structure and weight parameters. Therefore, theoretically, ensemble models are usually more robust than single models and can defend against various adversarial attack methods. However, ensemble methods require a large amount of clean data to train the sub-models, which most backdoor eradication methods cannot provide. As shown in Figure 8, to address this gap, we propose integrating multiple backdoor mitigation strategies through ensemble distillation to strengthen backdoor elimination. We perform backdoor eradication on the backdoored model using a small amount of clean data and simple backdoor eradication methods. Specifically, we employ Fine-tuning, Re-init, and Fine-Pruning separately, obtaining three distinct and relatively clean models as teacher models for the distillation process. Finally, we use the teacher models obtained in the previous steps and the augmented data to conduct ensemble distillation on the original backdoored model. The student model learns clean knowledge from the three teacher models, with the student network’s intermediate layers emulating the outputs of the teacher models’ intermediate layers.
Figure 8. Flowchart of the method: (a) use a small set of clean samples to erase the backdoor from the backdoored model, obtaining multiple teacher models; (b) utilize ensemble distillation with the multiple teacher models to obtain a cleaner student model
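The core of the ensemble distillation step can be sketched as follows. This is a minimal PyTorch-style illustration that assumes HuggingFace-style BERT classifiers (outputs exposing `.logits` and `.hidden_states`); the matched layer, temperature, and weighting coefficient are illustrative assumptions, not the exact settings of our method.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of ensemble distillation from several "cleaned" teacher models
# (e.g. obtained by fine-tuning, re-initialization, and fine-pruning) into a
# student initialized from the backdoored model. Teachers are assumed to be in
# eval() mode; layer index, temperature and alpha are illustrative.

def distillation_step(student, teachers, input_ids, attention_mask,
                      alpha=0.5, temperature=2.0):
    s_out = student(input_ids, attention_mask=attention_mask,
                    output_hidden_states=True)

    logit_loss, hidden_loss = 0.0, 0.0
    for teacher in teachers:
        with torch.no_grad():
            t_out = teacher(input_ids, attention_mask=attention_mask,
                            output_hidden_states=True)
        # (1) Student logits imitate each teacher's softened predictions.
        logit_loss = logit_loss + F.kl_div(
            F.log_softmax(s_out.logits / temperature, dim=-1),
            F.softmax(t_out.logits / temperature, dim=-1),
            reduction="batchmean") * temperature ** 2
        # (2) A student intermediate layer mimics the teacher's intermediate layer.
        hidden_loss = hidden_loss + F.mse_loss(
            s_out.hidden_states[-2], t_out.hidden_states[-2])

    n = len(teachers)
    return alpha * logit_loss / n + (1 - alpha) * hidden_loss / n
```

In training, this loss would be computed on the augmented clean data and backpropagated through the student only.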
Experimental Setup. Considering representativeness and ease of experimentation, we utilize the same three real-world datasets and tasks as above: (1) Stanford Sentiment Treebank (SST-2) [58]; (2) Offensive Language Identification (OLID2) [59]; (3) AG News [60]. We use both BERT-base and BERT-large [61] as victim models. We select six representative backdoor attack methods: (1) BadNets [28]; (2) RIPPLe [62]; (3) InsertSent [63]; (4) HiddenKiller [64]; (5) StyleBkd [65]; (6) SOS [66]. We again evaluate defense effectiveness with the Attack Success Rate (ASR) and Clean Accuracy (CA) defined above.
Results. Table 8 presents the effectiveness of the six attack methods on the three datasets (ASR and CA) and evaluates the defense effect of ensemble distillation (△ASR and △CA). We observe that ensemble distillation effectively eradicates the backdoors in all backdoored models, with the average △ASR decreasing to 24%. At the same time, performance on clean samples is not significantly affected, with the average △CA remaining at around 80%. These results indicate that ensemble distillation is effective in defending various models against different backdoor attacks.
The performance of the six attack methods on three datasets and the defense effect of ensemble distillation. BN denotes BadNets, RP denotes RIPPLe, Ins denotes InsertSent, HK denotes HiddenKiller, Style denotes StyleBkd
Classification performance (Accuracy±Std) under different perturbation rates of Meta
5.3. Poisoning defense
Method. A poisoning attack targets graph-based machine learning models by manipulating the data before training [67]. Such attacks typically modify the graph data prior to model training, with the intention of reducing the model's predictive performance on certain tasks (such as node classification or graph classification) without noticeably altering the appearance of the graph data [68, 69]. Poisoning attacks on graph-based models therefore pose a substantial threat to the reliability and efficacy of artificial intelligence applications.
While current defense models attempt to clean graph data from either the graph-structure view or the node-feature view [70], they often suffer from the limitations of relying on a single view. In fact, models based on the graph-structure and node-feature views are complementary, and exploiting graph data diversity can overcome the limitations of any single view. This strategy leverages the complementary nature of the two views, enabling GNNs to capture and identify potentially anomalous attacks from diverse data and thus improving the comprehensiveness and effectiveness of the defense. It also enhances the robustness of GNNs: an attacker must bypass multiple models simultaneously to execute a successful attack, which greatly increases the difficulty of the attack. Finally, data diversity allows GNNs to adapt more flexibly to new attacks by adjusting the weights and strategies of the different views as attack patterns change.
Therefore, following the DHR idea, we achieve a more effective defense by integrating multiple graph views so as to combine the topological and attribute information of the graph. Specifically, we propose DHRGNN, which integrates three heterogeneous models to defend against poisoning attacks. First, we use an SVD-decomposition-based model (SVD) and an Edge-Boosted Attention Model (E-Boost) to clean up perturbations and generate robust graphs from the graph-structure and node-feature views, respectively. Then, we use a Robust Information Combination model (RIC) to fuse the information of the two robust graphs, reducing the impact of poisoning attacks on any single model and improving the robustness of the overall system.
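As an illustration of the structure-view cleaning step, the sketch below shows a truncated-SVD reconstruction of the adjacency matrix, which suppresses the high-rank perturbations that poisoning attacks tend to introduce. The rank, threshold, and toy graph are illustrative assumptions; the actual SVD and E-Boost components of DHRGNN involve details not shown here.

```python
import numpy as np

# Minimal sketch of SVD-based adjacency cleaning: keep only the dominant low-rank
# structure of the (possibly poisoned) graph and re-binarize it.

def svd_clean_adjacency(adj, rank=15, threshold=0.1):
    u, s, vt = np.linalg.svd(adj.astype(float), full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]   # rank-k approximation
    cleaned = (low_rank > threshold).astype(float)       # keep only confident edges
    return np.maximum(cleaned, cleaned.T)                # re-symmetrize

# Toy example: two triangles joined by one bridging edge, which plays the role
# of an injected (poisoned) connection.
adj = np.zeros((6, 6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
print(svd_clean_adjacency(adj, rank=2))
```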
Experimental Setup. We compare different models on datasets commonly used in the GNN field, including Cora [71] and Citeseer [72]. Cora comprises 2708 machine learning papers in 7 classes connected by 5429 citation links. Citeseer is a citation network with 3312 scientific publications in 6 categories and 4732 links. We conduct a comparative evaluation under the Meta [73] and Nettack [74] attacks. Meta treats the graph as a hyperparameter and uses meta-gradients to perturb the graph structure. Nettack changes the graph structure around target nodes or nearby nodes while keeping the perturbations unnoticeable. For Meta, we set the Perturbation Rate (Ptb Rate) from 0 to 0.25; for Nettack, we set the Perturbation Number (Ptb Num) from 0 to 5.0. We compare DHRGNN with baselines including GCN [75], GCNJaccard [76], GCNSVD [77], and ProGNN [78]. For a fair comparison, we report the averaged results of 10 runs for all experiments.
Classification performance (Accuracy±Std) under different perturbation numbers of Nettack
Results. Tables 9 and 10 report the performance on the two datasets under Meta and Nettack. The two highest-performing models are highlighted in bold and underlined formatting. From these results, it is clear that DHRGNN can resist adversarial attacks and achieves satisfactory results in all cases. Moreover, DHRGNN exhibits the smallest performance degradation among all methods when facing adversarial attacks. In all settings with zero perturbation, DHRGNN performs on par with GCN, indicating that the GNN's capacity to learn from clean graphs remains intact. Overall, DHRGNN shows a clear advantage over the other methods, and its performance remains consistently strong. In conclusion, thanks to the use of data diversity, DHRGNN is robust enough to defend against different attacks.
5.4. Applications to real-world object detection
To further verify the effectiveness and feasibility of this method, we tested it on commonly used object detection systems in real-world scenarios. Object detection is a deep learning task that identifies and localizes objects within images by marking them with bounding boxes. The technology has a wide range of applications in fields such as autonomous driving [79] and remote sensing image analysis [80].
Based on the Dynamic Heterogeneous Redundancy (DHR) concept, we developed a diversity training method [81] that combines multiple heterogeneous models into an ensemble object detection model. We then attacked this diversity ensemble model (DEM) with mainstream adversarial attack methods and compared the results with a baseline ensemble model (BEM) trained without this method. The experimental results are shown in Tables 11 and 12.
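For intuition, a diversity-oriented ensemble objective can penalize the alignment of the sub-models' input gradients so that an adversarial example crafted against one sub-model transfers poorly to the others. The sketch below shows one such objective in PyTorch; it is illustrative only and does not reproduce the exact training procedure of our diversity ensemble model [81] (the task loss function, models, and weighting are placeholders).

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a gradient-diversity ensemble loss: train all sub-models on
# the task while penalizing pairwise cosine similarity of their input gradients.

def diversity_ensemble_loss(models, images, targets, task_loss_fn, lam=0.5):
    images = images.clone().requires_grad_(True)
    task_loss, grads = 0.0, []
    for model in models:
        loss_i = task_loss_fn(model(images), targets)    # placeholder per-sub-model loss
        g = torch.autograd.grad(loss_i, images, create_graph=True)[0]
        grads.append(g.flatten(1))
        task_loss = task_loss + loss_i

    # Penalize pairwise gradient alignment between sub-models.
    align = 0.0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            align = align + F.cosine_similarity(grads[i], grads[j], dim=1).mean()
    n_pairs = len(grads) * (len(grads) - 1) / 2
    return task_loss / len(models) + lam * align / max(n_pairs, 1)
```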
The mAP (%) of the attacked sub-models and the ensemble models, trained on different datasets with the YOLOv3 model, under attacks on the first sub-model by different methods
The mAP (%) of all sub-models and the ensemble models when facing adversarial examples generated by an unrelated model, trained on different datasets with the YOLOv3 model
The experiment in Table 11 is a white-box scenario: we perform adversarial attacks on the first sub-model of each of the two ensemble models and then feed the generated adversarial samples into the corresponding ensemble model to obtain the mean average precision (mAP) of each sub-model and of the ensemble model. The data in Table 11 show that the detection performance of the DEM trained with our diversity method is consistently higher than that of the BEM across different datasets and attack methods. In addition, under the same attack, the mAP of the two unattacked sub-models of the DEM is always higher than that of the corresponding sub-models of the BEM, which shows that diversity training reduces the transferability of adversarial samples between sub-models within the ensemble and demonstrates the effectiveness of the diversity training method.
The experiment in Table 12 is an anti-transferability test in a black-box scenario. The adversarial samples in this scenario are generated by attacking a model unrelated to both the DEM and the BEM, to verify whether the diversity training method remains effective in a black-box setting. The data in Table 12 show that all models maintain a good mAP when not under attack. After the adversarial attack, the mAP of the attacked model drops to nearly 0 and its detection capability is almost lost. These adversarial samples are then fed into the two ensemble models to test their anti-transferability. Under the influence of adversarial samples, the mAP of each ensemble model generally decreases; however, the mAP of the DEM is always higher than that of the BEM, at both the ensemble and sub-model level, which shows that in this black-box scenario the DEM is less affected by adversarial samples and exhibits stronger anti-transferability.
The data in Tables 11 and 12 show the effectiveness of DHR in ensemble object detection tasks.
In addition, to verify the defense effect of this model in the real world, we conducted an attack test on our ensemble model using the adversarial patch method. The attack targets the "human" class, with the goal of making the object detection model fail to detect the person carrying the patch; the experimental results are shown in Figure 9.
Figure 9. Illustration of the proposed ensemble adversarial defense against a physical adversarial attack on person detectors
Figure 9 illustrates the effectiveness of the ensemble model against a patch attack targeting humans in the physical world. The adversarial patch was generated from the attacked model shown in (a). The attacked model in (a) and the reference models in (b) and (c) jointly form the ensemble model. The results show that neither the attacked model nor reference model 1 in (b) detected the person holding the adversarial patch; however, reference model 2 in (c) still detected the person normally. This indicates that the adversarial patch generated from the attacked model in (a) successfully transferred to reference model 1 in (b), but failed to transfer to reference model 2 in (c).
As the organizers of the 6th "QiangWang" Mimic Defense International Elite Challenge, we deployed the ensemble object detection model based on the Dynamic Heterogeneous Redundancy (DHR) architecture, fusing the outputs of multiple sub-models into a final output through a specific decision-making mechanism. The model performed excellently during the competition, effectively resisting various adversarial and backdoor attacks and fully demonstrating the effectiveness of the DHR idea in enhancing the robustness of AI models in practical application scenarios.
Comparison of transferability among sub-models within the ensemble model based on the self-made Road Traffic Sign Dataset
As shown in Table 13, in the ensemble object detection system we attack only the first sub-model and feed the generated adversarial samples into the ensemble model to obtain the mAP of each sub-model. We observe that when a single model faces a white-box attack, its detection ability drops significantly; the robustness of a single model is seriously insufficient. However, the adversarial samples generated from the attacked sub-model transfer poorly to the other sub-models: although they may still reduce the mAP of the other sub-models to some extent, the attack effect is greatly weakened and the other sub-models retain good detection capabilities. Based on this finding, during the competition we deployed this ensemble model and disclosed only one sub-model to the contestants. Under multiple layers of defense, this strategy significantly reduces the impact of adversarial samples on the ensemble model.
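The decision-making mechanism that fuses the sub-model outputs can take many forms; one simple possibility, sketched below, clusters the boxes predicted by the sub-models by IoU and keeps only objects that enough sub-models agree on. This is an illustrative sketch, not the exact adjudication logic deployed in the competition system.

```python
import numpy as np

# Minimal sketch of majority-vote fusion of ensemble detections for one class.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def majority_vote_fusion(per_model_boxes, iou_thr=0.5, min_votes=2):
    """per_model_boxes: one list of boxes per sub-model, all for the same class."""
    clusters = []                                  # each cluster: list of (model_id, box)
    for m, boxes in enumerate(per_model_boxes):
        for box in boxes:
            for cluster in clusters:
                if iou(box, cluster[0][1]) >= iou_thr:
                    cluster.append((m, box))
                    break
            else:
                clusters.append([(m, box)])
    fused = []
    for cluster in clusters:
        if len({m for m, _ in cluster}) >= min_votes:      # enough sub-models agree
            fused.append(np.mean([b for _, b in cluster], axis=0))
    return fused

detections = [
    [np.array([10, 10, 50, 60])],                  # sub-model 1
    [np.array([12, 11, 52, 58])],                  # sub-model 2 (agrees)
    [],                                            # sub-model 3 (fooled by the attack)
]
print(majority_vote_fusion(detections))            # one fused box for the agreed object
```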
6. Conclusion
In this paper, we analyzed the current state of attack and defense in DL and found that the contest between them has settled into a Nash equilibrium, making it difficult to fully resolve security issues. To address this dilemma, we introduced the concept of ESS and reclassified existing security problems accordingly. We posit that this predicament arises from the difficulty of resolving ESS issues. To tackle this challenge, we emphasized the importance of necessary diversity and proposed employing DHR to address ESS problems. We conducted multiple diversity case studies across deep learning applications such as image classification, object detection, natural language processing, and graph neural networks. Through simple ensemble strategies such as majority voting and averaging, we validated the importance of introducing diversity, achieving significant defense effectiveness against adversarial attacks, backdoor attacks, poisoning attacks, and more. These experiments demonstrate that, by introducing the necessary diversity, deep-learning-based application systems can remain robust and mitigate various security threats even when individual models lack robustness.
This work aims to provide a new, viable approach to addressing AI security issues. By designing deep learning (DL) application systems with ESS properties at the system architecture level, we address the challenge of ensuring robustness of individual DL models from a novel perspective. Our preliminary validation suggests that this is a feasible new path: by introducing diversity, deep learning applications affected by security issues may become genuinely applicable in real life. Diversity is indeed at the core of the DHR architecture, but the architecture also encompasses elements such as dynamics, adjudication, and negative feedback adjustment, and verifying diversity is only one step in exploring its feasibility. Key focuses of subsequent research include how to construct diverse models, how to evaluate heterogeneity among models, and how to conduct adjudication. This article introduces a new approach to addressing security issues in deep learning and provides limited validation of its feasibility; however, further exploration is needed, especially as the rapid development of large language models (LLMs) brings new security challenges such as injection attacks and jailbreaking attacks. LLMs are built on deep learning and inherently share its security issues; whether the ESS method can be used to address the security issues of LLMs requires further exploration in the future.
Acknowledgments
We would like to extend our sincere gratitude to all those who have contributed to the completion of this research. We would also like to thank the administrative and technical staff at Purple Mountain Laboratory for their assistance and support throughout this study.
Funding
This work is supported by the National Key Research and Development Program of China (Project No. 2022YFB4500900) and the Jiangsu Provincial Department of Science and Technology (Project No. ZL042401).
Conflicts of interest
The authors declare no conflict of interest.
Data availability statement
No data are associated with this article.
Author contribution statement
Fan Zhang and Xi Chen made equal contributions and were co-first authors. They significantly contributed to the methodology design, experimental methods, data analysis, and initial draft writing. Wei Huang, the corresponding author, contributed to research design, methodology, data collection, and manuscript writing. Jiangxing Wu offered theoretical guidance and assisted in manuscript writing. Zijie Zhang, Chenyu Zhou, Jianpeng Li, and Ziwen Peng each contributed to providing experimental materials, data analysis, and writing the initial draft for various aspects of the study, including backdoor attacks and real-world object detection. Wei Guo provided methodology optimization suggestions and contributed to manuscript writing. Guangze Yang participated in background research and initial draft writing. Xinyuan Miao and Ruiyang Huang provided theoretical guidance and thoroughly reviewed the manuscript. Jiayu Du provided experimental hardware and data support. All authors have read and approved the final version of the manuscript.
References
- Grigorescu S, Trasnea B and Cocias T et al. A survey of deep learning techniques for autonomous driving. J Field Rob 2020; 37: 362–386 [CrossRef] [Google Scholar]
- Shvets AA, Rakhlin A and Kalinin AA et al. Automatic instrument segmentation in robot-assisted surgery using deep learning. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2018, 624–628. [Google Scholar]
- Athalye A, Carlini N and Wagner D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: International Conference on Machine Learning PMLR 2018: 274–283. [Google Scholar]
- Uesato J, O'donoghue B and Kohli P et al. Adversarial risk and the dangers of evaluating against weak attacks. In: International Conference on Machine Learning, PMLR 2018: 5025–5034. [Google Scholar]
- Gu T, Liu K and Dolan-Gavitt B et al. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access 2019; 7: 47230–47244. [CrossRef] [Google Scholar]
- Huang W, Zhao X and Huang X. Embedding and extraction of knowledge in tree ensemble classifiers. Mach Learn 2022; 111: 1925–1958. [CrossRef] [Google Scholar]
- Lu Y, Kamath G and Yu Y. Indiscriminate data poisoning attacks on neural networks, Transactions on Machine Learning Research https://openreview.net/forum?id=x4hmIsWu7e [Google Scholar]
- Shejwalkar V, Houmansadr A and Kairouz P et al. Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning. In: 2022 IEEE Symposium on Security and Privacy (SP), IEEE, 2022, 1354–1371. [Google Scholar]
- Jin G, Yi X and Huang W et al. Enhancing adversarial training with second-order statistics of weights. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 15273–15283. [Google Scholar]
- Qiu H, Zeng Y and Zheng Q et al. An efficient preprocessing-based approach to mitigate advanced adversarial attacks, IEEE Trans Comput 2021; 73: 645–655. [Google Scholar]
- Li Y, Lyu X and Koren N et al. Neural attention distillation: Erasing backdoor triggers from deep neural networks. In: International Conference on Learning Representations, 2020. [Google Scholar]
- Chan PP, He ZM and Li H et al. Data sanitization against adversarial label contamination based on data complexity. Int J Mach Learn Cybernetics 2018; 9: 1039–1052. [CrossRef] [Google Scholar]
- Spyridopoulos T, Karanikas G and Tryfonas T et al. A game theoretic defence framework against dos/ddos cyber attacks. Comput. Security 2013; 38: 39–50. [CrossRef] [Google Scholar]
- Pal A and Vidal R. A game theoretic analysis of additive adversarial attacks and defenses. Adv Neural Inf Proc Syst 2020; 33: 1345–1355 [Google Scholar]
- Wu J. Cyberspace endogenous safety and security. Engineering 2022; 8 7. [Google Scholar]
- Jiangxing WU. Development paradigms of cyberspace endogenous safety and security, Science China 2022; 005: 065. [Google Scholar]
- Szegedy C, Zaremba W and Sutskever I et al. Intriguing properties of neural networks, ArXiv preprint [arXiv:1312.6199]. [Google Scholar]
- Xi B. Adversarial Classification. [Google Scholar]
- Kurakin A, Goodfellow I, Bengio S, Adversarial machine learning at scale, ArXiv preprint [arXiv:1611.01236]. [Google Scholar]
- Szegedy C, Zaremba W and Sutskever I et al. Intriguing properties of neural networks 2014, ArXiv preprint [arXiv:1312.6199]. [Google Scholar]
- Goodfellow IJ, Shlens J and Szegedy C. Explaining and harnessing adversarial examples. In: ICML, 2015. [Google Scholar]
- Carlini N and Wagner D. Towards Evaluating the Robustness of Neural Networks, 2017. [Google Scholar]
- Madry A, Makelov A and Schmidt L et al. Towards deep learning models resistant to adversarial attacks, 2017 [Google Scholar]
- Papernot N, McDaniel P and Jha S et al. The limitations of deep learning in adversarial settings. In: Proceedings of the 1st IEEE European Symposium on Security and Privacy, IEEE, 2016, 372–387. [Google Scholar]
- Demontis A, Melis M and Pintor M et al. Why do adversarial attacks transfer? explaining transferability of evasion and poisoning attacks. In: Proceedings of the 28th USENIX Conference on Security Symposium, SEC'19, USENIX Association, USA, 2019, 321–338. [Google Scholar]
- Brendel W, Rauber J and Bethge M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models, 2018, ArXiv preprint [arXiv:1712.04248]. [Google Scholar]
- Croce F and Hein M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, 2020, ArXiv preprint [arXiv:2003.01690]. [Google Scholar]
- Gu T, Dolan-Gavitt B and Garg S. Badnets: Identifying vulnerabilities in the machine learning model supply chain, 2017, ArXiv preprint [arXiv:1708.06733]. [Google Scholar]
- Liu Y, Ma X and Bailey J et al. Reflection backdoor: A natural backdoor attack on deep neural networks. In: Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part X 16, Springer, 2020, 182–199. [Google Scholar]
- Nguyen A and Tran A. Wanet-imperceptible warping-based backdoor attack, 2021, ArXiv preprint [arXiv:2102.10369]. [Google Scholar]
- Li S, Xue M and Zhao BZH et al. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans Depend Sec Comput 2020; 18: 2088–2105. [Google Scholar]
- Turner A, Tsipras D and Madry A. Label-consistent backdoor attacks, 2019, ArXiv preprint [arXiv:1912.02771]. [Google Scholar]
- Wu J. Endogenous Safety and Security in Cyberspace: mimic Defense and Generalized Robust Control, 2020. [Google Scholar]
- Wu J. Cyberspace endogenous safety and security. Engineering 2022; 15: 179–185. [CrossRef] [Google Scholar]
- Hu H, Wu J and Wang Z et al. Mimic defense: a designed-in cybersecurity defense framework. IET Inf Secur 2018; 12: 226–237. [CrossRef] [Google Scholar]
- Wu J. Cyberspace mimic defense: Generalized robust control and endogenous security, Cyberspace Mimic Defense, 2020, https://api.semanticscholar.org/CorpusID:208520469 [CrossRef] [Google Scholar]
- Feng F, Zhou X and Li B et al. Modelling the mimic defence technology for multimedia cloud servers. Secur Commun Net 2020; 2020: 1–22. [CrossRef] [Google Scholar]
- Wei D, Xiao L and Shi L et al. Mimic web application security technology based on dhr architecture. In: International Conference on Artificial Intelligence and Intelligent Information Processing (AIIIP 2022), Vol. 12456, SPIE, 2022, 118–124. [Google Scholar]
- Kariyappa S and Qureshi MK, Improving adversarial robustness of ensembles with diversity training, 2019, ArXiv preprint [arXiv:1901.09981] [Google Scholar]
- Pang T, Xu K and Du C et al. Improving adversarial robustness via promoting ensemble diversity. In: International Conference on Machine Learning, PMLR, 2019, 4970–4979. [Google Scholar]
- Huang B, Ke Z and Wang Y et al. Adversarial defence by diversified simultaneous training of deep ensembles. In: Proceedings of the AAAI conference on artificial intelligence, Vol. 35, 2021, 7823–7831. [Google Scholar]
- Zhao X, Huang W and Huang X et al. Baylime: Bayesian local interpretable model-agnostic explanations. In: Uncertainty in artificial intelligence, PMLR, 2021, 887–896 [Google Scholar]
- Chen X, Huang W and Peng Z et al. Diversity supporting robustness: Enhancing adversarial robustness via differentiated ensemble predictions. Comput Secur 2024; 142: 103861. [CrossRef] [Google Scholar]
- Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell PAMI 1986; 8: 679–698 [CrossRef] [Google Scholar]
- Ojala T, Pietikäinen M and Harwood D. A comparative study of texture measures with classification based on feature distributions. Pattern Recog 1996; 29; 51–59 [CrossRef] [Google Scholar]
- Haralick RM, Shanmugam K and Dinstein I. Textural features for image classification. Stud Media Commun SMC 1973; 3 610–621. [Google Scholar]
- Tejankar A, Sanjabi M and Wang Q et al. Defending against patch-based backdoor attacks on self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 12239–12249. [Google Scholar]
- Selvaraju RR, Cogswell M and Das A et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, 2017, 618–626. [Google Scholar]
- Li Y, Lyu X and Koren N et al. Anti-backdoor learning: Training clean models on poisoned data. Adv Neural Inf Proc Syst 2021; 34: 14900–14912. [Google Scholar]
- Chen T, Kornblith S and Norouzi M et al. A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, 2020, 1597–1607. [Google Scholar]
- Grill JB, Strub F and Altché F et al. Bootstrap your own latent-a new approach to self-supervised learning. Adv Neural Inf Proc Syst 2020; 33 21271–21284 [Google Scholar]
- Chen X, Fan H and Girshick R et al. Improved baselines with momentum contrastive learning, 2020, ArXiv preprint [arXiv:2003.04297] [Google Scholar]
- Jia J, Liu Y and Gong NZ. Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning. In: 2022 IEEE Symposium on Security and Privacy (SP), IEEE, 2022, 2043–2059. [Google Scholar]
- Zheng R, Tang R and Li J et al. Data-free backdoor removal based on channel lipschitzness. In: European Conference on Computer Vision, Springer, 2022, 175–191. [Google Scholar]
- Saha A, Tejankar A and Koohpayegani SA et al. Backdoor attacks on self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 13337–13346. [Google Scholar]
- Li C, Pang R and Xi Z et al. An embarrassingly simple backdoor attack on self-supervised learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, 4367–4378. [Google Scholar]
- Wei J and Zou K. Eda: Easy data augmentation techniques for boosting performance on text classification tasks. In: Conference on Empirical Methods in Natural Language Processing, 2019, https://api.semanticscholar.org/CorpusID:59523656 [Google Scholar]
- Socher R, Perelygin A and Wu J et al. Recursive deep models for semantic compositionality over a sentiment treebank. In: Conference on Empirical Methods in Natural Language Processing, 2013, https://api.semanticscholar.org/CorpusID:990233 [Google Scholar]
- Zampieri M, Malmasi S and Nakov P et al. Predicting the type and target of offensive posts in social media. In: North American Chapter of the Association for Computational Linguistics, 2019, https://api.semanticscholar.org/CorpusID:67856299 [Google Scholar]
- Zhang X, Zhao JJ and LeCun Y. Character-level convolutional networks for text classification. In: Neural Information Processing Systems, 2015, https://api.semanticscholar.org/CorpusID:368182 [Google Scholar]
- Devlin J, Chang MW and Lee K et al. Bert: Pre-training of deep bidirectional transformers for language understanding. In: North American Chapter of the Association for Computational Linguistics, 2019. [Google Scholar]
- Kurita K, Michel P and Neubig G. Weight poisoning attacks on pretrained models, 2020, ArXiv preprint [arXiv:2004.06660], https://api.semanticscholar.org/CorpusID:215754328 [Google Scholar]
- Dai J, Chen C and Li Y. A backdoor attack against lstm-based text classification systems. IEEE Access 2019; 7: 138872–138878. https://api.semanticscholar.org/CorpusID:168170110 [CrossRef] [Google Scholar]
- Qi F, Li M and Chen Y et al. Hidden killer: Invisible textual backdoor attacks with syntactic trigger. In: Annual Meeting of the Association for Computational Linguistics, 2021, https://api.semanticscholar.org/CorpusID:235196099 [Google Scholar]
- Qi F, Chen Y and Zhang X et al. Mind the style of text! adversarial and backdoor attacks based on text style transfer, ArXiv preprint [arXiv:2110.07139], https://api.semanticscholar.org/CorpusID:238857078 [Google Scholar]
- Yang W, Lin Y and Li P et al. Rethinking stealthiness of backdoor attack against nlp models. In: Annual Meeting of the Association for Computational Linguistics, 2021, https://api.semanticscholar.org/CorpusID:236459933 [Google Scholar]
- Zhang Z, Wang J and Zhao L. Curriculum Learning for Graph Neural Networks: Which Edges Should We Learn First. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vol. 36, Curran Associates, Inc., 2023, 51113–51132. [Google Scholar]
- Zhu J, Jin J and Loveland D et al. How does Heterophily Impact the Robustness of Graph Neural Networks? Theoretical Connections and Practical Implications. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2022, 2637–2647. [Google Scholar]
- Xie B, Chang H and Zhang Z et al. Adversarially Robust Neural Architecture Search for Graph Neural Networks. In: Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, 8143–8152. [Google Scholar]
- Chen L, Li J and Peng Q et al. Understanding Structural Vulnerability in Graph Convolutional Networks. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021, 2249–2255. [Google Scholar]
- McCallum A, Nigam K and Rennie J et al. Automating the Construction of Internet Portals with Machine Learning. Inf Retr 2000; 3: 127–163. [CrossRef] [Google Scholar]
- Sen P, Namata G and Bilgic M et al. Collective Classification in Network Data. AI Magazine 2008; 29: 93–106. [CrossRef] [Google Scholar]
- Zügner D and Günnemann S. Adversarial Attacks on Graph Neural Networks via Meta Learning. In: Proceedings of the International Conference on Learning Representations (ICLR), 2019. [Google Scholar]
- Zügner D, Akbarnejad A, Günnemann S, Adversarial Attacks on Neural Networks for Graph Data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2018, 2847–2856. [Google Scholar]
- Kipf TN and Welling M, Semi-Supervised Classification with Graph Convolutional Networks. In: Proceedings of the International Conference on Learning Representations (ICLR), 2017. [Google Scholar]
- Wu H, Wang C and Tyshetskiy Y et al. Adversarial examples for graph data: Deep insights into attack and defense. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019, 4816–4823. [Google Scholar]
- Entezari N, Al-Sayouri SA and Darvishzadeh A et al. All you need is Low (rank): Defending against adversarial attacks on graphs. In: Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), 2020, 169–177. [Google Scholar]
- Jin W, Ma Y and Liu X et al. Graph Structure Learning for Robust Graph Neural Networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2020, 66–74. [Google Scholar]
- Qian R, Lai X and Li X. 3d object detection for autonomous driving: A survey. Pattern Recog 2022; 130: 108796 [CrossRef] [Google Scholar]
- Chen L. Multi-stage feature fusion object detection method for remote sensing image. Acta Electron Sin 2023; 51: 3520–3528 [Google Scholar]
- Peng Z, Chen X and Huang W et al. Shielding object detection: Enhancing adversarial defense through ensemble methods. In: 2024 5th Information Communication Technologies Conference (ICTC), 2024, 88–97. [Google Scholar]

Fan Zhang received his B.S. degree in computer and application, his M.S. degree in communication and information systems, and his Ph.D. degree in information and communication engineering, all from the National Digital Switching System Engineering and Technological Research and Development Center (NDSC), China, in 2003, 2006, and 2013, respectively. He is currently an Associate Researcher at NDSC. His research interests include proactive defense, high-performance computing, and big data processing.

Xi Chen is pursuing a Ph.D. degree in Cyberspace Security at PLA Information Engineering University, China. He obtained a bachelor's degree in Information Security from Central South University, China, in 2020. His research interests include network security, adversarial attacks, and ensemble models.

Wei Huang is currently a postdoctoral researcher at Purple Mountain Laboratories, China. He earned his PhD in Computer Science from the University of Liverpool, UK. He received his MSc degree from Imperial College London and his B.S. degree from Xiamen University. His primary research interests lie in trustworthy AI, with a particular emphasis on the robustness and backdoor issues in deep learning models.

Jiangxing Wu is an academician of the Chinese Academy of Engineering (CAE). He is the President, Professor, and Doctoral Supervisor of the China National Digital Switching System Engineering and Technological R&D Center (NDSC), China. His research interests include communication and information systems, computer and network technologies, and cyberspace security technology.

Zijie Zhang is a graduate student at Southeast University in China, majoring in Cyberspace Security, with a research focus on AI security.

Chenyu Zhou is currently pursuing the Ph.D. degree in the School of Cyber Science and Engineering at Southeast University, Nanjing, China. His research interests include artificial intelligence security, adversarial attack, and graph neural network.

Jianpeng Li is currently pursuing a master’s degree in Cyberspace Security at Zhengzhou University, having completed his undergraduate studies at Zhengzhou University. His current research focuses on the security of backdoor attacks in self-supervised learning.

Ziwen Peng is pursuing a master’s degree at the School of Cyberspace Security at Zhengzhou University, and previously obtained a bachelor’s degree from Shanghai Maritime University. His research direction is artificial intelligence security.

Wei Guo received the B.S. and M.S. degrees in computer science and technology from the National Digital Switching System Engineering and Technological Research and Development Center (NDSC), Zhengzhou, Henan, China, in 2012 and 2015, respectively. He received the Ph.D. degree in information and communication engineering from NDSC in 2019. He is currently a Research Associate Professor. His research interests include cyber security, distributed storage systems, and big data processing.

Guangze Yang is a graduate student at Southeast University in China, majoring in Cyberspace Security, with a research focus on AI security.

Xinyuan Miao is currently a postdoctoral researcher at Purple Mountain Laboratories, China. He earned his Ph.D. in Information and Communication Engineering from the Harbin Institute of Technology, China. He received his B.S. and M.S. degrees in Information and Communication Engineering from the Harbin Institute of Technology, in 2015 and 2017, respectively. His primary research interests lie in trustworthy AI, with a particular emphasis on the robustness and backdoor issues in deep learning models.

Ruiyang Huang is a Master’s Supervisor and Professor at the National Digital Switching System Engineering and Technological Research Center, China. He has long been engaged in theoretical research and engineering practice in the fields of big data analysis and intelligent transformation technology in cyberspace.

Jiayu Du is currently engaged in research on endogenous security at Purple Mountain Laboratories, China. He earned his master's degree in Communication Engineering from Dalian University of Technology. His research fields include digital image forensics, big data storage, cloud computing, and artificial intelligence.