MPHM: Model poisoning attacks on federated learning using historical information momentum



Introduction
With the rapid development of big data and artificial intelligence, the industry is increasingly concerned about data privacy. As a result, data, the "nutrition" of learning algorithms, is difficult to share fully [1,2]. For example, different banks, or e-commerce platforms and banks, can rarely share data fully due to security concerns. In industrial application scenarios, few enterprises are willing to share their data resources because of concerns about data privacy and security, and this has become a worldwide trend. Countries are also strengthening the protection of data security and privacy, as evidenced by the EU's implementation of the General Data Protection Regulation (GDPR) in 2018. As a result, the problem of "data islands" [3] has become serious. Even individual participants worry that the privacy risks of outsourcing local datasets to service providers may outweigh the benefits of convenient online services [4].
The emergence of federated learning (FL) [5] has attracted significant attention from both academia and industry. FL allows participants to conduct joint training without sharing their local data. In a federated learning framework, multiple participants train their data locally, and the central server iteratively updates the global model by collecting the parameters of the local model. Because private data does not leave the local device, FL is considered an innovative approach to protecting user data privacy [6]. FL has been applied in various technology areas involving security-sensitive information, such as edge computing [7,8], medical diagnosis [9,10], and autonomous driving technologies [11,12].
Despite the advantages of federated learning mentioned above, it still faces security threats, such as poisoning attacks [13][14][15]. There are several reasons for this. Firstly, in the federated learning framework, the cloud server does not have access to the participant's local data or training process, which means that malicious participants can upload incorrect model updates to corrupt the global model [16]. For example, an internal attacker can train a poisoned model with modified training data, effectively reducing the accuracy of the global model. Secondly, since the data of each participant may not be identically and independently distributed, the differences between the local updates generated by participants can be significant enough to make it difficult to detect malicious updates through anomaly detection [17].
Poisoning attacks on FL. The potential presence of dishonest participants in FL training makes FL vulnerable to poisoning attacks [18]. Attackers can compromise the global model of FL by uploading malicious updates [19]. The targets of poisoning attacks can be divided into two types, untargeted attacks [13,[19][20][21][22][23] that aim to reduce the accuracy of the global model on any test input, and targeted attacks [15,[24][25][26][27] that aim to reduce the utility of the global FL model on the attacker's selected inputs.
Our work. Currently, the trend in attacking FL is to replace normal local updates with malicious poisoning updates. Typically, attackers compute a benign reference aggregation from benign data samples they know, then compute a malicious perturbation vector, and finally craft their malicious model update by adding the perturbation to the benign reference aggregation so as to avoid detection by robust aggregation rules. Current research has focused on the scale of the perturbation vector when producing malicious updates, while the choice of the perturbation vector itself is often straightforward, such as the unit vector. This paper proposes a novel model poisoning attack on FL, called the momentum of historical information-based poisoning attack (MPHM). In this attack, the attacker gathers historical information from FL training, dynamically crafts malicious perturbations in each round, and uses them to build more covert malicious updates. By leveraging this information, the attacker makes the malicious updates harder to detect and mitigate, effectively bypassing FL defense mechanisms. The experimental results demonstrate that our attack can significantly reduce the accuracy of the FL global model compared with other advanced poisoning attacks. The contributions of this paper are summarized as follows:
• We study the effect of momentum accumulation of historical information on the production of malicious updates.
• We propose a new poisoning attack on FL, MPHM, which is dedicated to reducing the accuracy of the FL global model.
• Experiments show that our attack can effectively reduce the accuracy of the FL global model under robust aggregation rules on the CIFAR10 and FEMNIST datasets.
The rest of the paper is organized as follows. In Section 2, we present the background and related work on federated learning and poisoning attacks. In Section 3, we introduce the threat model, and we introduce our attack framework in Section 4. In Section 5, we give the experimental setup, and in Section 6, the results and discussion are given. Finally, we conclude the paper and present future work in Section 7.

Federated learning

In the FL [19] setting, we assume there are n clients that jointly train a global model. During FL training, each client receives the global model sent by the server, computes a stochastic gradient on its local dataset, and sends it back to the server. In detail, in the t-th round of FL, the server sends the latest global model θ^t to the clients; the k-th client then computes the stochastic gradient ∇_k^t = ∂L(θ^t, b)/∂θ^t on its local data, where L(θ^t, b) is the loss function and b is a sample. The client then sends ∇_k^t to the cloud server. The server aggregates these gradients to obtain the aggregated gradient ∇^t as follows:

∇^t = A(∇_1^t, ∇_2^t, . . . , ∇_n^t),

where A is the server's aggregation rule. The server then computes the global model θ^{t+1} with an optimizer, e.g., SGD, and broadcasts it to the selected clients for the next round of FL training. These steps are repeated until the global model converges.
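The round just described can be sketched in a few lines. Here `grad_fn` and `mean_agg` are toy stand-ins of ours for the local gradient ∂L(θ^t, b)/∂θ^t and the server's aggregation rule A; the names and the quadratic toy loss are our assumptions, not part of the paper's setup.

```python
import numpy as np

def fl_round(theta, client_batches, grad_fn, aggregate, lr=0.5):
    """One FL round: each client computes a stochastic gradient on the
    current global model theta; the server aggregates them with rule A
    and takes an SGD step."""
    grads = [grad_fn(theta, b) for b in client_batches]  # local computation
    agg = aggregate(np.stack(grads))                     # server: A(g_1..g_n)
    return theta - lr * agg                              # SGD update

# Toy quadratic loss L(theta, b) = 0.5 * ||theta - b||^2, so grad = theta - b.
grad_fn = lambda theta, b: theta - b
mean_agg = lambda g: g.mean(axis=0)      # plain mean aggregation

theta = np.zeros(3)
batches = [np.ones(3), 2 * np.ones(3)]   # two clients' local samples
theta = fl_round(theta, batches, grad_fn, mean_agg)
```

Swapping `mean_agg` for a robust rule (Krum, Trimmed-mean, etc., introduced below) changes only the `aggregate` argument.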

Several popular aggregation rules in FL
Google proposed a federated average aggregation algorithm [5], however, researchers [28] have shown that non-robust aggregation algorithms can lead to the manipulation of the global model at will even if there is only one malicious client. Therefore, multiple Byzantine-robust aggregation algorithms [18,21,28,29] have been proposed to combat poisoning attacks. Next, we will introduce four common Byzantine-robust aggregation rules.
Krum. The Krum [28] algorithm is based on the intuition that malicious gradients lie far away from benign gradients. In an FL system with n clients, suppose there are m malicious clients. Krum computes, for each client, the sum of squared Euclidean distances to the n − m − 2 clients closest to it, and then chooses the gradient of the client with the smallest sum as the gradient of the global model.
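A minimal NumPy sketch of this selection step (the function name and unoptimized pairwise-distance loop are ours):

```python
import numpy as np

def krum(grads, m):
    """Krum: return the gradient whose summed squared distance to its
    n - m - 2 nearest neighbours is smallest (n clients, m malicious)."""
    n = len(grads)
    # pairwise squared Euclidean distances
    d2 = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = []
    for i in range(n):
        # distances to all others, keep the n - m - 2 closest
        nearest = np.sort(np.delete(d2[i], i))[: n - m - 2]
        scores.append(nearest.sum())
    return grads[int(np.argmin(scores))]
```

With five clients, one of them an outlier, Krum returns one of the tightly clustered benign gradients rather than the outlier.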
Trimmed-mean. Trimmed-mean [18,29] is a dimension-level aggregation method, which aggregates each dimension of the input gradients separately. For a given dimension j, Trimmed-mean sorts the j-th dimensional gradients of all clients, i.e., sorts ∇_{1j}, ∇_{2j}, . . . , ∇_{nj}, where ∇_{ij} is the parameter of the j-th dimension of the i-th client. Then the β largest and β smallest values are removed, and the remaining n − 2β values are averaged as the value of the j-th dimension, where β is a specified value, e.g., β = m. This procedure is carried out for each dimension.
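The per-dimension trimming can be expressed with a single sort (a sketch of ours; `beta` corresponds to β above):

```python
import numpy as np

def trimmed_mean(grads, beta):
    """Coordinate-wise trimmed mean: per dimension, drop the beta largest
    and beta smallest values and average the remaining n - 2*beta."""
    s = np.sort(np.stack(grads), axis=0)        # sort each dimension
    return s[beta : len(grads) - beta].mean(axis=0)
```

An extreme value contributed by one client (e.g., 100 below) is discarded in every dimension before averaging.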
Bulyan. The Euclidean distance between different clients may be dominated by a single-dimensional parameter, which prevents Krum from aggregating the model well [21]. Mhamdi et al. [21] therefore proposed Bulyan, which can be seen as a combination of variants of Krum and Trimmed-mean. Specifically, Bulyan first iteratively uses Krum to select the parameters of κ (κ ≤ n − 2m) clients and then uses a variant of Trimmed-mean to aggregate those κ clients' parameters.
Median. Median [18,29] is also a dimension-level aggregation algorithm, which aggregates each dimension of the input gradients separately. As the name suggests, for a given dimension j, Median sorts all clients' j-th dimension parameters and selects their median value as the j-th dimension parameter; each dimension makes such a selection.
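In contrast to Trimmed-mean, Median needs no β parameter; per dimension it reduces to NumPy's median (a one-line sketch of ours):

```python
import numpy as np

def coordinate_median(grads):
    """Coordinate-wise median: per dimension, the median across clients."""
    return np.median(np.stack(grads), axis=0)
```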

Poisoning attack on federated learning
Due to the influence of potentially dishonest clients, studies [30][31][32] have shown that FL is vulnerable to poisoning attacks. Poisoning attacks can be divided into two categories according to the attacker's target: untargeted poisoning attacks [13,20,22,23] and targeted poisoning attacks [15,25,26,33]. In untargeted poisoning attacks, the attacker aims to reduce the test accuracy of the global model. In targeted poisoning attacks, the attacker makes the global model output low accuracy for specific inputs while maintaining high accuracy for other inputs.
According to the attacker's capability, poisoning attacks can be divided into two categories: data poisoning attacks [31,[34][35][36] and model poisoning attacks [13,22,23,37]. In data poisoning attacks, the attacker cannot directly manipulate the client parameters uploaded to the server, but can only modify them indirectly by crafting malicious local datasets. In model poisoning attacks, by contrast, the attacker can directly manipulate the client parameters uploaded to the server. In this paper, we focus on untargeted model poisoning attacks on FL.
Currently, model poisoning attacks in FL are commonly performed by attackers who compute benign parameters on the local clients, add a perturbation vector, and upload the resulting malicious parameters to the server to poison the global model. Baruch et al. [20] proposed compromising the global model by adding tiny perturbations to the local updates. Fang et al. [13] propose an optimization objective for adding a perturbation vector to the local updates to craft malicious updates. Shejwalkar et al. [19] optimize the scale of the perturbation vector in their approach; later, they [22] argue that assuming an excessive number of compromised clients is not reasonable in this setting. Recently, Cao et al. [23] proposed a new approach to model poisoning in FL that injects fake clients to poison the model, effectively mitigating the problem of requiring an excessive number of compromised clients.

Threat model

Attacker's goal
In this paper, the attacker's goal is to reduce the test accuracy of the global model on all inputs by crafting malicious gradients, i.e., an untargeted model poisoning attack.

Attacker's capability
In an FL training framework with n clients, we assume that the attacker controls m clients. We call the controlled clients malicious clients and the uncontrolled clients benign clients. The number of malicious clients is smaller than the number of benign clients. The attacker can modify the gradients of the malicious clients at will, but cannot control the benign clients. The attacker can also control the communication between the malicious clients.

Attacker's knowledge
The attacker's background knowledge can be described in two dimensions: the aggregation rule and the gradients of benign clients.
Aggregation rule. The attacker's background knowledge can be divided into two categories based on whether the server's aggregation rule is known. In FL training, the server can choose whether to make its aggregation rule public. Exposing the aggregation rule increases the transparency of FL, but may increase the corresponding risk; for example, an attacker can mount poisoning attacks tailored to the aggregation rule. The Fang attacks [13] assume knowledge of the server's aggregation rule.
Benign gradients. The attacker's background knowledge can also be divided into two cases based on whether the gradients of the benign clients are known. Knowing the gradients of all clients is strong background knowledge: with it, the attacker can make the crafted malicious gradients more stealthy. The LIE attacks [20] assume that the gradients of the benign clients are unknown.
An attacker who knows both the aggregation rule and the benign gradients is clearly the strongest adversary against FL, but in practice such conditions are rarely met. Our attack therefore does not require them; we focus on the weakest adversary, i.e., an attacker who knows neither the aggregation rule nor the benign gradients.

Framework
The overall framework of our method is shown in Figure 1. In step 1 of round t of FL training, the server sends the current global model θ^t to each client. In step 2, the benign clients compute the stochastic gradients ∇^t based on the model and upload them to the server, while the malicious clients, after computing their gradients, communicate with each other, compute the malicious gradients for this round with our MPHM, and upload them to the server. In step 3, the server aggregates these gradients and computes the new global model θ^{t+1}, which replaces θ^t for the next round of FL training.

Figure 1. Overall framework of our MPHM. In step 1, the server sends the global model θ^t to each client; in step 2, the benign clients compute and upload their stochastic gradients, while the malicious clients craft malicious gradients with our MPHM and upload them; in step 3, the server aggregates these gradients and computes the new global model θ^{t+1} for the next round.

MPHM
In this section, we introduce our optimization objective and then our attack method.
The objective of the attacker is to have the malicious clients deliver malicious gradients that evade detection in each round of FL training. In epoch t, the attacker produces a malicious gradient denoted ∇'^t, while the average of the benign gradients is denoted ∇_b. The malicious gradient is computed as

∇'^t = ∇_b − λ∇_p^t,

where ∇_p^t is the perturbation vector in epoch t and λ is the perturbation coefficient. To make the malicious gradient effective, similar to [13], we propose the optimization objective:

max_λ ‖A_avg(∇^t_{i∈[n]}) − A(∇'^t_{i∈[m]}, ∇^t_{i∈[m+1,n]})‖,

where ∇^t_{i∈[n]} are the benign gradients known by the attacker in round t, ∇'^t_{i∈[m]} are the malicious gradients crafted by the attacker in round t, ‖·‖ is the l2 norm, A_avg is mean aggregation, and A is the server's aggregation rule.
To achieve this objective, the proposed approach is a poisoning attack based on historical information momentum (MPHM), which aims to make malicious updates more difficult to detect. To elaborate on MPHM, we first analyze the perturbation gradient ∇_p^t. Previous work [19] selected three intuition-based perturbation vectors, namely the sign vector, the unit vector, and the standard deviation vector. The sign vector is ∇_p^t = sign(∇_b), the unit vector is ∇_p^t = ∇_b/‖∇_b‖, and the standard deviation vector is ∇_p^t = std(∇_{i∈[n]}), where sign() is the sign function and std() is the standard deviation function. These perturbation vectors are based solely on information from the current training round. In contrast, in the MPHM attack, the attacker draws on previous training information when crafting the perturbation vectors. Specifically, the attacker uses gradient information from the previous training rounds and accumulates this information as momentum in the newly crafted perturbation vectors, making them more stealthy. Thus, we propose a new method for calculating the perturbation vector:

∇_p^t = ∇_b/‖∇_b‖ + α∇_p^{t−1},

where ∇_b is the mean of the gradients known to the attacker, ∇_p^{t−1} is the perturbation vector in epoch t − 1, and α is the decay factor.
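Under our reading of the formulas above (unit vector of ∇_b plus a momentum term, and a malicious gradient of the form ∇_b − λ∇_p^t), the per-round computation can be sketched as follows; the function names and the zero initialization of the first perturbation are our assumptions:

```python
import numpy as np

def mphm_perturbation(grad_b, prev_p, alpha=0.5):
    """MPHM perturbation: unit vector of the mean benign gradient plus
    alpha times the previous round's perturbation (historical momentum)."""
    return grad_b / np.linalg.norm(grad_b) + alpha * prev_p

def malicious_gradient(grad_b, p, lam):
    """Malicious update: benign reference minus the scaled perturbation."""
    return grad_b - lam * p

# round t = 0: no history yet, so the perturbation reduces to the unit vector
grad_b = np.array([3.0, 4.0])
p = mphm_perturbation(grad_b, prev_p=np.zeros(2))
mal = malicious_gradient(grad_b, p, lam=1.0)
```

In subsequent rounds, `prev_p` is simply the `p` computed in the previous round, which is how the historical information accumulates.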
To ensure that the malicious gradients bypass the aggregation rules, the magnitude of the added perturbation must be regulated. In our approach, we search for the optimal perturbation coefficient λ within a predefined range, following the experience of [22]. Additionally, a higher standard deviation allows a larger perturbation to be introduced, so we use the deflating scale λ = σ.

Experimental setup

Experimental environment
The experiments in this study were conducted on a server equipped with an Intel Xeon Silver 4210 CPU, 64 GB of RAM, and an NVIDIA Tesla T4 GPU with 16 GB of memory, running the Ubuntu 20.04 server operating system. The FL experiments were implemented using the PyTorch framework.

Datasets and model architectures
The validation of the proposed attack is conducted on two visual domain datasets, i.e., CIFAR10 [38], and FEMNIST [39,40].

FL and attack settings
For the CIFAR10 dataset, we use the AlexNet architecture for training. The optimizer is Adam, the batch size is 64, the number of training rounds is 1000, and the learning rate in round t is 0.001 × 0.998^t. For the FEMNIST dataset, we train a CNN architecture. The optimizer is SGD, the number of training rounds is 1200, each batch uses a client's entire local data, and the learning rate in round t is 0.2 × 0.998^t. By default, the ratio of malicious clients is set to 20%, i.e., m/n = 0.2, and the percentage of malicious clients is fixed in each round of FL training. In the attack setup, the attacker knows neither benign updates nor the aggregation rule, except for the Fang attack: by design, Fang needs to know the aggregation rule, which is a stronger adversarial setting than the other attacks. In addition, the factor α in our attack is 0.5 unless otherwise specified.

Baseline attacks
LIE. The little is enough (LIE) [20] attack, as its name implies, jeopardizes FL training by adding small amounts of noise to each dimension of the benign gradients. Specifically, the attacker calculates the mean µ and standard deviation σ of the benign gradients it possesses, computes the coefficient z from the numbers of malicious and benign clients, and finally computes the malicious clients' update as µ + zσ.
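The LIE update is a one-liner once z is fixed; in this sketch of ours, z is passed in directly rather than derived from the client counts as in the original paper:

```python
import numpy as np

def lie_attack(benign_grads, z):
    """LIE: malicious update = per-dimension mean + z * std of the
    benign gradients the attacker holds."""
    g = np.stack(benign_grads)
    return g.mean(axis=0) + z * g.std(axis=0)
```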
Fang. Fang et al. [13] propose a generic framework for FL poisoning attacks. It computes the mean µ of the benign gradients and then a perturbation vector ∇_p. Denoting the benign parameter as ∇_b, the final poisoning update of the malicious clients, ∇_M = ∇_b − λ∇_p, is derived by solving for the coefficient λ.
Min-Max. Shejwalkar et al. [19] propose a generic framework for FL poisoning attacks. As in Fang [13], the update of the malicious client in Min-Max is ∇_M = ∇_b − λ∇_p. It solves for a more appropriate coefficient λ under the constraint that the maximum distance between the malicious gradient and any other gradient is upper-bounded by the maximum distance between any two benign gradients.
Min-Sum. Min-Sum is another method in [19], which solves for the coefficient λ under the constraint that the sum of squared distances between the malicious gradient and all benign gradients is upper-bounded by the sum of squared distances between any benign gradient and the other benign gradients. The specific details are in [19].
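The Min-Max style search for λ can be sketched as a simple bisection (a sketch of ours under the stated constraint; the tolerance, growth cap, and function name are our choices, not from [19]):

```python
import numpy as np

def min_max_lambda(benign, perturb, lam_init=10.0, tol=1e-3):
    """Binary-search the largest lambda such that the malicious gradient
    mean(benign) - lambda * perturb is no farther from every benign
    gradient than the maximum benign-to-benign distance (Min-Max idea)."""
    b = np.stack(benign)
    ref = b.mean(axis=0)
    max_benign = max(np.linalg.norm(x - y) for x in b for y in b)

    def ok(lam):  # does lambda satisfy the Min-Max constraint?
        mal = ref - lam * perturb
        return max(np.linalg.norm(mal - x) for x in b) <= max_benign

    lo, hi = 0.0, lam_init
    while ok(hi):          # grow hi until the constraint is violated
        hi *= 2
        if hi > 1e6:
            break
    while hi - lo > tol:   # bisect to the boundary
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo
```

The same skeleton serves Min-Sum by replacing the `ok` predicate with the sum-of-squared-distances constraint.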

Evaluation metric
The untargeted poisoning attack is designed to decrease the test accuracy of the global model, and the effectiveness of the attack is evaluated using the test accuracy loss δ as the metric. Specifically, P denotes the test accuracy of the global model without any attack, while P′ denotes the test accuracy of the global model under attack. The evaluation metric is therefore defined as δ = P − P′.
Results and discussion

Impact of attacks on robust aggregation rules

In this section, the impact of the proposed attack on robust FL training is explored in comparison with the baseline attacks. The training process of FL with various robust aggregation rules under multiple attacks is presented in Figure 2, and the impact of different attacks on robust FL is summarized in Table 1. From Figure 2, it can be observed that both MPHM and the baseline attacks impact the robust aggregation rules. Specifically, on the CIFAR10 dataset, MPHM shows significant effectiveness against the Bulyan, Median, and Trimmed-mean aggregation rules, outperforming the baseline attacks. However, under the Krum aggregation rule, MPHM is less effective than Min-Max and LIE. We hypothesize that this may be due to the excessive scaling factor added by MPHM, which may interfere with Krum's selection of a client update as the global model update. As the curves in the figure show, MPHM makes it difficult for the global model to converge under the Bulyan, Median, and Trimmed-mean aggregation rules. These results demonstrate the superiority of MPHM in poisoning federated learning.
On the FEMNIST dataset, the MPHM attack has a significant impact on FL training with robust aggregation rules. Similar to CIFAR10, MPHM outperforms baseline attacks for FL training using Bulyan, Median, and Trimmed-mean aggregation rules, while being less effective than LIE and Min-Sum for FL training using Krum aggregation rules. The MPHM attack is particularly effective for Bulyan and Median aggregation rules, making it more difficult for the global model to converge.
From Table 1, it can be observed that using the classical defense, the MPHM attack can significantly reduce the accuracy of the global model. On the CIFAR10 dataset, the MPHM attack on Bulyan reduces the global accuracy by 24%, while the attack on Trimmed-mean reduces the global accuracy by 17%. On the FEMNIST dataset, the MPHM attack on Bulyan reduces the global accuracy by 28%, and the attack on Median reduces the global accuracy by 20%.
Impact of the proportion of malicious clients on FL

Figure 3 illustrates the impact of attacks on FL training with different percentages of malicious clients.
The percentage of malicious clients varies from 5% to 25%. On the CIFAR10 dataset, the effectiveness of the MPHM attack increases with the percentage of malicious clients, and under the Bulyan, Median, and Trimmed-mean aggregation rules, the MPHM attack outperforms the other attacks. On the FEMNIST dataset, the effect of the MPHM attack also increases with the proportion of malicious clients, except under the Krum aggregation rule; under the Bulyan, Median, and Trimmed-mean aggregation rules, the MPHM attack is superior to the other attacks. In addition, the figure shows that the MPHM attack remains effective against every aggregation algorithm even with a small proportion of malicious clients, while some of the baseline attacks are effective only with a large proportion of malicious clients. This indicates that the MPHM attack is better concealed.

Impact of the decay factor α
In this section, the impact of the decay factor on the MPHM attack is demonstrated. The results are shown in Figure 4, where the effect of different decay factors on the accuracy of the global model is presented.
Our perturbation vector is ∇_p^t = ∇_b/‖∇_b‖ + α∇_p^{t−1}. Noting that it degenerates to the unit perturbation vector when α = 0, we use α = 0 as the baseline for comparing the effectiveness of our attacks.
On CIFAR10, when α = 0, the attack effect is significantly smaller than with the other α values. As can be seen from Figure 4a, the attack effect increases with α under Median and Trimmed-mean; α = 0.5 is most effective under the Krum aggregation rule and α = 1 is most effective under the Bulyan aggregation rule. Overall, our attacks become more effective against FL as α increases, and at α = 0.5 the attacks are significantly more effective under all four aggregation rules than at α = 0.
On FEMNIST, the attack effect increases and then decreases with increasing α. As can be seen from Figure 4b, with Krum and Bulyan, the attack effect increases and then decreases with increasing α and maximizes around α = 0.5. At α = 1, it is most effective for Median and Trimmed-mean aggregation rules. At α = 0.5, the attack effect is significantly better for all four aggregation rules than at α = 0. It can be seen that the momentum accumulation of historical information can effectively assist malicious updates to escape detection by robust aggregation rules. In addition, taking into account the individual datasets and aggregation rules, we take the default value of 0.5 for α.

Discussion
The experimental results demonstrate that the proposed MPHM can effectively degrade the accuracy of the FL global model. While Fang et al. [13] proposed an optimization target for model poisoning attacks, their method requires more prior knowledge, as the attacker needs to know the aggregation algorithm used by the FL architecture. Baruch et al. [20] proposed a simple and effective model poisoning attack, but its effectiveness is strongly influenced by the proportion of malicious clients. Shejwalkar et al. [19] proposed several optimization approaches for different prior-knowledge settings, but their study lacks a thorough investigation of the direction of the perturbation added to the local updates. This paper proposes a new way of computing perturbations, and the experimental results show that elaborate malicious perturbations can make the attackers' malicious updates more covert. However, due to device limitations, the effectiveness of the proposed attack still needs verification on large datasets and large-scale federated learning frameworks. The untargeted model poisoning attack is still in the early stages of research, and we hope more researchers will join the study of attack and defense methods in FL.

Conclusion
In this work, we propose a new model poisoning attack on FL based on historical information momentum (MPHM). We use a setup in which the attacker knows minimal information, and experiments show that our attack is effective against classical defenses compared with other advanced attacks. Our approach focuses on the generation of perturbations, where we have found that carefully crafted malicious perturbations can enhance the stealthiness of the attacker's updates. We believe there is significant research value in this area and will continue to focus on the generation of perturbation vectors in future work.

and embellished the manuscript. Qingxian Wang and Yuan Zhou discussed the effectiveness of the method and corrected the typos. Yufei Gao designed the overall structure of the paper.