Eﬃcient poisoning attacks and defenses for unlabeled data in DDoS prediction of intelligent transportation systems

Nowadays, large numbers of smart sensors ( e.g. , road-side cameras) which communicate with nearby base stations could launch distributed denial of services (DDoS) attack storms in intelligent transportation systems. DDoS attacks disable the services provided by base stations. Thus in this paper, considering the uneven communication traﬃc ﬂows and privacy preserving, we give a hidden Markov model-based prediction model by utilizing the multi-step characteristic of DDoS with a federated learning framework to predict whether DDoS attacks will happen on base stations in the future. However, in the federated learning, we need to consider the problem of poisoning attacks due to malicious participants. The poisoning attacks will lead to the intelligent transportation systems paralysis without security protection. Traditional poisoning attacks mainly apply to the classiﬁcation model with labeled data.Inthispaper,weproposeareinforcementlearning-basedpoisoningmethodspeciﬁcallyfor poisoningthepredictionmodelwithunlabeleddata.Besides,previousrelateddefensestrate-giesrelyonvalidationdatasetswithlabeleddataintheserver.However,itisunrealisticsince thelocaltrainingdatasetsarenotuploadedtotheserverduetoprivacypreserving,andour datasetsarealsounlabeled.Furthermore,wegiveavalidationdataset-freedefensestrategy basedonDempster–Shafer(D–S)evidencetheoryavoidinganomalyaggregationtoobtaina robustglobalmodelforpreciseDDoSprediction.Inourexperiments,wesimulate3000points incombinationwithDARPA2000datasettocarryoutevaluations.Theresultsindicatethat our poisoning method can successfully poison the global prediction model with unlabeled data in a short time. Meanwhile, we compare our proposed defense algorithm with three popularly used defense algorithms. The results show that our defense method has a high accuracy rate of excluding poisoners and can obtain a high attack prediction probability.

captured in the early stage, we can take actions in advance. This is what the defense system needs. Thus, researchers had proposed some machine learning (ML) based and correlation-based techniques to predict the probabilities of DDoS attacks, such as Bayesian network-based method [3], attack graph-based method [4] and hidden Markov model-based (HMM) method [5]. Although these methods have achieved good results by analyzing data packets, there still exist the following problems: • First, the majority of such ML-based or correlation-based techniques are centralized methods, meaning that the BSs need to gather data from different areas and send them to the central cloud server. The requirement of large transmissions to the cloud server and privacy preservation awareness [6] propose challenges to the defense system if we directly apply previous methods to base stations in our city. A smart defense/attack prediction system on each road-side infrastructure and response with short delay are required. • Second, even if an attack prediction system is put in the road-side base station, the BS has no feasibility of data aggregation in some areas with sparse communication flows due to little training data. Then, the model cannot be trained effectively, which further leads to the fact that the system would not predict DDoS attacks on time.
To solve the aforementioned problems, the federated learning (FL) occurs on us firstly. Therefore, based on the above previous studies, our paper adopts the latest and excellent hidden Markov modelbased multi-step DDoS prediction model [5] combining FL, a type of decentralized ML paradigm. It can alleviate the pressure on BSs, and realize a secure and effective prediction system for determining whether there will be DDoS attacks on base stations. In FL, each BS can be seen as a participant. The participants can train their models (HMM) based on the local dataset and do not need to exchange the original privacy data with others. These participants upload their local model parameters to the cloud server, where the parameters will be aggregated to form a global model. The cloud server then distributes the global model to the participants and both sides repeat the above processes until the model converges.
Though the FL framework takes advantage of the collaborative training and avoids the leakage of sensitive privacy data, there still exists the vulnerability on data reliability. During a federated learning process, even we assure that the uploaded parameters are protected by various cryptography-based tamper-proofing technologies, we cannot guarantee all participants are trustful. Participants may upload interference parameters/models to the server due to vicious intentions, i.e., launching poisoning attacks. If one malicious participant uploads its interference parameter to the server continuously, the global model will be poisoned and cannot converge [7,8]. Then, the protection of the whole network will be reduced due to the poisoning attacks. Therefore, the uploaded parameters should be guaranteed to be valid.
Nowadays, there are many studies about the poisoning attacks [9][10][11][12][13] and defense strategies [14][15][16][17][18][19]. With respect to poisoning, the latest poisoning attacks based on GAN mainly point to the classifier/classification model with labeled data. Its principle is to generate data through a discriminator and a generator, and then to assign wrong labels to these data. This also means that the poisoner knows how to process the data to have a negative effect on the model. However, the poisoning attacks based on GAN cannot poison the prediction model that has no labeled data, since the GAN poisoner cannot know how to process data and cannot poison it by flipping the labels. Besides, among various prediction models, the HMM-based prediction model is a widely used, popular and excellent prediction model for multi-step DDoS prediction [5]. Thus, it is very meaningful to focus on the poisoning attack for unlabeled data in the HMM-based prediction model in this paper. 1 With respect to the defense, many studies about secure aggregation must have an assumption that the server has a validation dataset with labeled data to judge whether the uploaded local model is poisoned. However, in practice, this assumption is unrealistic since the local training datasets are not uploaded to the server due to privacy preserving, and our datasets are also unlabeled.
Thus, this is the first paper that considers the poisoning attacks on an unlabeled prediction model, while taking the validation dataset-free as a condition to design a defense strategy for multi-step DDoS prediction in ITS under a federated learning framework. The main contributions of our paper are listed as follows: • A reinforcement learning-based poisoning algorithm is proposed, which can not only poison the prediction model with unlabeled data successfully, but also can use accumulated experiences to speed up the poisoning speed continuously. • A validation dataset-free defense strategy based on D-S (Dempster-Shafer) evidence theory is given with avoiding anomaly aggregation. Three different types of evidences are designed to guarantee the robustness of the aggregated model under limited knowledge owned by the global server. • Through extensive experiments, the results show that our reinforcement learning-based poisoning method can successfully poison the global prediction model with unlabeled data in a short time. Besides, we compare our defense algorithm with Krum, Trimmed Mean and Median three algorithms.
The results show that our defense method can achieve the best performance on defending poisoning attacks and guarantee the attack prediction probability of global model above 0.96.
The paper is organized as follows. The related studies are reviewed in Section 2. System architecture and some preliminaries are given in Section 3. Then, a reinforcement learning-based poisoning algorithm for the HMM model is proposed in Section 4. Next, a validation dataset-free defense strategy based on D-S evidence theory is designed to guarantee the reliability of the aggregated global model in Section 5. Finally, we give experiments and analysis in Section 6 and conclude the paper in Section 7.

Related work
The previous studies related to the subject of our paper involve three aspects. The first is related to the methods of multi-step DDoS attack prediction (Section 2.1). The second is related to the poisoning attacks in FL (Section 2.2). The third is related to the secure aggregation and defense approaches to the poisoning attacks in FL (Section 2.3).

Prediction methods of multi-step DDoS attacks
There are many related studies about DDoS detection by utilizing supervised or unsupervised machine learning technologies. Some reviews [20,21] can be referred. Here we do not repeat them. Recently, with respect to the research of multi-step DDoS prediction, there are also some classic studies about DDoS prediction. For example, Kavousi and Akbari [3] presented a new algorithm to mine attack behavior patterns and generated correlation rules from a large number of intrusion alerts without prior knowledge through using a Bayesian causality mechanism. Wang et al. [4] proposed an approach to generate a global attack graph based on a heuristic searching strategy which would improve the attack search process. Holgado et al. [5] proposed a method based on a hidden Markov model (HMM), in which intrusion alerts were used to predict the multi-step DDoS attacks. Though all above studies have high prediction accuracy of multi-step DDoS attacks, they still cannot avoid the problems of tremendous data transmission and privacy leakage brought about by the central training.

Poisoning attacks in FL
FL is vulnerable to the poisoning attacks since it is difficult to ensure the security of each participant. The poisoning attacks in FL can be categorized into two types: data poisoning and model poisoning [9]. The former can finally generate poisoned model updates through interfering the training data. The latter is to directly launch model poisoning by manipulating benign model updates. With respect to the research of data poisoning, for example, Xie et al. [10] and Fung et al. [11] described the poisoning attacks that flipped the labels of data, which would make the local training result inaccurate and further generate poisoned model updates. This finally leads to poor performance of the global model. With respect to the research of model poisoning, for example, Fang et al. [12] proposed a method to reduce the reliability of local models. The author made a process of crafting a local model as an optimization problem and then minimized the loss function in iterations to launch a poisoning attack.
Though different studies can launch poisoning attacks in FL, they usually assume that the attacker (the distrustful participant) knows the distribution of other local participants' datasets. It is impractical for attackers to observe the other local participants' datasets directly in FL because each participant trains its model secretly and only updates the model parameters. With later the generative adversarial network (GAN) appearing, Zhang et al. [13] utilized GAN to generate the same distribution samples as the other participants through the global model without knowing the prior knowledge of training dataset. Then the attacker assigns the incorrect labels to the generated samples and finally uploads its poisoning parameters for aggregation. As described in Section 1, the previous poisoning attacks mainly apply to the classifier/classification model with labeled data. It cannot poison our HMM prediction model without labeled data. In this paper, we propose a data poisoning method through a learning way to realize the poisoning for the prediction model.

Secure aggregation and defense approaches
The approaches against poisoning attacks can be traditionally classified into two types [14]. One is robust aggregation. The server side aggregates the local models according to a secure strategy instead of using an average-based aggregation method [15]. For example, Blanchard et al. [16] proposed a defense approach named Krum. This research selects the most representative parameters to update the global model. Chen et al. [17] aggregated the global model with a median value. Another solution is anomaly detection. This type of approach is based on similarity comparison among all the updated local models and then excludes some local models with great differences.
Nowadays, as more and more poisoning attacks appear, there are many new research works concentrating on how to build a robust FL framework. They do not strictly divide the approaches into the above two types. For example, Tan et al. [14] proposed a verify-before-aggregate (VBA) method that removed the model if it could not achieve a good result after verifying. However, it may lead to some reliable clients taken away because the training dataset is insufficient and the local model cannot converge in time. Kang et al. [18] described a concept of reputation in FL. It can distinguish the non-benign participants and select reliable ones to aggregate. However, when there is a large number of clients, it will have huge computational costs. What's more, it is difficult to determine the reputation threshold. Fang et al. [12] proposed two defense methods based on the error rate and the loss function to reject the suspicious update parameters by verifying them using a validation dataset. Taheri et al. [19] trained a GAN network through using a validation dataset to obtain an anomaly detector and set an anomaly threshold. If the anomaly value of a participant is higher than the threshold, the local model in this participant would be excluded in aggregation. However in fact, a validation dataset does not always exist in the server side, especially considering the privacy preserving. Unlike the aforementioned approaches, we propose in this paper a defense method by utilizing D-S evidence theory to aggregate a robust global model.

System architecture and preliminaries
As stated in Section 1, an anti-poisoning FL strategy for multi-step DDoS prediction on base stations in intelligent transportation systems will be proposed in our paper. Thus, first we give a system architecture of our proposed FL-based multi-step DDoS prediction (Section 3.1). Under this architecture, we also show how the poisoning attack can impact the DDoS prediction preliminarily. Second, under this architecture, we introduce the latest and novel study [5] about the hidden Markov model-based multi-step DDoS prediction into our FL-based architecture (Section 3.2). This introduction is necessary since we will show how this HMM-based DDoS prediction model cannot resist the poisoning attack in FL framework in the following sections.  training datasets and each participant (i.e., BS) cannot know the models of other participants in the federated learning framework. The BS collects the sensor-BS's communication traffics continuously and trains its local prediction model to monitor whether it will suffer DDoS attacks in its communication range. Then all the participants in different geographical areas update their local models to the central server. The server aggregates these local models to a global model and further distributes the global model to each BS to realize the DDoS prediction at BSs. After predicting the DDoS attacks, each BS updates cumulative data into its training dataset through DDoS observation subsequently. Besides, we assume that the server has no validation dataset. Under this architecture, we need to clarify two issues.
• First, it is related to the prediction model deployed on each BS. In this paper, we use an HMMbased multi-step DDoS prediction method according to literature [5], which will be described briefly in Section 3.2. It requires each BS to be equipped with a usually used packet sniffer, named "Snort", 2 which can detect abnormal packets and produce relevant alerts. These alerts provide a common language to describe the suspicious events and finally correspond to some alert tags. These alert tags include "Information", "Remote Users", "Remote Attackers", "Denial of Service", and "Buffer Overflow". Each BS utilizes these generated alerts to train its local model. • Second, it is related to the poisoning attacks. As shown in Figure 2, if a participant updates a poisoned local model, the global model will become inaccurate, which further leads to poor prediction performance for each BS. Next, in Section 4, we show a designed poisoning method which can realize the poisoning for HMM prediction model in detail. Further, in Section 5, we give a defense method to complete a robust aggregation in the server.

An HMM-based multi-step DDoS prediction method
According to literature [5], an HMM-based multi-step DDoS prediction model is built as follows.
A discrete first-order HMM is defined as a five tuple λ, having These parameters, which describe the HMM and their correspondence to our purposes, are explained below: • S describes a set of HMM states corresponding to a DDoS attack's several sub-steps. As stated in Section 1, there are five types of hidden states: "IPsweep" (s 1 ), "Sadmind Ping" (s 2 ), "Break Into and Buffer Overflow" (s 3 ), "Installation Mstream Software" (s 4 ), and "Launch DDoS" (s 5 ). Thus, we have • Σ is a set of observations based on the alerts, which are defined by five types of alert tags described in Section 3.1: "Information" (σ 1 ), "Remote Users" (σ 2 ), "Remote Attackers" (σ 3 ), "Denial of Service" (σ 4 ), and "Buffer Overflow" (σ 5 ). We have • E is a transition probability matrix describing the transitions between the states in set S. We define a 5 × 5 transition probability matrix shown below: where a hidden process, which is represented by random variable x t and corresponds to a state at time t in the Markov chain. Notation e ij denotes the transition probability from state s i to state s j . • Q is an observation probability matrix, with size of 5 × 5. We have where an observation process, which is represented by random variable o t and corresponds to alert tags sent from Snort. Each element q ik in Q denotes the probability of observation symbol σ k while in state s i . • Π represents an initial distribution vector for each HMM state. It indicates the probability that the intrusion can start in each state. We have Here, we set each element in Π to 0.2 uniformly.
At the beginning of HMM training, the observation sequences and the hidden states are used to calculate the transition probability matrix E and observation probability matrix Q through supervised training. That is to say, it calculates the frequency of each observation in each HMM state and adopts the maximum likelihood method to obtain HMM parameters on λ.
Let O represent a series of observations with variable length, whose elements belong to the set of observations Σ. After obtaining the HMM parameters of λ, we can use Forward or Backward algorithm to calculate the observation sequence probability P (O|λ). Also, when given the λ and observation sequence, we can get the most likely corresponding state using Viterbi algorithm. According to literature [5], a final attack probability PA is calculated based on the number of alerts triggered by Snort. It reflects the progress of the current DDoS attack. The probability PA is given below: in which Num is calculated by using the following Equation (8): where n j is the number of alerts at state s j in the training dataset, and µ j is the counter of number of alerts for the attack in progress detected in each state which can dynamically adjust the number of alerts based on the current intrusion.
The DDoS attack has the multi-step characteristic. It means that the DDoS attacker needs to make a series of preparations to launch the final DDoS attack. In Equation (7), the numerator denotes the sum of the probability of the most likely state when receiving the alert at each time, while the denominator denotes the total alert length of a DDoS attack. If the HMM model can predict the state accurately, then when each alarm comes, the value of probability will be close to 1. As the number of alerts increases, the DDoS attack is approaching to the final DDoS launching stage. Therefore, the attack probability PA indicates the degree/extent from the current time when the alert is received to the final DDoS launching stage. A high attack probability PA means that it is closer to the final stage of a DDoS attack, and the base station needs to be prepared for defense.
4 Reinforcement learning-based poisoning attacks for HMM-based multi-step DDoS prediction As stated in Section 1, the federated learning could suffer from poisoning attacks and lead to the global model with low accuracy. In this section, we present a reinforcement learning-based poisoning algorithm in detail. First, we give the threat model and assumptions (Section 4.1). Then, we define a reinforcement learning model of poisoning attacks (Section 4.2). Finally, we design a reinforcement learning-based poisoning algorithm for HMM-based multi-step DDoS prediction (Section 4.3).

Threat model and assumptions
In the above FL-based multi-step DDoS prediction architecture, one or more poisoners would choose some BSs to invade and participate in the federated learning as normal participants. These poisoners not only make their local BSs unable to predict DDoS attacks normally, but also want to poison the global model which can further impact other BSs' predictions. To avoid the model deviating from the original one suddenly, a poisoner tries to poison its local model by injecting abnormal alert data into the BS's local training dataset. A poisoner can inject different alerts, which would change the transition probability matrix in HMM. When these alerts are used for training, the prediction model may not be accurate.

Reinforcement learning model of poisoning attacks
In each participant, its local training dataset consists of a sequence of alert tags. The five types of alert tags have been described in Equation (3). As a poisoner does not know how the alert tags impact the model and wants to poison the local data with a best policy, our poisoning method can be formulated as a reinforcement learning model. As shown in Figure 3, we regard a poisoner as an agent that interacts with the environment and takes an action according to the current state and reward. A poisoner can obtain the uploaded parameters so that it can evaluate its action under the current state and further can feedback a reward to guide the policy learning. After multiple iterations, the agent can make actions that maximize accumulated rewards based on the current state and learn the optimal policy. We give definitions of three important elements as follows.

State space definition
Since a tremendous state space is not friendly for an agent to learn, we select several representative environment characteristics to describe the current state. Here, a state can be defined as Notation sa j , j ∈ [1,5], indicates the proportion of j-th type of alert tags to the total number of alert tags in the training dataset. The value of sa j under the current t-th step can be calculated as Besides, notation sc denotes the last alert tag in our alert sequence of the training data. Since the poisoning attack needs to affect the HMM model ultimately and the observation sequence of HMM is a time-dependent sequence, the poisoning agent needs to try to inject interferential alerts into the tail of the observation sequence based on the last alert tag in the current observation sequence. At different states, the last alert tag of the current observation sequence will be different. Based on the last alert tag sc, the poisoning agent can choose two actions which are related to sc. The detailed work process is shown in the following action space definition. Thus, we set sc as a component of the state vector in Equation (10).

Action space definition
We define an action space as {a keep , a transit }. Let notation a t denote an agent selecting one of the actions in the above action space at time step t. In our paper, to speed up the process of poisoning, when an agent performs an action, 10 single/10 pairs of alerts will be injected to the local alert sequence of the training data. Action a keep represents the agent injecting the same 10 alerts as the last alert (i.e., sc) at the end of the alert sequence. Action a transit denotes transiting to the other alert tag. The agent randomly chooses one of the remaining four alert tags except the last alert tag sc as a transiting alert. Then the agent injects 10 pairs of alert tags with the last alert tag sc and the transiting alert in an alternating manner at the end of the alert sequence. For example, the local alert sequence of the training data is {σ 1 , σ 1 , σ 2 , σ 3 }. If the agent selects action a transit the transiting alert is randomly chosen as σ 4 . Then, the poisoned training data become {σ 1 , σ 1 , σ 2 , σ 3 , σ 4 , σ 3 , σ 4 , . . . , σ 3 , σ 4 }. The above two types of actions would finally change the transition probability matrix in HMM.

Reward function definition
To verify whether the current state and action have an impact on model poisoning, we should judge whether the alert injection takes effect. Since a poisoner owns the local training dataset in the base station it can utilize this original dataset to verify the performance of model poisoning. Considering that the global model is aggregated and distributed periodically (i.e., it will take some time), we use the local model and the global model together to formalize two reward functions so as to speed up convergence. Immediate Reward : Before a global model is distributed, a poisoner can use its local model to judge whether a certain action takes effect under the current state. We take the final attack probability introduced in Equation (7) as the calculation criterion for evaluating the poisoning performance. An immediate reward RI t can be calculated as follows: Page 9 of 21 Security and Safety, Vol. 1, 2022003

Algorithm 1 Reinforcement Learning-Based Poisoning Algorithm
Input: The original training dataset in each local base station The descending degree tar of model performance Output: A trained neural network mode that can output policy πDDQN 1: Initialize action-value functions Q and target action-value functionsQ with random weights 2: Initialize replay memory to capacity M 3: repeat 4: t ← 0, Rt ← 0

5:
With probability ε select a random action at With probability 1 − ε select action at = arg max a Q( st, a; θ) 6: Execute action at and obtain st+1 7: Use the original training dataset to calculate RIt according to Equation (12) 8: if the agent obtains a global model at the current time step then 9: Calculate RDt according to Equation (13) 10:

13:
Sample random mini-batch transitions ( sj, aj, Rj, sj+1) from the replay memory 14: Perform a gradient descent step with loss function L(θ) according to Equation (15) 15: Update θ ← θ for every τ steps 16: until t RDt > tar 17: Output a trained neural network model where PA I t−1 and PA I t are the attack probabilities calculated by the local model at time step t − 1 and local model at the current time step t after taking a poisoning action.
Delayed Reward : When the server aggregates local models, it will distribute a global model to each participant. After a poisoner obtains the global model, the impacts of actions during this period can be measured. A delayed reward RD t is defined as follows: where PA D t−1 and PA D t are the attack probabilities calculated by the global model at time step t − 1 and global model at the current time step t after taking a poisoning action. If the global model is not distributed at the current time step, we set PA D t = PA D t−1 . Finally, the total reward is defined as: where w i , w d are the weights of different rewards, and here we set w i = 0.5, w d = 0.5. When either the global model or local model is poisoned, the attack probabilities will decrease, and then the reward R t will be positive. Both the immediate reward and the delayed reward increasing will optimize an agent's action choice.

Reinforcement learning-based poisoning algorithm for HMM-based multi-step DDoS prediction
Based on the above-mentioned reinforcement learning model, we can see that the state space is continuous and large. As a type of reinforcement learning method, double deep Q-network (DDQN) [22] is fit for tremendous state space and action space, which can accelerate the convergence speed. Thus, in this section, we give pseudo-codes of our reinforcement learning-based poisoning algorithm by utilizing DDQN in Algorithm 1.
In DDQN, two deep neural networks with the same structure are set up and used to approximate the action-value function. Notation Q( s t , a t ; θ) with weight θ represents an online neural network, and notationQ( s t , a t ;θ) withθ represents a target neural network. Then, we use M to denote the maximum capacity of the replay memory in DDQN. The replay memory contains an agent's experiences that will be used for optimizing the weights of Q andQ.
In Algorithm 1, to describe the poisoning level uniformly, a threshold tar, tar ∈ [0, 1], is set to denote the descending degree of model performance. For an agent, it takes the original training dataset in its local base station and threshold tar as the inputs of algorithm. First, we initialize relevant parameters of Q,Q, and M (Lines 1-2). Next, at each step, the agent selects a random action a t using ε-greedy method based on policy π DDQN ( st) to balance the exploration and exploitation [23] (Lines 3-5). After the agent executes action a t , the next state s t+1 can be obtained (Line 6). Then, the agent can calculate the reward about the poisoning performance according to Equations (12)-(14) (Lines 7-11). Then the transition ( s t , a t , R t , s j+1 ) will be stored in the replay memory (Line 12). In the next step, the mini-batch gradient descent method is used to optimize the weight θ of online neural network Q( s t , a t ; θ) with a loss function L(θ) (Lines 13-15). In the mini-batch gradient descent method, the agent randomly selects a part of transitions from M for training. Assuming that the size of the training set is M s . L(θ) is calculated as: where a max = arg max a Q( s j , a; θ). Notation γ is a discount factor, and the value is between 0 and 1. The parameterθ is updated by cloning the value of θ for every τ steps [22]. The agent will not terminate a loop until the poisoning level tar is reached or the target network Q( s t , a t ;θ) converges (Line 16). Finally, we can get a trained neural network (Line 17) and use it to guide the action when a current state s t is given. The values of relevant parameters in this algorithm are given in Table 2 in the experiment.

A validation dataset-free defense strategy based on D-S evidence theory
From Section 4, a reinforcement learning-based poisoning can impact the global model for HMM-based multi-step DDoS prediction. Since the central server has no validation dataset, it is difficult to judge which participant is harmful. We only can use the uploaded information and the knowledge owned by the central server to infer the suspicious poisoners and obtain a robust global model. D-S evidence theory [24] proposed by Dempster and Shafer is an effective method to infer uncertain conception which is based on the synthetic formulas that integrate different evidences. For this reason, we utilize this theory to design our defense algorithm. Note that, our proposed strategy is under the condition that the number of poisoners is less than 1/3 of the total participants. It has been proved that only under this condition, a secure defense strategy can be found. Here, the server will execute the defense strategy and aggregate the global model periodically. We give the pseudo-codes of our proposed defense strategy in Algorithm 2. The input is each participant's local model. The output is an aggregated global model. In each round of aggregation, we first utilize D-S evidence theory to do evidence quantization (Lines 2-3). The details are shown as follows.
First, four tuple (Ω, EV, BPA, BEL) is defined under the framework of D-S evidence theory as follows.
• Ω denotes an identification frame, also named hypothetical space which contains all the propositions. Here, we define Ω = {Benign, Harmf ul}, where Benign presents a participant is not a poisoner, while Harmf ul denotes a participant is a poisoner. • EV denotes a set of evidences, which is an essential part of D-S evidence theory. The elements of the set are independent of each other. We define three evidences from different perspectives that can present the reliability level of the participants. The detailed definitions about the three evidences are given in the following Section 5.1. • BPA describes a set of basic probability assignment functions. We assume L is the number of participants, and l denotes one of them. Then the BPA is given by

Input:
Each participant's local model λ l Output: An aggregated global model 1: for each round of aggregation do 2: Use K-means to cluster the participants' local models 3: Calculate three types of evidences according to Section 5.1

4:
Aggregate different evidences using Equation (25) 5: for {Bel l (Benign)}, 1 ≤ l ≤ L do 6: Sort these values in a descending order and calculate the average value and the median Each element in the above set represents a total belief function for the Benign or Harmf ul proposition through fusion of the three different evidences for the l-th participant.

Evidence definitions
Here, we give three evidences, including clustering, upload frequency of local model and historical trust.
(1) Evidence of Clustering In the federated learning, each participant will upload its parameter/model λ to the central server. For normal participants, their uploaded parameters will be similar. Thus, firstly we utilize a clustering method to group the uploaded local models. There are many different clustering algorithms, such as the hierarchical clustering, the density-based clustering algorithm and K-means algorithm. Compared with other algorithms, K-means algorithm can achieve good clustering effect with the advantage of easy implementation and low time complexity O(N ) [25]. Here, N denotes the number of samples in the dataset. Thus, the K-means algorithm is selected to cluster the participants' local models here.
Note that, λ is five tuple and it contains the element of matrix. Since the dimension of the matrix for each participant is the same, the Euclidean distance of the matrix is defined as the sum of the Euclidean distances of the corresponding elements. Besides, it is very important to determine the appropriate K value in the K-means algorithm. We refer to the latest improved K-means algorithm proposed by Zhang et al. [26]. Through using the maximum weight product method mentioned in this paper, we can obtain the optimal K value and initial cluster centers. Here, we do not repeat them.
After obtaining the clustering results, we can observe that every local model will be grouped into a cluster. Since we assume that the normal participants account for the majority, the normal participants are easy to be grouped together and the abnormal participants may be scattered in the clustering results. Therefore, we use the size of a cluster where a participant is located as the first evidence. Then, we have the following probability assignment functions: The notation C l represents the size of a cluster where the l-th participant is located.

2) Evidence of Upload Frequency
A poisoner requires to upload the latest local model to evaluate the impact of the current state and action on the final poisoning target, resulting in the upload frequency higher than other normal participants. Thus, we use the upload frequency as the second evidence. Based on this evidence, we have the following probability assignment functions for the l-th participant: Here, f l denotes the number of uploads in this round of aggregation. (

3) Evidence of Historical Trust
In the server side, it records whether a participant is a poisoner in each aggregation round. Therefore, we use the number of times a participant is judged to be a benign participant as the third evidence. Thus, we have the following probability assignment functions for the l-th participant: Here, notation D denotes the total number of aggregations and d l represents the number of times the l-th participant is judged to be a benign participant historically. Importantly, some excellent references [27][28][29] use the clustering method to defend the poisoning attacks. However, the clustering method is only one evidence of our proposed D-S evidence theorybased defense strategy in our paper. We have other evidences to strengthen the defense against poisoning attacks, and we also have an ablation study in the experiment ( Table 3) to verify that our multi-evidences, not just clustering, are effective to defend the poisoning attacks in our paper.

D-S evidence fusion and robust model aggregation
After obtaining the above probability assignment functions for three evidences, we need to fuse these evidences and obtain the final belief function Bel l (Benign) for the l-th participant (in Algorithm 2 Line 4). Since our two propositions are incompatible, the Bel function can be obtained by: Furthermore, we have where the coefficient ξ indicates the degree of conflict between evidences, having Then, performing fusion with the third evidence is the same process as that in the above steps. After obtaining Bel l (Benign), we have to decide whether the l-th participant's local model can be aggregated into a robust global model. For {Bel l (Benign)}, 1 ≤ l ≤ L, we sort these values in a descending order and calculate the average value and the median (Lines 5-7). Then, we set two pointers. One is called trustful pointer, the other is called distrustful pointer (Line 8). Through comparing the average value and the median, we assign the large one to the trustful pointer and the small one to the distrustful pointer

Performance evaluation and analysis
In this section, we evaluate the proposed poisoning and defense methods in detail. First, we give the simulation settings in Section 6.1. Then, we test the performance of our designed reinforcement learningbased poisoning attack method in Section 6.2. Finally, we evaluate the aggregation effect of our D-S evidence theory-based defense strategy in Section 6.3, and meanwhile, observe the performance of predicting DDoS attacks for global model in Section 6.4.

Simulation settings
We select an experiment area nearby Shanghai Hongqiao Airport, shown in Figure 4. The longitude is between [121.29031, 121.33821], and the latitude is between [31.19216, 31.2241]. We randomly set 10 base stations (participants of the federated learning) in the area. The detailed information of base stations is provided in Table 1. The communication ranges are marked by grey circles in Figure 4. In this experiment area, we use Python to generate 2000 random points to simulate the smart sensors (e.g., cameras, embedded road sensors). Meanwhile, we use a simulator named SUMO to generate 1000 vehicles. The vehicles' traces are restricted to the road topology. Then, we utilize a dataset named DARPA2000 3 which contains multi-step DDoS attacks to carry out experiments. There are 420 alerts in the dataset. Under FL framework, we randomly select a number of attack alerts and assign them to the

Performance of poisoning attacks
The parameters used in DDQN are given in Table 2. Here, we set the poisoning target, that is, the descending degree tar of model performance as 0.1 to balance the poisoning efficiency and the poisoning effect. To verify the performance of our RL-based poisoning method is effective during the federated learning process, we show the different numbers of uploads that achieve the poisoning goal under different numbers of poisoners. The experiment results are shown in Figure 5. The x-axis is the number of uploads  of local model(s). The y-axis is the attack probability of the aggregated global model. In Figure 5, when there is only one poisoner, the number of uploads (iterations) is 185, which can make the attack probability decrease from 0.96 to 0.86 (i.e., tar = 0.1). When the poisoners are two, three, four and five, the number of uploads are 76, 53, 38 and 32 in a gradual downward trend. Besides, we test the convergence time that a poisoner completes a successful poisoning attack. The result is shown in Figure 6. The x-axis is the number of learning. A point in x-axis means a complete learning convergence process, i.e., the attack probability is reduced by 10%. We can see that the poisoning method pays much time to achieve the poisoning goal at the beginning. Subsequently, the convergence time becomes low and steady due to cumulative learning experiences.

Performance of our defense strategy
In terms of the attack probability, we compare our proposed defense method with three other defense algorithms 4 (Trimmed Mean [15], Krum [16], Median [17]) under different numbers of poisoners. The experiment results are shown in Table 3.
From Table 3, we can find that our proposed defense strategy performs best than Trimmed Mean [15], Krum [16] and Median [17]. When there is only one poisoner, the three comparison defense methods can work effectively. The result of the Trimmed Mean is better than Krum and Median, as it can exclude the worst model and then take the mean. When there are two or three poisoners, the Krum and Median methods are well, and the values are 0.9576 and 0.9423, respectively. Trimmed Mean has a performance degradation since it takes mean and aggregates some poisoned parameters. When there are more than three poisoners, the attack probabilities of Krum, Median, and Trimmed Mean are bad at predicting the DDoS attacks. However, our defense method still has good results when the number of poisoners is less than 1/2, since our aggregation criteria considers many different evidences.
We also do an ablation study, in which our defense strategy retains any two evidences (i.e., containing three variants). The results are given in the last four lines of Table 3. We can see that the attack probabilities of our defense strategy are also higher than the other three variants. When the number of poisoners is less than 3, the attack probabilities are close to the results that retaining all the three evidences. When the number of poisoners is more than 3, we find that selecting any two evidences performs worse than selecting all the three evidences. The ablation study verifies that selecting three evidences can achieve better results since it considers more factors to make the defense more rigorous.
Besides, we test our defense method's average accuracy of excluding poisoners in the federated learning when there are different ratios of poisoners. The results are shown in Figure 7. From Figure 7, we note that when the number of poisoners is less than 3/10 of participants, the accuracy rate of excluding poisoners is close to 100%. The gap to 100% is because the poisoned local model has little effect on the global model when the poisoning just starts, so that it cannot be directly eliminated. As the poisoning process progresses, our defense method can successfully detect and exclude it. When there are more than 3/10 poisoners, the accuracy rate of excluding poisoners decreases. Although the accuracy rate of excluding poisoners drops, the attack prediction probability in Table 3 remains relatively steady. This is because our defense method aggregates the local models cautiously. Through observing the experimental data, we find that our defense method will not select more than one-half of the participants for aggregation when the poisoners cannot be accurately detected.

Performance of predicting DDoS attacks for global model
In this paper, we implemented HMM to predict the intrusion probability of DDoS attacks as the number of alerts increases.
We set three performance metrics to evaluate the intrusion prediction performance. The first metric is the prediction accuracy rate (PAR). It is used to measure the proportion of the number of states that the model cumulatively predicts correctly. We have PAR = The Number of Correct Predictions Alerts , where Alerts denotes the total number of incoming alerts. The second metric is the average state prediction probability (ASPP). It is used to evaluate how high the average probability that the model can predict  Alerts .
The third metric is the attack probability, defined in Equation (7). It is used to indicate the progress of the current DDoS attack based on the current alerts. We compare our aggregated global model without poisoners and the global model with one poisoner, to observer the performance changes. The experiment results are shown in Figures 8-10.
In Figure 8, we can see that the prediction accuracy rate can keep close to 1 steadily and the average value reaches to 0.995. However, when there is a poisoner in participants, the PAR of global model decreases and only has 0.9 approximately. It shows that the sever aggregates the poisoned model, which has an impact on the global model.

Page 18 of 21
Security and Safety, Vol. 1,2022003 In Figure 9, the initial values of ASPP in both cases are only 0.664. The reason is when there are few alerts, the observation sequence is too short to accurately calculate the current state probability. As the number of incoming alerts increases, the ASPP without poisoners increases rapidly and reaches to 0.965, while the ASPP with one poisoner is lower and finally only reaches to 0.855. Combining Figures 8 and 9, we can see our global model without poisoners can accurately predict the current state with a high probability.
From Figure 10, we can find the attack probability rises as the number of incoming alerts increases. The attack probability represents how far a DDoS attack is from reaching the final target. A point of view [2] is that there should be some reactions when the attack probability arrives at 0.95. In our experiment, the final attack probability of the global model without poisoners can finally reach up to 0.963. When the attack probability arrives at 0.95, the number of incoming alerts is 315, much less than total alerts 420. It means that the BS has enough time to make active defenses. However, the attack probability of global model with one poisoner increases slowly. When the number of alerts reaches to 315, the attack probability is only 0.805, which is far less than 0.95.

Conclusions
An FL-based robust HMM prediction method is proposed for predicting multi-step DDoS attacks and defending the poisoning attacks in ITS. Also, a poisoning attack based on reinforcement learning is designed to poison the global model, and meanwhile, a validation dataset-free defense strategy is given to avoid anomaly aggregation. In the reinforcement learning-based poisoning algorithm, the poisoner can learn the best strategy to poison the global model and can speed up the poisoning process according to the accumulated experiences. For the validation dataset-free defense strategy, we analyze the characteristics of the poisoning attacks and propose three evidences to select benign participants. In the future work, we will consider a more complex attacking scenario, where multiple adversaries can work cooperatively to launch cooperative reinforcement learning-based poisoning attack. This will be a more challenging work for us to continue to study.