Exploration of Transferable Deep Learning-Aided Radio Frequency Fingerprint Identification Systems

Radio frequency fingerprint identification (RFFI) shows great potential as a means of authenticating wireless devices. As RFFI can be formulated as a classification problem, deep learning techniques are widely used in modern RFFI systems for their outstanding performance. RFFI is suitable for securing existing legacy Internet of Things (IoT) networks since it does not require any modification to the end-node hardware or communication protocols. However, most deep learning-based RFFI systems require a large number of labelled signals for training, whose collection is time-consuming and impractical, especially for IoT end nodes that are already deployed and configured with long transmission intervals. Moreover, the long time required to train a neural network from scratch also limits rapid deployment on legacy IoT networks. To address these issues, this paper proposes two transferable RFFI protocols that leverage transfer learning. More specifically, they rely on fine-tuning and distance metric learning, respectively, and require only a small number of signals from the legacy IoT network. As the dataset used for transfer is small, we propose to apply augmentation in the transfer process to generate more training signals and improve performance. A LoRa-RFFI testbed consisting of 40 commercial-off-the-shelf (COTS) LoRa IoT devices and a software-defined radio (SDR) receiver is built to experimentally evaluate the proposed approaches. The experimental results demonstrate that both the fine-tuning and distance metric learning-based RFFI approaches can be rapidly transferred to another IoT network with fewer than ten signals from each LoRa device. The classification accuracy is over 90%, and the augmentation technique can improve the accuracy by up to 20%.


Introduction
In recent years, there has been a significant increase in the number of connected Internet of Things (IoT) devices, which is predicted to reach 29.42 billion by 2030 [1]. As IoT technology becomes increasingly integrated into both industrial manufacturing and our daily activities, its security has also attracted significant attention. Authentication is a critical aspect of ensuring secure wireless connectivity of IoT devices, as it prevents adversaries from gaining access to the network. Current IoT authentication solutions rely heavily on cryptographic algorithms, which require keys to be securely distributed among IoT devices. Although these cryptographic solutions theoretically offer strong security, they can be entirely compromised once the keys are leaked. As ensuring absolute security of key distribution and  without the need to collect large amounts of labeled signals and time-consuming training. Firstly, an NN-based feature extractor is pre-trained; it can then be used for transfer learning with only a few labeled signals. For the fine-tuning-based approach, we train a new classifier to identify the end nodes operating in the legacy IoT network; for the metric learning-based approach, we use the k-nearest neighbor (kNN) algorithm to implement transfer learning. The experimental results show that the transferred RFFI systems can achieve over 90% classification accuracy at six different locations, and fewer than five signals are needed to reach a satisfactory transfer performance.
-We propose to perform augmentation to expand the dataset collected from the legacy IoT network.
Because end nodes in existing legacy IoT networks may be configured with long transmission intervals, the dataset collected for transfer is often limited in size, which reduces the performance of the transferred RFFI systems. To mitigate this, artificial noise can be added to expand the collected dataset. The experimental results demonstrate that the augmentation technique can effectively improve classification accuracy by up to 20%.
The rest of the paper is organized as follows. Section 2 introduces the background of RFFI and states the problem targeted in this paper. Section 3 details the two proposed RFFI protocols, namely the fine-tuning and distance metric learning-based approaches. Section 4 presents the experimental results with a LoRa-RFFI case study. Section 5 concludes the paper.

Background and Problem Statement
This section first introduces the technical background of DL-RFFI systems. After that, the problem that this paper aims to solve is formulated and presented.

RFFI Primer
The overview of a DL-aided RFFI system is shown in Figure 1. There are M legitimate devices under test (DUTs) operating in an IoT network and a receiver that captures wireless signals and infers the index of the transmitter from them. Rogue and unauthorized devices are not considered in this paper.
As illustrated in the figure, a DL-aided RFFI system consists of two successive stages: training and inference. To train a classification NN, a large number of labelled wireless signals must be collected from all DUTs, i.e., the training category set $C_{\text{train}} = \{\text{DUT } 1, \ldots, \text{DUT } m, \ldots, \text{DUT } M\}$, to form a training dataset $X_{\text{train}}$, given as
$$X_{\text{train}} = \{(r_i, y_i)\}_{i=1}^{I_{\text{train}}}, \tag{1}$$
where $r_i$ denotes the $i$-th captured wireless signal and $y_i$ is the corresponding one-hot encoded label. $I_{\text{train}}$ is the total number of training samples in $X_{\text{train}}$. With a dataset consisting of a large number of labelled signals, a classification NN $c(\cdot; \theta)$ can be trained. The training process can be expressed as an optimization problem that seeks the parameter set $\theta$ that minimizes the loss function $L(\cdot, \cdot)$, given as
$$\theta = \arg\min_{\theta} \sum_{(r, y) \in X_{\text{train}}} L(y, \hat{y}), \tag{2}$$
where $\hat{y} = \{\hat{y}_1, \ldots, \hat{y}_m, \ldots, \hat{y}_M\}$ is the prediction output by the classification NN $c(\cdot; \theta)$, and $\hat{y}_m$ can be interpreted as the probability that the signal is sent from DUT $m$. To reach the training goal, iterative optimization algorithms such as stochastic gradient descent (SGD) and adaptive moment estimation (Adam) are often used. The loss function $L(\cdot, \cdot)$ is often defined as cross-entropy, given as
$$L(y, \hat{y}) = -\sum_{m=1}^{M} y_m \log \hat{y}_m, \tag{3}$$
where $y_m$ is the $m$-th element of the one-hot label $y$. In the inference stage shown in Figure 1(b), the classification NN predicts the device identity by analyzing the captured signal. The inference stage is mathematically given as
$$\hat{y} = c(r; \theta). \tag{4}$$

Problem Statement
While the above two-stage method is popular in the literature, collecting sufficient packets for training is time-consuming. In existing works, data collection was carried out offline in a lab setting, where the device configuration (e.g., transmission interval) and environmental conditions (e.g., multipath levels) can be well controlled. In practice, when DL-based RFFI is to be applied to a legacy IoT network whose devices have already been deployed in real environments, it is extremely challenging to collect sufficient packets of good quality, e.g., with a high signal-to-noise ratio (SNR). Our goal is to design an RFFI protocol that allows an NN trained with data collected from a number of training DUTs, i.e., the training category set $C_{\text{train}}$, to be rapidly transferred to a legacy IoT network, i.e., the legacy category set $C_{\text{legacy}}$, without collecting numerous labeled signals or performing time-consuming retraining. Note that the category set $C_{\text{legacy}}$ of an existing legacy IoT network is assumed to be different from the training category set, given as
$$C_{\text{legacy}} = \{\text{DUT } 1, \ldots, \text{DUT } n, \ldots, \text{DUT } N\}, \tag{5}$$
where DUTs 1-N are the end nodes operating in the existing legacy IoT network.

Transferable RFFI Protocols
In this section, two transferable RFFI protocols are proposed, namely the fine-tuning and distance metric learning-aided protocols. Both protocols require pre-training a classification NN, which is thus first described. After that, the fine-tuning and distance metric learning-aided approaches are introduced separately.

Pre-Training
Both of the proposed transferable RFFI protocols require pre-training of a classification NN, whose procedure is shown in Figure 2 and detailed in this subsection.

Signal Collection
The receiver runs signal collection algorithms to capture wireless signals over the air. To meet the requirements of RFFI systems, signal collection often includes synchronization, frequency offset compensation, power normalization, and preamble extraction. The algorithm design depends on the communication protocol used by the IoT network. Interested readers can refer to [16] for an example signal collection implementation for LoRa-RFFI systems. The captured baseband signals r are in the format of IQ samples, i.e., a vector of complex numbers, and are stored in the training dataset $X_{\text{train}}$.

Signal Representation
The IQ samples are often converted to other appropriate signal representations before being fed into the NN. This is because using the IQ samples r directly as the NN input often does not lead to satisfactory identification performance due to factors such as channel effects. Designing a signal representation demands expertise in wireless communications; examples include the channel-independent spectrogram [19] and the differential constellation trace figure (DCTF) [7]. The symbol G(·) represents the signal representation conversion.

Neural Network Architecture
As shown in Figure 2, the classification NN used for pre-training consists of two main components: a feature extractor and a classifier. The feature extractor can be constructed with various types of layers, such as convolutional and recurrent layers. Its output, x, can be considered as the extracted feature as it contains high-level representations of the input data G(r). The feature x is then fed into a classifier composed of fully connected layers, with the output $\hat{y} = \{\hat{y}_1, \ldots, \hat{y}_m, \ldots, \hat{y}_M\}$ being the predicted probabilities for each DUT.

Train with Data Augmentation
As discussed in Section 2, training an NN is mathematically an optimization problem, and mini-batch training is often used because the training dataset $X_{\text{train}}$ is large. During the training process, a batch of training samples $X_{\text{batch}}$ is first selected from $X_{\text{train}}$, given as
$$X_{\text{batch}} = \{(r_b, y_b)\}_{b=1}^{B}, \tag{6}$$
where $B$ is the batch size. Artificial noise is then added to each sample in the batch, which is called augmentation and can effectively improve RFFI performance in low-SNR scenarios [42]. The batch after augmentation is mathematically given as
$$X_{\text{batch}} = \{(r_b + n_b, y_b)\}_{b=1}^{B}, \tag{7}$$
where $n_b$ is the artificial Gaussian noise generated for the $b$-th training sample. Note that the power of $n_b$ is not constant and differs for each training sample. After augmentation, the training samples in $X_{\text{batch}}$ are converted to the designed signal representation and used to calculate the gradient, given as
$$g = \nabla_{\theta} \frac{1}{B} \sum_{b=1}^{B} L\big(y_b, c(G(r_b + n_b); \theta)\big), \tag{8}$$
where $g$ is the gradient for batch $X_{\text{batch}}$, computed to update the NN parameters $\theta$. The training process is repeated until the stop conditions are satisfied.
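The per-sample noise augmentation described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `add_awgn` and `augment_batch` are hypothetical, the dummy batch is random unit-power data, and the uniform [0, 80] dB SNR range is borrowed from the LoRa case study settings reported later in the paper.

```python
import numpy as np

def add_awgn(iq, snr_db, rng):
    """Add complex Gaussian noise so that the result has the requested SNR (dB)."""
    signal_power = np.mean(np.abs(iq) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Split the noise power equally between the I and Q components.
    scale = np.sqrt(noise_power / 2.0)
    noise = scale * (rng.standard_normal(iq.shape)
                     + 1j * rng.standard_normal(iq.shape))
    return iq + noise

def augment_batch(batch, rng, snr_range_db=(0.0, 80.0)):
    """Draw an independent SNR for every sample, so the noise power differs per sample."""
    snrs = rng.uniform(snr_range_db[0], snr_range_db[1], size=len(batch))
    noisy = np.stack([add_awgn(r, s, rng) for r, s in zip(batch, snrs)])
    return noisy, snrs

rng = np.random.default_rng(0)
# Hypothetical batch: 32 unit-power complex signals of 1024 samples each.
batch = np.exp(2j * np.pi * rng.random((32, 1024)))
noisy, snrs = augment_batch(batch, rng)
```

Because a fresh SNR is drawn for every sample each time a batch is formed, the network rarely sees the same noisy realization twice, which is what lets a small dataset behave like a larger one.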

Remove Classifier
Once training is finished, the classifier is removed from the NN since it is designed to classify the DUTs in $C_{\text{train}}$ and becomes useless when applied to an existing legacy IoT network consisting of DUTs 1-N, i.e., $C_{\text{legacy}}$. Note that the remaining feature extractor is still effective in extracting high-level representations from input data G(r) collected from the unseen categories in $C_{\text{legacy}}$, which is supported experimentally in the following sections.

Fine-Tuning Aided Transferable RFFI Protocol
This subsection introduces a fine-tuning-aided transferable RFFI protocol, which consists of fine-tuning and inference stages. The fine-tuning procedure is shown in Figure 3(a). Firstly, a fine-tuning dataset $X_{\text{tune}}$ is collected from the legacy DUTs 1-N that are operating in the IoT network, i.e., $C_{\text{legacy}}$, given as
$$X_{\text{tune}} = \{(r_i, y_i)\}_{i=1}^{I_{\text{tune}}}, \tag{9}$$
where $I_{\text{tune}}$ is the number of packets in the fine-tuning dataset $X_{\text{tune}}$. After that, we fix the parameters of the feature extractor and train a new classifier. Finally, the newly trained classifier is connected after the pre-trained feature extractor, forming a classification NN that can classify the legacy DUTs in $C_{\text{legacy}}$. Note that the data augmentation introduced in Section 3.1.4 is applied as well, which effectively mitigates the limited size of the fine-tuning dataset $X_{\text{tune}}$. The detailed procedure for data augmentation is not repeated here.
In the inference stage shown in Figure 3(b), the signal sent from DUT $n$ in the legacy IoT network is captured, processed, and fed to the assembled classification NN. The output $\hat{y}$ is the probability vector over DUTs 1-N.
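The idea of freezing the feature extractor and training only a new classifier can be illustrated with a small NumPy stand-in. This sketch is an assumption-laden toy and not the paper's PyTorch code: random 128-dimensional vectors play the role of features already produced by the frozen extractor, `train_classifier` and `predict` are hypothetical helper names, and the learning rate and epoch count are chosen only to make the toy converge.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_classifier(features, labels, num_classes,
                     lr=0.01, epochs=50, batch_size=4, seed=0):
    """Train one linear layer with softmax cross-entropy on frozen features."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            probs = softmax(features[idx] @ W + b)
            # Gradient of the mean cross-entropy with respect to the logits.
            grad = (probs - onehot[idx]) / len(idx)
            W -= lr * features[idx].T @ grad
            b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    return softmax(features @ W + b)

# Stand-in for extractor outputs: 3 legacy DUTs, 5 signals each (assumed values).
rng = np.random.default_rng(1)
num_classes, per_class, dim = 3, 5, 128
centres = rng.standard_normal((num_classes, dim))
train_labels = np.repeat(np.arange(num_classes), per_class)
train_feats = centres[train_labels] + 0.1 * rng.standard_normal((len(train_labels), dim))
W, b = train_classifier(train_feats, train_labels, num_classes)
predicted = predict(train_feats, W, b).argmax(axis=1)
```

Only `W` and `b` are updated, so the cost of transfer scales with the size of one linear layer rather than with the whole network.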

Distance Metric Learning Aided Transferable RFFI Protocol
This subsection introduces a distance metric learning-aided transferable RFFI protocol, which consists of enrollment and inference stages. Firstly, an enrollment dataset $X_{\text{enrol}}$ is collected from the DUTs in the category set $C_{\text{legacy}}$, given as
$$X_{\text{enrol}} = \{(r_i, y_i)\}_{i=1}^{I_{\text{enrol}}}, \tag{10}$$
where $I_{\text{enrol}}$ is the number of signals in the enrollment dataset $X_{\text{enrol}}$. We then augment $X_{\text{enrol}}$ by replicating the dataset and adding artificial noise. The enrollment dataset after augmentation is given as
$$X_{\text{enrol}} = \{(r_i + n_i, y_i)\}_{i=1}^{A \times I_{\text{enrol}}}, \tag{11}$$
where $A$ denotes the number of replications and $n_i$ is the Gaussian noise generated for the $i$-th enrollment sample. After that, we use the pre-trained feature extractor to process the augmented enrollment dataset $X_{\text{enrol}}$ and convert it to an RFF database $X_{\text{rff}}$, given as
$$X_{\text{rff}} = \{(x_i, y_i)\}_{i=1}^{A \times I_{\text{enrol}}}, \tag{12}$$
where $x_i$ is the RFF extracted from the $i$-th signal.
The inference stage is illustrated in Figure 4. The collected signal is converted into the designed signal representation, and the pre-trained feature extractor extracts its RFF $x'$. After that, the kNN algorithm is used to determine which DUT the signal is sent from. More specifically, the distances between $x'$ and all the RFFs in $X_{\text{rff}}$ are calculated, and $x'$ is assigned the label that is most frequent among its $k$ nearest data samples in $X_{\text{rff}}$. The cosine distance is used as the distance metric, which is given as
$$D(x_1, x_2) = 1 - \frac{x_1 \cdot x_2}{\|x_1\| \, \|x_2\|}, \tag{13}$$
where $D(x_1, x_2)$ denotes the cosine distance between $x_1$ and $x_2$, $(\cdot)$ denotes the dot product operation, and $\|\cdot\|$ returns the vector magnitude.
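The kNN inference step can be sketched as below. This is an illustrative NumPy version under stated assumptions: cosine distance is taken as one minus cosine similarity (the usual convention), the helper names `cosine_distance_matrix` and `knn_predict` are hypothetical, and the tiny two-dimensional database merely stands in for the 128-dimensional RFFs produced by the feature extractor.

```python
import numpy as np

def cosine_distance_matrix(queries, database):
    """Pairwise cosine distance: 1 - (q . d) / (||q|| ||d||)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return 1.0 - q @ d.T

def knn_predict(queries, db_features, db_labels, k):
    """Label each query by majority vote among its k nearest enrolled RFFs."""
    dist = cosine_distance_matrix(queries, db_features)
    nearest = np.argsort(dist, axis=1)[:, :k]
    neighbour_labels = db_labels[nearest]
    return np.array([np.bincount(row).argmax() for row in neighbour_labels])

# Toy RFF database: two enrolled devices, two RFFs each (assumed values).
db_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
db_labels = np.array([0, 0, 1, 1])
queries = np.array([[2.0, 0.1], [0.2, 3.0]])
predictions = knn_predict(queries, db_features, db_labels, k=3)
```

Because cosine distance ignores vector magnitude, the first query is matched to device 0 even though its norm is twice that of the enrolled RFFs. No parameters are updated at any point, which is why this protocol needs no training on the receiver side.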

Experimental Evaluation
This section evaluates the proposed transferable RFFI protocols. We take LoRa-RFFI as a case study and use the real-collected LoRa signals for evaluations.

Case Study: Transferable LoRa-RFFI Protocols
The LoRa-RFFI is taken as a case study to evaluate the transferable RFFI protocols proposed in Section 3. Note that the proposed RFFI protocols can be applied to any wireless technologies and are not restricted to LoRa. This subsection introduces the detailed experimental setup as well as the collected datasets.

LoRa Modulation Primer
LoRa is a wireless modulation technology designed for long-range, low-power communication. It is derived from the existing chirp spread spectrum (CSS) technology. More specifically, linear chirps are used for communication and the information is encoded in the initial frequency. An unmodulated chirp $x(t)$ is mathematically expressed as
$$x(t) = A e^{j 2\pi \left(-\frac{BW}{2} + \frac{BW}{2T} t\right) t}, \quad 0 \le t \le T, \tag{14}$$
where $A$ and $BW$ are the signal amplitude and bandwidth, respectively, and $T$ is the duration of a LoRa symbol. A LoRa packet typically starts with eight repetitions of $x(t)$ for packet detection and synchronization, which is named the preamble. Note that this work only utilizes the preamble part for RFFI to prevent the payload information from contributing to the identification.
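As an illustration, an ideal baseband up-chirp and an eight-chirp preamble can be generated as follows. This sketch assumes the standard LoRa relation T = 2^SF / BW, which the text does not state explicitly, and uses the spreading factor, bandwidth, and sampling rate reported for the testbed; the function name `lora_upchirp` is hypothetical.

```python
import numpy as np

def lora_upchirp(sf, bw, fs, amplitude=1.0):
    """Baseband linear up-chirp sweeping from -bw/2 to +bw/2 over one symbol."""
    symbol_time = (2 ** sf) / bw            # assumed relation T = 2^SF / BW
    n_samples = int(round(symbol_time * fs))
    t = np.arange(n_samples) / fs
    phase = 2 * np.pi * (-bw / 2 * t + bw / (2 * symbol_time) * t ** 2)
    return amplitude * np.exp(1j * phase)

# SF 7, 125 kHz bandwidth, 1 MHz sampling rate, as in the experimental setup.
chirp = lora_upchirp(sf=7, bw=125e3, fs=1e6)
preamble = np.tile(chirp, 8)                # eight repeated unmodulated chirps
```

With these settings one symbol lasts 1.024 ms, so each chirp contains 1024 samples and the ideal preamble contains 8192. A real captured preamble deviates slightly from this ideal waveform, and those hardware-induced deviations are what RFFI exploits.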

Experimental Dataset
-Transmitters (DUTs): 40 COTS LoPy4 development kits are used as the DUTs to be identified. The spreading factor, bandwidth, and center frequency are set to seven, 125 kHz, and 868.1 MHz, respectively.
-Receiver: A USRP N210 SDR is utilized as the receiver to capture LoRa physical layer signals, whose sampling rate is set to 1 MHz. The USRP N210 is connected to a laptop that runs the MATLAB LoRa collection program.
Figure 5. The dataset used for fine-tuning/enrollment is collected at Location A, and the dataset used for inference will be specified in each subsection.

Experimental Setup: Pre-Training
-Signal Collection: The LoRa receiver needs to run signal collection algorithms to capture physical layer IQ samples. The signal collection program includes packet detection, fine synchronization, power normalization, and preamble extraction. After that, frequency offset compensation is additionally conducted to increase system stability. The algorithms are implemented in MATLAB. More details about the LoRa signal collection algorithms can be found in [16].
Figure 6: The architecture of the neural network during pre-training. The input is a 52×128 channel-independent spectrogram.
-Signal Representation: The channel-independent spectrogram proposed in [19] is used as the signal representation, which is specially designed for the LoRa modulation technique and can significantly mitigate channel effects. The conversion function G(·) converts the collected IQ samples, i.e., a complex vector, into a 2D matrix. The converted channel-independent spectrogram is shown in Figure 6 as the input to the NN. As this paper does not focus on the impact of wireless channels, interested readers are referred to [19] for more information.
-Neural Network Architecture: The classification NN used for pre-training is shown in Figure 6, whose input is the channel-independent spectrogram. The NN consists of a feature extractor and a classifier. The feature extractor is composed of three convolutional layers, two 2×2 max-pooling layers, and a linear layer of 128 neurons. The numbers of kernels in the convolutional layers are 8, 16, and 32, respectively. The convolutional layers are activated by the ReLU function and their outputs are padded to maintain the same size as the inputs. The output of the feature extractor is a 128-element vector x, which is considered the high-level representation extracted from the input data. The feature vector x is then input to a classifier, i.e., a linear layer of M neurons, and $\hat{y}$ is the softmax-activated NN output. The NN is implemented with the PyTorch library.
-Train with Augmentation: After the NN is built, its parameters are updated with the collected training dataset. Augmentation is applied during training to increase robustness to noise. The batch size and initial learning rate are set to 32 and 0.001, respectively. The Adam optimization algorithm is used. The SNR of the signal after augmentation is uniformly distributed in the range of [0, 80] dB. 10% of the training samples are used for validation. We use a learning rate scheduler to control the training process, which reduces the learning rate by a factor of 0.5 when the validation loss does not change for 10 epochs. The training stops when the validation loss stays unchanged for 20 epochs. The training is conducted on a PC with an NVIDIA GeForce GTX 1660.
-Remove Classifier: As discussed in Section 3.1, the classifier, i.e., the fully connected layers, of the classification NN is removed once training is complete. The remaining part serves as a feature extractor. According to our CNN design shown in Figure 6, the feature extractor receives a 52×126 channel-independent spectrogram and outputs a vector of length 128.
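The learning-rate schedule and stop condition described above can be replayed on a validation-loss curve with a few lines of plain Python. This is a sketch under two assumptions: "does not change" is interpreted as "does not improve on the best value so far", and the helper name `run_schedule` and the example loss curve are hypothetical. It is similar in spirit to PyTorch's `ReduceLROnPlateau` combined with early stopping, but it is not the authors' code.

```python
def run_schedule(val_losses, lr=1e-3, factor=0.5, lr_patience=10, stop_patience=20):
    """Replay a validation-loss curve; return the stopping epoch and the LR per epoch."""
    best = float("inf")
    since_best = 0        # epochs since the validation loss last improved
    since_lr_change = 0   # non-improving epochs since the last LR reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr_change = 0
        else:
            since_best += 1
            since_lr_change += 1
        if since_lr_change >= lr_patience:
            lr *= factor              # halve the learning rate on a plateau
            since_lr_change = 0
        history.append(lr)
        if since_best >= stop_patience:
            return epoch, history     # early stop
    return len(val_losses), history

# Hypothetical curve: improves for three epochs, then stays flat.
losses = [1.0, 0.8, 0.6] + [0.6] * 25
stop_epoch, lr_history = run_schedule(losses)
```

On this curve the learning rate is halved after 10 flat epochs (at epoch 13) and training stops after 20 flat epochs (at epoch 23).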

Experimental Setup: Fine-Tuning Aided Transferable RFFI Protocol
In the fine-tuning process, we fix the parameters of the feature extractor and train a new classifier, i.e., a linear layer of N neurons. The training batch size and learning rate are set to four and 0.001, respectively. The SGD optimizer is leveraged. The learning rate scheduler and stop conditions are exactly the same as those used during pre-training. Note that the augmentation technique introduced in Section 3.1 is utilized during fine-tuning as well. The SNR range for augmentation is [0, 80] dB. After fine-tuning, the captured signal can be fed into the classification NN and a predicted identity will be given.

Experimental Setup: Distance Metric Learning Aided Transferable RFFI Protocol
In the distance metric learning-aided transferable RFFI protocol, we first need to build an RFF database with the pre-trained feature extractor, which is called the enrollment stage. Note that augmentation is also applied to the enrollment dataset by replicating it 10 times and adding artificial noise. Once the RFF database is built, the kNN algorithm can be used to classify the received LoRa signal. The value of k is set to 15 unless otherwise specified.

Evaluation on the Transferability and Robustness to Location Changes
This section evaluates whether the pre-trained RFFI systems can be transferred to a legacy IoT network that is already in operation without requiring a large number of labeled signals. As the locations of end nodes in an IoT network may change due to movement, we evaluate the performance of the transferred RFFI systems on datasets collected at different locations. The pre-training procedure is described in Section 4.1, and its training dataset contains the signals collected from DUTs 1-30. The pre-trained RFFI system is then transferred to identify DUTs 31-40, emulating the end nodes operating in an existing legacy IoT network. More specifically, we use 15 signals collected from DUTs 31-40 at Location A as the fine-tuning/enrollment dataset and evaluate the transferred RFFI systems at Locations A-F, respectively. The experimental results at the six locations are given in Figure 7, and Figure 8 provides the classification results at Location A as confusion matrices. It can be observed that the classification accuracy is always above 90% for both the fine-tuning and distance metric learning-based protocols, which shows that the proposed RFFI protocols have excellent transferability and are robust to location changes.

Effect of Fine-Tuning/Enrollment Dataset Size
This section investigates the impact of the number of signals in the fine-tuning/enrollment dataset, i.e., $I_{\text{tune}}$ and $I_{\text{enrol}}$, on the performance of the transferred RFFI system. It is desirable to keep $I_{\text{tune}}$ and $I_{\text{enrol}}$ small because the IoT end nodes may be configured with long transmission intervals, which makes collecting a large number of signals time-consuming. The fine-tuning/enrollment datasets are collected at Location A, and the transferred RFFI systems are evaluated on another 100 signals collected at the same location. The experimental results are given in Figure 9. It can be observed that the classification accuracy increases as more signals are collected from the legacy IoT network. The performance improvement becomes marginal when the number of signals reaches 15 for both protocols, which implies that the end nodes in the legacy IoT network do not need to transmit numerous signals and thus the time consumption for transfer can be reduced.

Effect of Augmentation on the Fine-Tuning/Enrollment Dataset
The augmentation technique can be applied to the fine-tuning/enrollment procedure to further increase the performance of the transferred RFFI systems. As described in Section 3, in the fine-tuning-based protocol, artificial noise is added to the batch, while in the metric learning-based protocol, it is added to the replicated enrollment dataset.
We collect 15 signals from DUT 31-40 at Location A for fine-tuning/enrollment and then use another 100 signals from each DUT for the test. Different levels of artificial noise are added to the test signals to simulate environments with varying SNRs. As illustrated by the experimental results in Figure 10, the augmentation during transfer can significantly improve the classification performance. More specifically, the accuracy can be increased by up to 20% when the SNR is between 20 dB and 40 dB. Therefore, augmentation should be leveraged during transfer to improve system performance.

Summary and Discussion
It is experimentally demonstrated that both the fine-tuning and distance metric learning-based RFFI protocols can be rapidly deployed to protect existing legacy IoT networks. The results in Figure 10 show that the fine-tuning-aided RFFI protocol achieves higher accuracy than the distance metric learning-based one. However, fine-tuning introduces additional training costs and thus requires the receiver/authenticator to be able to update the NN parameters. In contrast, the distance metric learning-based approach is training-free and is better suited to low-cost receivers. In summary, there is a trade-off between complexity and performance. It is recommended to apply the metric learning-based approach when the receivers are computationally constrained and the fine-tuning-based approach when sufficient computing resources are available.

Conclusion
This paper aims to design RFFI protocols that can be rapidly transferred to existing legacy IoT networks without the need to collect numerous signals. Fine-tuning and distance metric learning, two transfer learning techniques, are used to make the RFFI systems efficiently transferable. More specifically, we first pre-train a feature extractor using a large amount of data, and then only need to collect a few signals from the existing IoT network for transfer learning. For the fine-tuning-based approach, we train a new classifier for the IoT end nodes in the legacy network, while for the metric learning-based approach, the kNN algorithm is used for classification. Since the signals used for transfer learning are limited in number, we propose to perform augmentation on the transfer dataset to further improve performance. A LoRa-RFFI system consisting of 40 COTS LoRa DUTs and a USRP N210 SDR receiver is built as a case study to evaluate the proposed transferable RFFI protocols. The experimental results show that both proposed RFFI protocols can achieve classification accuracy higher than 90% when transferred to a new IoT network, and require no more than five signals per DUT. It is also experimentally demonstrated that augmenting the transfer dataset can improve the identification performance by up to 20%.