An accurate identification method for network devices based on spatial attention mechanism

Xiuting Wang; Ruixiang Li; Shaoyong Du; Xiangyang Luo

doi:10.1051/sands/2023002

Security and Safety, 2 (2023) 2023002

Issue		Security and Safety Volume 2, 2023 Security and Safety in the "Metaverse"


Article Number		2023002
Number of page(s)		17
Section		Other Fields
DOI		https://doi.org/10.1051/sands/2023002
Published online		03 May 2023

Research Article

Xiuting Wang¹,²,³,*, Ruixiang Li²,³, Shaoyong Du²,³,⁴ and Xiangyang Luo²,³,*

¹ Henan Polytechnic Institute, Nanyang, 473000, China
² Henan Province Key Laboratory of Cyberspace Situation Awareness, Zhengzhou, 450001, China
³ State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, 450001, China
⁴ Institute of Information Engineering, State Key Laboratory of Information Security, Beijing, 100093, China

* Corresponding authors (email: wangxiuxiu1997@163.com (Xiuting Wang); luoxy_ieu@sina.com (Xiangyang Luo))

Received: 16 December 2022
Revised: 23 February 2023
Accepted: 22 March 2023

Abstract

With the metaverse being the development direction of the next generation Internet, the popularity of intelligent devices, and the maturity of various emerging technologies, more and more intelligent devices try to connect to the Internet, which poses a major threat to the management and security protection of network equipment. At present, the mainstream method of network equipment identification in the metaverse is to obtain the network traffic data generated in the process of device communication, extract the device features through analysis and processing, and identify the device based on a variety of learning algorithms. Such methods often require manual participation, and it is difficult to capture the small differences between similar devices, leading to identification errors. Therefore, we propose a deep learning device recognition method based on a spatial attention mechanism. Firstly, we extract the required feature fields from the acquired network traffic data. Then, we normalize the data and convert it into grayscale images. After that, we add a spatial attention mechanism to CNN and MLP respectively to increase the difference between similar network devices and further improve the recognition accuracy. Finally, we identify devices based on the deep learning model. A large number of experiments were carried out on 31 types of network devices such as web cameras, wireless routers, and smartwatches. The results show that the accuracy of the proposed recognition method based on the spatial attention mechanism is increased by 0.8% and 2.0%, respectively, compared with the recognition method based only on the deep learning model under the CNN and MLP models. The method proposed in this paper is significantly superior to the existing method of device-type recognition based only on a deep learning model.

Key words: Metaverse / Device identification / Deep learning / Spatial attention

Citation: Wang XT, Li RX, Du SY and Luo XY. An accurate identification method for network devices based on spatial attention mechanism. Security and Safety 2023; 2: 2023002. https://doi.org/10.1051/sands/2023002

© The Author(s) 2023. Published by EDP Sciences and China Science Publishing & Media Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

The metaverse is a computer-generated world, and its concept has evolved since it first appeared, with a variety of descriptions. Typically, the metaverse is thought of as a virtual shared space that blends the physical, human, and digital worlds [1], and it is regarded as the direction of the next generation of the Internet following the Web and mobile Internet revolutions [2]. At present, with the popularity of intelligent devices and the maturity of various emerging technologies [3–5], more and more intelligent devices connect to the Internet so that users can experience another kind of life in the virtual world. The massive number of network devices greatly facilitates people's daily production and life, but it also brings a variety of security problems. Through these devices, people's lives can easily be monitored and their private information obtained. In 2016, a large-scale DDoS attack took place in the United States and caused widespread Internet outages. Behind the attack was a botnet composed of about 1.5 million network devices [6], and the attack had a huge impact on the daily production and life of the American people. Identifying the network devices on the Internet is therefore an urgent task for maintaining network security. Efficient and accurate identification of network devices is the basis for enhancing the security of cyberspace [7], realizing asset evaluation [8], and achieving network situational awareness [9]. For these reasons, research on network device identification technology has attracted wide attention at home and abroad in recent years.

At present, traffic-based network device identification is one of the most active research topics. When identifying network devices, traffic-based methods mainly extract characteristic attributes from application layer, network layer, and transport layer protocol packets. Methods based on network traffic packets and methods based on device identification information can both identify network devices. In practice, however, the data may be maliciously altered or forged, and manual processing of the features easily introduces errors, both of which reduce identification accuracy. Therefore, studying fast and accurate network device identification methods has great academic value and practical significance.

By extracting application layer protocol features, Gao et al.[10] construct the fingerprint of a fixed device and use it to identify the network device. Luo et al.[11] propose generating fine-grained fingerprints from the subtle differences between the hardware information of devices; they process the hardware information with natural language processing technology and verify the effectiveness of the method on a working system. Genevieve et al.[12] propose applying deep learning to network traffic to automatically identify the devices connected to a network. Wang et al.[13] propose a network device identification method based on MAC boundary inference, which identifies device types by inferring the MAC address boundaries corresponding to each device type. In reference [14], clock offset information is used as the device identifier, and the device fingerprint is generated from the timestamp difference between the detection packet and the response packet. The feature attributes extracted by existing traffic-based identification methods often contain redundant and interfering features, which increase the time cost of identification and reduce the identification accuracy.

In view of the above problems, this paper analyzes the relationship between network traffic and deep learning and studies network device identification technology based on deep learning. Because the grayscale images of different device types differ greatly, and because image data can be augmented, conversion rules are used to convert network traffic data into grayscale images for deep learning processing. A spatial attention mechanism is added to give the identification model a certain degree of interpretability and to amplify the small differences between devices. On this basis, a deep learning network device identification method based on the spatial attention mechanism is designed to improve the accuracy of network device identification.

The main work of this paper is as follows:

This paper proposes a grayscale image conversion technique, which splits and reorganizes the traffic data of network devices by session and generates grayscale images corresponding to different device types as the input of the deep learning model. Compared with existing methods, the proposed method reduces the error of device identification.
This paper proposes a deep learning network device identification technique based on a spatial attention mechanism, which processes the grayscale images of network traffic with the image processing methods of deep learning. Adding the spatial attention mechanism improves the identification accuracy of network devices compared with existing methods.
The experimental results on public data sets show that, compared with the typical identification method based only on convolutional neural networks, the accuracy of the proposed convolutional neural network method based on the spatial attention mechanism is improved by 0.8%. Compared with the identification method based only on the multi-layer perceptron model, the accuracy of the proposed multi-layer perceptron method based on the spatial attention mechanism is improved by 2.0%.

The rest of this paper is organized as follows: Section 2 introduces the work related to network device identification, Section 3 describes the method in detail, Section 4 introduces the experiments and analyzes the results, and Section 5 summarizes the full text and looks into future work.

2. Related work

In recent years, network device identification has gradually become a research hotspot in the field of cyberspace security. At present, the mainstream identification method is to identify devices based on network traffic. Firstly, this method captures data packets [15] and firmware information [16] of network devices through detection tools. Then the captured traffic data is processed, the optimal features are selected, and interference features and redundant features are removed. Finally, various learning algorithms are used to identify the network device.

Among them, Greis et al.[17] construct the fingerprint of a specific network device by extracting the application layer protocol features and using the deep learning model to realize the identification of network devices. Zhang et al.[6] propose to use natural language processing to extract web content, use machine learning to build a classification model, and use network scanning technology to achieve real-time, non-invasive network crawling.

Deep learning is widely used in the field of network device identification because it can learn features autonomously. Umair et al.[18] propose applying deep learning to network traffic analysis to automatically identify the devices connected to a network. Zhu et al.[19] propose an efficient classification method for network devices based on multi-level deep learning, in which deep neural networks extract traffic features and maximum entropy classifiers classify Internet traffic. Although the machine learning traffic classification system based on a shallow neural network proposed in that work achieves a good classification effect and a very high identification accuracy, it does not carry out further fine-grained identification of network devices. Meidan et al.[20] design a traffic monitoring system based on the C5.0 decision tree and time series analysis. The system uses a combined CNN-LSTM (Convolutional Neural Network, Long Short-Term Memory) model to learn from the traffic autonomously and thereby avoid manual feature processing. However, this system has poor applicability. Kotak et al.[21] propose a machine learning method that analyzes the traffic of network devices in order to identify them. First, the TCP packets of the devices are monitored and captured. Then, a feature extraction tool converts each TCP packet into a feature vector and constructs the feature space. Finally, an optimal classifier is constructed for each device type, and the device is identified by the machine learning algorithm. The method achieves high identification accuracy. However, relying on the packets of a single protocol cannot cover the wide variety of existing network device types.

Shiv et al.[22] train ten deep learning models for ten types of network devices to detect the traffic generated by network devices and non-network devices. In each model, one type of network device is regarded as the positive class and the rest as the negative class, so a deep learning model is trained separately for each network device. The effectiveness of this method is demonstrated through its identification accuracy. Although training one model per network device can improve the identification accuracy, the approach still has some problems. Firstly, each deep learning model has to be trained on all samples, so the time cost of model training grows with the amount of sample data. Secondly, the numbers of positive and negative samples are very unbalanced in this scheme, and the imbalance becomes more serious as the sample data increases. Thirdly, in practical applications, if new network device types are added, all ten deep learning models need to be retrained.

According to their reported experimental results, the existing identification methods all need to manually process the features extracted from the data, and they perform better on network devices that differ substantially from one another. However, manual feature selection and processing often fail to capture the small differences between similar devices, which leads to identification errors. A deep learning algorithm can learn features automatically without human participation, which greatly reduces the identification errors for similar network devices. This paper therefore analyzes the relationship between network traffic and deep learning and studies network device identification technology based on deep learning. Because the grayscale images of different device types differ greatly, and because image data can be augmented, conversion rules are used to convert network traffic data into grayscale images for deep learning processing. A spatial attention mechanism is added to give the identification model a certain degree of interpretability and to amplify the small differences between devices, and on this basis a deep learning network device identification method is designed to improve the accuracy of network device identification.

3. Methods

This section describes the network device identification technique based on a spatial attention mechanism. The method builds on an analysis of the relationship between network traffic and deep learning. Because the grayscale images of different device types differ greatly, the method uses conversion rules to convert network traffic data into grayscale images that are convenient for deep learning processing, and it adds a spatial attention mechanism to the identification model in order to improve identification accuracy.

3.1. Method framework

In response to the problems raised in Section 2, this section proposes a method for identifying network device types based on a spatial attention mechanism. First, the traffic data of the network devices is split and reorganized by session. Second, grayscale images corresponding to different device types are generated as the input of the deep learning model. Third, the spatial attention mechanism is added to the convolutional neural network and to the multi-layer perceptron, and an optimal classification model is trained for each. Finally, the network devices are classified with the optimal classification model. Because the deep learning model learns features autonomously, the identification errors caused by manual processing are avoided. The attention mechanism gives the identification model a certain degree of interpretability and further improves the identification accuracy. The basic framework of the method is shown in Figure 1. It mainly includes four parts: data preprocessing, grayscale image generation, optimal classification model training, and network device classification and identification.

Figure 1.

Framework of network device identification method based on spatial attention

The specific workflow of the network device identification method based on the spatial attention mechanism is as follows:

Data splitting: A pcap (packet capture) file contains many protocols, but this paper only needs TCP packets, so the original pcap file is parsed to extract the TCP traffic data.
Data reorganization: The TCP packets are reorganized by session so that each group constitutes a complete TCP session.
Data preprocessing: The reassembled TCP traffic data is normalized.
Data filling: Because a convolutional neural network and a multi-layer perceptron are used for classification, and a deep learning model requires inputs of a consistent size, the length of the traffic data must be fixed; that is, the pixel size of the grayscale image must be fixed. Most of the effective information in communication packets is concentrated in the header. Therefore, this paper keeps 784 bytes of each session to ensure identification accuracy. If the length is larger than 784 bytes, the data is truncated, and if the length is smaller than 784 bytes, it is padded with zeros.
Generating the grayscale image: The reconstructed traffic data is converted into a 28*28 pixel grayscale image, which is the input of the convolutional neural network and the multi-layer perceptron used to train the optimal deep learning model. The generated grayscale images are divided into two parts. One part is used as the training data to construct the optimal deep learning model, and the other part is used as the validation data set to verify the learning effect of the model.
Constructing the deep learning model: The grayscale images generated in the previous step are used as the input of the convolutional neural network and the multi-layer perceptron, and the spatial attention mechanism is added to both models. Through autonomous learning, the weights and parameters are adjusted iteratively, and the model is evaluated with the loss function until it has been trained into the optimal identification model.
Network device identification: The trained optimal deep learning model is used to classify the traffic of all network devices in the data set, and the type of each network device is determined by this classification.
Performance evaluation: After classification, the performance of the deep learning model is evaluated. The effectiveness of the convolutional neural network and multi-layer perceptron models based on the spatial attention mechanism is assessed by calculating the accuracy, recall, and F1 score.
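The data reorganization step above can be illustrated with a minimal Python sketch. It assumes that the packets have already been parsed from the pcap file into plain tuples; the tuple layout and the names `session_key` and `group_by_session` are hypothetical and are not taken from the paper. A session is keyed here by its two endpoints regardless of direction, which is one common definition and may differ from the authors' exact rule.

```python
from collections import defaultdict

def session_key(src_ip, src_port, dst_ip, dst_port):
    """Direction-independent key: both directions of a connection map to one session."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)

def group_by_session(packets):
    """Concatenate raw packet bytes per session, preserving capture order.

    `packets` is a hypothetical list of
    (src_ip, src_port, dst_ip, dst_port, raw_bytes) tuples for TCP packets only.
    """
    sessions = defaultdict(bytearray)
    for src_ip, src_port, dst_ip, dst_port, raw in packets:
        sessions[session_key(src_ip, src_port, dst_ip, dst_port)] += raw
    return {key: bytes(data) for key, data in sessions.items()}

packets = [
    ("10.0.0.2", 50000, "10.0.0.9", 80, b"\x01\x02"),  # client -> server
    ("10.0.0.9", 80, "10.0.0.2", 50000, b"\x03"),      # server -> client, same session
    ("10.0.0.3", 50001, "10.0.0.9", 80, b"\xff"),      # a different session
]
sessions = group_by_session(packets)
```

Each value in `sessions` is the byte string of one reassembled session, which is the unit that is later truncated or padded to 784 bytes.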

3.2. Generating grayscale images

The existing network device identification methods mainly analyze the traffic generated by a network device during communication and identify the device by processing the features extracted from the traffic data. Two problems arise when extracting the traffic feature fields. First, manual processing of the features easily loses information, which results in identification errors. Second, for a data set with a small sample size, the model does not learn sufficiently and identification errors occur. Converting the traffic to a grayscale image addresses both problems. On the one hand, the grayscale image retains the selected bytes without information loss. On the other hand, the data set can be expanded by means of data augmentation such as translation, rotation, and cropping. Therefore, this section converts the traffic data of network devices into grayscale images to expand the data set, reduce the error of device identification, and improve the identification accuracy.

Since generating the grayscale image depends on data division and data reorganization, this section first introduces data preprocessing, data division, and data reorganization, then introduces the generation of the grayscale image, and finally shows the grayscale images generated for some network devices.

The original network traffic file obtained by this method contains information from a variety of protocols, and only part of the traffic data can be used for identification. It is therefore necessary to analyze the header information of the pcap file and extract the required protocol through its identification field. In this experiment, the TCP traffic is extracted for network device identification.

The traffic packets are divided by session, so that an original pcap file is split into several small pcap files. The re-partitioned traffic data is then normalized. Because the convolutional neural network and multi-layer perceptron are used for classification, the length of the input traffic data must be fixed; that is, all grayscale images must have the same size. Most of the traffic characteristic data required by this method is located near the beginning of the session. Therefore, the length is fixed at 784 bytes in this paper. If a session is longer than 784 bytes, it is truncated; if it is shorter than 784 bytes, zeros are appended at the end of the data. The specific grayscale image generation algorithm is shown in Algorithm 1.

Algorithm 1. Grayscale image generation algorithm

Input: network device traffic list L_t; flow length flow_len; flow length after cutting cut_len
Output: list of grayscale images of network devices P_t

cut_len = 784
png_width = 28; png_height = 28
for each flow ∈ L_t do
    flow_len = length of flow in bytes
    if flow_len > cut_len then
        truncate flow to cut_len bytes
    else
        append 0x00 to flow until flow_len = cut_len
    end if
end for
for each flow ∈ L_t do
    transform flow into a png_width × png_height grayscale image png
    insert png into P_t
end for
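A minimal Python sketch of Algorithm 1 follows. It represents each image as a 28 × 28 list of pixel values instead of writing PNG files, and it assumes row-major filling and division by 255 for normalization; the function names and both of these choices are illustrative assumptions, since the paper does not state them.

```python
CUT_LEN = 784  # bytes kept per session, as stated in the text
SIDE = 28      # 28 * 28 = 784 pixels

def flow_to_image(flow: bytes, cut_len: int = CUT_LEN, side: int = SIDE):
    """Truncate or zero-pad a session to cut_len bytes, then reshape it row by row."""
    fixed = flow[:cut_len].ljust(cut_len, b"\x00")
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]

def normalize(image):
    """Scale pixel values from 0..255 to 0.0..1.0 (an assumed normalization)."""
    return [[pixel / 255.0 for pixel in row] for row in image]

short_image = flow_to_image(b"\x10\x20\x30")          # 3 bytes, zero-padded
long_image = flow_to_image(bytes(range(256)) * 4)     # 1024 bytes, truncated to 784
model_input = normalize(short_image)
```

Each byte becomes one pixel with a gray level from 0 to 255, so the first three pixels of `short_image` are 16, 32, and 48 and all remaining pixels are 0.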

The restructured traffic data generated by each network device is converted into a 28*28 pixel grayscale image, marked with the device label, and used as the input of the convolutional neural network and multi-layer perceptron. The grayscale images generated for some of the network devices are shown in Figure 2, where 2a is the grayscale image of the D-Link camera, 2b that of the D-Link door sensor, 2c that of the D-Link switch, 2d that of the D-Link water sensor, 2e that of the Lightify lamp, and 2f that of the WeMo switch.

Figure 2.

Grayscale images of some network devices. (a) D-LinkCam, (b) D-LinkDoorSensor, (c) D-LinkSwitch, (d) D-LinkWaterSensor, (e) Lightify, (f) WeMoSwitch.

3.3. Construct an optimal deep learning model

The grayscale images of network devices generated in Section 3.2 are used as input to construct an optimal deep learning network device identification model based on a spatial attention mechanism. A relatively simple deep learning model is adopted in this paper. The convolutional neural network (CNN) is taken as an example to describe how the optimal identification model is trained.

To train the deep learning model, the pre-processed traffic data is converted into grayscale images, which are taken as the input of the model. Part of the grayscale images is used as the training set. The results of each layer are calculated through forward propagation up to the output layer, whose output is the prediction result. If the predicted results meet the expectation and the number of training iterations is sufficient, the model is taken as the optimal deep learning model for network device identification. If the predicted results do not meet the expectation, the parameters are adjusted through backpropagation, and the process is repeated until the optimal deep learning model is found and the network device type is finally output.

Forward propagation refers to calculating and storing the results of each layer of the deep learning model from input to output [23]. Its calculation formula can be expressed as

$\alpha^{k} = \sigma\left(z^{k}\right) = \sigma\left(\alpha^{k-1} * W^{k} + b^{k}\right)$ (1)

where k is the layer index, $\alpha^{k-1}$ is the output of the previous layer (the model input for the first layer), W is the weight, b is the bias, and σ is the activation function.
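As an illustration of Equation (1), the sketch below propagates an input through two small fully connected layers and stores the output of every layer. It assumes that each layer takes the activation of the previous layer as its input, uses the sigmoid as σ, and stores weights as `W[input][output]`; the layer sizes and weight values are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(a_prev, W, b, act=sigmoid):
    """One layer of Equation (1): a = act(a_prev * W + b)."""
    z = [sum(a_prev[i] * W[i][j] for i in range(len(a_prev))) + b[j]
         for j in range(len(b))]
    return [act(v) for v in z]

def forward(x, layers):
    """Apply the layers in order and keep every intermediate activation."""
    activations = [x]
    for W, b in layers:
        activations.append(layer_forward(activations[-1], W, b))
    return activations

layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),  # 2 inputs -> 2 hidden units
    ([[1.0], [-1.0]], [0.0]),                   # 2 hidden units -> 1 output
]
activations = forward([1.0, 2.0], layers)
```

The stored activations are what backpropagation would later reuse to compute gradients, which is why forward propagation keeps the result of each layer rather than only the final output.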

In this section, two convolution layers are used. The first convolution layer uses 16 convolution kernels with a step size of 1 and the ReLU activation function, and the convolution process can be expressed as

$s\left(i,j\right) = \left(X*W\right)\left(i,j\right) + b = \sum_{k=1}^{n\_in}\left(X_{k}*W_{k}\right)\left(i,j\right) + b$ (2)

where s(i,j) is the value at position (i,j) of the output matrix produced by the convolution kernel W, n_in is the number of input matrices, X_k is the kth input matrix, and W_k is the kth sub-convolution kernel matrix.
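Equation (2) can be checked on a small example with the following Python sketch. It assumes a step size of 1 and no padding, and, like most deep learning libraries, it slides the kernel without flipping it (cross-correlation); the paper does not state either convention, and the function name is illustrative.

```python
def conv2d_valid(channels, kernels, bias=0.0):
    """Compute s(i, j) = sum_k (X_k * W_k)(i, j) + b with stride 1 and no padding."""
    h, w = len(channels[0]), len(channels[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out = [[bias] * (w - kw + 1) for _ in range(h - kh + 1)]
    for X, W in zip(channels, kernels):        # sum over the n_in input matrices
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[i][j] += sum(X[i + u][j + v] * W[u][v]
                                 for u in range(kh) for v in range(kw))
    return out

X = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 input matrix (n_in = 1)
W = [[[1, 0], [0, -1]]]                  # one 2x2 sub-kernel
s = conv2d_valid(X, W, bias=0.5)
```

For a 28 × 28 grayscale image there is a single input matrix in the first layer, while the second layer sums over the feature maps produced by the first.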

In traffic-based network device identification, different features contribute differently to identifying a device. Similarly, not all regions of the image contribute equally to identification. By adding the spatial attention mechanism, the model learns from the image features which parts of the grayscale image contribute more to identifying the device, gives those parts larger weights, and weakens the features that contribute less, thereby improving the identification accuracy. A simple convolutional neural network model based on the spatial attention mechanism is shown in Figure 3.

Figure 3.

Spatial attention mechanism
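The text does not give the exact form of the attention module, so the following Python sketch shows one common design only as an assumption: at each pixel, the average and the maximum over the feature channels are combined into a score, a sigmoid turns the score into a weight between 0 and 1, and every channel is multiplied by that weight. Published modules of this kind usually combine the two maps with a learned convolution; the scalar weights `w_avg`, `w_max`, and `bias` below are a hypothetical simplification.

```python
import math

def spatial_attention(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Re-weight every spatial position of a C x H x W feature stack.

    Returns (attended_features, attention_map); map values lie in (0, 1).
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            column = [features[c][i][j] for c in range(C)]
            score = w_avg * sum(column) / C + w_max * max(column) + bias
            attn[i][j] = 1.0 / (1.0 + math.exp(-score))
    attended = [[[features[c][i][j] * attn[i][j] for j in range(W)]
                 for i in range(H)] for c in range(C)]
    return attended, attn

features = [
    [[0.0, 4.0], [0.0, 0.0]],  # channel 0: strong response at the top-right pixel
    [[0.0, 2.0], [0.0, 0.0]],  # channel 1: weaker response at the same pixel
]
attended, attn = spatial_attention(features)
```

In this toy input the top-right pixel receives a weight close to 1 while the inactive pixels receive 0.5, which illustrates how regions that contribute more to identification are emphasized relative to the rest of the image.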

The ReLU activation function is used in the above convolution process, and its formula can be expressed as

$r\left(x\right) = \max\left(0, x\right)$ (3)

Figure 4.

CNN model parameters based on spatial attention mechanism

This method adopts maximum pooling with the pooling window set to [2, 2] and the step size set to 1; its formula is given in Equation (4). Pooling layers can be placed between convolution layers to compress the image and reduce the dimension of the features, leaving only the most important features for network device identification. Pooling also helps to prevent overfitting, which is conducive to the optimization of the deep learning model. Dropout, with its parameter set to 0.25, is also used to prevent overfitting.

$Max\_pooling\left(\begin{bmatrix} a_{1} & a_{2} \\ a_{3} & a_{4} \end{bmatrix}\right) = \max\left(a_{1}, a_{2}, a_{3}, a_{4}\right)$ (4)

where a_1, a_2, a_3, and a_4 are the values inside one 2 × 2 pooling window.
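A short Python sketch of the pooling step follows, assuming no padding; the function name and the sample values are illustrative. With the 2 × 2 window and step size of 1 described in the text, each output value is the maximum of four neighbouring inputs and the feature map shrinks by one row and one column.

```python
def max_pool2d(image, window=2, stride=1):
    """Take the maximum of each window x window block, moving by `stride`."""
    h, w = len(image), len(image[0])
    return [[max(image[i + u][j + v] for u in range(window) for v in range(window))
             for j in range(0, w - window + 1, stride)]
            for i in range(0, h - window + 1, stride)]

image = [[1, 3, 2],
         [4, 6, 5],
         [7, 9, 8]]
pooled = max_pool2d(image)  # 2x2 window, step size 1 -> 2x2 output
```

A step size of 2 is the more common choice when the goal is to halve each spatial dimension; the same function covers that case through its `stride` argument.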

The second convolution layer adopts 36 convolution kernels with a step size of 1. It also uses the ReLU activation function and maximum pooling, with the same parameters as the first layer.

In this method, the parameters are adjusted and corrected through the cross-entropy loss function. The larger the cross-entropy loss, the larger the gap between the predicted output and the true label; the smaller the loss, the closer the two are. Its formula can be expressed as:

$L\left(Y | f\left(x\right)\right) = -\sum_{i=1}^{N} Y_{i} \log f\left(x_{i}\right)$ (5)

where Y represents the true value and f(x) represents the predicted value. The specific CNN model parameters and process based on the spatial attention mechanism are shown in Figure 4.
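The behaviour of Equation (5) can be seen in a small Python sketch. It treats the index i as running over the classes of a single sample with a one-hot true label, which is an assumption about the notation, and it clips predicted probabilities at a small `eps` to avoid taking the logarithm of zero.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i Y_i * log f(x_i) for one sample."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]                                      # true class is index 1
loss_good = cross_entropy(y_true, softmax([0.1, 5.0, 0.2]))   # confident and correct
loss_bad = cross_entropy(y_true, softmax([5.0, 0.1, 0.2]))    # confident and wrong
```

The loss is small when most of the predicted probability falls on the true class and grows as the prediction moves away from it, which is the signal backpropagation uses to adjust the parameters.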

The parameter settings and process of the CNN based on the spatial attention mechanism are shown above. The same attention mechanism is also applied to the multi-layer perceptron (MLP) model. Assuming that the vector X represents the input layer of the multi-layer perceptron, the hidden layer can be represented as f(W₁X+b₁), where W₁ is the weight, which can also be called the connection coefficient, b₁ is the bias, and the function f is the activation function. The output layer can then be represented as softmax(W₂X₁+b₂), where X₁ is the hidden layer's output f(W₁X+b₁). Writing the hidden-layer activation as s to distinguish it from the overall model f(x), the multi-layer perceptron model can be represented as

$f\left(x\right) = softmax\left(b^{\left(2\right)} + W^{\left(2\right)}\left(s\left(b^{\left(1\right)} + W^{\left(1\right)}x\right)\right)\right)$ (6)
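Equation (6) corresponds to the following Python sketch of a one-hidden-layer perceptron. ReLU is assumed for the hidden activation s, and the tiny layer sizes and weight values are made up for illustration; in the setting of this paper the input would be the 784 pixel values of one grayscale image and the output would have 31 entries, one per device type.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, x, b):
    """Compute b + W x, with one row of W per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp(x, W1, b1, W2, b2):
    """f(x) = softmax(b2 + W2 * s(b1 + W1 * x)) with s = ReLU."""
    return softmax(affine(W2, relu(affine(W1, x, b1)), b2))

W1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0]]  # 4 inputs -> 3 hidden units
b1 = [0.0, 0.0, 0.0]
W2 = [[1, 1, 1], [0, 0, 0]]                       # 3 hidden units -> 2 classes
b2 = [0.0, 0.0]
probs = mlp([1.0, 2.0, 3.0, 4.0], W1, b1, W2, b2)
```

The output is a probability distribution over the classes, and the predicted device type is the index with the largest probability.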

This method converts network traffic data into grayscale images and takes them as the input of the deep learning model, so that the image processing methods of deep learning can be applied to network traffic. By adding a spatial attention mechanism, the identification accuracy of network devices is improved.

4. Experiment and result analysis

In order to evaluate the performance of the network device identification method proposed in this paper, experimental verification of real data is carried out in this section. The deep learning network device identification method proposed in this paper based on spatial attention mechanism is compared with the existing network device identification method based only on CNN. The data set in reference [24] was used in the experiment, and the identification accuracy was compared with that of reference [22], which only used CNN for network device identification.

4.1. Experimental setup

This experiment uses the original network traffic pcap files captured in reference [24]. The data set contains the traffic generated during communication by 31 kinds of smart home network devices, including monitoring cameras, switches, smartwatches, and other smart home devices. The setup process of each type of network device was repeated at least 20 times to obtain the original pcap files. New traffic data was then generated from these original files by splitting and reassembling. The specific device types of this experimental data set are shown in Table 1.

Table 1.

Network device types

4.2. Optimal deep learning model training

This section makes a comparative analysis of the optimal deep learning model constructed in Section 3.3, taking the convolutional neural network as an example. The network device identification model based on the spatial attention mechanism adopts two convolution layers. The first convolution layer uses 16 convolution kernels with a step size of 1 and the ReLU activation function. The second convolution layer uses 36 convolution kernels with a step size of 1 and the ReLU activation function. Maximum pooling, the cross-entropy loss function, and the Adam optimization algorithm are also adopted.

Figures 5 and 6 show how the identification accuracy and loss value change during training under the convolutional neural network model only and under the convolutional neural network based on the spatial attention mechanism, respectively. In each figure, panel (a) shows the trend of the identification accuracy during training, and panel (b) shows the trend of the loss value during training. It can be seen from the figures that the convolutional neural network based on the spatial attention mechanism has less fluctuation and a lower loss value.

Figure 5.

Identification accuracy and loss values based on CNN model

Figure 6.

Identification accuracy and loss value of CNN model based on spatial attention mechanism

The same comparison experiment was then repeated with the multi-layer perceptron. Figures 7 and 8 show how the identification accuracy and loss value change during training under the multi-layer perceptron only and under the multi-layer perceptron based on the spatial attention mechanism, respectively.

Figure 7.

Identification accuracy and loss values based on the multi-layer perceptron (MLP) model

Figure 8.

Identification accuracy and loss value of MLP model based on spatial attention mechanism

As with the convolutional neural network, it can be seen from Figures 7 and 8 that the multi-layer perceptron model based on the spatial attention mechanism has less fluctuation and a lower loss value.

4.3. Identification accuracy experiment

This paper proposes a network device identification method based on a spatial attention mechanism. The method uses a convolutional neural network and a multi-layer perceptron model to learn the features of network devices autonomously, and adds a spatial attention mechanism to build the identification model. Kotak et al. [22] propose an automatic identification method for network devices based solely on convolutional neural networks. This experiment generates its own grayscale image data set from the network traffic data files, compares the identification accuracy of the proposed method with that of reference [22] by calculating the confusion matrix, and further tests the two methods on the multi-layer perceptron model to evaluate the effectiveness of the proposed method. In the confusion matrices, the numbers 0–30 represent the 31 types of network devices, from Aria to Withings.
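The confusion-matrix comparison can be sketched as follows. This is a minimal illustration with three classes and made-up labels, not the 31-class data of the experiment; the function names `confusion_matrix` and `per_class_accuracy` are hypothetical. Rows are assumed to index the true device and columns the predicted device, so the per-device accuracy is the diagonal entry divided by the row sum.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each class's samples that were predicted correctly."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        # None marks a class with no test samples.
        result.append(row[i] / total if total else None)
    return result


# Toy labels for three device classes (illustrative only).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = per_class_accuracy(cm)
for device_id, value in enumerate(accuracy):
    print(f"device {device_id}: {value:.0%}")
```

With few samples per class, a single misclassification moves the per-device accuracy by a large step, which is relevant when reading results for devices with small sample sizes.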

It can be seen from Figures 9 and 10 that the accuracy of devices 3 and 14, namely D-LinkDoorSensor and EdnetCam2, is low. With the convolutional neural network model only, the identification accuracy of these two devices is 75% and 33%, respectively. With the convolutional neural network model based on the spatial attention mechanism, the accuracy improves significantly, to 83% and 67%. This indicates that adding a spatial attention mechanism can improve the identification accuracy for data with a small sample size. The identification accuracy of EdnetCam2 nevertheless remains low because the data sample size of this device is small, so the deep learning model cannot learn all of its features.

Figure 9.

Confusion matrix based on CNN

Figure 10.

CNN confusion matrix based on spatial attention mechanism

It can be seen from Figures 11 and 12 that devices 3, 9, 11, 12, 22, 25, 26, and 29 are identified less accurately. A bar chart (Figure 13) is used to show in detail the comparison of identification accuracy based on the multi-layer perceptron model before and after adding the spatial attention mechanism.

Figure 11.

Confusion matrix based on MLP

Figure 12.

MLP confusion matrix based on spatial attention mechanism

As can be seen from Figure 13, the identification accuracy of these network devices is 75%, 71%, 88%, 88%, 50%, 97%, 88%, and 85% with the multi-layer perceptron (MLP) model only. With the MLP model based on the spatial attention mechanism, the accuracy improves significantly, to 83%, 100%, 100%, 100%, 100%, 100%, 100%, and 100%. It can be seen that the addition of a spatial attention mechanism can significantly improve the identification accuracy.
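The per-device improvement implied by these figures can be worked out directly. The sketch below uses only the percentages quoted in the text and assumes they are listed in the same order as the device numbers 3, 9, 11, 12, 22, 25, 26, and 29; the variable names are illustrative.

```python
# Device numbers and accuracies (in percent) as quoted in the text.
# Assumption: the accuracies are listed in the same order as the device numbers.
device_ids = [3, 9, 11, 12, 22, 25, 26, 29]
mlp_only = [75, 71, 88, 88, 50, 97, 88, 85]
mlp_attention = [83, 100, 100, 100, 100, 100, 100, 100]

# Gain in percentage points for each device.
gains = {
    device: after - before
    for device, before, after in zip(device_ids, mlp_only, mlp_attention)
}
mean_gain = sum(gains.values()) / len(gains)

for device, gain in gains.items():
    print(f"device {device}: +{gain} percentage points")
print(f"mean gain over these devices: {mean_gain} percentage points")
```

This average covers only the eight weakest devices, so it is much larger than the overall accuracy gain of the MLP model across all 31 device types.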

Figure 13.

Comparison of identification accuracy based on MLP model before and after the addition of spatial attention mechanism

To evaluate the performance of the proposed method, three performance indexes, namely Accuracy, Recall, and F1-Score, were used in this experiment to assess its effectiveness in terms of identification accuracy. Table 2 shows the performance comparison of CNN and MLP before and after the addition of the spatial attention mechanism.
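As a reminder of how these three indexes relate to a confusion matrix, here is a minimal sketch. It assumes rows are true classes and columns are predicted classes, and it assumes macro averaging (an unweighted mean over classes) for Recall and F1-Score; the text does not state which averaging was used, and the function name `macro_metrics` and the toy matrix are hypothetical.

```python
def macro_metrics(matrix):
    """Accuracy, macro-averaged recall, and macro-averaged F1 from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    recalls, f1_scores = [], []
    for i in range(n):
        true_positive = matrix[i][i]
        actual = sum(matrix[i])                         # samples of class i
        predicted = sum(matrix[r][i] for r in range(n)) # samples predicted as class i
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy three-class confusion matrix (illustrative only, not experimental data).
toy_matrix = [[3, 1, 0], [0, 2, 1], [0, 0, 3]]
metrics = macro_metrics(toy_matrix)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging gives every device type the same weight, so classes with few samples influence the score as much as well-represented ones. A sample-weighted average would behave differently.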

Table 2.

Performance comparison of the deep learning models before and after adding the spatial attention mechanism

The above experimental results show that the network device identification method based on the spatial attention mechanism proposed in this paper achieves higher accuracy than the identification method based on deep learning alone. Evaluated on the public data set, the proposed method outperforms the existing deep learning-based network device-type identification methods.

5. Conclusion

Existing traffic packet-based network device identification methods for the metaverse often require manual participation, and they have difficulty capturing the small differences between similar devices, which leads to identification errors. In this paper, a deep learning network device identification method based on a spatial attention mechanism is proposed. Firstly, the required feature fields are extracted from the acquired network traffic data. Then the data is normalized and converted into grayscale images, which serve as the input of the deep learning algorithm. After that, the spatial attention mechanism is added to the convolutional neural network and the multi-layer perceptron respectively, to increase the differences between similar network devices, improve the autonomous feature learning ability of the model, and further improve the identification accuracy. Finally, network devices are identified based on the deep learning model. A large number of experiments were carried out on 31 types of network devices such as web cameras, wireless routers, and smartwatches. The results show that the accuracy of the proposed CNN identification method based on the spatial attention mechanism is increased by 0.8% compared with the typical identification method based on CNN only. The proposed MLP identification method based on the spatial attention mechanism improves accuracy by 2.0% compared with the identification method based only on the MLP model. In future work, we will focus on large-scale identification of network devices in the metaverse and on enhancing the applicability of the method, to achieve accurate identification of network devices and improve their security in the metaverse.

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

The original data are available from corresponding authors upon reasonable request.

Authors’ Contributions

Methodology, Wang XT; validation, Li RX and Du SY; formal analysis, Wang XT, Li RX and Du SY; investigation, Wang XT, Li RX and Du SY; data curation, Wang XT, Li RX and Du SY; writing-original draft preparation, Wang XT; writing-review and editing, Luo XY; visualization, Wang XT; supervision, Luo XY and Du SY; funding acquisition, Luo XY. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

We would like to thank Pei Zhang, Guo Wei and others for helping us check the details and providing us with valuable suggestions in this paper.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2022YFB3102900), the National Natural Science Foundation of China (No. U1804263, 62172435 and 62002386) and the Zhongyuan Science and Technology Innovation Leading Talent Project, China (No. 214200510019).

References

[1] Ning H., Wang H. and Lin Y. et al. A survey on metaverse: the state-of-the-art, technologies, applications, and challenges, arXiv preprint [arXiv:2111.09673], 2021.
[2] Grider D. and Maximo M. The metaverse: Web3.0 virtual cloud economies, (accessed on 1 November 2021). https://grayscale.com/wpcontent/uploads/2021/11/Graysca-le_Metaverse_Report_Nov2021.pdf.
[3] Lee L.H., Braud T. and Zhou P. et al. All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, and research agenda, arXiv preprint [arXiv:2110.05352], 2021.
[4] Sajjad R. and Martin M. The Metaverse and Beyond: Implementing Advanced Multiverse Realms With Smart Wearables. IEEE Access 2022; 10: 110796–110806.
[5] Yang Q., Zhao Y., Huang H. and Zheng Z. Fusing blockchain and AI with metaverse: A survey, arXiv preprint [arXiv:2201.03201], 2022.
[6] Li Q., Feng X. and Wang H. Automatically discovering surveillance devices in the cyberspace. In: Proceedings of the 8th ACM on Multimedia Systems Conference, New York, 2017, 331–42.
[7] Zhang H., Han W. and Lai X. et al. Survey on cyberspace security. Sci Sin Inf 2016; 46: 125–64.
[8] Feng G.D., Zhang Y. and Zhang Y.Q. Overview of information security risk assessment. J China Inst Commun 2004; 25: 10–8.
[9] Xi R.G., Yun X.C. and Jin S.Y. Research survey of network security situation awareness. Comput Appl 2012; 32: 1–4.
[10] Shah S. An introduction to HTTP fingerprinting. Net-Square Solutions, 2004.
[11] Gao K. A passive approach to wireless device fingerprinting. In: Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems & Networks, 2010, 383–92.
[12] Luo X.Y., Liu Y. and Yin M.J. Network Space Mapping. Beijing: Science Press, 2021.
[13] Genevieve B., John H. and Christos P. Understanding passive and active service discovery. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, 2007, 57–70.
[14] Wang C.D., Guo Y.B., Zhen S.H. and Yang W.C. Research on network asset detection technology. Comput Sci 2018; 45: 24–31.
[15] Li R., Shen M. and Yu H. et al. A survey on cyberspace search engines. In: Proceedings of the CCIS, 2020, 206–14.
[16] Arunan S., Hassan H.G. and Franco L. et al. Classifying IoT devices in smart environments using network traffic characteristics. IEEE Trans Mobile Comput 2018; 18: 1745–59.
[17] Aneja S., Aneja N. and Islam M.S. IoT device fingerprint using deep learning. In: Proceedings of the 2018 IEEE International Conference on Internet of Things and Intelligence System, 2018, 174–79.
[18] Greis J., Yushchenko A., Vogel D., Meier M. and Steinhage V. Automated Identification of Vulnerable Devices in Networks using Traffic Data and Deep Learning, arXiv preprint [arXiv:2102.08199], 2021.
[19] Umair M.B., Iqbal Z. and Bilal M. et al. An efficient internet traffic classification system using deep learning for IoT. Comput Mater Continua 2022; 71: 407–22.
[20] Zhu B.K., Hou X.Y., Liu S.M., Ma W.L., Dong M.Y., Wen H.B., Wei Q., Du S.X. and Zhang Y.F. IoT Equipment Monitoring System Based on C5.0 Decision Tree and Time-Series Analysis. IEEE Access 2021; 10: 36637–36648.
[21] Meidan Y., Bohadana M. and Shabtai A. et al. ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis. In: Proceedings of the Symposium on Applied Computing, 2017, 506–09.
[22] Kotak J. and Elovici Y. IoT device identification using deep learning. In: Proceedings of the 13th International Conference on Computational Intelligence in Security for Information Systems, Spain, 2019, 76–86.
[23] Shiv R.D., Satish K.S. and Bidyut B.C. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022; 503: 92–108.
[24] Miettinen M., Marchal S. and Hafeez I. et al. IoT sentinel: Automated device-type identification for security enforcement in IoT. In: Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017.

Xiuting Wang received the B.S. degree from Zhengzhou University, Zhengzhou, China, in 2019. She received her M.S. degree from the State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China, in 2022. She is currently teaching at Henan Polytechnic Institute. Her major research direction is network device identification and data analysis.

Ruixiang Li received the B.S. and M.S. degrees from the State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China, in 2016 and 2019, respectively. He is currently pursuing his Ph.D. at the State Key Laboratory of Mathematical Engineering and Advanced Computing. His research interests include IP geolocation, device recognition and data analysis.

Shaoyong Du received the B.E. degree in software engineering from Zhengzhou University, in 2012, and the Ph.D. degree in computer science and technology from Nanjing University, in 2019. His current research focuses on security and privacy in mobile computing.

Xiangyang Luo received the B.S., M.S., and Ph.D. degrees from the State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China, in 2001, 2004, and 2010, respectively. He is currently a professor at Zhengzhou Science and Technology Institute and the State Key Laboratory of Mathematical Engineering and Advanced Computing. His research interests lie in multimedia security and cyberspace surveying and mapping.

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Ning H., Wang H. and Lin Y., et al. A survey on metaverse: the state-of-the-art, technologies, applications, and challenges, arXiv preprint arXiv preprint [arXiv:2111.09673], 2021. [Google Scholar]

[2] Grider D., Maximo M. The metaverse: Web3.0 virtualcloud economies, (accessed on 1 November 2021). https://grayscale.com/wpcontent/uploads/2021/11/Graysca-le_Metaverse_Report_Nov2021.pdf. [Google Scholar]

[3] Lee L.H., Braud T. and Zhou P., et al. All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, andresearch agenda, arXiv preprint [arXiv:2110.05352], 2021. [Google Scholar]

[4] Sajjad R., Martin M., The Metaverse and Beyond: Implementing Advanced Multiverse Realms With Smart Wearables[J], IEEE Access, 2022; 10: 110796–110806. [Google Scholar]

[5] Yang Q., Zhao Y., Huang H., Zheng Z. Fusing blockchain and Awith metaverse: A survey, arXiv preprint [arXiv:2201.03201], 2000. [Google Scholar]

[6] Li Q., Feng X. and Wang H. Automatically discovering surveillance devices in the cyberspace. In: Proceedings of the 8th ACM on Multimedia Systems Conference, New York, 2017, 331–42. [Google Scholar]

[7] Zhang H., Han W. and Lai X. et al. Survey on cyberspace security. Sci Sin Inf 2016; 46: 125–64. [CrossRef] [Google Scholar]

[8] Feng G.D., Zhang Y. and Zhang Y.Q. Overview of information security risk assessment. J China Inst Commun 2004; 25: 10–8. [Google Scholar]

[9] Xi R.G., Yun X.C. and Jin S.Y., Research survey of Network security situation awareness. Comput Appl 2012; 32: 1–4. [Google Scholar]

[10] Shah S. An introduction to HTTP fingerprinting. Net-Square Solutions, 2004. [Google Scholar]

[11] Gao K. A passive approach to wireless device fingerprinting. In: Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems & Networks, 2010, 383–92. [Google Scholar]

[12] Luo X.Y., Liu Y. and Yin M.J., Network Space Mapping. Beijing: Science Press, 2021. [Google Scholar]

[13] Genevieve B., John H. and Christos P. Understanding passive and active service discovery. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, 2007, 57–70. [Google Scholar]

[14] Wang C.D., Guo Y.B., Zhen S.H. Yang WC. Research on network asset detection technology. Comput Sci 2018; 45: 24–31. [Google Scholar]

[15] Li R, Shen M and Yu H et al. A survey on cyberspace search engines. In: Proceedings of the CCIS, 2020, 206–14. [Google Scholar]

[16] S. Arunan, H.G. Hassan and L. Franco et al., Classifying IoT devices in smart environments using network traffic characteristics. IEEE Trans Mobile Comput 2018; 18: 1745–59. [Google Scholar]

[17] Aneja S., Aneja N. and Islam M.S. IoT device fingerprint using deep learning. In: Proceedings of the 2018 IEEE International Conference on Internet of Things and Intelligence System, 2018, 174–79. [Google Scholar]

[18] Greis J., Yushchenko A., Vogel D., Meier M. Steinhage V. Automated Identification of Vulnerable Devices in Networks using Traffic Data and Deep Learning, [arXiv:2102.08199], 2021. [Google Scholar]

[19] Umair M.B., Iqbal Z. and Bilal M. et al. An efficient internet traffic classification system using deep learning for IoT. Comput Mater Continua 2022; 71: 407–22. [CrossRef] [Google Scholar]

[20] Zhu B.K., Hou X.Y., Liu S.M., Ma W.L., Dong M.Y., Wen H.B., Wei Q., Du S.X. and Zhang Y.F. IoT Equipment Monitoring System Based on C5.0 Decision Tree and Time-Series Analysis[J]. IEEE Access 2021; 10: 36637–36648. [Google Scholar]

[21] Meidan Y., Bohadana M. and Shabtai A., et al. ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis. In: Proceedings of the Symposium on Applied Computing, 2017, 506–09. [Google Scholar]

[22] Kotak J., Y. Elovici IoT device identification using deep learning. In: Proceedings of the 13th International Conference on Computational Intelligence in Security for Information Systems, Spain, 2019, 76–86. [Google Scholar]

[23] Shiv R.D., Satish K.S. and Bidyut B.C. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022; 503; 92–108. [CrossRef] [Google Scholar]

[24] Mettinen M, Marchal S and Hafeez I et al. IoT sentinel: Automated device-type identification for security enforcement in IoT. In: Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017. [Google Scholar]