INTRODUCTION
Infectious diseases such as influenza and Coronavirus Disease 2019 (COVID-19) cause millions of deaths around the world [1] [2]. The pathogens of such diseases are mainly spread by droplets or aerosols produced by coughing, sneezing, and similar actions [3]. Due to the pandemic caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is an urgent need to limit airborne transmission of COVID-19. The target is to develop and implement effective methods or mechanisms to reduce the number of airborne particles such as viruses. Dissemination of infectious pathogens in crowded areas can be significant and, in many scenarios, mechanisms are required to protect people from exposure to pathogens [4]. One of the most popular mechanisms is the face mask, which in some regions is required by law [5]. The World Health Organization (WHO) issued a guide to the use of face masks as a mechanism to reduce the risk of exposure to COVID-19 [6]. In the document, the WHO states: “Place the mask carefully, ensuring it covers the mouth and nose, and tie it securely to minimize any gaps between the face and the mask”. The guide aims to help people understand the benefits of using a face mask and the risks associated with not wearing one or wearing it incorrectly. Despite the requirements and regulations, people hesitate to wear a mask, or they wear it incorrectly.
The spread of SARS-CoV-2 has affected countries all over the world, and technology has an important role to play in the response. Today's technology has enabled some sectors, such as schools, to continue operating, but other areas or jobs still require face-to-face contact, for instance, hospitals. To reactivate the economy, a certain level of on-site or face-to-face activity is needed [7], always observing healthcare regulations such as wearing a face mask. The Internet of Things (IoT), together with artificial intelligence (AI) techniques, could provide useful solutions for the COVID-19 pandemic.
Internet of Things (IoT) techniques have been crucial against this pandemic, especially for detecting and tracking infected people. In [8], the authors proposed an IoT system for collecting vital signs from different users. With this system, important data can be collected and analyzed for a better understanding of the symptoms and of the virus. Artificial intelligence (AI) has also been very important in fighting this pandemic. One example is algorithms that detect whether a person is infected with COVID-19. An image classification algorithm using deep learning is proposed in [9] to detect infections in X-ray images. With these algorithms, the images can be processed and enhanced to help doctors reach a better diagnosis.
To keep track of people wearing face masks, a surveillance camera can be used to detect in real time whether someone is wearing a mask; this is possible thanks to the development of AI. In [10], the authors proposed a method for detecting anomalies in surveillance videos using deep learning techniques. One advantage of using AI is that no person needs to watch the location at every moment.
This paper presents the implementation of a face mask detection system that uses augmented reality as a tracking mechanism to trigger a screen projection on a mobile device, which is used to request access to critical areas where the correct use of a face mask is required. To achieve this, a machine learning model based on Convolutional Neural Networks is built on top of an IoT framework to enforce the correct use of the face mask in required areas.
Cyber-Physical Systems
Cyber-Physical Systems (CPS) refers to the integration of the physical part and the computations of a system, with a focus on their interaction [11]. Although this integration is not new, as embedded systems have been around for a while [12], the term CPS is relatively recent: in 2006, Helen Gill introduced it and related it to the concept of cybernetics [11].
CPS is growing very fast, and its growth is closely related to that of other technologies such as the Internet of Things and cloud computing. The applications of this kind of system are very broad; important ones include health care, smart cities, industrial processes, and machine connectivity, to mention a few.
Deep learning
Deep learning is one of the main branches of machine learning. Deep learning algorithms are composed of multiple layers that represent learning at different levels; this representation is inspired by biological neural networks [13]. Deep learning uses these Artificial Neural Networks (ANN) to feed a machine with information and generate knowledge without human interaction. Over the last few years, deep learning has been a trend in AI and machine learning systems. It is widely used in applications such as speech recognition, object detection, natural language processing (NLP), image classification, and many more [14].
An important asset for Deep Learning is data; a lot of data is needed to give the machine enough information to make good decisions. These algorithms use the new information to change the internal parameters in the ANN for better future performance [14].
Convolutional Neural Network
Convolutional Neural Networks (CNN) have been widely used in recent years for real-time applications such as face detection [15]. This class of networks can automatically extract features from the input data and assign importance to them through learnable parameters such as weights. This is done in the convolutional layers. Once the features are extracted, the following layers apply filters and reduce the number of parameters; these are the pooling layers [16].
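The two layer types described above can be illustrated with a minimal pure-Python sketch. It is not the network used in this work: the toy image, the hand-written edge kernel, and the function names `convolve2d` and `max_pool2d` are assumptions made only for illustration, and in a real CNN the kernel weights are learned during training.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical hand-written kernel that responds to vertical edges;
# in a trained CNN these weights are learned, not chosen by hand.
kernel = [[-1, 1], [-1, 1]]

features = convolve2d(image, kernel)   # 4x4 feature map
pooled = max_pool2d(features, size=2)  # 2x2 map after pooling
```

The convolution responds only where the edge is, and pooling keeps that response while shrinking the map from 4x4 to 2x2, which is how pooling reduces the amount of data passed to later layers.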
A basic example of CNN is shown in Figure 1.
Image Classification
Image recognition and classification are difficult tasks for machines [17]. Deep learning methods are used to process the images to obtain better data and perform the classification; this processing can include noise reduction, slight improvement, color correction, etc. Multiple images are needed to feed the algorithms to get better results. Some techniques improve the quality and quantity of the training data so that the algorithms work better in different types of environments; this is called data augmentation [18].
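As a rough illustration of data augmentation, the sketch below generates perturbed copies of a tiny grayscale image using a random horizontal flip and a random brightness shift. The specific transformations, the brightness range, and the function names are assumptions for illustration; the source does not state which augmentations were applied.

```python
import random


def horizontal_flip(image):
    """Mirror a grayscale image (list of rows) left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, delta):
    """Add delta to every pixel, clipping to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image, rng):
    """Return one randomly perturbed copy of the image."""
    out = image
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    # Hypothetical brightness range; real pipelines tune this per dataset.
    return adjust_brightness(out, rng.randint(-30, 30))


image = [[10, 200], [250, 0]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented = [augment(image, rng) for _ in range(4)]
```

Each original image yields several variants, which increases the effective size of the training set without collecting new pictures.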
Internet of Things
The Internet of Things (IoT), also known as the Internet of Objects or the Internet of Everything, refers to the interconnected network of all kinds of objects, which are often equipped with data processing technology [19]. Experts estimate that by the end of 2025 there will be approximately 75 billion devices connected to the internet [20].
MATERIALS AND METHODS
In this project, it is fundamental to integrate several technologies so that communication persists and remains consistent from the sender to the receiver, that is, from the physical machine to the digital information visualization system. MQTT (Message Queuing Telemetry Transport) is a well-known lightweight messaging protocol for sensors, mobile devices, and IoT systems [21], widely used to manage message transport from publishers to subscribers. This protocol must be combined with other technologies to reach its full potential. This section describes each of the components of the system and how they bridge the layers of the overall framework.
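The publish/subscribe idea behind MQTT can be sketched with a small in-memory stand-in for a broker. This is not an MQTT implementation (there is no network, QoS, or wildcard matching), and the class name `TinyBroker` and the topic names are hypothetical.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory stand-in for an MQTT broker (no network, no QoS, no wildcards)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for messages on this exact topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver the payload to every client subscribed to this exact topic."""
        for callback in self._subscribers[topic]:
            callback(topic, payload)


broker = TinyBroker()
received = []
# "access/control" is a hypothetical topic name.
broker.subscribe("access/control", lambda topic, payload: received.append(payload))
broker.publish("access/control", "request")
broker.publish("other/topic", "ignored")  # no subscriber, so nothing is delivered
```

Publishers and subscribers never address each other directly; they only share a topic name, which is what lets devices be added or replaced without changing the others.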
An algorithm to explain each of the steps followed in this project is presented in Figure 2. Each step will be described in the following subsections.
Methodology
Figure 3 shows the IoT framework for this project. This framework is based on the architecture proposed and explained by the authors in [22]. At the center is the MQTT Mosquitto broker, which is in charge of communicating all devices and states. The clients are subscribed to an assigned topic, which serves as the main communication channel. This topic carries all the user states, such as the username, ID, connection attempts, and whether the user is wearing a mask, in order to grant or deny access. The message uses the JavaScript Object Notation (JSON) format, a lightweight data-interchange format that is easy for humans to read and write and suitable for machines to process.
The three MQTT clients used in this IoT framework are one Raspberry Pi device, one computer, and the mobile device with the access request application. The Raspberry Pi client controls the camera and the servomotor: it turns on the camera when access is requested and sends the Open signal to the servomotor, which acts as the actuator, if access is granted. The computer client receives all the MQTT server stats, stores them in the database, and updates the dashboard. The mobile device client sends the access request together with the user parameters.
The camera detects in real time when someone requires access to the area of interest. This means that if a person puts on the mask or takes it off within a very short time, it will be detected. The Raspberry Pi receives the data from the camera and constantly communicates those values to the service layer through the internet.
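A message of the kind described above could look like the following sketch. The key names and values are hypothetical: the source lists the kinds of data carried (username, ID, connection attempts, mask status) but not the exact schema.

```python
import json

# Hypothetical field names and values; only the kinds of data are given in the text.
state = {
    "username": "jdoe",
    "id": 42,
    "attempts": 1,
    "mask": True,
}

payload = json.dumps(state)    # text that would be published to the topic
decoded = json.loads(payload)  # what a subscribed client would recover

# A subscriber can then derive the access decision from the mask flag.
access = "granted" if decoded["mask"] else "denied"
```

Because JSON is plain text, the same payload can be read by the Raspberry Pi, the computer client, and the mobile application without a shared binary format.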
The camera device and the servo motor correspond to the Physical layer. The Raspberry Pi device communicates the Physical layer status as a gateway, so it corresponds to the Communication layer. All data is transferred through the internet and stored in a database, reaching the Service layer. For this case, services are implemented on a local server configured with Apache Server.
The last layer is the Visualization layer. Once the data is stored and processed, it must be visualized. The parameters of the access control system are displayed on the dashboard and in the mobile device application to inform users of their status when access is requested. This layer also displays the ID and picture of the person requesting access.
Face Mask Detection Algorithm
To detect in real time whether the person in front of the camera is wearing a mask, a detector model was built. To train the model, images were taken from the Kaggle Face Mask Detection Dataset [23]. This dataset consists of 3725 images of people wearing a mask and 3728 images of people not wearing one. An example image for each class can be seen in Figure 4. Masks with different colors, shapes, and textures were included to ensure that as many types of face masks as possible were taken into consideration. For the current model, images of people wearing the mask incorrectly were not used.
The model uses a CNN and deep learning to extract and process the data and produce a classification output. The CNN is designed using the Keras and TensorFlow libraries for Python and the MobileNetV2 architecture. This architecture shows acceptable performance with low computational power [14], which makes the model suitable for embedded devices. Once the model was trained, it was deployed to the Raspberry Pi and camera to start the real-time detection, as shown in Figure 5.
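Two small steps surround such a classifier at inference time: scaling the input pixels and turning the output scores into a label. The sketch below shows both in plain Python. The scaling to the range [-1, 1] matches what the Keras MobileNetV2 preprocessing does; the two-probability output layout, the tie-breaking rule, and the function names are assumptions, since the source does not describe the output layer.

```python
def scale_pixels(pixels):
    """Map 0-255 pixel values to [-1, 1], the input range MobileNetV2 expects."""
    return [p / 127.5 - 1.0 for p in pixels]


def label_from_probabilities(p_mask, p_no_mask):
    """Pick the class with the higher score; ties count as 'no_mask' (deny by default)."""
    return "mask" if p_mask > p_no_mask else "no_mask"


scaled = scale_pixels([0, 127.5, 255])
# Hypothetical model output for one frame.
label = label_from_probabilities(0.93, 0.07)
```

Resolving ties toward "no_mask" is a design choice for an access control setting, where wrongly granting access is the more costly error.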
System Modelling
Cyber-physical systems, like the one presented in this paper, can be modeled using state machines to represent their behavior. For the design of the dynamics of the access control system, MATLAB’s Simulink was used. Figure 6 shows the layout where its operation is described.
The state machine represented in Simulink Stateflow manages the behavior of the system according to the inputs it receives. These inputs are the access attempt that the person sends with their username from a mobile application, the result of the mask detection model on the image captured in real time by the camera, and the successful or failed connection to the MQTT server. The state machine can be seen in Figure 7.
The first state waits for an access attempt made from the mobile app. Once an access attempt is detected, the system moves to the next state, which checks the connection with the MQTT server. If the connection is successful, the system moves on to the third state; otherwise, it returns to the first state, and the user must retry the access until there is a successful connection. In this state, user information is sent to a database. The third state checks whether the person who wants to enter is wearing a mask: if so, access is granted and the door lock is commanded to open; if not, entry is denied.
The database subsystem receives the user's information, stores it, and transmits it to a dashboard designed in HTML where all the access attempts can be visualized. The door lock subsystem is responsible for controlling the servomotor or any other lock mechanism that may be selected.
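The three states described above can be sketched as a small transition function. This is an illustrative Python rendering, not the Simulink Stateflow chart itself; the state names, the `step` function, the command strings, and the return to the first state after a decision are assumptions.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_FOR_ATTEMPT = auto()
    CHECK_CONNECTION = auto()
    CHECK_MASK = auto()


def step(state, access_attempt=False, mqtt_connected=False, mask_detected=False):
    """Return (next_state, door_command) for one evaluation of the machine.

    door_command is None unless a decision is reached in CHECK_MASK.
    """
    if state is State.WAIT_FOR_ATTEMPT:
        return (State.CHECK_CONNECTION if access_attempt else state), None
    if state is State.CHECK_CONNECTION:
        # A failed connection sends the user back to retry the request.
        return (State.CHECK_MASK if mqtt_connected else State.WAIT_FOR_ATTEMPT), None
    # CHECK_MASK: grant or deny, then wait for the next attempt (assumed).
    return State.WAIT_FOR_ATTEMPT, ("open" if mask_detected else "deny")


# One successful pass through the three states.
state = State.WAIT_FOR_ATTEMPT
state, _ = step(state, access_attempt=True)
state, _ = step(state, mqtt_connected=True)
state, command = step(state, mask_detected=True)
```

Keeping the transition logic in a pure function makes each path (failed connection, missing mask) easy to check in isolation before wiring it to the camera and the lock.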
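The logging role of the database subsystem can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a self-contained stand-in; the source does not name the database engine, and the table layout, column names, and `log_attempt` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration only
conn.execute(
    "CREATE TABLE attempts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " username TEXT NOT NULL,"
    " mask INTEGER NOT NULL,"      # 1 = mask detected, 0 = no mask
    " granted INTEGER NOT NULL)"   # 1 = access granted, 0 = denied
)


def log_attempt(conn, username, mask):
    """Store one access attempt; access is granted only when a mask is detected."""
    conn.execute(
        "INSERT INTO attempts (username, mask, granted) VALUES (?, ?, ?)",
        (username, int(mask), int(mask)),
    )
    conn.commit()


log_attempt(conn, "jdoe", True)
log_attempt(conn, "asmith", False)

# Rows a dashboard could render, newest first.
rows = conn.execute(
    "SELECT username, mask, granted FROM attempts ORDER BY id DESC"
).fetchall()
```

The parameterized `INSERT` keeps user-supplied names from being interpreted as SQL, which matters because the username arrives from a mobile client.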
Mobile Application
The Android-based mobile application runs the image tracking detection developed with the Unity 3D graphics engine and the Vuforia Engine SDK.
Unity is a very popular video game engine used to create sophisticated video games and a wide range of interactive apps for several kinds of users and industries.
Vuforia Engine is a software development kit (SDK) that is straightforward to integrate and uses recent computer vision techniques for tracking or recognizing images and objects in Augmented Reality applications [24]. It controls a camera sensor that captures frames and passes them to computer vision algorithms, which detect and track real-world objects and compare them with the targets registered through the Vuforia web-based developer tools [25].
The Vuforia Engine SDK and Unity Engine’s advantages to track and display content on the handheld device are applied to this work.
In this project, the Vuforia Engine SDK and the Unity Engine were used to develop a tracking app that triggers a mobile user interface where users enter their credentials to request access (Figure 8).
The app is installed on a handheld device, which displays in full-screen mode the user interface through which the access control connection is made. The mobile application uses the M2Mqtt library, an MQTT client available for all .NET platforms for IoT and M2M communication.
RESULTS AND DISCUSSION
Detection Model Performance
The metrics used to evaluate the performance of the detection model are Precision, Recall, F1-score, and Accuracy. These metrics are described below.
In Equations (1) to (4), True Positives (TruePos) are the images correctly classified as positives, in this case, people wearing masks.
Similarly, True Negatives (TrueNeg) are the images correctly classified as negatives, people not wearing masks. False Positives (FalsePos) are those cases when the image is classified as positive, but it is labeled as negative. False Negatives (FalseNeg), on the other hand, are those cases when the image is classified as a negative but is labeled as a positive.
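Assuming Equations (1) to (4) follow the standard definitions of these metrics, and that they are numbered in the order in which the metrics are listed, they can be written with the terms defined above as:

```latex
\begin{align}
\text{Precision} &= \frac{\text{TruePos}}{\text{TruePos} + \text{FalsePos}} \tag{1}\\
\text{Recall} &= \frac{\text{TruePos}}{\text{TruePos} + \text{FalseNeg}} \tag{2}\\
\text{F1-score} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}\\
\text{Accuracy} &= \frac{\text{TruePos} + \text{TrueNeg}}{\text{TruePos} + \text{TrueNeg} + \text{FalsePos} + \text{FalseNeg}} \tag{4}
\end{align}
```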
The obtained metrics for the face mask detector model after the training are presented in Figure 9.
The model presents an average accuracy of 96% when classifying if a person is wearing a mask or not.
The behavior of the model after 20 epochs of training can be seen in Figure 10. The training loss decreases as the model is trained, while the accuracy of the model increases. The total training time was close to 40 minutes.
The Confusion Matrix presented in Figure 11 can lead to a better understanding of the model’s results and shows where it gets confused.
This model correctly identified 685 images of people wearing masks (91.95% True Positives) and 762 images of people not wearing masks (99.48% True Negatives). However, the model incorrectly classified 60 images as people not wearing masks when in fact they were (8.05% False Negatives). Finally, it incorrectly classified 4 images as people wearing a mask when they were not (0.52% False Positives). This low False Positive rate is favorable for the proposed system, since errors will be minimal when a person tries to enter without a mask.
The ROC curve displayed in Figure 12 shows the performance of the model when it is trying to differentiate one class from the other with the default threshold of 0.5. An ideal model will have an Area Under the Curve (AUC) of 1. This model presents an AUC of 0.96, which represents good performance in distinguishing between the 2 classes.
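As a worked check, the four metrics can be recomputed from the confusion matrix counts reported above, using the standard definitions with "wearing a mask" as the positive class. The variable names are illustrative.

```python
# Counts reported in the confusion matrix (positive class = wearing a mask).
true_pos, false_neg = 685, 60
true_neg, false_pos = 762, 4

precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f1_score = 2 * precision * recall / (precision + recall)
accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1_score:.4f} accuracy={accuracy:.4f}")
```

The accuracy comes out at about 0.958, consistent with the reported average accuracy of 96%, and the recall of about 0.919 matches the 91.95% of mask images identified correctly.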
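One way to obtain a value consistent with the reported AUC is to assume the ROC curve is built from hard predictions at the single 0.5 threshold, so that it has only one interior point. That assumption, and the use of the confusion matrix counts (685, 60, 762, 4) as input, are not confirmed by the source; the sketch below only shows that the numbers agree under it.

```python
# Confusion matrix counts (positive class = wearing a mask).
true_pos, false_neg = 685, 60
true_neg, false_pos = 762, 4

tpr = true_pos / (true_pos + false_neg)   # true positive rate (recall)
fpr = false_pos / (false_pos + true_neg)  # false positive rate

# ROC points for a single threshold: (0, 0) -> (fpr, tpr) -> (1, 1).
points = [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]

# Trapezoidal rule over consecutive points.
auc = sum((x1 - x0) * (y0 + y1) / 2
          for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With a single operating point, the area reduces to the mean of the true positive rate and the true negative rate, which rounds to 0.96 for these counts. A ROC curve computed from the model's continuous scores would generally give a different, usually higher, AUC.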
Limitations and Future work
One limitation of the current model is that it was trained only with images of people wearing or not wearing a mask. Cases where the person wears the mask incorrectly were not taken into consideration, although the model usually classifies these cases as not wearing a mask (see Figure 5). A third class could be added for cases where the person is wearing the mask incorrectly; this would help the detector perform better.
Another limitation of this project is the hardware of the embedded system. For this proposal, a Raspberry Pi 4 is used, which has certain constraints when working with real-time object detection.
Currently, the time elapsed from when the user requests access until it is determined whether the person is wearing a mask and access is granted is 2 to 2.5 seconds. This would improve with a device with a higher GPU capacity.
The number of devices could be reduced by running the MQTT Mosquitto server and the Apache server on the same Raspberry Pi.
Since this work employs the target tracking technique through the Unity and Vuforia engines, Augmented Reality could be exploited further by adding useful and attractive information to the field of view of the user's mobile device, such as user interfaces, videos, 3D objects, visual animations, or other features like security and health information.
This system is flexible and adaptable to any area, section, room, department, or other place according to the needs of the company or institution. The door lock mechanism may differ for each access control point, and it can be as simple or complex as required, for example, a servomotor or an electromagnetic door lock.
Another implementation possibility is to add control parameters to the system, such as the ID of the Access Point or other health measures from the person.
Push notifications or alert messages can be used to notify supervisors. The database could save the surveillance frame when a person is not wearing a mask and store the evidence for future reference.
CONCLUSIONS
In this paper, an autonomic face mask detection system applying deep learning was proposed for controlling access to critical areas. The face mask detector showed an average accuracy of 96% when detecting whether the user requesting access is wearing a mask, which can be considered good performance given that the model was created using a CNN with the MobileNetV2 architecture for devices with low computational power. The confusion matrix shows that the model correctly classified 91.95% of the images of people wearing masks (True Positives) and 99.48% of the images of people not wearing a mask (True Negatives). It produced 8.05% False Negatives but only 0.52% False Positives, which is favorable, since the system will rarely grant access to people not wearing a mask.
The integration and connection between all the devices are made possible by IoT. Users request access from their mobile device through image tracking (with the Vuforia app). The request reaches a Mosquitto MQTT server, which forwards it to the embedded device (Raspberry Pi 4) in charge of granting access using the camera and the face mask detection model. This occurs in 2 to 2.5 seconds, which could be reduced if a device with higher graphics processing capacity is used.
The use of face masks is essential in times of pandemic, and measures must be taken to ensure that people who leave their homes always wear one when entering public places or places with a lot of contact with other people, which are high-risk conditions for COVID-19 infection. This project shows how technologies such as the IoT, artificial intelligence, and augmented reality can be integrated to help with this problem. With this system, a culture of health can also be fostered in which the use of the mask is mandatory and essential to life in the “new normal”.
The access system has the potential to be installed in different areas and adapted according to the needs of the establishment. The results shown in this work revealed an efficient system to control and collect information remotely, without the need for face-to-face monitoring.
A face mask detection system using artificial intelligence and powered by IoT technologies, like the one shown in this paper, has a wide application potential. Everything seems to indicate that the use of face masks will be a measure that should be adopted in different work centers and crowded places. The experience with COVID-19 should be used for the next health contingencies that could potentially occur in the following years.