Advances in Computer Vision, Volume II

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (25 February 2023) | Viewed by 20797

Special Issue Editor


Prof. Yuan-Kai Wang
Guest Editor
Electrical Engineering, Fu Jen Catholic University, New Taipei 24205, Taiwan
Interests: intelligent video surveillance; face recognition; deep learning for object detection; robotic vision; embedded computer vision; sleep healthcare; neuromorphic computing

Special Issue Information

Dear Colleagues,

Computer vision has become one of the most successful research topics in artificial intelligence. It teaches machines to see: machines acquire eyes and a brain to interpret the world by extracting meaning from image pixels. It is also a key driver of successful applications such as face recognition, optical character recognition, biometrics, and video surveillance. The rapid recent development of novel applications, including augmented reality, computational photography, autonomous vehicles, unmanned aerial vehicles and unmanned stores, egocentric vision, and three-dimensional movies, has brought computer vision to a new peak. In more realistic and complicated applications, machine learning and neural networks are employed to achieve a great leap in computer vision. In particular, deep learning shows great promise for computer vision applications.

Computer vision consumes considerable processing power. However, thanks to the continuously increasing processing and sensing power of mobile processors and the quality of emerging displays, computer vision no longer requires expensive specialized lab equipment and has proven its practical applicability in many domains, such as health, automotive, art, education, intelligent manufacturing, and smart agriculture. Embedded computer vision applies DSPs, FPGAs, and GPUs to achieve edge computing. Moreover, neuromorphic computing, the so-called next generation of neural networks, can simulate the visual cortex and has great potential for developing high-performance computer vision algorithms.

In this Special Issue on “Advances in Computer Vision”, we invite authors to submit original research articles, reviews, and viewpoint articles related to recent advances at all levels of the applications and technologies of computer vision. We are particularly interested in presenting emerging technologies related to machine learning and deep learning that may have a significant impact on this research field. We welcome papers addressing a broad range of topics, from the theoretical foundations of computer vision to novel algorithms for classical vision problems, advanced systems for compelling applications, and innovative approaches to edge computing and neuromorphic computing. Topics of interest for this Special Issue include, but are not limited to, the following:

  • Object detection, tracking, categorization, and recognition
  • Machine learning and deep learning for computer vision
  • Segmentation, feature extraction, and registration for images and videos
  • Three-dimensional imaging, analysis, and applications
  • Biometrics based on the recognition of faces, fingerprints, palms, irises, and more
  • Gesture, behavior, and event analysis for videos
  • Computational photography, such as super-resolution, high-dynamic-range imaging, style transfer, colorization and decolorization, and more
  • Beyond the visual spectrum in computer vision, such as near-infrared and thermal imaging
  • Embedded computer vision for edge computing
  • Novel applications in video surveillance, augmented reality, sports video analysis, unmanned aerial vehicles, robotic vision, medical imaging, healthcare, AIoT, intelligent consumer electronics, etc.
  • Neuromorphic computing for computer vision

Prof. Yuan-Kai Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Research

16 pages, 2902 KiB  
Article
Enhanced Context Learning with Transformer for Human Parsing
by Jingya Song, Qingxuan Shi, Yihang Li and Fang Yang
Appl. Sci. 2022, 12(15), 7821; https://doi.org/10.3390/app12157821 - 4 Aug 2022
Viewed by 1578
Abstract
Human parsing is a fine-grained human semantic segmentation task in the field of computer vision. Due to the challenges of occlusion, diverse poses, and the similar appearance of different body parts and clothing, human parsing requires careful attention to context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. In this paper, we introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies and effectively extract context information. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets. The low-level visual features from CNN intermediate layers are enhanced using channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. We conducted extensive experiments on three datasets (i.e., LIP, ATR, and Fashion Clothing) to show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset, and an average F-1 score of 55.19% on the Fashion Clothing dataset.
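
The Global Transformer Module described in this abstract rests on spatial self-attention over feature-map positions. As a rough illustration of that general mechanism (not the authors' exact design; the channel reduction and learnable residual weight are assumptions), a minimal PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Illustrative self-attention block over spatial positions, in the
    spirit of a Global Transformer Module; sizes are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, h*w)
        attn = torch.softmax(q @ k / (q.size(-1) ** 0.5), dim=-1)
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, h*w, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out  # long-range context added residually
```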

22 pages, 1449 KiB  
Article
Performance Optimization of Object Tracking Algorithms in OpenCV on GPUs
by Jaehyun Song, Hwanjin Jeong and Jinkyu Jeong
Appl. Sci. 2022, 12(15), 7801; https://doi.org/10.3390/app12157801 - 3 Aug 2022
Cited by 1 | Viewed by 2174
Abstract
Machine-learning-based computer vision is increasingly versatile and is being leveraged by a wide range of smart devices. Due to the limited performance/energy budget of computing units in smart devices, the careful implementation of computer vision algorithms is critical. In this paper, we analyze the performance bottlenecks of two well-known computer vision algorithms for object tracking, object detection and optical flow, in the Open-source Computer Vision library (OpenCV). Based on our in-depth analysis of their implementation, we found that the current implementation fails to fully utilize Open Computing Language (OpenCL) accelerators (e.g., GPUs). Based on this analysis, we propose several optimization strategies and apply them to the OpenCL implementation of the object tracking algorithms. Our evaluation results demonstrate that the performance of object detection is improved by up to 86% and that of optical flow by up to 10%. We believe our optimization strategies can be applied to other computer vision algorithms implemented in OpenCL.
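
For readers unfamiliar with OpenCV's OpenCL path: the library's transparent API (T-API) dispatches supported operations to an OpenCL device when images are wrapped in cv2.UMat. A minimal sketch of OpenCL-backed Farneback optical flow follows (file names are placeholders; the paper's optimizations go well beyond this):

```python
import cv2

cv2.ocl.setUseOpenCL(True)  # enable OpenCL dispatch if a device is available

# Wrapping ndarrays in UMat keeps data on the OpenCL device between calls.
prev = cv2.UMat(cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY))
curr = cv2.UMat(cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY))

# Dense optical flow; with UMat inputs OpenCV routes this to OpenCL kernels.
# args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

flow_host = flow.get()  # download the (h, w, 2) flow field back to the host
```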

17 pages, 759 KiB  
Article
Unsupervised Domain Adaptive Person Re-Identification via Intermediate Domains
by Haonan Xie, Hao Luo, Jianyang Gu and Wei Jiang
Appl. Sci. 2022, 12(14), 6990; https://doi.org/10.3390/app12146990 - 11 Jul 2022
Cited by 3 | Viewed by 1791
Abstract
Recent years have witnessed outstanding success in supervised domain adaptive person re-identification (ReID). However, such models often suffer serious performance drops when transferred to another domain in real-world applications. To address this domain gap, many unsupervised domain adaptation (UDA) methods have been proposed to adapt a model trained on a source domain to a target domain. Such methods are typically based on clustering algorithms that generate pseudo labels. Noisy labels, which often arise from the instability of clustering algorithms, substantially degrade the performance of UDA methods. In this study, we focused on intermediate domains, which can be regarded as a bridge connecting the source and target domains. We added a domainness factor to the loss function of SPGAN that decides the style of the image generated by the GAN model, and we obtained a series of intermediate domains by varying its value. Pseudo labels are more reliable on intermediate domains because they are closer to the source domain than the target domain is. We then fine-tuned the model pre-trained with source data on these intermediate domains; the fine-tuning was conducted repeatedly because the intermediate domains comprise more than one dataset. Finally, the model fine-tuned on the intermediate domains was adapted to the target domain. The model easily adapts to changes in image style as we gradually transfer it to the target domain along the bridge of intermediate domains. To the best of our knowledge, we are the first to apply intermediate domains to UDA problems. We evaluated our method on the Market1501, DukeMTMC-reID, and MSMT17 datasets. Experimental results show that our method brings a significant improvement and achieves state-of-the-art performance.
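
Stripped of pseudo-label generation and the SPGAN machinery, the bridge idea amounts to repeated fine-tuning over a sequence of domains ordered from source-like to target-like. The sketch below is illustrative only and assumes pseudo labels are already attached to each intermediate-domain loader:

```python
import torch

def progressive_adaptation(model, intermediate_loaders, optimizer, criterion,
                           device="cuda"):
    """Fine-tune a source-pretrained ReID model along a sequence of
    intermediate domains (domainness increasing toward the target)."""
    model.to(device).train()
    for loader in intermediate_loaders:       # one loader per domain
        for images, pseudo_labels in loader:  # pseudo labels assumed given
            logits = model(images.to(device))
            loss = criterion(logits, pseudo_labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # finally adapt to the target domain as usual
```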

16 pages, 5631 KiB  
Article
A Heart Rate Variability-Based Paroxysmal Atrial Fibrillation Prediction System
by Milna Maria Mendez, Min-Chia Hsu, Jenq-Tay Yuan and Ke-Shiuan Lynn
Appl. Sci. 2022, 12(5), 2387; https://doi.org/10.3390/app12052387 - 25 Feb 2022
Cited by 5 | Viewed by 2196
Abstract
Atrial fibrillation (AF) is characterized by totally disorganized atrial depolarizations without effective atrial contraction. It is the most common form of cardiac arrhythmia, affecting more than 46.3 million people worldwide, and its incidence continues to rise. Although AF itself is not life-threatening, its complications, such as strokes and heart failure, are lethal. About 25% of paroxysmal AF (PAF) patients progress to chronic AF over an observation period of more than one year. For long-term, real-time monitoring, a PAF prediction system was developed with four objectives: (1) high prediction accuracy, (2) fast computation, (3) small data storage, and (4) easy medical interpretation. The system takes a 400-point heart rate variability (HRV) sequence containing no AF episodes as input and outputs whether the corresponding subject will experience AF episodes in the near future (i.e., within 30 min). It first converts an input HRV sequence into four image matrices via extended Poincaré plots to capture inter- and intra-person features. The system then employs a convolutional neural network (CNN) to perform feature selection and classification on the input image matrices. Several design issues of the system, including feature conversion and classifier structure, were formulated as a binary optimization problem, which was solved via a genetic algorithm (GA). A numerical study involving 6085 400-point HRV sequences excerpted from three PhysioNet databases showed that the developed PAF prediction system achieved 87.9% and 87.2% accuracy on the validation and testing datasets, respectively. This performance is competitive with that of the leading PAF prediction system in the literature, yet our system is much faster and more extensively tested. Furthermore, from the designed inter-person features, we found that PAF patients often have lower (~60 beats/min) or higher (~100 beats/min) heart rates than non-PAF subjects. From the intra-person features, we observed that PAF patients often exhibit smaller variations (≤5 beats/min) in heart rate than non-PAF subjects, although they may sometimes experience short bursts of large heart rate changes, probably due to abnormal beats such as premature atrial beats. The remaining findings warrant further investigation of their medical implications for the onset of PAF.
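
To convey the flavor of the feature-conversion step, a plain (non-extended) Poincaré plot can be rasterized into a fixed-size occurrence matrix by binning successive RR-interval pairs; stacking matrices for several lags yields multi-channel CNN input. The bin count, lags, and RR range below are illustrative assumptions, not the paper's extended construction:

```python
import numpy as np

def poincare_image(rr_ms, bins=32, lag=1, rr_range=(300.0, 1500.0)):
    """Bin (RR_n, RR_{n+lag}) pairs of an RR-interval sequence (in ms)
    into a bins x bins matrix usable as one CNN input channel."""
    x, y = rr_ms[:-lag], rr_ms[lag:]
    img, _, _ = np.histogram2d(x, y, bins=bins, range=[rr_range, rr_range])
    peak = img.max()
    return img / peak if peak > 0 else img  # normalize to [0, 1]

# Example: four lagged matrices stacked as channels for a CNN.
rr = 800 + 50 * np.random.randn(400)   # synthetic 400-point HRV sequence
channels = np.stack([poincare_image(rr, lag=k) for k in (1, 2, 3, 4)])
print(channels.shape)                  # (4, 32, 32)
```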

22 pages, 10550 KiB  
Article
A Defect-Inspection System Constructed by Applying Autoencoder with Clustered Latent Vectors and Multi-Thresholding Classification
by Cheng-Chang Lien and Yu-De Chiu
Appl. Sci. 2022, 12(4), 1883; https://doi.org/10.3390/app12041883 - 11 Feb 2022
Viewed by 1737
Abstract
Defect inspection is an important issue in the field of industrial automation. In general, defect-inspection methods can be categorized into supervised and unsupervised methods. When supervised learning is applied to defect inspection, the large variation of defect patterns can make the data coverage incomplete for model training, which leads to low detection accuracy. Therefore, this paper focuses on constructing a defect-inspection system with an unsupervised learning model. Furthermore, few studies have analyzed the relationship between the reconstruction error on normal areas and the repair effect on defective areas in unsupervised defect-inspection systems; this paper addresses that issue. This paper makes four main contributions. First, we compare the effects of the SSIM (Structural Similarity Index Measure) and MSE (Mean Square Error) functions on the reconstruction error. Second, various kinds of autoencoders are constructed with reference to the Inception architecture in GoogLeNet and the DEC (Deep Embedded Clustering) module. Third, two-stage model training is proposed: in the first stage, the autoencoders are trained to have basic image-reconstruction capability on normal areas; in the second stage, the DEC algorithm is added to further strengthen feature discrimination and thereby increase the capability to repair defective areas. Fourth, a multi-thresholding image segmentation method is applied to improve the classification accuracy for normal and defect images. In this study, we focus on defect inspection of texture patterns and therefore select the nanofiber image database as well as the carpet and grid images in the MVTec database for our experiments. The experimental results show that the accuracy of classifying normal and defect nanofiber image patches is about 86%, and the classification accuracy approaches 89% and 98% for the carpet and grid datasets in the MVTec database, respectively, outperforming the methods reported on the MVTec database.
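
The SSIM-versus-MSE comparison and the subsequent thresholding can be sketched as follows; scikit-image's structural_similarity stands in for the paper's SSIM formulation, and both thresholds are illustrative placeholders:

```python
import numpy as np
from skimage.metrics import structural_similarity

def residual_maps(original, reconstructed):
    """Per-pixel reconstruction residuals for a grayscale float image in
    [0, 1] under MSE and SSIM; large values mark suspected defects."""
    mse_map = (original - reconstructed) ** 2
    _, ssim_map = structural_similarity(original, reconstructed,
                                        data_range=1.0, full=True)
    return mse_map, 1.0 - ssim_map

def is_defective(residual, pixel_thr=0.05, area_thr=0.01):
    """Simple thresholding stand-in for the paper's multi-thresholding
    step: flag the image if too many pixels exceed the pixel threshold."""
    return float((residual > pixel_thr).mean()) > area_thr
```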

21 pages, 2378 KiB  
Article
Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning
by Yi-Quan Li, Hao-Sen Chang and Daw-Tung Lin
Appl. Sci. 2022, 12(2), 907; https://doi.org/10.3390/app12020907 - 17 Jan 2022
Cited by 8 | Viewed by 3942
Abstract
In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process; three background models are proposed to simulate the background noise patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet architecture by replacing the fully connected layer with a global average pooling layer to avoid overfitting caused by the massive amount of training data, which also reduced the number of model parameters. Finally, we employed transfer learning to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods, demonstrating the effectiveness of the proposed framework, which achieved a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
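
The classifier modification described here, dropping the dense layer in favor of global average pooling, can be sketched as a 1x1 convolution producing per-class score maps that are then pooled; with 13,070 classes this avoids a very large fully connected layer. The channel count below is an assumption, not the authors' exact configuration:

```python
import torch.nn as nn

class GAPClassifierHead(nn.Module):
    """Illustrative GoogLeNet-style head: 1x1 conv to class score maps,
    then global average pooling instead of a fully connected layer."""
    def __init__(self, in_channels=1024, num_classes=13070):
        super().__init__()
        self.score = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)     # global average pooling

    def forward(self, feature_map):            # (b, in_channels, h, w)
        scores = self.score(feature_map)       # (b, num_classes, h, w)
        return self.gap(scores).flatten(1)     # (b, num_classes) logits
```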

14 pages, 3485 KiB  
Article
Generating Scenery Images with Larger Variety According to User Descriptions
by Hsu-Yung Cheng and Chih-Chang Yu
Appl. Sci. 2021, 11(21), 10224; https://doi.org/10.3390/app112110224 - 1 Nov 2021
Viewed by 1452
Abstract
In this paper, a framework based on generative adversarial networks is proposed to generate nature scenery according to descriptions from users. The desired place, time, and season of the generated scenes can be specified with the help of text-to-image generation techniques. The framework improves and modifies the architecture of a generative adversarial network with attention models by adding imagination models. The proposed attentional and imaginative generative network uses hidden-layer information to initialize the memory cell of a recurrent neural network to produce the desired photos. A dataset containing different categories of scenery images was established to train the proposed system. The experiments validate that the proposed method increases the quality and diversity of the generated images compared to the existing method. A possible application of road image generation for data augmentation is also demonstrated in the experimental results.
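
The described use of hidden-layer information to initialize a recurrent memory cell can be pictured as projecting a generator feature vector into an LSTM's initial hidden and cell states. The sketch below is purely illustrative; all sizes and names are assumptions:

```python
import torch
import torch.nn as nn

class ImaginationRNN(nn.Module):
    """Illustrative module: a hidden feature from the generator seeds the
    LSTM state that then processes the text-conditioning sequence."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.to_h = nn.Linear(feat_dim, hidden_dim)
        self.to_c = nn.Linear(feat_dim, hidden_dim)
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, hidden_feat, text_embeddings):
        # hidden_feat: (b, feat_dim); text_embeddings: (b, seq, hidden_dim)
        h0 = torch.tanh(self.to_h(hidden_feat)).unsqueeze(0)  # (1, b, hidden)
        c0 = torch.tanh(self.to_c(hidden_feat)).unsqueeze(0)
        out, _ = self.rnn(text_embeddings, (h0, c0))
        return out  # sequence features for the downstream image decoder
```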

25 pages, 15901 KiB  
Article
A Novel Luminance-Based Algorithm for Classification of Semi-Dark Images
by Mehak Maqbool Memon, Manzoor Ahmed Hashmani, Aisha Zahid Junejo, Syed Sajjad Rizvi and Adnan Ashraf Arain
Appl. Sci. 2021, 11(18), 8694; https://doi.org/10.3390/app11188694 - 18 Sep 2021
Cited by 5 | Viewed by 2868
Abstract
Image classification of a visual scene based on visibility is significant due to the rise in readily available automated solutions. Currently, only two spectrums of image visibility are commonly considered, i.e., dark and bright; however, normal environments also include semi-dark scenarios. Hence, visual extremes should be duly discarded to allow the accurate extraction of image features. Fundamentally speaking, there are two broad approaches to visual scene-based image classification: machine learning (ML) methods and computer vision (CV) methods. In ML, insufficient data, sophisticated hardware requirements, and inadequate classifier training time remain significant problems, and these techniques fail to classify visual scene-based images with high accuracy. The alternative, CV methods, also has major issues: CV methods provide some basic procedures that may assist in such classification, but, to the best of our knowledge, no CV algorithm exists to perform it, i.e., existing algorithms do not account for semi-dark images in the first place. Moreover, these methods do not provide a well-defined protocol to calculate an image's content visibility and thereby classify images. One key approach to calculating an image's content visibility is backed by the HSL (hue, saturation, lightness) color model, which allows the visibility of a scene to be calculated from the lightness/luminance of individual pixels. Recognizing the high potential of the HSL color model, we propose a novel framework relying on the simple approach of statistically manipulating an entire image's pixel intensities represented in the HSL color model. The proposed algorithm, Relative Perceived Luminance Classification (RPLC), uses the HSL color model to identify the luminosity values of the entire image. Our findings show that the proposed method yields high classification accuracy (over 78%) with a small error rate, and that the computational complexity of RPLC is much lower than that of state-of-the-art ML algorithms.
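
The core computation, per-pixel HSL lightness aggregated over the whole image and compared against class boundaries, fits in a few lines of NumPy. The thresholds below are illustrative placeholders, not the values derived by RPLC:

```python
import numpy as np

def mean_hsl_lightness(rgb):
    """Mean HSL lightness of an image; rgb is a float array in [0, 1].
    HSL lightness of a pixel is (max(R, G, B) + min(R, G, B)) / 2."""
    lightness = (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0
    return float(lightness.mean())

def classify_visibility(rgb, dark_thr=0.25, bright_thr=0.60):
    """Three-way visibility label from mean lightness (illustrative)."""
    l = mean_hsl_lightness(rgb)
    if l < dark_thr:
        return "dark"
    return "semi-dark" if l < bright_thr else "bright"
```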
