Advances in Computer Vision, Volume II

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (25 February 2023) | Viewed by 20797

Special Issue Editor


Prof. Yuan-Kai Wang
Guest Editor
Electrical Engineering, Fu Jen Catholic University, New Taipei 24205, Taiwan
Interests: intelligent video surveillance; face recognition; deep learning for object detection; robotic vision; embedded computer vision; sleep healthcare; neuromorphic computing

Special Issue Information

Dear Colleagues,

Computer vision has become one of the most successful research topics in artificial intelligence. It teaches machines to see: machines acquire eyes and a brain to interpret the world by extracting meaning from image pixels. It is also a key driver of successful applications such as face recognition, optical character recognition, biometrics, and video surveillance. The rapid recent development of novel applications, including augmented reality, computational photography, autonomous vehicles, unmanned aerial vehicles and unmanned stores, egocentric vision, and three-dimensional movies, has brought computer vision to a new peak. In more realistic and complicated applications, machine learning and neural networks are employed to achieve a great leap in computer vision. In particular, deep learning shows great promise for computer vision applications.

Computer vision consumes considerable processing power. However, thanks to the continuously increasing processing and sensing power of mobile processors and the quality of emerging displays, computer vision no longer requires expensive specialized lab equipment and has proven its practical applicability in many domains, such as health, automotive, art, education, intelligent manufacturing, and smart agriculture. Embedded computer vision applies DSPs, FPGAs, and GPUs to achieve edge computing. Moreover, neuromorphic computing, the so-called next generation of neural networks, can simulate the visual cortex and has great potential for developing high-performance computer vision algorithms.

In this Special Issue on “Advances in Computer Vision”, we invite authors to submit original research articles, reviews, and viewpoint articles related to recent advances at all levels of the applications and technologies of computer vision. We are particularly interested in presenting emerging technologies related to machine learning and deep learning that may have a significant impact on this research field. We welcome papers addressing a broad range of topics, from the theoretical foundations of computer vision to novel algorithms for classical vision problems, advanced systems for compelling applications, and innovative approaches to edge computing and neuromorphic computing. Topics of interest for this Special Issue include, but are not limited to, the following:

  • Object detection, tracking, categorization, and recognition
  • Machine learning and deep learning for computer vision
  • Segmentation, feature extraction, and registration for images and videos
  • Three-dimensional imaging, analysis, and applications
  • Biometrics based on the recognition of faces, fingerprints, palms, irises, and more
  • Gesture, behavior, and event analysis for videos
  • Computational photography, such as super-resolution, high-dynamic-range imaging, style transfer, colorization and decolorization, and more
  • Beyond the visual spectrum in computer vision, such as near-infrared and thermal imaging
  • Embedded computer vision for edge computing
  • Novel applications in video surveillance, augmented reality, sports video analysis, unmanned aerial vehicles, robotic vision, medical imaging, healthcare, AIoT, intelligent consumer electronics, etc.
  • Neuromorphic computing for computer vision

Prof. Yuan-Kai Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Research

16 pages, 2902 KiB  
Article
Enhanced Context Learning with Transformer for Human Parsing
by Jingya Song, Qingxuan Shi, Yihang Li and Fang Yang
Appl. Sci. 2022, 12(15), 7821; https://doi.org/10.3390/app12157821 - 4 Aug 2022
Viewed by 1578
Abstract
Human parsing is a fine-grained human semantic segmentation task in the field of computer vision. Due to the challenges of occlusion, diverse poses, and the similar appearance of different body parts and clothing, human parsing requires careful attention to context information. Based on this observation, we enhance the learning of global and local information to obtain more accurate human parsing results. In this paper, we introduce a Global Transformer Module (GTM) that uses a self-attention mechanism to capture long-range dependencies and effectively extract context information. Moreover, we design a Detailed Feature Enhancement (DFE) architecture to exploit spatial semantics for small targets. The low-level visual features from CNN intermediate layers are enhanced using channel and spatial attention. In addition, we adopt an edge detection module to refine the prediction. We conducted extensive experiments on three datasets (i.e., LIP, ATR, and Fashion Clothing) to show the effectiveness of our method, which achieves 54.55% mIoU on the LIP dataset, an average F-1 score of 80.26% on the ATR dataset, and an average F-1 score of 55.19% on the Fashion Clothing dataset.
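
The Global Transformer Module described in this abstract rests on spatial self-attention over feature-map positions. As a rough illustration of that general mechanism (not the authors' exact design; the channel reduction and learnable residual weight are assumptions), a minimal PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Illustrative self-attention block over spatial positions, in the
    spirit of a Global Transformer Module; sizes are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, h*w)
        attn = torch.softmax(q @ k / (q.size(-1) ** 0.5), dim=-1)
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, h*w, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out  # long-range context added residually
```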

22 pages, 1449 KiB  
Article
Performance Optimization of Object Tracking Algorithms in OpenCV on GPUs
by Jaehyun Song, Hwanjin Jeong and Jinkyu Jeong
Appl. Sci. 2022, 12(15), 7801; https://doi.org/10.3390/app12157801 - 3 Aug 2022
Cited by 1 | Viewed by 2174
Abstract
Machine-learning-based computer vision is increasingly versatile and is being leveraged by a wide range of smart devices. Due to the limited performance/energy budget of computing units in smart devices, the careful implementation of computer vision algorithms is critical. In this paper, we analyze the performance bottlenecks of two well-known computer vision algorithms for object tracking, object detection and optical flow, in the Open-source Computer Vision library (OpenCV). Based on our in-depth analysis of their implementation, we found that the current implementation fails to fully utilize Open Computing Language (OpenCL) accelerators (e.g., GPUs). Based on this analysis, we propose several optimization strategies and apply them to the OpenCL implementation of the object tracking algorithms. Our evaluation results demonstrate that the performance of object detection is improved by up to 86% and that of optical flow by up to 10%. We believe our optimization strategies can be applied to other computer vision algorithms implemented in OpenCL.
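
For readers unfamiliar with OpenCV's OpenCL path: the library's transparent API (T-API) dispatches supported operations to an OpenCL device when images are wrapped in cv2.UMat. A minimal sketch of OpenCL-backed Farneback optical flow follows (file names are placeholders; the paper's optimizations go well beyond this):

```python
import cv2

cv2.ocl.setUseOpenCL(True)  # enable OpenCL dispatch if a device is available

# Wrapping ndarrays in UMat keeps data on the OpenCL device between calls.
prev = cv2.UMat(cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY))
curr = cv2.UMat(cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY))

# Dense optical flow; with UMat inputs OpenCV routes this to OpenCL kernels.
# args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

flow_host = flow.get()  # download the (h, w, 2) flow field back to the host
```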

17 pages, 759 KiB  
Article
Unsupervised Domain Adaptive Person Re-Identification via Intermediate Domains
by Haonan Xie, Hao Luo, Jianyang Gu and Wei Jiang
Appl. Sci. 2022, 12(14), 6990; https://doi.org/10.3390/app12146990 - 11 Jul 2022
Cited by 3 | Viewed by 1791
Abstract
Recent years have witnessed outstanding success in supervised domain adaptive person re-identification (ReID). However, such models often suffer serious performance drops when transferred to another domain in real-world applications. To address this domain gap, many unsupervised domain adaptation (UDA) methods have been proposed to adapt a model trained on a source domain to a target domain. Such methods are typically based on clustering algorithms that generate pseudo labels. Noisy labels, which often arise from the instability of clustering algorithms, substantially degrade the performance of UDA methods. In this study, we focused on intermediate domains, which can be regarded as a bridge connecting the source and target domains. We added a domainness factor to the loss function of SPGAN that decides the style of the image generated by the GAN model, and we obtained a series of intermediate domains by varying its value. Pseudo labels are more reliable on intermediate domains because they are closer to the source domain than the target domain is. We then fine-tuned the model pre-trained with source data on these intermediate domains; the fine-tuning was conducted repeatedly because the intermediate domains comprise more than one dataset. Finally, the model fine-tuned on the intermediate domains was adapted to the target domain. The model easily adapts to changes in image style as we gradually transfer it to the target domain along the bridge of intermediate domains. To the best of our knowledge, we are the first to apply intermediate domains to UDA problems. We evaluated our method on the Market1501, DukeMTMC-reID, and MSMT17 datasets. Experimental results show that our method brings a significant improvement and achieves state-of-the-art performance.
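
Stripped of pseudo-label generation and the SPGAN machinery, the bridge idea amounts to repeated fine-tuning over a sequence of domains ordered from source-like to target-like. The sketch below is illustrative only and assumes pseudo labels are already attached to each intermediate-domain loader:

```python
import torch

def progressive_adaptation(model, intermediate_loaders, optimizer, criterion,
                           device="cuda"):
    """Fine-tune a source-pretrained ReID model along a sequence of
    intermediate domains (domainness increasing toward the target)."""
    model.to(device).train()
    for loader in intermediate_loaders:       # one loader per domain
        for images, pseudo_labels in loader:  # pseudo labels assumed given
            logits = model(images.to(device))
            loss = criterion(logits, pseudo_labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # finally adapt to the target domain as usual
```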

16 pages, 5631 KiB  
Article
A Heart Rate Variability-Based Paroxysmal Atrial Fibrillation Prediction System
by Milna Maria Mendez, Min-Chia Hsu, Jenq-Tay Yuan and Ke-Shiuan Lynn
Appl. Sci. 2022, 12(5), 2387; https://doi.org/10.3390/app12052387 - 25 Feb 2022
Cited by 5 | Viewed by 2196
Abstract
Atrial fibrillation (AF) is characterized by totally disorganized atrial depolarizations without effective atrial contraction. It is the most common form of cardiac arrhythmia, affecting more than 46.3 million people worldwide, and its incidence continues to rise. Although AF itself is not life-threatening, its complications, such as strokes and heart failure, are lethal. About 25% of paroxysmal AF (PAF) patients progress to chronic AF over an observation period of more than one year. For long-term, real-time monitoring, a PAF prediction system was developed with four objectives: (1) high prediction accuracy, (2) fast computation, (3) small data storage, and (4) easy medical interpretation. The system takes a 400-point heart rate variability (HRV) sequence containing no AF episodes as input and outputs whether the corresponding subject will experience AF episodes in the near future (i.e., within 30 min). It first converts an input HRV sequence into four image matrices via extended Poincaré plots to capture inter- and intra-person features. The system then employs a convolutional neural network (CNN) to perform feature selection and classification on the input image matrices. Several design issues of the system, including feature conversion and classifier structure, were formulated as a binary optimization problem, which was solved via a genetic algorithm (GA). A numerical study involving 6085 400-point HRV sequences excerpted from three PhysioNet databases showed that the developed PAF prediction system achieved 87.9% and 87.2% accuracy on the validation and testing datasets, respectively. This performance is competitive with that of the leading PAF prediction system in the literature, yet our system is much faster and more extensively tested. Furthermore, from the designed inter-person features, we found that PAF patients often have lower (~60 beats/min) or higher (~100 beats/min) heart rates than non-PAF subjects. From the intra-person features, we observed that PAF patients often exhibit smaller variations (≤5 beats/min) in heart rate than non-PAF subjects, although they may sometimes experience short bursts of large heart rate changes, probably due to abnormal beats such as premature atrial beats. The remaining findings warrant further investigation of their medical implications for the onset of PAF.
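
To convey the flavor of the feature-conversion step, a plain (non-extended) Poincaré plot can be rasterized into a fixed-size occurrence matrix by binning successive RR-interval pairs; stacking matrices for several lags yields multi-channel CNN input. The bin count, lags, and RR range below are illustrative assumptions, not the paper's extended construction:

```python
import numpy as np

def poincare_image(rr_ms, bins=32, lag=1, rr_range=(300.0, 1500.0)):
    """Bin (RR_n, RR_{n+lag}) pairs of an RR-interval sequence (in ms)
    into a bins x bins matrix usable as one CNN input channel."""
    x, y = rr_ms[:-lag], rr_ms[lag:]
    img, _, _ = np.histogram2d(x, y, bins=bins, range=[rr_range, rr_range])
    peak = img.max()
    return img / peak if peak > 0 else img  # normalize to [0, 1]

# Example: four lagged matrices stacked as channels for a CNN.
rr = 800 + 50 * np.random.randn(400)   # synthetic 400-point HRV sequence
channels = np.stack([poincare_image(rr, lag=k) for k in (1, 2, 3, 4)])
print(channels.shape)                  # (4, 32, 32)
```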

22 pages, 10550 KiB  
Article
A Defect-Inspection System Constructed by Applying Autoencoder with Clustered Latent Vectors and Multi-Thresholding Classification
by Cheng-Chang Lien and Yu-De Chiu
Appl. Sci. 2022, 12(4), 1883; https://doi.org/10.3390/app12041883 - 11 Feb 2022
Viewed by 1737
Abstract
Defect inspection is an important issue in the field of industrial automation. In general, defect-inspection methods can be categorized into supervised and unsupervised methods. When supervised learning is applied to defect inspection, the large variation of defect patterns can make the data coverage incomplete for model training, which leads to low detection accuracy. Therefore, this paper focuses on constructing a defect-inspection system with an unsupervised learning model. Furthermore, few studies have analyzed the relationship between the reconstruction error on normal areas and the repair effect on defective areas in unsupervised defect-inspection systems; this paper addresses that issue. This paper makes four main contributions. First, we compare the effects of the SSIM (Structural Similarity Index Measure) and MSE (Mean Square Error) functions on the reconstruction error. Second, various kinds of autoencoders are constructed with reference to the Inception architecture in GoogLeNet and the DEC (Deep Embedded Clustering) module. Third, two-stage model training is proposed: in the first stage, the autoencoders are trained to have basic image-reconstruction capability on normal areas; in the second stage, the DEC algorithm is added to further strengthen feature discrimination and thereby increase the capability to repair defective areas. Fourth, a multi-thresholding image segmentation method is applied to improve the classification accuracy for normal and defect images. In this study, we focus on defect inspection of texture patterns and therefore select the nanofiber image database as well as the carpet and grid images in the MVTec database for our experiments. The experimental results show that the accuracy of classifying normal and defect nanofiber image patches is about 86%, and the classification accuracy approaches 89% and 98% for the carpet and grid datasets in the MVTec database, respectively, outperforming the methods reported on the MVTec database.
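
The SSIM-versus-MSE comparison and the subsequent thresholding can be sketched as follows; scikit-image's structural_similarity stands in for the paper's SSIM formulation, and both thresholds are illustrative placeholders:

```python
import numpy as np
from skimage.metrics import structural_similarity

def residual_maps(original, reconstructed):
    """Per-pixel reconstruction residuals for a grayscale float image in
    [0, 1] under MSE and SSIM; large values mark suspected defects."""
    mse_map = (original - reconstructed) ** 2
    _, ssim_map = structural_similarity(original, reconstructed,
                                        data_range=1.0, full=True)
    return mse_map, 1.0 - ssim_map

def is_defective(residual, pixel_thr=0.05, area_thr=0.01):
    """Simple thresholding stand-in for the paper's multi-thresholding
    step: flag the image if too many pixels exceed the pixel threshold."""
    return float((residual > pixel_thr).mean()) > area_thr
```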

21 pages, 2378 KiB  
Article
Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning
by Yi-Quan Li, Hao-Sen Chang and Daw-Tung Lin
Appl. Sci. 2022, 12(2), 907; https://doi.org/10.3390/app12020907 - 17 Jan 2022
Cited by 8 | Viewed by 3942
Abstract
In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process; three background models are proposed to simulate the background noise patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet architecture by replacing the fully connected layer with a global average pooling layer to avoid overfitting caused by the massive amount of training data, which also reduced the number of model parameters. Finally, we employed transfer learning to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods, demonstrating the effectiveness of the proposed framework, which achieved a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
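
The classifier modification described here, dropping the dense layer in favor of global average pooling, can be sketched as a 1x1 convolution producing per-class score maps that are then pooled; with 13,070 classes this avoids a very large fully connected layer. The channel count below is an assumption, not the authors' exact configuration:

```python
import torch.nn as nn

class GAPClassifierHead(nn.Module):
    """Illustrative GoogLeNet-style head: 1x1 conv to class score maps,
    then global average pooling instead of a fully connected layer."""
    def __init__(self, in_channels=1024, num_classes=13070):
        super().__init__()
        self.score = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)     # global average pooling

    def forward(self, feature_map):            # (b, in_channels, h, w)
        scores = self.score(feature_map)       # (b, num_classes, h, w)
        return self.gap(scores).flatten(1)     # (b, num_classes) logits
```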

14 pages, 3485 KiB  
Article
Generating Scenery Images with Larger Variety According to User Descriptions
by Hsu-Yung Cheng and Chih-Chang Yu
Appl. Sci. 2021, 11(21), 10224; https://doi.org/10.3390/app112110224 - 1 Nov 2021
Viewed by 1452
Abstract
In this paper, a framework based on generative adversarial networks is proposed to generate nature scenery according to descriptions from users. The desired place, time, and season of the generated scenes can be specified with the help of text-to-image generation techniques. The framework improves and modifies the architecture of a generative adversarial network with attention models by adding imagination models. The proposed attentional and imaginative generative network uses hidden-layer information to initialize the memory cell of a recurrent neural network to produce the desired photos. A dataset containing different categories of scenery images was established to train the proposed system. The experiments validate that the proposed method increases the quality and diversity of the generated images compared to the existing method. A possible application of road image generation for data augmentation is also demonstrated in the experimental results.
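
The described use of hidden-layer information to initialize a recurrent memory cell can be pictured as projecting a generator feature vector into an LSTM's initial hidden and cell states. The sketch below is purely illustrative; all sizes and names are assumptions:

```python
import torch
import torch.nn as nn

class ImaginationRNN(nn.Module):
    """Illustrative module: a hidden feature from the generator seeds the
    LSTM state that then processes the text-conditioning sequence."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.to_h = nn.Linear(feat_dim, hidden_dim)
        self.to_c = nn.Linear(feat_dim, hidden_dim)
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, hidden_feat, text_embeddings):
        # hidden_feat: (b, feat_dim); text_embeddings: (b, seq, hidden_dim)
        h0 = torch.tanh(self.to_h(hidden_feat)).unsqueeze(0)  # (1, b, hidden)
        c0 = torch.tanh(self.to_c(hidden_feat)).unsqueeze(0)
        out, _ = self.rnn(text_embeddings, (h0, c0))
        return out  # sequence features for the downstream image decoder
```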

25 pages, 15901 KiB  
Article
A Novel Luminance-Based Algorithm for Classification of Semi-Dark Images
by Mehak Maqbool Memon, Manzoor Ahmed Hashmani, Aisha Zahid Junejo, Syed Sajjad Rizvi and Adnan Ashraf Arain
Appl. Sci. 2021, 11(18), 8694; https://doi.org/10.3390/app11188694 - 18 Sep 2021
Cited by 5 | Viewed by 2868
Abstract
Image classification of a visual scene based on visibility is significant due to the rise in readily available automated solutions. Currently, only two spectrums of image visibility are commonly considered, i.e., dark and bright; however, normal environments also include semi-dark scenarios. Hence, visual extremes should be duly discarded to allow the accurate extraction of image features. Fundamentally speaking, there are two broad approaches to visual scene-based image classification: machine learning (ML) methods and computer vision (CV) methods. In ML, insufficient data, sophisticated hardware requirements, and inadequate classifier training time remain significant problems, and these techniques fail to classify visual scene-based images with high accuracy. The alternative, CV methods, also has major issues: CV methods provide some basic procedures that may assist in such classification, but, to the best of our knowledge, no CV algorithm exists to perform it, i.e., existing algorithms do not account for semi-dark images in the first place. Moreover, these methods do not provide a well-defined protocol to calculate an image's content visibility and thereby classify images. One key approach to calculating an image's content visibility is backed by the HSL (hue, saturation, lightness) color model, which allows the visibility of a scene to be calculated from the lightness/luminance of individual pixels. Recognizing the high potential of the HSL color model, we propose a novel framework relying on the simple approach of statistically manipulating an entire image's pixel intensities represented in the HSL color model. The proposed algorithm, Relative Perceived Luminance Classification (RPLC), uses the HSL color model to identify the luminosity values of the entire image. Our findings show that the proposed method yields high classification accuracy (over 78%) with a small error rate, and that the computational complexity of RPLC is much lower than that of state-of-the-art ML algorithms.
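
The core computation, per-pixel HSL lightness aggregated over the whole image and compared against class boundaries, fits in a few lines of NumPy. The thresholds below are illustrative placeholders, not the values derived by RPLC:

```python
import numpy as np

def mean_hsl_lightness(rgb):
    """Mean HSL lightness of an image; rgb is a float array in [0, 1].
    HSL lightness of a pixel is (max(R, G, B) + min(R, G, B)) / 2."""
    lightness = (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0
    return float(lightness.mean())

def classify_visibility(rgb, dark_thr=0.25, bright_thr=0.60):
    """Three-way visibility label from mean lightness (illustrative)."""
    l = mean_hsl_lightness(rgb)
    if l < dark_thr:
        return "dark"
    return "semi-dark" if l < bright_thr else "bright"
```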
