TY - JOUR
T1 - Multi-label image recognition by using semantics consistency, object correlation, and multiple samples
AU - Chu, Wei Ta
AU - Huang, Si Heng
N1 - Funding Information:
This work was funded in part by Qualcomm through a Taiwan University Research Collaboration Project and in part by the Ministry of Science and Technology, Taiwan, under grants 108-2221-E-006-227-MY3, 107-2923-E-194-003-MY3, and 109-2218-E-002-015.
Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2021/5
Y1 - 2021/5
AB - An image can be annotated from a local perspective, based on the objects visually present in it, or from a global perspective, based on the implicit emotions or meanings derived from it. We propose three ideas that have received relatively little study. First, an image's semantics remain the same even if the image undergoes certain geometric transformations. Second, object correlation is important in image labelling; we propose using a standard recurrent neural network that takes object sequences in random orders. Third, we observe that an entity can be represented by multiple image samples, and these samples can be considered jointly to improve recognition performance. These three ideas are implemented in a network that jointly considers global and local information. Comprehensive evaluation studies verify that a simple network incorporating these ideas is effective and achieves performance competitive with the state of the art.
UR - http://www.scopus.com/inward/record.url?scp=85102898237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102898237&partnerID=8YFLogxK
U2 - 10.1016/j.jvcir.2021.103067
DO - 10.1016/j.jvcir.2021.103067
M3 - Article
AN - SCOPUS:85102898237
SN - 1047-3203
VL - 77
JO - Journal of Visual Communication and Image Representation
JF - Journal of Visual Communication and Image Representation
M1 - 103067
ER -