The State of the Art of Scene Text Recognition

Chinese_version

Author: 陈晓雪 (Xiaoxue Chen)

1. Datasets
  1.1 Regular Scene Text Datasets
  1.2 Irregular Scene Text Datasets
  1.3 Chinese Scene Text Datasets
  1.4 Synthetic Datasets
  1.5 Comparison of Datasets
2. Summary of Scene Text Recognition Results
  2.1 Introduction
  2.2 Summary of Scene Text Recognition Results
  2.3 Chinese Scene Text Recognition Results
3. Field Survey
4. OCR Service
5. References

1. Datasets

1.1 Regular Scene Text Datasets

  • IIIT5K[31]:
    • Introduction: There are 5000 images in total, 2000 for training and 3000 for testing. Text instances are mostly horizontal. Each image is associated with a short 50-word lexicon and a long 1000-word lexicon. (Each lexicon consists of the ground-truth word plus other random words; a sketch of lexicon-constrained decoding follows this list.)
    • Link: IIIT5K-download
  • SVT[32]:
    • Introduction: There are 647 cropped-word images. Text instances are mostly horizontal, and many are severely corrupted by noise, blur, and low resolution. SVT is collected from Google Street View, and each image is associated with a 50-word lexicon. Only word-level annotations are provided.
    • Link: SVT-download
  • ICDAR 2003(IC03)[33]:
    • Introduction: There are 509 images in total, 258 for training and 251 for testing. After filtering, it contains 867 cropped-word images for testing. Text instances are mostly horizontal. Each image is associated with a 50-word lexicon and a full lexicon. (The full lexicon combines the lexicon words of all images.)
    • Link: IC03-download
  • ICDAR 2013(IC13)[34]:
    • Introduction: There are 1015 cropped-word images. Most images in IC13 are inherited from IC03. Text instances are mostly horizontal. No lexicon is provided.
    • Link: IC13-download
  • SVHN[45]:
    • Introduction: There are 600000 house-number digit images captured in natural scenes. The digits are mostly horizontal. SVHN is collected from Google Street View images and is used for digit recognition.
    • Link: SVHN-download
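
The 50-word and 1000-word lexicons above are used for lexicon-constrained evaluation: the recognizer's raw output is snapped to the closest lexicon word, typically by edit distance. Below is a minimal sketch of that step; the function and lexicon names are illustrative, not part of any dataset toolkit.

```python
# Minimal sketch of lexicon-constrained recognition as used with IIIT5K/SVT-style
# lexicons: the raw model prediction is mapped to the nearest lexicon word by
# edit distance.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def constrain_to_lexicon(prediction: str, lexicon: list[str]) -> str:
    """Return the lexicon word closest to the raw prediction (case-insensitive)."""
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))

# Example: a noisy prediction is corrected by a (hypothetical) 50-word lexicon.
lexicon_50 = ["private", "parking", "salmon", "fresh"]
print(constrain_to_lexicon("parkinq", lexicon_50))  # -> "parking"
```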

1.2 Irregular Scene Text Datasets

  • SVT-P[35]:
    • Introduction: There are 639 cropped-word images. Many images are heavily distorted by non-frontal viewing angles. SVT-P is collected from side-view images in Google Street View. Each image is associated with a 50-word lexicon and a full lexicon.
    • Link: SVT-P-download (Extraction code : vnis)
  • CUTE80[36]:
    • Introduction: There are 80 high-resolution images taken in natural scenes. After filtering, it contains 288 cropped-word images for testing, and it focuses on curved text. No lexicon is provided.
    • Link: CUTE80-download
  • ICDAR 2015(IC15)[37]:
    • Introduction: There are 1500 images in total, 1000 for training and 500 for testing. It contains 2077 cropped-word images for testing, including more than 200 irregular text instances. No lexicon is provided.
    • Link: IC15-download
  • COCO-Text[38]:
    • Introduction: There are 63686 images in total, with 145859 cropped-word images covering handwritten and printed, clear and blurred, English and non-English text.
    • Link: COCO-Text-download
  • Total-Text[39]:
    • Introduction: There are 1555 images in total, containing 11459 cropped-word images with three different text orientations: horizontal, multi-oriented, and curved.
    • Link: Total-Text-download
  • CTW-1500[43]:
    • Introduction: There are 1500 images in total, 1000 for training and 500 for testing. It contains 10751 cropped-word images. Text instances are multi-oriented and curved. The main languages are Chinese and English.
    • Link: CTW-1500-download

1.3 Chinese Scene Text Datasets

  • CTW-12k(RCTW competition,ICDAR17)[40]:
    • Introduction: There are 12514 images in total, 11514 for training and 1000 for testing. Most images in CTW-12k are collected by camera or mobile phone; the others are generated images. Each image contains at least one text line. The competition tasks include text detection and end-to-end text recognition.
    • Link: CTW-12K-download
  • MTWI(competition)[41]:
    • Introduction: There are 20000 images. Text instances are mainly Chinese or English web text. The competition tasks include web text recognition, web text detection and end-to-end web text detection and recognition.
    • Link: MTWI-download (Extraction code:gox9)
  • CTW[42]:
    • Introduction: There are 32285 high-resolution street-view images of Chinese text, with 1018402 character instances in total. CTW contains planar text, text in cities, text in rural areas, text under poor illumination, distant text, partially occluded text, etc.
    • Link: CTW-download

1.4 Synthetic Datasets

  • Synth90k [53] :
    • Introduction: There are 9 million cropped-word images generated from a lexicon of 90k common English words. Words are rendered onto natural images with random transformations and effects (see the rendering sketch after this list).
    • Link: Synth90k-download
  • SynthText [54] :
    • Introduction: There are 6 million cropped-word images. The generation process is similar to that of Synth90k.
    • Link: SynthText-download
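
Both datasets render text into images rather than photographing it. A toy sketch of that idea using Pillow and NumPy follows; the single default font, small rotation, and additive noise are simplified stand-ins for the original pipelines' font sampling, perspective warps, and blending.

```python
# Toy Synth90k/SynthText-style word rendering: draw a lexicon word, apply a
# random rotation and additive noise, and composite it onto a background crop.
# This is an illustration, not the original generation pipeline.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_word(word: str, bg: Image.Image) -> Image.Image:
    font = ImageFont.load_default()          # real pipelines sample many fonts
    canvas = Image.new("RGBA", (8 * len(word) + 8, 16), (0, 0, 0, 0))
    ImageDraw.Draw(canvas).text((4, 2), word, fill=(255, 255, 255, 255), font=font)
    canvas = canvas.rotate(random.uniform(-10, 10), expand=True)  # random skew
    out = bg.convert("RGBA").resize(canvas.size)
    out.alpha_composite(canvas)              # paste the text layer onto the crop
    arr = np.array(out.convert("RGB")).astype(np.int16)
    arr += np.random.randint(-20, 21, arr.shape)                  # sensor-like noise
    return Image.fromarray(arr.clip(0, 255).astype(np.uint8))

bg = Image.new("RGB", (120, 40), (90, 120, 80))  # stand-in for a natural-image crop
render_word("parking", bg).save("synthetic_word.png")
```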

1.5 Comparison of Datasets

| Datasets | Language | All Pictures | All Instances | Train Pictures | Train Instances | Test Pictures | Test Instances | Lexicon | Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IIIT5K [31] | English | 5000 | 5000 | 2000 | 2000 | 3000 | 3000 | 50, 1k | regular |
| SVT [32] | English | 350 | - | 100 | - | 250 | 647 | 50 | regular |
| IC03 [33] | English | 509 | - | 258 | - | 251 | 867 | 50, Full | regular |
| IC13 [34] | English | 462 | - | 229 | - | 233 | 1015 | None | regular |
| SVHN [45] | Digits | 600000 | 600000 | 573968 | 573968 | 26032 | 26032 | None | regular |
| SVT-P [35] | English | 238 | 639 | - | - | 238 | 639 | 50, Full | irregular |
| CUTE80 [36] | English | 80 | 288 | - | - | 80 | 288 | None | irregular |
| IC15 [37] | English | 1500 | - | 1000 | - | 500 | 2077 | None | irregular |
| COCO-Text [38] | English | 63686 | 145859 | 43686 | 118309 | 2000 | 27550 | None | regular |
| Total-Text [39] | English | 1555 | 11459 | - | - | 1555 | 11459 | None | irregular |
| CTW-1500 [43] | Chinese/English | 1500 | 10751 | 1000 | - | 500 | - | None | irregular |
| CTW-12K [40] | Chinese | 12514 | - | 11514 | - | 1000 | - | None | regular |
| MTWI [41] | Chinese | 20000 | - | 10000 | - | 10000 | - | None | regular |
| CTW [42] | Chinese | 32285 | 1018402 | 25887 | 812872 | 3269 | 103519 | None | regular |

2. Summary of Scene Text Recognition Results

2.1 Introduction

Section 2.2 summarizes the main recognition algorithms in the community from 2011 to the present. Each entry gives the method's highlights, its recognition performance on the standard benchmarks, and the publication year and venue. A '*' before a method name indicates the use of extra datasets; '^' marks the highest result obtained with extra datasets; '@' marks a different evaluation protocol that uses only 1811 IC15 test images.

2.2 Summary of Scene Text Recognition Results

Results are word recognition accuracy (%). Scores are keyed as dataset-lexicon, where the lexicon is 50, 1K, Full, or 50k and None means lexicon-free; CUTE80, IC15 (2077 test images), and COCO-Text scores are all lexicon-free.

  • Wang et al. [1]: ABBYY (ICCV 2011)
    • Highlight: A two-stage system: a state-of-the-art text detector followed by a leading commercial OCR engine.
    • Results: IIIT5K-50: 24.3; SVT-50: 35.0; IC03-50: 56.0; IC03-Full: 55.0; SVT-P-50: 40.5; SVT-P-Full: 26.1
  • Wang et al. [1]: SYNTH+PLEX (ICCV 2011)
    • Highlight: Establishes a baseline for scene text recognition, showing that an object-recognition pipeline outperforms conventional OCR engines without an explicit text detector, which significantly simplifies recognition.
    • Results: SVT-50: 57.0; IC03-50: 76.0; IC03-Full: 62.0
  • Mishra et al. [2] (BMVC 2012)
    • Highlight: A framework that uses a CRF with some or all of the English dictionary as priors; also introduces the IIIT5K-word dataset.
    • Results: IIIT5K-50: 64.1; IIIT5K-1K: 57.5; SVT-50: 73.2; IC03-50: 81.8; IC03-Full: 67.8; SVT-P-50: 45.7; SVT-P-Full: 24.7
  • Wang et al. [3] (ICPR 2012)
    • Highlight: Combines the representational power of multi-layer CNNs, NMS, and beam search in an end-to-end, lexicon-driven recognition system.
    • Results: SVT-50: 70.0; IC03-50: 90.0; IC03-Full: 84.0; SVT-P-50: 40.2; SVT-P-Full: 32.4
  • Goel et al. [4]: wDTW (ICDAR 2013)
    • Highlight: Holistic word recognition without explicit character segmentation: synthetic images are generated from lexicon words and matched against scene-image features with weighted dynamic time warping (wDTW).
    • Results: SVT-50: 77.3; IC03-50: 89.7
  • Bissacco et al. [5]: PhotoOCR (ICCV 2013)
    • Highlight: A two-stage system based on HOG features with a 5-layer network for character classification; a self-supervision mechanism constructs additional training data.
    • Results: SVT-50: 90.4; SVT-None: 78.0; IC13-None: 87.6
  • Phan et al. [6] (ICCV 2013)
    • Highlight: A two-stage system for scene text with perspective distortion: MSERs detect characters, SIFT descriptors provide features, and SVMs recognize words; also introduces the SVT-P dataset.
    • Results: SVT-50: 73.7; IC03-50: 82.2; SVT-P-50: 62.3; SVT-P-Full: 42.2
  • Alsharif et al. [7]: HMM/Maxout (ICLR 2014)
    • Highlight: A two-stage, segmentation-based system that recognizes words with convolutional Maxout networks combined with hybrid HMMs; end-to-end performance is also reported.
    • Results: SVT-50: 74.3; IC03-50: 93.1; IC03-Full: 88.6; IC03-50k: 85.1
  • Almazán et al. [8]: KCSR (TPAMI 2014)
    • Highlight: Embeds word images and text strings in a common vector subspace, casting recognition as nearest-neighbor search; the representation is fixed-length, low-dimensional, and fast to compute.
    • Results: IIIT5K-50: 88.6; IIIT5K-1K: 75.6; SVT-50: 87.0
  • Yao et al. [9]: Strokelets (CVPR 2014)
    • Highlight: A novel multi-scale representation, 'Strokelets', with four distinctive advantages: usability, robustness, generality, and expressivity.
    • Results: IIIT5K-50: 80.2; IIIT5K-1K: 69.3; SVT-50: 75.9; IC03-50: 88.5; IC03-Full: 80.3
  • Rodriguez-Serrano et al. [10]: Label embedding (IJCV 2015)
    • Highlight: Embeds word labels and word images into a common Euclidean space; the method is simple and effective, combines with any descriptor, needs no costly pre-/post-processing, and can recognize never-before-seen words.
    • Results: IIIT5K-50: 76.1; IIIT5K-1K: 57.4; SVT-50: 70.0
  • Jaderberg et al. [11] (ECCV 2014)
    • Highlight: A CNN classifier with efficient feature sharing across text detection, case-sensitive and case-insensitive character classification, and bigram classification; also proposes automated data mining of Flickr.
    • Results: SVT-50: 86.1; IC03-50: 96.2; IC03-Full: 91.5
  • Su and Lu [12] (ACCV 2014)
    • Highlight: Recognizes whole word images without character-level segmentation, using HOG features with a BLSTM and CTC.
    • Results: SVT-50: 83.0; IC03-50: 92.0; IC03-Full: 82.0
  • Gordo [13]: Mid-features (CVPR 2015)
    • Highlight: A descriptive, robust, and compact fixed-length representation built from mid-level features, which pairs with the word-attributes framework to improve recognition.
    • Results: IIIT5K-50: 93.3; IIIT5K-1K: 86.6; SVT-50: 91.8
  • Jaderberg et al. [14] (IJCV 2015)
    • Highlight: An end-to-end system for text localization and recognition, plus a synthetic dataset of 9 million images; a CNN classifies whole words over a 90k-word vocabulary covering almost all English words, and recognition results feed back to update the detector.
    • Results: IIIT5K-50: 97.1; IIIT5K-1K: 92.7; SVT-50: 95.4; SVT-None: 80.7; IC03-50: 98.7; IC03-Full: 98.6; IC03-50k: 93.3; IC03-None: 93.1; IC13-None: 90.8
  • Jaderberg et al. [15] (ICLR 2015)
    • Highlight: A model combining a CNN and a CRF for unconstrained word recognition, jointly optimized by back-propagating the structured output loss; sets the lexicon-free baseline.
    • Results: IIIT5K-50: 95.5; IIIT5K-1K: 89.6; SVT-50: 93.2; SVT-None: 71.7; IC03-50: 97.8; IC03-Full: 97.0; IC03-50k: 93.4; IC03-None: 89.6; IC13-None: 81.8
  • Shi, Bai, and Yao [16]: CRNN (TPAMI 2017)
    • Highlight: Models scene text recognition as a sequence problem, integrating the advantages of deep CNNs and RNNs in an end-to-end trainable framework with CTC transcription (a minimal sketch of this CNN-BiLSTM-CTC pipeline follows this list).
    • Results: IIIT5K-50: 97.8; IIIT5K-1K: ^95.0; IIIT5K-None: 81.2; SVT-50: 97.5; SVT-None: 82.7; IC03-50: 98.7; IC03-Full: 98.0; IC03-50k: 95.7; IC03-None: 91.9; IC13-None: 89.6
  • Shi et al. [17]: RARE (CVPR 2016)
    • Highlight: A recognition model for irregular text: input images are first rectified by an STN, features are extracted by a CNN, and an attention-based recurrent decoder produces the output.
    • Results: IIIT5K-50: 96.2; IIIT5K-1K: 93.8; IIIT5K-None: 81.9; SVT-50: 95.5; SVT-None: 81.9; IC03-50: 98.3; IC03-Full: 96.2; IC03-50k: 94.8; IC03-None: 90.1; IC13-None: 88.6; SVT-P-50: 91.2; SVT-P-Full: 77.4; SVT-P-None: 71.8; CUTE80: 59.2
  • Lee and Osindero [18]: R2AM (CVPR 2016)
    • Highlight: Recursive CNNs for parametrically efficient feature extraction; R2AM implicitly learns a character-level language model, and soft attention lets the model decode selectively.
    • Results: IIIT5K-50: 96.8; IIIT5K-1K: 94.4; IIIT5K-None: 78.4; SVT-50: 96.3; SVT-None: 80.7; IC03-50: 97.9; IC03-Full: 97.0; IC03-None: 88.7; IC13-None: 90.0
  • Liu et al. [19]: STAR-Net (BMVC 2016)
    • Highlight: An STN removes text distortions to ease the recognition module's task; residual convolutional blocks extract image features, and words are decoded by a BLSTM with CTC.
    • Results: IIIT5K-50: 97.7; IIIT5K-1K: 94.5; IIIT5K-None: 83.3; SVT-50: 95.5; SVT-None: 83.6; IC03-50: 96.9; IC03-Full: 95.3; IC03-None: 89.9; IC13-None: 89.1; SVT-P-50: 94.3; SVT-P-Full: 83.6; SVT-P-None: 73.5
  • *Yang et al. [20] (IJCAI 2017)
    • Highlight: A robust end-to-end neural model that attentively recognizes text; an auxiliary dense character detection task helps learn text-specific visual patterns, and an attention-based recurrent decoder generates the target sequence. Also proposes a synthetic dataset with perspective distortion, curvature, etc.
    • Results: IIIT5K-50: 97.8; IIIT5K-1K: 96.1; SVT-50: 95.2; IC03-50: 97.7; SVT-P-50: 93.0; SVT-P-Full: 80.2; SVT-P-None: 75.8; CUTE80: 69.3
  • Yin et al. [21] (ICCV 2017)
    • Highlight: Simultaneously detects and recognizes characters by sliding character models over the text-line image; a CNN extracts image features and a CTC-based algorithm decodes the final result.
    • Results: IIIT5K-50: 98.7; IIIT5K-1K: 96.1; IIIT5K-None: 78.2; SVT-50: 95.1; SVT-None: 72.5; IC03-50: 97.6; IC03-Full: 96.5; IC03-None: 81.1; IC13-None: 81.4
  • *Cheng et al. [22]: FAN (ICCV 2017)
    • Highlight: Introduces the concept of 'attention drift' and proposes FAN, in which a focusing network (FN) steers the attention network's deviated attention back onto the target regions.
    • Results: IIIT5K-50: 99.3; IIIT5K-1K: 97.5; IIIT5K-None: 87.4; SVT-50: 97.1; SVT-None: 85.9; IC03-50: 99.2; IC03-Full: 97.3; IC03-None: 94.2; IC13-None: 93.3; IC15: @85.3
  • Cheng et al. [23]: AON (CVPR 2018)
    • Highlight: An arbitrary orientation network that directly captures deep features of irregular text in four directions together with character placement clues; a filter gate (FG) fuses the four-direction features with the learned placement clues.
    • Results: IIIT5K-50: 99.6; IIIT5K-1K: 98.1; IIIT5K-None: 87.0; SVT-50: 96.0; SVT-None: 82.8; IC03-50: 98.5; IC03-Full: 97.1; IC03-None: 91.5; SVT-P-50: 94.0; SVT-P-Full: 83.7; SVT-P-None: 73.0; CUTE80: 76.8; IC15: 68.2
  • Gao et al. [24] (arXiv 2017)
    • Highlight: Residual attention modules in a small densely connected network suppress background noise and improve the discriminability of CNN features; removing the RNN speeds up recognition.
    • Results: IIIT5K-50: 99.1; IIIT5K-1K: 97.9; IIIT5K-None: 81.8; SVT-50: 97.4; SVT-None: 82.7; IC03-50: 98.7; IC03-Full: 96.7; IC03-None: 89.2; IC13-None: 88.0
  • Liu et al. [25]: Char-Net (AAAI 2018)
    • Highlight: Recognizes distorted scene text by extracting single-character feature regions and removing their distortions; an attention-based recurrent decoder then generates the target sequence.
    • Results: IIIT5K-None: 83.6; SVT-None: 84.4; IC03-Full: 93.3; IC03-None: 91.5; IC13-None: 90.8; SVT-P-None: 73.5; IC15: 60.0
  • *Liu et al. [26]: SqueezedText (AAAI 2018)
    • Highlight: Real-time recognition with a front-end B-CEDNet trained under binary constraints with significant compression, yielding a remarkable inference speedup and reduced memory usage.
    • Results: IIIT5K-50: 97.0; IIIT5K-1K: 94.1; IIIT5K-None: 87.0; SVT-50: 95.2; IC03-50: 98.8; IC03-Full: 97.9; IC03-50k: 93.8; IC03-None: 93.1; IC13-None: 92.9
  • *Bai et al. [27]: EP (CVPR 2018)
    • Highlight: Targets the misalignment between ground-truth strings and the attention model's output probability sequences; training with edit probability (EP) focuses on missing, superfluous, and misrecognized characters, so the impact of misalignment is alleviated or even overcome.
    • Results: IIIT5K-50: 99.5; IIIT5K-1K: 97.9; IIIT5K-None: 88.3; SVT-50: 96.6; SVT-None: 87.5; IC03-50: 98.7; IC03-Full: 97.9; IC03-None: 94.6; IC13-None: ^94.4; IC15: 73.9
  • Liu et al. [28] (ECCV 2018)
    • Highlight: Addresses image feature learning with a multi-task encoder-generator-discriminator-decoder architecture that guides feature learning using clean images.
    • Results: IIIT5K-50: 97.3; IIIT5K-1K: 96.1; IIIT5K-None: 89.4; SVT-50: 96.8; SVT-None: 87.1; IC03-50: 98.1; IC03-Full: 97.5; IC03-None: 94.7; IC13-None: 94.0; SVT-P-None: 73.9; CUTE80: 62.5
  • Gao et al. [29] (ICIP 2018)
    • Highlight: Residual attention modules in a small densely connected network suppress background noise and improve CNN feature discriminability; a recurrent decoder based on CTC generates the recognition result.
    • Results: IIIT5K-50: 99.1; IIIT5K-1K: 97.2; IIIT5K-None: 83.6; SVT-50: 97.7; SVT-None: 83.9; IC03-50: 98.6; IC03-Full: 96.6; IC03-None: 91.4; IC13-None: 89.5
  • Shi et al. [30]: ASTER (TPAMI 2018)
    • Highlight: An improved version of [17] for irregular text: input images are rectified via TPS, features are extracted by a ResNet, and an attention-based recurrent decoder produces the output.
    • Results: IIIT5K-50: 99.6; IIIT5K-1K: 98.8; IIIT5K-None: 93.4; SVT-50: 97.4; SVT-None: 89.5; IC03-50: 98.8; IC03-Full: 98.0; IC03-None: 94.5; IC13-None: 91.8; SVT-P-None: 78.5; CUTE80: 79.5; IC15: 76.1
  • Luo et al. [46]: MORAN (PR 2019)
    • Highlight: Rectifies images containing irregular text by predicting the offsets of different image regions, lowering recognition difficulty so that an attention-based sequence recognizer can read irregular text more easily; simpler and more practical than TPS.
    • Results: IIIT5K-50: 97.9; IIIT5K-1K: 96.2; IIIT5K-None: 91.2; SVT-50: 96.6; SVT-None: 88.3; IC03-50: 98.7; IC03-Full: 97.8; IC03-None: 95.0; IC13-None: 92.4; SVT-P-50: 94.3; SVT-P-Full: 86.7; SVT-P-None: 76.1; CUTE80: 77.4; IC15: 68.8
  • Xie et al. [47]: CAN (ACM TOMM 2019)
    • Highlight: Unconstrained recognition with a ResNet feature extractor; a CNN with GLUs replaces the RNN as the decoder to produce the final result.
    • Results: IIIT5K-50: 97.0; IIIT5K-1K: 94.2; IIIT5K-None: 80.5; SVT-50: 96.9; SVT-None: 83.4; IC03-50: 98.4; IC03-Full: 97.8; IC03-None: 91.0; IC13-None: 90.5
  • *Liao et al. [48]: CA-FCN (AAAI 2019)
    • Highlight: Recognizes text of arbitrary shapes from a two-dimensional perspective, simultaneously recognizing the script and predicting the position of each character; requires character-level annotations.
    • Results: IIIT5K-50: ^99.8; IIIT5K-1K: ^98.9; IIIT5K-None: 92.0; SVT-50: ^98.8; SVT-None: 82.1; IC13-None: 91.4; CUTE80: 78.1
  • *Li et al. [49]: SAR (AAAI 2019)
    • Highlight: Irregular text recognition with a ResNet feature extractor and a decoder based on a 2D attention mechanism.
    • Results: IIIT5K-50: 99.4; IIIT5K-1K: 98.2; IIIT5K-None: 95.0; SVT-50: 98.5; SVT-None: ^91.2; IC13-None: 94.0; SVT-P-50: ^95.8; SVT-P-Full: ^91.2; SVT-P-None: ^86.4; CUTE80: ^89.6; IC15: ^78.8; COCO-Text: ^66.8
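
Many of the methods above, starting with CRNN [16], share a CNN-BiLSTM-CTC pipeline: a convolutional stack collapses the image height into a horizontal feature sequence, a recurrent network models it, and CTC aligns the per-timestep predictions with the label string. Below is a minimal PyTorch sketch of that pipeline; the layer sizes are a toy configuration, not the authors' implementation.

```python
# Toy CRNN-style model: conv stack collapses the image height, a BiLSTM models
# the resulting horizontal feature sequence, and CTC loss aligns it with labels.
# Shapes assume 32-pixel-high grayscale crops.
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    def __init__(self, num_classes: int):   # num_classes includes the CTC blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # H 32->16
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # H 16->8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((8, 1), (8, 1)),                                     # H 8->1
        )
        self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):                    # x: (N, 1, 32, W)
        f = self.cnn(x).squeeze(2)           # (N, 256, W/4)
        f, _ = self.rnn(f.permute(0, 2, 1))  # (N, W/4, 256)
        return self.fc(f)                    # per-timestep class logits

model = TinyCRNN(num_classes=37)             # 26 letters + 10 digits + blank
images = torch.randn(4, 1, 32, 100)
logits = model(images)                       # (4, 25, 37)
log_probs = logits.log_softmax(2).permute(1, 0, 2)   # (T, N, C) layout for CTC
targets = torch.randint(1, 37, (4, 7))               # dummy labels, 0 = blank
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((4,), 25, dtype=torch.long),   # input lengths
                           torch.full((4,), 7, dtype=torch.long))    # target lengths
loss.backward()
```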

2.3 Chinese Scene Text Recognition Results

| Method | RCTW | MTWI | CTW | Time | Source |
| --- | --- | --- | --- | --- | --- |
| Zheqi He, Yongtao Wang: Foo & Bar | 82.0 (end-to-end) | - | - | 2017 | RCTW competition |
| IFLYTEK: nelslip (iflytek & ustc) | - | 85.8 (AR) | - | 2018 | MTWI competition |
| Yuan et al. [42]: CTW | - | - | 80.5 (AR) | 2018 | CTW |
| Liu et al. [43]: CTW-1500 | - | - | - | 2017 | CTW-1500 |
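
The AR ("accuracy rate") figures above are character-level scores commonly computed from the normalized edit distance between prediction and ground truth. The exact protocol varies by contest, so the sketch below is an assumed, generic definition rather than any contest's official scorer.

```python
# Sketch of an AR-style metric: 1 - normalized edit distance, averaged over
# samples. Check each contest's evaluation page for the exact protocol.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def accuracy_rate(preds: list[str], gts: list[str]) -> float:
    """Mean of 1 - ED(pred, gt) / max(len(pred), len(gt)) over all samples."""
    total = 0.0
    for p, g in zip(preds, gts):
        denom = max(len(p), len(g)) or 1      # guard against two empty strings
        total += 1.0 - levenshtein(p, g) / denom
    return total / len(preds)

print(accuracy_rate(["深圳欢迎你", "停车场"], ["深圳欢迎您", "停车场"]))  # 0.9
```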

3. Field Survey

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper


4. OCR Service

  • Tesseract OCR Engine
  • Azure
  • ABBYY
  • OCR Space
  • SODA PDF OCR
  • Free Online OCR
  • Online OCR
  • Super Tools
  • Online Chinese OCR (在线中文识别)
  • Calamari OCR
  • Tencent OCR (腾讯OCR)
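
For quick baselines, the open-source engines above can be scripted directly; for example, Tesseract through the pytesseract wrapper. The image path below is illustrative, and a local Tesseract install with the relevant language data is assumed.

```python
# Minimal example of calling the Tesseract engine from Python via pytesseract.
# Requires the tesseract binary on PATH plus the desired language packs
# (e.g. chi_sim for simplified Chinese).
from PIL import Image
import pytesseract

image = Image.open("scene_text_crop.png")                  # hypothetical input crop
print(pytesseract.image_to_string(image, lang="eng"))      # English recognition
print(pytesseract.image_to_string(image, lang="chi_sim"))  # Chinese, if installed
```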

5. References

[1] [ICCV-2011] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In Proceedings of International Conference on Computer Vision (ICCV), pages 1457–1464, 2011. paper

[2] [BMVC-2012] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In Proceedings of British Machine Vision Conference (BMVC), pages 1–11, 2012. paper dataset

[3] [ICPR-2012] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng. End-to-end text recognition with convolutional neural networks. In Proceedings of International Conference on Pattern Recognition (ICPR), pages 3304–3308, 2012. paper

[4] [ICDAR-2013] V. Goel, A. Mishra, K. Alahari, and C. Jawahar. Whole is greater than sum of parts: Recognizing scene text words. In Proceedings of International Conference on Document Analysis and Recognition (ICDAR), pages 398–402, 2013. paper

[5] [ICCV-2013] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. Photoocr: Reading text in uncontrolled conditions. In Proceedings of International Conference on Computer Vision (ICCV), pages 785–792, 2013. paper

[6] [ICCV-2013] T. Quy Phan, P. Shivakumara, S. Tian, and C. Lim Tan. Recognizing text with perspective distortion in natural scenes. In Proceedings of International Conference on Computer Vision (ICCV), pages 569–576, 2013. paper

[7] [ICLR-2014] O. Alsharif and J. Pineau, End-to-end text recognition with hybrid HMM maxout models, in: Proceedings of International Conference on Learning Representations (ICLR), 2014. paper

[8] [TPAMI-2014] J. Almazán, A. Gordo, A. Fornés, and E. Valveny. Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell., 36(12):2552–2566, 2014. paper code

[9] [CVPR-2014] C. Yao, X. Bai, B. Shi, and W. Liu. Strokelets: A learned multi-scale representation for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4042–4049, 2014. paper

[10] [IJCV-2015] J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin. Label embedding: A frugal baseline for text recognition. International Journal of Computer Vision (IJCV) , 113(3):193–207, 2015. paper

[11] [ECCV-2014] M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features for text spotting. In Proceedings of European Conference on Computer Vision (ECCV), pages 512–528, 2014. paper code

[12] [ACCV-2014] B. Su and S. Lu. Accurate scene text recognition based on recurrent neural network. In Proceedings of Asian Conference on Computer Vision (ACCV), pages 35–48, 2014. paper

[13] [CVPR-2015] A. Gordo. Supervised mid-level features for word image representation. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2956–2964, 2015. paper

[14] [IJCV-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Reading text in the wild with convolutional neural networks. Int. J. Comput. Vision, 2015. paper code

[15] [ICLR-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Deep structured output learning for unconstrained text recognition, in: Proceedings of International Conference on Learning Representations (ICLR), 2015. paper

[16] [TPAMI-2017] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell., 39(11):2298–2304, 2017. paper code-Torch7 code-Pytorch

[17] [CVPR-2016] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai. Robust scene text recognition with automatic rectification. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4168–4176, 2016. paper

[18] [CVPR-2016] C.-Y. Lee and S. Osindero. Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2231–2239, 2016. paper

[19] [BMVC-2016] W. Liu, C. Chen, K.-Y. K. Wong, Z. Su, and J. Han. STAR-Net: A spatial attention residue network for scene text recognition. In Proceedings of British Machine Vision Conference (BMVC), page 7, 2016. paper

[20] [IJCAI-2017] X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles. Learning to read irregular text with attention mechanisms. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2017. paper

[21] [ICCV-2017] F. Yin, Y.-C. Wu, X.-Y. Zhang, and C.-L. Liu. Scene text recognition with sliding convolutional character models. In Proceedings of International Conference on Computer Vision (ICCV), 2017. paper

[22] [ICCV-2017] Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of International Conference on Computer Vision (ICCV), pages 5086–5094, 2017. paper

[23] [CVPR-2018] Cheng Z, Xu Y, Bai F, et al. AON: Towards Arbitrarily-Oriented Text Recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 5571–5579, 2018. paper

[24] [arXiv-2017] Gao Y, Chen Y, Wang J, et al. Reading Scene Text with Attention Convolutional Sequence Modeling[J]. arXiv preprint arXiv:1709.04303, 2017. paper

[25] [AAAI-2018] Liu W, Chen C, Wong K Y K. Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition[C]//AAAI. 2018. paper

[26] [AAAI-2018] Liu Z, Li Y, Ren F, et al. SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network[C]//AAAI. 2018. paper

[27] [CVPR-2018] Bai F, Cheng Z, Niu Y, Pu S, Zhou S. Edit probability for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 1508–1516, 2018. paper

[28] [ECCV-2018] Liu Y, Wang Z, Jin H, et al. Synthetically Supervised Feature Learning for Scene Text Recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 435-451. paper

[29] [ICIP-2018] Gao Y, Chen Y, Wang J, et al. Dense Chained Attention Network for Scene Text Recognition[C]//2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018: 679-683. paper

[30] [TPAMI-2018] Shi B, Yang M, Wang X, et al. ASTER: An attentional scene text recognizer with flexible rectification[J]. IEEE transactions on pattern analysis and machine intelligence, 2018. paper code

[31] [CVPR-2012] A. Mishra, K. Alahari, and C. V. Jawahar. Top-down and bottom-up cues for scene text recognition. In CVPR, 2012. paper

[32] [ICCV-2011] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, 2011. paper

[33] [IJDAR-2005] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR, 7(2-3):105–122, 2005. paper

[34] [ICDAR-2013] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. Almazán, and L. de las Heras. ICDAR 2013 robust reading competition. In ICDAR, 2013. paper

[35] [ICCV-2013] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, 2013. paper

[36] [Expert Syst.Appl-2014] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. Expert Syst. Appl., 41(18):8027–8048, 2014. paper

[37] [ICDAR-2015] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper

[38] [arXiv-2016] Veit A, Matera T, Neumann L, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint arXiv:1601.07140, 2016. paper

[39] [ICDAR-2017] Ch'ng C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. paper

[40] [ICDAR-2017] Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. paper

[41] [ICPR-2018] He M, Liu Y, Yang Z, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web Images[C]//2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018: 7-12. paper

[42] [arXiv-2018] Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild[J]. arXiv preprint arXiv:1803.00085, 2018. paper

[43] [arXiv-2017] Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting curve text in the wild: New dataset and new solution[J]. arXiv preprint arXiv:1712.02170, 2017. paper

[44] [ECCV-2018] Lyu P, Liao M, Yao C, et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 71-88. paper

[45] [NIPS-WORKSHOP-2011] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011. paper

[46] [PR-2019] C. Luo, L. Jin, and Z. Sun, “MORAN: A multi-object rectified attention network for scene text recognition,” Pattern Recognition, vol. 90, pp. 109–118, 2019. paper code

[47] [ACM-2019] Xie H, Fang S, Zha Z J, et al. Convolutional Attention Networks for Scene Text Recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(1): 3, 2019. paper

[48] [AAAI-2019] Liao M, Zhang J, Wan Z, et al. Scene text recognition from two-dimensional perspective[C]//AAAI. 2019. paper

[49] [AAAI-2019] Li H, Wang P, Shen C, et al. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition[C]//AAAI. 2019. paper

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

[53] [NIPS-WORKSHOP-2014] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In NIPS Deep Learning Workshop, 2014. paper

[54] [CVPR-2016] A. Gupta, A. Vedaldi, A. Zisserman, Synthetic data for text localisation in natural images, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2315–2324. paper
