HCIILAB/Scene-Text-Recognition
Scene Text Recognition Resources

Author: Xiaoxue Chen (陈晓雪)

Chinese_version


1. Datasets

1.1 Regular Scene Text Datasets

  • IIIT5K[31]:
    • Introduction: It contains 5000 images in total, 2000 for training and 3000 for testing. Every image is associated with a 50-word lexicon and a 1000-word lexicon. Each lexicon consists of the ground-truth word and some randomly picked words.
    • Link: IIIT5K-download
  • SVT[32]:
    • Introduction: It contains 647 cropped word images collected from Google Street View. Many images are severely corrupted by noise, blur, or low resolution. Every image is associated with a 50-word lexicon. Note that only word-level annotations are provided.
    • Link: SVT-download
  • ICDAR 2003(IC03)[33]:
    • Introduction: It contains 509 images in total, 258 for training and 251 for testing. Specifically, it contains 867 cropped word images after discarding images that contain non-alphanumeric characters or have fewer than three characters. Every image is associated with a 50-word lexicon and a full lexicon, where the full lexicon combines the lexicon words of all images.
    • Link: IC03-download
  • ICDAR 2013(IC13)[34]:
    • Introduction: It contains 1015 cropped word images and inherits most of its samples from IC03. No lexicon is associated with this dataset.
    • Link: IC13-download
  • SVHN[45]:
    • Introduction: It contains more than 600000 digit images of house numbers in natural scenes. The images were collected from Google Street View and are used for digit recognition.
    • Link: SVHN-download
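The 50-word and 1000-word lexicons above are used for lexicon-constrained evaluation: the model's raw prediction is snapped to the nearest lexicon word under edit distance. A minimal sketch of that protocol (the function names are illustrative, not taken from any benchmark toolkit):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j]
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]


def lexicon_match(prediction: str, lexicon) -> str:
    """Snap a raw prediction to the closest lexicon word (case-insensitive)."""
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))
```

A sample is then counted as correct when `lexicon_match(prediction, lexicon)` equals the ground-truth word, which is why lexicon-constrained accuracies run much higher than unconstrained ("None") ones.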

1.2 Irregular Scene Text Datasets

  • SVT-P[35]:
    • Introduction: It contains 639 cropped word images for testing. Images were selected from side-view snapshots in Google Street View, so most are heavily distorted by the non-frontal view angle. Every image is associated with a 50-word lexicon and a full-word lexicon.
    • Link: SVT-P-download (Password : vnis)
  • CUTE80[36]:
    • Introduction: It contains 80 high-resolution images taken in natural scenes. Specifically, it contains 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.
    • Link: CUTE80-download
  • ICDAR 2015(IC15)[37]:
    • Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 2077 cropped word images, including more than 200 irregular ones. No lexicon is associated with this dataset.
    • Link: IC15-download
  • COCO-Text[38]:
    • Introduction: It contains 63686 images in total. Specifically, it contains 145859 cropped word images for testing, including handwritten and printed, clear and blurred, English and non-English text.
    • Link: COCO-Text-download
  • Total-Text[39]:
    • Introduction: It contains 1555 images in total. Specifically, it contains 11459 cropped word images covering three different text orientations: horizontal, multi-oriented, and curved.
    • Link: Total-Text-download
  • CTW-1500[43]:
    • Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 10751 cropped word images for testing. Annotations in CTW-1500 are polygons with 14 vertices. The dataset consists mainly of Chinese and English text.
    • Link: CTW-1500-download

1.3 Chinese Scene Text Datasets

  • CTW-12k (RCTW competition, ICDAR 2017)[40]:
    • Introduction: It contains 12514 images in total, 11514 for training and 1000 for testing. Images in CTW-12k were mostly captured by camera or mobile phone; the others are born-digital. Text instances are annotated with parallelograms. It was the first large-scale Chinese dataset, and the largest published one at the time.
    • Link: CTW-12K-download
  • MTWI(competition)[41]:
    • Introduction: It contains 20000 images, mainly of Chinese and English web text. The competition includes three tasks: web text recognition, web text detection, and end-to-end web text detection and recognition.
    • Link: MTWI-download (Password:gox9)
  • CTW[42]:
    • Introduction: It contains 32285 high-resolution street view images of Chinese text, with 1018402 character instances in total. All images are annotated at the character level, including the underlying character type, bounding box, and six other attributes. These attributes indicate whether the background is complex, and whether the character is raised, handwritten or printed, occluded, distorted, or rendered as word art.
    • Link: CTW-download

1.4 Synthetic Datasets

  • Synth90k [53] :
    • Introduction: It contains 9 million cropped word images generated from a set of 90k common English words. Words are rendered onto natural images with random transformations and effects. Every image is annotated with a ground-truth word.
    • Link: Synth90k-download
  • SynthText [54] :
    • Introduction: It contains 6 million cropped word images. The generation process is similar to that of Synth90k.
    • Link: SynthText-download

1.5 Comparison of Datasets

| Dataset | Language | Pictures | Instances | Training Pictures | Training Instances | Testing Pictures | Testing Instances | Lexicon (50 / 1k / Full / None) · Label (Char / Word) | Type |
|---|---|---|---|---|---|---|---|---|---|
| IIIT5K [31] | English | 5000 | 5000 | 2000 | 2000 | 3000 | 3000 | × × | regular |
| SVT [32] | English | 350 | - | 100 | - | 250 | 647 | × × × × | regular |
| IC03 [33] | English | 509 | - | 258 | - | 251 | 867 | × | regular |
| IC13 [34] | English | 462 | - | 229 | - | 233 | 1015 | × × × | regular |
| SVHN [45] | Digits | 600000 | 600000 | 573968 | 573968 | 26032 | 26032 | × × × | regular |
| SVT-P [35] | English | 238 | 639 | - | - | 238 | 639 | × × × | irregular |
| CUTE80 [36] | English | 80 | 288 | - | - | 80 | 288 | × × × × | irregular |
| IC15 [37] | English | 1500 | - | 1000 | - | 500 | 2077 | × × × × | irregular |
| COCO-Text [38] | English | 63686 | 145859 | 43686 | 118309 | 2000 | 27550 | × × × × | regular |
| Total-Text [39] | English | 1555 | 11459 | - | - | 1555 | 11459 | × × × × | irregular |
| CTW-1500 [43] | Chinese/English | 1500 | 10751 | 1000 | - | 500 | - | × × × × | irregular |
| CTW-12K [40] | Chinese | 12514 | - | 11514 | - | 1000 | - | × × × × | regular |
| MTWI [41] | Chinese | 20000 | - | 10000 | - | 10000 | - | × × × × | regular |
| CTW [42] | Chinese | 32285 | 1018402 | 25887 | 812872 | 3269 | 103519 | × × × | regular |
***

2. Summary of Scene Text Recognition Results

2.1 Comparison of methods

It is important to note that: 1) "Reg" stands for regular scene text datasets; 2) "Irreg" stands for irregular scene text datasets; 3) "Seg" denotes segmentation-based methods; 4) "Extra" marks methods that use extra datasets; 5) "CTC" marks methods that decode with a CTC-based algorithm; 6) "Attn" marks methods that decode with an attention mechanism.
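To make the CTC/Attn distinction concrete: a CTC head outputs one distribution per image frame over the charset plus a blank symbol, and decoding collapses repeated symbols and drops blanks, while an attention decoder emits one character per step. A minimal best-path (greedy) CTC decoding sketch, framework-free and with an illustrative charset:

```python
BLANK = 0  # conventional index of the CTC blank symbol


def ctc_greedy_decode(frame_logits, charset):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_logits: list of per-frame score lists, one score per symbol
                  (index 0 is the blank).
    charset: string mapping non-blank indices 1..N to characters.
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_logits]
    out, prev = [], BLANK
    for idx in best:
        # emit a character only when it is non-blank and differs from the
        # previous frame's symbol (repeats belong to the same character)
        if idx != BLANK and idx != prev:
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

For example, per-frame argmaxes `a a blank b` decode to "ab", while `a blank a` decodes to "aa": the blank separates two genuine occurrences of the same character.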

You can also download the Excel sheets prepared by us (password: 3e39).

| Method | Code / Reg / Irreg / Seg / Extra / CTC / Attn | Time | Source | Highlight |
|---|---|---|---|---|
| Wang et al. [1]: ABBYY | × × × × | 2011 | ICCV | A state-of-the-art text detector + a leading commercial OCR engine |
| Wang et al. [1]: SYNTH+PLEX | × × × × × | 2011 | ICCV | The baseline of scene text recognition. |
| Mishra et al. [2] | × × × × × | 2012 | BMVC | 1) Incorporating higher-order statistical language models to recognize words in an unconstrained manner. 2) Introducing the IIIT5K-word dataset. |
| Wang et al. [3] | × × × × | 2012 | ICPR | CNNs + non-maximal suppression + beam search |
| Goel et al. [4]: wDTW | × × × × × | 2013 | ICDAR | Recognizing the text in the image by matching the scene and synthetic image features with wDTW. |
| Bissacco et al. [5]: PhotoOCR | × × × × × | 2013 | ICCV | Applying a network with five hidden layers for character classification. |
| Phan et al. [6] | × × × × × | 2013 | ICCV | 1) MSER + SIFT descriptors + SVM. 2) Introducing the SVT-P dataset. |
| Alsharif et al. [7]: HMM/Maxout | × × × × × | 2014 | ICLR | Convolutional Maxout networks + hybrid HMM |
| Almazán et al. [8]: KCSR | × × × × × | 2014 | TPAMI | Embedding word images and text strings in a common vectorial subspace, casting recognition and retrieval as a nearest-neighbor problem. |
| Yao et al. [9]: Strokelets | × × × × × | 2014 | CVPR | Proposing a novel multi-scale representation for scene text recognition: strokelets. |
| R.-Serrano et al. [10]: Label embedding | × × × × × × | 2015 | IJCV | Embedding word labels and word images into a common Euclidean space and finding the closest word label in this space. |
| Jaderberg et al. [11] | × × × × | 2014 | ECCV | 1) Enabling efficient feature sharing for text detection and classification. 2) Making technical changes over the traditional CNN architectures. 3) Proposing a method of automated data mining of Flickr. |
| Su and Lu [12] | × × × × × | 2014 | ACCV | HOG + BLSTM + CTC |
| Gordo [13]: Mid-features | × × × × × | 2015 | CVPR | Proposing to learn local mid-level features suitable for building word image representations. |
| Jaderberg et al. [14] | × × × × × | 2015 | IJCV | 1) Treating each word as a category and training very large convolutional neural networks to perform word recognition on the whole proposal region. 2) Generating 9 million images, with equal numbers of word samples from a 90k-word dictionary. |
| Jaderberg et al. [15] | × × × × × × | 2015 | ICLR | CNN + CRF |
| Shi, Bai, and Yao [16]: CRNN | × × × × | 2017 | TPAMI | CNN + BLSTM + CTC |
| Shi et al. [17]: RARE | × × × × × | 2016 | CVPR | STN + CNN + attentional BLSTM |
| Lee and Osindero [18]: R2AM | × × × × × | 2016 | CVPR | Presenting recursive recurrent neural networks with attention modeling. |
| Liu et al. [19]: STAR-Net | × × × × × | 2016 | BMVC | STN + ResNet + BLSTM + CTC |
| *Yang et al. [20] | × × × × | 2017 | IJCAI | 1) CNN + 2D-attention-based RNN, with an auxiliary dense character detection task that helps to learn text-specific visual patterns. 2) Developing a large-scale synthetic dataset. |
| Yin et al. [21] | × × × × × | 2017 | ICCV | CNN + CTC |
| *Cheng et al. [22]: FAN | × × × × | 2017 | ICCV | 1) Proposing the concept of attention drift. 2) Introducing a focusing network to focus deviated attention back on the target areas. |
| Cheng et al. [23]: AON | × × × × × | 2018 | CVPR | 1) Extracting scene text features in four directions. 2) CNN + attentional BLSTM |
| Gao et al. [24] | × × × × | 2017 | arXiv | Attentional ResNet + CNN + CTC |
| Liu et al. [25]: Char-Net | × × × × | 2018 | AAAI | CNN + STN (facilitating the rectification of individual characters) + LSTM |
| *Liu et al. [26]: SqueezedText | × × × × × | 2018 | AAAI | Binary convolutional encoder-decoder network + Bi-RNN |
| *Bai et al. [27]: EP | × × × × | 2018 | CVPR | Proposing edit probability to effectively handle the misalignment between the training text and the output probability distribution sequence. |
| Liu et al. [28] | × × × × × | 2018 | ECCV | Designing a multi-task network with an encoder-discriminator-generator architecture to guide the features of the original image toward those of the clean image. |
| Gao et al. [29] | × × × × | 2018 | ICIP | Attentional DenseNet + BLSTM + CTC |
| Shi et al. [30]: ASTER | × × × × | 2018 | TPAMI | TPS + ResNet + bidirectional attention-based BLSTM |
| Luo et al. [46]: MORAN | × × × × | 2019 | PR | Multi-object rectification network + CNN + attentional BLSTM |
| Xie et al. [47]: CAN | × × × × × | 2019 | ACM | ResNet + CNN + GLU |
| *Liao et al. [48]: CA-FCN | × × × | 2019 | AAAI | Performing character classification at each pixel location; requires character-level annotations. |
| *Li et al. [49]: SAR | × × × × | 2019 | AAAI | ResNet + 2D attentional LSTM |
| Zhan et al. [55]: ESIR | × × × × × | 2019 | CVPR | Iterative rectification network + ResNet + attentional BLSTM |
| Zhang et al. [56]: SSDAN | × × × × | 2019 | CVPR | Attentional CNN + GAS + GRU |

2.2 Recognition Results

In this section, we list the results of previous recognition algorithms on the scene text recognition benchmarks, including IIIT5K, SVT, IC03, IC13, SVT-P, CUTE80, IC15, COCO-Text, CTW, MTWI, and RCTW.

It is important to note that: 1) '*' marks methods that use extra datasets; 2) bold indicates the best result; 3) '^' denotes the best result among methods using extra datasets; 4) '@' marks methods evaluated under a different protocol that uses only 1811 test images.
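All numbers below are word accuracies: the percentage of cropped word images whose predicted string exactly matches the ground truth. A common convention, followed by many though not all of the papers above, is to compare case-insensitively over alphanumeric characters only; a sketch of that protocol:

```python
import re


def normalize(word: str) -> str:
    """Keep alphanumerics only and lower-case, a common evaluation convention."""
    return re.sub(r"[^0-9a-zA-Z]", "", word).lower()


def word_accuracy(predictions, ground_truths) -> float:
    """Percentage of predictions that exactly match after normalization."""
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(ground_truths)
```

Differences in this normalization step (e.g. whether punctuation or case counts) are one reason reported numbers are not always directly comparable across papers.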

2.2.1 Recognition Results on Regular Dataset

| Method | IIIT5K 50 | IIIT5K 1k | IIIT5K None | SVT 50 | SVT None | IC03 50 | IC03 Full | IC03 50k | IC03 None | IC13 None |
|---|---|---|---|---|---|---|---|---|---|---|
| Wang et al. [1]: ABBYY | 24.3 | - | - | 35.0 | - | 56.0 | 55.0 | - | - | - |
| Wang et al. [1]: SYNTH+PLEX | - | - | - | 57.0 | - | 76.0 | 62.0 | - | - | - |
| Mishra et al. [2] | 64.1 | 57.5 | - | 73.2 | - | 81.8 | 67.8 | - | - | - |
| Wang et al. [3] | - | - | - | 70.0 | - | 90.0 | 84.0 | - | - | - |
| Goel et al. [4]: wDTW | - | - | - | 77.3 | - | 89.7 | - | - | - | - |
| Bissacco et al. [5]: PhotoOCR | - | - | - | 90.4 | 78.0 | - | - | - | - | 87.6 |
| Phan et al. [6] | - | - | - | 73.7 | - | 82.2 | - | - | - | - |
| Alsharif et al. [7]: HMM/Maxout | - | - | - | 74.3 | - | 93.1 | 88.6 | 85.1 | - | - |
| Almazán et al. [8]: KCSR | 88.6 | 75.6 | - | 87.0 | - | - | - | - | - | - |
| Yao et al. [9]: Strokelets | 80.2 | 69.3 | - | 75.9 | - | 88.5 | 80.3 | - | - | - |
| R.-Serrano et al. [10]: Label embedding | 76.1 | 57.4 | - | 70.0 | - | - | - | - | - | - |
| Jaderberg et al. [11] | - | - | - | 86.1 | - | 96.2 | 91.5 | - | - | - |
| Su and Lu [12] | - | - | - | 83.0 | - | 92.0 | 82.0 | - | - | - |
| Gordo [13]: Mid-features | 93.3 | 86.6 | - | 91.8 | - | - | - | - | - | - |
| Jaderberg et al. [14] | 97.1 | 92.7 | - | 95.4 | 80.7 | 98.7 | 98.6 | 93.3 | 93.1 | 90.8 |
| Jaderberg et al. [15] | 95.5 | 89.6 | - | 93.2 | 71.7 | 97.8 | 97.0 | 93.4 | 89.6 | 81.8 |
| Shi, Bai, and Yao [16]: CRNN | 97.8 | 95.0 | 81.2 | 97.5 | 82.7 | 98.7 | 98.0 | 95.7 | 91.9 | 89.6 |
| Shi et al. [17]: RARE | 96.2 | 93.8 | 81.9 | 95.5 | 81.9 | 98.3 | 96.2 | 94.8 | 90.1 | 88.6 |
| Lee and Osindero [18]: R2AM | 96.8 | 94.4 | 78.4 | 96.3 | 80.7 | 97.9 | 97.0 | - | 88.7 | 90.0 |
| Liu et al. [19]: STAR-Net | 97.7 | 94.5 | 83.3 | 95.5 | 83.6 | 96.9 | 95.3 | - | 89.9 | 89.1 |
| *Yang et al. [20] | 97.8 | 96.1 | - | 95.2 | - | 97.7 | - | - | - | - |
| Yin et al. [21] | 98.7 | 96.1 | 78.2 | 95.1 | 72.5 | 97.6 | 96.5 | - | 81.1 | 81.4 |
| *Cheng et al. [22]: FAN | 99.3 | 97.5 | 87.4 | 97.1 | 85.9 | ^99.2 | 97.3 | - | 94.2 | 93.3 |
| Cheng et al. [23]: AON | 99.6 | 98.1 | 87.0 | 96.0 | 82.8 | 98.5 | 97.1 | - | 91.5 | - |
| Gao et al. [24] | 99.1 | 97.9 | 81.8 | 97.4 | 82.7 | 98.7 | 96.7 | - | 89.2 | 88.0 |
| Liu et al. [25]: Char-Net | - | - | 83.6 | - | 84.4 | - | 93.3 | - | 91.5 | 90.8 |
| *Liu et al. [26]: SqueezedText | 97.0 | 94.1 | 87.0 | 95.2 | - | 98.8 | 97.9 | 93.8 | 93.1 | 92.9 |
| *Bai et al. [27]: EP | 99.5 | 97.9 | 88.3 | 96.6 | 87.5 | 98.7 | 97.9 | - | 94.6 | 94.4 |
| Liu et al. [28] | 97.3 | 96.1 | 89.4 | 96.8 | 87.1 | 98.1 | 97.5 | - | 94.7 | 94.0 |
| Gao et al. [29] | 99.1 | 97.2 | 83.6 | 97.7 | 83.9 | 98.6 | 96.6 | - | 91.4 | 89.5 |
| Shi et al. [30]: ASTER | 99.6 | 98.8 | 93.4 | 97.4 | 89.5 | 98.8 | 98.0 | - | 94.5 | 91.8 |
| Luo et al. [46]: MORAN | 97.9 | 96.2 | 91.2 | 96.6 | 88.3 | 98.7 | 97.8 | - | 95.0 | 92.4 |
| Xie et al. [47]: CAN | 97.0 | 94.2 | 80.5 | 96.9 | 83.4 | 98.4 | 97.8 | - | 91.0 | 90.5 |
| *Liao et al. [48]: CA-FCN | ^99.8 | ^98.9 | 92.0 | ^98.8 | 82.1 | - | - | - | - | 91.4 |
| *Li et al. [49]: SAR | 99.4 | 98.2 | ^95.0 | 98.5 | ^91.2 | - | - | - | - | 94.0 |
| Zhan et al. [55]: ESIR | 99.6 | 98.8 | 93.3 | 97.4 | 90.2 | - | - | - | - | 91.3 |
| Zhang et al. [56]: SSDAN | - | - | 83.8 | - | 84.5 | - | - | - | 92.1 | 91.8 |

2.2.2 Recognition Results on Irregular Dataset

| Method | SVT-P 50 | SVT-P Full | SVT-P None | CUTE80 None | IC15 None | COCO-Text None |
|---|---|---|---|---|---|---|
| Wang et al. [1]: ABBYY | 40.5 | 26.1 | - | - | - | - |
| Wang et al. [1]: SYNTH+PLEX | - | - | - | - | - | - |
| Mishra et al. [2] | 45.7 | 24.7 | - | - | - | - |
| Wang et al. [3] | 40.2 | 32.4 | - | - | - | - |
| Goel et al. [4]: wDTW | - | - | - | - | - | - |
| Bissacco et al. [5]: PhotoOCR | - | - | - | - | - | - |
| Phan et al. [6] | 62.3 | 42.2 | - | - | - | - |
| Alsharif et al. [7]: HMM/Maxout | - | - | - | - | - | - |
| Almazán et al. [8]: KCSR | - | - | - | - | - | - |
| Yao et al. [9]: Strokelets | - | - | - | - | - | - |
| R.-Serrano et al. [10]: Label embedding | - | - | - | - | - | - |
| Jaderberg et al. [11] | - | - | - | - | - | - |
| Su and Lu [12] | - | - | - | - | - | - |
| Gordo [13]: Mid-features | - | - | - | - | - | - |
| Jaderberg et al. [14] | - | - | - | - | - | - |
| Jaderberg et al. [15] | - | - | - | - | - | - |
| Shi, Bai, and Yao [16]: CRNN | - | - | - | - | - | - |
| Shi et al. [17]: RARE | 91.2 | 77.4 | 71.8 | 59.2 | - | - |
| Lee and Osindero [18]: R2AM | - | - | - | - | - | - |
| Liu et al. [19]: STAR-Net | 94.3 | 83.6 | 73.5 | - | - | - |
| *Yang et al. [20] | 93.0 | 80.2 | 75.8 | 69.3 | - | - |
| Yin et al. [21] | - | - | - | - | - | - |
| *Cheng et al. [22]: FAN | - | - | - | - | *85.3 | - |
| Cheng et al. [23]: AON | 94.0 | 83.7 | 73.0 | 76.8 | 68.2 | - |
| Gao et al. [24] | - | - | - | - | - | - |
| Liu et al. [25]: Char-Net | - | - | 73.5 | - | 60.0 | - |
| *Liu et al. [26]: SqueezedText | - | - | - | - | - | - |
| *Bai et al. [27]: EP | - | - | - | - | 73.9 | - |
| Liu et al. [28] | - | - | 73.9 | 62.5 | - | - |
| Gao et al. [29] | - | - | - | - | - | - |
| Shi et al. [30]: ASTER | - | - | 78.5 | 79.5 | 76.1 | - |
| Luo et al. [46]: MORAN | 94.3 | 86.7 | 76.1 | 77.4 | 68.8 | - |
| Xie et al. [47]: CAN | - | - | - | - | - | - |
| *Liao et al. [48]: CA-FCN | - | - | - | 78.1 | - | - |
| *Li et al. [49]: SAR | ^95.8 | ^91.2 | ^86.4 | ^89.6 | ^78.8 | ^66.8 |
| Zhan et al. [55]: ESIR | - | - | 79.6 | 83.3 | 76.9 | - |
| Zhang et al. [56]: SSDAN | - | - | - | - | - | - |

2.2.3 Recognition Results on Chinese Scene Text Datasets

| Method | RCTW | MTWI | CTW | Time | Source |
|---|---|---|---|---|---|
| Zheqi He, Yongtao Wang: Foo & Bar | 82.0 (end-to-end) | - | - | 2017 | RCTW competition |
| IFLYTEK: nelslip (iflytek & ustc) | - | 85.8 (AR) | - | 2018 | MTWI competition |
| Yuan et al. [42]: CTW | - | - | 80.5 (AR) | 2018 | CTW |
Liu et al. [43] : CTW-1500 - - - 2017 CTW-1500
***

3. Survey

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper


4. OCR Service

| OCR API | Free |
|---|---|
| Tesseract OCR Engine | × |
| Azure | |
| ABBYY | |
| OCR Space | |
| SODA PDF OCR | |
| Free Online OCR | |
| Online OCR | |
| Super Tools | |
| Online Chinese OCR (在线中文识别) | |
| Calamari OCR | × |
| Tencent OCR (腾讯OCR) | × |

5. References

[1] [ICCV-2011] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In Proceedings of International Conference on Computer Vision (ICCV), pages 1457–1464, 2011. paper

[2] [BMVC-2012] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In Proceedings of British Machine Vision Conference (BMVC), pages 1–11, 2012. paper dataset

[3] [ICPR-2012] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng. End-to-end text recognition with convolutional neural networks. In Proceedings of International Conference on Pattern Recognition (ICPR), pages 3304–3308, 2012. paper

[4] [ICDAR-2013] V. Goel, A. Mishra, K. Alahari, and C. Jawahar. Whole is greater than sum of parts: Recognizing scene text words. In Proceedings of International Conference on Document Analysis and Recognition (ICDAR), pages 398–402, 2013. paper

[5] [ICCV-2013] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. Photoocr: Reading text in uncontrolled conditions. In Proceedings of International Conference on Computer Vision (ICCV), pages 785–792, 2013. paper

[6] [ICCV-2013] T. Quy Phan, P. Shivakumara, S. Tian, and C. Lim Tan. Recognizing text with perspective distortion in natural scenes.In Proceedings of International Conference on Computer Vision (ICCV), pages 569–576, 2013. paper

[7] [ICLR-2014] O. Alsharif and J. Pineau, End-to-end text recognition with hybrid HMM maxout models, in: Proceedings of International Conference on Learning Representations (ICLR), 2014. paper

[8] [TPAMI-2014] J. Almazán, A. Gordo, A. Fornés, and E. Valveny. Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell., 36(12):2552–2566, 2014. paper code

[9] [CVPR-2014] C. Yao, X. Bai, B. Shi, and W. Liu. Strokelets: A learned multi-scale representation for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4042–4049, 2014. paper

[10] [IJCV-2015] J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin. Label embedding: A frugal baseline for text recognition. International Journal of Computer Vision (IJCV) , 113(3):193–207, 2015. paper

[11] [ECCV-2014] M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features for text spotting. In Proceedings of European Conference on Computer Vision (ECCV), pages 512–528, 2014. paper code

[12] [ACCV-2014] B. Su and S. Lu. Accurate scene text recognition based on recurrent neural network. In Proceedings of Asian Conference on Computer Vision (ACCV), pages 35–48, 2014. paper

[13] [CVPR-2015] A. Gordo. Supervised mid-level features for word image representation. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2956–2964, 2015. paper

[14] [IJCV-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Reading text in the wild with convolutional neural networks. Int. J.Comput. Vision, 2015. paper code

[15] [ICLR-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Deep structured output learning for unconstrained text recognition, in: Proceedings of International Conference on Learning Representations (ICLR), 2015. paper

[16] [TPAMI-2017] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell., 39(11):2298–2304, 2017. paper code-Torch7 code-Pytorch

[17] [CVPR-2016] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai. Robust scene text recognition with automatic rectification. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4168–4176, 2016. paper

[18] [CVPR-2016] C.-Y. Lee and S. Osindero. Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2231–2239, 2016. paper

[19] [BMVC-2016] W. Liu, C. Chen, K.-Y. K. Wong, Z. Su, and J. Han. STAR-Net: A spatial attention residue network for scene text recognition. In Proceedings of British Machine Vision Conference (BMVC), page 7, 2016. paper

[20] [IJCAI-2017] X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles. Learning to read irregular text with attention mechanisms. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2017. paper

[21] [ICCV-2017] F. Yin, Y.-C. Wu, X.-Y. Zhang, and C.-L. Liu. Scene text recognition with sliding convolutional character models. In Proceedings of International Conference on Computer Vision (ICCV), 2017. paper

[22] [ICCV-2017] Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of International Conference on Computer Vision (ICCV), pages 5086–5094, 2017. paper

[23] [CVPR-2018] Cheng Z, Xu Y, Bai F, et al. AON: Towards Arbitrarily-Oriented Text Recognition.In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 5571-5579, 2018. paper

[24] [arXiv-2017] Gao Y, Chen Y, Wang J, et al. Reading Scene Text with Attention Convolutional Sequence Modeling[J]. arXiv preprint arXiv:1709.04303, 2017. paper

[25] [AAAI-2018] Liu W, Chen C, Wong K Y K. Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition[C]//AAAI. 2018. paper

[26] [AAAI-2018] Liu Z, Li Y, Ren F, et al. SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network[C]//AAAI. 2018. paper

[27] [CVPR-2018] Bai F, Cheng Z, Niu Y, Pu S, Zhou S. Edit probability for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 1508–1516, 2018. paper

[28] [ECCV-2018] Liu Y, Wang Z, Jin H, et al. Synthetically Supervised Feature Learning for Scene Text Recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 435-451. paper

[29] [ICIP-2018] Gao Y, Chen Y, Wang J, et al. Dense Chained Attention Network for Scene Text Recognition[C]//2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018: 679-683. paper

[30] [TPAMI-2018] Shi B, Yang M, Wang X, et al. ASTER: An attentional scene text recognizer with flexible rectification[J]. IEEE transactions on pattern analysis and machine intelligence, 2018. paper code

[31] [CVPR-2012] A. Mishra, K. Alahari, and C. V. Jawahar. Top-down and bottom-up cues for scene text recognition. In CVPR, 2012. paper

[32] [ICCV-2011] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, 2011. paper

[33] [IJDAR-2005] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR, 7(2-3):105–122, 2005. paper

[34] [ICDAR-2013] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. Almazán, and L. de las Heras. ICDAR 2013 robust reading competition. In ICDAR, 2013. paper

[35] [ICCV-2013] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, 2013. paper

[36] [Expert Syst.Appl-2014] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. Expert Syst. Appl., 41(18):8027–8048, 2014. paper

[37] [ICDAR-2015] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper

[38] [arXiv-2016] Veit A, Matera T, Neumann L, et al. Coco-text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint arXiv:1601.07140, 2016. paper

[39] [ICDAR-2017] Ch'ng C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. paper

[40] [ICDAR-2017] Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17)[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. paper

[41] [ICPR-2018] He M, Liu Y, Yang Z, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web Images[C]//2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018: 7-12. paper

[42] [arXiv-2018] Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild[J]. arXiv preprint arXiv:1803.00085, 2018. paper

[43] [arXiv-2017] Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting curve text in the wild: New dataset and new solution[J]. arXiv preprint arXiv:1712.02170, 2017. paper

[44] [ECCV-2018] Yao C, Wu W. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes.//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 71-88. paper

[45] [NIPS-WORKSHOP-2011] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 5, 2011. paper

[46] [PR-2019] C. Luo, L. Jin, and Z. Sun, “MORAN: A multi-object rectified attention network for scene text recognition,” Pattern Recognition, vol. 90, pp. 109–118, 2019. paper code

[47] [ACM-2019] Xie H, Fang S, Zha Z J, et al, “Convolutional Attention Networks for Scene Text Recognition,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 15, pp. 3 2019. paper

[48] [AAAI-2019] Liao M, Zhang J, Wan Z, et al, “Scene text recognition from two-dimensional perspective,” //AAAI. 2019. paper

[49] [AAAI-2019] Li H, Wang P, Shen C, et al, “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition,” //AAAI. 2019. paper

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

[53] [NIPS-WORKSHOP-2014] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Synthetic data and artificial neural networks for natural scene text recognition, in: Proceedings of Advances in Neural Information Processing Deep Learn. Workshop (NIPS-W).2014. paper

[54] [CVPR-2016] A. Gupta, A. Vedaldi, A. Zisserman, Synthetic data for text localisation in natural images, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2315–2324. paper

[55] [CVPR-2019] Zhan F, Lu S. Esir: End-to-end scene text recognition via iterative image rectification, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2059-2068. paper

[56] [CVPR-2019] Zhang Y, Nie S, Liu W, et al. Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2740-2749. paper
