Scene Text Recognition Resources

Author: Xiaoxue Chen (陈晓雪)

Updates

Dec 24, 2019: added 20 papers and the C-SVT dataset, and updated the corresponding tables. You can download the updated Excel sheet we prepared. (Password: teqv)


1. Datasets

1.1 Regular Scene Text Datasets

  • IIIT5K[31]:
    • Introduction: It contains 5,000 cropped word images in total, 2,000 for training and 3,000 for testing. Every image is associated with a 50-word lexicon and a 1,000-word lexicon. Each lexicon consists of the ground-truth word and some randomly picked words (lexicon-constrained evaluation is sketched at the end of this subsection).
    • Link: IIIT5K-download
  • SVT[1]:
    • Introduction: It contains 647 cropped word images collected from Google Street View. Many images are severely corrupted by noise, blur, and low resolution. Every image is associated with a 50-word lexicon. Note that only word-level annotations are provided.
    • Link: SVT-download
  • ICDAR 2003(IC03)[33]:
    • Introduction: It contains 509 images in total, 258 for training and 251 for testing. Specifically, it contains 867 cropped word images after discarding images that contain non-alphanumeric characters or fewer than three characters. Every image is associated with a 50-word lexicon and a full lexicon. The full lexicon combines all per-image lexicon words.
    • Link: IC03-download
  • ICDAR 2013(IC13)[34]:
    • Introduction: It contains 1,015 cropped word images and inherits most of its samples from IC03. No lexicon is associated with this dataset.
    • Link: IC13-download
  • COCO-Text[38]:
    • Introduction: It contains 63,686 images in total. Specifically, it contains 145,859 cropped word images for testing, including handwritten and printed, clear and blurred, English and non-English text.
    • Link: COCO-Text-download
  • SVHN[45]:
    • Introduction: It contains more than 600,000 digit images of house numbers in natural scenes. The images were collected from Google Street View imagery and are used for digit recognition.
    • Link: SVHN-download
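
Several of the datasets above ship a 50-word or 1,000-word lexicon per image. Under lexicon-constrained evaluation, the raw prediction is snapped to the closest lexicon word. The sketch below is a minimal plain-Python illustration of that protocol; the helper names and example words are ours, not part of any dataset release.

```python
# A minimal sketch of lexicon-constrained recognition, assuming plain Python.
# Under the 50-word/1k-word protocols, the raw prediction is replaced by the
# lexicon word with the smallest edit distance to it.

def edit_distance(a, b):
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def constrain_to_lexicon(prediction, lexicon):
    """Snap a raw prediction to the closest word in the image's lexicon."""
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))

print(constrain_to_lexicon("hou5e", ["house", "horse", "mouse"]))  # -> house
```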

1.2 Irregular Scene Text Datasets

  • SVT-P[35]:
    • Introduction: It contains 639 cropped word images for testing. Images were selected from side-view snapshots in Google Street View, so most are heavily distorted by the non-frontal view angle. Every image is associated with a 50-word lexicon and a full lexicon.
    • Link: SVT-P-download (Password : vnis)
  • CUTE80[36]:
    • Introduction: It contains 80 high-resolution images taken in natural scenes. Specifically, it contains 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.
    • Link: CUTE80-download
  • ICDAR 2015(IC15)[37]:
    • Introduction: It contains 1,500 images in total, 1,000 for training and 500 for testing. Specifically, it contains 2,077 cropped word images, including more than 200 irregular text instances. No lexicon is associated with this dataset.
    • Link: IC15-download
  • Total-Text[39]:
    • Introduction: It contains 1,555 images in total. Specifically, it contains 11,459 cropped word images covering three different text orientations: horizontal, multi-oriented, and curved.
    • Link: Total-Text-download

1.3 Bilingual Scene Text Datasets (mainly in Chinese and English)

  • RCTW-17(RCTW competition,ICDAR17)[40]:
    • Introduction: It contains 12,514 images in total, 11,514 for training and 1,000 for testing. Images in RCTW-17 were mostly collected by cameras or mobile phones; the others are born-digital. Text instances are annotated with parallelograms. It was the first large-scale Chinese dataset, and the largest published one at the time.
    • Link: RCTW-17-download
  • MTWI(competition)[41]:
    • Introduction: It contains 20,000 images. The dataset consists mainly of Chinese and English web text. The competition includes three tasks: web text recognition, web text detection, and end-to-end web text detection and recognition.
    • Link: MTWI-download (Password:gox9)
  • CTW[42]:
    • Introduction: It contains 32,285 high-resolution street view images of Chinese text, with 1,018,402 character instances in total. All images are annotated at the character level, including the underlying character type, the bounding box, and six other attributes. These attributes indicate whether the character has a complex background, whether it is raised, whether it is handwritten or printed, whether it is occluded, whether it is distorted, and whether it uses word art.
    • Link: CTW-download
  • SCUT-CTW1500[43]:
    • Introduction: It contains 1,500 images in total, 1,000 for training and 500 for testing. Specifically, it contains 10,751 cropped word images for testing. Annotations in SCUT-CTW1500 are polygons with 14 vertices. The dataset consists mainly of Chinese and English text.
    • Link: SCUT-CTW1500-download
  • LSVT(LSVT competition, ICDAR2019)[57]:
    • Introduction: It contains 20,000 test images, 30,000 fully annotated training images, and 400,000 weakly annotated training images, the last of which are referred to as partial labels. For most of the weakly annotated data, only one transcription per image is provided. All images were captured on streets and cover a large variety of complicated real-world scenarios, e.g., storefronts and landmarks.
    • Link: LSVT-download
  • ArT(ArT competition, ICDAR2019)[58]:
    • Introduction: It contains 10,166 images in total, 5,603 for training and 4,563 for testing. ArT is a combination of Total-Text, SCUT-CTW1500 and Baidu Curved Scene Text, collected to introduce the arbitrary-shaped text problem to the scene text community. The dataset was collected with text-shape diversity in mind, so all existing text shapes (i.e., horizontal, multi-oriented, and curved) are well represented.
    • Link: ArT-download
  • ReCTS(ReCTS competition, ICDAR2019)[59]:
    • Introduction: ReCTS is a practical and challenging multi-orientation natural scene text dataset containing 25,000 images, mostly of signboards. All text lines and characters are labeled with locations and character codes.
    • Link: ReCTS-download
  • Chinese Street View Text(C-SVT) [63]:
    • Introduction: It contains more than 430,000 street view images in total: 30,000 fully annotated images with locations and text labels for all text regions, and 400,000 more images in which only the annotation of the text of interest is given. It is the largest existing Chinese text reading dataset.

1.4 Synthetic Datasets

  • Synth90k [53] :
    • Introduction: It contains 8 million cropped word images generated from a set of 90k common English words. Words are rendered onto natural images with random transformations and effects. Every image is annotated with its ground-truth word, which is also embedded in the filename (see the sketch after this list).
    • Link: Synth90k-download
  • SynthText [54] :
    • Introduction: It contains 6 million cropped word images. The generation process is similar to that of Synth90k.
    • Link: SynthText-download
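
For reference, here is a minimal sketch of reading Synth90k (MJSynth) annotations. It assumes the commonly distributed layout, in which annotation files list relative image paths and the ground-truth word is embedded in each filename between underscores; the paths and file names below are illustrative.

```python
# A minimal sketch for reading Synth90k (MJSynth) labels, assuming the common
# distribution layout, e.g. "./2911/6/77_heretical_35885.jpg" -> "heretical".
import os

def label_from_path(image_path):
    """Extract the ground-truth word embedded between the underscores."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return stem.split("_")[1]

def load_annotations(annotation_file):
    """Yield (image_path, label) pairs from a Synth90k annotation file."""
    with open(annotation_file) as f:
        for line in f:
            path = line.split()[0]   # the second field is a lexicon index
            yield path, label_from_path(path)

print(label_from_path("./2911/6/77_heretical_35885.jpg"))  # -> heretical
```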

1.5 Comparison of Datasets

| Datasets | Language | Pictures | Instances | Training Pictures | Training Instances | Testing Pictures | Testing Instances | Lexicon: 50 | Lexicon: 1k | Lexicon: Full | Lexicon: None | Label: Char | Label: Word | Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IIIT5K [31] | English | 1120 | 5000 | 380 | 2000 | 740 | 3000 | ✓ | ✓ | × | ✓ | ✓ | ✓ | Regular |
| SVT [32] | English | 350 | 725 | 100 | 211 | 250 | 514 | ✓ | × | × | ✓ | × | ✓ | Regular |
| IC03 [33] | English | 509 | 2268 | 258 | 1157 | 251 | 1111 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Regular |
| IC13 [34] | English | 561 | 5003 | 420 | 3564 | 141 | 1439 | × | × | × | ✓ | ✓ | ✓ | Regular |
| COCO-Text [38] | English | 63686 | 145859 | 43686 | 118309 | 10000 | 27550 | × | × | × | ✓ | × | ✓ | Regular |
| SVHN [45] | Digits | 600000 | 600000 | 573968 | 573968 | 26032 | 26032 | × | × | × | ✓ | ✓ | ✓ | Regular |
| SVT-P [35] | English | 238 | 639 | - | - | 238 | 639 | ✓ | × | ✓ | ✓ | × | ✓ | Irregular |
| CUTE80 [36] | English | 80 | 288 | - | - | 80 | 288 | × | × | × | ✓ | × | ✓ | Irregular |
| IC15 [37] | English | 1500 | - | 1000 | - | 500 | 2077 | × | × | × | ✓ | × | ✓ | Irregular |
| Total-Text [39] | English | 1555 | 11459 | 1255 | - | 300 | - | × | × | × | ✓ | × | ✓ | Irregular |
| RCTW-17 [40] | Chinese/English | 12514 | - | 11514 | - | 1000 | - | × | × | × | ✓ | × | ✓ | Regular |
| MTWI [41] | Chinese/English | 20000 | - | 10000 | - | 10000 | - | × | × | × | ✓ | × | ✓ | Regular |
| CTW [42] | Chinese/English | 32285 | 1018402 | 25887 | 812872 | 3269 | 103519 | × | × | × | ✓ | ✓ | ✓ | Regular |
| SCUT-CTW1500 [43] | Chinese/English | 1500 | 10751 | 1000 | - | 500 | - | × | × | × | ✓ | × | ✓ | Irregular |
| LSVT [57] | Chinese/English | 450000 | - | 30000 | - | 20000 | - | × | × | × | ✓ | × | ✓ | Irregular |
| ArT [58] | Chinese/English | 10166 | - | 5603 | - | 4563 | - | × | × | × | ✓ | × | ✓ | Irregular |
| ReCTS [59] | Chinese/English | 25000 | - | - | - | - | - | × | × | × | ✓ | ✓ | ✓ | Irregular |
| Synth90k [53] | English | 8000000 | - | - | - | - | - | × | × | × | ✓ | × | ✓ | Regular |
| SynthText [54] | English | 6000000 | - | - | - | - | - | × | × | × | ✓ | × | ✓ | Regular |
| C-SVT (full annotations) [63] | Chinese | 29966 | 908305 | 20157 | 620368 | 4841 | 143849 | × | × | × | ✓ | × | ✓ | Irregular |

2. Summary of Scene Text Recognition Results

2.1 Comparison of methods

It is notable that 1) "Reg" stands for regular scene text datasets. 2) "Irreg" stands for irregular scene text datasets. 3) "Seg" denotes methods based on segmentation. 4) "Extra" means the method uses extra datasets in addition to Synth90k and SynthText. 5) "CTC" means the method decodes with a CTC-based algorithm. 6) "Attn" means the method decodes with an attention mechanism.

You can also download the Excel sheet we prepared. (Password: teqv)
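
For readers unfamiliar with the two decoding families, the sketch below shows greedy (best-path) CTC decoding: take the per-frame argmax, collapse repeated labels, and drop the blank symbol. Attention decoders instead predict characters autoregressively while attending to image features. This is a minimal plain-Python illustration, not any particular paper's implementation; the charset and frame ids are made up for the example.

```python
# Greedy (best-path) CTC decoding: collapse repeats, then remove blanks.
# Assumes per-frame argmax label ids are already given.

BLANK = 0  # conventional blank index in CTC

def ctc_greedy_decode(frame_ids, charset):
    """Collapse repeated labels, then drop the blank symbol."""
    out, prev = [], None
    for idx in frame_ids:
        if idx != prev and idx != BLANK:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

# Frame sequence reading "h h e e l <blank> l o":
charset = {1: "h", 2: "e", 3: "l", 4: "o"}
print(ctc_greedy_decode([1, 1, 2, 2, 3, 0, 3, 4], charset))  # -> hello
```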

Comparison of methods
Method Code Regular Irregular Segmentation Extra data CTC Attention Source Time Highlight
Wang et al. [1] : ABBYY × × × × ICCV 2011 A state-of-the-art text detector + a leading commercial OCR engine
Wang et al. [1] : SYNTH+PLEX × × × × × ICCV 2011 The baseline of scene text recognition.
Mishra et al. [2] × × × × × BMVC 2012 1) Incorporating higher order statistical language models to recognize words in an unconstrained manner. 2) Introducing IIIT5K-word dataset.
Wang et al. [3] × × × × ICPR 2012 CNNs + Non-maximal suppression + beam search
Goel et al. [4] : wDTW × × × × × ICDAR 2013 Recognizing the text in the image by matching the scene and synthetic image features with wDTW.
Bissacco et al. [5] : PhotoOCR × × × × × ICCV 2013 Applying a network with five hidden layers for character classification.
Phan et al. [6] × × × × × ICCV 2013 1) MSER + SIFT descriptors + SVM 2) Introducing the SVT-P datasets.
Alsharif et al. [7] : HMM/Maxout × × × × × ICLR 2014 Convolutional Maxout networks + Hybrid HMM
Almazan et al [8] : KCSR × × × × × TPAMI 2014 Embedding word images and text string in a common vectorial subspace and allowing one to cast recognition and retrieval tasks as a nearest neighbor problem.
Yao et al. [9] : Strokelets × × × × × CVPR 2014 Proposing a novel multi-scale representation for scene text recognition: strokelets.
R.-Serrano et al.[10] : Label embedding × × × × × × IJCV 2015 Embedding word labels and word images into a common Euclidean space and finding the closest word label in this space.
Jaderberg et al. [11] × × × × ECCV 2014 1) Enabling efficient feature sharing for text detection and classification. 2) Making technical changes over the traditional CNN architectures. 3) Proposing a method of automated data mining of Flickr.
Su and Lu [12] × × × × × ACCV 2014 HOG + BLSTM + CTC
Gordo[13] : Mid-features × × × × × CVPR 2015 Proposing to learn local mid-level features suitable for building word image representations.
Jaderberg et al. [14] × × × × × IJCV 2015 1) Treating each word as a category and training very large convolutional neural networks to perform word recognition on the whole proposal region. 2) Generating 9 million images, with equal numbers of word samples from a 90k word dictionary.
Jaderberg et al. [15] × × × × × × ICLR 2015 CNN + CRF
Shi, Bai, and Yao [16] : CRNN × × × × TPAMI 2017 CNN + BLSTM + CTC
Shi et al. [17] : RARE × × × × × CVPR 2016 STN + CNN + Attentional BLSTM
Lee and Osindero [18] : R2AM × × × × × CVPR 2016 Presenting recursive recurrent neural networks with attention modeling.
Liu et al. [19] : STAR-Net × × × × × BMVC 2016 STN + ResNet + BLSTM + CTC
Liu et al. [78] × × × × ICPR 2016 Integrating a CNN classifier with WFST-based word labeling.
Mishra et al. [77] × × × × CVIU 2016 Character detection (HOG/CNN + SVM + sliding window) + CRF, combining bottom-up cues from character detections and top-down cues from the lexicon.
Su and Lu [76] × × × × PR 2017 HOG (multi-scale) + BLSTM + CTC (ensemble)
*Yang et al. [20] × × × × IJCAI 2017 1) CNN + 2D-Attention-based RNN, applying an auxiliary dense character detection task that helps to learn text specific visual patterns. 2) Developing a large-scale synthetic dataset.
Yin et al. [21] × × × × × ICCV 2017 CNN + CTC
Wang et al.[66] : GRCNN × × × × NIPS 2017 Gated Recurrent Convolution Layer + BLSTM + CTC
*Cheng et al. [22] : FAN × × × × ICCV 2017 1) Proposing the concept of attention drift. 2) Introducing a focusing network to focus deviated attention back on the target areas.
Cheng et al. [23] : AON × × × × × CVPR 2018 1) Extracting scene text features in four directions. 2) CNN + Attentional BLSTM
Gao et al. [24] × × × × arXiv 2017 Attentional ResNet + CNN + CTC
Liu et al. [25] : Char-Net × × × × AAAI 2018 CNN + STN (facilitating the rectification of individual characters) + LSTM
*Liu et al. [26] : SqueezedText × × × × × AAAI 2018 Binary convolutional encoder-decoder network + Bi-RNN
Zhan et al.[73] × × × CVPR 2018 CRNN, achieving verisimilar scene text image synthesis by combining three novel designs including semantic coherence, visual attention and adaptive text appearance.
*Bai et al. [27] : EP × × × × CVPR 2018 Proposing edit probability to effectively handle the misalignment between the training text and the output probability distribution sequence.
Fang et al.[74] × × × × MultiMedia 2018 ResNet + [2D Attentional CNN, CNN-based language module]
Liu et al.[75] : EnEsCTC × × × × NIPS 2018 Proposing a novel maximum entropy based regularization for CTC (EnCTC) and an entropy-based pruning method (EsCTC) to effectively reduce the space of the feasible set.
Liu et al. [28] × × × × × ECCV 2018 Designing a multi-task network with an encoder-discriminator-generator architecture to guide the feature of the original image toward that of the clean image.
Wang et al.[61] : MAAN × × × × × ICFHR 2018 ResNet + BLSTM + Memory-Augmented Attentional Decoder
Gao et al. [29] × × × × ICIP 2018 Attentional DenseNet + BLSTM + CTC
Shi et al. [30] : ASTER × × × × TPAMI 2018 TPS + ResNet + Bidirectional attention-based BLSTM
Chen et al. [60] : ASTER + AEG × × × × × NC 2019 TPS + ResNet + Bidirectional attention-based BLSTM + AEG
Luo et al. [46] : MORAN × × × × PR 2019 Multi-object rectification network + CNN + Attentional BLSTM
Luo et al. [61] : MORAN-v2 × × × × PR 2019 Multi-object rectification network + ResNet + Attentional BLSTM
Chen et al. [60] : MORAN-v2 + AEG × × × × × NC 2019 Multi-object rectification network + ResNet + Attentional BLSTM + AEG
Xie et al. [47] : CAN × × × × × ACM 2019 ResNet + CNN + GLU
*Liao et al.[48] : CA-FCN × × × AAAI 2019 Performing character classification at each pixel location and needing character-level annotations.
*Li et al. [49] : SAR × × × AAAI 2019 ResNet + 2D Attentional LSTM
Zhan et al. [55] : ESIR × × × × × CVPR 2019 Iterative rectification network + ResNet + Attentional BLSTM
Zhang et al. [56]: SSDAN × × × × CVPR 2019 Attentional CNN + GAS + GRU
Yang et al. [62]: ScRN × × × × ICCV 2019 Symmetry-constrained Rectification Network + ResNet + BLSTM + Attentional GRU
Wang et al. [64]: GCAM × × × × × ICME 2019 Convolutional Block Attention Module (CBAM) + ResNet + BLSTM + the proposed Gated Cascade Attention Module (GCAM)
Baek et al. [65] × × × × ICCV 2019 TPS + ResNet + BLSTM + attention mechanism
Huang et al. [67] : EPAN × × × × × NC 2019 Learning to sample features from the text region of 2D feature maps, and introducing a two-stage attention mechanism.
Gao et al. [68] × × × × × NC 2019 Attentional DenseNET + 4-layer CNN + CTC
Qi et al. [69] : CCL × × × × ICDAR 2019 ResNet + [CTC, CCL]
Wang et al. [70] : ReELFA × × × × ICDAR 2019 VGG + Attentional LSTM, utilizing one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas.
Zhu et al. [71] : HATN × × × × ICIP 2019 ResNet50 + Hierarchical Attention Mechanism (Transformer structure)
Zhan et al. [72] : SF-GAN × × × × CVPR 2019 ResNet50 + Attentional Decoder, synthesising realistic scene text image for training better recognition models.
Liao et al. [79] : SAM × × × × TPAMI 2019 Spatial attentional module (SAM)
Liao et al. [79] : seg-SAM × × × TPAMI 2019 Character segmentation module + Spatial attention module (SAM)
Wang et al. [80] : DAN × × × × AAAI 2020 Decoupling the decoder of the traditional attention mechanism into a convolutional alignment module and a decoupled text decoder.

2.2 Recognition Results

In this section, we list the results on different scene text recognition benchmarks, including IIIT5K, SVT, IC03, IC13, SVT-P, CUTE80, IC15, COCO-Text, RCTW-17, MTWI, CTW, SCUT-CTW1500, LSVT, ArT and ReCTS.

It is notable that 1) '*' indicates that the method uses extra datasets in addition to Synth90k and SynthText. 2) Bold represents the best recognition results. 3) '^' denotes the best recognition results among methods that use extra datasets. 4) '@' represents methods evaluated under a different protocol that uses only 1,811 test images. 5) 'SK', 'ST', 'ExPu', 'ExPr' and 'Un' indicate that the method uses Synth90k, SynthText, extra public data, extra private data, or unknown data, respectively. 6) 'D_A' means data augmentation.
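
The figures below are word-level recognition accuracies. As a reference point, here is a minimal sketch of the commonly used evaluation protocol (case-insensitive comparison restricted to alphanumerics); the exact filtering rules vary slightly between papers, so this is an assumption rather than the scripts used for these tables.

```python
# A minimal sketch of word accuracy under the common case-insensitive,
# alphanumeric-only protocol. The example strings are made up.
import re

def normalize(text):
    """Lower-case and keep only alphanumeric characters."""
    return re.sub(r"[^0-9a-z]", "", text.lower())

def word_accuracy(predictions, ground_truths):
    """Percentage of images whose normalized prediction matches the label."""
    hits = sum(normalize(p) == normalize(g)
               for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(ground_truths)

print(word_accuracy(["Hello!", "w0rld"], ["hello", "world"]))  # -> 50.0
```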

2.2.1 Recognition Results on Regular Dataset

| Method | IIIT5K 50 | IIIT5K 1K | IIIT5K None | SVT 50 | SVT None | IC03 50 | IC03 Full | IC03 50k | IC03 None | IC13 None | Data | Source | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wang et al. [1] : ABBYY | 24.3 | - | - | 35 | - | 56 | 55 | - | - | - | Un | ICCV | 2011 |
| Wang et al. [1] : SYNTH+PLEX | - | - | - | 57 | - | 76 | 62 | - | - | - | ExPr | ICCV | 2011 |
| Mishra et al. [2] | 64.1 | 57.5 | - | 73.2 | - | 81.8 | 67.8 | - | - | - | ExPu | BMVC | 2012 |
| Wang et al. [3] | - | - | - | 70 | - | 90 | 84 | - | - | - | ExPr | ICPR | 2012 |
| Goel et al. [4] : wDTW | - | - | - | 77.3 | - | 89.7 | - | - | - | - | Un | ICDAR | 2013 |
| Bissacco et al. [5] : PhotoOCR | - | - | - | 90.4 | 78 | - | - | - | - | 87.6 | ExPr | ICCV | 2013 |
| Phan et al. [6] | - | - | - | 73.7 | - | 82.2 | - | - | - | - | ExPu | ICCV | 2013 |
| Alsharif et al. [7] : HMM/Maxout | - | - | - | 74.3 | - | 93.1 | 88.6 | 85.1 | - | - | ExPu | ICLR | 2014 |
| Almazan et al. [8] : KCSR | 88.6 | 75.6 | - | 87 | - | - | - | - | - | - | ExPu | TPAMI | 2014 |
| Yao et al. [9] : Strokelets | 80.2 | 69.3 | - | 75.9 | - | 88.5 | 80.3 | - | - | - | ExPu | CVPR | 2014 |
| R.-Serrano et al. [10] : Label embedding | 76.1 | 57.4 | - | 70 | - | - | - | - | - | - | ExPu | IJCV | 2015 |
| Jaderberg et al. [11] | - | - | - | 86.1 | - | 96.2 | 91.5 | - | - | - | ExPu | ECCV | 2014 |
| Su and Lu [12] | - | - | - | 83 | - | 92 | 82 | - | - | - | ExPu | ACCV | 2014 |
| Gordo [13] : Mid-features | 93.3 | 86.6 | - | 91.8 | - | - | - | - | - | - | ExPu | CVPR | 2015 |
| Jaderberg et al. [14] | 97.1 | 92.7 | - | 95.4 | 80.7 | 98.7 | 98.6 | 93.3 | 93.1 | 90.8 | ExPr | IJCV | 2015 |
| Jaderberg et al. [15] | 95.5 | 89.6 | - | 93.2 | 71.7 | 97.8 | 97 | 93.4 | 89.6 | 81.8 | SK + ExPr | ICLR | 2015 |
| Shi, Bai, and Yao [16] : CRNN | 97.8 | 95 | 81.2 | 97.5 | 82.7 | 98.7 | 98 | 95.7 | 91.9 | 89.6 | SK | TPAMI | 2017 |
| Shi et al. [17] : RARE | 96.2 | 93.8 | 81.9 | 95.5 | 81.9 | 98.3 | 96.2 | 94.8 | 90.1 | 88.6 | SK | CVPR | 2016 |
| Lee and Osindero [18] : R2AM | 96.8 | 94.4 | 78.4 | 96.3 | 80.7 | 97.9 | 97 | - | 88.7 | 90 | SK | CVPR | 2016 |
| Liu et al. [19] : STAR-Net | 97.7 | 94.5 | 83.3 | 95.5 | 83.6 | 96.9 | 95.3 | - | 89.9 | 89.1 | SK + ExPr | BMVC | 2016 |
| *Liu et al. [78] | 94.1 | 84.7 | - | 92.5 | - | 96.8 | 92.2 | - | - | - | ExPu (D_A) | ICPR | 2016 |
| *Mishra et al. [77] | 78.07 | - | 46.73 | 78.2 | - | 88 | - | - | 67.7 | 60.18 | ExPu (D_A) | CVIU | 2016 |
| *Su and Lu [76] | - | - | - | 91 | - | 95 | 89 | - | - | 76 | SK + ExPu | PR | 2017 |
| *Yang et al. [20] | 97.8 | 96.1 | - | 95.2 | - | 97.7 | - | - | - | - | ExPu | IJCAI | 2017 |
| Yin et al. [21] | 98.7 | 96.1 | 78.2 | 95.1 | 72.5 | 97.6 | 96.5 | - | 81.1 | 81.4 | SK | ICCV | 2017 |
| Wang et al. [66] : GRCNN | 98 | 95.6 | 80.8 | 96.3 | 81.5 | 98.8 | 97.8 | - | 91.2 | - | SK | NIPS | 2017 |
| *Cheng et al. [22] : FAN | 99.3 | 97.5 | 87.4 | 97.1 | 85.9 | 99.2 | 97.3 | - | 94.2 | 93.3 | SK + ST (Pixel_wise) | ICCV | 2017 |
| Cheng et al. [23] : AON | 99.6 | 98.1 | 87 | 96 | 82.8 | 98.5 | 97.1 | - | 91.5 | - | SK + ST (D_A) | CVPR | 2018 |
| Gao et al. [24] | 99.1 | 97.9 | 81.8 | 97.4 | 82.7 | 98.7 | 96.7 | - | 89.2 | 88 | SK | arXiv | 2017 |
| Liu et al. [25] : Char-Net | - | - | 83.6 | - | 84.4 | - | 93.3 | - | 91.5 | 90.8 | SK (D_A) | AAAI | 2018 |
| *Liu et al. [26] : SqueezedText | 97 | 94.1 | 87 | 95.2 | - | 98.8 | 97.9 | 93.8 | 93.1 | 92.9 | ExPr | AAAI | 2018 |
| *Zhan et al. [73] | 98.1 | 95.3 | 79.3 | 96.7 | 81.5 | - | - | - | - | 87.1 | Pr (5 million) | CVPR | 2018 |
| *Bai et al. [27] : EP | 99.5 | 97.9 | 88.3 | 96.6 | 87.5 | 98.7 | 97.9 | - | 94.6 | 94.4 | SK + ST (Pixel_wise) | CVPR | 2018 |
| Fang et al. [74] | 98.5 | 96.8 | 86.7 | 97.8 | 86.7 | 99.3 | 98.4 | - | 94.8 | 93.5 | SK + ST | MultiMedia | 2018 |
| Liu et al. [75] : EnEsCTC | - | - | 82 | - | 80.6 | - | - | - | 92 | 90.6 | SK | NIPS | 2018 |
| Liu et al. [28] | 97.3 | 96.1 | 89.4 | 96.8 | 87.1 | 98.1 | 97.5 | - | 94.7 | 94 | SK | ECCV | 2018 |
| Wang et al. [61] : MAAN | 98.3 | 96.4 | 84.1 | 96.4 | 83.5 | 97.4 | 96.4 | - | 92.2 | 91.1 | SK | ICFHR | 2018 |
| Gao et al. [29] | 99.1 | 97.2 | 83.6 | 97.7 | 83.9 | 98.6 | 96.6 | - | 91.4 | 89.5 | SK | ICIP | 2018 |
| Shi et al. [30] : ASTER | 99.6 | 98.8 | 93.4 | 97.4 | 89.5 | 98.8 | 98 | - | 94.5 | 91.8 | SK + ST | TPAMI | 2018 |
| Chen et al. [60] : ASTER + AEG | 99.5 | 98.5 | 94.4 | 97.4 | 90.3 | 99 | 98.3 | - | 95.2 | 95 | SK + ST | NC | 2019 |
| Luo et al. [46] : MORAN | 97.9 | 96.2 | 91.2 | 96.6 | 88.3 | 98.7 | 97.8 | - | 95 | 92.4 | SK + ST | PR | 2019 |
| Luo et al. [61] : MORAN-v2 | - | - | 93.4 | - | 88.3 | - | - | - | 94.2 | 93.2 | SK + ST | PR | 2019 |
| Chen et al. [60] : MORAN-v2 + AEG | 99.5 | 98.7 | 94.6 | 97.4 | 90.4 | 98.8 | 98.3 | - | 95.3 | 95.3 | SK + ST | NC | 2019 |
| Xie et al. [47] : CAN | 97 | 94.2 | 80.5 | 96.9 | 83.4 | 98.4 | 97.8 | - | 91 | 90.5 | SK | ACM | 2019 |
| *Liao et al. [48] : CA-FCN | ^99.8 | 98.9 | 92 | 98.8 | 82.1 | - | - | - | - | 91.4 | SK + ST + ExPr | AAAI | 2019 |
| *Li et al. [49] : SAR | 99.4 | 98.2 | 95 | 98.5 | 91.2 | - | - | - | - | 94 | SK + ST + ExPr | AAAI | 2019 |
| Zhan et al. [55] : ESIR | 99.6 | 98.8 | 93.3 | 97.4 | 90.2 | - | - | - | - | 91.3 | SK + ST | CVPR | 2019 |
| Zhang et al. [56] : SSDAN | - | - | 83.8 | - | 84.5 | - | - | - | 92.1 | 91.8 | SK | CVPR | 2019 |
| *Yang et al. [62] : ScRN | 99.5 | 98.8 | 94.4 | 97.2 | 88.9 | 99 | 98.3 | - | 95 | 93.9 | SK + ST (char-level + word-level) | ICCV | 2019 |
| Wang et al. [64] : GCAM | - | - | 93.9 | - | 91.3 | - | - | - | 95.3 | 95.7 | SK + ST | ICME | 2019 |
| Baek et al. [65] | - | - | 87.9 | - | 87.5 | - | - | - | 94.4 | 92.3 | SK + ST | ICCV | 2019 |
| Huang et al. [67] : EPAN | 98.9 | 97.8 | 94 | 96.6 | 88.9 | 98.7 | 98 | - | 95 | 94.5 | SK + ST | NC | 2019 |
| Gao et al. [68] | 99.1 | 97.9 | 81.8 | 97.4 | 82.7 | 98.7 | 96.7 | - | 89.2 | 88 | SK | NC | 2019 |
| *Qi et al. [69] : CCL | 99.6 | 99.1 | 91.1 | 98 | 85.9 | 99.2 | ^98.8 | - | 93.5 | 92.8 | SK + ST (char-level + word-level) | ICDAR | 2019 |
| *Wang et al. [70] : ReELFA | 99.2 | 98.1 | 90.9 | - | 82.7 | - | - | - | - | - | ST (char-level + word-level) | ICDAR | 2019 |
| *Zhu et al. [71] : HATN | - | - | 88.6 | - | 82.2 | - | - | - | 91.3 | 91.1 | SK (D_A) + Pu | ICIP | 2019 |
| *Zhan et al. [72] : SF-GAN | - | - | 63 | - | 69.3 | - | - | - | - | 61.8 | Pr (1 million) | CVPR | 2019 |
| Liao et al. [79] : SAM | 99.4 | 98.6 | 93.9 | 98.6 | 90.6 | 98.8 | 98 | - | 95.2 | 95.3 | SK + ST | TPAMI | 2019 |
| *Liao et al. [79] : seg-SAM | ^99.8 | ^99.3 | ^95.3 | ^99.1 | ^91.8 | 99 | 97.9 | - | 95 | 95.3 | SK + ST (char-level) | TPAMI | 2019 |
| Wang et al. [80] : DAN | - | - | 94.3 | - | 89.2 | - | - | - | 95 | 93.9 | SK + ST | AAAI | 2020 |

2.2.2 Recognition Results on Irregular Dataset

| Method | SVT-P 50 | SVT-P Full | SVT-P None | CUTE80 None | IC15 None | COCO-Text None | Data | Source | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Wang et al. [1] : ABBYY | 40.5 | 26.1 | - | - | - | - | Un | ICCV | 2011 |
| Wang et al. [1] : SYNTH+PLEX | - | - | - | - | - | - | ExPr | ICCV | 2011 |
| Mishra et al. [2] | 45.7 | 24.7 | - | - | - | - | ExPu | BMVC | 2012 |
| Wang et al. [3] | 40.2 | 32.4 | - | - | - | - | ExPr | ICPR | 2012 |
| Goel et al. [4] : wDTW | - | - | - | - | - | - | Un | ICDAR | 2013 |
| Bissacco et al. [5] : PhotoOCR | - | - | - | - | - | - | ExPr | ICCV | 2013 |
| Phan et al. [6] | 62.3 | 42.2 | - | - | - | - | ExPu | ICCV | 2013 |
| Alsharif et al. [7] : HMM/Maxout | - | - | - | - | - | - | ExPu | ICLR | 2014 |
| Almazan et al. [8] : KCSR | - | - | - | - | - | - | ExPu | TPAMI | 2014 |
| Yao et al. [9] : Strokelets | - | - | - | - | - | - | ExPu | CVPR | 2014 |
| R.-Serrano et al. [10] : Label embedding | - | - | - | - | - | - | ExPu | IJCV | 2015 |
| Jaderberg et al. [11] | - | - | - | - | - | - | ExPu | ECCV | 2014 |
| Su and Lu [12] | - | - | - | - | - | - | ExPu | ACCV | 2014 |
| Gordo [13] : Mid-features | - | - | - | - | - | - | ExPu | CVPR | 2015 |
| Jaderberg et al. [14] | - | - | - | - | - | - | ExPr | IJCV | 2015 |
| Jaderberg et al. [15] | - | - | - | - | - | - | SK + ExPr | ICLR | 2015 |
| Shi, Bai, and Yao [16] : CRNN | - | - | - | - | - | - | SK | TPAMI | 2017 |
| Shi et al. [17] : RARE | 91.2 | 77.4 | 71.8 | 59.2 | - | - | SK | CVPR | 2016 |
| Lee and Osindero [18] : R2AM | - | - | - | - | - | - | SK | CVPR | 2016 |
| Liu et al. [19] : STAR-Net | 94.3 | 83.6 | 73.5 | - | - | - | SK + ExPr | BMVC | 2016 |
| *Liu et al. [78] | - | - | - | - | - | - | ExPu (D_A) | ICPR | 2016 |
| *Mishra et al. [77] | - | - | - | - | - | - | ExPu (D_A) | CVIU | 2016 |
| *Su and Lu [76] | - | - | - | - | - | - | SK + ExPu | PR | 2017 |
| *Yang et al. [20] | 93 | 80.2 | 75.8 | 69.3 | - | - | ExPu | IJCAI | 2017 |
| Yin et al. [21] | - | - | - | - | - | - | SK | ICCV | 2017 |
| Wang et al. [66] : GRCNN | - | - | - | - | - | - | SK | NIPS | 2017 |
| *Cheng et al. [22] : FAN | - | - | - | - | *85.3 | - | SK + ST (Pixel_wise) | ICCV | 2017 |
| Cheng et al. [23] : AON | 94 | 83.7 | 73 | 76.8 | 68.2 | - | SK + ST (D_A) | CVPR | 2018 |
| Gao et al. [24] | - | - | - | - | - | - | SK | arXiv | 2017 |
| Liu et al. [25] : Char-Net | - | - | 73.5 | - | 60 | - | SK (D_A) | AAAI | 2018 |
| *Liu et al. [26] : SqueezedText | - | - | - | - | - | - | ExPr | AAAI | 2018 |
| *Zhan et al. [73] | - | - | - | - | - | - | Pr (5 million) | CVPR | 2018 |
| *Bai et al. [27] : EP | - | - | - | - | 73.9 | - | SK + ST (Pixel_wise) | CVPR | 2018 |
| Fang et al. [74] | - | - | - | - | 71.2 | - | SK + ST | MultiMedia | 2018 |
| Liu et al. [75] : EnEsCTC | - | - | - | - | - | - | SK | NIPS | 2018 |
| Liu et al. [28] | - | - | 73.9 | 62.5 | - | - | SK | ECCV | 2018 |
| Wang et al. [61] : MAAN | - | - | - | - | - | - | SK | ICFHR | 2018 |
| Gao et al. [29] | - | - | - | - | - | - | SK | ICIP | 2018 |
| Shi et al. [30] : ASTER | - | - | 78.5 | 79.5 | 76.1 | - | SK + ST | TPAMI | 2018 |
| Chen et al. [60] : ASTER + AEG | 94.4 | 89.5 | 82 | 80.9 | 76.7 | - | SK + ST | NC | 2019 |
| Luo et al. [46] : MORAN | 94.3 | 86.7 | 76.1 | 77.4 | 68.8 | - | SK + ST | PR | 2019 |
| Luo et al. [61] : MORAN-v2 | - | - | 79.7 | 81.9 | 73.9 | - | SK + ST | PR | 2019 |
| Chen et al. [60] : MORAN-v2 + AEG | 94.7 | 89.6 | 82.8 | 81.3 | 77.4 | - | SK + ST | NC | 2019 |
| Xie et al. [47] : CAN | - | - | - | - | - | - | SK | ACM | 2019 |
| *Liao et al. [48] : CA-FCN | - | - | - | 78.1 | - | - | SK + ST + ExPr | AAAI | 2019 |
| *Li et al. [49] : SAR | ^95.8 | ^91.2 | ^86.4 | ^89.6 | 78.8 | ^66.8 | SK + ST + ExPr | AAAI | 2019 |
| Zhan et al. [55] : ESIR | - | - | 79.6 | 83.3 | 76.9 | - | SK + ST | CVPR | 2019 |
| Zhang et al. [56] : SSDAN | - | - | - | - | - | - | SK | CVPR | 2019 |
| *Yang et al. [62] : ScRN | - | - | 80.8 | 87.5 | 78.7 | - | SK + ST (char-level + word-level) | ICCV | 2019 |
| Wang et al. [64] : GCAM | - | - | 85.7 | 83.3 | 83.5 | - | SK + ST | ICME | 2019 |
| Baek et al. [65] | - | - | 79.2 | 74 | 71.8 | - | SK + ST | ICCV | 2019 |
| Huang et al. [67] : EPAN | 91.2 | 86.4 | 79.4 | 82.6 | 73.9 | - | SK + ST | NC | 2019 |
| Gao et al. [68] | - | - | - | - | 62.3 | 40 | SK | NC | 2019 |
| *Qi et al. [69] : CCL | - | - | - | - | 72.9 | - | SK + ST (char-level + word-level) | ICDAR | 2019 |
| *Wang et al. [70] : ReELFA | - | - | - | 82.3 | 68.5 | - | ST (char-level + word-level) | ICDAR | 2019 |
| *Zhu et al. [71] : HATN | - | - | 73.5 | 75.7 | 70.1 | - | SK (D_A) + Pu | ICIP | 2019 |
| *Zhan et al. [72] : SF-GAN | - | - | 48.6 | 40.6 | 39 | - | Pr (1 million) | CVPR | 2019 |
| Liao et al. [79] : SAM | - | - | 82.2 | 87.8 | 77.3 | - | SK + ST | TPAMI | 2019 |
| *Liao et al. [79] : seg-SAM | - | - | 83.6 | 88.5 | 78.2 | - | SK + ST (char-level) | TPAMI | 2019 |
| Wang et al. [80] : DAN | - | - | 80 | 84.4 | 74.5 | - | SK + ST | AAAI | 2020 |

2.2.3 Recognition Results on Bilingual Scene Text Dataset

In this section, we only list the top three results of each competition. Please refer to the competition website for more information.

| Method | RCTW | MTWI | CTW | LSVT | ArT | ReCTS | Time | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lv et al. : NLPR PAL | 0.3201 (end-to-end) | - | - | - | - | - | 2017 | RCTW Competition |
| Jin et al. : SCUT_DLVC | 0.2374 (end-to-end) | - | - | - | - | - | 2017 | RCTW Competition |
| Dai et al. : CCFLAB | 0.2143 (end-to-end) | - | - | - | - | - | 2017 | RCTW Competition |
| IFLYTEK : nelslip(iflytek&ustc) | - | 85.8 (AR) | - | - | - | - | 2018 | MTWI Competition |
| Samsung R&D China, Beijing : SRC-B-MachineLearningLab | - | 85.7 (AR) | - | - | - | - | 2018 | MTWI Competition |
| NetEase : NTAI | - | 82.6 (AR) | - | - | - | - | 2018 | MTWI Competition |
| Yuan et al. [42] : CTW | - | - | 80.5 (AR) | - | - | - | 2018 | CTW |
| Liu et al. [43] : SCUT-CTW1500 | - | - | - | - | - | - | 2017 | SCUT-CTW1500 |
| Tencent-DPPR Team | - | - | - | 66.66 (end-to-end) | - | - | 2019 | LSVT Competition |
| HUST VLRGROUP | - | - | - | 63.42 (end-to-end) | - | - | 2019 | LSVT Competition |
| PMTD | - | - | - | 63.36 (end-to-end) | - | - | 2019 | LSVT Competition |
| Clova AI OCR Team, NAVER/LINE Corp | - | - | - | - | 85.32 (AR) | - | 2019 | ArT Competition |
| SenseTime Group | - | - | - | - | 85.2 (AR) | - | 2019 | ArT Competition |
| USTC-iFLYTEK | - | - | - | - | 81.23 (AR) | - | 2019 | ArT Competition |
| SCUT, The University of Adelaide, Northwestern Polytechnical University, Lenovo, HUAWEI | - | - | - | - | - | 95.55 (AR) | 2019 | ReCTS Competition |
| Tencent (Data Platform Precision Recommendation) | - | - | - | - | - | 94.86 (AR) | 2019 | ReCTS Competition |
| Huazhong University of Science and Technology | - | - | - | - | - | 94.83 (AR) | 2019 | ReCTS Competition |

3. Survey

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper


4. OCR Service

OCR API Free Code
Tesseract OCR Engine ×
Azure ×
ABBYY ×
OCR Space ×
SODA PDF OCR ×
Free Online OCR ×
Online OCR ×
Super Tools ×
Online Chinese Recognition ×
Calamari OCR ×
Tencent OCR × ×
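
Most of the services above are hosted HTTP APIs; Tesseract can also be run locally. As a minimal example, the sketch below calls the Tesseract engine through the pytesseract wrapper (assuming the tesseract binary and the pytesseract and Pillow packages are installed; the image path is a placeholder).

```python
# A minimal sketch of local OCR with Tesseract via the pytesseract wrapper.
from PIL import Image
import pytesseract

# Recognize the text in a cropped word image (path is a placeholder).
image = Image.open("word_image.png")
text = pytesseract.image_to_string(image)
print(text.strip())
```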

5. References

[1] [ICCV-2011] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In Proceedings of International Conference on Computer Vision (ICCV), pages 1457–1464, 2011. paper

[2] [BMVC-2012] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In Proceedings of British Machine Vision Conference (BMVC), pages 1–11, 2012. paper dataset

[3] [ICPR-2012] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng. End-to-end text recognition with convolutional neural networks. In Proceedings of International Conference on Pattern Recognition (ICPR), pages 3304–3308, 2012. paper

[4] [ICDAR-2013] V. Goel, A. Mishra, K. Alahari, and C. Jawahar. Whole is greater than sum of parts: Recognizing scene text words. In Proceedings of International Conference on Document Analysis and Recognition (ICDAR), pages 398–402, 2013. paper

[5] [ICCV-2013] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. Photoocr: Reading text in uncontrolled conditions. In Proceedings of International Conference on Computer Vision (ICCV), pages 785–792, 2013. paper

[6] [ICCV-2013] T. Quy Phan, P. Shivakumara, S. Tian, and C. Lim Tan. Recognizing text with perspective distortion in natural scenes. In Proceedings of International Conference on Computer Vision (ICCV), pages 569–576, 2013. paper

[7] [ICLR-2014] O. Alsharif and J. Pineau, End-to-end text recognition with hybrid HMM maxout models, in: Proceedings of International Conference on Learning Representations (ICLR), 2014. paper

[8] [TPAMI-2014] J. Almazán, A. Gordo, A. Fornés, and E. Valveny. Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell., 36(12):2552–2566, 2014. paper code

[9] [CVPR-2014] C. Yao, X. Bai, B. Shi, and W. Liu. Strokelets: A learned multi-scale representation for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4042–4049, 2014. paper

[10] [IJCV-2015] J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin. Label embedding: A frugal baseline for text recognition. International Journal of Computer Vision (IJCV) , 113(3):193–207, 2015. paper

[11] [ECCV-2014] M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features for text spotting. In Proceedings of European Conference on Computer Vision (ECCV), pages 512–528, 2014. paper code

[12] [ACCV-2014] B. Su and S. Lu. Accurate scene text recognition based on recurrent neural network. In Proceedings of Asian Conference on Computer Vision (ACCV), pages 35–48, 2014. paper

[13] [CVPR-2015] A. Gordo. Supervised mid-level features for word image representation. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2956–2964, 2015. paper

[14] [IJCV-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Reading text in the wild with convolutional neural networks. Int. J.Comput. Vision, 2015. paper code

[15] [ICLR-2015] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Deep structured output learning for unconstrained text recognition, in: Proceedings of International Conference on Learning Representations (ICLR), 2015. paper

[16] [TPAMI-2017] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell., 39(11):2298–2304, 2017. paper code-Torch7 code-Pytorch

[17] [CVPR-2016] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai. Robust scene text recognition with automatic rectification. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 4168–4176, 2016. paper

[18] [CVPR-2016] C.-Y. Lee and S. Osindero. Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 2231–2239, 2016. paper

[19] [BMVC-2016] W. Liu, C. Chen, K.-Y. K. Wong, Z. Su, and J. Han. STAR-Net: A spatial attention residue network for scene text recognition. In Proceedings of British Machine Vision Conference (BMVC), page 7, 2016. paper

[20] [IJCAI-2017] X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles. Learning to read irregular text with attention mechanisms. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2017. paper

[21] [ICCV-2017] F. Yin, Y.-C. Wu, X.-Y. Zhang, and C.-L. Liu. Scene text recognition with sliding convolutional character models. In Proceedings of International Conference on Computer Vision (ICCV), 2017. paper code

[22] [ICCV-2017] Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of International Conference on Computer Vision (ICCV), pages 5086–5094, 2017. paper

[23] [CVPR-2018] Cheng Z, Xu Y, Bai F, et al. AON: Towards Arbitrarily-Oriented Text Recognition.In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 5571-5579, 2018. paper code

[24] [arXiv-2017] Gao Y, Chen Y, Wang J, et al. Reading Scene Text with Attention Convolutional Sequence Modeling[J]. arXiv preprint arXiv:1709.04303, 2017. paper

[25] [AAAI-2018] Liu W, Chen C, Wong K Y K. Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition[C]//AAAI. 2018. paper

[26] [AAAI-2018] Liu Z, Li Y, Ren F, et al. SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network[C]//AAAI. 2018. paper

[27] [CVPR-2018] Bai F, Cheng Z, Niu Y, Pu S, Zhou S. Edit probability for scene text recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), pages 1508–1516, 2018. paper

[28] [ECCV-2018] Liu Y, Wang Z, Jin H, et al. Synthetically Supervised Feature Learning for Scene Text Recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 435-451. paper

[29] [ICIP-2018] Gao Y, Chen Y, Wang J, et al. Dense Chained Attention Network for Scene Text Recognition[C]//2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018: 679-683. paper

[30] [TPAMI-2018] Shi B, Yang M, Wang X, et al. Aster: An attentional scene text recognizer with flexible rectification[J]. IEEE transactions on pattern analysis and machine intelligence, 2018. paper code

[31] [CVPR-2012] A. Mishra, K. Alahari, and C. V. Jawahar. Top-down and bottom-up cues for scene text recognition. In CVPR, 2012. paper

[32] https://github.com/Canjie-Luo/MORAN_v2

[33] [IJDAR-2005] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran, M. Worring, and X. Lin. ICDAR 2003 robust reading competitions: entries, results, and future directions. IJDAR, 7(2-3):105–122, 2005. paper

[34] [ICDAR-2013] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. Almazán, and L. de las Heras. ICDAR 2013 robust reading competition. In ICDAR, 2013. paper

[35] [ICCV-2013] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, 2013. paper

[36] [Expert Syst.Appl-2014] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. Expert Syst. Appl., 41(18):8027–8048, 2014. paper

[37] [ICDAR-2015] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper

[38] [arXiv-2016] Veit A, Matera T, Neumann L, et al. Coco-text: Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint arXiv:1601.07140, 2016. paper code

[39] [ICDAR-2017] Ch'ng C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. paper code

[40] [ICDAR-2017] Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17)[C]//Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. paper

[41] [ICPR-2018] He M, Liu Y, Yang Z, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web Images[C]//2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018: 7-12. paper

[42] [arXiv-2018] Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild[J]. arXiv preprint arXiv:1803.00085, 2018. paper code

[43] [arXiv-2017] Yuliang L, Lianwen J, Shuaitao Z, et al. Detecting curve text in the wild: New dataset and new solution[J]. arXiv preprint arXiv:1712.02170, 2017. paper code

[44] [ECCV-2018] Lyu P, Liao M, Yao C, Wu W, Bai X. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 71-88. paper code

[45] [NIPS-WORKSHOP-2011] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011. paper

[46] [PR-2019] C. Luo, L. Jin, and Z. Sun, “MORAN: A multi-object rectified attention network for scene text recognition,” Pattern Recognition, vol. 90, pp. 109–118, 2019. paper code

[47] [ACM-2019] Xie H, Fang S, Zha Z J, et al, “Convolutional Attention Networks for Scene Text Recognition,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 15, pp. 3 2019. paper

[48] [AAAI-2019] Liao M, Zhang J, Wan Z, et al, “Scene text recognition from two-dimensional perspective,” //AAAI. 2019. paper

[49] [AAAI-2019] Li H, Wang P, Shen C, et al, “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition,” //AAAI. 2019. paper code

[50] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper

[51] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper

[52] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

[53] [NIPS-WORKSHOP-2014] M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Synthetic data and artificial neural networks for natural scene text recognition, in: Proceedings of Advances in Neural Information Processing Deep Learn. Workshop (NIPS-W).2014. paper code

[54] [CVPR-2016] A. Gupta, A. Vedaldi, A. Zisserman, Synthetic data for text localisation in natural images, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2315–2324. paper code

[55] [CVPR-2019] Zhan F, Lu S. Esir: End-to-end scene text recognition via iterative image rectification, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2059-2068. paper

[56] [CVPR-2019] Zhang Y, Nie S, Liu W, et al. Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition, in: Proceedings of Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2740-2749. paper code

[57] ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling. Link

[58] ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text. Link

[59] ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard. Link

[60] [arXiv-2019] X. Chen, T. Wang, Y. Zhu, L. Jin, and C. Luo. Adaptive Embedding Gate for Attention-Based Scene Text Recognition[J]. arXiv preprint arXiv:1908.09475, 2019. paper


Newly added references

[61] [ICFHR-2018] Wang C, Yin F, Liu C L. Memory-Augmented Attention Model for Scene Text Recognition[C] //2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2018: 62-67. paper

[62] [ICCV-2019] Yang M K, Guan Y, Liao M, et al. Symmetry-constrained Rectification Network for Scene Text Recognition[J]. arXiv preprint arXiv:1908.01957, 2019. paper

[63] [ICCV-2019] Sun Y, Liu J, Liu W, et al. Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning[J]. arXiv preprint arXiv:1909.07808, 2019. paper

[64] [ICME-2019] Wang S, Wang Y, Qin X, et al. Scene Text Recognition via Gated Cascade Attention[C]//2019 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2019: 1018-1023. paper

[65] [ICCV-2019] Baek J, Kim G, Lee J, et al. What is wrong with scene text recognition model comparisons? dataset and model analysis[J]. arXiv preprint arXiv:1904.01906, 2019. paper code

[66] [NIPS-2017] Wang J, Hu X. Gated recurrent convolution neural network for OCR[C]//Advances in Neural Information Processing Systems. 2017: 335-344. paper code

[67] [NC-2019] Huang, Yunlong, et al. "EPAN: Effective parts attention network for scene text recognition." Neurocomputing (2019). paper

[68] [NC-2019] Gao, Yunze, et al. "Reading scene text with fully convolutional sequence modeling." Neurocomputing 339 (2019): 161-170. paper

[69] [ICDAR-2019] Qi, Xianbiao, et al. "A Novel Joint Character Categorization and Localization Approach for Character-Level Scene Text Recognition." 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW). Vol. 5. IEEE, 2019. paper

[70] [ICDAR-2019] Wang, Qingqing, et al. "ReELFA: A Scene Text Recognizer with Encoded Location and Focused Attention." 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW). Vol. 5. IEEE, 2019. paper

[71] [ICIP-2019] Zhu, Yiwei, et al. "Text Recognition in Images Based on Transformer with Hierarchical Attention." 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019. paper

[72] [CVPR-2019] Zhan, Fangneng, Hongyuan Zhu, and Shijian Lu. "Spatial fusion gan for image synthesis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. paper

[73] [ECCV-2018] Zhan, Fangneng, Shijian Lu, and Chuhui Xue. "Verisimilar image synthesis for accurate detection and recognition of texts in scenes." Proceedings of the European Conference on Computer Vision (ECCV). 2018. paper code

[74] [MultiMedia-2018] Fang, Shancheng, et al. "Attention and language ensemble for scene text recognition with convolutional sequence modeling." 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 2018. paper code

[75] [NIPS-2018] Liu, Hu, Sheng Jin, and Changshui Zhang. "Connectionist temporal classification with maximum entropy regularization." Advances in Neural Information Processing Systems. 2018. paper code

[76] [PR-2017] Su, Bolan, and Shijian Lu. "Accurate recognition of words in scenes without character segmentation using recurrent neural network." Pattern Recognition 63 (2017): 397-405. paper

[77] [CVIU-2016] Mishra, Anand, Karteek Alahari, and C. V. Jawahar. "Enhancing energy minimization framework for scene text recognition with top-down cues." Computer Vision and Image Understanding 145 (2016): 30-42. paper

[78] [ICPR-2016] Liu, Xinhao, et al. "Scene text recognition with CNN classifier and WFST-based word labeling." 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016. paper

[79] [TPAMI-2019] Liao M, Lyu P, He M, et al. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes[J]. IEEE transactions on pattern analysis and machine intelligence, 2019. paper code

[80] [AAAI-2020] T. Wang, Y. Zhu, L. Jin, C. Luo and X. Chen. Decoupled Attention Network for Text Recognition[C]//AAAI. 2020. paper code


6. Help

If you find any problems in our resources, or any good papers/codes we have missed, please inform us at xxuechen@foxmail.com. Thank you for your contribution.


7. Copyright

Copyright © 2019 SCUT-DLVC. All Rights Reserved.
