Getting started with page layout analysis #86
University of Kaiserslautern http://www.iupr.com/ Prof. Breuel started working at Google in 2014, but is still supervising several students in the department. Publications of the research group from 2004-2014 can be found on the Publications Page. Since the summer semester of 2014, Vertr.-Prof. Dr. Marcus Eichenberger-Liwicki has been heading the group as a substitute. Adnan Ul-Hasan |
http://coen.boisestate.edu/EBarneySmith/sp_lab/past_projects/document-imaging-defect-analysis/ Model the nonlinear systems of printing, scanning, photocopying and FAXing, and multiple combinations of these, that produce degraded images, and develop methods to calibrate these models. From a calibrated model one can predict how a document will look after being subjected to these processes. This can be used to develop products that degrade text images less. |
APPLICATIONS: Reading books and documents for the visually impaired. Machine-printed documents, such as memos, letters, technical reports, and books. Low accuracy rates are most common in documents with image degradations caused by printing, scanning, photocopying and/or FAXing. These four operations all share the processes of spatial and intensity quantization, which are the primary sources of change in the appearance of bilevel images such as characters and line drawings. Camera-based acquisition (such as with a cell phone) adds to the degradation by introducing out-of-focus blur and perspective distortion. To date the most common way of overcoming these degradations is to provide the classifier with a large enough variety of samples that it can recognize the degraded characters. However, by understanding the degradation and being able to estimate its characteristics for each document, a more effective method of preprocessing or recognizing the characters can be developed. |
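The degradation pipeline described above (optical blur, then spatial and intensity quantization) is easy to prototype. Below is a minimal sketch, assuming a grayscale page image normalized to [0, 1] with 1 = ink; the function and parameter names are mine, not from the cited lab:

```python
# A minimal sketch (not the cited lab's code) of the classic scan-degradation
# model described above: a point-spread function (blur), spatial quantization
# (resampling), sensor noise, and intensity quantization (thresholding).
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def degrade(page: np.ndarray, psf_sigma=1.2, dpi_ratio=0.5,
            noise_sigma=0.05, threshold=0.5, seed=0) -> np.ndarray:
    """page: float array in [0, 1], 1 = ink. Returns a degraded bilevel image."""
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(page.astype(float), sigma=psf_sigma)  # optical blur
    sampled = zoom(blurred, dpi_ratio, order=1)                     # spatial quantization
    noisy = sampled + rng.normal(0.0, noise_sigma, sampled.shape)   # sensor noise
    return (noisy > threshold).astype(np.uint8)                     # intensity quantization
```

Calibrating such a model against real scans, as the project above proposes, then amounts to fitting psf_sigma, dpi_ratio and the noise level to observed degraded pages.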
http://cvit.iiit.ac.in/SSDA/program.html GROUNDTRUTH GENERATION AND DOCUMENT IMAGE DEGRADATION.pdf

Monday, January 23. Speaker: Prof. Pushpak Bhattacharyya. Abstract: We present in this talk the use of eye tracking for Natural Language Processing, which we call Cognitive Natural Language Processing. NLP is machine learning dependent these days, and clues from eye tracking provide valuable features in ML for NLP. We study Machine Translation, Sentiment Analysis, Readability, Sarcasm and similar problems to show that cognition-based features augment the efficacy of ML-based NLP manifold. An additional attraction of cognitive NLP is the possible rationalization of compensation for annotation effort on text. The presentation is derived from multiple publications in ACL, EMNLP, NAACL etc. based on work done by PhD and Masters students. Bio: Prof. Pushpak Bhattacharyya is the current President of ACL (2016-17). He is the Director of IIT Patna and Vijay and Sita Vashee Chair Professor in the Computer Science and Engineering Department, IIT Bombay. He was educated at IIT Kharagpur (B.Tech), IIT Kanpur (M.Tech) and IIT Bombay (PhD). He has been a visiting scholar and faculty member at MIT, Stanford, UT Houston and Université Joseph Fourier (France). Prof. Bhattacharyya's research areas are Natural Language Processing, Machine Learning and AI. He has guided more than 250 students (PhD, Masters and Bachelors), published more than 250 research papers and led government and industry projects of international and national importance. A significant contribution of his is Multilingual Lexical Knowledge Bases and Projection. Author of the textbook "Machine Translation", Prof. Bhattacharyya is loved by his students for his inspiring teaching and mentorship. He is a Fellow of the National Academy of Engineering and a recipient of the Patwardhan Award of IIT Bombay and the VNMM Award of IIT Roorkee, both for technology development, and of faculty grants from IBM, Microsoft, Yahoo and the United Nations.

Developing Multilingual OCR and Handwriting Recognition at Google. Monday, January 23. Speaker: Dr. Ashok Popat. Lecture Slides. Abstract: In this talk I will reflect on our team's experiences in developing multilingual OCR and handwriting recognition systems at Google: enabling factors, effective practices, and challenges. I'll tell you what I think I've learned along the way, drawing on some experiences with other projects inside and outside Google. Bio: Dr. Ashok C. Popat received the SB and SM degrees from the Massachusetts Institute of Technology in Electrical Engineering in 1986 and 1990, and the PhD from the MIT Media Lab in 1997. He is a Staff Research Scientist and manager at Google in Mountain View, California. Prior to joining Google in 2005 he worked at Xerox PARC for 8 years, as a researcher and later as a research area manager. Between 2002 and 2005 he was also a consulting assistant professor of Electrical Engineering at Stanford, where he taught a course "Electronic documents: paper to digital." He has also worked at Motorola, Hewlett-Packard, PictureTel, and EPFL in Switzerland. His areas of interest include signal processing, data compression, machine translation, and pattern recognition. Personal: skiing, sailing, hiking, traveling, learning languages.

Word Spotting: From Bag-of-Features to Deep Learning. Tuesday, January 24. Speaker: Prof. Gernot Fink. Abstract: Research in building automatic reading systems has made considerable progress since its inception in the 1960s. Today, quite mature techniques are available for the automatic recognition of machine-printed text. However, the automatic reading of handwriting is a considerably more challenging task, especially when it comes to historical manuscripts. When current methods for handwriting recognition reach their limits, approaches for so-called word spotting come into play. These can be considered specialized versions of image retrieval techniques. The most successful methods rely on machine learning in order to derive powerful models for representing queries for handwriting retrieval. This lecture will first give a brief introduction to the problem of word spotting and the methodological developments in the field. In the first part of the lecture, classical approaches for learning word spotting models will be described that build on Bag-of-Features (BoF) representations. These have been developed in the field of computer vision for learning characteristic representations of image content in an unsupervised manner. It will be shown how word spotting models can be built applying the BoF principle. It will also be described how basic BoF models can be extended by learning common sub-space representations between different modalities. In the second part of the lecture, advanced models for word spotting will be presented that apply techniques of deep learning and currently define the state of the art in the field. After a discussion of the pros and cons of the classical approaches, foundations of neural networks in general and deep architectures in particular will be laid. By combining the idea of common sub-space representations with a unified framework that can be learned in an end-to-end fashion, unprecedented performance on a number of challenging word spotting tasks can be achieved, as has been demonstrated by the PHOCNet. Bio: Prof. Gernot A. Fink received his diploma in computer science from the University of Erlangen-Nuremberg, Germany, in 1991. From 1991 to 2005, he was with the Applied Computer Science Group at Bielefeld University, Germany, where he received his Ph.D. degree (Dr.-Ing.) in 1995 and his venia legendi (Habilitation) in 2002. Since 2005, he has been a professor at the Technical University of Dortmund, Germany, where he heads the Pattern Recognition in Embedded Systems Group. His research interests are machine perception, statistical pattern recognition, and document analysis. He has published more than 150 papers and a textbook on Markov models for pattern recognition. Lab session: In the accompanying lab session, participants of the summer school will be able to experiment with different word spotting models and thus obtain hands-on experience with the techniques presented in the lecture. Lab-related material: http://patrec.cs.tu-dortmund.de/cms/en/home/Resources/index.html Lecture Slides: http://patrec.cs.tu-dortmund.de/pubs/papers/SSDA17-Tutorial-Fink.pdf

Detection and cleaning of struck-out text in offline handwritten documents. Tuesday, January 24. Speaker: Prof. B. B. Chaudhuri. Lecture Slides. Abstract: The talk starts with a brief study of OCR of offline unconstrained handwritten text, including our BLSTM-based work on Bangla script. It is noted that published papers on the topic consider ideal inputs, i.e. documents containing no writing errors. However, a free-form creative handwritten page may contain a misspelled or inappropriate word that is struck out by the writer, with the corrected word written next to it. The strike-out may also be longer, e.g. covering several consecutive words or even several lines, after which the writer pens his/her revised statement at the next free space. If a document image with such errors is fed to handwriting OCR, unpredictable erroneous strings will be generated for the struck-out text. The present talk mainly deals with this strike-out problem in English and Bangla script. Here a pattern classifier followed by a graph-based method is employed to detect struck-out text and locate the strike-out strokes. For detection, we fed hand-crafted as well as Recurrent Neural Net generated features into an SVM classifier to detect struck-out words. Then, to locate the strike-out stroke, the skeleton of the text component is computed. The skeleton is treated as a graph, and a shortest-path algorithm that satisfies certain properties of strike-out strokes is employed. To locate zig-zag, wavy, slanted or crossed strike-outs, appropriate modifications to the path detection algorithm are made. Multi-word/multi-line strike-outs are also tackled in a suitable manner. Sometimes the user may be interested in deleting the detected strike-out stroke. When this is done, the cleaned text may be better visible for manual analysis, or subjected to an OCR system for transcript generation of a manuscript (of, say, a famous person). We have employed an inpainting method for such cleaning. Tested on 250 English and 250 Bangla document pages, fairly good results on the above tasks have been obtained. Bio: Prof. Bidyut B. Chaudhuri received his Ph.D. degree from the Indian Institute of Technology, Kanpur, in 1980 and worked as a Leverhulme postdoctoral fellow at Queen's University, UK, in 1981-1982. He joined the Indian Statistical Institute in 1978, where he is currently INAE Distinguished Professor and J. C. Bose Fellow at the Computer Vision and Pattern Recognition Unit. His research interests include pattern recognition, image processing, computer vision, NLP, information retrieval, digital document processing and OCR. He pioneered the first Indian-language Bharati Braille system for the blind, a successful Bangla speech synthesis system, as well as the first workable OCR for Bangla, Devanagari, Assamese and Oriya scripts. In NLP, a robust Indian-language spell-checker, morphological processor, multi-word expression detector and statistical analyser were pioneered by him. Some of his technologies have been transferred to industry for commercialization. He has published about 400 research papers in reputed international journals, conference proceedings, and edited books. He has authored/co-authored 8 technical books and holds four international patents. He is a Fellow of Indian national academies including INSA, NASc and INAE. Among international academies, he is a Fellow of IAPR and TWAS, and a Life Fellow of IEEE. He serves as an associate editor of IJPRAI, IJDAR and JIETE and has served as guest editor of special issues of several journals.

Reading behavior analysis for reading-life log and its fundamental technologies. Wednesday, January 25. Speaker: Prof. Koichi Kise. Lecture Slides. Abstract: In our daily life, we spend hours reading documents, because reading is our primary means of acquiring information. "Reading-life log" is a field of research that extracts fruitful information for enriching our life through mutual analysis of reading activity and the documents read. We can estimate many things from the results of this analysis, e.g., how much we read (wordometer, reading detection) and how well we understand (the level of understanding and proficiency), both by analyzing eye gaze obtained with eye-trackers. The fundamental technologies supporting reading-life log are sensing of human reading behavior and retrieval of documents captured as images. In my talk, I introduce these fundamental technologies and their application to the implementation of various types of reading-life log. Bio: Prof. Koichi Kise received B.E., M.E. and Ph.D. degrees in communication engineering from Osaka University, Osaka, Japan in 1986, 1988 and 1991, respectively. From 2000 to 2001, he was a visiting professor at the German Research Center for Artificial Intelligence (DFKI), Germany. He is now a Professor in the Department of Computer Science and Intelligent Systems, and the director of the Institute of Document Analysis and Knowledge Science (IDAKS), Osaka Prefecture University, Japan. He has received awards including the best paper award of IEICE in 2008, the IAPR/ICDAR best paper awards in 2007 and 2013, the IAPR Nakano award in 2010, the ICFHR best paper award in 2010 and the ACPR best paper award in 2011. He serves as chair of IAPR Technical Committee 11 (reading systems), a member of the IAPR conferences and meetings committee, and editor-in-chief of the International Journal of Document Analysis and Recognition. His major research activities are in analysis, recognition and retrieval of documents, images and activities. He is a member of IEEE, ACM, IPSJ, IEEJ, ANLP and HIS. Demo: I will demonstrate fundamental technologies and implementations of reading-life log using some sensors. A document image retrieval method called LLAH (Locally Likely Arrangement Hashing) is one fundamental technology to be demonstrated. I will also show several sensing technologies such as eye-tracking and EOG (electrooculography). Students will be able to try the sensors to learn more about their functions. In addition, students will have an opportunity to implement simple activity recognition using an eye-tracker.

Document page layout analysis. Wednesday, January 25. Speaker: Prof. Bhabatosh Chanda. Lecture Slides. Abstract: 'Document page layout analysis' usually refers to the decomposition of a page image into textual and various non-textual components to understand its geometrical and logical structure, and thereafter linking them together for efficient presentation and abstraction. With the growing need for automatic transformation of complex paper documents into electronic versions, geometrical and logical structure analysis has remained an active research area for decades. Such analysis helps OCR produce its best possible result. It also helps in extracting various logical components such as images and line drawings. In this presentation our objective is to make a quick journey starting from elementary approaches suitable for strictly structured layouts to more sophisticated methods that can handle complicated designer layouts. We also discuss evaluation methodology for layout analysis algorithms and mention various benchmark datasets available for performance evaluation. Bio: Prof. Bhabatosh Chanda received his B.E. in Electronics and Telecommunication Engineering and PhD in Electrical Engineering from the University of Calcutta in 1979 and 1988 respectively. His research interests include image and video processing, pattern recognition, computer vision and mathematical morphology. He has published more than 100 technical articles in refereed journals and conferences, authored one book and edited five books. He received the Young Scientist Medal of the Indian National Science Academy in 1989, the Computer Engineering Division Medal of the Institution of Engineers (India) in 1998, the Vikram Sarabhai Research Award in 2002, and the IETE-Ram Lal Wadhwa Gold Medal in 2007. He is also a recipient of a UN fellowship, a UNESCO-INRIA fellowship and the Diamond Jubilee fellowship of the National Academy of Sciences, India. He is a Fellow of the Institute of Electronics and Telecommunication Engineers (FIETE), the National Academy of Sciences, India (FNASc.), the Indian National Academy of Engineering (FNAE) and the International Association for Pattern Recognition (FIAPR). He is a Professor at the Indian Statistical Institute, Kolkata, India.

Historical Document Analysis. Friday, January 27. Speaker: Prof. Marcus Liwicki. Lecture Slides. Abstract: I will give an overview of the challenges of historical documents and current research highlights for various document image analysis (DIA) problems. Historical documents pose very tough challenges to automatic DIA algorithms: typically, exotic scripts and layouts were used, and the documents have degraded over time. I will give an overview of typical processing algorithms and furthermore report on recent trends towards interoperability. In the first part of the presentation, I will describe methods for line segmentation, binarization, and layout analysis. Very recent deep learning trends in particular have led to remarkable improvements in processing systems compared to conventional methods. On top of that, if enough data is available, these methods are also much easier to apply, since they perform end-to-end recognition and make several processing steps obsolete. On the basis of examples, I will show that separating the analysis into several independent steps can even lead to problems and worse performance in the later stages. The reasons for this are twofold: first, it is not clear how to define the ground truth (i.e., the expected perfect outcome) of some individual steps; second, early recognition errors can make processing much more difficult for the later stages. The only remaining problem for deep learning is the need for large amounts of training data. I will demonstrate methods to automatically extend existing ground-truthed datasets to generate more training data. In the second part, I will sketch recent approaches of the Document, Image, and Voice Analysis (DIVA) group towards enabling libraries and humanities researchers to more easily use state-of-the-art DIA methods. Common structures, adaptable methods, public datasets, and open services (e.g., DIVAServices, which will be presented in more depth by Marcel Würsch in the next presentation) lead to easier re-use, access, and integration into tools used at libraries, archives, and research environments. Lab: Hands-on practice with DIVAServices, web services for document image analysis. Participants will be able to try out state-of-the-art document image processing methods and learn how to easily integrate their own methods into DIVAServices. Bio: Marcus Liwicki received his M.S. degree in Computer Science from the Free University of Berlin, Germany, in 2004, his PhD degree from the University of Bern, Switzerland, in 2007, and his habilitation degree from the Technical University of Kaiserslautern, Germany, in 2011. Currently he is an apl. professor at the University of Kaiserslautern and a senior assistant at the University of Fribourg. His research interests include machine learning, pattern recognition, artificial intelligence, human-computer interaction, digital humanities, knowledge management, ubiquitous intuitive input devices, document analysis, and graph matching. From October 2009 to March 2010 he visited Kyushu University (Fukuoka, Japan) as a research fellow (visiting professor), supported by the Japanese Society for the Promotion of Science. In 2015, at the age of 32, he received the ICDAR Young Investigator Award, a bi-annual award acknowledging outstanding achievements in pattern recognition by researchers up to the age of 40. Marcus Liwicki has given a number of invited talks at international workshops, universities, and companies, as well as several tutorials at IAPR conferences. He is a co-author of the book "Recognition of Whiteboard Notes – Online, Offline, and Combination", published by World Scientific in October 2008. He has more than 150 publications, including more than 20 journal papers, excluding more than 20 publications currently under review or soon to be published.

Analyzing text documents: separating the wheat from the chaff. Friday, January 27. Speaker: Dr. Lipika Dey. Lecture Slides. Abstract: The rapid rise of digital text document collections is exciting for decision makers across different sectors, be it academia or industry. While academia is interested in gathering insights about scientific and technical progress in different areas of research, industry is interested in knowing more about its potential consumers and competitors. All this and much more is available today almost free of cost on the open web. However, text data can be extremely noisy and deceptive. Noise creeps in from various sources, some intended and some unintended. While some of this noise can be treated at the pre-processing level, some needs to be dealt with during the analysis process itself. In this talk we shall take a look at the various pitfalls that need to be carefully avoided or taken care of in order to come up with meaningful insights from text documents. Demo: Texcape. Given the volume and velocity at which research publications are growing, keeping up with the advances in various fields is a challenging task. However, decision makers including academics, program managers, venture capital investors, industry leaders and funding agencies not only need to be abreast of the latest developments but must also be able to assess the future impact of research on industry, academia or society. Automated extraction of key information and insights from these text documents is necessary to help in this endeavor. Texcape is a technology landscaping tool built on top of scientific publications and patents that attempts to help with this task. This demo will show how Texcape performs automated topical analysis of large volumes of text and analyzes evolution, commercialization and trends to aid collaborative decision making. Bio: Dr. Lipika Dey is a Senior Consultant and Principal Scientist at Tata Consultancy Services, India, with over 20 years of experience in academic and industrial R&D. She heads the Web Intelligence and Text Mining research group at Innovation Labs. Lipika's research interests are in the areas of content analytics from social media and news, social network analytics, predictive modeling, sentiment analysis and opinion mining, and semantic search of enterprise content. Her focus is on seamless integration of social intelligence and business intelligence. She is keenly interested in developing analytical frameworks for integrated analysis of unstructured and structured data. Lipika publishes her work in various international conferences and journals. She has also presented her earlier work at the Sentiment Analysis Symposium and the Text Mining Summit. Lipika was awarded the Distinguished Scientist award by TCS in 2012. Prior to joining industry in 2007, Lipika was a faculty member in the Department of Mathematics at the Indian Institute of Technology, Delhi, from 1995 to 2006. She has several publications in international journals and refereed conference proceedings. Lipika has a Ph.D. in Computer Science and Engineering, an M.Tech in Computer Science and Data Processing, and a 5-year Integrated M.Sc. in Mathematics from IIT Kharagpur.

Language Model: Theory and Applications. Saturday, January 28. Speaker: Dr. Utkarsh Porwal. Lab-related material. Abstract: A language model helps us compute the probability of a sequence of terms, such as words, given a corpus. It is widely used in applications like spell correction, POS tagging, information retrieval, speech recognition and handwriting recognition. In this talk, we will cover the theory of language models from n-gram based models to recent RNN based models, parameter estimation, evaluation, etc. We will also cover a wide range of applications where language modeling is used. Lab: In this lab session, participants will learn to train and evaluate different types of language models, such as n-gram based and RNN based models, and will be able to compare them based on performance, data efficiency, storage, etc. (A minimal bigram sketch follows after this program.) Bio: Dr. Utkarsh Porwal is an applied researcher at eBay. He works on automatic query rewrites, entity recognition and structured data. Before joining search science, he was part of the trust science group, where he worked on detecting abusive buyers and on feature selection. His research interests lie broadly in the areas of information retrieval, pattern recognition and applied machine learning. He received his Ph.D. from the State University of New York at Buffalo in 2014.

Extreme Classification for Tagging on Wikipedia, Query Ranking on Bing and Product Recommendation on Amazon. Saturday, January 28. Speaker: Prof. Manik Varma. Abstract: The objective in extreme classification is to develop classifiers that can automatically annotate each data point with the most relevant subset of labels from an extremely large label set. In this talk, we will develop a new paradigm for tagging, ranking and recommendation based on extreme classification. In particular, we design extreme multi-label loss functions which are tailored for tagging, ranking and recommendation, and show that these loss functions are more suitable for performance evaluation than traditional metrics. Furthermore, we develop novel algorithms for optimizing the proposed loss functions and demonstrate that these can lead to significant improvements over the state of the art on various real-world applications, ranging from tagging on Wikipedia to sponsored search advertising on Bing to product recommendation on Amazon. More details, including publications, videos, datasets and source code, can be found at http://www.manikvarma.org/. Brief Bio: Prof. Manik Varma is a researcher at Microsoft Research India and an adjunct professor of computer science at IIT Delhi. His research interests span machine learning, computational advertising and computer vision. He has served as an area chair for CVPR, ICCV, ICML, ICVGIP, IJCAI and NIPS. Classifiers that he has developed are running live on millions of devices around the world, protecting them from viruses and malware. Manik has been awarded the Microsoft Gold Star award and the Microsoft Achievement award, won the PASCAL VOC Object Detection Challenge, and stood first in chicken chess tournaments and Pepsi drinking competitions. He is a failed physicist (BSc St. Stephen's College, David Raja Ram Prize), theoretician (BA Oxford, Rhodes Scholar), engineer (DPhil Oxford, University Scholar) and mathematician (MSRI Berkeley, Post-doctoral Fellow).

System Demo: Mr. Tushar Patnayak, CDAC Noida. Abstract: Indian Language OCR: e-Aksharayan, an Indian-language OCR, facilitates converting hardcopy printed documents into electronic form using a new approach, leading, for the first time, to a technology for recognizing characters and words in scanned images of documents in a large set of Indian scripts/languages. Optical Character Recognition (OCR) for Indian scripts opens up the possibility of delivering traditional Indian-language content, which today is confined to printed books, to readers across the world through electronic means. OCR makes the content searchable as well as readable via a variety of devices like mobile phones, tablets and e-readers. Further, the same content can now be transformed electronically to meet the needs of the visually challenged through generation of Braille and/or audio books, among other possibilities. Use of OCR on printed Indian-language circulars and notifications can make embedded information widely accessible, facilitating effective e-governance. The circulars can then be very easily edited, if required, for adaptation to different needs. The OCR process involves first converting printed matter into an electronic image using a scanner or a digital camera, followed by electronic image processing to generate Unicode text. This can be opened in any word-processing application for editing. e-Aksharayan has a user-friendly design and allows intuitive editing of the scanned image and the generated text. In short, it enables users to harness the power of computers to access printed documents in Indian languages/scripts.

Swapnil Belhe, CDAC Pune. Abstract: With the recent advancement in Indian-language Optical Character Recognition (OCR) and Online Handwritten Character Recognition (OHWR) engines, a wide variety of applications has been developed around these engines to cater to various needs. The engines make use of the latest developments in document and handwriting analysis, making them robust to font and writing style variations. Most of the OCR and OHWR engines make use of huge collections of data during training, which makes them robust. The demonstrations will focus on desktop- and mobile-based OCRs for Indian languages and their complexities. At the same time, the demonstrations of OHWRs will show the effectiveness of handwriting recognition on handheld devices. An effective way of multi-modal input for form processing in Indian languages using handwriting recognition will be showcased. Various learning games developed using the OCRs and OHWRs will be demonstrated. These demos will also provide a glimpse of future challenges. |
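As a companion to the language-model lecture and lab above, here is a minimal bigram model with add-one (Laplace) smoothing. It is an illustrative sketch only, not the lab's actual material:

```python
# A minimal bigram language model with add-one smoothing, illustrating the
# n-gram models covered in the lecture above. Not the lab's actual code.
from collections import Counter
from math import log

class BigramLM:
    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # P(w_i | w_{i-1}) with add-one smoothing avoids zero probabilities
        return sum(
            log((self.bigrams[(a, b)] + 1) /
                (self.unigrams[a] + self.vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

lm = BigramLM(["the cat sat", "the dog sat"])
print(lm.logprob("the cat sat"))  # less negative than for an unseen sentence
```

In a handwriting-recognition pipeline such a model rescores candidate transcriptions, favoring word sequences that are plausible under the training corpus.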
In order to create a general page segmentation method without using any prior knowledge of the layout structure of the documents, we consider the page segmentation problem as |
[10] J. Pastor-Pellicer, M. Z. Afzal, M. Liwicki, and M. J. Castro-Bleda, Fast CNN-based document layout analysis
|
Table Detection Using Deep Learning https://www.researchgate.net/publication/320243569_Table_Detection_Using_Deep_Learning [accessed Apr 26 2018] |
A Two-Stage Method for Text Line Detection in Historical Documents
|
Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images
|
Research status and trends of document analysis techniques (文档分析技术研究现状与趋势)
http://www.nlpr.ia.ac.cn/liucl/DA%E7%A0%94%E7%A9%B6%E7%8E%B0%E7%8A%B6%E4%B8%8E%E8%B6%8B%E5%8A%BF.pdf |
https://pdfs.semanticscholar.org/presentation/0907/0b09d860a639577a9b5219d065bc47fa28de.pdf |
Open Evaluation Tool for Layout Analysis of Document Images (evaluation tool code). This paper presents an open tool for standardizing the evaluation process of the layout analysis task on document images at the pixel level. We introduce a new evaluation tool that is available both as a standalone Java application and as a RESTful web service. This evaluation tool is free and open-source in order to be a common tool that anyone can use and contribute to. It aims at providing as many metrics as possible for investigating layout analysis predictions, and also at providing an easy way of visualizing the results. The tool evaluates document segmentation at the pixel level and supports multi-labeled pixel ground truth. Finally, it has been successfully used for the ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts. |
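For intuition, pixel-level layout evaluation of the kind this tool performs boils down to comparing label maps. A minimal sketch of per-class intersection-over-union follows; the function name and the single-label-per-pixel encoding are my simplification (the actual tool also handles multi-labeled pixels):

```python
# Hypothetical sketch of a pixel-level layout-analysis metric: per-class
# intersection-over-union between a predicted and a ground-truth label map.
# Not the tool's actual API; classes here are plain integer labels.
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> dict:
    """pred, gt: integer label maps of identical shape (one class per pixel)."""
    ious = {}
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union:  # skip classes absent from both maps
            ious[c] = np.logical_and(p, g).sum() / union
    return ious
```

A multi-label variant would store one boolean mask (or bit flag) per class per pixel and compute the same intersection/union per class.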
Text and non-text separation in offline document images: a survey https://link.springer.com/article/10.1007%2Fs10032-018-0296-z Separation of text and non-text is an essential processing step for any document analysis system. It is therefore important to have a clear understanding of the state of the art of text/non-text separation in order to facilitate the development of efficient document processing systems. This paper first summarizes the technical challenges of performing text/non-text separation. It then categorizes offline document images into different classes according to the nature of the challenges one faces, in an attempt to provide insight into the various techniques presented in the literature. The pros and cons of the various techniques are explained wherever possible. Along with the evaluation protocols and benchmark databases, the paper also presents a performance comparison of different methods. Finally, it highlights future research challenges and directions in this domain. |
Learning to detect tables in document images using line and text information |
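The "line information" cue used by table detectors like the one above can be approximated with classic morphology: opening a binarized page with long, thin structuring elements keeps only the ruling lines. A sketch of that general technique (not the paper's actual method), using OpenCV:

```python
# Sketch of the classic line-based cue for table detection: morphological
# opening with long thin kernels isolates horizontal/vertical ruling lines.
# Illustrative of the general technique, not the cited paper's method.
import cv2
import numpy as np

def ruling_line_mask(gray: np.ndarray, min_len: int = 40) -> np.ndarray:
    """gray: 8-bit grayscale page image. Returns a mask of candidate rulings."""
    # Binarize with ink as foreground (white) so opening keeps ink structures.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 15, 10)
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (min_len, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, min_len))
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    return cv2.bitwise_or(horizontal, vertical)  # candidate table rulings
```

Regions where horizontal and vertical rulings intersect densely are strong table candidates; learned methods combine this cue with text-layout features.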
http://ccis2k.org/iajit/PDF/July%202018,%20No.%204/10223.pdf Table extraction is usually complemented by table annotation to find the hidden semantics in a particular |
Architecture design and key technology research for a new fixed-layout document format (一种新型版式文档格式的架构设计与关键技术研究). As a carrier of information, documents have played an important role in human history and social progress. In recent years, with the development of electronic technology, electronic documents have become increasingly widespread. Meanwhile, the rapid development of network technology and the falling cost and growing power of handheld mobile devices have pushed the online publishing of electronic documents into a new stage of development. However, as this work progresses, the diversity of reading terminals also brings new challenges to online publishing. We therefore carried out a series of studies on these problems in the context of online publishing. |
https://app.dimensions.ai/details/publication/pub.1034782548 DeepDeSRT_ Deep Learning for Detection and Structure Recognition of Tables in Document Images.pdf Understanding Tables on the Web A Table Detection Method for PDF Documents Based on Convolutional Neural Networks Generating Schema Labels through Dataset Content Analysis Rule-based spreadsheet data transformation from arbitrary to relational tables Effective and efficient Semantic Table Interpretation using TableMiner+ .pdf |
Dataset, ground-truth and performance metrics for table detection evaluation .pdf Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents |
Document image classification. Approaches: image-based ("visual similarity") vs. content-based (OCR; domain-specific models are based on text). Image-based methods fall into three broad categories. Classification of Document Page Images.pdf; PhD thesis: Document Image Classification Combining Textual and Visual Features; [3] A. Dengel, R. Bleisinger, F. Fein, R. Hoch, F. Hones, and M. Malburg. OfficeMAID - a |
Classification of Document Page Images
We propose a method for using layout structures of documents (i.e., visual appearance) to facilitate |
Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines
|
Document image classification dataset: http://www.cs.cmu.edu/%7Eaharley/rvl-cdip/
The label files list the images and their categories; the categories are numbered 0 to 15. |
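A minimal loader for such a label file, assuming the usual "relative/image/path.tif label" line layout (an assumption on my part; verify against the dataset's own documentation):

```python
# Sketch of reading an RVL-CDIP-style label file. Assumes each non-empty line
# is "relative/image/path.tif<space>label" with labels 0-15; check the
# dataset's README before relying on this layout.
from pathlib import Path

def load_labels(label_file: str):
    samples = []
    for line in Path(label_file).read_text().splitlines():
        if line.strip():
            path, label = line.rsplit(" ", 1)
            samples.append((path, int(label)))  # label in 0..15
    return samples
```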
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
|
2018
[7] L. Kang, J. Kumar, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for document image classification,” 22nd International Conference on Pattern Recognition (ICPR), pp. 3168–3172, 2014. |
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
|
April 11, 2017
|
N. Chen and D. Blostein, “A survey of document image classification: problem statement, classifier architecture and performance evaluation,” International Journal of Document Analysis and Recognition (IJDAR), 2007. |
0 stars, not much substance. |
https://www.jianshu.com/p/710799b985ef Deep-learning-based object detection in images |
Open Evaluation Tool for Layout Analysis of Document Images
https://www.digitisation.eu/tools-resources/demonstrator-platform/ LayoutEvaluation_1.8.129.zip |
Code: https://github.com/dhlab-epfl/dhSegment dhSegment: A generic deep-learning approach for document segmentation (historical documents) |
A probabilistic framework for handwritten text line segmentation. Paper: https://arxiv.org/abs/1805.02536
Evaluated on the ICDAR 2009 and 2013 handwriting segmentation benchmarks, on documents from the George Washington collection, and on a collection of administrative documents. |
Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network. Paper: https://arxiv.org/abs/1801.00470 Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds and the similar layout of characters shared by some scripts like Greek, Latin, etc., text recognition in those cases becomes challenging. Most recent approaches generally use a patch-based CNN network with summation of the obtained features, or only a CNN-LSTM network, to get the identification result. Some use a discriminative CNN to jointly optimize mid-level representations and deep features. In this paper, we propose a novel method that involves extraction of local and global features using a CNN-LSTM framework and weighting them dynamically for script identification. First, we convert the images into patches and feed them into a CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. Then we do patch-wise multiplication of these weights with the corresponding CNN features to yield local features. Global features are also extracted from the last cell state of the LSTM. We employ a fusion technique which dynamically weights the local and global features for an individual patch. Experiments have been done on two public script identification datasets, SIW-13 and CVSI-2015. The proposed framework achieves superior results in comparison to conventional methods. |
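A loose PyTorch sketch of the attention idea this abstract describes: per-patch CNN features feed an LSTM, softmax attention over the LSTM outputs weights the local features, the last cell state serves as the global feature, and the two are fused for classification. The dimensions, the toy patch encoder, and the concatenation fusion are my assumptions, not the authors' exact architecture:

```python
# Loose sketch of attention-weighted CNN-LSTM fusion for script identification.
# Toy encoder, dimensions and concat fusion are assumptions, not the paper's
# exact model; num_scripts=13 mirrors the SIW-13 setting.
import torch
import torch.nn as nn

class AttnScriptID(nn.Module):
    def __init__(self, feat_dim=128, hidden=128, num_scripts=13):
        super().__init__()
        self.cnn = nn.Sequential(  # tiny stand-in per-patch encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 16, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.cls = nn.Linear(feat_dim + hidden, num_scripts)

    def forward(self, patches):  # patches: (B, N, 1, H, W)
        b, n = patches.shape[:2]
        f = self.cnn(patches.flatten(0, 1)).view(b, n, -1)  # per-patch features
        out, (_, c) = self.lstm(f)
        w = torch.softmax(self.attn(out), dim=1)            # attention over patches
        local = (w * f).sum(dim=1)                          # weighted local features
        return self.cls(torch.cat([local, c[-1]], dim=1))   # fuse with global state

logits = AttnScriptID()(torch.randn(2, 8, 1, 32, 32))  # 2 images, 8 patches each
```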
http://www.music.mcgill.ca/~ich/classes/mumt611_07/Evaluation/liang97performance.pdf |
http://www.mdpi.com/2313-433X/3/4/62/htm https://github.com/DocCreator/DocCreator |
A free cloud service for OCR 2016 |
http://www.europeana-newspapers.eu/public-materials/deliverables/ https://github.com/KBNLresearch |
http://ceng.anadolu.edu.tr/CV/EDLines/demo.aspx (EDLines demo; related line detectors: LSD, Hough) |
Improving Document Clustering by Eliminating Unnatural Language |
https://github.com/RaymondMcGuire/BOOK-CONTENT-SEGMENTATION-AND-DEWARPING Using an FCN to segment the book's content and background, then dewarp the pages. |
Scribble Based Interactive Page Layout Segmentation using Gabor Filter |
2017 |
Page Segmentation Performance Using Horizontal Image Strips Instead of Full Page Images |
OP, haven't you kept following OCR layout analysis since then? |
You must be quite the show-off, huh~ |
If you don't agree, let's battle. |
@JohnMingbo |
OK, dhSegment works fine. Good luck, OP. |
@JohnMingbo |
https://github.com/tmbdev/teaching-dca
A course taught by Thomas_Breuel
1. Convert to PDF
2. Convert the PDF to HTML
3. Translate