Using Templating API

juraskrlec edited this page Aug 7, 2017 · 1 revision

This section describes how to set up the BlinkOCR recognizer for scanning templated documents. See the demo app for examples.

A templated document is any document that is defined by its template. The template specifies how the document should be detected (i.e. found in the camera scene) and which part of the document contains which piece of useful information.

Before performing OCR on the document, BlinkID first needs to find its location in the camera scene. To perform detection, you need to define a PPDetectorSettings object, which is used to instantiate the detector that performs document detection. You can set the detector with the detectorSettings property. Check our guide for initializing PPDetectorSettings.

If you do not set detector settings, the BlinkOCR recognizer will work in normal mode, recognizing characters on input images.

Defining how document should be recognized

After the document has been detected, it is recognized. This is done in the following way:

  • the detector produces a PPDetectorResult which contains one or more detection locations.
  • for each element of the array of PPDecodingInfo objects that was defined as part of the concrete PPDetectorSettings, the following is performed:
    • the location defined in the PPDecodingInfo is dewarped to an image of the height defined within that PPDecodingInfo. The location is expressed as fractions of the document image's width and height. For example, in the sample image (image: PPDecodingInfo location), which measures 1024 x 645 pixels, the location of the PPDecodingInfo containing the surname would be
     CGRectMake(292.0/1024.0, 145.0/645.0, 355.0/1024.0, 65.0/645.0);
     //OR
     CGRectMake(0.28, 0.22, 0.33, 0.1);
    • a parser group with the same name/uniqueId as the current PPDecodingInfo is looked up; if it is found, the optimal OCR settings for all parsers in that parser group are calculated
    • OCR is performed on the dewarped image using those optimal OCR settings
    • finally, the OCR result is parsed with each parser from that parser group
    • if no parser group with the same name as the current PPDecodingInfo can be found, no OCR is performed; however, the image is still reported via didOutputMetadata: if receiving of DEWARPED images has been enabled
  • if the documentClassifier property has not been set, recognition is finished. If a PPDocumentClassifier exists, its classifyDocumentFromResult: method is called to determine which type of document has been detected
  • if the classifier returns a string equal to one previously used to set up parser decoding infos, the corresponding array of PPDecodingInfo objects is obtained and the second step above (processing each PPDecodingInfo) is performed again with that array.
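
The relative coordinates used for a PPDecodingInfo location can be derived from pixel measurements taken on a reference image of the document. Below is a minimal sketch in plain C (which also compiles as Objective-C) of that arithmetic. The NormalizedRect struct and the normalized_rect function are hypothetical helpers introduced only for illustration; they are not part of the SDK, and in an app you would pass the resulting values to CGRectMake.

```c
#include <assert.h>

/* Hypothetical stand-in for CGRect: origin and size expressed as
   fractions of the reference image's width and height. */
typedef struct {
    double x;
    double y;
    double width;
    double height;
} NormalizedRect;

/* Convert a rectangle measured in pixels on a reference image into
   relative coordinates by dividing by the image dimensions. */
NormalizedRect normalized_rect(double px, double py, double pw, double ph,
                               double image_width, double image_height) {
    /* Dimensions must be positive, otherwise the division is meaningless. */
    assert(image_width > 0.0 && image_height > 0.0);

    NormalizedRect r;
    r.x = px / image_width;
    r.y = py / image_height;
    r.width = pw / image_width;
    r.height = ph / image_height;
    return r;
}
```

For the surname field measured on the 1024 x 645 pixel image, normalized_rect(292.0, 145.0, 355.0, 65.0, 1024.0, 645.0) gives roughly (0.285, 0.225, 0.347, 0.101). Keeping the pixel measurements and dividing in code, as in the first CGRectMake variant, avoids the rounding that hand-written literals introduce.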

When to use DocumentClassifier?

If you plan to scan several different documents of the same size, for example different ID cards, which are all 85x54 mm (credit card size), then you need to use a PPDocumentClassifier to classify the type of document so that the correct PPDecodingInfo array can be used for obtaining the relevant information. An example would be the case where you need to scan the front sides of both Croatian and German ID cards: the locations of the first and last names are not the same on the two documents. Therefore, you first need to classify the document based on some discriminative features.

If you plan to support only a single document type (e.g. the national ID of a single country), then you do not need to use PPDocumentClassifier.

How to implement DocumentClassifier?

PPDocumentClassifier is a protocol that should be implemented to support classification of documents that cannot be differentiated by the detector. The classification result is used to determine which set of decoding infos will be used to extract classification-specific data. The following method has to be implemented:

- (NSString *)classifyDocumentFromResult:(PPTemplatingRecognizerResult *)result;

This method classifies the document based on the PPTemplatingRecognizerResult (the superclass of PPBlinkOcrRecognizerResult), which contains the data extracted from the decoding infos inherent to the detector. For each document type that you want to support, the returned string has to be equal to the name/uniqueId of the corresponding set of PPDecodingInfo objects defined for that document type. Named decoding info sets should be defined using the following method of the PPTemplatingRecognizerSettings superclass:

- (void)setDecodingInfoSet:(NSArray<PPDecodingInfo*> *)decodingInfos forClassifierResult:(NSString *)classifierResult;
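
The naming contract between the classifier and the registered decoding info sets can be illustrated with a small lookup. The following is a sketch in plain C (which also compiles as Objective-C), not the SDK's implementation: the DecodingInfoSet struct, the find_decoding_info_set function and the example names are hypothetical, and exact, case-sensitive comparison as well as the handling of a missing result are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one named set of decoding infos. */
typedef struct {
    const char *classifier_result;  /* name the set was registered under */
    size_t decoding_info_count;     /* placeholder for the PPDecodingInfo array */
} DecodingInfoSet;

/* Return the set registered under exactly `classifier_result`,
   or NULL when no registered name matches (assumed behavior). */
const DecodingInfoSet *find_decoding_info_set(const DecodingInfoSet *sets,
                                              size_t count,
                                              const char *classifier_result) {
    /* A NULL table is only acceptable when it is declared empty. */
    assert(sets != NULL || count == 0);

    if (classifier_result == NULL) {
        return NULL;  /* the classifier produced no result */
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(sets[i].classifier_result, classifier_result) == 0) {
            return &sets[i];
        }
    }
    return NULL;
}
```

With sets registered under the hypothetical names "croatianId" and "germanId", a classifier returning "germanId" selects the second set, while "GermanId" selects nothing in this sketch. Defining each name once as a constant and using it both when registering the set and when returning from the classifier is one way to keep the two in sync.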

Additional tips

Fine-tuning the exact location of each PPDecodingInfo can be hard, but outputting each image that is sent to our OCR engine can help. To enable this feature, the debugOcrInputFrame property of the debug metadata settings needs to be set to YES before initializing your PPCoordinator. The sample code below shows how this can be done:

/** 1. Initialize the scanning settings */

// Initialize the scanner settings object. This initializes the settings with all default values.
PPSettings *settings = [[PPSettings alloc] init];

// Enable output of each image that is sent to the OCR engine
settings.metadataSettings.debugMetadata.debugOcrInputFrame = YES;

/** 2. Set up the license key */
// Add your license key here, like in our sample applications

/** 3. Set up what is being scanned. See detailed guides for specific use cases. */
// Add your recognizers here, like in our sample applications

/** 4. Initialize the Scanning Coordinator object */
PPCameraCoordinator *coordinator = [[PPCameraCoordinator alloc] initWithSettings:settings];

The images are then delivered to the scanningViewController:didOutputMetadata: callback of your PPScanningDelegate. The sample code below demonstrates how to fetch these images:

- (void)scanningViewController:(UIViewController<PPScanningViewController> *)scanningViewController
             didOutputMetadata:(PPMetadata *)metadata {
    if ([metadata isKindOfClass:[PPImageMetadata class]]) {
        PPImageMetadata *imageMetadata = (PPImageMetadata *)metadata;
        // Fetch the image
        UIImage *ocrInputImage = imageMetadata.image;
    }
}

How to obtain recognition results?

The same principles apply here as when using the PPBlinkOcrRecognizer in segment scan mode (a guide is available here). Use the approach described in Obtaining results from BlinkOCR recognizer, and make sure that the parser group names are equal to the decoding info names. The Templating-sample app is available on GitHub as a detailed example.
