The SDK for Jetpac's iOS, Android, Linux, and OS X Deep Belief image recognition framework.
This is a framework implementing the convolutional neural network architecture described by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. The processing code has been highly optimized to run within the memory and processing constraints of modern mobile devices, and can analyze an image in under 300ms on an iPhone 5S. It's also easy to use together with OpenCV.
We're releasing this framework because we're excited by the power of this approach for general image recognition, especially when it can run locally on low-power devices. It gives your phone the ability to see, and I can't wait to see what applications that helps you build.
- iOS
- Android
- Linux
- OS X
- Raspberry Pi 1
- Raspberry Pi 2
- Raspberry Pi 3
- Beaglebone Black
- Javascript
- Building from Source
You'll need the usual tools required for developing iOS applications - XCode 5, an OS X machine and a modern iOS device (it's been tested as far back as the original iPhone 4). Open up the SimpleExample/SimpleExample.xcodeproj, build and run.
You should see some warnings (the example is based on Apple sample code which has some anachronisms in it unfortunately), then once it's running a live camera stream should be visible on your phone. Move it to look closely at your keyboard, and some tags should start appearing in the top left of the screen. These should include things that look like keyboards, including calculators, remote controls, and even typewriters!
You should experiment with other objects like coffee cups, doors, televisions, and even dogs if you have any handy! The results will not be human quality, but the important part is that they're capturing meaningful attributes of the images. Understanding images with no context is extremely hard, and while this approach is a massive step forward compared to the previous state of the art, you'll still need to adapt it to the domain you're working in to get the best results in a real application.
Happily the framework includes the ability to retrain the network for custom objects that you care about. If you have logos you need to pick out, machine parts you need to spot, or just want to be able to distinguish between different kinds of scenes like offices, beaches, mountains or forests, you should look at the LearningExample sample code. It builds a custom layer on top of the basic neural network that responds to images you've trained it on, and allows you to embed the functionality in your own application easily.
There's also this full how-to guide on training and embedding your own custom object recognition code.
To use the library in your own application:
- Add the following frameworks to the Link Binary with Libraries in your XCode project's Build Phases:
DeepBelief.framework
Accelerate.framework
libc++.dylib
- Add #import <DeepBelief/DeepBelief.h> to the top of the file you want to use the code in.
You should then be able to use code like this to classify a single image that you've included as a resource in your bundle. The code assumes it's called 'dog.jpg', but you should change it to match the name of your file.
```objc
NSString* networkPath = [[NSBundle mainBundle] pathForResource:@"jetpac" ofType:@"ntwk"];
if (networkPath == nil) {
  fprintf(stderr, "Couldn't find the neural network parameters file - did you add it as a resource to your application?\n");
  assert(false);
}
void* network = jpcnn_create_network([networkPath UTF8String]);
assert(network != NULL);

NSString* imagePath = [[NSBundle mainBundle] pathForResource:@"dog" ofType:@"jpg"];
void* inputImage = jpcnn_create_image_buffer_from_file([imagePath UTF8String]);

float* predictions;
int predictionsLength;
char** predictionsLabels;
int predictionsLabelsLength;
jpcnn_classify_image(network, inputImage, 0, 0, &predictions, &predictionsLength, &predictionsLabels, &predictionsLabelsLength);

// The image buffer isn't needed once it has been classified
jpcnn_destroy_image_buffer(inputImage);

for (int index = 0; index < predictionsLength; index += 1) {
  const float predictionValue = predictions[index];
  // The modulo keeps the label lookup in range if there are more values than labels
  char* label = predictionsLabels[index % predictionsLabelsLength];
  NSString* predictionLine = [NSString stringWithFormat: @"%s - %0.2f\n", label, predictionValue];
  NSLog(@"%@", predictionLine);
}

jpcnn_destroy_network(network);
```
I've been using Google's ADT toolchain. To get started, import AndroidExample into their custom version of Eclipse, then build and run it. Hopefully you should see a similar result to the iPhone app, with live video and tags displayed. You'll need to hold the phone in landscape orientation; look for the tag text and use that as your guide.
The Android implementation uses NEON SIMD instructions, so it may not work on older phones, and will definitely not work on non-ARM devices. As a benchmark for expected performance, classification takes around 650ms on a Samsung Galaxy S5.
Under the hood the Android implementation uses a native C++ library that's linked to Java applications using JNA. That means the process of including the code is a bit more complex than on iOS. If you look at the AndroidExample sample code, you'll see a 'libs' folder. This contains a deepbelief.jar file that has the Java interface to the underlying native code, and inside its armeabi folder there's jnidispatch.so, which is part of JNA and handles the mechanics of calling native functions, and libjpcnn.so, which implements the actual object recognition algorithm. You'll need to replicate this folder structure and copy the files to your own application's source tree.
Once you've done that, you should be able to import the Java interface to the library:
```java
import com.jetpac.deepbelief.DeepBelief.JPCNNLibrary;
```
This class contains a list of Java functions that correspond exactly to the C interface functions. The class code is available in the AndroidLibrary folder, and you should be able to rebuild it yourself by running ant, but here are the definitions using JNA types:
```java
Pointer jpcnn_create_network(String filename);
void jpcnn_destroy_network(Pointer networkHandle);
Pointer jpcnn_create_image_buffer_from_file(String filename);
void jpcnn_destroy_image_buffer(Pointer imageHandle);
Pointer jpcnn_create_image_buffer_from_uint8_data(byte[] pixelData, int width, int height, int channels, int rowBytes, int reverseOrder, int doRotate);
void jpcnn_classify_image(Pointer networkHandle, Pointer inputHandle, int doMultiSample, int layerOffset, PointerByReference outPredictionsValues, IntByReference outPredictionsLength, PointerByReference outPredictionsNames, IntByReference outPredictionsNamesLength);
void jpcnn_print_network(Pointer networkHandle);
Pointer jpcnn_create_trainer();
void jpcnn_destroy_trainer(Pointer trainerHandle);
void jpcnn_train(Pointer trainerHandle, float expectedLabel, float[] predictions, int predictionsLength);
Pointer jpcnn_create_predictor_from_trainer(Pointer trainerHandle);
void jpcnn_destroy_predictor(Pointer predictorHandle);
int jpcnn_save_predictor(String filename, Pointer predictorHandle);
Pointer jpcnn_load_predictor(String filename);
float jpcnn_predict(Pointer predictorHandle, Pointer predictions, int predictionsLength);
```
There are a few quirks to using the interface that the example code demonstrates how to work around.
jpcnn_create_network() requires a standard filename path, but to distribute the network with an application it needs to be an asset, and because that may be compressed and part of an archive, there's no way to get a path to it. To fix that, initDeepBelief() copies the file to the application's data directory:
```java
AssetManager am = ctx.getAssets();
String baseFileName = "jetpac.ntwk";
String dataDir = ctx.getFilesDir().getAbsolutePath();
String networkFile = dataDir + "/" + baseFileName;
// Assets have no usable file path, so copy the network somewhere that does
copyAsset(am, baseFileName, networkFile);
networkHandle = JPCNNLibrary.INSTANCE.jpcnn_create_network(networkFile);
```
This obviously adds some overhead, so one optimization would be to check whether the file already exists and only copy it over if it doesn't.
jpcnn_create_image_buffer_from_uint8_data() needs a plain byte array, and the classifyBitmap() function shows how you can extract what you need from a normal Bitmap object:
```java
final int width = bitmap.getWidth();
final int height = bitmap.getHeight();
final int pixelCount = (width * height);
// Assumes an ARGB_8888 bitmap, which stores 4 bytes per pixel
final int bytesPerPixel = 4;
final int byteCount = (pixelCount * bytesPerPixel);
ByteBuffer buffer = ByteBuffer.allocate(byteCount);
bitmap.copyPixelsToBuffer(buffer);
byte[] pixels = buffer.array();
// channels = 4, rowBytes = 4 * width, reverseOrder = 0, doRotate = 0
Pointer imageHandle = JPCNNLibrary.INSTANCE.jpcnn_create_image_buffer_from_uint8_data(pixels, width, height, bytesPerPixel, (bytesPerPixel * width), 0, 0);
```
Native objects are not garbage-collected, so you'll have to remember to explicitly call jpcnn_destroy_image_buffer() and the other destroy functions on objects you've created through the library if you want to avoid memory leaks.
The rest of classifyBitmap() also demonstrates how to pull out the results as Java-accessible arrays from the JNA types.
I've been using Ubuntu 12.04 and 14.04 on x86-64 platforms, but the library ships as a simple .so with minimal dependencies, so hopefully it should work on most distros.
As long as you have the git and build-essential packages installed, you should be able to build an example by running the following commands in a terminal:
```bash
git clone https://github.com/jetpacapp/DeepBeliefSDK.git
cd DeepBeliefSDK/LinuxLibrary
sudo ./install.sh
cd ../examples/SimpleLinux/
make
./deepbelief
```
If the example program ran successfully, the output should look like this:
```
0.016994 wool
0.016418 cardigan
0.010924 kimono
0.010713 miniskirt
0.014307 crayfish
0.015663 brassiere
0.014216 harp
0.017052 sandal
0.024082 holster
0.013580 velvet
0.057286 bonnet
0.018848 stole
0.028298 maillot
0.010915 gown
0.073035 wig
0.012413 hand blower
0.031052 stage
0.027875 umbrella
0.012592 sarong
```
It's analyzing the default Lena image, and giving low probabilities of a wig and a bonnet, which isn't too crazy. You can pass in a command-line argument to analyze your own images, and the results are tab-separated text, so you can pipe them into other programs for further processing.
To use the library in your own application, first make sure you've run the install.sh script in LinuxLibrary/ to install libjpcnn.so in /usr/lib and libjpcnn.h in /usr/include, as described in Getting Started on Linux.
Then you should be able to access all the API functions by including the libjpcnn.h header:
```c
#include <libjpcnn.h>
```
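Here's how you would run a basic classification of a single image, adapted from the SimpleLinux example (NETWORK_FILE_NAME and imageFileName come from the surrounding example code):
```c
void* networkHandle = jpcnn_create_network(NETWORK_FILE_NAME);
void* imageHandle = jpcnn_create_image_buffer_from_file(imageFileName);

float* predictions;
int predictionsLength;
char** predictionsLabels;
int predictionsLabelsLength;
jpcnn_classify_image(networkHandle, imageHandle, 0, 0, &predictions, &predictionsLength, &predictionsLabels, &predictionsLabelsLength);

for (int index = 0; index < predictionsLength; index += 1) {
  const float predictionValue = predictions[index];
  // Skip labels with very low confidence to keep the output short
  if (predictionValue < 0.01f) {
    continue;
  }
  const char* label = predictionsLabels[index];
  // Tab-separated "probability<TAB>label" lines, easy to sort and pipe
  fprintf(stdout, "%f\t%s\n", predictionValue, label);
}

// Free the native objects once you're done with them
jpcnn_destroy_image_buffer(imageHandle);
jpcnn_destroy_network(networkHandle);
```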
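The SimpleLinux output above comes out in label order rather than sorted by confidence. Piping it through sort works, but if you'd rather rank results inside your own program, here's a minimal sketch in plain C. RankedPrediction and top_prediction_indices are hypothetical helpers, not part of libjpcnn; you'd pass in the predictions and predictionsLength values returned by jpcnn_classify_image().

```c
#include <assert.h>
#include <stdlib.h>

/* One (value, index) pair per prediction, so sorting keeps track of labels. */
typedef struct {
  float value;
  int index;
} RankedPrediction;

/* qsort comparator: higher prediction values sort first. */
static int compare_ranked_desc(const void* a, const void* b) {
  const float va = ((const RankedPrediction*)a)->value;
  const float vb = ((const RankedPrediction*)b)->value;
  if (va > vb) return -1;
  if (va < vb) return 1;
  return 0;
}

/* Writes the indices of the (at most) maxResults highest predictions into
   outIndices, highest first. Returns how many indices were written. */
static int top_prediction_indices(const float* predictions, int predictionsLength,
                                  int maxResults, int* outIndices) {
  assert(outIndices != NULL);
  if ((predictionsLength <= 0) || (maxResults <= 0)) {
    return 0;
  }
  RankedPrediction* ranked = malloc(sizeof(RankedPrediction) * (size_t)predictionsLength);
  if (ranked == NULL) {
    return 0;
  }
  for (int i = 0; i < predictionsLength; i += 1) {
    ranked[i].value = predictions[i];
    ranked[i].index = i;
  }
  qsort(ranked, (size_t)predictionsLength, sizeof(RankedPrediction), compare_ranked_desc);
  const int count = (maxResults < predictionsLength) ? maxResults : predictionsLength;
  for (int i = 0; i < count; i += 1) {
    outIndices[i] = ranked[i].index;
  }
  free(ranked);
  return count;
}
```

You'd then print predictionsLabels[outIndices[i]] next to predictions[outIndices[i]] for each returned index.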
Load the examples/OSXExample/MyRecorder.xcodeproj XCode project, build, and run. On any machine with a webcam, you should see a window appear showing live video. Move the webcam until it has a clear view of an object like a wine bottle, glass, mug, or a computer keyboard, and you should start to see overlaid labels and percentages.
The DeepBelief.framework you'll need is in the OSXLibrary folder. Since installing frameworks in a shared location can be a pain, and Apple recommends keeping applications as self-contained as possible, it's designed to be bundled inside your app folder. The OS X example sample code uses this approach, and is a good starting point for understanding the process. It has a symbolic link back to the framework, but you'll probably want to copy the library into your own source tree. Apple's documentation on bundling private frameworks is the best guide to the whole process, but here's a summary of what you'll need to do:
- Copy DeepBelief.framework into your source tree.
- Drag it into the Frameworks folder of your project in the XCode navigator.
- Add it to the "Link Binary with Libraries" build phase in the project settings.
- Add a new "Copy Files Build Phase" to the project build phases.
- Add the framework as a new file in that build phase, with the destination as "Frameworks".
Once you've done that, you should be able to build your app, and then "Show package contents" on the built product should show DeepBelief.framework inside the Contents/Frameworks folder.
At that point, just add #import <DeepBelief/DeepBelief.h> and all of the code you need should be identical to the snippets shown in the iOS guide.
It's pretty straightforward to use DeepBelief together with OpenCV; you just need to convert the images over. There's sample code showing the whole process, but the heart of it is this image format conversion:
```cpp
const cv::Size size = image.size();
const int width = size.width;
const int height = size.height;
// Assumes an 8-bit, 3-channel image (CV_8UC3), which is what cv::imread() loads by default
const int bytesPerPixel = 3;
// OpenCV images are BGR, we need RGB, so do a conversion to a temporary image
cv::Mat rgbImage;
cv::cvtColor(image, rgbImage, cv::COLOR_BGR2RGB);
uint8_t* rgbPixels = (uint8_t*)rgbImage.data;
// cvtColor allocates rgbImage fresh, so its rows are tightly packed (bytesPerPixel * width)
imageHandle = jpcnn_create_image_buffer_from_uint8_data(rgbPixels, width, height, bytesPerPixel, (bytesPerPixel * width), 0, 0);
```
Once you've done that, you can run the image classification and prediction as normal on the image handle.
The sample code has some other convenience classes too, to help make using the library in C++ a bit easier.
If you're using the Java interface, the same sort of call sequence works to handle the conversion, though you'll need byte[] arrays and you'll have to call image.get(0, 0, pixels) to actually get the raw image data you need.
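If your frames come from somewhere other than OpenCV but are still in BGR order, the same channel swap is easy to do by hand. This is a minimal sketch in plain C, assuming interleaved 8-bit, 3-channel pixels; the rowBytes parameter lets it skip any padding at the end of each row. swap_bgr_to_rgb_in_place is a hypothetical helper, not part of libjpcnn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swaps the first and third byte of every 3-byte pixel, turning BGR into RGB
   (or back again). rowBytes may be larger than 3 * width when rows are padded;
   padding bytes are left untouched. */
static void swap_bgr_to_rgb_in_place(uint8_t* pixels, int width, int height, int rowBytes) {
  assert(pixels != NULL);
  assert(rowBytes >= (3 * width));
  for (int y = 0; y < height; y += 1) {
    uint8_t* row = pixels + ((size_t)y * (size_t)rowBytes);
    for (int x = 0; x < width; x += 1) {
      uint8_t* pixel = row + (3 * x);
      const uint8_t blue = pixel[0];
      pixel[0] = pixel[2];
      pixel[2] = blue;
    }
  }
}
```

Afterwards you'd pass the same pixels, width, height, 3, and rowBytes to jpcnn_create_image_buffer_from_uint8_data().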
The library is available as a Raspbian .so library in the RaspberryPiLibrary folder.
Using it is very similar to ordinary Linux, and you can follow most of the same instructions, running the install.sh script in the RaspberryPiLibrary folder instead.
The biggest difference is that the Pi library uses the GPU to handle a lot of the calculations, so you need to run the example program as a super user, e.g. sudo ./deepbelief.
This optimization allows an image to be recognized on a stock Pi in around five seconds, and in three seconds with a boosted GPU clock rate.
There's no pre-built library for the Pi 2, and the GPU version that's fastest on the Pi 1 doesn't work, so you can't just re-use the older library. The good news is that the CPU has improved so much that you can get better performance by compiling from source with the optimized Eigen open-source library. Here are the instructions:
```bash
mkdir ~/projects
cd ~/projects
# Clone this repository into ~/projects/DeepBeliefSDK
git clone https://github.com/jetpacapp/DeepBeliefSDK.git
sudo apt-get install -y mercurial
hg clone https://bitbucket.org/eigen/eigen
ln -s ~/projects/eigen ~/projects/DeepBeliefSDK/eigen
cd ~/projects/DeepBeliefSDK/source
make clean
sudo apt-get install gcc-4.8 g++-4.8
sudo rm -rf /usr/bin/gcc
sudo rm -rf /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.8 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.8 /usr/bin/g++
make GEMM=eigen TARGET=pi2
./jpcnn -i data/dog.jpg -n ../networks/jetpac.ntwk -t -m s -d
```
You should see the classification results, with a time of around 3.8 seconds on a stock Pi 2. If you then overclock it with raspi-config, you can reduce that to 3.2 seconds.
There's no pre-built library for the Pi 3 either, and as with the Pi 2, the GPU version that's fastest on the Pi 1 doesn't work, so you can't just re-use the older library. Instead, build from source as described in Building from Source. You should see the classification results, with a time of around 2.5 seconds on a stock Pi 3. The Raspberry Pi 3 Model B gives a much better time thanks to its 64-bit, quad-core Broadcom BCM2837 ARMv8 processor running at 1.2GHz, along with its 1GB of RAM.
Like the Pi 2, there’s no pre-built library for the Beaglebone Black, but you can build it yourself using the ARM-optimized Eigen open-source library. You will need the latest development version of Eigen, though, to make sure NEON is enabled with the default gcc 4.6 compiler (see this patch).
Here are the instructions:
```bash
mkdir ~/projects
cd ~/projects
# Clone this repository into ~/projects/DeepBeliefSDK
git clone https://github.com/jetpacapp/DeepBeliefSDK.git
sudo apt-get install -y mercurial
hg clone https://bitbucket.org/eigen/eigen
ln -s ~/projects/eigen ~/projects/DeepBeliefSDK/eigen
cd ~/projects/DeepBeliefSDK/source
make clean
make GEMM=eigen TARGET=beagle
./jpcnn -i data/dog.jpg -n ../networks/jetpac.ntwk -t -m s -d
```
The Javascript version of the library includes complete source, and a browser demo page.
The interface is similar to the C version, but uses native Javascript image objects, camelCase function names, and classes.
You'll need to include the jpcnn.js file, and then load the networks/jetpac_untransposed.ntwk file (which is a slightly-modified version of the standard Jetpac network).
Then you should be able to call Network.classifyImage(), with an option to accelerate the calculations using WebGL if you're in a browser that supports it.
On my 2012 MacBook Pro in Chrome, the WebGL version takes around 600ms, whereas the naive CPU path takes 5 seconds.
If you’re on a platform that isn’t covered here, you can compile your own version of the library from the code and Makefile inside the source directory. It’s designed to have no dependencies by default, using plain, portable C++, and it’s possible to get it running on almost any device that has a compiler and the standard C libraries. Here are the minimal instructions:
```bash
mkdir ~/projects
cd ~/projects
git clone https://github.com/jetpacapp/DeepBeliefSDK.git
cd ~/projects/DeepBeliefSDK/source
make clean
make
./jpcnn -i data/dog.jpg -n ../networks/jetpac.ntwk -t -m s -d
```
There are two arguments you can pass to the Makefile to control compilation. PLATFORM (used as make PLATFORM=foo) controls settings for specific devices, for example targeting particular CPUs in gcc. GEMM chooses which implementation of matrix multiplication to use, since that takes the bulk of the execution time, so you can swap in something like Eigen or Intel’s MKL on supported platforms.
All of the sample code projects are included in the 'examples' folder in this git repository.
This is a self-contained iOS application that shows you how to load the neural network parameters, and process live video to estimate the probability that one of the 1,000 pre-defined Imagenet objects is present.
The code is largely based on the SquareCam Apple sample application, which is fairly old and contains some ugly code.
If you look for jpcnn_* calls in SquareCamViewController.m, you should be able to follow the sequence of first loading the network, applying it to video frames as they arrive, and destroying the objects once you're all done.
This application allows you to apply the image recognition code to custom objects you care about. It demonstrates how to capture positive and negative examples, feed them into a trainer to create a prediction model, and then apply that prediction model to the live camera feed.
It can be a bit messy thanks to all the live video feed code, but if you look for jpcnn_* you'll be able to spot the main flow. Once a prediction model has been fully trained, the parameters are written to the XCode console so they can be used as pre-trained predictors.
This shows how you can use a custom prediction model that you've built using the LearningExample sample code. I've included the simple 'wine_bottle_predictor.txt' that I quickly trained on a bottle of wine; you should be able to run it yourself and see that model's predictions on your own images.
A basic Android application that applies the classification algorithm to live video from the phone's camera. The first thing it does after initialization is analyze the standard Lena test image, so you should see log output from that first. After that it continuously analyzes incoming camera frames, both displaying the found labels on screen and printing them to the console.
This is a small command line tool that shows how you can load a network file and classify an image using the default Imagenet categories. If you run it with no arguments, it looks for lena.png and analyzes that, otherwise it tries to load the file name in the first argument as its input image.
The network file name is hardcoded to "jetpac.ntwk" in the current folder. In a real application you'll want to set that yourself, either hard-coding it to a known absolute location for the file, or passing it in dynamically as an argument or environment variable.
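As a starting point for that, here's a minimal sketch in plain C of picking the network path from an explicit value (such as a command-line argument), then an environment variable, then the hardcoded default. The DEEPBELIEF_NETWORK variable name and the resolve_network_path helper are hypothetical, not part of the library or the SimpleLinux example.

```c
#include <stdlib.h>

/* Hypothetical environment variable name: choose whatever suits your deployment. */
static const char kNetworkPathEnvVar[] = "DEEPBELIEF_NETWORK";
/* SimpleLinux's hardcoded default: jetpac.ntwk in the current folder. */
static const char kDefaultNetworkPath[] = "jetpac.ntwk";

/* Picks the network file to hand to jpcnn_create_network(): an explicit value
   wins, then the environment variable, then the default. Empty strings are
   treated as "not set". */
static const char* resolve_network_path(const char* explicitPath) {
  if ((explicitPath != NULL) && (explicitPath[0] != '\0')) {
    return explicitPath;
  }
  const char* fromEnvironment = getenv(kNetworkPathEnvVar);
  if ((fromEnvironment != NULL) && (fromEnvironment[0] != '\0')) {
    return fromEnvironment;
  }
  return kDefaultNetworkPath;
}
```

For example, a tool that kept argv[1] for the image could take an optional second argument with resolve_network_path((argc > 2) ? argv[2] : NULL).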
The output of the tool is tab-separated lines, with the probability first followed by the imagenet label, so you can sort and process it easily through pipes on the command line.
This project is based on Apple's MyRecorder sample code, which is both quite old and fairly gnarly thanks to its use of QTKit! The complexity is mostly in the way it accesses the webcam and converts the supplied image down to a simple array of RGB bytes to feed into the neural network code. If you search for 'jpcnn' in the code, you'll see the calls to the library nestled amongst all the plumbing for the interface and the video; they should be fairly straightforward.
The main steps are loading the 'jetpac.ntwk' neural network, which is included as a resource in the app, then extracting an image from the video, classifying it, and displaying the found labels in the UI. When you build and run the project, you should see a window appear with the webcam view in it, and any found labels overlaid on top. You'll also see some performance stats being output to the console - on my mid-2012 MacBook Pro it takes around 60ms to do the calculations.
This is a basic Linux command-line tool that shows how OpenCV and the DeepBelief framework can work together. The main() function uses C++ classes defined in deepbeliefopencv.h to load a network, then it creates an OpenCV image from either lena.png or another file supplied on the command line. A wrapper class for the library's image handle object is then used to convert the OpenCV image into one the DeepBelief framework can analyze. The classification is run on that image, and the found labels are printed out.
If you're doing a lot of work with OpenCV, the most crucial part for you is probably the conversion of the image objects between the two systems.
That's defined in deepbeliefopencv.cpp in the Image::Image(const cv::Mat& image) constructor, and the section on using OpenCV covers what's going on in the actual code.
There are currently three pre-built models available in the networks folder. jetpac.ntwk is the in-house model used here at Jetpac, and it's licensed under the same BSD conditions as the rest of the project. It has a few oddities, like only 999 labels (a file truncation problem I discovered too late during training) but has served us well and is a good place to start.
The excellent libccv project also made a couple of networks available under a Creative Commons Attribution 4.0 International License. I've converted them into a binary format, and they're in the networks folder as ccv2010.ntwk and ccv2012.ntwk. You should be able to substitute these anywhere you'd use jetpac.ntwk. The 2012 file has very similar labels to our original, and the 2010 one uses an older architecture. You may notice slightly slower performance because the arrangement of the layers is a bit different (in technical terms, the local-response normalization happens before the max-pooling in these models, which is more expensive since there's more data to normalize), but the accuracy of the 2012 model especially is good. A common technique in the academic world is to merge the votes of multiple models for higher accuracy, so having several models available also gives you a way to improve your results.
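Merging votes can be as simple as averaging each label's score across models. Because label lists can differ between networks (jetpac.ntwk has 999 labels, and ccv2012.ntwk's are only "very similar"), this minimal sketch matches labels by name rather than by position. merge_votes_by_label is a hypothetical helper, plain averaging is just one way of combining votes, and the nested loop compares every pair of labels, so for frequent merges you'd probably want a hash map instead.

```c
#include <assert.h>
#include <string.h>

/* Averages two networks' predictions label by label. For each of model A's
   labels, looks for the same label name in model B; if found, writes the mean
   of the two values, otherwise keeps model A's value. outValues needs room
   for lengthA floats. */
static void merge_votes_by_label(const float* valuesA, char** labelsA, int lengthA,
                                 const float* valuesB, char** labelsB, int lengthB,
                                 float* outValues) {
  assert((lengthA >= 0) && (lengthB >= 0));
  for (int i = 0; i < lengthA; i += 1) {
    outValues[i] = valuesA[i];
    for (int j = 0; j < lengthB; j += 1) {
      if (strcmp(labelsA[i], labelsB[j]) == 0) {
        outValues[i] = 0.5f * (valuesA[i] + valuesB[j]);
        break;
      }
    }
  }
}
```

The values and labels for each model would be the outPredictionsValues and outPredictionsNames results from running jpcnn_classify_image() with that model's network handle.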
Because we reuse the same code across a lot of different platforms, we use a plain-old C interface to our library. All of the handles to different objects are opaque pointers, and you have to explicitly call the *_destroy_* function on any handles that have been returned from *_create_* calls if you want to avoid memory leaks. Input images are created from raw arrays of 8-bit RGB data; you can see how to build those from iOS types by searching for jpcnn_create_image_buffer_from_uint8_data() in the sample code.
The API is broken up into two sections. The first gives you access to one of the pre-trained neural networks you'll find in the networks folder. These have been trained on 1,000 Imagenet categories, and the output will give you a decent general idea of what's in an image.
The second section lets you replace the highest layer of the neural network with your own classification step. This means you can use it to recognize the objects you care about more accurately.
- jpcnn_create_network
- jpcnn_destroy_network
- jpcnn_create_image_buffer_from_file
- jpcnn_create_image_buffer_from_uint8_data
- jpcnn_destroy_image_buffer
- jpcnn_classify_image
- jpcnn_print_network
- jpcnn_create_trainer
- jpcnn_destroy_trainer
- jpcnn_train
- jpcnn_create_predictor_from_trainer
- jpcnn_destroy_predictor
- jpcnn_load_predictor
- jpcnn_print_predictor
- jpcnn_predict
void* jpcnn_create_network(const char* filename)
This takes the filename of the network parameter file as an input, and builds a neural network stack based on that definition. The default file is the 1,000-category jetpac.ntwk, built here at Jetpac based on the approach Krizhevsky used to win the Imagenet 2012 competition.
You'll need to make sure you include this 60MB file in the 'Copy Files' build phase of your application, and then call something like this to get the actual path:
```objc
NSString* networkPath = [[NSBundle mainBundle] pathForResource:@"jetpac" ofType:@"ntwk"];
void* network = jpcnn_create_network([networkPath UTF8String]);
```
void jpcnn_destroy_network(void* networkHandle)
Once you're finished with the neural network, call this to destroy it and free up the memory it used.
void* jpcnn_create_image_buffer_from_file(const char* filename)
Takes a filename (see above for how to get one from your bundle) and creates an image object that you can run the classification process on. It can load PNGs and JPEGs.
void* jpcnn_create_image_buffer_from_uint8_data(unsigned char* pixelData, int width, int height, int channels, int rowBytes, int reverseOrder, int doRotate)
If you already have data in memory, you can use this function to copy it into an image object that you can then classify. It's useful if you're doing video capture, as the sample code does.
void jpcnn_destroy_image_buffer(void* imageHandle)
Once you're done classifying an image, call this to free up the memory it used.
void jpcnn_classify_image(void* networkHandle, void* inputHandle, unsigned int flags, int layerOffset, float** outPredictionsValues, int* outPredictionsLength, char*** outPredictionsNames, int* outPredictionsNamesLength)
This is how you actually get tags for an image. It takes in a neural network and an image, and returns an array of floats. Each float is a predicted value for an imagenet label, between 0 and 1, where higher numbers are more confident predictions.
The four outputs are:
- outPredictionsValues is a pointer to the array of predictions.
- outPredictionsLength holds the length of the predictions array.
- outPredictionsNames is an array of C strings representing imagenet labels, each corresponding to the prediction value at the same index in outPredictionsValues.
- outPredictionsNamesLength is the number of name strings in the label array. In the simple case this is the same as the number of predictions, but in different modes this can get more complicated! See below for details.
In the simple case you can leave the flags and layerOffset arguments as zero, and you'll get an array of prediction values out. Pick the highest (possibly with a threshold like 0.1 to avoid shaky ones), and you can use that as a simple tag for the image.
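That "pick the highest, with a threshold" step is only a few lines of C. This is a minimal sketch; best_prediction_index is a hypothetical helper, not part of the library, and 0.1 is just the example threshold mentioned above.

```c
#include <assert.h>

/* Returns the index of the highest prediction, or -1 if the array is empty or
   the highest value is below minConfidence (for example 0.1f). */
static int best_prediction_index(const float* predictions, int predictionsLength,
                                 float minConfidence) {
  assert(predictionsLength >= 0);
  int bestIndex = -1;
  for (int index = 0; index < predictionsLength; index += 1) {
    if ((bestIndex < 0) || (predictions[index] > predictions[bestIndex])) {
      bestIndex = index;
    }
  }
  if ((bestIndex < 0) || (predictions[bestIndex] < minConfidence)) {
    return -1;
  }
  return bestIndex;
}
```

On the Lena results from the Linux example, where the strongest value is wig at about 0.073, a 0.1 threshold returns -1, meaning no confident tag, which is usually the right answer for an ambiguous image.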
There are a couple of arguments you can tune to improve your results, though.
The final output of the neural network represents the high-level categories that it's been trained on, but often you'll want to work with other types of objects. The good news is that it's possible to take the results from layers that are just before the final one, and use those as inputs to simple statistical algorithms to recognize entirely new kinds of things. This paper on Decaf does a good job of describing the approach, but the short version is that those high-level layers can be seen as adjectives that help the output layer make its final choice between categories, and those same adjectives turn out to be useful for choosing between a lot of other categories it hasn't been trained on too. For example, there might be some signals that correlate with 'spottiness' and 'furriness', which would be useful for picking out leopards, even if they were originally learned from pictures of dalmatians.
The layerOffset argument lets you control which layer you're sampling, as a negative offset from the end of the network.
Try setting it to -2, and you should get an array of 4096 floats in outPredictionsValues, though since these no longer represent Imagenet labels, the names array will no longer be valid.
You can then feed those values into a training system like libSVM to help you distinguish between the kinds of objects you care about.
The image recognition algorithm always crops the input image to the biggest square that fits within its bounds, resamples that area to 256x256 pixels and then takes a slightly smaller 224x224 sample square from somewhere within that main square.
The flags argument controls how that 224-pixel sample square is positioned within the larger one. If it's left as zero, then it's centered with a 16 pixel margin at all edges.
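The arithmetic behind that default is easy to check. This minimal sketch computes the biggest square in an image and the centered 224x224 sample inside the 256x256 resampled square; the helper names are hypothetical, it isn't the library's own code, and it assumes the square crop is centered in the image, which the description above doesn't state.

```c
#include <assert.h>

/* Sizes from the text: the square crop is resampled to 256x256, then a
   224x224 sample is taken from inside it. */
enum { kResampledSize = 256, kSampleSize = 224 };

typedef struct {
  int x;
  int y;
  int size;
} SquareRegion;

/* The biggest square that fits inside a width x height image, centered
   (the centering is an assumption). */
static SquareRegion largest_square(int width, int height) {
  assert((width > 0) && (height > 0));
  SquareRegion region;
  region.size = (width < height) ? width : height;
  region.x = (width - region.size) / 2;
  region.y = (height - region.size) / 2;
  return region;
}

/* Position of the 224x224 sample in the 256x256 square when flags is zero:
   centered, leaving (256 - 224) / 2 = 16 pixels on every edge. */
static SquareRegion centered_sample(void) {
  SquareRegion sample;
  sample.size = kSampleSize;
  sample.x = (kResampledSize - kSampleSize) / 2;
  sample.y = sample.x;
  return sample;
}
```

For a 640x480 camera frame, largest_square() gives a 480-pixel square starting 80 pixels in from the left.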
The sample code uses JPCNN_RANDOM_SAMPLE to jitter the origin of the 224 square randomly within the bounds on each call, since this, combined with smoothing of the results over time, helps make the identification of tags robust to slight position changes.
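The description above doesn't say exactly how the sample code smooths results over time; an exponential moving average is one common, cheap choice. This minimal sketch, using a hypothetical smooth_predictions helper, blends each new frame's prediction array into a running one. It assumes both arrays have the same length and label order, which holds when they come from the same network and layer.

```c
#include <assert.h>

/* Exponential moving average, updated in place:
   smoothed[i] = (1 - alpha) * smoothed[i] + alpha * latest[i].
   alpha near 1 follows new frames quickly; alpha near 0 smooths more heavily. */
static void smooth_predictions(float* smoothed, const float* latest, int length, float alpha) {
  assert((alpha >= 0.0f) && (alpha <= 1.0f));
  for (int i = 0; i < length; i += 1) {
    smoothed[i] = ((1.0f - alpha) * smoothed[i]) + (alpha * latest[i]);
  }
}
```

You might seed the smoothed array with the first frame's predictions, then call this once per frame before picking tags from it.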
The JPCNN_MULTISAMPLE flag takes ten different sample positions within the image and runs them all through the classification pipeline simultaneously. This is a costly operation, so it doesn't tend to be practical on low-processing-power platforms like the iPhone.
void jpcnn_print_network(void* networkHandle)
This is a debug logging call that prints information about a loaded neural network.
void* jpcnn_create_trainer()
Returns a handle to a trainer object that you can feed training examples into to build your own custom prediction model.
void jpcnn_destroy_trainer(void* trainerHandle)
Disposes of the memory used by the trainer object and destroys it.
void jpcnn_train(void* trainerHandle, float expectedLabel, float* predictions, int predictionsLength)
To create your own custom prediction model, you need to train it using 'positive' examples of images containing the object you care about, and 'negative' examples of images that don't. Once you've created a trainer object, you can call this with the neural network results for each positive or negative image, and with an expectedLabel of '0.0' for negatives and '1.0' for positives. Picking the exact number of each you'll need is more of an art than a science, since it depends on how easy your object is to recognize and how cluttered your environment is, but I've had decent results with as few as a hundred of each. You can use the output of any layer of the neural network, but I've found using the penultimate one works well. I discuss how to do this above in the layerOffset section. To see how this works in practice, try out the LearningExample sample code for yourself.
void* jpcnn_create_predictor_from_trainer(void* trainerHandle)
Once you've passed in all your positive and negative examples to jpcnn_train, you can call this to build a predictor model from them all. Under the hood, it's using libSVM to create a support vector machine model based on the examples.
void jpcnn_destroy_predictor(void* predictorHandle)
Deallocates any memory used by the predictor model. Call this once you're finished with it.
void* jpcnn_load_predictor(const char* filename)
Loads a predictor you've already created from a libSVM-format text file. Since there's no easy way to copy a file saved on an iOS device back to your development machine, the simplest way to create this file in the first place is to call jpcnn_print_predictor once you've created a predictor, copy and paste the results from the developer console into a file, and then add it to your app's resources. The SavedModelExample sample code shows how to use this call.
void jpcnn_print_predictor(void* predictorHandle)
Outputs the parameters that define a custom predictor to stderr (and hence the developer console in XCode). You'll need to copy and paste this into your own text file to subsequently reload the predictor.
float jpcnn_predict(void* predictorHandle, float* predictions, int predictionsLength)
Given the output from a pre-trained neural network, and a custom prediction model, returns a value estimating the probability that the image contains the object it has been trained against.
Not right now. I hope to make it available on other devices like the Raspberry Pi in the future. I recommend checking out Caffe, OverFeat and libCCV if you're on the desktop too, they're great packages.
Not at the moment. The compiled library and the neural network parameter set are freely reusable in your own apps under the BSD license though.
There aren't any standard formats for sharing large neural networks unfortunately, so there's no easy way to import other CNNs into the app. The custom training should help you apply the included pre-trained network to your own problems to a large extent though.
Join the Deep Belief Developers email list to find out more about the practical details of implementing deep learning.
The binary framework and jetpac.ntwk network parameter file are under the BSD three-clause license, included in this folder as LICENSE. All source code is under that BSD license unless otherwise noted.
The ccv2010.ntwk and ccv2012.ntwk network models were converted from files created as part of the LibCCV project and are licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Big thanks go to:
- Daniel Nouri for his invaluable help on CNNs.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton for their ground-breaking work on ConvNet.
- Yann LeCun for his pioneering research and continuing support of the field.
- The Berkeley Vision and Learning Center for their work on lightweight custom training of CNNs.
- My colleagues Cathrine, Dave, Julian, and Sophia at Jetpac for all their help.