Ashish Bora, Sugato Basu, Joydeep Ghosh
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Many time series are generated by a set of entities that interact with one
another over time. This paper introduces a broad, flexible framework to learn
from multiple inter-dependent time series generated by such entities. Our
framework explicitly models the entities and their interactions through time.
It achieves this by building on the capabilities of Recurrent Neural Networks,
while also offering several ways to incorporate domain knowledge/constraints
into the model architecture. The capabilities of our approach are showcased
through an application to weather prediction, which shows gains over strong
baselines.
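The abstract does not include code; as a rough illustration only, the following PyTorch sketch shows one plausible instantiation of such a framework: one recurrent cell per entity, with interactions entering through an adjacency structure. All names and design choices here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class EntityInteractionRNN(nn.Module):
    """Illustrative sketch: one GRU cell per entity; each entity's input is
    augmented with the mean hidden state of the entities it interacts with.
    Domain knowledge enters through the interaction graph `adjacency`."""
    def __init__(self, n_entities, input_dim, hidden_dim, adjacency):
        super().__init__()
        self.adjacency = adjacency  # adjacency[i] = entities that entity i interacts with
        self.cells = nn.ModuleList(
            [nn.GRUCell(input_dim + hidden_dim, hidden_dim) for _ in range(n_entities)]
        )
        self.readout = nn.Linear(hidden_dim, 1)

    def forward(self, x):  # x: (time, n_entities, input_dim)
        T, N, _ = x.shape
        h = [torch.zeros(1, self.cells[0].hidden_size) for _ in range(N)]
        preds = []
        for t in range(T):
            new_h = []
            for i in range(N):
                # Aggregate neighbours' previous hidden states (the "interaction").
                nbrs = self.adjacency[i]
                ctx = (torch.stack([h[j] for j in nbrs]).mean(0)
                       if nbrs else torch.zeros_like(h[i]))
                inp = torch.cat([x[t, i].unsqueeze(0), ctx], dim=1)
                new_h.append(self.cells[i](inp, h[i]))
            h = new_h
            preds.append(torch.stack([self.readout(hi) for hi in h]))
        return torch.stack(preds)  # (time, n_entities, 1, 1)

net = EntityInteractionRNN(3, 4, 8, adjacency=[[1], [0, 2], [1]])
out = net(torch.randn(5, 3, 4))   # 5 time steps, 3 entities
```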
Comments: Pre-print. The final publication is available at Springer via this http URL
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
In an attempt to reduce the lengthy training times of neural networks, we previously proposed Parallel Circuits (PCs), a biologically inspired architecture. Previous work has shown that this approach fails to maintain generalization performance in spite of achieving sharp speed gains. To address this issue, and motivated by the way Dropout prevents node co-adaptation, in this paper we suggest an improvement by extending Dropout to the PC architecture. The paper
provides multiple insights into this combination, including a variety of fusion
approaches. Experiments show promising results in which improved error rates
are achieved in most cases, whilst maintaining the speed advantage of the PC
approach.
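As a hedged illustration of the idea (the paper does not specify this implementation), a forward pass through parallel circuits with per-circuit Dropout might look as follows; the sum fusion is just one of several possible fusion approaches.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def pc_forward(x, circuits, p_drop=0.5, train=True):
    """Forward pass through parallel circuits (each a small independent MLP).
    Dropout is applied inside every circuit; outputs are fused by summation,
    one of several conceivable fusion approaches."""
    outputs = []
    for (W1, W2) in circuits:
        h = relu(W1 @ x)
        if train:  # inverted dropout: rescale at train time
            mask = rng.random(h.shape) < (1.0 - p_drop)
            h = h * mask / (1.0 - p_drop)
        outputs.append(W2 @ h)
    return sum(outputs)

# Toy instantiation: 4 circuits, each seeing the full 16-d input.
d_in, d_hid, d_out = 16, 8, 3
circuits = [(rng.normal(size=(d_hid, d_in)) * 0.1,
             rng.normal(size=(d_out, d_hid)) * 0.1) for _ in range(4)]
y = pc_forward(rng.normal(size=d_in), circuits)
```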
Comments: This paper was accepted and presented at Computing with Spikes NIPS 2016 Workshop, Barcelona, Spain, December 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A dynamic Boltzmann machine (DyBM) has been proposed as a model of a spiking
neural network, and its learning rule of maximizing the log-likelihood of given
time-series has been shown to exhibit key properties of spike-timing dependent
plasticity (STDP), which had been postulated and experimentally confirmed in
the field of neuroscience as a learning rule that refines the Hebbian rule.
Here, we relax some of the constraints in the DyBM so that it becomes more suitable for computation and learning. We show that learning the DyBM can be viewed as logistic regression for binary-valued time-series. We also show how the DyBM can learn real-valued data in the form of a Gaussian DyBM and discuss its relation to the vector autoregressive (VAR) model. The Gaussian DyBM extends the VAR by using additional explanatory variables, which correspond to the eligibility traces of the DyBM and capture long-term dependencies of the time-series. Numerical experiments show that the Gaussian DyBM significantly improves the predictive accuracy over VAR.
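The relation to VAR can be made concrete. Below is a minimal NumPy sketch, under the stated interpretation, of fitting a VAR(1) model augmented with an eligibility-trace regressor by least squares; the decay rate lam and the single-trace simplification are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_gaussian_dybm_like(X, lam=0.8):
    """Least-squares fit of x_t ~ A x_{t-1} + B e_t, where
    e_t = lam * e_{t-1} + x_{t-1} is an eligibility trace summarizing the
    long-term past (the extra explanatory variable beyond plain VAR)."""
    T, d = X.shape
    E = np.zeros_like(X)
    for t in range(1, T):
        E[t] = lam * E[t - 1] + X[t - 1]
    # Regressors: previous observation and current trace.
    Z = np.hstack([X[:-1], E[1:]])           # (T-1, 2d)
    W, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    A, B = W[:d].T, W[d:].T
    return A, B

# Toy check on a synthetic AR(1) series.
rng = np.random.default_rng(1)
X = np.zeros((500, 2))
for t in range(1, 500):
    X[t] = 0.9 * X[t - 1] + rng.normal(scale=0.1, size=2)
A, B = fit_gaussian_dybm_like(X)
```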
Franck Dernoncourt, Ji Young Lee, Peter Szolovits
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Existing models based on artificial neural networks (ANNs) for sentence
classification often do not incorporate the context in which sentences appear,
and classify sentences individually. However, traditional sentence
classification approaches have been shown to greatly benefit from jointly
classifying subsequent sentences, such as with conditional random fields. In
this work, we present an ANN architecture that combines the effectiveness of
typical ANN models to classify sentences in isolation, with the strength of
structured prediction. Our model achieves state-of-the-art results on two
different datasets for sequential sentence classification in medical abstracts.
Comments: 9 pages, 4 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We present a method for implementing an Efficient Unitary Neural Network (EUNN) whose computational complexity is merely $\mathcal{O}(1)$ per parameter and has full tunability, from spanning part of unitary space to all of it. We
apply the EUNN in Recurrent Neural Networks, and test its performance on the
standard copying task and the MNIST digit recognition benchmark, finding that
it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
partial space URNN and a projective URNN with comparable parameter numbers.
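For intuition on how $\mathcal{O}(1)$ per parameter is possible, here is a sketch of the real-valued orthogonal case (an assumption for brevity; the EUNN operates on the complex unitary group): the transform is a chain of 2x2 Givens rotations, each carrying one angle, and capacity is tuned by the number of rotations.

```python
import numpy as np

def apply_givens_chain(h, thetas, pairs):
    """Apply a product of 2x2 Givens rotations to a hidden state h.
    Each rotation touches one coordinate pair and costs O(1) per
    parameter (one angle). Roughly n(n-1)/2 rotations span the full
    (special) orthogonal group; fewer rotations span part of it."""
    h = h.copy()
    for theta, (i, j) in zip(thetas, pairs):
        c, s = np.cos(theta), np.sin(theta)
        hi, hj = h[i], h[j]
        h[i] = c * hi - s * hj
        h[j] = s * hi + c * hj
    return h

n = 4
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # full capacity
thetas = np.random.default_rng(0).uniform(0, 2 * np.pi, len(pairs))
h = np.ones(n)
h_rot = apply_givens_chain(h, thetas, pairs)
assert np.isclose(np.linalg.norm(h), np.linalg.norm(h_rot))  # norm preserved
```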
Comments: submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce the concept of a Visual Compiler that generates a scene specific
pedestrian detector and pose estimator without any pedestrian observations.
Given a single image and auxiliary scene information in the form of camera
parameters and geometric layout of the scene, the Visual Compiler first infers
geometrically and photometrically accurate images of humans in that scene
through the use of computer graphics rendering. Using these renders, we learn a scene-and-region-specific, spatially varying fully convolutional neural network for simultaneous detection, pose estimation, and segmentation of pedestrians. We
demonstrate that when real human annotated data is scarce or non-existent, our
data generation strategy can provide an excellent solution for bootstrapping
human detection and pose estimation. Experimental results show that our
approach outperforms off-the-shelf state-of-the-art pedestrian detectors and
pose estimators that are trained on real data.
CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction
Comments: 10 pages, 6 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we develop a deep neural network architecture called
“CSVideoNet” that can learn visual representations from random measurements for
compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
trainable and non-iterative model that combines convolutional neural networks (CNNs) with a recurrent neural network (RNN) to facilitate video
reconstruction by leveraging temporal-spatial features. The proposed network
can accept random measurements with a multi-level compression ratio (CR). The
lightly and aggressively compressed measurements offer background information
and object details, respectively. This is similar to the variable bit rate
techniques widely used in conventional video coding approaches. The RNN
employed by CSVideoNet can leverage temporal coherence that exists in adjacent
video frames to extrapolate motion features and merge them with spatial visual
features extracted by the CNNs to further enhance reconstruction quality,
especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
Experimental results show that CSVideoNet outperforms the existing video CS
reconstruction approaches. The results demonstrate that our method can preserve fine visual details from the original videos even at a 100x CR, which is difficult to realize with the reference approaches. Also, the non-iterative nature of CSVideoNet results in a decrease in runtime of three
orders of magnitude over iterative reconstruction algorithms. Furthermore,
CSVideoNet can enhance the CR of CS cameras beyond the limitation of
conventional approaches, ensuring a reduction in bandwidth for data
transmission. These benefits are especially favorable to high-frame-rate video
applications.
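As a toy illustration of multi-level compression ratios (not the authors' code; the Gaussian sensing matrix and the CR values are assumptions), random measurements at a low CR for a keyframe and a high CR for the remaining frames can be generated as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(frame, cr):
    """Random Gaussian measurement of a flattened frame at compression
    ratio cr (m = n / cr measurements for an n-pixel frame)."""
    x = frame.ravel()
    m = max(1, x.size // cr)
    Phi = rng.normal(size=(m, x.size)) / np.sqrt(m)
    return Phi @ x

video = rng.random((8, 32, 32))                   # 8 toy frames
y_key = measure(video[0], cr=5)                   # light compression: background detail
y_rest = [measure(f, cr=100) for f in video[1:]]  # aggressive compression: object details
# A reconstruction network would fuse y_key with the y_rest sequence.
```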
SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth
John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce SceneNet RGB-D, expanding the previous work of SceneNet to
enable large scale photorealistic rendering of indoor scene trajectories. It
provides pixel-perfect ground truth for scene understanding problems such as
semantic segmentation, instance segmentation, and object detection, and also
for geometric computer vision problems such as optical flow, depth estimation,
camera pose estimation, and 3D reconstruction. Random sampling permits
virtually unlimited scene configurations, and here we provide a set of 5M
rendered RGB-D images from over 15K trajectories in synthetic layouts with
random but physically simulated object poses. Each layout also has random
lighting, camera trajectories, and textures. The scale of this dataset is well
suited for pre-training data-driven computer vision techniques from scratch
with RGB-D inputs, which previously has been limited by relatively small
labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for
investigating 3D scene labelling tasks by providing perfect camera poses and
depth data as proxy for a SLAM system. We host the dataset at
this http URL
Thomas Nestmeyer, Peter V. Gehler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Separation of an input image into its reflectance and shading layers poses a
challenge for learning approaches because no large corpus of precise and
realistic ground truth decompositions exists. The Intrinsic Images in the Wild
dataset (IIW) provides a sparse set of relative human reflectance judgments,
which serves as a standard benchmark for intrinsic images. This dataset led to
an increase in methods that learn statistical dependencies between the images
and their reflectance layer. Although learning plays a role in pushing
state-of-the-art performance, we show that a standard signal processing
technique achieves performance on par with recent developments. We propose a
loss function that enables learning dense reflectance predictions with a CNN.
Our results show a simple pixel-wise decision, without any context or prior
knowledge, is sufficient to provide a strong baseline on IIW. This sets a
competitive bar and we find that only two approaches surpass this result. We
then develop a joint bilateral filtering method that implements strong prior
knowledge about reflectance constancy. This filtering operation can be applied
to any intrinsic image algorithm, and we improve several previous results, achieving a new state of the art on IIW. Our findings suggest that the effect
of learning-based approaches may be over-estimated and that it is still the use
of explicit prior knowledge that drives performance on intrinsic image
decompositions.
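A joint (cross) bilateral filter of the kind described, smoothing a reflectance estimate where a guide image is similar, can be sketched as below; this is a generic textbook formulation, not the paper's exact filter.

```python
import numpy as np

def joint_bilateral(reflectance, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Cross bilateral filter: smooth `reflectance` wherever the `guide`
    image is similar, keeping edges where the guide changes. Both inputs
    are 2-D float arrays in [0, 1]."""
    H, W = reflectance.shape
    out = np.zeros_like(reflectance)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            gy, gx = np.mgrid[y0:y1, x0:x1]
            spatial = np.exp(-((gy - y) ** 2 + (gx - x) ** 2) / (2 * sigma_s ** 2))
            rangew = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2)
                            / (2 * sigma_r ** 2))
            w = spatial * rangew
            out[y, x] = (w * reflectance[y0:y1, x0:x1]).sum() / w.sum()
    return out
```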
Adrian K. Davison, Cliff Lansley, Choon Ching Ng, Kevin Tan, Moi Hoon Yap
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Micro-facial expressions are regarded as an important human behavioural event
that can highlight emotional deception. Spotting these movements is difficult for humans and machines; however, research into using computer vision to detect subtle facial expressions is growing in popularity. This paper proposes an individualised baseline micro-movement detection method using a 3D Histogram of Oriented Gradients (3D HOG) temporal difference approach. We define a face
template consisting of 26 regions based on the Facial Action Coding System
(FACS). We extract the temporal features of each region using 3D HOG. Then, we
use Chi-square distance to find subtle facial motion in the local regions.
Finally, an automatic peak detector is used to detect micro-movements above the
newly proposed adaptive baseline threshold. The performance is validated on two
FACS coded datasets: SAMM and CASME II. This objective method focuses on the
movement of the 26 face regions. When comparing with the ground truth, the best
result was an AUC of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The
results show that 3D HOG outperformed state-of-the-art feature representations for micro-movement detection: Local Binary Patterns in Three Orthogonal Planes and Histograms of Oriented Optical Flow.
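The chi-square comparison and thresholding step can be illustrated with a short sketch; the adaptive threshold used here (mean plus k standard deviations) is a stand-in assumption for the paper's newly proposed baseline threshold.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two (normalized) histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def micro_movement_peaks(hogs, k=3.0):
    """Per-frame chi-square difference of a region's 3D HOG features,
    thresholded by an adaptive baseline (mean + k*std is an assumption;
    the paper defines its own adaptive threshold)."""
    d = np.array([chi_square(hogs[t], hogs[t + 1]) for t in range(len(hogs) - 1)])
    thr = d.mean() + k * d.std()
    # A peak: local maximum above the baseline threshold.
    return [t for t in range(1, len(d) - 1)
            if d[t] > thr and d[t] >= d[t - 1] and d[t] >= d[t + 1]]

rng = np.random.default_rng(0)
hogs = rng.dirichlet(np.ones(64), size=100)   # toy histogram sequence
peaks = micro_movement_peaks(hogs)
```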
Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model was
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction was performed using a minimally supervised method based on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising
to deal with possibly corrupt data, palate surface information reconstruction
to handle palatal tongue contacts, and a bootstrap strategy to refine the
obtained shapes. Our experiments show that limiting the degrees of freedom for the anatomical and speech-related variations to 5 and 4, respectively,
produces a model that can reliably register unknown data while avoiding
overfitting effects.
Development of a Real-time Colorectal Tumor Classification System for Narrow-band Imaging zoom-videoendoscopy
Comments: 9 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Colorectal endoscopy is important for the early detection and treatment of
colorectal cancer and is used worldwide. A computer-aided diagnosis (CAD)
system that provides an objective measure to endoscopists during colorectal
endoscopic examinations would be of great value. In this study, we describe a
newly developed CAD system that provides real-time objective measures. Our
system captures the video stream from an endoscopic system and transfers it to
a desktop computer. The captured video stream is then classified by a
pretrained classifier and the results are displayed on a monitor. The
experimental results show that our developed system works efficiently in actual
endoscopic examinations and is medically significant.
Naushad Ansari, Anubha Gupta, Rahul Duggal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image-matched nonseparable wavelets can find potential use in many applications including image classification, segmentation, compressive sensing, etc. This paper proposes a novel design methodology that utilizes a convolutional neural network (CNN) to design a two-channel non-separable wavelet matched to a given image. The design is proposed on a quincunx lattice. The loss function of the convolutional neural network is set up as the total squared error between the given input image to the CNN and the reconstructed image at the output of the CNN, leading to perfect reconstruction at the end of training. Simulation results are shown on some standard images.
Or Litany, Tal Remez, Alex Bronstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
With the development of range sensors such as LIDAR and time-of-flight
cameras, 3D point cloud scans have become ubiquitous in computer vision
applications, the most prominent ones being gesture recognition and autonomous
driving. Parsimony-based algorithms have shown great success on images and
videos where data points are sampled on a regular Cartesian grid. We propose an
adaptation of these techniques to irregularly sampled signals by using
continuous dictionaries. We present an example application in the form of point
cloud denoising.
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description, has been remarkably advanced in recent years. Nonetheless, most existing paradigms may suffer from a lack of invariance to images with different scaling, rotation, etc., and from ineffective integration of standalone attention into a holistic end-to-end system. In this paper, we propose a novel image captioning architecture, termed Recurrent Image Captioner (RIC), which allows the visual encoder and language decoder to coherently cooperate in a recurrent manner. Specifically, we first equip the CNN-based visual encoder with a differentiable layer to enable spatially invariant transformation of visual signals. Moreover, we deploy an
attention filter module (differentiable) between encoder and decoder to
dynamically determine salient visual parts. We also employ bidirectional LSTM
to preprocess sentences for generating better textual representations. Besides,
we propose to exploit variational inference to optimize the whole architecture.
Extensive experimental results on three benchmark datasets (i.e., Flickr8k,
Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture
as compared to most of the state-of-the-art methods.
Anh Tuan Tran, Tal Hassner, Iacopo Masi, Gerard Medioni
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition, and then only under controlled viewing
conditions. We claim that this is a symptom of a serious but often overlooked
problem with existing methods for single view 3D face reconstruction: when
applied “in the wild”, their 3D estimates are either unstable and change for
different photos of the same subject or they are over-regularized and generic.
In response, we describe a robust method for regressing discriminative 3D
morphable face models (3DMM). We use a convolutional neural network (CNN) to
regress 3DMM shape and texture parameters directly from an input photo. We
overcome the shortage of training data required for this purpose by offering a
method for generating huge numbers of labeled examples. The 3D estimates
produced by our CNN surpass state-of-the-art accuracy on the MICC data set.
Coupled with a 3D-3D face matching pipeline, we show the first competitive face
recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes
as representations, rather than the opaque deep feature vectors used by other
modern systems.
Vivek Krishnan, Deva Ramanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider the task of visual net surgery, in which a CNN can be
reconfigured without extra data to recognize novel concepts that may be omitted
from the training set. While most prior work makes use of linguistic cues for such “zero-shot” learning, we do so by using a pictorial language
representation of the training set, implicitly learned by a CNN, to generalize
to new classes. To this end, we introduce a set of visualization techniques
that better reveal the activation patterns and relations between groups of CNN
filters. We next demonstrate that knowledge of pictorial languages can be used
to rewire certain CNN neurons into a part model, which we call a pictorial
language classifier. We demonstrate the robustness of simple PLCs by applying
them in a weakly supervised manner: labeling unlabeled concepts for visual
classes present in the training data. Specifically we show that a PLC built on
top of a CNN trained for ImageNet classification can localize humans in Graz-02
and determine the pose of birds in PASCAL-VOC without extra labeled data or
additional training. We then apply PLCs in an interactive zero-shot manner,
demonstrating that pictorial languages are expressive enough to detect a set of
visual classes in MS-COCO that never appear in the ImageNet training set.
Fahad Shahbaz Khan, Joost van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most approaches to human attribute and action recognition in still images are
based on image representation in which multi-scale local features are pooled
across scale into a single, scale-invariant encoding. Both in bag-of-words and
the recently popular representations based on convolutional neural networks,
local features are computed at multiple scales. However, these multi-scale
convolutional features are pooled into a single scale-invariant representation.
We argue that entirely scale-invariant image representations are sub-optimal
and investigate approaches to scale coding within a Bag of Deep Features
framework.
Our approach encodes multi-scale information explicitly during the image
encoding stage. We propose two strategies to encode multi-scale information
explicitly in the final image representation. We validate our two scale coding
techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012,
Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale
coding approaches outperform both the scale-invariant method and the standard
deep features of the same network. Further, combining our scale coding
approaches with standard deep features leads to consistent improvement over the
state-of-the-art.
Comments: 9 pages, 9 figures, supplementary material added as ancillary file
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a novel non-parametric clustering technique, which
is based on an iterative algorithm that peels off layers of points around the
clusters. Our technique is based on the notion that each latent cluster is
comprised of layers that surround its core, where the external layers, or
border points, implicitly separate the clusters. Analyzing the K-nearest
neighbors of the points makes it possible to identify the border points and
associate them with points of inner layers. Our clustering algorithm
iteratively identifies border points, peels them, and separates the latent
clusters. We show that the peeling process adapts to the local density and
successfully separates adjacent clusters. A notable quality of the
Border-Peeling algorithm is that it does not require any parameter tuning in
order to outperform state-of-the-art finely-tuned non-parametric clustering
methods, including Mean-Shift and DBSCAN. We further assess our technique on
high-dimensional datasets that vary in size and characteristics. In particular,
we analyze the space of deep features that were trained by a convolutional
neural network.
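One peeling iteration in this spirit, identifying low-density border points via their k-nearest neighbours and removing them, might look as follows; the density proxy and quantile here are illustrative choices rather than the paper's exact criterion.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def peel_border_points(X, k=10, quantile=0.2):
    """One peeling iteration: points whose k-NN neighbourhood is widest
    (lowest local density) are treated as border points and removed.
    Mean k-NN distance and the fixed quantile are stand-in assumptions."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)           # dist[:, 0] is the point itself
    radius = dist[:, 1:].mean(axis=1)    # mean k-NN distance per point
    border = radius > np.quantile(radius, 1.0 - quantile)
    return X[~border], X[border]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
core, border = peel_border_points(X)     # iterate until clusters separate
```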
Giuseppe Airò Farulla, Nadir Murru, Rosaria Rossini
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The problem of correctly segmenting touching characters is a hard task and of major relevance in pattern recognition. In recent years, many methods and algorithms have been proposed; still, a definitive solution is far from being found. In this paper, we propose a novel method based on fuzzy logic. The proposed method combines, in a novel way, three features for segmenting touching characters that have already been proposed in other studies but have so far been exploited only individually. The proposed strategy is based on a 3-input/1-output fuzzy inference system with fuzzy rules specifically optimized for segmenting touching characters in the case of Latin printed and handwritten characters. The system performance is illustrated and supported by numerical examples showing that our approach can achieve reasonably good overall accuracy in segmenting characters even under tricky conditions of touching characters. Moreover, numerical results suggest that the method can be applied
to many different datasets of characters by means of a convenient tuning of the
fuzzy sets and rules.
Michal Yarom, Michal Irani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The ability to detect similar actions across videos can be very useful for
real-world applications in many fields. However, this task is still challenging for existing systems, since videos that present the same action can be taken from significantly different viewing directions, performed by different actors against different backgrounds, and under various video qualities. Video descriptors play a
significant role in these systems. In this work we propose the
“temporal-needle” descriptor which captures the dynamic behavior, while being
invariant to viewpoint and appearance. The descriptor is computed using multiple temporal scales of the video, by computing self-similarity for every patch through time at every temporal scale. The descriptor is computed for every pixel in the video. However, to find similar actions across videos, we consider only a small subset of the descriptors, the statistically significant ones. This allows us to find good correspondences across videos more efficiently. Using the descriptor, we were able to detect the same behavior
across videos in a variety of scenarios. We demonstrate the use of the
descriptor in tasks such as temporal and spatial alignment, action detection
and even show its potential in unsupervised video clustering into categories.
In this work we handled only videos taken with stationary cameras, but the descriptor can be extended to handle moving cameras as well.
Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Humans have the remarkable capability to learn a large variety of visual
concepts, often with very few examples, whereas current state-of-the-art vision
algorithms require hundreds or thousands of examples per category and struggle
with ambiguity. One characteristic that sets humans apart is our ability to
acquire knowledge about the world and reason using this knowledge. This paper
investigates the use of structured prior knowledge in the form of knowledge
graphs and shows that using this knowledge improves performance on image
classification. Specifically, we introduce the Graph Search Neural Network as a
way of efficiently incorporating large knowledge graphs into a fully end-to-end
learning system. We show in a number of experiments that our method outperforms
baselines for multi-label classification, even under low data and few-shot
settings.
Lukas Balles, Javier Romero, Philipp Hennig
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Mini-batch stochastic gradient descent and variants thereof have become
standard for large-scale empirical risk minimization like the training of
neural networks. These methods are usually used with a constant batch size
chosen by simple empirical inspection. The batch size significantly influences
the behavior of the stochastic optimization algorithm, though, since it
determines the variance of the gradient estimates. This variance also changes
over the optimization process; when using a constant batch size, stability and
convergence is thus often enforced by means of a (manually tuned) decreasing
learning rate schedule. We propose a practical method for dynamic batch size
adaptation. It estimates the variance of the stochastic gradients and adapts
the batch size to decrease the variance proportionally to the value of the
objective function, removing the need for the aforementioned learning rate
decrease. In contrast to recent related work, our algorithm couples the batch
size to the learning rate, directly reflecting the known relationship between
the two. On three image classification benchmarks, our batch size adaptation
yields faster optimization convergence, while simultaneously simplifying
learning rate tuning. A TensorFlow implementation is available.
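The coupling can be sketched as follows. Since the variance of an m-sample gradient mean is tr(Cov)/m, choosing m proportional to the estimated per-sample gradient variance over the objective value keeps the mini-batch variance proportional to the loss; the exact proportionality constant below is an assumption, not the paper's rule.

```python
import numpy as np

def adapt_batch_size(per_sample_grads, loss, lr, m_min=16, m_max=4096):
    """Pick the next batch size so the variance of the mini-batch gradient
    stays proportional to the current objective value. The coupling to the
    learning rate lr (m >= lr * tr(Cov) / loss) is an illustrative choice."""
    g = np.asarray(per_sample_grads)            # (m, d) per-sample gradients
    trace_cov = g.var(axis=0, ddof=1).sum()     # estimated tr(Cov) per sample
    m_next = int(np.ceil(lr * trace_cov / max(loss, 1e-12)))
    return int(np.clip(m_next, m_min, m_max))

rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 10))               # toy per-sample gradients
m = adapt_batch_size(grads, loss=0.5, lr=0.1)
```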
Comments: Published in Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the matching of short music audio snippets to the
corresponding pixel location in images of sheet music. A system is presented
that simultaneously learns to read notes, listens to music and matches the
currently played music to its corresponding notes in the sheet. It consists of
an end-to-end multi-modal convolutional neural network that takes as input
images of sheet music and spectrograms of the respective audio snippets. It
learns to predict, for a given unseen audio snippet (covering approximately one
bar of music), the corresponding position in the respective score line. Our
results suggest that with the use of (deep) neural networks — which have
proven to be powerful image processing models — working with sheet music
becomes feasible and a promising future research direction.
Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration
Comments: 4 Figures, 1 Table
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Objective: The advent of Electronic Medical Records (EMR) with large
electronic imaging databases along with advances in deep neural networks with
machine learning has provided a unique opportunity to achieve milestones in
automated image analysis. Optical coherence tomography (OCT) is the most
commonly obtained imaging modality in ophthalmology and represents a dense and
rich dataset when combined with labels derived from the EMR. We sought to
determine if deep learning could be utilized to distinguish normal OCT images
from images from patients with Age-related Macular Degeneration (AMD). Methods:
Automated extraction of an OCT imaging database was performed and linked to
clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
from EPIC. The central 11 images were selected from each OCT scan of two
cohorts of patients: normal and AMD. Cross-validation was performed using a
random subset of patients. Area under receiver operator curves (auROC) were
constructed at an independent image level, macular OCT level, and patient
level. Results: Of an extraction of 2.6 million OCT images linked to clinical
datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
selected. A deep neural network was trained to categorize images as either
normal or AMD. At the image level, we achieved an auROC of 92.78% with an
accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
effective for classifying OCT images. These findings have important
implications in utilizing OCT in automated screening and computer aided
diagnosis tools.
Comments: Preprint, journal special issue
Subjects: Artificial Intelligence (cs.AI)
Ontohub is a repository engine for managing distributed heterogeneous
ontologies. The distributed nature enables communities to share and exchange
their contributions easily. The heterogeneous nature makes it possible to
integrate ontologies written in various ontology languages. Ontohub supports a
wide range of formal logical and ontology languages, as well as various
structuring and modularity constructs and inter-theory (concept) mappings,
building on the OMG-standardized DOL language. Ontohub repositories are
organised as Git repositories, thus inheriting all features of this popular
version control system. Moreover, Ontohub is the first repository engine
meeting a substantial amount of the requirements formulated in the context of
the Open Ontology Repository (OOR) initiative, including an API for federation
as well as support for logical inference and axiom selection.
Rupert Freeman, Sebastien Lahaie, David M. Pennock
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
A prediction market is a useful means of aggregating information about a
future event. To function, the market needs a trusted entity who will verify
the true outcome in the end. Motivated by the recent introduction of
decentralized prediction markets, we introduce a mechanism that allows for the
outcome to be determined by the votes of a group of arbiters who may themselves
hold stakes in the market. Despite the potential conflict of interest, we
derive conditions under which we can incentivize arbiters to vote truthfully by
using funds raised from market fees to implement a peer prediction mechanism.
Finally, we investigate what parameter values could be used in a real-world
implementation of our mechanism.
Comments: Presented at the Constructive Machine Learning workshop at NIPS 2016 as a poster and spotlight talk. 8 pages including 2 page references, 2 page appendix, 3 figures. Blog post (including videos) at this https URL
Subjects: Artificial Intelligence (cs.AI)
We investigate a human-machine collaborative drawing environment in which an
autonomous agent sketches images while optionally allowing a user to directly
influence the agent’s trajectory. We combine Monte Carlo Tree Search with image
classifiers and test both shallow models (e.g. multinomial logistic regression)
and deep Convolutional Neural Networks (e.g. LeNet, Inception v3). We found
that using the shallow model, the agent produces a limited variety of images, which are noticeably recognisable by humans. However, using the deeper models, the agent produces a more diverse range of images, and while the agent remains very confident (99.99%) in having achieved its objective, to humans they mostly resemble unrecognisable ‘random’ noise. We relate this to recent research which also discovered that ‘deep neural networks are easily fooled’ \cite{Nguyen2015}
and we discuss possible solutions and future directions for the research.
Harm van Seijen, Mehdi Fatemi, Joshua Romoff
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
In this paper, we propose a framework for solving a single-agent task by
using multiple agents, each focusing on different aspects of the task. This
approach has two main advantages: 1) it allows for specialized agents for
different parts of the task, and 2) it provides a new way to transfer
knowledge, by transferring trained agents. Our framework generalizes the
traditional hierarchical decomposition, in which, at any moment in time, a
single agent has control until it has solved its particular subtask. We
illustrate our framework using a number of examples.
Comments: (12 pages, 2 figures) Presented at NIPS Advances In Approximate Inference 2016 (AABI 2016)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
Bayesian inference on structured models typically relies on the ability to
infer posterior distributions of underlying hidden variables. However,
inference in implicit models or complex posterior distributions is hard. A popular tool for learning implicit models is generative adversarial networks (GANs), which learn parameters of generators by fooling discriminators.
Typically, GANs are considered to be models themselves and are not understood
in the context of inference. Current techniques rely on inefficient global
discrimination of joint distributions to perform learning, or only consider
discriminating a single output variable. We overcome these limitations by
treating GANs as a basis for likelihood-free inference in generative models and
generalize them to Bayesian posterior inference over factor graphs. We propose
local learning rules based on message passing minimizing a global divergence
criterion involving cooperating local adversaries used to sidestep explicit
likelihood evaluations. This allows us to compose models and yields a unified
inference and learning framework for adversarial learning. Our framework treats
model specification and inference separately and facilitates richly structured
models within the family of Directed Acyclic Graphs, including components such
as intractable likelihoods, non-differentiable models, simulators and generally
cumbersome models. A key result of our treatment is the insight that Bayesian
inference on structured models can be performed only with sampling and
discrimination when using nonparametric variational families, without access to
explicit distributions. As a side-result, we discuss the link to likelihood
maximization. These approaches hold promise to be useful in the toolbox of
probabilistic modelers and enrich the gamut of current probabilistic
programming applications.
Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we describe the construction of TeKnowbase, a knowledge-base
of technical concepts in computer science. Our main information sources are
technical websites such as Webopedia and Techtarget as well as Wikipedia and
online textbooks. We divide the knowledge-base construction problem into two
parts — the acquisition of entities and the extraction of relationships among
these entities. Our knowledge-base consists of approximately 100,000 triples.
We conducted an evaluation on a sample of triples and report an accuracy of a little over 90%. We additionally conducted classification experiments on
StackOverflow data with features from TeKnowbase and achieved improved
classification accuracy.
Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
A good dialogue agent should have the ability to interact with users. In this
work, we explore this direction by designing a simulator and a set of synthetic
tasks in the movie domain that allow the learner to interact with a teacher by
both asking and answering questions. We investigate how a learner can benefit
from asking questions in both an offline and online reinforcement learning
setting. We demonstrate that the learner improves when asking questions. Our
work represents a first step in developing end-to-end learned interactive
dialogue agents.
Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
We demonstrate the possibility of classifying causal systems into kinds that
share a common structure without first constructing an explicit dynamical model
or using prior knowledge of the system dynamics. The algorithmic ability to
determine whether arbitrary systems are governed by causal relations of the
same form offers significant practical applications in the development and
validation of dynamical models. It is also of theoretical interest as an
essential stage in the scientific inference of laws from empirical data. The
algorithm presented is based on the dynamical symmetry approach to dynamical
kinds. A dynamical symmetry with respect to time is an intervention on one or
more variables of a system that commutes with the time evolution of the system.
A dynamical kind is a class of systems sharing a set of dynamical symmetries.
The algorithm presented classifies deterministic, time-dependent causal systems
by directly comparing their exhibited symmetries. Using simulated, noisy data
from a variety of nonlinear systems, we show that this algorithm correctly
sorts systems into dynamical kinds. It is robust under significant sampling
error, is immune to violations of normality in sampling error, and fails
gracefully with increasing dynamical similarity. The algorithm we demonstrate
is the first to address this aspect of automated scientific discovery.
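The commutation test at the core of the approach is easy to state in code: an intervention is a dynamical symmetry with respect to time when transforming then evolving equals evolving then transforming. The toy dynamics below is purely illustrative and not from the paper.

```python
import numpy as np

def evolve(x0, steps, dt=0.01):
    """Toy dynamics: Euler integration of exponential growth dx/dt = x."""
    x = float(x0)
    for _ in range(steps):
        x += dt * x
    return x

def commutes(sigma, x0, steps, tol=1e-6):
    """Does the intervention `sigma` commute with time evolution?
    sigma(evolve(x)) == evolve(sigma(x)) is the defining property of a
    dynamical symmetry with respect to time."""
    return abs(sigma(evolve(x0, steps)) - evolve(sigma(x0), steps)) < tol

scale = lambda x: 2.0 * x          # scaling commutes with linear dynamics
shift = lambda x: x + 1.0          # shifting does not
print(commutes(scale, 1.0, 100))   # True: a dynamical symmetry
print(commutes(shift, 1.0, 100))   # False: not a symmetry of this system
```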
Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences
Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
User acceptance of artificial intelligence agents might depend on their ability to explain their reasoning, which requires adding an interpretability layer that helps users understand their behavior. This paper focuses on adding an interpretable layer on top of Semantic Textual Similarity (STS),
which measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of segments
across the two sentences, where the relation between the segments is labeled
with a relation type and a similarity score. We present a publicly available
dataset of sentence pairs annotated following the formalization. We then
develop a system trained on this dataset which, given a sentence pair, explains
what is similar and different, in the form of graded and typed segment
alignments. When evaluated on the dataset, the system performs better than an
informed baseline, showing that the dataset and task are well-defined and
feasible. Most importantly, two user studies show how the system output can be
used to automatically produce explanations in natural language. Users performed better when having access to the explanations, providing preliminary evidence that our dataset and method for automatically producing explanations are useful in real applications.
Comments: In Proceedings MEMICS 2016, arXiv:1612.04037
Journal-ref: EPTCS 233, 2016, pp. 1-12
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)
Our work is generally focused on recommending for small or medium-sized
e-commerce portals, where explicit feedback is absent and thus the usage of
implicit feedback is necessary. Nonetheless, for some implicit feedback
features, the presentation context may be of high importance. In this paper, we
present a model of relevant contextual features affecting user feedback,
propose methods leveraging those features, publish a dataset of real e-commerce
users containing multiple user feedback indicators as well as its context and
finally present results of purchase prediction and recommendation experiments.
Off-line experiments with real users of a Czech travel agency website
corroborated the importance of leveraging presentation context in both purchase
prediction and recommendation tasks.
Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra
Subjects: Information Retrieval (cs.IR)
While advances in computing resources have made processing enormous amounts
of data possible, human ability to identify patterns in such data has not
scaled accordingly. Thus, efficient computational methods for condensing and
simplifying data are becoming vital for extracting actionable insights. In
particular, while data summarization techniques have been studied extensively,
only recently has summarizing interconnected data, or graphs, become popular.
This survey is a structured, comprehensive overview of the state-of-the-art
methods for summarizing graph data. We first broach the motivation behind and
the challenges of graph summarization. We then categorize summarization
approaches by the type of graphs taken as input and further organize each
category by core methodology. Finally, we discuss applications of summarization
on real-world graphs and conclude by describing some open problems in the
field.
Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)
This paper demonstrates the feasibility of learning to retrieve short
snippets of sheet music (images) when given a short query excerpt of music
(audio), and vice versa, without any symbolic representation of music or
scores. This would be highly useful in many content-based musical retrieval
scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
and learns correlated latent spaces allowing for cross-modality retrieval in
both directions. Initial experiments with relatively simple monophonic music
show promising results.
Mickael Rouvier, Benoit Favre
Subjects: Computation and Language (cs.CL)
Creating sentiment polarity lexicons is labor intensive. Automatically
translating them from resourceful languages requires in-domain machine
translation systems, which rely on large quantities of bi-texts. In this paper,
we propose to replace machine translation by transferring words from the
lexicon through word embeddings aligned across languages with a simple linear
transform. The approach leads to no degradation, compared to machine
translation, when tested on sentiment polarity classification on tweets from
four languages.
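The linear transform can be estimated by least squares from a small seed dictionary of translation pairs and then applied to project lexicon words into the target embedding space; the sketch below assumes row-wise word vectors and is not the authors' released code.

```python
import numpy as np

def learn_alignment(src_vecs, tgt_vecs):
    """Least-squares linear map W with W @ src ~ tgt, estimated from a
    seed dictionary of translation pairs (rows are word vectors)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W.T

def transfer_word(word_vec, W, tgt_matrix, tgt_words):
    """Project a source-lexicon word into the target space and return the
    nearest target-language word by cosine similarity."""
    q = W @ word_vec
    sims = tgt_matrix @ q / (np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(q))
    return tgt_words[int(np.argmax(sims))]

# Toy usage with random 50-d embeddings and a 100-pair seed dictionary.
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
W = learn_alignment(src, tgt)
```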
Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong
Subjects: Computation and Language (cs.CL)
This paper presents a novel reranking model, future reward reranking, to re-score the actions in a transition-based parser by using a global scorer. Unlike conventional reranking parsing, the model searches for the best dependency tree among all feasible trees constrained by a sequence of actions, to obtain the future reward of the sequence. The scorer is based on a first-order graph-based parser with bidirectional LSTMs, which captures a different parsing view from the transition-based parser. Besides, since context enhancement has shown substantial improvement in parsing accuracy for arc-standard transition-based parsing, we implement context enhancement on an arc-eager transition-based parser with stack LSTMs, a dynamic oracle, and dropout support, and achieve further improvement. With the global scorer and context enhancement, the results show that the UAS of the parser increases by as much as 1.20% for English and 1.66% for Chinese, and the LAS increases by as much as 1.32% for English and 1.63% for Chinese. Moreover, we obtain state-of-the-art LASs, achieving 87.58% for Chinese and 93.37% for English.
Shripad Gade, Nitin H. Vaidya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)
Continual data collection and widespread deployment of machine learning
algorithms, particularly the distributed variants, have raised new privacy
challenges. In a distributed machine learning scenario, the dataset is stored
among several machines and they solve a distributed optimization problem to
collectively learn the underlying model. We present a secure multi-party
computation inspired privacy preserving distributed algorithm for optimizing a
convex function consisting of several possibly non-convex functions. Each
individual objective function is privately stored with an agent while the
agents communicate model parameters with neighbor machines connected in a
network. We show that our algorithm can correctly optimize the overall
objective function and learn the underlying model accurately. We further prove
that under a vertex connectivity condition on the topology, our algorithm
preserves privacy of individual objective functions. We establish limits on what a coalition of adversaries can learn by observing the messages and states shared over the network.
Mohammad Roohitavaf, Sandeep Kulkarni
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Causal consistency is an intermediate consistency model that can be achieved together with high availability and high performance requirements even in the presence of network partitions. There are several proposals in the literature for causally consistent data stores. Thanks to the use of single scalar physical clocks, GentleRain has a throughput higher than other proposals such as COPS or Orbe. However, both its correctness and its performance rely on monotonic, synchronized physical clocks. Specifically, if physical clocks go backward, its correctness is violated. In addition, GentleRain is sensitive to clock synchronization, and clock skew may slow write operations in GentleRain. In this paper, we solve this issue in GentleRain by using Hybrid Logical Clocks (HLC) instead of physical clocks. Using HLC, the GentleRain protocol is no longer sensitive to clock skew. In addition, even if clocks go backward, the correctness of the system is not violated. Furthermore, with HLC, we timestamp versions with a clock very close to the physical clocks. Thus, we can take a causally consistent snapshot of the system at any given physical time. We call the GentleRain protocol with HLCs GentleRain+. We have implemented the GentleRain+ protocol and evaluated it experimentally. GentleRain+ provides faster write operations compared to GentleRain, which relies solely on physical clocks to achieve causal consistency. We have also shown that using HLC instead of physical clocks does not have any overhead. Thus, it makes GentleRain more robust to clock anomalies at no cost.
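For reference, the standard HLC update rules that such a protocol builds on look roughly as follows; this is a generic HLC sketch (after Kulkarni et al.), not the GentleRain+ implementation.

```python
import time

class HLC:
    """Hybrid Logical Clock: timestamps (l, c) stay within clock drift of
    physical time yet remain monotonic even if the clock goes backward."""
    def __init__(self, now=time.time):
        self.now = now
        self.l = 0.0   # largest physical time seen so far
        self.c = 0     # logical counter breaking ties within one l

    def send(self):
        """Tick for a local or send event."""
        l_old = self.l
        self.l = max(l_old, self.now())
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, m_l, m_c):
        """Merge a received timestamp (m_l, m_c) into the local clock."""
        l_old = self.l
        self.l = max(l_old, m_l, self.now())
        if self.l == l_old and self.l == m_l:
            self.c = max(self.c, m_c) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == m_l:
            self.c = m_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```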
Comments: 11 pages, 10 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
The surging interest in blockchain technology has revitalized the search for
effective Byzantine consensus schemes. In particular, the blockchain community
has been looking for ways to effectively integrate traditional Byzantine
fault-tolerant (BFT) protocols into a blockchain consensus layer allowing
various financial institutions to securely agree on the order of transactions.
However, existing BFT protocols can only scale to tens of nodes due to their $O(n^2)$ message complexity.
In this paper, we propose FastBFT, the fastest and most scalable BFT protocol to date. At the heart of FastBFT is a novel message aggregation technique that
combines hardware-based trusted execution environments (TEEs) with lightweight
secret sharing primitives. Combining this technique with several other
optimizations (i.e., optimistic execution, tree topology and failure
detection), FastBFT achieves low latency and high throughput even for large
scale networks. Via systematic analysis and experiments, we demonstrate that
FastBFT has better scalability and performance than previous BFT protocols.
Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs
Comments: NIPS 2016 Workshop on Machine Learning Systems
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
We describe a computationally efficient, stochastic graph-regularization
technique that can be utilized for the semi-supervised training of deep neural
networks in a parallel or distributed setting. We utilize a technique, first
described in [13] for the construction of mini-batches for stochastic gradient
descent (SGD) based on synthesized partitions of an affinity graph that are
consistent with the graph structure, but also preserve enough stochasticity for
convergence of SGD to good local minima. We show how our technique allows a
graph-based semi-supervised loss function to be decomposed into a sum over
objectives, facilitating data parallelism for scalable training of machine
learning models. Empirical results indicate that our method significantly
improves classification accuracy compared to the fully-supervised case when the
fraction of labeled data is low, and in the parallel case, achieves significant
speed-up in terms of wall-clock time to convergence. We show the results for
both sequential and distributed-memory semi-supervised DNN training on a speech
corpus.
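Since the whole point is that the graph-regularized loss decomposes into per-batch terms, a toy version of one such term may help; this is a generic Laplacian-style sketch under my own naming, not the authors' code:

```python
import numpy as np

def batch_loss(p, y, labeled, edges, lam=1.0):
    """p: (b, C) predicted class probabilities for the nodes in one
    mini-batch; y: labels; labeled: indices of labeled nodes in the
    batch; edges: (i, j) index pairs internal to the batch.
    Cross-entropy on labeled nodes plus a graph smoothness penalty."""
    ce = -np.mean(np.log(p[labeled, y[labeled]] + 1e-12))
    smooth = sum(np.sum((p[i] - p[j]) ** 2) for i, j in edges)
    return ce + lam * smooth / max(len(edges), 1)

# Toy batch: 4 nodes, 2 labeled, edges consistent with the graph.
p = np.array([[.9, .1], [.8, .2], [.3, .7], [.2, .8]])
y = np.array([0, 0, 1, 1])
print(batch_loss(p, y, labeled=np.array([0, 3]), edges=[(0, 1), (2, 3)]))
```

If mini-batches are synthesized so that most edges fall inside a batch (the partitioning from [13]), summing this term over batches approximates the full graph-regularized objective, and each batch can be dispatched to a different worker.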
Comments: 9 pages, 4 figures
Subjects:
Learning (cs.LG)
; Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We present a method for implementing an Efficient Unitary Neural Network
(EUNN) whose computational complexity is merely O(1) per parameter
and has full tunability, from spanning part of unitary space to all of it. We
apply the EUNN in Recurrent Neural Networks, and test its performance on the
standard copying task and the MNIST digit recognition benchmark, finding that
it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
partial space URNN and a projective URNN with comparable parameter numbers.
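The EUNN's exact parameterization is in the paper; the general idea of building a tunable unitary matrix from many cheap two-dimensional rotations, each carrying O(1) parameters, can be sketched with the generic Givens-rotation construction (illustrative only, not the authors' code):

```python
import numpy as np

def givens(n, i, j, theta, phi):
    """Unitary rotation acting only on coordinates i and j."""
    g = np.eye(n, dtype=complex)
    g[i, i] = np.cos(theta) * np.exp(1j * phi)
    g[i, j] = -np.sin(theta)
    g[j, i] = np.sin(theta) * np.exp(1j * phi)
    g[j, j] = np.cos(theta)
    return g

def random_unitary_product(n, n_rotations, rng):
    """Compose Givens rotations into an n x n unitary matrix.
    Using few rotations spans part of the unitary space; using
    O(n^2) of them approaches all of it, mirroring 'tunability'."""
    u = np.eye(n, dtype=complex)
    for _ in range(n_rotations):
        i, j = rng.choice(n, size=2, replace=False)
        u = givens(n, i, j, rng.uniform(0, 2 * np.pi),
                   rng.uniform(0, 2 * np.pi)) @ u
    return u

rng = np.random.default_rng(0)
U = random_unitary_product(8, 20, rng)
assert np.allclose(U.conj().T @ U, np.eye(8))
```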
Harm van Seijen , Mehdi Fatemi , Joshua Romoff Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)
In this paper, we propose a framework for solving a single-agent task by
using multiple agents, each focusing on different aspects of the task. This
approach has two main advantages: 1) it allows for specialized agents for
different parts of the task, and 2) it provides a new way to transfer
knowledge, by transferring trained agents. Our framework generalizes the
traditional hierarchical decomposition, in which, at any moment in time, a
single agent has control until it has solved its particular subtask. We
illustrate our framework using a number of examples.
Lukas Balles , Javier Romero , Philipp Hennig Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Mini-batch stochastic gradient descent and variants thereof have become
standard for large-scale empirical risk minimization like the training of
neural networks. These methods are usually used with a constant batch size
chosen by simple empirical inspection. The batch size significantly influences
the behavior of the stochastic optimization algorithm, though, since it
determines the variance of the gradient estimates. This variance also changes
over the optimization process; when using a constant batch size, stability and
convergence are thus often enforced by means of a (manually tuned) decreasing
learning rate schedule. We propose a practical method for dynamic batch size
adaptation. It estimates the variance of the stochastic gradients and adapts
the batch size to decrease the variance proportionally to the value of the
objective function, removing the need for the aforementioned learning rate
decrease. In contrast to recent related work, our algorithm couples the batch
size to the learning rate, directly reflecting the known relationship between
the two. On three image classification benchmarks, our batch size adaptation
yields faster optimization convergence, while simultaneously simplifying
learning rate tuning. A TensorFlow implementation is available.
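The abstract's core rule, a batch size coupled to gradient variance and to the objective value, can be caricatured in a few lines; the heuristic below is an assumption-laden sketch (the name adapt_batch_size, the clipping bounds, and the exact proportionality are mine, not the paper's):

```python
import numpy as np

def adapt_batch_size(per_example_grads, loss, lr, b_min=16, b_max=4096):
    """Pick the next batch size so that gradient noise stays roughly
    proportional to the current objective value: as the loss shrinks,
    the variance budget shrinks and the batch grows.
    per_example_grads: (b, d) array of per-example gradients."""
    g_bar = per_example_grads.mean(axis=0)
    # Trace of the per-example gradient covariance (total variance).
    var = np.mean(np.sum((per_example_grads - g_bar) ** 2, axis=1))
    # Heuristic rule: batch size ~ lr * variance / objective value.
    b = int(lr * var / max(loss, 1e-8))
    return int(np.clip(b, b_min, b_max))

rng = np.random.default_rng(0)
g = rng.standard_normal((128, 10))      # fake per-example gradients
print(adapt_batch_size(g, loss=0.5, lr=0.1))
```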
Comments: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy
Subjects:
Learning (cs.LG)
; Sound (cs.SD)
Chord recognition systems depend on robust feature extraction pipelines.
While these pipelines are traditionally hand-crafted, recent advances in
end-to-end machine learning have begun to inspire researchers to explore
data-driven methods for such tasks. In this paper, we present a chord
recognition system that uses a fully convolutional deep auditory model for
feature extraction. The extracted features are processed by a Conditional
Random Field that decodes the final chord sequence. Both processing stages are
trained automatically and do not require expert knowledge for optimising
parameters. We show that the learned auditory system extracts musically
interpretable features, and that the proposed chord recognition system achieves
results on par with or better than state-of-the-art algorithms.
Comments: Published In Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the matching of short music audio snippets to the
corresponding pixel location in images of sheet music. A system is presented
that simultaneously learns to read notes, listens to music and matches the
currently played music to its corresponding notes in the sheet. It consists of
an end-to-end multi-modal convolutional neural network that takes as input
images of sheet music and spectrograms of the respective audio snippets. It
learns to predict, for a given unseen audio snippet (covering approximately one
bar of music), the corresponding position in the respective score line. Our
results suggest that with the use of (deep) neural networks — which have
proven to be powerful image processing models — working with sheet music
becomes feasible and a promising future research direction.
Comments: Accepted as an oral presentation at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects:
Learning (cs.LG)
; Information Theory (cs.IT)
Compressive sensing (CS) is a promising technology for realizing
energy-efficient wireless sensors for long-term health monitoring. However,
conventional model-driven CS frameworks suffer from limited compression ratio
and reconstruction quality when dealing with physiological signals, due to
inaccurate models and their neglect of individual variability. In this paper,
we propose a data-driven CS framework that can learn signal characteristics
and personalized features from any individual recording of physiologic signals
to enhance CS performance with a minimized number of measurements. Such
improvements are accomplished by a co-training approach that optimizes the
sensing matrix and the dictionary towards improved restricted isometry
property and signal sparsity, respectively. Experimental results on ECG
signals show that the proposed method, at a compression ratio of 10x,
successfully reduces the isometry constant of the trained sensing matrices by
86% relative to random matrices and improves the overall reconstructed
signal-to-noise ratio by 15 dB over conventional model-driven approaches.
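The co-training target, loosely, is to make the product of the sensing matrix and the dictionary behave like an isometry while keeping representations sparse. A common computable proxy for the restricted isometry property is the mutual coherence of that product; a quick sketch of measuring it (illustrative only, hypothetical shapes):

```python
import numpy as np

def mutual_coherence(phi, d):
    """Largest normalized inner product between distinct columns of
    the effective dictionary A = phi @ d; lower is better for
    compressive sensing recovery."""
    a = phi @ d
    a = a / np.linalg.norm(a, axis=0, keepdims=True)
    gram = np.abs(a.T @ a)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

rng = np.random.default_rng(0)
phi = rng.standard_normal((25, 250)) / 5.0   # 10x compression
d = np.eye(250)                              # identity dictionary
print(mutual_coherence(phi, d))
```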
Ian Dewancker , Michael McCourt , Scott Clark Subjects : Learning (cs.LG)
The engineering of machine learning systems is still a nascent field, relying
on a seemingly daunting collection of quickly evolving tools and best
practices. It is our hope that this guidebook will serve as a useful resource
for machine learning practitioners looking to take advantage of Bayesian
optimization techniques. We outline four example machine learning problems that
can be solved using open source machine learning libraries, and highlight the
benefits of using Bayesian optimization in the context of these common machine
learning applications.
Hoel Le Capitaine Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
A number of machine learning algorithms use a metric, or distance, to compare
individuals. The Euclidean distance is usually employed, but it may be more
effective to learn a parametric distance such as the Mahalanobis metric.
Learning such a metric has been a hot topic for more than ten years now, and a
number of methods have been proposed to learn it efficiently. However, the
nature of the problem makes it quite difficult for large-scale data, as well
as for data in which classes overlap. This paper presents a simple way of
improving the accuracy and scalability of any iterative metric learning
algorithm in which constraints are obtained prior to running the algorithm.
The proposed approach relies on a loss-dependent weighted selection of the
constraints that are used for learning the metric. With the corresponding
dedicated loss function, the method clearly obtains better results than
state-of-the-art methods, both in terms of accuracy and time complexity.
Experimental results on real-world, and potentially large, datasets
demonstrate the effectiveness of our proposal.
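For background, the Mahalanobis family being learned is just a Euclidean distance in a linearly transformed space: d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L positive semidefinite. A minimal sketch of that definition (generic background, not the paper's algorithm):

```python
import numpy as np

def mahalanobis(x, y, L):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L,
    i.e. the Euclidean distance after mapping z -> L z."""
    diff = L @ (x - y)
    return np.sqrt(diff @ diff)

# With L = I this reduces to the ordinary Euclidean distance.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)
L = rng.standard_normal((5, 5))
print(mahalanobis(x, y, np.eye(5)), mahalanobis(x, y, L))
```

Metric learning algorithms then fit L (or M directly) so that pairs constrained to be similar end up close and dissimilar pairs end up far apart.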
Shripad Gade , Nitin H. Vaidya Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Learning (cs.LG); Optimization and Control (math.OC)
Continual data collection and widespread deployment of machine learning
algorithms, particularly the distributed variants, have raised new privacy
challenges. In a distributed machine learning scenario, the dataset is stored
among several machines and they solve a distributed optimization problem to
collectively learn the underlying model. We present a secure multi-party
computation inspired privacy preserving distributed algorithm for optimizing a
convex function consisting of several possibly non-convex functions. Each
individual objective function is privately stored with an agent while the
agents communicate model parameters with neighbor machines connected in a
network. We show that our algorithm can correctly optimize the overall
objective function and learn the underlying model accurately. We further prove
that under a vertex connectivity condition on the topology, our algorithm
preserves the privacy of individual objective functions. We establish limits
on what a coalition of adversaries can learn by observing the messages and
states shared over the network.
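A bare-bones sketch of the decentralized setting the abstract describes: each agent holds a private objective, takes local gradient steps, and mixes parameters with its neighbors. The privacy machinery is the paper's contribution and is omitted here; everything below is a generic illustration with hypothetical names:

```python
import numpy as np

def decentralized_gd(grads, W, x0, lr=0.1, iters=200):
    """grads: per-agent gradient functions of the private f_i.
    W: doubly stochastic mixing matrix matching the network topology.
    Each agent averages neighbors' iterates, then steps on its own f_i."""
    x = np.tile(x0, (len(grads), 1))
    for _ in range(iters):
        x = W @ x                                  # consensus step
        x -= lr * np.array([g(xi) for g, xi in zip(grads, x)])
    return x.mean(axis=0)

# Example: agents privately hold f_i(x) = ||x - c_i||^2 on a ring.
n, d = 4, 3
cs = np.arange(n * d, dtype=float).reshape(n, d)
grads = [lambda x, c=c: 2 * (x - c) for c in cs]
W = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, 0)
                              + np.roll(np.eye(n), -1, 0))
print(decentralized_gd(grads, W, np.zeros(d)))   # ~ mean of the c_i
```

The vertex-connectivity condition in the abstract concerns what the exchanged iterates reveal about each private f_i, which this plain version makes no attempt to hide.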
CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction
Comments: 10 pages, 6 figures, 2 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG)
In this paper, we develop a deep neural network architecture called
“CSVideoNet” that can learn visual representations from random measurements for
compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
trainable and non-iterative model that combines convolutional neural networks
(CNNs) with a recurrent neural network (RNN) to facilitate video
reconstruction by leveraging temporal-spatial features. The proposed network
can accept random measurements with a multi-level compression ratio (CR). The
lightly and aggressively compressed measurements offer background information
and object details, respectively. This is similar to the variable bit rate
techniques widely used in conventional video coding approaches. The RNN
employed by CSVideoNet can leverage temporal coherence that exists in adjacent
video frames to extrapolate motion features and merge them with spatial visual
features extracted by the CNNs to further enhance reconstruction quality,
especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
Experimental results show that CSVideoNet outperforms the existing video CS
reconstruction approaches. The results demonstrate that our method can preserve
fine visual details from the original videos even at a 100x CR,
which is difficult to achieve with the reference approaches. Also, the
non-iterative nature of CSVideoNet decreases runtime by three
orders of magnitude over iterative reconstruction algorithms. Furthermore,
CSVideoNet can enhance the CR of CS cameras beyond the limitation of
conventional approaches, ensuring a reduction in bandwidth for data
transmission. These benefits are especially favorable to high-frame-rate video
applications.
Comments: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY
Subjects:
Sound (cs.SD)
; Learning (cs.LG)
In an attempt to explore the limitations of simple approaches to the task
of piano transcription (as usually defined in MIR), we conduct an in-depth
analysis of neural network-based framewise transcription. We systematically
compare different popular input representations for transcription systems to
determine the ones most suitable for use with neural networks. Exploiting
recent advances in training techniques and new regularizers, and taking into
account hyper-parameter tuning, we show that it is possible, by simple
bottom-up frame-wise processing, to obtain a piano transcriber that outperforms
the current published state of the art on the publicly available MAPS dataset
— without any complex post-processing steps. Thus, we propose this simple
approach as a new baseline for this dataset, for future transcription research
to build on and improve.
Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain
Subjects:
Sound (cs.SD)
; Information Retrieval (cs.IR); Learning (cs.LG)
This paper demonstrates the feasibility of learning to retrieve short
snippets of sheet music (images) when given a short query excerpt of music
(audio), and vice versa, without any symbolic representation of the music or
scores. This would be highly useful in many content-based musical retrieval
scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
and learns correlated latent spaces allowing for cross-modality retrieval in
both directions. Initial experiments with relatively simple monophonic music
show promising results.
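The backbone of DCCA is the classical CCA objective: find projections of the two views whose outputs are maximally correlated. A linear-CCA sketch conveys it (the deep version substitutes network embeddings for x and y; names and shapes here are illustrative):

```python
import numpy as np

def linear_cca(x, y, k, reg=1e-6):
    """Top-k canonical correlations between paired views x, y.
    Whiten each view, then take the SVD of the cross-covariance."""
    x = x - x.mean(0)
    y = y - y.mean(0)
    n = len(x)
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n

    def inv_sqrt(c):
        w, v = np.linalg.eigh(c)
        return v @ np.diag(w ** -0.5) @ v.T

    t = inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy)
    return np.linalg.svd(t, compute_uv=False)[:k]

# Two noisy views of a shared latent factor z; retrieval then ranks
# candidates by proximity in the learned correlated latent space.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 4))
x = z @ rng.standard_normal((4, 10)) + 0.1 * rng.standard_normal((500, 10))
y = z @ rng.standard_normal((4, 8)) + 0.1 * rng.standard_normal((500, 8))
print(linear_cca(x, y, k=4))   # near 1 for the shared directions
```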
Comments: In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA, 2016
Subjects:
Sound (cs.SD)
; Learning (cs.LG)
We explore frame-level audio feature learning for chord recognition using
artificial neural networks. We present the argument that chroma vectors
potentially hold enough information to model harmonic content of audio for
chord recognition, but that standard chroma extractors compute features that
are too noisy. This leads us to propose a learned chroma feature extractor based on
artificial neural networks. It is trained to compute chroma features that
encode harmonic information important for chord recognition, while being robust
to irrelevant interferences. We achieve this by feeding the network an audio
spectrum with context instead of a single frame as input. This way, the network
can learn to selectively compensate noise and resolve harmonic ambiguities.
We compare the resulting features to hand-crafted ones by using a simple
linear frame-wise classifier for chord recognition on various data sets. The
results show that the learned feature extractor produces superior chroma
vectors for chord recognition.
Comments: 4 pages, 11 pages of supplementary information
Subjects:
Statistical Mechanics (cond-mat.stat-mech)
; Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Reconstruction of structure and parameters of a graphical model from binary
samples is a problem of practical importance in a variety of disciplines,
ranging from statistical physics and computational biology to image processing
and machine learning. The focus of the research community has shifted towards
developing universal reconstruction algorithms which are both computationally
efficient and require the minimal amount of expensive data. We introduce a new
method, Interaction Screening, which accurately estimates the model parameters
using local optimization problems. The algorithm provably achieves perfect
graph structure recovery with an information-theoretically optimal number of
samples and outperforms state of the art techniques, especially in the
low-temperature regime which is known to be the hardest for learning. We assess
the efficacy of Interaction Screening through extensive numerical tests on
Ising models of various topologies and with different types of interactions,
ranging from ferromagnetic to spin-glass.
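For intuition, the Interaction Screening estimator fits each spin u by minimizing a local convex objective of roughly the following form; this is my paraphrase of the screening objective, and the exact estimator, regularization, and guarantees are in the paper:

```python
import numpy as np

def iso(theta, s, u):
    """Interaction screening objective for spin u (sketch).
    s: (n_samples, n_spins) array of +/-1 samples.
    theta: couplings to the other spins plus a local field."""
    others = np.delete(s, u, axis=1)
    field = others @ theta[:-1] + theta[-1]
    return np.mean(np.exp(-s[:, u] * field))

def fit_spin(s, u, lr=0.05, iters=2000):
    """Minimize the (convex) objective by plain gradient descent;
    the recovered couplings reveal the neighbors of spin u."""
    n_spins = s.shape[1]
    theta = np.zeros(n_spins)       # n_spins - 1 couplings + field
    others = np.delete(s, u, axis=1)
    for _ in range(iters):
        w = np.exp(-s[:, u] * (others @ theta[:-1] + theta[-1]))
        coef = -s[:, u] * w
        grad = np.concatenate([others.T @ coef, [coef.sum()]]) / len(s)
        theta -= lr * grad
    return theta
```

Running fit_spin for every node and keeping the significant couplings yields the graph structure; the paper adds sparsity-promoting regularization and proves sample-optimal recovery.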
Comments: 11 pages, 8 figures
Subjects:
Social and Information Networks (cs.SI)
; Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
We address the problem of semi-supervised learning in relational networks,
networks in which nodes are entities and links are the relationships or
interactions between them. Typically this problem is confounded with the
problem of graph-based semi-supervised learning (GSSL), because both problems
represent the data as a graph and predict the missing class labels of nodes.
However, not all graphs are created equal. In GSSL a graph is constructed,
often from independent data, based on similarity. As such, edges tend to
connect instances with the same class label. Relational networks, however, can
be more heterogeneous and edges do not always indicate similarity. For
instance, instead of links being more likely to connect nodes with the same
class label, they may occur more frequently between nodes with different class
labels (link-heterogeneity). Or nodes with the same class label do not
necessarily have the same type of connectivity across the whole network
(class-heterogeneity), e.g. in a network of sexual interactions we may observe
links between opposite genders in some parts of the graph and links between the
same genders in others. Performing classification in networks with different
types of heterogeneity is a hard problem that is made harder still when we do
not know a priori the type or level of heterogeneity. Here we present two
scalable approaches for graph-based semi-supervised learning for the more
general case of relational networks. We demonstrate these approaches on
synthetic and real-world networks that display different link patterns within
and between classes. Compared to state-of-the-art approaches, ours give better
classification performance without prior knowledge of how classes interact. In
particular, our two-step label propagation algorithm gives consistently good
accuracy and runs on networks of over 1.6 million nodes and 30 million edges in
around 12 seconds.
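Plain label propagation, the primitive their two-step variant builds on, is compact enough to sketch; the scalability claim follows from each iteration being a sparse matrix product. This is the generic algorithm, not the authors' two-step method:

```python
import numpy as np
import scipy.sparse as sp

def label_propagation(adj, labels, n_classes, iters=50):
    """adj: sparse adjacency matrix; labels: -1 for unlabeled nodes.
    Repeatedly average neighbor label distributions, clamping the
    labeled seed nodes, then return the argmax class per node."""
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel()
    p = sp.diags(1.0 / np.maximum(deg, 1)) @ adj   # row-normalized
    f = np.zeros((n, n_classes))
    seed = labels >= 0
    f[seed, labels[seed]] = 1.0
    for _ in range(iters):
        f = p @ f
        f[seed] = 0.0
        f[seed, labels[seed]] = 1.0                # clamp the seeds
    return f.argmax(axis=1)

# Tiny graph: two triangles joined by one edge, one seed per class.
rows = [0, 1, 1, 2, 0, 2, 3, 4, 4, 5, 3, 5, 2, 3]
cols = [1, 0, 2, 1, 2, 0, 4, 3, 5, 4, 5, 3, 3, 2]
adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))
labels = np.array([0, -1, -1, -1, -1, 1])
print(label_propagation(adj, labels, n_classes=2))  # -> [0 0 0 1 1 1]
```

Each iteration costs O(edges), which is consistent with the reported runtimes on graphs of millions of nodes; handling link- and class-heterogeneity is what the paper's two-step variant adds on top.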
Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
Subjects:
Machine Learning (stat.ML)
; Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
We demonstrate the possibility of classifying causal systems into kinds that
share a common structure without first constructing an explicit dynamical model
or using prior knowledge of the system dynamics. The algorithmic ability to
determine whether arbitrary systems are governed by causal relations of the
same form offers significant practical applications in the development and
validation of dynamical models. It is also of theoretical interest as an
essential stage in the scientific inference of laws from empirical data. The
algorithm presented is based on the dynamical symmetry approach to dynamical
kinds. A dynamical symmetry with respect to time is an intervention on one or
more variables of a system that commutes with the time evolution of the system.
A dynamical kind is a class of systems sharing a set of dynamical symmetries.
The algorithm presented classifies deterministic, time-dependent causal systems
by directly comparing their exhibited symmetries. Using simulated, noisy data
from a variety of nonlinear systems, we show that this algorithm correctly
sorts systems into dynamical kinds. It is robust under significant sampling
error, is immune to violations of normality in sampling error, and fails
gracefully with increasing dynamical similarity. The algorithm we demonstrate
is the first to address this aspect of automated scientific discovery.
Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization
Comments: InterSpeech Workshop on Machine Learning in Speech and Language Processing, 2016
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
We describe a graph-based semi-supervised learning framework in the context
of deep neural networks that uses a graph-based entropic regularizer to favor
smooth solutions over a graph induced by the data. The main contribution of
this work is a computationally efficient, stochastic graph-regularization
technique that uses mini-batches that are consistent with the graph structure,
but also provides enough stochasticity (in terms of mini-batch data diversity)
for convergence of stochastic gradient descent methods to good solutions. For
this work, we focus on results of frame-level phone classification accuracy on
the TIMIT speech corpus but our method is general and scalable to much larger
data sets. Results indicate that our method significantly improves
classification accuracy compared to the fully-supervised case when the fraction
of labeled data is low, and it is competitive with other methods in the fully
labeled case.
Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration
Comments: 4 Figures, 1 Table
Subjects:
Machine Learning (stat.ML)
; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Objective: The advent of Electronic Medical Records (EMR) with large
electronic imaging databases along with advances in deep neural networks with
machine learning has provided a unique opportunity to achieve milestones in
automated image analysis. Optical coherence tomography (OCT) is the most
commonly obtained imaging modality in ophthalmology and represents a dense and
rich dataset when combined with labels derived from the EMR. We sought to
determine if deep learning could be utilized to distinguish normal OCT images
from images from patients with Age-related Macular Degeneration (AMD). Methods:
Automated extraction of an OCT imaging database was performed and linked to
clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
from EPIC. The central 11 images were selected from each OCT scan of two
cohorts of patients: normal and AMD. Cross-validation was performed using a
random subset of patients. Areas under the receiver operating characteristic curve (auROC) were
constructed at an independent image level, macular OCT level, and patient
level. Results: Of an extraction of 2.6 million OCT images linked to clinical
datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
selected. A deep neural network was trained to categorize images as either
normal or AMD. At the image level, we achieved an auROC of 92.78% with an
accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
effective for classifying OCT images. These findings have important
implications for utilizing OCT in automated screening and computer-aided
diagnosis tools.
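Evaluation at the image, macula, and patient levels reduces to computing auROC over scores pooled at each level; a small rank-sum sketch of the estimator (generic, with made-up data):

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum identity:
    P(score of a random positive > score of a random negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

scores = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(auroc(scores, labels))   # 0.75
```

Patient-level auROC would simply aggregate (e.g., average) the image-level scores within each patient before applying the same estimator.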
Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences
Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
User acceptance of artificial intelligence agents might depend on their
ability to explain their reasoning, which requires adding an interpretability
layer that helps users understand their behavior. This paper focuses on adding
an interpretable layer on top of Semantic Textual Similarity (STS), which
measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of
segments across the two sentences, where the relation between the segments is
labeled with a relation type and a similarity score. We present a publicly
available dataset of sentence pairs annotated following the formalization. We
then develop a system trained on this dataset which, given a sentence pair,
explains what is similar and different, in the form of graded and typed
segment alignments. When evaluated on the dataset, the system performs better
than an informed baseline, showing that the dataset and task are well-defined
and feasible. Most importantly, two user studies show how the system output
can be used to automatically produce explanations in natural language. Users
performed better when having access to the explanations, providing preliminary
evidence that our dataset and method to automatically produce explanations are
useful in real applications.
Comments: To appear in Tenth ACM International conference on Web Search and Data Mining (WSDM) in 2017
Subjects:
Social and Information Networks (cs.SI)
; Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
Learning from the crowd has become increasingly popular in the Web and social
media. There is a wide variety of crowdlearning sites in which, on the one
hand, users learn from the knowledge that other users contribute to the site,
and, on the other hand, knowledge is reviewed and curated by the same users
using assessment measures such as upvotes or likes.
In this paper, we present a probabilistic modeling framework of
crowdlearning, which uncovers the evolution of a user’s expertise over time by
leveraging other users’ assessments of her contributions. The model allows for
both off-site and on-site learning and captures forgetting of knowledge. We
then develop a scalable estimation method to fit the model parameters from
millions of recorded learning and contributing events. We show the
effectiveness of our model by tracing activity of ~25 thousand users in Stack
Overflow over a 4.5 year period. We find that answers with high knowledge value
are rare. Newbies and experts tend to acquire less knowledge than users in the
middle range. Prolific learners also tend to be proficient contributors who
post answers with high knowledge value.
Lossy Transmission of Correlated Sources over a Multiple Access Channel: Necessary Conditions and Separation Results
Comments: Submitted to IEEE Transactions on Information Theory on Nov 30, 2016
Subjects:
Information Theory (cs.IT)
Lossy communication of correlated sources over a multiple access channel is
studied. First, lossy communication is investigated in the presence of
correlated decoder side information. An achievable joint source-channel coding
scheme is presented, and the conditions under which separate source and channel
coding is optimal are explored. It is shown that separation is optimal when the
encoders and the decoder have access to a common observation conditioned on
which the two sources are independent. Separation is shown to be optimal also
when only the encoders have access to such a common observation whose lossless
recovery is required at the decoder. Moreover, the optimality of separation is
shown for sources with a common part, and sources with reconstruction
constraints. Next, these results obtained for the system in the presence of side
information are utilized to provide a set of necessary conditions for the
transmission of correlated sources over a multiple access channel without side
information. The identified necessary conditions are specialized to the case of
bivariate Gaussian sources over a Gaussian multiple access channel, and are
shown to be tighter than known results in the literature in certain cases. Our
results indicate that side information can have a significant impact on the
optimality of source-channel separation in lossy transmission, in addition to
being instrumental in identifying necessary conditions for the transmission of
correlated sources when no side information is present.
Comments: Pre-print, submitted for review
Subjects:
Information Theory (cs.IT)
; Optimization and Control (math.OC)
The roll-out of smart meters in electricity networks introduces risks for
consumer privacy due to increased measurement frequency and granularity.
Through various Non-Intrusive Load Monitoring techniques, consumer behavior may
be inferred from their metering data. In this paper, we propose an energy
management method that protects privacy through the minimization of information
leakage. The method is based on a Model Predictive Controller that utilizes
energy storage and local generation, and that predicts the effects of its
actions on the statistics of the actual energy consumption of a consumer and
that seen by the grid. Computationally, the method requires solving a
Mixed-Integer Quadratic Program of manageable size whenever new meter readings
are available. We simulate the controller on generated residential load
profiles with different privacy costs in a two-tier time-of-use energy pricing
environment. Results show that information leakage is effectively reduced at
the expense of increased energy cost. The results also show that, using the
proposed controller, the consumer load profile seen by the grid resembles a
mixture between that obtained with Non-Intrusive Load Leveling and Lazy
Stepping.
Jessalyn Bolkema , Heide Gluesing-Luerssen , Christine A. Kelley , Kristin Lauter , Beth Malmskog , Joachim Rosenthal Subjects : Information Theory (cs.IT) ; Cryptography and Security (cs.CR)
Two variations of the McEliece cryptosystem are presented. The first one is
based on a relaxation of the column permutation in the classical McEliece
scrambling process. This is done in such a way that the Hamming weight of the
error, added in the encryption process, can be controlled so that efficient
decryption remains possible. The second variation is based on the use of
spatially coupled moderate-density parity-check codes as secret codes. These
codes are known for their excellent error-correction performance and allow for
a relatively low key size in the cryptosystem. For both variants the security
with respect to known attacks is discussed.
Comments: submitted for publication
Subjects:
Information Theory (cs.IT)
In this paper, we consider a multi-user wireless system with one full duplex
(FD) base station (BS) serving a set of half duplex (HD) mobile users. To cope
with the in-band self-interference (SI) and co-channel interference, we
formulate a quality-of-service (QoS) based linear transceiver design problem.
The problem jointly optimizes the downlink (DL) and uplink (UL) beamforming
vectors of the BS and the transmission powers of UL users so as to provide
both the DL and UL users with guaranteed signal-to-interference-plus-noise
ratio performance, using a minimum UL and DL transmission sum power. The
considered system model not only takes into account noise caused by non-ideal
RF circuits and analog/digital SI cancellation, but also constrains the
maximum signal power at the input of the analog-to-digital converter (ADC) to
avoid signal distortion due to finite ADC precision. The formulated design
problem is not convex and is challenging to solve in general. We first show
that for a special case where the SI channel estimation errors are independent
and identically distributed, the QoS-based linear transceiver design problem
is globally solvable by a polynomial-time bisection algorithm. For the general
case, we propose a suboptimal algorithm based on alternating optimization
(AO). The AO algorithm is guaranteed to converge to a Karush-Kuhn-Tucker
solution. To reduce the complexity of the AO algorithm, we further develop a
fixed-point method by extending the classical uplink-downlink duality in HD
systems to the FD system. Simulation results are presented to demonstrate the
performance of the proposed algorithms and the comparison with HD systems.
Comments: Submitted for possible journal publication
Subjects:
Information Theory (cs.IT)
This paper considers the joint antenna selection (AS) problem for a classical
two-user MIMO non-orthogonal multiple access (NOMA) system, where both the base
station (BS) and users (UEs) are equipped with multiple antennas. Specifically,
several computationally-efficient AS algorithms are developed for two
commonly-used NOMA scenarios: fixed power allocation NOMA (F-NOMA) and
cognitive radio-inspired NOMA (CR-NOMA). For the F-NOMA system, two novel AS
schemes, namely max-max-max AS (A^3-AS) and max-min-max AS (AIA-AS), are
proposed to maximize the system sum-rate, without and with the consideration of
user fairness, respectively. In the CR-NOMA network, a novel AS algorithm,
termed maximum-channel-gain-based AS (MCG-AS), is proposed to maximize the
achievable rate of the secondary user, under the condition that the primary
user’s quality of service requirement is satisfied. The asymptotic closed-form
expressions of the average sum-rate for A^3-AS and AIA-AS and that of the
average rate of the secondary user for MCG-AS are derived, respectively.
Numerical results demonstrate that the AIA-AS provides better user-fairness,
while the A^3-AS achieves a near-optimal sum-rate in F-NOMA systems. For the
CR-NOMA scenario, MCG-AS achieves a near-optimal performance in a wide SNR
regime. Furthermore, all the proposed AS algorithms yield a significant
computational complexity reduction, compared to exhaustive search-based
counterparts.
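As a hedged illustration of the max-max-max selection idea (pick the antenna pair with the largest channel gain), the sketch below uses my own naming, shapes, and a single-user simplification, not the paper's exact A^3-AS algorithm:

```python
import numpy as np

def max_max_max_as(h):
    """h: (n_bs, n_ue) matrix of channel gain magnitudes for one user.
    Return the (BS antenna, UE antenna) pair with the largest gain.
    Scanning the matrix costs O(n_bs * n_ue), versus exhaustive
    search over all joint antenna configurations across both users."""
    return np.unravel_index(np.argmax(np.abs(h)), h.shape)

rng = np.random.default_rng(0)
h_strong = rng.rayleigh(1.0, size=(4, 2))   # fading gains, near user
print(max_max_max_as(h_strong))
```

AIA-AS differs by protecting the weaker user (a max-min-max criterion), which is where the reported fairness gains come from.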
Hanaa Marshoud , Paschalis C. Sofotasios , Sami Muhaidat , Bayan S. Sharif , George K. Karagiannidis Subjects : Information Theory (cs.IT)
Multiple-input multiple-output (MIMO) techniques have recently demonstrated
significant potential in visible light communications (VLC), as they can
overcome the modulation bandwidth limitation and provide substantial
improvement in terms of spectral efficiency and link reliability. However, MIMO
systems typically suffer from inter-channel interference, which causes severe
degradation to the system performance. In this context, we propose a novel
optical adaptive precoding (OAP) scheme for the downlink of MIMO VLC systems,
which exploits the knowledge of transmitted symbols to enhance the effective
signal-to-interference-plus-noise ratio. We also derive bit-error-rate
expressions for the OAP under perfect and outdated channel state information
(CSI). Our results demonstrate that the proposed scheme is more robust to both
CSI error and channel correlation, compared to conventional channel inversion
precoding.
Anastasios Tsiamis , Konstantinos Gatsis , George J. Pappas Subjects : Systems and Control (cs.SY) ; Cryptography and Security (cs.CR); Information Theory (cs.IT)
We study the problem of remote state estimation, in the presence of an
eavesdropper. An authorized user estimates the state of a linear plant, based
on the data received from a sensor, while the data may also be intercepted by
the eavesdropper. To maintain confidentiality with respect to state, we
introduce a novel control-theoretic definition of perfect secrecy requiring
that the user’s expected error remains bounded while the eavesdropper’s
expected error grows unbounded. We propose a secrecy mechanism which guarantees
perfect secrecy by randomly withholding sensor information, under the condition
that the user’s packet reception rate is larger than the eavesdropper’s
interception rate. Given this mechanism, we also explore the tradeoff between
user’s utility and confidentiality with respect to the eavesdropper, via an
optimization problem. Finally, some examples are studied to provide insights
about this tradeoff.