
arXiv Paper Daily: Fri, 16 Dec 2016

Neural and Evolutionary Computing

Graphical RNN Models

Ashish Bora, Sugato Basu, Joydeep Ghosh

Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Many time series are generated by a set of entities that interact with one another over time. This paper introduces a broad, flexible framework to learn from multiple inter-dependent time series generated by such entities. Our framework explicitly models the entities and their interactions through time. It achieves this by building on the capabilities of Recurrent Neural Networks, while also offering several ways to incorporate domain knowledge/constraints into the model architecture. The capabilities of our approach are showcased through an application to weather prediction, which shows gains over strong baselines.


Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

Kien Tuong Phan, Tomas Henrique Maul, Tuong Thuy Vu, Lai Weng Kin

Comments: Pre-print. The final publication is available at Springer via this http URL

Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

In an attempt to solve the lengthy training times of neural networks, we proposed Parallel Circuits (PCs), a biologically inspired architecture. Previous work has shown that this approach fails to maintain generalization performance despite achieving sharp speed gains. To address this issue, and motivated by the way Dropout prevents node co-adaptation, in this paper we suggest an improvement by extending Dropout to the PC architecture. The paper provides multiple insights into this combination, including a variety of fusion approaches. Experiments show promising results in which improved error rates are achieved in most cases, whilst maintaining the speed advantage of the PC approach.


Learning binary or real-valued time-series via spike-timing dependent plasticity

Takayuki Osogami

Comments: This paper was accepted and presented at Computing with Spikes NIPS 2016 Workshop, Barcelona, Spain, December 2016

Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

A dynamic Boltzmann machine (DyBM) has been proposed as a model of a spiking neural network, and its learning rule of maximizing the log-likelihood of given time-series has been shown to exhibit key properties of spike-timing dependent plasticity (STDP), which had been postulated and experimentally confirmed in the field of neuroscience as a learning rule that refines the Hebbian rule. Here, we relax some of the constraints in the DyBM so that it becomes more suitable for computation and learning. We show that learning the DyBM can be considered as logistic regression for binary-valued time-series. We also show how the DyBM can learn real-valued data in the form of a Gaussian DyBM and discuss its relation to the vector autoregressive (VAR) model. The Gaussian DyBM extends the VAR by using additional explanatory variables, which correspond to the eligibility traces of the DyBM and capture long-term dependencies of the time-series. Numerical experiments show that the Gaussian DyBM significantly improves the predictive accuracy over VAR.
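To make the relation between the Gaussian DyBM and VAR concrete, here is a minimal Python sketch of the predicted mean of x_t: a VAR term over a finite window of lags plus a term on an eligibility trace. It assumes a single decay rate and a trace defined as e_t = decay * e_{t-1} + x_{t-1}; the names `update_trace`, `gaussian_dybm_mean` and `decay` are hypothetical, and the paper's exact parameterization (for example several decay rates, or traces that only cover lags beyond the VAR window) may differ.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def update_trace(trace: Sequence[float], x_prev: Sequence[float], decay: float) -> Vector:
    """Eligibility trace e_t = decay * e_{t-1} + x_{t-1} (hypothetical single-decay form)."""
    return [decay * e + x for e, x in zip(trace, x_prev)]


def gaussian_dybm_mean(bias: Sequence[float], lag_weights: Sequence[Matrix],
                       history: Sequence[Sequence[float]], trace_weights: Matrix,
                       trace: Sequence[float]) -> Vector:
    """Predicted mean of x_t; history[0] is x_{t-1}, history[1] is x_{t-2}, and so on."""
    mean = list(bias)
    # VAR part: one weight matrix per lag in a finite window.
    for w, x in zip(lag_weights, history):
        mean = [m + y for m, y in zip(mean, matvec(w, x))]
    # Extra explanatory variables: the trace summarizes the whole past, not just the window.
    return [m + y for m, y in zip(mean, matvec(trace_weights, trace))]
```

With `trace_weights` set to zeros the function reduces to an ordinary VAR prediction, which is the sense in which the Gaussian DyBM extends VAR.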

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Franck Dernoncourt, Ji Young Lee, Peter Szolovits

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Existing models based on artificial neural networks (ANNs) for sentence classification often do not incorporate the context in which sentences appear, and classify sentences individually. However, traditional sentence classification approaches have been shown to benefit greatly from jointly classifying subsequent sentences, for example with conditional random fields. In this work, we present an ANN architecture that combines the effectiveness of typical ANN models at classifying sentences in isolation with the strength of structured prediction. Our model achieves state-of-the-art results on two different datasets for sequential sentence classification in medical abstracts.
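The benefit of joint classification can be illustrated with Viterbi decoding over sentence labels, the standard inference step for a linear-chain model such as a CRF. The Python sketch below is generic and is not the paper's architecture: `emissions` stands in for the per-sentence scores an ANN would produce, `transitions` for learned label-to-label scores, and the three labels and all numbers are hypothetical.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence.

    emissions[t][k]: score of label k for sentence t.
    transitions[j][k]: score for label j being followed by label k.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for k in range(n_labels):
            # Best previous label j for ending sentence t with label k.
            best_j = max(range(n_labels), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            ptrs.append(best_j)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    labels = ["BACKGROUND", "METHODS", "RESULTS"]
    emissions = [[2.0, 0.0, 0.0], [0.0, 0.4, 0.5], [0.0, 0.0, 2.0]]
    transitions = [[0.0, 0.5, 0.0], [-1.0, 0.0, 0.5], [-1.0, -1.0, 0.0]]
    print([labels[k] for k in viterbi(emissions, transitions)])
```

Taken alone, the middle sentence would be labeled RESULTS (0.5 > 0.4), but the transition scores favor the order BACKGROUND, METHODS, RESULTS, so joint decoding assigns it METHODS.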

Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Max Tegmark, Marin Soljačić

Comments: 9 pages, 4 figures

Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We present a method for implementing an Efficient Unitary Neural Network (EUNN) whose computational complexity is merely $\mathcal{O}(1)$ per parameter and has full tunability, from spanning part of unitary space to all of it. We apply the EUNN in Recurrent Neural Networks, and test its performance on the standard copying task and the MNIST digit recognition benchmark, finding that it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively partial space URNN and a projective URNN with comparable parameter numbers.
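Where the constant per-parameter cost can come from is sketched below in Python: a unitary matrix is applied as a sequence of layers of 2x2 unitary blocks acting on neighbouring coordinates, so each (theta, phi) pair touches only two entries, and stacking more layers covers more of the unitary group. The block form and the alternating even/odd pairing are assumptions in the spirit of the EUNN's tunable layout; the function names are hypothetical and this is not the authors' implementation.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Rotation = Tuple[float, float]  # (theta, phi) parameters of one 2x2 block


def apply_layer(v: Sequence[complex], rotations: Sequence[Rotation], offset: int) -> List[complex]:
    """Apply 2x2 unitary blocks to the pairs (offset, offset+1), (offset+2, offset+3), ..."""
    out = list(v)
    for k, (theta, phi) in enumerate(rotations):
        i = offset + 2 * k
        if i + 1 >= len(v):
            break  # no partner coordinate left for this block
        a, b = v[i], v[i + 1]
        c, s, e = math.cos(theta), math.sin(theta), cmath.exp(1j * phi)
        # [[e*c, -e*s], [s, c]] is unitary for any real theta and phi.
        out[i] = e * c * a - e * s * b
        out[i + 1] = s * a + c * b
    return out


def eunn_apply(v: Sequence[complex], layers: Sequence[Sequence[Rotation]]) -> List[complex]:
    """Alternate even (offset 0) and odd (offset 1) pairings; each parameter costs O(1) work."""
    out = list(v)
    for depth, rotations in enumerate(layers):
        out = apply_layer(out, rotations, offset=depth % 2)
    return out
```

Because every block is unitary, the output has the same norm as the input for any parameter values, the norm-preservation property that motivates unitary RNNs.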

Computer Vision and Pattern Recognition

Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator

Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade

Comments: submitted to CVPR 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce the concept of a Visual Compiler that generates a scene-specific pedestrian detector and pose estimator without any pedestrian observations. Given a single image and auxiliary scene information in the form of camera parameters and geometric layout of the scene, the Visual Compiler first infers geometrically and photometrically accurate images of humans in that scene through the use of computer graphics rendering. Using these renders we learn a scene-and-region specific spatially-varying fully convolutional neural network for simultaneous detection, pose estimation and segmentation of pedestrians. We demonstrate that when real human-annotated data is scarce or non-existent, our data generation strategy can provide an excellent solution for bootstrapping human detection and pose estimation. Experimental results show that our approach outperforms off-the-shelf state-of-the-art pedestrian detectors and pose estimators that are trained on real data.

CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

Kai Xu, Fengbo Ren

Comments: 10 pages, 6 pages, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this paper, we develop a deep neural network architecture called “CSVideoNet” that can learn visual representations from random measurements for compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end trainable and non-iterative model that combines convolutional neural networks (CNNs) with a recurrent neural network (RNN) to facilitate video reconstruction by leveraging temporal-spatial features. The proposed network can accept random measurements with a multi-level compression ratio (CR). The lightly and aggressively compressed measurements offer background information and object details, respectively. This is similar to the variable bit rate techniques widely used in conventional video coding approaches. The RNN employed by CSVideoNet can leverage the temporal coherence that exists in adjacent video frames to extrapolate motion features and merge them with spatial visual features extracted by the CNNs to further enhance reconstruction quality, especially at high CRs. We test our CSVideoNet on the UCF-101 dataset. Experimental results show that CSVideoNet outperforms the existing video CS reconstruction approaches. The results demonstrate that our method can preserve relatively fine visual details from the original videos even at a 100x CR, which is difficult to achieve with the reference approaches. Also, the non-iterative nature of CSVideoNet results in a decrease in runtime by three orders of magnitude over iterative reconstruction algorithms. Furthermore, CSVideoNet can enhance the CR of CS cameras beyond the limitation of conventional approaches, ensuring a reduction in bandwidth for data transmission. These benefits are especially favorable to high-frame-rate video applications.
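The multi-level compression ratio refers to how measurements are taken, before any network is involved: each frame x (flattened to n values) is reduced to m = n / CR random projections y = Phi x, with a lightly compressed key frame and aggressively compressed remaining frames. This Python sketch assumes Gaussian sensing matrices, a group whose first frame is the key frame, and CRs of 5 and 25; these choices and the function names are illustrative only, and the reconstruction network itself is not shown.

```python
import random
from typing import List, Sequence


def measurement_matrix(n: int, cr: int, seed: int) -> List[List[float]]:
    """Random Gaussian sensing matrix with m = n // cr rows (cr = compression ratio)."""
    rng = random.Random(seed)
    m = max(1, n // cr)
    std = 1.0 / m ** 0.5  # hypothetical normalization choice
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]


def measure(frame: Sequence[float], phi: Sequence[Sequence[float]]) -> List[float]:
    """y = Phi x for one flattened frame."""
    return [sum(p * x for p, x in zip(row, frame)) for row in phi]


def measure_group(frames: Sequence[Sequence[float]], key_cr: int = 5,
                  other_cr: int = 25, seed: int = 0) -> List[List[float]]:
    """Key frame (index 0) lightly compressed; the remaining frames aggressively compressed."""
    n = len(frames[0])
    phi_key = measurement_matrix(n, key_cr, seed)
    phi_other = measurement_matrix(n, other_cr, seed + 1)
    return [measure(f, phi_key if i == 0 else phi_other) for i, f in enumerate(frames)]
```

For 100-pixel frames this yields 20 measurements for the key frame and 4 for each other frame, mirroring the idea that the lightly compressed frame carries more of the scene content.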


SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by relatively small labelled datasets such as NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system. We host the dataset at this http URL

Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Thomas Nestmeyer, Peter V. Gehler

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Separating an input image into its reflectance and shading layers poses a challenge for learning approaches because no large corpus of precise and realistic ground-truth decompositions exists. The Intrinsic Images in the Wild dataset (IIW) provides a sparse set of relative human reflectance judgments, which serves as a standard benchmark for intrinsic images. This dataset led to an increase in methods that learn statistical dependencies between the images and their reflectance layer. Although learning plays a role in pushing state-of-the-art performance, we show that a standard signal processing technique achieves performance on par with recent developments. We propose a loss function that enables learning dense reflectance predictions with a CNN. Our results show that a simple pixel-wise decision, without any context or prior knowledge, is sufficient to provide a strong baseline on IIW. This sets a competitive bar, and we find that only two approaches surpass this result. We then develop a joint bilateral filtering method that implements strong prior knowledge about reflectance constancy. This filtering operation can be applied to any intrinsic image algorithm, and we improve several previous results, achieving a new state of the art on IIW. Our findings suggest that the effect of learning-based approaches may be overestimated and that it is still the use of explicit prior knowledge that drives performance on intrinsic image decomposition.
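Joint bilateral filtering, the kind of operation proposed above for enforcing reflectance constancy, replaces each value by a weighted average of its neighbours, where the weights decay both with spatial distance and with dissimilarity in a separate guide signal. A 1-D Python sketch follows; the guide, window radius and the two sigma values are hypothetical, and the paper's filter operates on 2-D images with its own choice of guidance features.

```python
import math
from typing import List, Sequence


def joint_bilateral_1d(signal: Sequence[float], guide: Sequence[float], radius: int = 2,
                       sigma_s: float = 1.0, sigma_r: float = 0.1) -> List[float]:
    """Smooth `signal`, weighting neighbours by spatial distance and by similarity in `guide`."""
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = (i - j) ** 2 / (2 * sigma_s ** 2)
            range_term = (guide[i] - guide[j]) ** 2 / (2 * sigma_r ** 2)
            w = math.exp(-spatial - range_term)
            num += w * signal[j]
            den += w  # den >= 1 because j == i contributes weight 1
        out.append(num / den)
    return out
```

With a guide such as [0, 0, 0, 1, 1, 1], almost no weight crosses the step between positions 2 and 3, so values on each side are smoothed among themselves without being blended across the edge.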


Objective Micro-Facial Movement Detection Using FACS-Based Regions and Baseline Evaluation

Adrian K. Davison, Cliff Lansley, Choon Ching Ng, Kevin Tan, Moi Hoon Yap

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Micro-facial expressions are regarded as an important human behavioural event that can highlight emotional deception. Spotting these movements is difficult for humans and machines; however, research into using computer vision to detect subtle facial expressions is growing in popularity. This paper proposes an individualised baseline micro-movement detection method using a 3D Histogram of Oriented Gradients (3D HOG) temporal difference method. We define a face template consisting of 26 regions based on the Facial Action Coding System (FACS). We extract the temporal features of each region using 3D HOG. Then, we use the Chi-square distance to find subtle facial motion in the local regions. Finally, an automatic peak detector is used to detect micro-movements above the newly proposed adaptive baseline threshold. The performance is validated on two FACS-coded datasets: SAMM and CASME II. This objective method focuses on the movement of the 26 face regions. When compared with the ground truth, the best results were an AUC of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The results show that 3D HOG outperformed the state-of-the-art feature representations for micro-movement detection: Local Binary Patterns in Three Orthogonal Planes and Histograms of Oriented Optical Flow.
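The Chi-square comparison and peak detection steps can be sketched as follows in Python. The distance below is one common form of the Chi-square histogram distance; the adaptive threshold is a hypothetical stand-in (mean plus k standard deviations of the distance signal), not the baseline threshold defined in the paper, and the function names are illustrative.

```python
import statistics
from typing import List, Sequence


def chi_square(h1: Sequence[float], h2: Sequence[float], eps: float = 1e-12) -> float:
    """Chi-square distance between two histograms, e.g. features of one face region in two frames."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def detect_peaks(distances: Sequence[float], k: float = 2.0) -> List[int]:
    """Local maxima above a hypothetical adaptive threshold: mean + k * std of the signal."""
    threshold = statistics.mean(distances) + k * statistics.pstdev(distances)
    return [i for i in range(1, len(distances) - 1)
            if distances[i] > threshold
            and distances[i] >= distances[i - 1]
            and distances[i] >= distances[i + 1]]
```

Applied per region, `chi_square` turns a sequence of feature histograms into a motion signal over time, and `detect_peaks` flags the frames where that signal rises well above the region's own typical level.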

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a multilinear statistical model of the human tongue that captures anatomical and tongue-pose-related shape variations separately. The model was derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech-related vocal tract configurations. The extraction was performed using a minimally supervised method based on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our experiments concluded that limiting the degrees of freedom for the anatomical and speech-related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects.
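In a multilinear (Tucker-style) model, a shape is generated as the mean shape plus a core tensor contracted with one weight vector per mode: here 5 anatomy weights and 4 pose weights, matching the degrees of freedom reported above. The Python sketch assumes this plain form on a flattened coordinate vector; the function name, the tensor layout and any example values are hypothetical.

```python
from typing import List, Sequence

Tensor3 = Sequence[Sequence[Sequence[float]]]  # core[coordinate][anatomy_mode][pose_mode]


def multilinear_shape(mean: Sequence[float], core: Tensor3,
                      anatomy: Sequence[float], pose: Sequence[float]) -> List[float]:
    """Shape = mean + core contracted with anatomy weights and with pose weights."""
    shape = list(mean)
    for v in range(len(mean)):
        shape[v] += sum(core[v][i][j] * a * p
                        for i, a in enumerate(anatomy)
                        for j, p in enumerate(pose))
    return shape
```

Because the two weight vectors enter separately, a speaker's anatomy weights can be held fixed while only the pose weights change, which is how anatomical and speech-related variation are kept apart.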

Development of a Real-time Colorectal Tumor Classification System for Narrow-band Imaging zoom-videoendoscopy

Tsubasa Hirakawa, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka

Comments: 9 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colorectal endoscopy is important for the early detection and treatment of colorectal cancer and is used worldwide. A computer-aided diagnosis (CAD) system that provides an objective measure to endoscopists during colorectal endoscopic examinations would be of great value. In this study, we describe a newly developed CAD system that provides real-time objective measures. Our system captures the video stream from an endoscopic system and transfers it to a desktop computer. The captured video stream is then classified by a pretrained classifier and the results are displayed on a monitor. The experimental results show that our developed system works efficiently in actual endoscopic examinations and is medically significant.

Design of Image Matched Non-Separable Wavelet using Convolutional Neural Network

Naushad Ansari, Anubha Gupta, Rahul Duggal

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-matched nonseparable wavelets can find potential use in many applications including image classification, segmentation, compressive sensing, etc. This paper proposes a novel design methodology that utilizes a convolutional neural network (CNN) to design a two-channel non-separable wavelet matched to a given image. The design is proposed on the quincunx lattice. The loss function of the convolutional neural network is set up as the total squared error between the input image given to the CNN and the reconstructed image at the output of the CNN, leading to perfect reconstruction at the end of training. Simulation results are shown on some standard images.
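The quincunx lattice keeps the samples whose row and column indices sum to an even number, and its complement (the odd coset) holds the rest; a two-channel decomposition on this lattice works with these two cosets. The Python sketch below only shows this sampling structure and the trivially perfect reconstruction obtained by merging the cosets back; it does not implement the CNN-based filter design, and the names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coset = Dict[Tuple[int, int], float]  # (row, col) -> sample value


def quincunx_split(img: Sequence[Sequence[float]]) -> Tuple[Coset, Coset]:
    """Split an image into the quincunx lattice (row + col even) and its odd coset."""
    even: Coset = {}
    odd: Coset = {}
    for r, row in enumerate(img):
        for c, x in enumerate(row):
            target = even if (r + c) % 2 == 0 else odd
            target[(r, c)] = x
    return even, odd


def quincunx_merge(even: Coset, odd: Coset, rows: int, cols: int) -> List[List[float]]:
    """Put both cosets back on the full grid; merging unmodified cosets restores the image exactly."""
    img = [[0.0] * cols for _ in range(rows)]
    for coset in (even, odd):
        for (r, c), x in coset.items():
            img[r][c] = x
    return img
```

For a 3x3 image the even coset holds the four corners and the centre (5 samples) and the odd coset the remaining 4, so each channel carries roughly half of the samples, as in any two-channel critically sampled scheme.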

Cloud Dictionary: Sparse Coding and Modeling for Point Clouds

Or Litany, Tal Remez, Alex Bronstein

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimony-based algorithms have shown great success on images and videos where data points are sampled on a regular Cartesian grid. We propose an adaptation of these techniques to irregularly sampled signals by using continuous dictionaries. We present an example application in the form of point cloud denoising.

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description, has advanced remarkably in recent years. Nonetheless, most existing paradigms may lack invariance to images with different scaling, rotation, etc., and may lack an effective integration of standalone attention into a holistic end-to-end system. In this paper, we propose a novel image captioning architecture, termed Recurrent Image Captioner (RIC), which allows the visual encoder and language decoder to coherently cooperate in a recurrent manner. Specifically, we first equip the CNN-based visual encoder with a differentiable layer to enable spatially invariant transformation of visual signals. Moreover, we deploy a differentiable attention filter module between the encoder and decoder to dynamically determine salient visual parts. We also employ a bidirectional LSTM to preprocess sentences for generating better textual representations. In addition, we propose to exploit variational inference to optimize the whole architecture. Extensive experimental results on three benchmark datasets (i.e., Flickr8k, Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture compared to most of the state-of-the-art methods.

Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

Anh Tuan Tran, Tal Hassner, Iacopo Masi, Gerard Medioni

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition, and then only under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single-view 3D face reconstruction: when applied “in the wild”, their 3D estimates are either unstable and change for different photos of the same subject, or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state-of-the-art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

Vivek Krishnan, Deva Ramanan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set. While most prior work makes use of linguistic cues for such “zero-shot” learning, we do so by using a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. To this end, we introduce a set of visualization techniques that better reveal the activation patterns and relations between groups of CNN filters. We next demonstrate that knowledge of pictorial languages can be used to rewire certain CNN neurons into a part model, which we call a pictorial language classifier (PLC). We demonstrate the robustness of simple PLCs by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in an interactive zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MS-COCO that never appear in the ImageNet training set.

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

Fahad Shahbaz Khan, Joost van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most approaches to human attribute and action recognition in still images are based on image representations in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. Both in bag-of-words and in the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a Bag of Deep Features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
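The contrast between scale-invariant pooling and scale coding can be sketched in Python: the baseline sums all local feature encodings, while a scale-coded representation sums them separately per scale range and concatenates the results. The two scale boundaries (32 and 96 pixels) and the function names are hypothetical, and this is only one simple way to keep scale information, not necessarily either of the two strategies proposed in the paper.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, Sequence[float]]  # (patch scale, encoded local feature vector)


def scale_invariant_pool(features: Sequence[Feature]) -> List[float]:
    """Baseline: sum-pool all local encodings regardless of the scale they came from."""
    pooled = [0.0] * len(features[0][1])
    for _, vec in features:
        pooled = [p + v for p, v in zip(pooled, vec)]
    return pooled


def scale_coded_pool(features: Sequence[Feature],
                     boundaries: Sequence[float] = (32.0, 96.0)) -> List[float]:
    """Pool separately within scale bins, then concatenate, so coarse scale information survives."""
    dim = len(features[0][1])
    bins = [[0.0] * dim for _ in range(len(boundaries) + 1)]
    for scale, vec in features:
        b = sum(scale >= edge for edge in boundaries)  # index of the bin this scale falls into
        bins[b] = [p + v for p, v in zip(bins[b], vec)]
    return [x for pooled in bins for x in pooled]
```

The design trade-off is representation size: with B scale bins the scale-coded vector is B times longer than the scale-invariant one, in exchange for knowing at which scale each visual word fired.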


Border-Peeling Clustering

Nadav Bar, Hadar Averbuch-Elor, Daniel Cohen-Or

Comments: 9 pages, 9 figures, supplementary material added as ancillary file

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel non-parametric clustering technique based on an iterative algorithm that peels off layers of points around the clusters. Our technique is based on the notion that each latent cluster consists of layers that surround its core, where the external layers, or border points, implicitly separate the clusters. Analyzing the K-nearest neighbors of the points makes it possible to identify the border points and associate them with points of inner layers. Our clustering algorithm iteratively identifies border points, peels them, and separates the latent clusters. We show that the peeling process adapts to the local density and successfully separates adjacent clusters. A notable quality of the Border-Peeling algorithm is that it does not require any parameter tuning in order to outperform state-of-the-art finely tuned non-parametric clustering methods, including Mean-Shift and DBSCAN. We further assess our technique on high-dimensional datasets that vary in size and characteristics. In particular, we analyze the space of deep features that were trained by a convolutional neural network.
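A single, heavily simplified peeling step can be sketched in Python: a point that appears in few other points' K-nearest-neighbour lists is treated as a border point, peeled, and linked to its nearest remaining point. The counting score, the fixed values of k and of the peeled fraction, and the function names are assumptions for illustration only; the actual algorithm uses its own density-based border score and repeats the step until the clusters separate.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of point i (ties broken by index)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def peel_once(points: Sequence[Point], k: int = 2, fraction: float = 0.4
              ) -> Tuple[List[int], List[int], Dict[int, int]]:
    """One simplified peeling step; returns (core, border, border -> linked core point)."""
    reverse_count = [0] * len(points)
    for i in range(len(points)):
        for j in knn(points, i, k):
            reverse_count[j] += 1  # point j is claimed as a neighbour by point i
    order = sorted(range(len(points)), key=lambda i: reverse_count[i])
    n_border = int(fraction * len(points))
    border, core = sorted(order[:n_border]), sorted(order[n_border:])
    # Link each peeled border point to its nearest point in the remaining inner layer.
    link = {b: min(core, key=lambda c: math.dist(points[b], points[c])) for b in border}
    return core, border, link
```

On five evenly spaced points on a line, the two endpoints are claimed by the fewest neighbours, so they are peeled first and linked inward to their adjacent points.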

A fuzzy approach for segmentation of touching characters

Giuseppe Airò Farulla, Nadir Murru, Rosaria Rossini

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The problem of correctly segmenting touching characters is a hard task to solve and is of major relevance in pattern recognition. In recent years, many methods and algorithms have been proposed; still, a definitive solution is far from being found. In this paper, we propose a novel method based on fuzzy logic. The proposed method combines, in a novel way, three features for segmenting touching characters that have already been proposed in other studies but have so far been exploited only individually. The proposed strategy is based on a 3-input/1-output fuzzy inference system with fuzzy rules specifically optimized for segmenting touching characters in the case of Latin printed and handwritten characters. The system's performance is illustrated and supported by numerical examples showing that our approach can achieve reasonably good overall accuracy in segmenting characters even under tricky conditions of touching characters. Moreover, numerical results suggest that the method can be applied to many different datasets of characters by means of a convenient tuning of the fuzzy sets and rules.

Temporal-Needle: A view and appearance invariant video descriptor

Michal Yarom, Michal Irani

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The ability to detect similar actions across videos can be very useful for real-world applications in many fields. However, this task is still challenging for existing systems, since videos that present the same action can be taken from significantly different viewing directions, performed by different actors against different backgrounds, and recorded at various video qualities. Video descriptors play a significant role in these systems. In this work we propose the “temporal-needle” descriptor, which captures the dynamic behavior while being invariant to viewpoint and appearance. The descriptor is computed at multiple temporal scales of the video by computing the self-similarity of every patch through time at each temporal scale. The descriptor is computed for every pixel in the video. However, to find similar actions across videos, we consider only a small subset of the descriptors: the statistically significant ones. This allows us to find good correspondences across videos more efficiently. Using the descriptor, we were able to detect the same behavior across videos in a variety of scenarios. We demonstrate the use of the descriptor in tasks such as temporal and spatial alignment and action detection, and even show its potential in unsupervised video clustering into categories. In this work we handled only videos taken with stationary cameras, but the descriptor can be extended to handle moving cameras as well.

The More You Know: Using Knowledge Graphs for Image Classification

Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Humans have the remarkable capability to learn a large variety of visual concepts, often with very few examples, whereas current state-of-the-art vision algorithms require hundreds or thousands of examples per category and struggle with ambiguity. One characteristic that sets humans apart is our ability to acquire knowledge about the world and reason using this knowledge. This paper investigates the use of structured prior knowledge in the form of knowledge graphs and shows that using this knowledge improves performance on image classification. Specifically, we introduce the Graph Search Neural Network as a way of efficiently incorporating large knowledge graphs into a fully end-to-end learning system. We show in a number of experiments that our method outperforms baselines for multi-label classification, even under low-data and few-shot settings.


Coupling Adaptive Batch Sizes with Learning Rates

Lukas Balles, Javier Romero, Philipp Hennig

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Mini-batch stochastic gradient descent and its variants have become standard for large-scale empirical risk minimization, such as the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence are thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.
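The coupling can be sketched with a rule of the form m = alpha * tr(Sigma) / F, where alpha is the learning rate, tr(Sigma) the total variance of single-example gradients, and F the current objective value; the mini-batch gradient variance tr(Sigma) / m then equals F / alpha, i.e. it shrinks in proportion to the objective. We believe this matches the rule described in the paper but treat its exact form as an assumption; the clamping bounds and the function names below are hypothetical.

```python
import math
from typing import Sequence


def gradient_variance_trace(per_example_grads: Sequence[Sequence[float]]) -> float:
    """Unbiased estimate of tr(Sigma), the summed variance of single-example gradients (needs >= 2 examples)."""
    m = len(per_example_grads)
    dim = len(per_example_grads[0])
    mean = [sum(g[d] for g in per_example_grads) / m for d in range(dim)]
    return sum((g[d] - mean[d]) ** 2 for g in per_example_grads for d in range(dim)) / (m - 1)


def next_batch_size(per_example_grads: Sequence[Sequence[float]], loss: float, lr: float,
                    m_min: int = 16, m_max: int = 4096) -> int:
    """m = lr * tr(Sigma) / loss, rounded up and clamped to hypothetical bounds [m_min, m_max]."""
    m = lr * gradient_variance_trace(per_example_grads) / max(loss, 1e-12)
    return int(min(max(math.ceil(m), m_min), m_max))
```

As training lowers the loss, the same gradient noise yields a larger batch, which plays the role a manually decayed learning rate would otherwise play; raising the learning rate likewise raises the batch size.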

Towards Score Following in Sheet Music Images

Matthias Dorfer, Andreas Arzt, Gerhard Widmer

Comments: Published in Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the matching of short music audio snippets to the corresponding pixel location in images of sheet music. A system is presented that simultaneously learns to read notes, listen to music and match the currently played music to its corresponding notes in the sheet. It consists of an end-to-end multi-modal convolutional neural network that takes as input images of sheet music and spectrograms of the respective audio snippets. It learns to predict, for a given unseen audio snippet (covering approximately one bar of music), the corresponding position in the respective score line. Our results suggest that with the use of (deep) neural networks, which have proven to be powerful image processing models, working with sheet music becomes feasible and a promising future research direction.

Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee

Comments: 4 Figures, 1 Table

Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Objective: The advent of Electronic Medical Records (EMR) with large

electronic imaging databases along with advances in deep neural networks with

machine learning has provided a unique opportunity to achieve milestones in

automated image analysis. Optical coherence tomography (OCT) is the most

commonly obtained imaging modality in ophthalmology and represents a dense and

rich dataset when combined with labels derived from the EMR. We sought to

determine if deep learning could be utilized to distinguish normal OCT images

from images from patients with Age-related Macular Degeneration (AMD). Methods:

Automated extraction of an OCT imaging database was performed and linked to

clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg

Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted

from EPIC. The central 11 images were selected from each OCT scan of two

cohorts of patients: normal and AMD. Cross-validation was performed using a

random subset of patients. Area under receiver operator curves (auROC) were

constructed at an independent image level, macular OCT level, and patient

level. Results: Of an extraction of 2.6 million OCT images linked to clinical

datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were

selected. A deep neural network was trained to categorize images as either

normal or AMD. At the image level, we achieved an auROC of 92.78% with an

accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an

accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an

accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were

92.64% and 93.69% respectively. Conclusions: Deep learning techniques are

effective for classifying OCT images. These findings have important

implications in utilizing OCT in automated screening and computer aided

diagnosis tools.
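
The abstract reports auROC at three granularities (image, macula, patient). A minimal sketch of how such multi-level scores could be computed, assuming image-level probabilities are simply averaged within each group before scoring (the aggregation rule, the toy scores and the patient ids are our assumptions, not details from the paper). auROC is computed as the Mann-Whitney probability that a randomly chosen AMD case outscores a randomly chosen normal case.

```python
from collections import defaultdict

def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def aggregate(scores, labels, groups):
    """Average image-level scores within each group (e.g. one macular scan or one patient).

    Assumes every image in a group shares the group's label (hypothetical setup).
    """
    by_group, group_label = defaultdict(list), {}
    for s, y, g in zip(scores, labels, groups):
        by_group[g].append(s)
        group_label[g] = y
    keys = sorted(by_group)
    return [sum(by_group[k]) / len(by_group[k]) for k in keys], [group_label[k] for k in keys]

# Toy data: 1 = AMD, 0 = normal; patient ids are hypothetical.
img_scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2]
img_labels = [1, 1, 1, 0, 0, 0]
patients = ["a", "a", "b", "c", "c", "d"]

print(auroc(img_scores, img_labels))          # image level: 8/9
pat_scores, pat_labels = aggregate(img_scores, img_labels, patients)
print(auroc(pat_scores, pat_labels))          # patient level: 1.0
```

In this toy example averaging lifts auROC from 8/9 to 1.0, because patient "a"'s one low-scoring image is outweighed by its other image.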

Artificial Intelligence

Ontohub: A semantic repository for heterogeneous ontologies

Mihai Codescu, Eugen Kuksa, Oliver Kutz, Till Mossakowski, Fabian Neuhaus

Comments: Preprint, journal special issue

Subjects: Artificial Intelligence (cs.AI)

Ontohub is a repository engine for managing distributed heterogeneous

ontologies. The distributed nature enables communities to share and exchange

their contributions easily. The heterogeneous nature makes it possible to

integrate ontologies written in various ontology languages. Ontohub supports a

wide range of formal logical and ontology languages, as well as various

structuring and modularity constructs and inter-theory (concept) mappings,

building on the OMG-standardized DOL language. Ontohub repositories are

organised as Git repositories, thus inheriting all features of this popular

version control system. Moreover, Ontohub is the first repository engine

meeting a substantial number of the requirements formulated in the context of

the Open Ontology Repository (OOR) initiative, including an API for federation

as well as support for logical inference and axiom selection.

Crowdsourced Outcome Determination in Prediction Markets

Rupert Freeman, Sebastien Lahaie, David M. Pennock

Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

A prediction market is a useful means of aggregating information about a

future event. To function, the market needs a trusted entity who will verify

the true outcome in the end. Motivated by the recent introduction of

decentralized prediction markets, we introduce a mechanism that allows for the

outcome to be determined by the votes of a group of arbiters who may themselves

hold stakes in the market. Despite the potential conflict of interest, we

derive conditions under which we can incentivize arbiters to vote truthfully by

using funds raised from market fees to implement a peer prediction mechanism.

Finally, we investigate what parameter values could be used in a real-world

implementation of our mechanism.

Collaborative creativity with Monte-Carlo Tree Search and Convolutional Neural Networks

Memo Akten, Mick Grierson

Comments: Presented at the Constructive Machine Learning workshop at NIPS 2016 as a poster and spotlight talk. 8 pages including 2 page references, 2 page appendix, 3 figures. Blog post (including videos) at this https URL

Subjects: Artificial Intelligence (cs.AI)

We investigate a human-machine collaborative drawing environment in which an

autonomous agent sketches images while optionally allowing a user to directly

influence the agent’s trajectory. We combine Monte Carlo Tree Search with image

classifiers and test both shallow models (e.g. multinomial logistic regression)

and deep Convolutional Neural Networks (e.g. LeNet, Inception v3). We found

that using the shallow model, the agent produces a limited variety of images,

which are noticeably recognisable by humans. However, using the deeper models,

the agent produces a more diverse range of images, and while the agent remains

very confident (99.99%) in having achieved its objective, to humans they mostly

resemble unrecognisable ‘random’ noise. We relate this to recent research which

also discovered that ‘deep neural networks are easily fooled’ (Nguyen et al., 2015)

and we discuss possible solutions and future directions for the research.

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Franck Dernoncourt, Ji Young Lee, Peter Szolovits

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Existing models based on artificial neural networks (ANNs) for sentence

classification often do not incorporate the context in which sentences appear,

and classify sentences individually. However, traditional sentence

classification approaches have been shown to greatly benefit from jointly

classifying subsequent sentences, such as with conditional random fields. In

this work, we present an ANN architecture that combines the effectiveness of

typical ANN models to classify sentences in isolation, with the strength of

structured prediction. Our model achieves state-of-the-art results on two

different datasets for sequential sentence classification in medical abstracts.
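
Jointly classifying consecutive sentences with a conditional random field amounts, at inference time, to Viterbi decoding over per-sentence label scores plus label-transition scores. A minimal sketch of only that decoding step, assuming the per-sentence (emission) scores come from some neural sentence encoder; the label set and all numbers are hypothetical, and this is a generic linear-chain decoder, not the authors' exact architecture.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain model.

    emissions[t][j]: score of label j for sentence t (e.g. from a neural encoder).
    transitions[i][j]: score of label i being followed by label j.
    """
    n = len(labels)
    best = list(emissions[0])  # best path score ending in each label so far
    backptrs = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for j in range(n):
            # Predecessor label that maximizes the path score into label j.
            i_star = max(range(n), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            ptrs.append(i_star)
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the best final label.
    j = max(range(n), key=lambda k: best[k])
    path = [j]
    for ptrs in reversed(backptrs):
        j = ptrs[j]
        path.append(j)
    return [labels[k] for k in reversed(path)]

labels = ["BACKGROUND", "METHODS", "RESULTS"]
emissions = [[2.0, 0.0, 0.0],      # sentence 1: clearly background
             [1.0, 0.9, 0.0],      # sentence 2: ambiguous
             [0.0, 0.0, 2.0]]      # sentence 3: clearly results
transitions = [[0.0, 0.5, -1.0],   # from BACKGROUND
               [-2.0, 0.0, 0.5],   # from METHODS
               [-2.0, -2.0, 0.0]]  # from RESULTS

independent = [labels[max(range(3), key=lambda j: e[j])] for e in emissions]
print(independent)                              # ['BACKGROUND', 'BACKGROUND', 'RESULTS']
print(viterbi(emissions, transitions, labels))  # ['BACKGROUND', 'METHODS', 'RESULTS']
```

Classifying each sentence in isolation labels the ambiguous middle sentence BACKGROUND; the transition scores make METHODS the better joint choice.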

Improving Scalability of Reinforcement Learning by Separation of Concerns

Harm van Seijen, Mehdi Fatemi, Joshua Romoff

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we propose a framework for solving a single-agent task by

using multiple agents, each focusing on different aspects of the task. This

approach has two main advantages: 1) it allows for specialized agents for

different parts of the task, and 2) it provides a new way to transfer

knowledge, by transferring trained agents. Our framework generalizes the

traditional hierarchical decomposition, in which, at any moment in time, a

single agent has control until it has solved its particular subtask. We

illustrate our framework using a number of examples.

Adversarial Message Passing For Graphical Models

Theofanis Karaletsos

Comments: (12 pages, 2 figures) Presented at NIPS Advances In Approximate Inference 2016 (AABI 2016)

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Bayesian inference on structured models typically relies on the ability to

infer posterior distributions of underlying hidden variables. However,

inference in implicit models or complex posterior distributions is hard. A

popular tool for learning implicit models is generative adversarial networks

(GANs) which learn parameters of generators by fooling discriminators.

Typically, GANs are considered to be models themselves and are not understood

in the context of inference. Current techniques rely on inefficient global

discrimination of joint distributions to perform learning, or only consider

discriminating a single output variable. We overcome these limitations by

treating GANs as a basis for likelihood-free inference in generative models and

generalize them to Bayesian posterior inference over factor graphs. We propose

local learning rules based on message passing minimizing a global divergence

criterion involving cooperating local adversaries used to sidestep explicit

likelihood evaluations. This allows us to compose models and yields a unified

inference and learning framework for adversarial learning. Our framework treats

model specification and inference separately and facilitates richly structured

models within the family of Directed Acyclic Graphs, including components such

as intractable likelihoods, non-differentiable models, simulators and generally

cumbersome models. A key result of our treatment is the insight that Bayesian

inference on structured models can be performed only with sampling and

discrimination when using nonparametric variational families, without access to

explicit distributions. As a side-result, we discuss the link to likelihood

maximization. These approaches hold promise to be useful in the toolbox of

probabilistic modelers and enrich the gamut of current probabilistic

programming applications.

TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we describe the construction of TeKnowbase, a knowledge-base

of technical concepts in computer science. Our main information sources are

technical websites such as Webopedia and Techtarget as well as Wikipedia and

online textbooks. We divide the knowledge-base construction problem into two

parts — the acquisition of entities and the extraction of relationships among

these entities. Our knowledge-base consists of approximately 100,000 triples.

We conducted an evaluation on a sample of triples and report an accuracy of a

little over 90%. We additionally conducted classification experiments on

StackOverflow data with features from TeKnowbase and achieved improved

classification accuracy.

Learning Through Dialogue Interactions

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A good dialogue agent should have the ability to interact with users. In this

work, we explore this direction by designing a simulator and a set of synthetic

tasks in the movie domain that allow the learner to interact with a teacher by

both asking and answering questions. We investigate how a learner can benefit

from asking questions in both an offline and online reinforcement learning

setting. We demonstrate that the learner improves when asking questions. Our

work represents a first step in developing end-to-end learned interactive

dialogue agents.

Dynamical Kinds and their Discovery

Benjamin C. Jantzen

Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

We demonstrate the possibility of classifying causal systems into kinds that

share a common structure without first constructing an explicit dynamical model

or using prior knowledge of the system dynamics. The algorithmic ability to

determine whether arbitrary systems are governed by causal relations of the

same form offers significant practical applications in the development and

validation of dynamical models. It is also of theoretical interest as an

essential stage in the scientific inference of laws from empirical data. The

algorithm presented is based on the dynamical symmetry approach to dynamical

kinds. A dynamical symmetry with respect to time is an intervention on one or

more variables of a system that commutes with the time evolution of the system.

A dynamical kind is a class of systems sharing a set of dynamical symmetries.

The algorithm presented classifies deterministic, time-dependent causal systems

by directly comparing their exhibited symmetries. Using simulated, noisy data

from a variety of nonlinear systems, we show that this algorithm correctly

sorts systems into dynamical kinds. It is robust under significant sampling

error, is immune to violations of normality in sampling error, and fails

gracefully with increasing dynamical similarity. The algorithm we demonstrate

is the first to address this aspect of automated scientific discovery.
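
The definition above (an intervention that commutes with time evolution) can be checked directly when the flow is known. A minimal sketch, assuming the toy system dx/dt = kx with its closed-form solution; it illustrates only the definition, not the paper's algorithm, which works from noisy samples without an explicit model. Rescaling x commutes with the flow for every growth rate k, while adding a constant does not, so systems with different k share this particular symmetry.

```python
import math

def make_flow(k):
    """Closed-form time evolution x(t) = x0 * exp(k t) of the toy system dx/dt = k x."""
    return lambda x0, t: x0 * math.exp(k * t)

def is_dynamical_symmetry(sigma, flow, states, times, rel_tol=1e-9):
    """True if intervening with sigma commutes with evolving the system, on a grid of states/times."""
    return all(
        math.isclose(sigma(flow(x, t)), flow(sigma(x), t), rel_tol=rel_tol, abs_tol=1e-12)
        for x in states for t in times
    )

scale = lambda x: 2.5 * x   # intervention: rescale the variable
shift = lambda x: x + 1.0   # intervention: offset the variable

states, times = [-1.0, 0.3, 4.0], [0.0, 0.5, 1.0, 2.0]
for k in (0.7, -0.3):
    flow = make_flow(k)
    # Prints "<k> True False": scaling is a symmetry, shifting is not.
    print(k, is_dynamical_symmetry(scale, flow, states, times),
          is_dynamical_symmetry(shift, flow, states, times))
```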

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre

Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

User acceptance of artificial intelligence agents might depend on their

ability to explain their reasoning, which requires adding an interpretability

layer that helps users understand their behavior. This paper focuses

on adding an interpretable layer on top of Semantic Textual Similarity (STS),

which measures the degree of semantic equivalence between two sentences. The

interpretability layer is formalized as the alignment between pairs of segments

across the two sentences, where the relation between the segments is labeled

with a relation type and a similarity score. We present a publicly available

dataset of sentence pairs annotated following the formalization. We then

develop a system trained on this dataset which, given a sentence pair, explains

what is similar and different, in the form of graded and typed segment

alignments. When evaluated on the dataset, the system performs better than an

informed baseline, showing that the dataset and task are well-defined and

feasible. Most importantly, two user studies show how the system output can be

used to automatically produce explanations in natural language. Users performed

better when having access to the explanations, providing preliminary evidence

that our dataset and method to automatically produce explanations are useful in

real applications.

Information Retrieval

Using the Context of User Feedback in Recommender Systems

Ladislav Peska (Charles University in Prague, Faculty of Mathematics and Physics)

Comments: In Proceedings MEMICS 2016, arXiv:1612.04037

Journal-ref: EPTCS 233, 2016, pp. 1-12

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Our work is generally focused on recommending for small or medium-sized

e-commerce portals, where explicit feedback is absent and thus the usage of

implicit feedback is necessary. Nonetheless, for some implicit feedback

features, the presentation context may be of high importance. In this paper, we

present a model of relevant contextual features affecting user feedback,

propose methods leveraging those features, publish a dataset of real e-commerce

users containing multiple user feedback indicators as well as its context and

finally present results of purchase prediction and recommendation experiments.

Off-line experiments with real users of a Czech travel agency website

corroborated the importance of leveraging presentation context in both purchase

prediction and recommendation tasks.

A Graph Summarization: A Survey

Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra

Subjects: Information Retrieval (cs.IR)

While advances in computing resources have made processing enormous amounts

of data possible, human ability to identify patterns in such data has not

scaled accordingly. Thus, efficient computational methods for condensing and

simplifying data are becoming vital for extracting actionable insights. In

particular, while data summarization techniques have been studied extensively,

only recently has summarizing interconnected data, or graphs, become popular.

This survey is a structured, comprehensive overview of the state-of-the-art

methods for summarizing graph data. We first broach the motivation behind and

the challenges of graph summarization. We then categorize summarization

approaches by the type of graphs taken as input and further organize each

category by core methodology. Finally, we discuss applications of summarization

on real-world graphs and conclude by describing some open problems in the field.


Towards End-to-End Audio-Sheet-Music Retrieval

Matthias Dorfer, Andreas Arzt, Gerhard Widmer

Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)

This paper demonstrates the feasibility of learning to retrieve short

snippets of sheet music (images) when given a short query excerpt of music

(audio), and vice versa, without any symbolic representation of music or

scores. This would be highly useful in many content-based musical retrieval

scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)

and learns correlated latent spaces allowing for cross-modality retrieval in

both directions. Initial experiments with relatively simple monophonic music

show promising results.

Computation and Language

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Franck Dernoncourt, Ji Young Lee, Peter Szolovits

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Existing models based on artificial neural networks (ANNs) for sentence

classification often do not incorporate the context in which sentences appear,

and classify sentences individually. However, traditional sentence

classification approaches have been shown to greatly benefit from jointly

classifying subsequent sentences, such as with conditional random fields. In

this work, we present an ANN architecture that combines the effectiveness of

typical ANN models to classify sentences in isolation, with the strength of

structured prediction. Our model achieves state-of-the-art results on two

different datasets for sequential sentence classification in medical abstracts.

Building a robust sentiment lexicon with (almost) no resource

Mickael Rouvier, Benoit Favre

Subjects: Computation and Language (cs.CL)

Creating sentiment polarity lexicons is labor intensive. Automatically

translating them from resourceful languages requires in-domain machine

translation systems, which rely on large quantities of bi-texts. In this paper,

we propose to replace machine translation by transferring words from the

lexicon through word embeddings aligned across languages with a simple linear

transform. The approach leads to no degradation, compared to machine

translation, when tested on sentiment polarity classification on tweets from

four languages.
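
A minimal sketch of the transfer step, assuming a seed dictionary of aligned word pairs and an ordinary-least-squares fit of the linear transform (the abstract says only "a simple linear transform"; the fitting method, the dimensions and the synthetic embeddings are our assumptions). A source lexicon word is mapped into the target space, and its polarity label would be copied to the nearest target word by cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_seed = 5, 40

# Hypothetical setup: target embeddings are a rotation of source embeddings, plus noise.
true_map, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_seed = rng.normal(size=(n_seed, dim))   # seed-dictionary words, source language
tgt_seed = src_seed @ true_map + 0.01 * rng.normal(size=(n_seed, dim))

# Fit the linear transform W minimizing ||src_seed @ W - tgt_seed|| (least squares).
W, *_ = np.linalg.lstsq(src_seed, tgt_seed, rcond=None)

def nearest(vec, vocab):
    """Target word whose embedding has the highest cosine similarity with vec."""
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(vocab, key=lambda w: cos(vec, vocab[w]))

# A source lexicon entry and a (hypothetical) target vocabulary containing its translation.
src_word = rng.normal(size=dim)
tgt_vocab = {f"distractor_{i}": rng.normal(size=dim) @ true_map for i in range(20)}
tgt_vocab["translation"] = src_word @ true_map

# The source word's polarity label would be assigned to this target word.
print(nearest(src_word @ W, tgt_vocab))
```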

Transition-based Parsing with Context Enhancement and Future Reward Reranking

Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong

Subjects: Computation and Language (cs.CL)

This paper presents a novel reranking model, future reward reranking, to

re-score the actions in a transition-based parser by using a global scorer.

Unlike conventional reranking, the model searches for the best

dependency tree among all feasible trees constrained by a sequence of actions, in order to

obtain the future reward of that sequence. The scorer is based on a first-order

graph-based parser with a bidirectional LSTM, which captures a view of parsing

different from that of the transition-based parser. In addition, since context

enhancement has been shown to substantially improve parsing accuracy in arc-standard

transition-based parsing, we implement context enhancement on an

arc-eager transition-based parser with stack LSTMs, supporting the dynamic oracle and

dropout, and achieve further improvement. With the global scorer and

context enhancement, the parser's UAS increases by as much

as 1.20% for English and 1.66% for Chinese, and its LAS by as much as 1.32%

for English and 1.63% for Chinese. Moreover, we obtain state-of-the-art LAS scores,

achieving 87.58% for Chinese and 93.37% for English.
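
The arc-eager system named above has four transitions. A minimal sketch of those transitions using the standard textbook definitions, replaying a hand-chosen action sequence for the hypothetical sentence "She reads books" (words 1 to 3, with 0 as the artificial root). The stack-LSTM scorer, dynamic oracle, context enhancement and reranking from the paper are not shown.

```python
def arc_eager_parse(n_words, actions):
    """Apply arc-eager transitions; returns a dict mapping each dependent to its head."""
    stack, buffer, heads = [0], list(range(1, n_words + 1)), {}
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":
            # Buffer front becomes head of the stack top, which is popped.
            top = stack[-1]
            if top == 0 or top in heads:
                raise ValueError("LEFT-ARC precondition violated")
            heads[stack.pop()] = buffer[0]
        elif act == "RIGHT-ARC":
            # Stack top becomes head of the buffer front, which is then pushed.
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif act == "REDUCE":
            # Pop the stack top, allowed only once it already has a head.
            if stack[-1] not in heads:
                raise ValueError("REDUCE precondition violated")
            stack.pop()
        else:
            raise ValueError(f"unknown action {act!r}")
    return heads

# "She(1) reads(2) books(3)": She <- reads, root -> reads, reads -> books
heads = arc_eager_parse(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"])
print(heads)  # {1: 2, 2: 0, 3: 2}
```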

TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we describe the construction of TeKnowbase, a knowledge-base

of technical concepts in computer science. Our main information sources are

technical websites such as Webopedia and Techtarget as well as Wikipedia and

online textbooks. We divide the knowledge-base construction problem into two

parts — the acquisition of entities and the extraction of relationships among

these entities. Our knowledge-base consists of approximately 100,000 triples.

We conducted an evaluation on a sample of triples and report an accuracy of a

little over 90%. We additionally conducted classification experiments on

StackOverflow data with features from TeKnowbase and achieved improved

classification accuracy.

Learning Through Dialogue Interactions

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A good dialogue agent should have the ability to interact with users. In this

work, we explore this direction by designing a simulator and a set of synthetic

tasks in the movie domain that allow the learner to interact with a teacher by

both asking and answering questions. We investigate how a learner can benefit

from asking questions in both an offline and online reinforcement learning

setting. We demonstrate that the learner improves when asking questions. Our

work represents a first step in developing end-to-end learned interactive

dialogue agents.

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre

Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

User acceptance of artificial intelligence agents might depend on their

ability to explain their reasoning, which requires adding an interpretability

layer that helps users understand their behavior. This paper focuses

on adding an interpretable layer on top of Semantic Textual Similarity (STS),

which measures the degree of semantic equivalence between two sentences. The

interpretability layer is formalized as the alignment between pairs of segments

across the two sentences, where the relation between the segments is labeled

with a relation type and a similarity score. We present a publicly available

dataset of sentence pairs annotated following the formalization. We then

develop a system trained on this dataset which, given a sentence pair, explains

what is similar and different, in the form of graded and typed segment

alignments. When evaluated on the dataset, the system performs better than an

informed baseline, showing that the dataset and task are well-defined and

feasible. Most importantly, two user studies show how the system output can be

used to automatically produce explanations in natural language. Users performed

better when having access to the explanations, providing preliminary evidence

that our dataset and method to automatically produce explanations are useful in

real applications.

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

With the success of recurrent neural networks in modelling sequential

data and the power of attention mechanisms to automatically identify salient

information, image captioning (a.k.a. image description) has advanced

remarkably in recent years. Nonetheless, most existing paradigms may lack

invariance to images with different scaling, rotation, etc., and may fail to

integrate standalone attention effectively into a holistic end-to-end

system. In this paper, we propose a novel image captioning architecture, termed

Recurrent Image Captioner (RIC), which allows the visual encoder and

language decoder to cooperate coherently in a recurrent manner. Specifically,

we first equip the CNN-based visual encoder with a differentiable layer to enable

spatially invariant transformation of visual signals. Moreover, we deploy a

differentiable attention filter module between the encoder and decoder to

dynamically determine salient visual parts. We also employ a bidirectional LSTM

to preprocess sentences for generating better textual representations. Besides,

we propose to exploit variational inference to optimize the whole architecture.

Extensive experimental results on three benchmark datasets (i.e., Flickr8k,

Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture

as compared to most of the state-of-the-art methods.

Distributed, Parallel, and Cluster Computing

Private Learning on Networks

Shripad Gade, Nitin H. Vaidya

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

Continual data collection and widespread deployment of machine learning

algorithms, particularly the distributed variants, have raised new privacy

challenges. In a distributed machine learning scenario, the dataset is stored

among several machines and they solve a distributed optimization problem to

collectively learn the underlying model. We present a secure multi-party

computation inspired privacy preserving distributed algorithm for optimizing a

convex function consisting of several possibly non-convex functions. Each

individual objective function is privately stored with an agent while the

agents communicate model parameters with neighbor machines connected in a

network. We show that our algorithm can correctly optimize the overall

objective function and learn the underlying model accurately. We further prove

that under a vertex connectivity condition on the topology, our algorithm

preserves the privacy of individual objective functions. We establish limits on

what a coalition of adversaries can learn by observing the messages and states

shared over a network.

GentleRain+: Making GentleRain Robust on Clock Anomalies

Mohammad Roohitavaf, Sandeep Kulkarni

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Causal consistency is an intermediate consistency model that can be

achieved together with high availability and high performance, even

in the presence of network partitions. There are several proposals in the

literature for causally consistent data stores. Thanks to its use of single

scalar physical clocks, GentleRain achieves higher throughput than other proposals

such as COPS or Orbe. However, both its correctness and its performance rely on

monotonic, synchronized physical clocks. Specifically, if physical clocks go

backward, its correctness is violated. In addition, GentleRain is sensitive to

clock synchronization, and clock skew may slow down write operations in

GentleRain. In this paper, we address this issue in GentleRain by using

Hybrid Logical Clocks (HLC) instead of physical clocks. With HLC, the GentleRain

protocol is no longer sensitive to clock skew. In addition, even if clocks

go backward, the correctness of the system is not violated. Furthermore, with

HLC, we timestamp versions with a clock very close to the physical clocks.

Thus, we can take a causally consistent snapshot of the system at any given

physical time. We call the GentleRain protocol with HLCs GentleRain+. We have

implemented the GentleRain+ protocol and evaluated it experimentally.

GentleRain+ provides faster write operations than GentleRain, which relies

solely on physical clocks to achieve causal consistency. We have also shown

that using HLC instead of physical clocks incurs no overhead. Thus, it

makes GentleRain more robust to clock anomalies at no cost.
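
For reference, the hybrid logical clock update rules of Kulkarni et al. (2014) can be sketched as follows. Each timestamp is a pair (l, c): l tracks the largest physical time seen so far, and the counter c orders events that share the same l. The injected physical_time callable and the clock readings are hypothetical, and how GentleRain+ uses these timestamps for versions and snapshots is not shown.

```python
class HybridLogicalClock:
    """Hybrid logical clock producing (l, c) timestamps.

    physical_time: zero-argument callable returning this node's physical clock
    reading, which may be skewed or even go backward.
    """

    def __init__(self, physical_time):
        self.physical_time = physical_time
        self.l = 0  # largest physical time observed so far
        self.c = 0  # logical counter for events sharing the same l

    def local_or_send(self):
        """Timestamp a local event or an outgoing message."""
        pt = self.physical_time()
        l_old = self.l
        self.l = max(l_old, pt)
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, msg_ts):
        """Merge the timestamp (l_m, c_m) carried by an incoming message."""
        l_m, c_m = msg_ts
        pt = self.physical_time()
        l_old = self.l
        self.l = max(l_old, l_m, pt)
        if self.l == l_old == l_m:
            self.c = max(self.c, c_m) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        return (self.l, self.c)

readings = iter([100, 105, 90, 90])          # physical clock jumps backward after 105
sender = HybridLogicalClock(lambda: next(readings))
stamps = [sender.local_or_send() for _ in range(4)]
print(stamps)   # [(100, 0), (105, 0), (105, 1), (105, 2)]

receiver = HybridLogicalClock(lambda: 50)     # receiver's clock lags far behind
recv_ts = receiver.receive(stamps[-1])
print(recv_ts)  # (105, 3)
```

Even though the sender's physical clock goes backward, its timestamps keep increasing, and the receiver's timestamp orders after the message it received.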

Scalable Byzantine Consensus via Hardware-assisted Secret Sharing

Jian Liu, Wenting Li, Ghassan O. Karame, N. Asokan

Comments: 11 pages, 10 figures

Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The surging interest in blockchain technology has revitalized the search for

effective Byzantine consensus schemes. In particular, the blockchain community

has been looking for ways to effectively integrate traditional Byzantine

fault-tolerant (BFT) protocols into a blockchain consensus layer allowing

various financial institutions to securely agree on the order of transactions.

However, existing BFT protocols can only scale to tens of nodes due to their

O(n^2) message complexity.

In this paper, we propose FastBFT, the fastest and most scalable BFT protocol

to date. At the heart of FastBFT is a novel message aggregation technique that

combines hardware-based trusted execution environments (TEEs) with lightweight

secret sharing primitives. Combining this technique with several other

optimizations (i.e., optimistic execution, tree topology and failure

detection), FastBFT achieves low latency and high throughput even for large

scale networks. Via systematic analysis and experiments, we demonstrate that

FastBFT has better scalability and performance than previous BFT protocols.
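
The abstract does not spell out FastBFT's secret-sharing scheme. As a generic illustration of the kind of lightweight primitive it refers to, here is n-of-n XOR secret sharing: all n shares are needed to recover the secret, and any n-1 of them are uniformly random bytes. This is an assumption-level sketch, not FastBFT's protocol; the TEEs, the aggregation tree and the consensus logic are not modeled.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(secret: bytes, n: int) -> list:
    """Split secret into n shares whose XOR equals the secret (n-of-n sharing)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)  # secret XOR all random shares
    return shares + [last]

def reconstruct(shares: list) -> bytes:
    """XOR all shares together; with any share missing the result is unrelated noise."""
    return reduce(xor_bytes, shares)

secret = secrets.token_bytes(16)
shares = split_secret(secret, n=5)
print(reconstruct(shares) == secret)       # True
print(reconstruct(shares[:-1]) == secret)  # False, except with probability 2**-128
```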

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon

Comments: NIPS 2016 Workshop on Machine Learning Systems

Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We describe a computationally efficient, stochastic graph-regularization

technique that can be utilized for the semi-supervised training of deep neural

networks in a parallel or distributed setting. We utilize a technique, first

described in [13], for the construction of mini-batches for stochastic gradient

descent (SGD) based on synthesized partitions of an affinity graph that are

consistent with the graph structure, but also preserve enough stochasticity for

convergence of SGD to good local minima. We show how our technique allows a

graph-based semi-supervised loss function to be decomposed into a sum over

objectives, facilitating data parallelism for scalable training of machine

learning models. Empirical results indicate that our method significantly

improves classification accuracy compared to the fully-supervised case when the

fraction of labeled data is low, and in the parallel case, achieves significant

speed-up in terms of wall-clock time to convergence. We show the results for

both sequential and distributed-memory semi-supervised DNN training on a speech



Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Max Tegmark, Marin Soljačić

Comments: 9 pages, 4 figures

Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We present a method for implementing an Efficient Unitary Neural Network

(EUNN) whose computational complexity is merely O(1) per parameter

and has full tunability, from spanning part of unitary space to all of it. We

apply the EUNN in Recurrent Neural Networks, and test its performance on the

standard copying task and the MNIST digit recognition benchmark, finding that

it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively

partial space URNN and a projective URNN with comparable parameter numbers.
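
The O(1)-per-parameter cost comes from building the unitary out of 2x2 rotations that each touch only two vector entries. A minimal sketch, assuming a rotation of the form [[e^{iφ}cos θ, -e^{iφ}sin θ], [sin θ, cos θ]] and alternating layers of neighbouring-pair rotations (our reading of the construction; the paper's exact parameterization, any extra diagonal phase terms and the RNN integration may differ). The number of layers is the tuning knob: fewer layers reach only part of unitary space.

```python
import numpy as np

def rotation(theta, phi):
    """2x2 unitary [[e^{i phi} cos theta, -e^{i phi} sin theta], [sin theta, cos theta]]."""
    c, s, e = np.cos(theta), np.sin(theta), np.exp(1j * phi)
    return np.array([[e * c, -e * s], [s, c]])

def layer_pairs(n, k):
    """Index pairs rotated by layer k: (0,1),(2,3),... on even layers, (1,2),(3,4),... on odd ones."""
    return [(i, i + 1) for i in range(k % 2, n - 1, 2)]

def apply_layers(x, params):
    """Apply layers of 2x2 rotations to vector x; params[k] holds (theta, phi) pairs for layer k.

    Each rotation reads and writes two entries, so the cost is O(1) per parameter.
    """
    x = np.asarray(x, dtype=complex).copy()
    for k, layer in enumerate(params):
        pairs = layer_pairs(len(x), k)
        if len(layer) != len(pairs):
            raise ValueError(f"layer {k} needs {len(pairs)} (theta, phi) pairs")
        for (i, j), (theta, phi) in zip(pairs, layer):
            x[i], x[j] = rotation(theta, phi) @ np.array([x[i], x[j]])
    return x

rng = np.random.default_rng(1)
n, n_layers = 6, 4  # hypothetical sizes; more layers cover more of unitary space
params = [[tuple(rng.uniform(0, 2 * np.pi, size=2)) for _ in layer_pairs(n, k)]
          for k in range(n_layers)]

x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = apply_layers(x, params)
U = np.column_stack([apply_layers(col, params) for col in np.eye(n)])
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))  # True: norm is preserved
print(np.allclose(U.conj().T @ U, np.eye(n)))             # True: the composed map is unitary
```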

Improving Scalability of Reinforcement Learning by Separation of Concerns

Harm van Seijen, Mehdi Fatemi, Joshua Romoff

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we propose a framework for solving a single-agent task by

using multiple agents, each focusing on different aspects of the task. This

approach has two main advantages: 1) it allows for specialized agents for

different parts of the task, and 2) it provides a new way to transfer

knowledge, by transferring trained agents. Our framework generalizes the

traditional hierarchical decomposition, in which, at any moment in time, a

single agent has control until it has solved its particular subtask. We

illustrate our framework using a number of examples.

Coupling Adaptive Batch Sizes with Learning Rates

Lukas Balles, Javier Romero, Philipp Hennig

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Mini-batch stochastic gradient descent and variants thereof have become

standard for large-scale empirical risk minimization like the training of

neural networks. These methods are usually used with a constant batch size

chosen by simple empirical inspection. The batch size significantly influences

the behavior of the stochastic optimization algorithm, though, since it

determines the variance of the gradient estimates. This variance also changes

over the optimization process; when using a constant batch size, stability and

convergence are thus often enforced by means of a (manually tuned) decreasing

learning rate schedule. We propose a practical method for dynamic batch size

adaptation. It estimates the variance of the stochastic gradients and adapts

the batch size to decrease the variance proportionally to the value of the

objective function, removing the need for the aforementioned learning rate

decrease. In contrast to recent related work, our algorithm couples the batch

size to the learning rate, directly reflecting the known relationship between

the two. On three image classification benchmarks, our batch size adaptation

yields faster optimization convergence, while simultaneously simplifying

learning rate tuning. A TensorFlow implementation is available.
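Below is a minimal sketch of the idea, assuming a rule of the form \(B = \alpha\,\mathrm{tr}(\Sigma)/F\). Here \(\alpha\) is the learning rate, \(\Sigma\) the per-example gradient covariance and \(F\) the current loss. This is our reading of "decrease the variance proportionally to the value of the objective" coupled with the learning rate; it is not the authors' TensorFlow code. The function names and the clipping bounds are hypothetical.

```python
import numpy as np

def gradient_variance_trace(per_example_grads):
    """Estimate tr(Sigma): sum over parameters of the sample variance of per-example gradients."""
    return float(np.sum(np.var(per_example_grads, axis=0, ddof=1)))

def next_batch_size(per_example_grads, loss, lr, b_min=16, b_max=4096):
    """Hypothetical coupling rule B = lr * tr(Sigma) / loss, rounded up and clipped."""
    b = lr * gradient_variance_trace(per_example_grads) / max(loss, 1e-12)
    return int(np.clip(np.ceil(b), b_min, b_max))

# Per-example gradients of a 2-parameter model on a batch of 4 examples (made-up numbers).
grads = np.array([[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]])
print(gradient_variance_trace(grads))              # 8/3: each coordinate has variance 4/3
print(next_batch_size(grads, loss=0.01, lr=0.1))   # 27: a small loss asks for a bigger batch
print(next_batch_size(grads, loss=1.0, lr=0.1))    # 16: clipped at b_min
```

Because the mini-batch gradient variance scales as \(\mathrm{tr}(\Sigma)/B\), growing \(B\) as the loss shrinks plays the role that a decaying learning rate would otherwise play.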

A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

Filip Korzeniowski, Gerhard Widmer

Comments: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy

Subjects: Learning (cs.LG); Sound (cs.SD)

Chord recognition systems depend on robust feature extraction pipelines.

While these pipelines are traditionally hand-crafted, recent advances in

end-to-end machine learning have begun to inspire researchers to explore

data-driven methods for such tasks. In this paper, we present a chord

recognition system that uses a fully convolutional deep auditory model for

feature extraction. The extracted features are processed by a Conditional

Random Field that decodes the final chord sequence. Both processing stages are

trained automatically and do not require expert knowledge for optimising

parameters. We show that the learned auditory system extracts musically

interpretable features, and that the proposed chord recognition system achieves

results on par with or better than state-of-the-art algorithms.
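The CRF stage turns frame-wise feature scores into a single chord sequence. For a linear-chain CRF, that decoding step is Viterbi search. The sketch below assumes log-domain unary scores per frame and label, plus a label-to-label transition score matrix. The names and toy numbers are hypothetical, and training the CRF is not shown.

```python
import numpy as np

def viterbi(unary, transition):
    """Highest-scoring label path for a linear-chain CRF.

    unary:      (T, K) score of each chord label at each frame
    transition: (K, K) score for moving from label i to label j
    """
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition          # cand[i, j]: label i at t-1, label j at t
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(K)] + unary[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                   # follow back-pointers from the last frame
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

unary = np.array([[2.0, 0.0], [0.0, 0.5], [2.0, 0.0], [2.0, 0.0]])  # 4 frames, 2 chords
sticky = np.array([[0.0, -1.0], [-1.0, 0.0]])                        # penalize chord changes
print(viterbi(unary, np.zeros((2, 2))))  # [0, 1, 0, 0]: follows the frame-wise maximum
print(viterbi(unary, sticky))            # [0, 0, 0, 0]: the one-frame switch is smoothed away
```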

Towards Score Following in Sheet Music Images

Matthias Dorfer, Andreas Arzt, Gerhard Widmer

Comments: Published in Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)

Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the matching of short music audio snippets to the

corresponding pixel location in images of sheet music. A system is presented

that simultaneously learns to read notes, listen to music and match the

currently played music to its corresponding notes in the sheet. It consists of

an end-to-end multi-modal convolutional neural network that takes as input

images of sheet music and spectrograms of the respective audio snippets. It

learns to predict, for a given unseen audio snippet (covering approximately one

bar of music), the corresponding position in the respective score line. Our

results suggest that with the use of (deep) neural networks — which have

proven to be powerful image processing models — working with sheet music

becomes feasible and a promising future research direction.

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

Kai Xu, Yixing Li, Fengbo Ren

Comments: Accepted as an oral presentation at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Learning (cs.LG); Information Theory (cs.IT)

Compressive sensing (CS) is a promising technology for realizing

energy-efficient wireless sensors for long-term health monitoring. However,

conventional model-driven CS frameworks suffer from limited compression ratio

and reconstruction quality when dealing with physiological signals due to

inaccurate models and the neglect of individual variability. In this paper, we

propose a data-driven CS framework that can learn signal characteristics and

personalized features from any individual recording of physiologic signals to

enhance CS performance with a minimized number of measurements. Such

improvements are accomplished by a co-training approach that optimizes the

sensing matrix and the dictionary towards improved restricted isometry property

and signal sparsity, respectively. Experimental results on ECG signals show

that the proposed method, at a compression ratio of 10x, successfully reduces

the isometry constant of the trained sensing matrices by 86% against random

matrices and improves the overall reconstructed signal-to-noise ratio by 15 dB

over conventional model-driven approaches.
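For readers new to compressive sensing, here is a minimal sketch of the conventional model-driven pipeline that the paper improves on. A random Gaussian sensing matrix takes \(m = n/\mathrm{CR}\) measurements \(y = \Phi x\) of a signal that is sparse in the canonical basis, and Orthogonal Matching Pursuit (OMP) reconstructs it. The paper instead learns the sensing matrix and a dictionary from data; neither step is shown here. The sizes, the sparsity level and the choice of OMP are assumptions for illustration.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily pick k columns of Phi that explain y."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))        # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef                # re-fit on the whole support
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, cr, k = 256, 4, 3                          # signal length, compression ratio, sparsity
m = n // cr                                   # 64 measurements
Phi = rng.normal(size=(m, n)) / np.sqrt(m)    # random (not learned) sensing matrix
x = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = Phi @ x                                   # what the wearable sensor would transmit
x_hat = omp(Phi, y, k)
print(np.max(np.abs(x_hat - x)))              # near machine precision when the support is found
```

Physiological signals are rarely this sparse in the canonical basis. That gap is why the paper learns a dictionary and a sensing matrix with a better restricted isometry property.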

Bayesian Optimization for Machine Learning: A Practical Guidebook

Ian Dewancker, Michael McCourt, Scott Clark

Subjects: Learning (cs.LG)

The engineering of machine learning systems is still a nascent field, relying

on a seemingly daunting collection of quickly evolving tools and best

practices. It is our hope that this guidebook will serve as a useful resource

for machine learning practitioners looking to take advantage of Bayesian

optimization techniques. We outline four example machine learning problems that

can be solved using open source machine learning libraries, and highlight the

benefits of using Bayesian optimization in the context of these common machine

learning applications.
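At each step, Bayesian optimization picks the next hyperparameter setting by maximizing an acquisition function over a surrogate model's posterior. The sketch below shows Expected Improvement (EI), one common choice, for minimization under a Gaussian posterior \(\mathcal{N}(\mu, \sigma^2)\). The guidebook's own tooling may use different acquisition functions. The function name and the candidate values are hypothetical.

```python
import math

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for minimization: E[max(f_best - f - xi, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(f_best - mu - xi, 0.0)
    z = (f_best - mu - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_best - mu - xi) * cdf + sigma * pdf

# Posterior (mean, std) of the validation loss at three candidate settings; best loss so far 0.0.
candidates = [(0.0, 0.1), (0.2, 1.0), (-0.1, 0.01)]
scores = [expected_improvement(mu, s, f_best=0.0) for mu, s in candidates]
print(max(range(3), key=lambda i: scores[i]))   # 1: the uncertain candidate is explored next
```

The second candidate has a worse mean but much larger uncertainty, which illustrates how EI trades exploitation against exploration.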

Constraint Selection in Metric Learning

Hoel Le Capitaine

Subjects: Learning (cs.LG); Machine Learning (stat.ML)

A number of machine learning algorithms use a metric, or a distance, in

order to compare individuals. The Euclidean distance is usually employed, but

it may be more efficient to learn a parametric distance such as the Mahalanobis

metric. Learning such a metric has been a hot topic for more than ten years,

and a number of methods have been proposed to efficiently learn it. However,

the nature of the problem makes it quite difficult for large-scale data, as

well as data for which classes overlap. This paper presents a simple way of

improving accuracy and scalability of any iterative metric learning algorithm,

where constraints are obtained prior to the algorithm. The proposed approach

relies on a loss-dependent weighted selection of constraints that are used for

learning the metric. Using the corresponding dedicated loss function, the

method clearly obtains better results than state-of-the-art methods,

both in terms of accuracy and time complexity. Experimental results on

real-world, and potentially large, datasets demonstrate the effectiveness

of our proposition.
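Here is a minimal sketch of the two ingredients named above. The first is the Mahalanobis distance \(d_M(x, y) = \sqrt{(x-y)^\top M (x-y)}\). The second is a loss-dependent selection of constraints. We assume triplet constraints with a hinge loss and sample them with probability proportional to their current loss. Both are illustrative assumptions, not the paper's exact loss or selection scheme, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); M must be positive semidefinite."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))

def constraint_losses(X, constraints, M, margin=1.0):
    """Hinge loss of triplets (i, j, k): x_j should be closer to x_i than x_k, by a margin."""
    return np.array([max(0.0, margin + mahalanobis(X[i], X[j], M) ** 2
                         - mahalanobis(X[i], X[k], M) ** 2) for i, j, k in constraints])

def select_constraints(X, constraints, M, n_select, rng):
    """Sample constraints with probability proportional to their loss (hypothetical rule)."""
    losses = constraint_losses(X, constraints, M)
    if losses.sum() == 0.0:
        return []
    p = losses / losses.sum()
    size = min(n_select, int((p > 0).sum()))      # satisfied constraints are never picked
    picked = rng.choice(len(constraints), size=size, replace=False, p=p)
    return [constraints[t] for t in picked]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
constraints = [(0, 1, 2), (0, 2, 1), (0, 1, 3)]   # only (0, 2, 1) is violated under M = I
print(select_constraints(X, constraints, np.eye(2), 2, np.random.default_rng(0)))  # [(0, 2, 1)]
```

An iterative metric learner would update `M` using only the selected constraints and then re-select. Concentrating effort on violated constraints is where the gains in accuracy and scalability would come from.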

Private Learning on Networks

Shripad Gade, Nitin H. Vaidya

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

Continual data collection and widespread deployment of machine learning

algorithms, particularly the distributed variants, have raised new privacy

challenges. In a distributed machine learning scenario, the dataset is stored

among several machines and they solve a distributed optimization problem to

collectively learn the underlying model. We present a secure multi-party

computation inspired privacy preserving distributed algorithm for optimizing a

convex function consisting of several possibly non-convex functions. Each

individual objective function is privately stored with an agent while the

agents communicate model parameters with neighbor machines connected in a

network. We show that our algorithm can correctly optimize the overall

objective function and learn the underlying model accurately. We further prove

that under a vertex connectivity condition on the topology, our algorithm

preserves privacy of individual objective functions. We establish limits on

what a coalition of adversaries can learn by observing the messages and states

shared over a network.
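The abstract describes a convex global objective built from possibly non-convex local pieces. The sketch below illustrates how that can arise: each agent hides its local objective by exchanging random functions with its neighbours, so that the shares cancel in the sum. It is written in the spirit of secure multi-party computation and is not necessarily the paper's exact protocol. Quadratic objectives \(f_i(x) = a_i x^2 + b_i x\), stored as \((a_i, b_i)\), and all names are our assumptions.

```python
import numpy as np

def share_functions(coeffs, edges, rng, scale=10.0):
    """For each edge (i, j), agent i hands a random quadratic s to agent j and
    subtracts it from its own objective. Row r of coeffs is (a, b) for a*x**2 + b*x."""
    shared = coeffs.copy()
    for i, j in edges:
        s = rng.normal(scale=scale, size=2)
        shared[i] -= s
        shared[j] += s
    return shared

def minimizer(coeffs):
    """Minimizer of the network-wide objective, i.e. the sum of all local quadratics."""
    a, b = coeffs.sum(axis=0)
    return -b / (2 * a)

rng = np.random.default_rng(0)
local = np.array([[1.0, -2.0], [2.0, 4.0], [0.5, -1.0]])   # three convex local objectives
edges = [(0, 1), (1, 2), (2, 0)]
obf = share_functions(local, edges, rng)
print(minimizer(local), minimizer(obf))   # equal up to rounding: the sum is unchanged
print(obf[:, 0])                          # obfuscated curvatures; some may now be negative
```

An observer who sees one agent's obfuscated objective learns little about its original one, yet the minimizer of the sum is unchanged. The privacy guarantee in the paper additionally depends on the graph's vertex connectivity.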

CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

Kai Xu, Fengbo Ren

Comments: 10 pages, 6 pages, 2 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this paper, we develop a deep neural network architecture called

“CSVideoNet” that can learn visual representations from random measurements for

compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end

trainable and non-iterative model that combines convolutional neural networks

(CNNs) with a recurrent neural network (RNN) to facilitate video

reconstruction by leveraging temporal-spatial features. The proposed network

can accept random measurements with a multi-level compression ratio (CR). The

lightly and aggressively compressed measurements offer background information

and object details, respectively. This is similar to the variable bit rate

techniques widely used in conventional video coding approaches. The RNN

employed by CSVideoNet can leverage temporal coherence that exists in adjacent

video frames to extrapolate motion features and merge them with spatial visual

features extracted by the CNNs to further enhance reconstruction quality,

especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.

Experimental results show that CSVideoNet outperforms the existing video CS

reconstruction approaches. The results demonstrate that our method can preserve

relatively excellent visual details from original videos even at a 100x CR,

which is difficult to realize with the reference approaches. Also, the

non-iterative nature of CSVideoNet results in a decrease in runtime by three

orders of magnitude over iterative reconstruction algorithms. Furthermore,

CSVideoNet can enhance the CR of CS cameras beyond the limitation of

conventional approaches, ensuring a reduction in bandwidth for data

transmission. These benefits are especially favorable to high-frame-rate video


On the Potential of Simple Framewise Approaches to Piano Transcription

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer

Comments: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY

Subjects: Sound (cs.SD); Learning (cs.LG)

In an attempt to explore the limitations of simple approaches to the task

of piano transcription (as usually defined in MIR), we conduct an in-depth

analysis of neural network-based framewise transcription. We systematically

compare different popular input representations for transcription systems to

determine the ones most suitable for use with neural networks. Exploiting

recent advances in training techniques and new regularizers, and taking into

account hyper-parameter tuning, we show that it is possible, by simple

bottom-up frame-wise processing, to obtain a piano transcriber that outperforms

the current published state of the art on the publicly available MAPS dataset

— without any complex post-processing steps. Thus, we propose this simple

approach as a new baseline for this dataset, for future transcription research

to build on and improve.
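A framewise transcriber outputs, for every frame, one note-activity probability per piano key. Here is a minimal sketch of the bottom-up step and its evaluation. The probabilities are binarized into a piano roll with a fixed threshold, and precision, recall and F-measure are counted over all (frame, pitch) cells. The 0.5 threshold, the 88-key layout with index 0 = A0, and the function names are assumptions; the network itself is not shown.

```python
import numpy as np

def to_piano_roll(probs, threshold=0.5):
    """Binarize per-frame, per-key note probabilities (T frames x 88 keys)."""
    return probs >= threshold

def framewise_prf(pred, ref):
    """Precision, recall and F-measure counted over all (frame, pitch) cells."""
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    p = tp / max(tp + fp, 1)
    r = tp / max(tp + fn, 1)
    f = 2 * p * r / max(p + r, 1e-12)
    return float(p), float(r), float(f)

T, C4, G4, D5 = 4, 39, 43, 50                     # key indices with A0 = 0
ref = np.zeros((T, 88), dtype=bool)
ref[:, C4] = True                                 # C4 held for all 4 frames
ref[2:, G4] = True                                # G4 enters at frame 2
probs = np.zeros((T, 88))
probs[:, C4] = [0.9, 0.8, 0.7, 0.4]               # the last frame of C4 is missed
probs[:, G4] = [0.1, 0.2, 0.6, 0.9]
probs[2, D5] = 0.55                               # one spurious note
print(framewise_prf(to_piano_roll(probs), ref))   # (5/6, 5/6, 5/6)
```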

Towards End-to-End Audio-Sheet-Music Retrieval

Matthias Dorfer, Andreas Arzt, Gerhard Widmer

Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)

This paper demonstrates the feasibility of learning to retrieve short

snippets of sheet music (images) when given a short query excerpt of music

(audio), and vice versa, without any symbolic representation of music or

scores. This would be highly useful in many content-based musical retrieval

scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)

and learns correlated latent spaces allowing for cross-modality retrieval in

both directions. Initial experiments with relatively simple monophonic music

show promising results.
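Once DCCA has mapped audio and sheet-music snippets into correlated latent spaces, retrieval in either direction reduces to nearest-neighbour search. The sketch below shows only that retrieval step, ranking candidates by cosine similarity. The embeddings are made-up stand-ins for the outputs of the two learned DCCA branches, which are not shown, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_emb, candidate_embs, top_k=3):
    """Rank candidates (e.g. sheet snippets) for one query (e.g. an audio excerpt)
    by cosine similarity in the shared latent space."""
    sims = l2_normalize(candidate_embs) @ l2_normalize(query_emb)
    order = np.argsort(-sims)[:top_k]
    return order.tolist(), sims[order].tolist()

sheet_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])  # hypothetical
audio_query = np.array([0.9, 0.1, 0.0])                                     # hypothetical
ranking, sims = retrieve(audio_query, sheet_embs)
print(ranking)   # [0, 2, 1]
```

Swapping the roles of the two arguments gives sheet-to-audio retrieval with the same function.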

Feature Learning for Chord Recognition: The Deep Chroma Extractor

Filip Korzeniowski, Gerhard Widmer

Comments: In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA, 2016

Subjects: Sound (cs.SD); Learning (cs.LG)

We explore frame-level audio feature learning for chord recognition using

artificial neural networks. We present the argument that chroma vectors

potentially hold enough information to model harmonic content of audio for

chord recognition, but that standard chroma extractors compute features that are

too noisy. This leads us to propose a learned chroma feature extractor based on

artificial neural networks. It is trained to compute chroma features that

encode harmonic information important for chord recognition, while being robust

to irrelevant interference. We achieve this by feeding the network an audio

spectrum with context instead of a single frame as input. This way, the network

can learn to selectively compensate for noise and resolve harmonic ambiguities.

We compare the resulting features to hand-crafted ones by using a simple

linear frame-wise classifier for chord recognition on various data sets. The

results show that the learned feature extractor produces superior chroma

vectors for chord recognition.
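For contrast with the learned extractor, here is a minimal hand-crafted chroma computation of the kind such systems are compared against. Each spectral bin is mapped to its nearest equal-tempered pitch, and the magnitudes are summed into 12 pitch classes. It uses a single frame with no context, which is exactly the limitation the paper addresses. The frequency range, the A4 = 440 Hz reference, the C = 0 class ordering and the function name are assumptions.

```python
import numpy as np

def chroma_from_spectrum(magnitudes, freqs, fmin=30.0, fmax=5000.0):
    """Fold one magnitude spectrum into 12 pitch classes (0 = C, 4 = E, 7 = G, 9 = A)."""
    chroma = np.zeros(12)
    for mag, f in zip(magnitudes, freqs):
        if fmin <= f <= fmax:
            midi = 69 + 12 * np.log2(f / 440.0)     # A4 = 440 Hz = MIDI note 69
            chroma[int(np.round(midi)) % 12] += mag
    total = chroma.sum()
    return chroma / total if total > 0 else chroma

sr, n = 22050, 8192
t = np.arange(n) / sr
signal = sum(np.sin(2 * np.pi * f * t) for f in (261.63, 329.63, 392.00))  # C major triad
spectrum = np.abs(np.fft.rfft(signal * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / sr)
chroma = chroma_from_spectrum(spectrum, freqs)
print(sorted(np.argsort(chroma)[-3:].tolist()))   # [0, 4, 7] -> C, E, G
```

With real recordings, overtones and noise leak energy into other pitch classes. This is the noisiness the learned extractor is trained to suppress.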

Optimal structure and parameter learning of Ising models

Andrey Y. Lokhov, Marc Vuffray, Sidhant Misra, Michael Chertkov

Comments: 4 pages, 11 pages of supplementary information

Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Reconstruction of structure and parameters of a graphical model from binary

samples is a problem of practical importance in a variety of disciplines,

ranging from statistical physics and computational biology to image processing

and machine learning. The focus of the research community has shifted towards

developing universal reconstruction algorithms that are both computationally

efficient and require a minimal amount of expensive data. We introduce a new

method, Interaction Screening, which accurately estimates the model parameters

using local optimization problems. The algorithm provably achieves perfect

graph structure recovery with an information-theoretically optimal number of

samples and outperforms state-of-the-art techniques, especially in the

low-temperature regime which is known to be the hardest for learning. We assess

the efficacy of Interaction Screening through extensive numerical tests on

Ising models of various topologies and with different types of interactions,

ranging from ferromagnetic to spin-glass.
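Interaction Screening estimates the couplings of each node \(u\) separately by minimizing a local convex objective. We take that objective to be \(S_u(\theta) = \sum_k w_k \exp\!\big(-\sigma_u^{(k)}(\sum_{j \ne u} \theta_{uj}\sigma_j^{(k)} + \theta_u)\big)\), with weights \(w_k = 1/n\) for \(n\) samples. The sketch below minimizes it with plain gradient descent. To make the result checkable, it feeds the exact distribution of a 3-spin model as weighted samples, in which case the minimizer is the true coupling vector. The paper adds \(\ell_1\) regularization and works from finite samples; those steps and its exact optimizer are not shown, and all names here are ours.

```python
import itertools
import numpy as np

def ising_distribution(J, h):
    """Exact probabilities of all spin configurations of a small Ising model."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(h))), dtype=float)
    energy = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    p = np.exp(energy)
    return states, p / p.sum()

def screen_node(u, states, weights, steps=3000, lr=0.1):
    """Gradient descent on the interaction screening objective of node u.
    Returns the estimated couplings theta_uj (j != u in index order) and then the field."""
    others = [j for j in range(states.shape[1]) if j != u]
    # feature vector x_k = sigma_u * (sigma_j for j != u, then 1 for the local field)
    x = states[:, [u]] * np.hstack([states[:, others], np.ones((len(states), 1))])
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        e = weights * np.exp(-x @ theta)   # weighted terms of the objective
        theta += lr * (e @ x)              # the gradient of the objective is -(e @ x)
    return theta

J = np.array([[0.0, 0.8, -0.5], [0.8, 0.0, 0.0], [-0.5, 0.0, 0.0]])  # no edge between 1 and 2
states, p = ising_distribution(J, np.zeros(3))
print(np.round(screen_node(0, states, p), 3))   # approx [0.8, -0.5, 0.0]
print(np.round(screen_node(1, states, p), 3))   # approx [0.8, 0.0, 0.0]
```

With real data you would pass the observed samples and `weights = np.full(n, 1 / n)`. Structure is then read off by thresholding the small estimated couplings.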

Graph-based semi-supervised learning for relational networks

Leto Peel

Comments: 11 pages, 8 figures

Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We address the problem of semi-supervised learning in relational networks,

networks in which nodes are entities and links are the relationships or

interactions between them. Typically this problem is confounded with the

problem of graph-based semi-supervised learning (GSSL), because both problems

represent the data as a graph and predict the missing class labels of nodes.

However, not all graphs are created equal. In GSSL a graph is constructed,

often from independent data, based on similarity. As such, edges tend to

connect instances with the same class label. Relational networks, however, can

be more heterogeneous and edges do not always indicate similarity. For

instance, instead of links being more likely to connect nodes with the same

class label, they may occur more frequently between nodes with different class

labels (link-heterogeneity). Or nodes with the same class label do not

necessarily have the same type of connectivity across the whole network

(class-heterogeneity), e.g. in a network of sexual interactions we may observe

links between opposite genders in some parts of the graph and links between the

same genders in others. Performing classification in networks with different

types of heterogeneity is a hard problem that is made harder still when we do

not know a priori the type or level of heterogeneity. Here we present two

scalable approaches for graph-based semi-supervised learning for the more

general case of relational networks. We demonstrate these approaches on

synthetic and real-world networks that display different link patterns within

and between classes. Compared to state-of-the-art approaches, ours give better

classification performance without prior knowledge of how classes interact. In

particular, our two-step label propagation algorithm gives consistently good

accuracy and runs on networks of over 1.6 million nodes and 30 million edges in

around 12 seconds.
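For reference, here is a minimal sketch of the classic graph-based label propagation (label spreading) that the abstract contrasts with. It iterates \(F \leftarrow \alpha S F + (1-\alpha) Y\) with \(S = D^{-1/2} W D^{-1/2}\). This baseline implicitly assumes that linked nodes share labels, which is exactly the assumption relational networks can violate. It is not the paper's two-step algorithm, and the example graph is made up.

```python
import numpy as np

def label_propagation(W, labels, n_classes, alpha=0.9, iters=200):
    """Classic label spreading; labels[i] = -1 marks an unlabeled node."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))            # symmetric normalization D^-1/2 W D^-1/2
    Y = np.zeros((len(labels), n_classes))
    for i, c in enumerate(labels):
        if c >= 0:
            Y[i, c] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y    # spread scores, keep pulling toward seeds
    return F.argmax(axis=1)

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3; one labeled node per triangle.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(label_propagation(W, [0, -1, -1, -1, -1, 1], n_classes=2).tolist())  # [0, 0, 0, 1, 1, 1]
```

On a network where edges mostly join nodes of different classes (link-heterogeneity), the same procedure would spread labels to the wrong side. That failure motivates the paper's approaches.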

Dynamical Kinds and their Discovery

Benjamin C. Jantzen

Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016

Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

We demonstrate the possibility of classifying causal systems into kinds that

share a common structure without first constructing an explicit dynamical model

or using prior knowledge of the system dynamics. The algorithmic ability to

determine whether arbitrary systems are governed by causal relations of the

same form offers significant practical applications in the development and

validation of dynamical models. It is also of theoretical interest as an

essential stage in the scientific inference of laws from empirical data. The

algorithm presented is based on the dynamical symmetry approach to dynamical

kinds. A dynamical symmetry with respect to time is an intervention on one or

more variables of a system that commutes with the time evolution of the system.

A dynamical kind is a class of systems sharing a set of dynamical symmetries.

The algorithm presented classifies deterministic, time-dependent causal systems

by directly comparing their exhibited symmetries. Using simulated, noisy data

from a variety of nonlinear systems, we show that this algorithm correctly

sorts systems into dynamical kinds. It is robust under significant sampling

error, is immune to violations of normality in sampling error, and fails

gracefully with increasing dynamical similarity. The algorithm we demonstrate

is the first to address this aspect of automated scientific discovery.
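The definition above says that an intervention \(\sigma\) is a dynamical symmetry if it commutes with the time evolution \(\phi_t\), that is, \(\sigma(\phi_t(x)) = \phi_t(\sigma(x))\). The toy check below illustrates only this definition. It uses closed-form flows and is not the paper's algorithm, which works from noisy data without an explicit model. Scaling the state is a symmetry of exponential growth but not of logistic growth, so the two systems fall into different dynamical kinds. The parameter values and function names are assumptions.

```python
import math

def exp_growth(x0, t, k=0.7):
    """Flow of dx/dt = k x."""
    return x0 * math.exp(k * t)

def logistic(x0, t, r=0.7, K=10.0):
    """Flow of dx/dt = r x (1 - x / K)."""
    return K * x0 * math.exp(r * t) / (K + x0 * (math.exp(r * t) - 1.0))

def double(x):
    """Intervention: scale the state variable by 2."""
    return 2.0 * x

def commutes(flow, intervention, x0, t, tol=1e-9):
    """Check intervention(flow(x0, t)) == flow(intervention(x0), t) at one point."""
    return abs(intervention(flow(x0, t)) - flow(intervention(x0), t)) < tol

print(commutes(exp_growth, double, 1.0, 3.0))   # True: scaling is a dynamical symmetry
print(commutes(logistic, double, 1.0, 3.0))     # False: scaling changes the logistic trajectory
```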

Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization

Sunil Thulasidasan, Jeffrey Bilmes

Comments: InterSpeech Workshop on Machine Learning in Speech and Language Processing, 2016

Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We describe a graph-based semi-supervised learning framework in the context

of deep neural networks that uses a graph-based entropic regularizer to favor

smooth solutions over a graph induced by the data. The main contribution of

this work is a computationally efficient, stochastic graph-regularization

technique that uses mini-batches that are consistent with the graph structure,

but also provides enough stochasticity (in terms of mini-batch data diversity)

for convergence of stochastic gradient descent methods to good solutions. For

this work, we focus on results of frame-level phone classification accuracy on

the TIMIT speech corpus but our method is general and scalable to much larger

data sets. Results indicate that our method significantly improves

classification accuracy compared to the fully-supervised case when the fraction

of labeled data is low, and it is competitive with other methods in the fully

labeled case.
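Here is a minimal sketch of a graph-based entropic regularizer of the general kind described above. It penalizes disagreement between neighbouring class posteriors through KL divergences weighted by the graph, and subtracts an entropy bonus that discourages overconfident posteriors. The exact form, the weight `gamma` and the function name are our assumptions, not the paper's objective. The stochastic mini-batch construction, which is the paper's main contribution, is not shown.

```python
import numpy as np

def graph_entropic_regularizer(P, W, gamma=0.1, eps=1e-12):
    """sum_ij W_ij KL(p_i || p_j) - gamma * sum_i H(p_i), where row i of P is the
    predicted class distribution of node i and W is a symmetric affinity matrix."""
    logP = np.log(P + eps)
    # kl[i, j] = sum_c P_ic log P_ic - sum_c P_ic log P_jc
    kl = (P * logP).sum(axis=1)[:, None] - P @ logP.T
    entropy = -(P * logP).sum(axis=1)
    return float((W * kl).sum() - gamma * entropy.sum())

W = np.array([[0.0, 1.0], [1.0, 0.0]])                    # two connected nodes
agree = np.array([[0.9, 0.1], [0.9, 0.1]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
print(graph_entropic_regularizer(agree, W))               # small: neighbours agree
print(graph_entropic_regularizer(disagree, W))            # large: smoothness is violated
```

Adding this term to the supervised loss pushes unlabeled frames toward the posteriors of their graph neighbours.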

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon

Comments: NIPS 2016 Workshop on Machine Learning Systems

Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We describe a computationally efficient, stochastic graph-regularization

technique that can be utilized for the semi-supervised training of deep neural

networks in a parallel or distributed setting. We utilize a technique, first

described in [13] for the construction of mini-batches for stochastic gradient

descent (SGD) based on synthesized partitions of an affinity graph that are

consistent with the graph structure, but also preserve enough stochasticity for

convergence of SGD to good local minima. We show how our technique allows a

graph-based semi-supervised loss function to be decomposed into a sum over

objectives, facilitating data parallelism for scalable training of machine

learning models. Empirical results indicate that our method significantly

improves classification accuracy compared to the fully-supervised case when the

fraction of labeled data is low, and in the parallel case, achieves significant

speed-up in terms of wall-clock time to convergence. We show the results for

both sequential and distributed-memory semi-supervised DNN training on a speech


Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee

Comments: 4 Figures, 1 Table

Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Objective: The advent of Electronic Medical Records (EMR) with large

electronic imaging databases along with advances in deep neural networks with

machine learning has provided a unique opportunity to achieve milestones in

automated image analysis. Optical coherence tomography (OCT) is the most

commonly obtained imaging modality in ophthalmology and represents a dense and

rich dataset when combined with labels derived from the EMR. We sought to

determine if deep learning could be utilized to distinguish normal OCT images

from images from patients with Age-related Macular Degeneration (AMD). Methods:

Automated extraction of an OCT imaging database was performed and linked to

clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg

Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted

from EPIC. The central 11 images were selected from each OCT scan of two

cohorts of patients: normal and AMD. Cross-validation was performed using a

random subset of patients. Areas under the receiver operating characteristic curves (auROC) were

constructed at an independent image level, macular OCT level, and patient

level. Results: Of an extraction of 2.6 million OCT images linked to clinical

datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were

selected. A deep neural network was trained to categorize images as either

normal or AMD. At the image level, we achieved an auROC of 92.78% with an

accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an

accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an

accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were

92.64% and 93.69% respectively. Conclusions: Deep learning techniques are

effective for classifying OCT images. These findings have important

implications for utilizing OCT in automated screening and computer-aided

diagnosis tools.
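The abstract reports auROC at the image, macula and patient levels. Here is a minimal sketch of how image-level scores might be aggregated and evaluated. auROC is computed as the probability that a random positive outscores a random negative, with ties counting one half. Per-patient scores are formed by averaging image probabilities, which is our assumption since the abstract does not state the aggregation rule. The scores below are made up.

```python
from collections import defaultdict

def auroc(scores, labels):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def aggregate_by_patient(scores, labels, patient_ids):
    """Average image-level scores per patient (a hypothetical aggregation rule)."""
    sums, counts, label_of = defaultdict(float), defaultdict(int), {}
    for s, y, pid in zip(scores, labels, patient_ids):
        sums[pid] += s
        counts[pid] += 1
        label_of[pid] = y
    pids = sorted(sums)
    return [sums[p] / counts[p] for p in pids], [label_of[p] for p in pids]

scores = [0.9, 0.4, 0.7, 0.6, 0.5, 0.2, 0.45, 0.1]   # model P(AMD) per image (made up)
labels = [1, 1, 1, 1, 0, 0, 0, 0]                     # 1 = AMD, 0 = normal
patients = ["A", "A", "B", "B", "C", "C", "D", "D"]
print(auroc(scores, labels))                                       # 0.875 at image level
print(auroc(*aggregate_by_patient(scores, labels, patients)))      # 1.0 at patient level
```

Averaging over several images can cancel out individual misranked images, which is one way patient-level auROC can exceed image-level auROC.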

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre

Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

User acceptance of artificial intelligence agents might depend on their

ability to explain their reasoning, which requires adding an interpretability

layer that helps users understand their behavior. This paper focuses

on adding an interpretable layer on top of Semantic Textual Similarity (STS),

which measures the degree of semantic equivalence between two sentences. The

interpretability layer is formalized as the alignment between pairs of segments

across the two sentences, where the relation between the segments is labeled

with a relation type and a similarity score. We present a publicly available

dataset of sentence pairs annotated following the formalization. We then

develop a system trained on this dataset which, given a sentence pair, explains

what is similar and different, in the form of graded and typed segment

alignments. When evaluated on the dataset, the system performs better than an

informed baseline, showing that the dataset and task are well-defined and

feasible. Most importantly, two user studies show how the system output can be

used to automatically produce explanations in natural language. Users performed

better when having access to the explanations, providing preliminary evidence

that our dataset and method to automatically produce explanations are useful in

real applications.

Uncovering the Dynamics of Crowdlearning and the Value of Knowledge

Utkarsh Upadhyay, Isabel Valera, Manuel Gomez-Rodriguez

Comments: To appear in Tenth ACM International Conference on Web Search and Data Mining (WSDM) in 2017

Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

Learning from the crowd has become increasingly popular in the Web and social

media. There is a wide variety of crowdlearning sites in which, on the one

hand, users learn from the knowledge that other users contribute to the site,

and, on the other hand, knowledge is reviewed and curated by the same users

using assessment measures such as upvotes or likes.

In this paper, we present a probabilistic modeling framework of

crowdlearning, which uncovers the evolution of a user’s expertise over time by

leveraging other users’ assessments of her contributions. The model allows for

both off-site and on-site learning and captures forgetting of knowledge. We

then develop a scalable estimation method to fit the model parameters from

millions of recorded learning and contributing events. We show the

effectiveness of our model by tracing activity of ~25 thousand users in Stack

Overflow over a 4.5 year period. We find that answers with high knowledge value

are rare. Newbies and experts tend to acquire less knowledge than users in the

middle range. Prolific learners also tend to be proficient contributors that

post answers with high knowledge value.

Information Theory

Lossy Transmission of Correlated Sources over a Multiple Access Channel: Necessary Conditions and Separation Results

Basak Guler, Deniz Gunduz, Aylin Yener

Comments: Submitted to IEEE Transactions on Information Theory on Nov 30, 2016

Subjects: Information Theory (cs.IT)

Lossy communication of correlated sources over a multiple access channel is

studied. First, lossy communication is investigated in the presence of

correlated decoder side information. An achievable joint source-channel coding

scheme is presented, and the conditions under which separate source and channel

coding is optimal are explored. It is shown that separation is optimal when the

encoders and the decoder have access to a common observation conditioned on

which the two sources are independent. Separation is shown to be optimal also

when only the encoders have access to such a common observation whose lossless

recovery is required at the decoder. Moreover, the optimality of separation is

shown for sources with a common part, and sources with reconstruction

constraints. Next, these results obtained for the system in the presence of side

information are utilized to provide a set of necessary conditions for the

transmission of correlated sources over a multiple access channel without side

information. The identified necessary conditions are specialized to the case of

bivariate Gaussian sources over a Gaussian multiple access channel, and are

shown to be tighter than known results in the literature in certain cases. Our

results indicate that side information can have a significant impact on the

optimality of source-channel separation in lossy transmission, in addition to

being instrumental in identifying necessary conditions for the transmission of

correlated sources when no side information is present.

Privacy-Protecting Energy Management Unit through Model-Distribution Predictive Control

Jun-Xing Chin, Tomas Tinoco De Rubira, Gabriela Hug

Comments: Pre-print, submitted for review

Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

The roll-out of smart meters in electricity networks introduces risks for

consumer privacy due to increased measurement frequency and granularity.

Through various Non-Intrusive Load Monitoring techniques, consumer behavior may

be inferred from their metering data. In this paper, we propose an energy

management method that protects privacy through the minimization of information

leakage. The method is based on a Model Predictive Controller that utilizes

energy storage and local generation, and that predicts the effects of its

actions on the statistics of the actual energy consumption of a consumer and

that seen by the grid. Computationally, the method requires solving a

Mixed-Integer Quadratic Program of manageable size whenever new meter readings

are available. We simulate the controller on generated residential load

profiles with different privacy costs in a two-tier time-of-use energy pricing

environment. Results show that information leakage is effectively reduced at

the expense of increased energy cost. The results also show that, using the

proposed controller, the consumer load profile seen by the grid resembles a

mixture between that obtained with Non-Intrusive Load Leveling and Lazy


Variations of the McEliece Cryptosystem

Jessalyn Bolkema, Heide Gluesing-Luerssen, Christine A. Kelley, Kristin Lauter, Beth Malmskog, Joachim Rosenthal

Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

Two variations of the McEliece cryptosystem are presented. The first one is based on a relaxation of the column permutation in the classical McEliece scrambling process. This is done in such a way that the Hamming weight of the error, added in the encryption process, can be controlled so that efficient decryption remains possible. The second variation is based on the use of spatially coupled moderate-density parity-check codes as secret codes. These codes are known for their excellent error-correction performance and allow for a relatively low key size in the cryptosystem. For both variants, the security with respect to known attacks is discussed.

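Both variants modify the classical McEliece flow, in which the public key is S·G·P and a ciphertext is m·G_pub plus a low-weight error e. That flow can be sketched with a toy Hamming(7,4) code. This Python sketch is hypothetical and deliberately insecure. The fixed scrambler `S`, the seeded permutation and all helper names are assumptions for illustration. It implements neither the relaxed permutation nor the spatially coupled MDPC codes of the paper.

```python
import random
from typing import List, Tuple

Vec = List[int]
Mat = List[List[int]]

# Hamming(7,4): systematic generator G = [I4 | A] and parity check H = [A^T | I3].
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]
H = [[A[j][r] for j in range(4)] + [int(r == k) for k in range(3)] for r in range(3)]

# Fixed invertible scrambler S and its inverse over GF(2) (random in a real scheme).
S = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
S_INV = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]


def vec_mat(v: Vec, m: Mat) -> Vec:
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % 2 for j in range(len(m[0]))]


def keygen(seed: int) -> Tuple[Mat, List[int]]:
    """Public key S*G*P (P applied as a column permutation) plus the secret permutation."""
    perm = list(range(7))
    random.Random(seed).shuffle(perm)
    sg = [vec_mat(row, G) for row in S]
    public = [[row[perm[j]] for j in range(7)] for row in sg]
    return public, perm


def encrypt(msg: Vec, public: Mat, rng: random.Random) -> Vec:
    c = vec_mat(msg, public)
    c[rng.randrange(7)] ^= 1  # weight-1 error: Hamming(7,4) corrects t = 1
    return c


def decrypt(c: Vec, perm: List[int]) -> Vec:
    x = [0] * 7
    for j, p in enumerate(perm):  # undo the column permutation
        x[p] = c[j]
    syndrome = [sum(H[r][j] * x[j] for j in range(7)) % 2 for r in range(3)]
    if any(syndrome):
        # The syndrome equals the column of H at the error position.
        cols = [[H[r][j] for r in range(3)] for j in range(7)]
        x[cols.index(syndrome)] ^= 1
    return vec_mat(x[:4], S_INV)  # systematic code: first 4 bits are msg*S


public, perm = keygen(seed=7)
msg = [1, 0, 1, 1]
assert decrypt(encrypt(msg, public, random.Random(1)), perm) == msg
```

Practical McEliece instances use codes that correct many errors, classically binary Goppa codes. With n = 7 and t = 1, this toy could be broken by brute force.
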
QoS-Based Linear Transceiver Optimization for Full-Duplex Multi-User Communications

Tsung-Hui Chang, Ya-Feng Liu, Shih-Chun Lin

Comments: Submitted for publication

Subjects: Information Theory (cs.IT)

In this paper, we consider a multi-user wireless system with one full-duplex (FD) base station (BS) serving a set of half-duplex (HD) mobile users. To cope with the in-band self-interference (SI) and co-channel interference, we formulate a quality-of-service (QoS) based linear transceiver design problem. The problem jointly optimizes the downlink (DL) and uplink (UL) beamforming vectors of the BS and the transmission powers of the UL users, so as to provide both the DL and UL users with guaranteed signal-to-interference-plus-noise ratio performance using the minimum total UL and DL transmission power. The considered system model not only takes into account the noise caused by non-ideal RF circuits and analog/digital SI cancellation, but also constrains the maximum signal power at the input of the analog-to-digital converter (ADC) to avoid signal distortion due to finite ADC precision. The formulated design problem is nonconvex and, in general, challenging to solve. We first show that for a special case where the SI channel estimation errors are independent and identically distributed, the QoS-based linear transceiver design problem is globally solvable by a polynomial-time bisection algorithm. For the general case, we propose a suboptimal algorithm based on alternating optimization (AO). The AO algorithm is guaranteed to converge to a Karush-Kuhn-Tucker solution. To reduce the complexity of the AO algorithm, we further develop a fixed-point method by extending the classical uplink-downlink duality in HD systems to the FD system. Simulation results are presented to demonstrate the performance of the proposed algorithms and to compare them with HD systems.

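Fixed-point iterations are a classical ingredient of duality-based HD designs. Their simplest instance is power control for single-antenna links with per-link SINR targets, in the Foschini-Miljanic / Yates style, sketched below in Python. This is a simplification under stated assumptions. It has no beamforming, no self-interference and no ADC constraint, and it is not the paper's FD method. Starting from zero power, the iteration increases monotonically and converges to the minimum-power solution whenever the targets are feasible.

```python
from typing import List, Sequence


def fixed_point_power_control(gain: Sequence[Sequence[float]],
                              targets: Sequence[float], noise: float,
                              max_iter: int = 500, tol: float = 1e-12) -> List[float]:
    """Minimum transmit powers meeting per-link SINR targets (single-antenna links).

    gain[k][j] is the power gain from transmitter j to receiver k. Each update sets
    p_k = gamma_k * (noise + sum_{j != k} gain[k][j] * p_j) / gain[k][k].
    """
    K = len(targets)
    p = [0.0] * K
    for _ in range(max_iter):
        new_p = [targets[k] * (noise + sum(gain[k][j] * p[j] for j in range(K) if j != k))
                 / gain[k][k] for k in range(K)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    raise RuntimeError("no convergence: the SINR targets may be infeasible")


def sinr(gain: Sequence[Sequence[float]], p: Sequence[float], noise: float) -> List[float]:
    """Achieved SINR of every link for a given power vector."""
    K = len(p)
    return [gain[k][k] * p[k] / (noise + sum(gain[k][j] * p[j] for j in range(K) if j != k))
            for k in range(K)]


gain = [[1.0, 0.1], [0.2, 1.0]]
p = fixed_point_power_control(gain, targets=[2.0, 2.0], noise=0.1)
print([round(x, 4) for x in p])  # [0.2609, 0.3043]
```

If the targets are infeasible (for instance 20 per link with these gains), the powers grow without bound and the function raises instead of returning.
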
Antenna Selection for MIMO Non-orthogonal Multiple Access Systems

Yuehua Yu, He Chen, Yonghui Li, Zhiguo Ding, Branka Vucetic

Comments: Submitted for possible journal publication

Subjects: Information Theory (cs.IT)

This paper considers the joint antenna selection (AS) problem for a classical two-user MIMO non-orthogonal multiple access (NOMA) system, where both the base station (BS) and the users (UEs) are equipped with multiple antennas. Specifically, several computationally efficient AS algorithms are developed for two commonly used NOMA scenarios: fixed power allocation NOMA (F-NOMA) and cognitive radio-inspired NOMA (CR-NOMA). For the F-NOMA system, two novel AS schemes, namely max-max-max AS (A³-AS) and max-min-max AS (AIA-AS), are proposed to maximize the system sum-rate without and with consideration of user fairness, respectively. In the CR-NOMA network, a novel AS algorithm, termed maximum-channel-gain-based AS (MCG-AS), is proposed to maximize the achievable rate of the secondary user under the condition that the primary user’s quality-of-service requirement is satisfied. Asymptotic closed-form expressions are derived for the average sum-rate of A³-AS and AIA-AS, and for the average rate of the secondary user under MCG-AS. Numerical results demonstrate that AIA-AS provides better user fairness, while A³-AS achieves a near-optimal sum-rate in F-NOMA systems. For the CR-NOMA scenario, MCG-AS achieves near-optimal performance over a wide SNR regime. Furthermore, all the proposed AS algorithms yield a significant reduction in computational complexity compared to exhaustive search-based

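The exhaustive-search baseline can be illustrated for a single link. The Python sketch below assumes an i.i.d. Rayleigh channel and tries every (UE antenna, BS antenna) pair, keeping the one with the largest channel gain |h|². It only shows gain-based selection by brute force. It does not implement A³-AS, AIA-AS or MCG-AS, whose two-user NOMA criteria are defined in the paper, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Channel = List[List[complex]]


def rayleigh_channel(n_rx: int, n_tx: int, rng: random.Random) -> Channel:
    """i.i.d. CN(0, 1) entries: h[r][t] is the gain from BS antenna t to UE antenna r."""
    s = 1 / math.sqrt(2)  # per-component standard deviation for unit average power
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_tx)]
            for _ in range(n_rx)]


def max_gain_pair(h: Channel) -> Tuple[int, int]:
    """Exhaustive search over all n_rx * n_tx antenna pairs for the largest |h|^2."""
    pairs = [(r, t) for r in range(len(h)) for t in range(len(h[0]))]
    return max(pairs, key=lambda rt: abs(h[rt[0]][rt[1]]) ** 2)


def rate_bps_hz(h: Channel, pair: Tuple[int, int], snr: float) -> float:
    """Single-link achievable rate log2(1 + SNR * |h|^2) for the chosen pair."""
    r, t = pair
    return math.log2(1 + snr * abs(h[r][t]) ** 2)


rng = random.Random(0)
h = rayleigh_channel(n_rx=2, n_tx=4, rng=rng)
best = max_gain_pair(h)
print(best, rate_bps_hz(h, best, snr=10.0))
```

The single-link search needs N_r·N_t gain evaluations. A joint search over the BS antenna and both users' receive antennas would multiply these counts, which is the kind of cost the proposed algorithms aim to reduce.
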

Optical Adaptive Precoding for Visible Light Communications

Hanaa Marshoud, Paschalis C. Sofotasios, Sami Muhaidat, Bayan S. Sharif, George K. Karagiannidis

Subjects: Information Theory (cs.IT)

Multiple-input multiple-output (MIMO) techniques have recently demonstrated significant potential in visible light communications (VLC), as they can overcome the modulation bandwidth limitation and provide substantial improvement in spectral efficiency and link reliability. However, MIMO systems typically suffer from inter-channel interference, which severely degrades system performance. In this context, we propose a novel optical adaptive precoding (OAP) scheme for the downlink of MIMO VLC systems, which exploits knowledge of the transmitted symbols to enhance the effective signal-to-interference-plus-noise ratio. We also derive bit-error-rate expressions for the OAP under perfect and outdated channel state information (CSI). Our results demonstrate that the proposed scheme is more robust to both CSI error and channel correlation than conventional channel inversion

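The conventional channel-inversion (zero-forcing) baseline named in the abstract can be sketched for an assumed 2 × 2 real-valued VLC channel. This Python sketch ignores noise and the DC bias and non-negativity constraints of intensity-modulated optical signals. It is not the proposed OAP scheme. It only shows why the baseline depends on accurate CSI.

```python
from typing import List

Matrix = List[List[float]]


def inverse_2x2(h: Matrix) -> Matrix:
    """Closed-form inverse of a 2 x 2 real matrix."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular; inversion is ill-posed")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def channel_inversion_precode(h_est: Matrix, symbols: List[float]) -> List[float]:
    """Zero-forcing precoder x = H_est^{-1} s (DC bias and non-negativity omitted)."""
    return mat_vec(inverse_2x2(h_est), symbols)


H = [[1.0, 0.3], [0.2, 0.9]]  # assumed real, non-negative LED-to-photodiode gains
s = [1.0, -1.0]
# Perfect CSI: H @ H^{-1} @ s returns s, so inter-channel interference is removed.
y_perfect = mat_vec(H, channel_inversion_precode(H, s))
# Outdated CSI (one entry stale): residual interference distorts the first stream.
y_outdated = mat_vec(H, channel_inversion_precode([[1.0, 0.25], [0.2, 0.9]], s))
print(y_perfect, y_outdated)
```

When the rows of H become nearly equal (strongly correlated channels), det(H) approaches zero and the precoded amplitudes grow large. That is a general weakness of plain channel inversion.
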

State Estimation with Secrecy against Eavesdroppers

Anastasios Tsiamis, Konstantinos Gatsis, George J. Pappas

Subjects: Systems and Control (cs.SY); Cryptography and Security (cs.CR); Information Theory (cs.IT)

We study the problem of remote state estimation in the presence of an eavesdropper. An authorized user estimates the state of a linear plant based on the data received from a sensor, while the data may also be intercepted by the eavesdropper. To maintain confidentiality with respect to the state, we introduce a novel control-theoretic definition of perfect secrecy requiring that the user’s expected error remain bounded while the eavesdropper’s expected error grows unbounded. We propose a secrecy mechanism that guarantees perfect secrecy by randomly withholding sensor information, under the condition that the user’s packet reception rate is larger than the eavesdropper’s interception rate. Given this mechanism, we also explore the tradeoff between the user’s utility and confidentiality with respect to the eavesdropper via an optimization problem. Finally, some examples are studied to provide insights into this tradeoff.

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

Kai Xu, Yixing Li, Fengbo Ren

Comments: Accepted as an oral presentation at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Learning (cs.LG); Information Theory (cs.IT)

Compressive sensing (CS) is a promising technology for realizing energy-efficient wireless sensors for long-term health monitoring. However, conventional model-driven CS frameworks suffer from limited compression ratio and reconstruction quality when dealing with physiological signals, due to inaccurate models and the neglect of individual variability. In this paper, we propose a data-driven CS framework that can learn signal characteristics and personalized features from any individual recording of physiological signals to enhance CS performance with a minimized number of measurements. Such improvements are accomplished by a co-training approach that optimizes the sensing matrix and the dictionary towards an improved restricted isometry property and signal sparsity, respectively. Experimental results on ECG signals show that the proposed method, at a compression ratio of 10x, reduces the isometry constant of the trained sensing matrices by 86% relative to random matrices and improves the overall reconstructed signal-to-noise ratio by 15 dB over conventional model-driven approaches.

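A model-driven CS front end typically measures y = Φx with a random Φ of m = n / CR rows and recovers a sparse x with a greedy solver. The Python/NumPy sketch below uses a Gaussian Φ and Orthogonal Matching Pursuit (OMP), and assumes the signal is sparse in the identity basis. It illustrates the random-matrix baseline only, not the paper's co-trained sensing matrix and dictionary. The function names are hypothetical.

```python
from typing import List

import numpy as np


def random_sensing_matrix(n: int, compression_ratio: int,
                          rng: np.random.Generator) -> np.ndarray:
    """Gaussian m x n sensing matrix with m = n // compression_ratio rows."""
    m = n // compression_ratio
    return rng.standard_normal((m, n)) / np.sqrt(m)


def omp(phi: np.ndarray, y: np.ndarray, k: int, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal Matching Pursuit: recover a k-sparse x from y = phi @ x."""
    n = phi.shape[1]
    col_norms = np.linalg.norm(phi, axis=0)
    support: List[int] = []
    coef = np.zeros(0)
    residual = y.astype(float)
    for _ in range(k):
        # Pick the (normalized) column most correlated with the current residual.
        scores = np.abs(phi.T @ residual) / col_norms
        scores[support] = -np.inf  # never reselect a column
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(0)
n = 200
phi = random_sensing_matrix(n, compression_ratio=2, rng=rng)  # 100 x 200
x = np.zeros(n)
x[[5, 50, 150]] = [1.0, -2.0, 0.5]  # 3-sparse test signal
x_hat = omp(phi, phi @ x, k=3)
print(np.max(np.abs(x_hat - x)))
```

At the 10x ratio quoted in the abstract, m = n // 10 leaves far fewer measurements per nonzero coefficient. Recovery then depends heavily on how sparse the signal is in the chosen basis, which is where a learned dictionary and sensing matrix are meant to help.
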

Original article: https://www.52ml.net/21418.html