转载

arXiv Paper Daily: Thu, 12 Jan 2017

Neural and Evolutionary Computing

OpenNMT: Open-Source Toolkit for Neural Machine Translation

Guillaume Klein , Yoon Kim , Yuntian Deng , Jean Senellart , Alexander M. Rush

Comments: Report for this http URL

Subjects

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We describe an open-source toolkit for neural machine translation (NMT). The

toolkit prioritizes efficiency, modularity, and extensibility with the goal of

supporting NMT research into model architectures, feature representations, and

source modalities, while maintaining competitive performance and reasonable

training requirements. The toolkit consists of modeling and translation

support, as well as detailed pedagogical documentation about the underlying

techniques.

Computer Vision and Pattern Recognition

A More General Robust Loss Function

Jonathan T. Barron Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG); Machine Learning (stat.ML)

We present a loss function which can be viewed as a generalization of many

popular loss functions used in robust statistics: the Cauchy/Lorentzian,

Welsch, and generalized Charbonnier loss functions (and by transitivity the L2,

L1, L1-L2, and pseudo-Huber/Charbonnier loss functions). We describe and

visualize this loss, and document several of its useful properties.

CNN-based Segmentation of Medical Imaging Data

Baris Kayalibay , Grady Jensen , Patrick van der Smagt

Comments: 24 pages

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks have been applied to a wide variety of computer

vision tasks. Recent advances in semantic segmentation have enabled their

application to medical image segmentation. While most CNNs use two-dimensional

kernels, recent CNN-based publications on medical image segmentation featured

three-dimensional kernels, allowing full access to the three-dimensional

structure of medical images. Though closely related to semantic segmentation,

medical image segmentation includes specific challenges that need to be

addressed, such as the scarcity of labelled data, the high class imbalance

found in the ground truth and the high memory demand of three-dimensional

images. In this work, a CNN-based method with three-dimensional filters is

demonstrated and applied to hand and brain MRI. Two modifications to an

existing CNN architecture are discussed, along with methods on addressing the

aforementioned challenges. While most of the existing literature on medical

image segmentation focuses on soft tissue and the major organs, this work is

validated on data both from the central nervous system as well as the bones of

the hand.

Revisiting Deep Image Smoothing and Intrinsic Image Decomposition

Qingnan Fan , David Wipf , Gang Hua , Baoquan Chen Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose an image smoothing approximation and intrinsic image decomposition

method based on a modified convolutional neural network architecture applied

directly to the original color image. Our network has a very large receptive

field equipped with at least 20 convolutional layers and 8 residual units. When

training such a deep model however, it is quite difficult to generate

edge-preserving images without undesirable color differences. To overcome this

obstacle, we apply both image gradient supervision and a channel-wise rescaling

layer that computes a minimum mean-squared error color correction.

Additionally, to enhance piece-wise constant effects for image smoothing, we

append a domain transform filter with a predicted refined edge map. The

resulting deep model, which can be trained end-to-end, directly learns

edge-preserving smooth images and intrinsic decompositions without any special

design or input scaling/size requirements. Moreover, our method shows much

better numerical and visual results on both tasks and runs in comparable test

time to existing deep methods.

Modeling Retinal Ganglion Cell Population Activity with Restricted Boltzmann Machines

Matteo Zanotto , Riccardo Volpi , Alessandro Maccione , Luca Berdondini , Diego Sona , Vittorio Murino Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Neurons and Cognition (q-bio.NC)

The retina is a complex nervous system which encodes visual stimuli before

higher order processing occurs in the visual cortex. In this study we evaluated

whether information about the stimuli received by the retina can be retrieved

from the firing rate distribution of Retinal Ganglion Cells (RGCs), exploiting

High-Density 64×64 MEA technology. To this end, we modeled the RGC population

activity using mean-covariance Restricted Boltzmann Machines, latent variable

models capable of learning the joint distribution of a set of continuous

observed random variables and a set of binary unobserved random units. The idea

was to figure out if binary latent states encode the regularities associated to

different visual stimuli, as modes in the joint distribution. We measured the

goodness of mcRBM encoding by calculating the Mutual Information between the

latent states and the stimuli shown to the retina. Results show that binary

states can encode the regularities associated to different stimuli, using both

gratings and natural scenes as stimuli. We also discovered that hidden

variables encode interesting properties of retinal activity, interpreted as

population receptive fields. We further investigated the ability of the model

to learn different modes in population activity by comparing results associated

to a retina in normal conditions and after pharmacologically blocking GABA

receptors (GABAC at first, and then also GABAA and GABAB). As expected, Mutual

Information tends to decrease if we pharmacologically block receptors. We

finally stress that the computational method described in this work could

potentially be applied to any kind of neural data obtained through MEA

technology, though different techniques should be applied to interpret the

results.

Context-aware Captions from Context-agnostic Supervision

Ramakrishna Vedantam , Samy Bengio , Kevin Murphy , Devi Parikh , Gal Chechik

Comments: 16 pages, 10 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI)

We introduce a technique to produce discriminative context-aware image

captions (captions that describe differences between images or visual concepts)

using only generic context-agnostic training data (captions that describe a

concept or an image in isolation). For example, given images and captions of

“siamese cat” and “tiger cat”, our system generates language that describes the

“siamese cat” in a way that distinguishes it from “tiger cat”. We start with a

generic language model that is context-agnostic and add a listener to

discriminate between closely-related concepts. Our approach offers two key

advantages over previous work: 1) our listener does not need separate training,

and 2) allows joint inference to decode sentences that satisfy both the speaker

and listener — yielding an introspective speaker. We first apply our

introspective speaker to a justification task, i.e. to describe why an image

contains a particular fine-grained category as opposed to another closely

related category in the CUB-200-2011 dataset. We then study discriminative

image captioning to generate language that uniquely refers to one out of two

semantically similar images in the COCO dataset. Evaluations with

discriminative ground truth for justification and human studies for

discriminative image captioning reveal that our approach outperforms baseline

generative and speaker-listener approaches for discrimination.

A Unified RGB-T Saliency Detection Benchmark: Dataset, Baselines, Analysis and A Novel Approach

Chenglong Li , Guizhao Wang , Yunpeng Ma , Aihua Zheng , Bin Luo , Jin Tang Subjects : Computer Vision and Pattern Recognition (cs.CV)

Despite significant progress, image saliency detection still remains a

challenging task in complex scenes and environments. Integrating multiple

different but complementary cues, like RGB and Thermal (RGB-T), may be an

effective way for boosting saliency detection performance. The current research

in this direction, however, is limited by the lack of a comprehensive

benchmark. This work contributes such a RGB-T image dataset, which includes 821

spatially aligned RGB-T image pairs and their ground truth annotations for

saliency detection purpose. The image pairs are with high diversity recorded

under different scenes and environmental conditions, and we annotate 11

challenges on these image pairs for performing the challenge-sensitive analysis

for different saliency detection algorithms. We also implement 3 kinds of

baseline methods with different modality inputs to provide a comprehensive

comparison platform.

With this benchmark, we propose a novel approach, multi-task manifold ranking

with cross-modality consistency, for RGB-T saliency detection. In particular,

we introduce a weight for each modality to describe the reliability, and

integrate them into the graph-based manifold ranking algorithm to achieve

adaptive fusion of different source data. Moreover, we incorporate the

cross-modality consistent constraints to integrate different modalities

collaboratively. For the optimization, we design an efficient algorithm to

iteratively solve several subproblems with closed-form solutions. Extensive

experiments against other baseline methods on the newly created benchmark

demonstrate the effectiveness of the proposed approach, and we also provide

basic insights and potential future research directions for RGB-T saliency

detection.

Full-reference image quality assessment-based B-mode ultrasound image similarity measure

Kele Xu , Xi Liu , Chang Liu , Hengxing Cai , Zhifeng Gao

Comments: 7 pages, 4 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

During the last decades, the number of new full-reference image quality

assessment algorithms has been increasing drastically. Yet, despite of the

remarkable progress that has been made, the medical ultrasound image similarity

measurement remains largely unsolved due to a high level of speckle noise

contamination. Potential applications of the ultrasound image similarity

measurement seem evident in several aspects. To name a few, ultrasound imaging

quality assessment, abnormal function region detection, etc. In this paper, a

comparative study was made on full-reference image quality assessment methods

for ultrasound image visual structural similarity measure. Moreover, based on

the image similarity index, a generic ultrasound motion tracking

re-initialization framework is given in this work. The experiments are

conducted on synthetic data and real-ultrasound liver data and the results

demonstrate that, with proposed similarity-based tracking re-initialization,

the mean error of landmarks tracking can be decreased from 2 mm to about 1.5 mm

in the ultrasound liver sequence.

Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications

Xiaowei Zhang , Chi Xu , Yu Zhang , Tingshao Zhu , Li Cheng Subjects : Machine Learning (stat.ML) ; Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

This paper studies the problem of multivariate linear regression where a

portion of the observations is grossly corrupted or is missing, and the

magnitudes and locations of such occurrences are unknown in priori. To deal

with this problem, we propose a new approach by explicitly consider the error

source as well as its sparseness nature. An interesting property of our

approach lies in its ability of allowing individual regression output elements

or tasks to possess their unique noise levels. Moreover, despite working with a

non-smooth optimization problem, our approach still guarantees to converge to

its optimal solution. Experiments on synthetic data demonstrate the

competitiveness of our approach compared with existing multivariate regression

models. In addition, empirically our approach has been validated with very

promising results on two exemplar real-world applications: The first concerns

the prediction of extit{Big-Five} personality based on user behaviors at

social network sites (SNSs), while the second is 3D human hand pose estimation

from depth images. The implementation of our approach and comparison methods as

well as the involved datasets are made publicly available in support of the

open-source and reproducible research initiatives.

Stochastic Generative Hashing

Bo Dai , Ruiqi Guo , Sanjiv Kumar , Niao He , Le Song

Comments: 19 pages, 22 figures

Subjects

Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Learning to hash plays a fundamentally important role in the efficient image

and video retrieval and many other computer vision problems. However, due to

the binary outputs of the hash functions, the learning of hash functions is

very challenging. In this paper, we propose a novel approach to learn

stochastic hash functions such that the learned hashing codes can be used to

regenerate the inputs. We develop an efficient stochastic gradient learning

algorithm which can avoid the notorious difficulty caused by binary output

constraint, and directly optimize the parameters of the hash functions and the

associated generative model jointly. The proposed method can be applied to both

(L2) approximate nearest neighbor search (L2NNS) and maximum inner product

search (MIPS). Extensive experiments on a variety of large-scale datasets show

that the proposed method achieves significantly better retrieval results than

previous state-of-the-arts.

Artificial Intelligence

Towards Smart Proof Search for Isabelle

Yutaka Nagashima

Comments: Accepted at AITP2017

Subjects

Artificial Intelligence (cs.AI)

Despite the recent progress in automatic theorem provers, proof engineers are

still suffering from the lack of powerful proof automation. In this position

paper we first report our proof strategy language based on a meta-tool

approach. Then, we propose an AI-based approach to drastically improve proof

automation for Isabelle, while identifying three major challenges we plan to

address for this objective.

A Framework for Knowledge Management and Automated Reasoning Applied on Intelligent Transport Systems

Aneta Vulgarakis Feljan , Athanasios Karapantelakis , Leonid Mokrushin , Hongxin Liang , Rafia Inam , Elena Fersman , Carlos R.B. Azevedo , Klaus Raizer , Ricardo S. Souza Subjects : Artificial Intelligence (cs.AI)

Cyber-Physical Systems in general, and Intelligent Transport Systems (ITS) in

particular use heterogeneous data sources combined with problem solving

expertise in order to make critical decisions that may lead to some form of

actions e.g., driver notifications, change of traffic light signals and braking

to prevent an accident. Currently, a major part of the decision process is done

by human domain experts, which is time-consuming, tedious and error-prone.

Additionally, due to the intrinsic nature of knowledge possession this decision

process cannot be easily replicated or reused. Therefore, there is a need for

automating the reasoning processes by providing computational systems a formal

representation of the domain knowledge and a set of methods to process that

knowledge. In this paper, we propose a knowledge model that can be used to

express both declarative knowledge about the systems’ components, their

relations and their current state, as well as procedural knowledge representing

possible system behavior. In addition, we introduce a framework for knowledge

management and automated reasoning (KMARF). The idea behind KMARF is to

automatically select an appropriate problem solver based on formalized

reasoning expertise in the knowledge base, and convert a problem definition to

the corresponding format. This approach automates reasoning, thus reducing

operational costs, and enables reusability of knowledge and methods across

different domains. We illustrate the approach on a transportation planning use

case.