Comments: 10 + 3 pages, under review for ICLR 2017
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
The past year saw the introduction of new architectures such as Highway
networks and Residual networks which, for the first time, enabled the training
of feedforward networks with dozens to hundreds of layers using simple gradient
descent. While depth of representation has been posited as a primary reason for
their success, there are indications that these architectures defy a popular
view of deep learning as a hierarchical computation of increasingly abstract
features at each layer.
In this report, we argue that this view is incomplete and does not adequately
explain several recent findings. We propose an alternative viewpoint based on
unrolled iterative estimation—a group of successive layers iteratively refine
their estimates of the same features instead of computing an entirely new
representation. We demonstrate that this viewpoint directly leads to the
construction of Highway and Residual networks. Finally, we provide preliminary
experiments to discuss the similarities and differences between the two
architectures.
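To make the iterative-estimation reading concrete, here is a toy numpy sketch
(not the paper's experiments) of the residual form y = x + F(x): with small
weights, each layer only perturbs the estimate it receives, so successive
representations stay close to one another.

    import numpy as np

    def residual_layer(x, W):
        # Refine the current estimate instead of computing a new one: y = x + F(x).
        return x + np.tanh(x @ W)

    rng = np.random.default_rng(0)
    d, depth = 16, 10
    x = rng.normal(size=d)
    Ws = [0.1 * rng.normal(size=(d, d)) for _ in range(depth)]  # small corrections

    estimates = [x]
    for W in Ws:
        estimates.append(residual_layer(estimates[-1], W))

    # Successive layers change the representation only slightly:
    print([round(float(np.linalg.norm(b - a)), 3)
           for a, b in zip(estimates, estimates[1:])])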
Comments: 15 pages, 8 figures, 7 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
In order to better understand the advantages and disadvantages of a
constrained multi-objective evolutionary algorithm (CMOEA), it is important to
understand the nature of difficulty of a constrained multi-objective
optimization problem (CMOP) that the CMOEA is going to deal with. In this
paper, we first propose three primary types of difficulty to characterize the
constraints in CMOPs, including feasibility-hardness, convergence-hardness and
diversity-hardness. We then develop a general toolkit to construct difficulty
adjustable CMOPs with three types of parameterized constraint functions
according to the proposed three primary types of difficulty. In fact, combining
the three primary constraint functions with different parameters can generate a
large variety of CMOPs and constrained many-objective optimization problems
(CMaOPs), whose difficulty is uniquely defined by a triplet, each parameter of
which specifies the level of one primary difficulty type. Based on this
toolkit, we suggest fifteen difficulty adjustable CMOPs, named DAC-MOP1-15,
with different types and levels of difficulty. To study the effectiveness of
DAC-MOP1-15, two popular CMOEAs, MOEA/D-CDP and NSGA-II-CDP, are adopted to
test their performance on them. Furthermore, this toolkit is also able to scale
the number of objectives. Nine difficulty adjustable CMaOPs, named DAC-MaOP1-9,
which are scalable in the number of objectives, are also proposed using this
toolkit. Two constrained many-objective evolutionary algorithms (CMaOEAs),
CNSGA-III and CMOEA/DD, are applied to test their performance on three-, five-,
seven- and ten-objective DAC-MaOP1-9 with different difficulty levels and
types.
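As a purely illustrative sketch of the triplet idea (the toolkit's actual
constraint functions are in the paper; everything below is a hypothetical
stand-in), one can imagine one parameterized constraint per primary difficulty
type, with a triplet (eta, zeta, gamma) dialing each level:

    import numpy as np

    def difficulty_constraints(x, f, eta, zeta, gamma):
        # Hypothetical constraints, c <= 0 means feasible:
        # eta   -> feasibility-hardness: shrinks the feasible region,
        # zeta  -> convergence-hardness: keeps solutions away from the front,
        # gamma -> diversity-hardness: cuts the front into feasible segments.
        c_feas = np.sum(x[1:] ** 2) - (1.0 - eta)
        c_conv = (1.0 + zeta) - np.sum(f)
        c_div = 0.5 - np.sin(gamma * np.pi * f[0]) if gamma > 0 else -1.0
        return np.array([c_feas, c_conv, c_div])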
Comments: this http URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We address the problem of continuously observing and forecasting long-term
semantic activities of a first-person camera wearer: what the person will do,
where they will go, and what goal they are seeking. In contrast to prior work
in trajectory forecasting and short-term activity forecasting, our algorithm,
DARKO, reasons about the future position, future semantic state, and future
high-level goals of the camera-wearer at arbitrary spatial and temporal
horizons defined only by the wearer’s behaviors. DARKO learns and forecasts
online from first-person observations of the user’s daily behaviors. We derive
novel mathematical results that enable efficient forecasting of different
semantic quantities of interest. We apply our method to a dataset of 5
large-scale environments with 3 different environment types, collected from 3
different users, and experimentally validate DARKO on forecasting tasks.
Xin Li , Fuxin Li Subjects : Computer Vision and Pattern Recognition (cs.CV)
Deep learning has greatly improved visual recognition in recent years.
However, recent research has shown that there exist many adversarial examples
that can negatively impact the performance of such architectures. This paper
focuses on detecting those adversarial examples by analyzing whether they come
from the same distribution as the normal examples. Instead of directly training
a deep neural network to detect adversarials, a much simpler approach is
proposed based on statistics on outputs from convolutional layers. A cascade
classifier is designed to efficiently detect adversarials. Furthermore, a
classifier trained on one particular adversarial-generating mechanism can
successfully detect adversarials from a completely different mechanism as
well. After detecting adversarial examples, we show that many of them can be
recovered by simply applying a small average filter to the image. These
findings should prompt further thought about the classification mechanisms of
deep convolutional neural networks.
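The recovery step lends itself to a two-line sketch; the 3x3 window below is
an assumption, since the abstract only says "a small average filter":

    import numpy as np
    from scipy.ndimage import uniform_filter

    def recover(image, size=3):
        # Mean-filter each channel of an H x W x C image.
        return np.stack([uniform_filter(image[..., c], size=size)
                         for c in range(image.shape[-1])], axis=-1)

    img = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in adversarial image
    print(recover(img).shape)  # (32, 32, 3)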
A. Vakhitov, A. Kuzmin, V. Lempitsky. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Internet image search engines have long been considered as a promising tool
for handling open-vocabulary textual user queries to unannotated image
datasets. However, systems that use this tool have to deal with multi-modal and
noisy image sets returned by search engines, especially for polysemous queries.
Generally, for many queries, only a small part of the returned sets can be
relevant to the user intent.
In this work, we suggest an approach that explicitly accounts for the complex
and noisy structure of the image sets returned by Internet image search
engines. Similarly to a considerable number of previous image retrieval works,
we train a deep convolutional network that maps images to high-dimensional
descriptors. To model image sets obtained from the Internet, our approach then
fits a simple probabilistic model that accounts for multi-modality and noise
(e.g. a Gaussian mixture model) to the deep descriptors of the images in this
set. Finally, the resulting distribution model can be used to search in the
unannotated image dataset by evaluating likelihoods of individual images.
As our main contribution, we develop an end-to-end training procedure that
tunes the parameters of a deep network using an annotated training set, while
accounting for the distribution fitting and the subsequent matching. In the
experiments, we show that such an end-to-end approach boosts the accuracy of
the Internet-based image retrieval for hold-out concepts, as compared to
retrieval systems that fit similar distribution models to pre-trained features
and to simpler end-to-end trained baselines.
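The fit-then-score pipeline can be sketched with scikit-learn; the descriptor
dimension, component count, and random data below are stand-ins, and the
end-to-end tuning of the network is the part this sketch omits:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    query_set = rng.normal(size=(200, 128))   # deep descriptors of the noisy result set
    dataset = rng.normal(size=(10000, 128))   # descriptors of the unannotated dataset

    # Fit a simple multi-modal model to the result set; some components
    # can absorb noise and irrelevant modes.
    gmm = GaussianMixture(n_components=3, covariance_type='diag').fit(query_set)

    # Rank the unannotated dataset by likelihood under the fitted model.
    top10 = np.argsort(-gmm.score_samples(dataset))[:10]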
Comments: 9 pages, 7 tables and 9 figures; first place on KITTI Road Segmentation; Code on GitHub (coming soon)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Robotics (cs.RO)
While most approaches to semantic reasoning have focused on improving
performance, in this paper we argue that computational times are very important
in order to enable real time applications such as autonomous driving. Towards
this goal, we present an approach to joint classification, detection and
semantic segmentation via a unified architecture where the encoder is shared
amongst the three tasks. Our approach is very simple, can be trained end-to-end
and performs extremely well on the challenging KITTI dataset, outperforming the
state-of-the-art in the road segmentation task. Our approach is also very
efficient, taking less than 100 ms to perform all tasks.
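The shared-encoder idea can be sketched in a few lines of PyTorch; the layer
sizes and head designs below are illustrative assumptions, not the paper's
architecture:

    import torch
    import torch.nn as nn

    class JointNet(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(                 # shared by all three tasks
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
            self.classify = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(64, n_classes))
            self.detect = nn.Conv2d(64, 5, 1)             # per-cell box + confidence
            self.segment = nn.Conv2d(64, 2, 1)            # road / not-road logits

        def forward(self, x):
            h = self.encoder(x)                           # computed once
            return self.classify(h), self.detect(h), self.segment(h)

    cls_out, det_out, seg_out = JointNet()(torch.randn(1, 3, 128, 384))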
Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Machine learning plays a critical role in extracting meaningful information
out of the zettabytes of sensor data collected every day. For some applications,
the goal is to analyze and understand the data to identify trends (e.g.,
surveillance, portable/wearable electronics); in other applications, the goal
is to take immediate action based on the data (e.g., robotics/drones, self-driving
cars, smart Internet of Things). For many of these applications, local embedded
processing near the sensor is preferred over the cloud due to privacy or
latency concerns, or limitations in the communication bandwidth. However, at
the sensor there are often stringent constraints on energy consumption and cost
in addition to throughput and accuracy requirements. Furthermore, flexibility
is often required such that the processing can be adapted for different
applications or environments (e.g., update the weights and model in the
classifier). In many applications, machine learning often involves transforming
the input data into a higher dimensional space, which, along with programmable
weights, increases data movement and consequently energy consumption. In this
paper, we will discuss how these challenges can be addressed at various levels
of hardware design ranging from architecture, hardware-friendly algorithms,
mixed-signal circuits, and advanced technologies (including memories and
sensors).
Deng Cai. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Approximate Nearest Neighbor (ANN) search is a fundamental problem in many
areas of machine learning and data mining. During the past decade, numerous
hashing algorithms have been proposed to solve this problem, and every proposed
algorithm claims to outperform other state-of-the-art methods. However, there
are serious drawbacks in the evaluation of existing hashing papers, and most of
the claims in these papers should be re-examined. 1) Most of the existing
papers failed to correctly measure the search time, which is essential for the
ANN search problem. 2) As a result, most of the papers report that performance
increases as the code length increases, which is wrong if the search time is
measured correctly. 3) The performance of some hashing algorithms (e.g., LSH)
can easily be boosted by using multiple hash tables, an important factor that
should be considered in the evaluation, yet most of the papers failed to do
so. In this paper, we carefully revisit many popular hashing algorithms and
suggest one possible promising direction. For the sake of reproducibility, all
code used in the paper is released on GitHub and can be used as a testing
platform to fairly compare various hashing algorithms.
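The evaluation point boils down to reporting recall together with measured
search time, rather than accuracy at a fixed code length alone. A sketch,
where index_search stands for any hashing-based ANN routine (a hypothetical
interface):

    import time
    import numpy as np

    def evaluate(index_search, queries, ground_truth):
        t0 = time.perf_counter()
        results = [index_search(q) for q in queries]
        per_query_time = (time.perf_counter() - t0) / len(queries)
        recall = np.mean([len(set(r) & set(g)) / len(g)
                          for r, g in zip(results, ground_truth)])
        return recall, per_query_time   # report the trade-off, not recall alone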
Comments: 28 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art handwriting recognition methods are based on Long Short-Term
Memory (LSTM) recurrent neural networks (RNNs) coupled with the use of
linguistic knowledge. LSTM RNNs offer high raw performance and interesting
training properties that allow us to break with the standard state-of-the-art
method. We present a simple and efficient way to extract, from a single
training run, a large number of complementary LSTM RNNs, called a cohort,
combined in a cascade architecture with a lexical verification. This process
does not require fine-tuning, making it easy to use. Our verification allows us
to deal quickly and efficiently with a gigantic lexicon (over 3 million words).
We achieve state-of-the-art results for isolated word recognition with a very
large lexicon and present novel results for an unprecedentedly gigantic
lexicon.
Comments: DCC 2017 Poster
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This work addresses the problem of extracting deeply learned features directly
from compressive measurements; there has been no prior work in this area.
Existing deep learning tools give good results only when applied to the full
signal, and usually only after preprocessing; these techniques require the
signal to be reconstructed first. In this work we show that considerably
better results can be obtained by learning directly in the compressed domain.
This work extends the recently proposed framework of deep matrix factorization
in combination with blind compressed sensing, hence the term deep blind
compressed sensing. Simulation experiments have been carried out on imaging
via a single-pixel camera, under-sampled biomedical signals arising in
wireless body area networks, and compressive hyperspectral imaging. In all
cases, the superiority of the proposed deep blind compressed sensing is
evident.
Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks
Comments: CVPR submission, main paper and supplementary material
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Indoor scene understanding is central to applications such as robot
navigation and human companion assistance. Over the last few years, data-driven
deep neural networks have outperformed many traditional approaches thanks to
their representation learning capabilities. One of the bottlenecks in training
for better representations is the amount of available per-pixel ground truth
data that is required for core scene understanding tasks such as semantic
segmentation, normal prediction, and object edge detection. To address this
problem, a number of works proposed using synthetic data. However, a systematic
study of how such synthetic data is generated is missing. In this work, we
introduce a large-scale synthetic dataset with 400K physically-based rendered
images from 45K realistic 3D indoor scenes. We study the effects of rendering
methods and scene lighting on training for three computer vision tasks: surface
normal prediction, semantic segmentation, and object boundary detection. This
study provides insights into the best practices for training with synthetic
data (more realistic rendering is worth it) and shows that pretraining with our
new synthetic dataset can improve results beyond the current state of the art
on all three tasks.
Comments: Accepted at WACV 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Multimedia (cs.MM)
This paper studies the joint learning of action recognition and temporal
localization in long, untrimmed videos. We employ a multi-task learning
framework that performs the three highly related steps of action proposal,
action recognition, and action localization refinement in parallel instead of
the standard sequential pipeline that performs the steps in order. We develop a
novel temporal actionness regression module that estimates what proportion of a
clip contains action. We use it for temporal localization but it could have
other applications like video retrieval, surveillance, summarization, etc. We
also introduce random shear augmentation during training to simulate viewpoint
change. We evaluate our framework on three popular video benchmarks. Results
demonstrate that our joint model is efficient in terms of storage and
computation in that we do not need to compute and cache dense trajectory
features, and that it is several times faster than its sequential ConvNets
counterpart. Yet, despite being more efficient, it outperforms state-of-the-art
methods with respect to accuracy.
Jhony-Heriberto Giraldo-Zuluaga, Geman Diez, Alexander Gomez, Tatiana Martinez, Mariana Peñuela Vasquez, Jesus Francisco Vargas Bonilla, Augusto Salazar. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Microalgae counting is used to measure biomass quantity. Usually, it is
performed in a manual way using a Neubauer chamber and expert criterion, with
the risk of a high error rate. This paper addresses the methodology for
automatic identification of Scenedesmus microalgae (used in the methane
production and food industry) and applies it to images captured by a digital
microscope. The use of contrast-adaptive histogram equalization for
pre-processing and of active contours for segmentation is presented. The
calculation of statistical features (Histogram of Oriented Gradients, Hu and
Zernike moments) together with texture features (Haralick and Local Binary
Patterns descriptors) is proposed for algae characterization. Scenedesmus algae
can build coenobia consisting of 1, 2, 4 and 8 cells. The number of algae of
each coenobium type helps to determine the amount of lipids, proteins, and
other substances in a given sample of an algae crop. Knowledge of the quantity
of those elements improves the quality of bioprocess applications.
Classification of coenobia achieves accuracies of 98.63% and 97.32% with a
Support Vector Machine (SVM) and an Artificial Neural Network (ANN),
respectively. According to the results, it is possible to consider the proposed
methodology as an alternative to the traditional technique for algae counting.
The database used in this paper is publicly available for download.
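A minimal sketch of the feature step with scikit-image and scikit-learn (HOG
plus an LBP histogram only; the Hu/Zernike moments and Haralick features are
omitted, and all parameter values are assumptions):

    import numpy as np
    from skimage.feature import hog, local_binary_pattern
    from sklearn.svm import SVC

    def describe(gray):
        h = hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
        lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
        lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        return np.concatenate([h, lbp_hist])

    # X: grayscale crops of segmented algae, y: coenobium size (1, 2, 4 or 8)
    # clf = SVC(kernel='rbf').fit([describe(x) for x in X], y)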
Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Top-down saliency methods based on deep neural networks have recently been
proposed for task-driven visual search. However, existing methods focus on
object or scene classification tasks and cannot be used to compute saliency
heatmaps using a natural language sentence as the top-down input. Current
state-of-the-art image and video captioning models can generate accurate
sentence captions but are difficult to understand, as they do not expose the
internal process by which spatial and temporal regions are mapped to predicted
words. In this paper, we expose this mapping and demonstrate that
spatio-temporal saliency is learned implicitly by the combination of CNN and
LSTM parts of modern encoder-decoder networks. Our approach, which we call
Caption-Guided Visual Saliency, can produce spatial or spatio-temporal heatmaps
for both given input sentences and sentences predicted by our model. Unlike
recent efforts that introduce explicit “attention” layers to selectively attend
to certain inputs while generating each word, our approach recovers saliency
without the overhead of explicit attention layers, and can be used to analyze a
variety of existing model architectures and improve their design. We evaluate
our approach on large scale video and image datasets and make several
interesting discoveries about the inner workings of captioning models. The
source code is publicly available at
github.com/VisionLearningGroup/caption-guided-saliency.
Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem. Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The task of generating natural language descriptions from images has received
a lot of attention in recent years. Consequently, it is becoming increasingly
important to evaluate such image captioning approaches in an automatic manner.
In this paper, we provide an in-depth evaluation of the existing image
captioning metrics through a series of carefully designed experiments.
Moreover, we explore the utilization of the recently proposed Word Mover’s
Distance (WMD) document metric for the purpose of image captioning. Our
findings outline the differences and/or similarities between metrics and their
relative robustness by means of extensive correlation, accuracy and distraction
based evaluations. Our results also demonstrate that WMD provides strong
advantages over other metrics.
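Scoring a candidate caption against a reference with WMD takes a few lines
with gensim, assuming pretrained word vectors are available (the vector file
below is a placeholder):

    from gensim.models import KeyedVectors

    wv = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)  # placeholder path
    reference = "a man is riding a horse on the beach".split()
    candidate = "a person rides a horse near the ocean".split()
    distance = wv.wmdistance(reference, candidate)  # lower = more similar caption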
Hai Ye, Wenhan Chao, Zhunchen Luo. Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In distant supervised relation extraction, the connection between relations
of one entity tuple, which we call class ties, is common. Exploiting this
connection may be promising for relation extraction. However, this property is
seldom considered by previous work. In this work, to leverage class ties, we
propose to perform joint relation extraction with a unified model that
integrates a convolutional neural network with a general pairwise ranking
framework, in which two novel ranking loss functions are introduced. In
addition, an effective method is proposed to relieve the impact of the relation
NR (not relation) on model training and testing. Experimental results on a
widely used dataset show that: (1) our model is clearly superior to the
baselines, achieving state-of-the-art performance; (2) leveraging class ties,
joint extraction is indeed better than separate extraction; (3) relieving the
impact of NR significantly boosts model performance; and (4) our model deals
well with the wrong-labeling problem.
Solving Set Optimization Problems by Cardinality Optimization via Weak Constraints with an Application to Argumentation
Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
Subjects:
Artificial Intelligence (cs.AI)
Optimization – minimization or maximization – in the lattice of subsets is a
frequent operation in Artificial Intelligence tasks. Examples are
subset-minimal model-based diagnosis, nonmonotonic reasoning by means of
circumscription, or preferred extensions in abstract argumentation. Finding the
optimum among many admissible solutions is often harder than finding admissible
solutions with respect to both computational complexity and methodology. This
paper addresses the former issue by means of an effective method for finding
subset-optimal solutions. It is based on the relationship between
cardinality-optimal and subset-optimal solutions, and the fact that many
logic-based declarative programming systems provide constructs for finding
cardinality-optimal solutions, for example maximum satisfiability (MaxSAT) or
weak constraints in Answer Set Programming (ASP). Clearly each
cardinality-optimal solution is also a subset-optimal one, and if the language
also allows for the addition of particular restricting constructs (both MaxSAT
and ASP do) then all subset-optimal solutions can be found by an iterative
computation of cardinality-optimal solutions. As a showcase, the computation of
preferred extensions of abstract argumentation frameworks using the proposed
method is studied.
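The iterative scheme is easy to state in code. Below, a brute-force search
stands in for the MaxSAT/ASP call that returns a cardinality-minimal
admissible set (every such set is also subset-minimal); each found solution is
then blocked together with all of its supersets:

    from itertools import combinations

    def subset_minimal(universe, admissible):
        blocked, found = [], []
        while True:
            # Cardinality-minimal admissible set avoiding all blocked supersets;
            # in practice this is one MaxSAT/weak-constraint solver call.
            sol = next((set(c) for k in range(len(universe) + 1)
                        for c in combinations(universe, k)
                        if admissible(set(c))
                        and not any(b <= set(c) for b in blocked)), None)
            if sol is None:
                return found
            found.append(sol)
            blocked.append(sol)   # excludes sol and every superset of it

    # Example: subset-minimal hitting sets of {{1, 2}, {2, 3}}.
    print(subset_minimal([1, 2, 3],
                         lambda s: bool(s & {1, 2}) and bool(s & {2, 3})))
    # -> [{2}, {1, 3}]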
The SP Theory of Intelligence as a Foundation for the Development of a General, Human-Level Thinking Machine
J Gerard Wolff Subjects : Artificial Intelligence (cs.AI)
This paper summarises how the “SP theory of intelligence” and its realisation
in the “SP computer model” simplifies and integrates concepts across artificial
intelligence and related areas, and thus provides a promising foundation for
the development of a general, human-level thinking machine, in accordance with
the main goal of research in artificial general intelligence.
The key to this simplification and integration is the powerful concept of
“multiple alignment”, borrowed and adapted from bioinformatics. This concept
has the potential to be the “double helix” of intelligence, with as much
significance for human-level intelligence as has DNA for biological sciences.
Strengths of the SP system include: versatility in the representation of
diverse kinds of knowledge; versatility in aspects of intelligence (including:
strengths in unsupervised learning; the processing of natural language; pattern
recognition at multiple levels of abstraction that is robust in the face of
errors in data; several kinds of reasoning (including: one-step ‘deductive’
reasoning; chains of reasoning; abductive reasoning; reasoning with
probabilistic networks and trees; reasoning with ‘rules’; nonmonotonic
reasoning and reasoning with default values; Bayesian reasoning with
‘explaining away’; and more); planning; problem solving; and more); seamless
integration of diverse kinds of knowledge and diverse aspects of intelligence
in any combination; and potential for application in several areas (including:
helping to solve nine problems with big data; helping to develop human-level
intelligence in autonomous robots; serving as a database with intelligence and
with versatility in the representation and integration of several forms of
knowledge; serving as a vehicle for medical knowledge and as an aid to medical
diagnosis; and several more).
Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG); Machine Learning (stat.ML)
This paper investigates a type of instability that is linked to the greedy
policy improvement in approximated reinforcement learning. We show empirically
that non-deterministic policy improvement can stabilize methods like LSPI by
controlling the improvements’ stochasticity. Additionally we show that a
suitable representation of the value function also stabilizes the solution to
some degree. The presented approach is simple and should also be easily
transferable to more sophisticated algorithms like deep reinforcement learning.
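One natural instance of a non-deterministic improvement step (an illustrative
choice, not necessarily the paper's exact scheme) replaces the greedy argmax
with a Boltzmann distribution over Q-values, whose temperature controls the
stochasticity:

    import numpy as np

    def soft_policy_improvement(Q, tau=1.0):
        # Q: |S| x |A| action values; tau -> 0 recovers the greedy update.
        z = Q / tau
        z -= z.max(axis=1, keepdims=True)           # numerical stability
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)     # stochastic policy

    policy = soft_policy_improvement(np.random.rand(5, 3), tau=0.5)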
Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
Subjects:
Logic in Computer Science (cs.LO)
; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
While the solution counting problem for propositional satisfiability (#SAT)
has received renewed attention in recent years, this research trend has not
affected other AI solving paradigms like answer set programming (ASP). Although
ASP solvers are designed to enumerate all solutions, and counting can therefore
be easily done, the involved materialization of all solutions is a clear
bottleneck for the counting problem of ASP (#ASP). In this paper we propose
dynamic programming-based #ASP algorithms that exploit the structure of the
underlying (ground) ASP program. Experimental results for a prototype
implementation show promise when compared to existing solvers.
Jose M. Peña, Marcus Bendtsen. Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
We introduce a new family of graphical models that consists of graphs with
possibly directed, undirected and bidirected edges but without directed cycles.
We show that these models are suitable for representing causal models with
additive error terms. We provide a set of sufficient graphical criteria for the
identification of arbitrary causal effects when the new models contain directed
and undirected edges but no bidirected edge. We also provide a necessary and
sufficient graphical criterion for the identification of the causal effect of a
single variable on the rest of the variables. Moreover, we develop an exact
algorithm for learning the new models from observational and interventional
data via answer set programming. Finally, we introduce gated models for causal
effect identification, a new family of graphical models that exploits context
specific independences to identify additional causal effects.
Journal-ref: Sindh Univ. Res. Jour. (Sci. Ser.) Vol. 48 (4D) 05-08 (2016)
Subjects:
Computers and Society (cs.CY)
; Information Retrieval (cs.IR); Software Engineering (cs.SE)
In this fast-developing world of information, the amount of medical knowledge
is rising at an exponential rate. The UMLS (Unified Medical Language System) is
a rich knowledge base consisting of files and software that provide many health
and biomedical vocabularies and standards. A web service is a web solution that
facilitates machine-to-machine interaction over a network. A few UMLS web
services are currently available for portable devices, but most of them lack
efficiency and performance. We propose to develop Android-based web services
for healthcare systems built on the rich knowledge source of the UMLS. An
experimental evaluation was conducted to analyse efficiency and performance
with and without the designed prototype. Understandability of and interaction
with the prototype were greater than for users who relied on alternative
sources to obtain the answers to their questions. The overall performance
indicates that the system is convenient and easy to use. The results of the
evaluation clearly show that the designed system retrieves pertinent
information better than syntactic searches.
Comments: EACL 2017; the first two authors contributed equally to this work
Subjects:
Computation and Language (cs.CL)
In this paper, we address two different types of noise in information
extraction models: noise from distant supervision and noise from pipeline input
features. Our target tasks are entity typing and relation extraction. For the
first noise type, we introduce multi-instance multi-label learning algorithms
using neural network models, and apply them to fine-grained entity typing for
the first time. This gives our models comparable performance with the
state-of-the-art supervised approach which uses global embeddings of entities.
For the second noise type, we propose ways to improve the integration of noisy
entity type predictions into relation extraction. Our experiments show that
probabilistic predictions are more robust than discrete predictions and that
joint training of the two tasks performs best.
Robert Östling, Jörg Tiedemann. Subjects: Computation and Language (cs.CL)
Most existing models for multilingual natural language processing (NLP) treat
language as a discrete category, and make predictions for either one language
or the other. In contrast, we propose using continuous vector representations
of language. We show that these can be learned efficiently with a
character-based neural language model, and used to improve inference about
language varieties not seen during training. In experiments with 1303 Bible
translations into 990 different languages, we empirically explore the capacity
of multilingual language models, and also show that the language vectors
capture genetic relationships between languages.
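A sketch of the conditioning idea in PyTorch: a learned language embedding is
concatenated to every character embedding before the recurrent layer (one
plausible arrangement; the exact architecture and sizes here are assumptions):

    import torch
    import torch.nn as nn

    class CharLM(nn.Module):
        def __init__(self, n_chars, n_langs, c_dim=64, l_dim=16, hid=256):
            super().__init__()
            self.chars = nn.Embedding(n_chars, c_dim)
            self.langs = nn.Embedding(n_langs, l_dim)   # the language vectors
            self.rnn = nn.LSTM(c_dim + l_dim, hid, batch_first=True)
            self.out = nn.Linear(hid, n_chars)

        def forward(self, char_ids, lang_id):
            c = self.chars(char_ids)                            # B x T x c_dim
            l = self.langs(lang_id)[:, None, :].expand(-1, c.size(1), -1)
            h, _ = self.rnn(torch.cat([c, l], dim=-1))
            return self.out(h)                                  # next-char logits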
Comments: 13 pages, 2 figures
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG)
We develop a new model for Interactive Question Answering (IQA), using
Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and
questions, and another GRU as a decoder for outputs. Distinct from previous
work, our approach employs context-dependent word-level attention for more
accurate statement representations and question-guided sentence-level attention
for better context modeling. Employing these mechanisms, our model accurately
understands when it can output an answer or when it requires generating a
supplementary question for additional input. When available, the user’s feedback is
encoded and directly applied to update sentence-level attention to infer the
answer. Extensive experiments on QA and IQA datasets demonstrate quantitatively
the effectiveness of our model with significant improvement over conventional
QA models.
Comments: Published in ACM Transactions on Architecture and Code Optimization (TACO) 13, 4
Journal-ref: ACM Trans. Archit. Code Optim. 13, 4, Article 52 (December 2016)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
This paper presents Pot, a system that leverages the concept of preordered
transactions to achieve deterministic multithreaded execution of programs that
use Transactional Memory. Preordered transactions eliminate the root cause of
nondeterminism in transactional execution: they provide the illusion of
executing in a deterministic serial order, unlike traditional transactions
which appear to execute in a nondeterministic order that can change from
execution to execution. Pot uses a new concurrency control protocol that
exploits the serialization order to distinguish between fast and speculative
transaction execution modes in order to mitigate the overhead of imposing a
deterministic order. We build two Pot prototypes: one using STM and another
using off-the-shelf HTM. To the best of our knowledge, Pot enables
deterministic execution of programs using off-the-shelf HTM for the first time.
An experimental evaluation shows that Pot achieves deterministic execution of
TM programs with low overhead, sometimes even outperforming nondeterministic
executions, and clearly outperforming the state of the art.
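The core of the preordered-transaction idea fits in a small sketch:
transactions may execute concurrently, but their effects become visible in a
predetermined serial order (by ticket number below), so every run yields the
same state. This toy uses a lock and a condition variable, not the paper's
STM/HTM machinery:

    import threading

    class PreorderedCommit:
        def __init__(self):
            self.turn = 0
            self.cv = threading.Condition()

        def commit(self, ticket, apply_writes):
            with self.cv:
                while self.turn != ticket:      # wait for this transaction's slot
                    self.cv.wait()
                apply_writes()                  # effects become visible in order
                self.turn += 1
                self.cv.notify_all()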
An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures
Comments: 20 pages, 7 figures
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Mathematical Software (cs.MS); Numerical Analysis (cs.NA)
In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for
symmetric tridiagonal matrices, based on ScaLAPACK and hierarchically
semiseparable (HSS) matrices, an important type of rank-structured matrix. Most
of the DC algorithm's time is spent computing the eigenvectors via
matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC)
algorithm, MMM is accelerated by HSS matrix techniques when the intermediate
matrix is large. All the HSS algorithms are implemented via the STRUMPACK
package. PHDC has been tested on many different matrices. Compared with the DC
implementation in MKL, PHDC can be faster for some matrices with few deflations
when using hundreds of processes. However, the gains decrease as the number of
processes increases. The comparisons of PHDC with ELPA (the Eigenvalue soLvers
for Petascale Applications library) are similar. PHDC is usually slower than
MKL and ELPA when using 300 or more processes on the Tianhe-2 supercomputer.
Comments: 6 pages, 7 figures, 1 table
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Computers and Society (cs.CY)
In this digitalised world where all information is stored, data are growing
exponentially; it is estimated that the volume of data doubles every two years.
Geospatial data are one of the prime contributors to the big data scenario.
There are numerous big data analytics tools, but not all of them are capable of
handling geospatial big data. This paper discusses two recent and popular
open-source geospatial big data analytical tools, SpatialHadoop and GeoSpark,
which can be used to analyse and process geospatial big data efficiently. It
compares the architectures of SpatialHadoop and GeoSpark and, through this
architectural comparison, summarises the merits and demerits of these tools
according to the execution times and the volume of data used.
Comments: 6 pages, 5 figures
Subjects:
Databases (cs.DB)
; Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We study distributed graph algorithms that adopt an iterative vertex-centric
framework for graph processing, popularized by Google’s Pregel system. Since
then, there have been several attempts to implement many graph algorithms in a
vertex-centric framework, as well as efforts to design optimization techniques
for improving the efficiency. However, to the best of our knowledge, there has
not been any systematic study comparing these vertex-centric implementations
with their sequential counterparts. Our paper addresses this gap in two ways.
(1) We analyze the computational complexity of such implementations with the
notion of time-processor product, and benchmark several vertex-centric graph
algorithms to see whether they perform more work than their best-known
sequential solutions. (2) Employing the concept of balanced practical Pregel
algorithms, we study whether these implementations suffer from imbalanced
workloads and a large number of iterations. Our findings illustrate that, with
the exception of the Euler tour tree algorithm, all other algorithms either
perform more work than their best-known sequential approach, suffer from
imbalanced workloads or a large number of iterations, or both. We also
highlight graph algorithms that are fundamentally difficult to express in
vertex-centric frameworks, and conclude by discussing the road ahead for
distributed graph processing.
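For readers unfamiliar with the model, a toy superstep loop in the
vertex-centric style (BFS distances) also shows the two quantities the
analysis tracks: the number of supersteps and the total work summed over
active vertices:

    import math

    def pregel_bfs(adj, source):
        dist = {v: math.inf for v in adj}
        dist[source] = 0
        active, supersteps, work = {source}, 0, 0
        while active:
            supersteps += 1
            work += len(active)                       # one unit per active vertex
            msgs = {}
            for u in active:                          # "send" dist+1 to neighbors
                for v in adj[u]:
                    msgs[v] = min(msgs.get(v, math.inf), dist[u] + 1)
            active = {v for v, d in msgs.items() if d < dist[v]}
            for v in active:                          # "receive" and update
                dist[v] = msgs[v]
        return dist, supersteps, work

    print(pregel_bfs({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0))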
Comments: 14 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Since 2006, deep learning (DL) has become a rapidly growing research
direction, redefining state-of-the-art performances in a wide range of areas
such as object recognition, image segmentation, speech recognition and machine
translation. In modern manufacturing systems, data-driven machine health
monitoring is gaining in popularity due to the widespread deployment of
low-cost sensors and their connection to the Internet. Meanwhile, deep learning
provides useful tools for processing and analyzing these big machinery data.
The main purpose of this paper is to review and summarize the emerging research
work of deep learning on machine health monitoring. After the brief
introduction of deep learning techniques, the applications of deep learning in
machine health monitoring systems are reviewed mainly from the following
aspects: Auto-encoder (AE) and its variants, Restricted Boltzmann Machines and
their variants including Deep Belief Network (DBN) and Deep Boltzmann Machines
(DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
Finally, some new trends of DL-based machine health monitoring methods are
discussed.
Comments: 4 pages technical short note
Subjects:
Learning (cs.LG)
In this paper we improve the existing function approximation error bound for
the policy evaluation algorithm when the aim is to find the risk-sensitive cost
represented using exponential utility.
Johannes Blömer, Sascha Brauer, Kathrin Bujna. Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
In this paper, we present coreset constructions for the fuzzy K-means problem.
First, we show that one can construct a weak coreset for fuzzy K-means. Second,
we show that there are coresets for fuzzy K-means with respect to balanced
fuzzy K-means solutions. Third, we use these coresets to develop a randomized
approximation algorithm whose runtime is polynomial in the number of the given
points and the dimension of these points.
Comments: DCC 2017 poster
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Currently there are two predominant ways to train deep neural networks. The
first uses the restricted Boltzmann machine (RBM) and the second autoencoders.
RBMs are stacked in layers to form a deep belief network (DBN); the final
representation layer is attached to the target to complete the deep neural
network. Autoencoders are nested one inside the other to form stacked
autoencoders; once the stacked autoencoder is learnt, the decoder portion is
detached and the target attached to the deepest layer of the encoder to form
the deep neural network. This work proposes a new approach to training deep
neural networks using dictionary learning as the basic building block; the idea
is to use the features from the shallower layer as inputs for training the next
deeper layer. One can use any type of dictionary learning (unsupervised,
supervised, discriminative, etc.) as the basic unit up to the pre-final layer.
In the final layer, one needs to use the label-consistent dictionary learning
formulation for classification. We compare our proposed framework with existing
state-of-the-art deep learning techniques on benchmark problems; we are always
within the top 10 results. On the practical problems of age and gender
classification, we are better than the best known techniques.
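A minimal sketch of the layer-wise scheme with scikit-learn's
DictionaryLearning (the label-consistent final layer and all hyperparameters
are omitted or assumed):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    def greedy_dictionary_stack(X, layer_sizes):
        codes, layers = X, []
        for k in layer_sizes:
            dl = DictionaryLearning(n_components=k,
                                    transform_algorithm='lasso_lars')
            codes = dl.fit_transform(codes)   # sparse codes feed the next layer
            layers.append(dl)
        return layers, codes

    layers, feats = greedy_dictionary_stack(np.random.rand(100, 64), [32, 16])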
Microstructure Representation and Reconstruction of Heterogeneous Materials via Deep Belief Network for Computational Material Design
Comments: 27 pages, 17 figures
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Integrated Computational Materials Engineering (ICME) aims to accelerate
optimal design of complex material systems by integrating material science and
design automation. For tractable ICME, it is required that (1) a structural
feature space be identified to allow reconstruction of new designs, and (2) the
reconstruction process be property-preserving. The majority of existing
structural representation schemes rely on the designer’s understanding of
specific material systems to identify geometric and statistical features, which
could be biased and insufficient for reconstructing physically meaningful
microstructures of complex material systems. In this paper, we develop a
feature learning mechanism based on a convolutional deep belief network to
automate a two-way conversion between microstructures and their
lower-dimensional feature representations, and to achieve a 1000-fold dimension
reduction from the microstructure space. The proposed model is applied to a
wide spectrum of heterogeneous material systems with distinct microstructural
features, including Ti-6Al-4V alloy, Pb63-Sn37 alloy, Fontainebleau sandstone,
and spherical colloids, to produce material
reconstructions that are close to the original samples with respect to 2-point
correlation functions and mean critical fracture strength. This capability is
not achieved by existing synthesis methods that rely on the Markovian
assumption of material microstructures.
Charmgil Hong, Milos Hauskrecht. Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Despite tremendous progress in outlier detection research in recent years,
the majority of existing methods are designed only to detect unconditional
outliers that correspond to unusual data patterns expressed in the joint space
of all data attributes. Such methods are not applicable when we seek to detect
conditional outliers that reflect unusual responses associated with a given
context or condition. This work focuses on multivariate conditional outlier
detection, a special type of the conditional outlier detection problem, where
data instances consist of multi-dimensional input (context) and output
(responses) pairs. We present a novel outlier detection framework that
identifies abnormal input-output associations in data with the help of a
decomposable conditional probabilistic model that is learned from all data
instances. Since components of this model can vary in their quality, we combine
them with the help of weights reflecting their reliability in assessment of
outliers. We study two ways of calculating the component weights: global that
relies on all data, and local that relies only on instances similar to the
target instance. Experimental results on data from various domains demonstrate
the ability of our framework to successfully identify multivariate conditional
outliers.
Comments: 7 pages, 3 figures, 3 tables
Subjects:
High Energy Physics – Phenomenology (hep-ph)
; Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Machine learning (ML) algorithms have been employed in the problem of
classifying signal and background events with high accuracy in particle
physics. In this paper, we use a widespread ML technique, namely stacked
generalization, for the task of discovering a new neutral Higgs boson in gluon
fusion. We found that, while demanding far less computational effort, stacking
ML algorithms performs almost as well as deep neural networks (DNN) trained
exclusively on kinematic distributions for the same task, by building either a
highly discriminating linear model or a shallower neural network on the stacked
ML outputs. We also show that it is possible to outperform DNN in this channel
by partially exploring correlations among the classifier outputs in a
multivariate statistical analysis.
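The stacking recipe itself is compact; a sketch with scikit-learn on stand-in
kinematic features, using out-of-fold base-model outputs and a linear combiner
(the base models and sizes below are assumptions):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X = np.random.rand(500, 10)               # kinematic features (stand-in)
    y = np.random.randint(0, 2, 500)          # signal vs. background

    bases = [RandomForestClassifier(n_estimators=100),
             GradientBoostingClassifier()]
    meta_X = np.column_stack([                # out-of-fold probabilities
        cross_val_predict(b, X, y, cv=5, method='predict_proba')[:, 1]
        for b in bases])
    stacker = LogisticRegression().fit(meta_X, y)   # the linear combiner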
Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson. Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep
learning model able to predict structured sequences of data. Precisely, GCRN is
a generalization of classical recurrent neural networks (RNN) to data
structured by an arbitrary graph. Such structured sequences can represent
series of frames in videos, spatio-temporal measurements on a network of
sensors, or random walks on a vocabulary graph for natural language modeling.
The proposed model combines convolutional neural networks (CNN) on graphs to
identify spatial structures and RNN to find dynamic patterns. We study two
possible architectures of GCRN, and apply the models to two practical problems:
predicting moving MNIST data, and modeling natural language with the Penn
Treebank dataset. Experiments show that exploiting simultaneously graph spatial
and dynamic information about data can improve both precision and learning
speed.
Comments: 9 pages, 4 tables, 1 figure
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
In many data exploration tasks it is meaningful to identify groups of
attribute interactions that are specific to a variable of interest. These
interactions are also useful in several practical applications, for example, to
gain insight into the structure of the data, in feature selection, and in data
anonymisation. We present a novel method, based on statistical significance
testing, that can be used to verify whether the data set has been created by a
given factorized class-conditional joint distribution, where the distribution
is parametrized by a partition of its attributes. Furthermore, we provide a
method,
named ASTRID, to automatically find a partition of attributes that describes
the distribution that has generated the data. The state-of-the-art classifiers
are utilized to capture the interactions present in the data by systematically
breaking attribute interactions and observing the effect of this breaking on
classifier performance. We empirically demonstrate the utility of the proposed
method with real and synthetic data as well as with usage examples.
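The interaction-breaking step can be sketched as a grouped, within-class
permutation: it preserves each group's class-conditional distribution while
destroying between-group interactions, and the drop in cross-validated
accuracy measures how much the classifier relied on what was broken (a sketch
of the idea, not ASTRID's full significance-testing procedure):

    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import cross_val_score

    def broken_interaction_score(clf, X, y, groups, seed=0):
        rng = np.random.default_rng(seed)
        Xp = X.copy()
        for g in groups:                        # e.g. [[0, 1], [2], [3, 4]]
            for c in np.unique(y):
                idx = np.flatnonzero(y == c)    # permute rows within each class
                Xp[np.ix_(idx, g)] = X[np.ix_(rng.permutation(idx), g)]
        return cross_val_score(clone(clf), Xp, y, cv=5).mean()

    # drop = full-data CV accuracy - broken_interaction_score(clf, X, y, groups)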
Comments: 5 pages, 2 figures
Subjects:
Sound (cs.SD)
; Learning (cs.LG); Machine Learning (stat.ML)
Most of the existing studies on voice conversion (VC) are conducted in
acoustically matched conditions between source and target signal. However, the
robustness of VC methods in the presence of mismatch remains unknown. In this
paper, we report a comparative analysis of different VC techniques under
mismatched conditions. Extensive experiments with five different VC techniques
on the CMU ARCTIC corpus suggest that the performance of VC methods
substantially degrades in noisy conditions. We have found that bilinear
frequency warping with amplitude scaling (BLFWAS) outperforms other methods in
most of the noisy conditions. We further explore the suitability of different
speech enhancement techniques for robust conversion. The objective evaluation
results indicate that spectral subtraction and log minimum mean square error
(logMMSE) based speech enhancement techniques can be used to improve the
performance in specific noisy conditions.
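Spectral subtraction, one of the enhancement methods found helpful above, has
a compact textbook form (the frame size and noise-estimation window below are
assumptions):

    import numpy as np
    from scipy.signal import istft, stft

    def spectral_subtraction(x, fs, noise_frames=10, nperseg=512):
        f, t, Z = stft(x, fs=fs, nperseg=nperseg)
        noise_mag = np.abs(Z[:, :noise_frames]).mean(axis=1, keepdims=True)
        mag = np.maximum(np.abs(Z) - noise_mag, 0.0)     # subtract, floor at zero
        _, x_hat = istft(mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=nperseg)
        return x_hat                                     # keeps the noisy phase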
Amir Daneshmand, Gesualdo Scutari, Francisco Facchinei. Subjects: Optimization and Control (math.OC); Learning (cs.LG)
The paper studies distributed Dictionary Learning (DL) problems where the
learning task is distributed over a multi-agent network with time-varying
(nonsymmetric) connectivity. This formulation is relevant, for instance, in
big-data scenarios where massive amounts of data are collected/stored in
different spatial locations and it is unfeasible to aggregate and/or process
all the data in a fusion center, due to resource limitations, communication
overhead or privacy considerations. We develop a general distributed
algorithmic framework for the (nonconvex) DL problem and establish its
asymptotic convergence. The new method hinges on Successive Convex
Approximation (SCA) techniques coupled with i) a gradient tracking mechanism
instrumental to locally estimate the missing global information; and ii) a
consensus step, as a mechanism to distribute the computations among the agents.
To the best of our knowledge, this is the first distributed algorithm with
provable convergence for the DL problem and, more generally, for bi-convex
optimization problems over (time-varying) directed graphs.
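A toy sketch of the two mechanisms on a consensus-optimization problem (a
generic gradient-tracking scheme over a fixed doubly stochastic mixing matrix,
not the paper's exact DL updates):

    import numpy as np

    def gradient_tracking(W, x0, grad, iters=200, step=0.05):
        n = W.shape[0]
        x = x0.copy()                                   # one local copy per agent
        y = np.array([grad(i, x[i]) for i in range(n)]) # tracks the global gradient
        for _ in range(iters):
            x_new = W @ x - step * y                    # consensus + descent
            y = W @ y + np.array([grad(i, x_new[i]) - grad(i, x[i])
                                  for i in range(n)])   # gradient-tracking update
            x = x_new
        return x

    # Each agent i holds f_i(x) = 0.5 * ||x - a_i||^2; the optimum is mean(a).
    a = np.random.rand(4, 3)
    W = np.full((4, 4), 0.25)                           # doubly stochastic mixing
    print(gradient_tracking(W, np.zeros((4, 3)), lambda i, x: x - a[i])[0])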
Comments: This has an extra Appendix section giving more information about Remark 2 of the original paper published in the 52nd Annual Allerton Conference on Communication, Control and Computing, 2014
Subjects:
Information Theory (cs.IT)
We consider a directed acyclic network with multiple sources and multiple
terminals where each terminal is interested in decoding the sum of independent
sources generated at the source nodes. We describe a procedure whereby a simple
undirected graph can be used to construct such a sum-network and demonstrate an
upper bound on its computation rate. Furthermore, we show sufficient conditions
for the construction of a linear network code that achieves this upper bound.
Our procedure allows us to construct sum-networks that have any arbitrary
computation rate p/q (where p and q are non-negative integers). Our work
significantly generalizes a previous approach for constructing sum-networks
with arbitrary capacities. Specifically, we answer an open question in prior
work by demonstrating sum-networks with a significantly smaller number of
sources and terminals.
Comments: Patent no: US8929390 Methods And Apparatuses For Channel Estimation In Wireless Networks
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
Radio channels are typically sparse in the delay domain and thus ideal for
compressed sensing. A new compressed sensing algorithm called eX-OMP is
developed that yields performance similar to that of the optimal MMSE
estimator. The new algorithm relies on a small amount of additional data. Both
eX-OMP and the MMSE estimator adaptively balance channel tracking and noise
reduction. They perform better than simple estimators such as the linear
interpolator, which fixes this trade-off a priori. Some wideband
measurements are examined, and the channels are found to be represented by a
few delays.
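Since the abstract does not spell out the eX-OMP modifications, the sketch below shows only the baseline greedy OMP recovery of a sparse delay-domain channel that such estimators build on; the dictionary A and the tap budget are assumptions.

    import numpy as np

    def omp(A, y, max_taps, tol=1e-6):
        # Greedy recovery of a sparse channel h from y = A h + noise.
        m, n = A.shape
        r, support = y.copy(), []
        h = np.zeros(n, dtype=A.dtype)
        for _ in range(max_taps):
            k = int(np.argmax(np.abs(A.conj().T @ r)))   # strongest delay tap
            if k not in support:
                support.append(k)
            h_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # LS re-fit
            r = y - A[:, support] @ h_s
            h[:] = 0
            h[support] = h_s
            if np.linalg.norm(r) < tol:                  # residual small enough
                break
        return h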
Comments: 45 pages, 10 figures, submitted to IEEE Transactions on Information Theory
Subjects:
Information Theory (cs.IT)
We consider a wireless device-to-device (D2D) caching network where n nodes
are placed on a regular grid of area A(n). Each node caches L_C*F (coded) bits
from a library of size L*F bits, where L is the number of files and F is the
size of each file. Each node requests a file from the library independently
according to a popularity distribution. Under a commonly used “physical model”
and Zipf popularity distribution, we characterize the optimal per-node capacity
scaling law for extended networks (i.e., networks whose area (A(n)) grows
linearly with (n)). Moreover, we propose a
cache-induced hierarchical cooperation scheme and associated cache content
placement optimization algorithm to achieve the optimal per-node capacity
scaling law. When the path loss exponent (\alpha < 3), the optimal per-node
capacity scaling law achieved by the cache-induced hierarchical cooperation can
be significantly better than that achieved by the existing state-of-the-art
schemes. To the best of our knowledge, this is the first work that completely
characterizes the per-node capacity scaling law for wireless caching networks
under the physical model and Zipf distribution with an arbitrary skewness
parameter (\tau). While scaling-law analysis yields clean results, it may not
accurately reflect the throughput performance of a large network with a finite
number of nodes. Therefore, we also analyze the throughput of the proposed
cache-induced hierarchical cooperation for networks of practical size. The
analysis and simulations verify that cache-induced hierarchical cooperation can
also achieve a large throughput gain over the cache-assisted multihop scheme
for networks of practical size.
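For reference, the Zipf popularity model used throughout this line of work assigns the file of rank f a request probability proportional to f^(-tau); a small sketch with illustrative parameter values:

    import numpy as np

    L, tau = 1000, 0.8                     # library size and skewness (illustrative)
    p = np.arange(1, L + 1, dtype=float) ** -tau
    p /= p.sum()                           # normalized Zipf popularity distribution
    rng = np.random.default_rng(0)
    requests = rng.choice(L, size=10_000, p=p)  # independent per-node requests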
Comments: 5 pages, 5 figures
Subjects:
Information Theory (cs.IT)
For greedy block-sparse recovery where the sparsity level is unknown, we
derive a condition for terminating the iteration process. Focusing on the block
orthogonal matching pursuit (BOMP) algorithm, we model the energy of residual
signals at each iteration from a probabilistic perspective. At the iteration
when the last supporting block is detected, the resulting energy of residual
signals is expected to exhibit a marked decrease. Based on this, we stop the
iteration process when the energy of the residual signals falls below a given
threshold. Compared with other approaches, our derived condition works well for
BOMP recovery. Moreover, we extend our approach to the interference
cancellation based BOMP (ICBOMP) recovery of paper [1]. Simulation results show
that our derived condition saves many unnecessary iterations while
guaranteeing favorable recovery accuracy, for both the BOMP and ICBOMP
recoveries.
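A minimal sketch of BOMP equipped with a residual-energy stopping rule of the kind derived here; the block layout, the threshold, and the safeguards are illustrative assumptions.

    import numpy as np

    def bomp(A, y, block_size, energy_threshold):
        m, n = A.shape
        blocks = [list(range(s, min(s + block_size, n)))
                  for s in range(0, n, block_size)]
        chosen, cols = [], []
        x, r = np.zeros(n), y.copy()
        for _ in range(len(blocks)):
            if r @ r <= energy_threshold:   # derived stopping condition
                break
            corr = A.T @ r
            # pick the block with the largest correlation energy
            b = max(range(len(blocks)),
                    key=lambda i: np.linalg.norm(corr[blocks[i]]))
            if b in chosen:                 # no further progress possible
                break
            chosen.append(b)
            cols = [j for i in chosen for j in blocks[i]]
            x_s, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            r = y - A[:, cols] @ x_s
            x[:] = 0.0
            x[cols] = x_s
        return x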
Comments: Globecom 2016. arXiv admin note: text overlap with arXiv:1602.08010 , arXiv:1601.00608
Subjects:
Information Theory (cs.IT)
An uplink cognitive radio system with a single primary user (PU) and multiple
secondary users (SUs) is considered. The SUs have an individual average delay
constraint and an aggregate average interference constraint to the PU. If the
interference channels between the SUs and the PU are statistically different
due to the different physical locations of the SUs, the SUs will experience
different delay performances. This is because SUs located closer to the PU
transmit with lower power levels. A dynamic scheduling-and-power-allocation
policy based on dynamic programming is proposed. The proposed policy can
provide the required average delay guarantees to all SUs irrespective of their
location as well as protect the PU from harmful interference. The policy is
shown to be asymptotically delay optimal in the light traffic regime.
Motivated by the high complexity of the dynamic programming algorithm in the
optimal policy, we exploit the structure of the problem’s solution to present an
alternative suboptimal policy. Through simulations we show that, in both light
and heavy traffic regimes, the delay performance of the suboptimal policy is
within 0.3% of that of the optimal policy, and both outperform existing methods.
Comments: 38 pages, 5 figures
Subjects:
Probability (math.PR)
; Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the statistical limits of both detecting and estimating a rank-one
deformation of a symmetric random Gaussian tensor. We establish upper and lower
bounds on the critical signal-to-noise ratio, under a variety of priors for the
planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. (\pm 1)
entries, and (iii) a sparse vector where a constant fraction (\rho) of entries
are i.i.d. (\pm 1) and the rest are zero. For each of these cases, our upper
and lower bounds match up to a (1+o(1)) factor as the order (d) of the tensor
becomes large. For sparse signals (iii), our bounds are also asymptotically
tight in the sparse limit (\rho \to 0) for any fixed (d) (including the (d=2)
case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon
reminiscent of the work of Baik, Ben Arous and Péché: an ‘eigenvalue’ of a
perturbed tensor emerges from the bulk at a strictly lower signal-to-noise
ratio than when the perturbation itself exceeds the bulk; we quantify the size
of this effect. We also provide some general results for larger classes of
priors. In particular, the large (d) asymptotics of the threshold location
differs between problems with discrete priors versus continuous priors.
Finally, for priors (i) and (ii) we carry out the replica prediction from
statistical physics, which is conjectured to give the exact
information-theoretic threshold for any fixed (d).
Of independent interest, we introduce a new improvement to the second moment
method for contiguity, on which our lower bounds are based. Our technique
conditions away from rare ‘bad’ events that depend on interactions between the
signal and noise. This enables us to close (\sqrt{2})-factor gaps present in
several previous works.
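As a worked example of the model being studied, the sketch below builds an order-3 spiked tensor Y = lambda * v⊗v⊗v + W with symmetrized Gaussian noise and a uniformly random unit spike (prior (i)); the noise normalization is one common convention and is an assumption here.

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n, lam = 30, 5.0
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                 # uniformly random unit vector

    G = rng.standard_normal((n, n, n))
    W = sum(np.transpose(G, p) for p in itertools.permutations(range(3))) / 6.0

    Y = lam * np.einsum('i,j,k->ijk', v, v, v) + W   # rank-one deformation

    # correlation of the planted spike with the observation; concentrates near lam
    print(np.einsum('ijk,i,j,k->', Y, v, v, v))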
Comments: arXiv admin note: text overlap with arXiv:1612.06344
Subjects:
Optimization and Control (math.OC)
; Information Theory (cs.IT); Probability (math.PR)
In this paper we provide a complementary set of results to those we present
in our companion work \cite{Stojnicl1HidParasymldp} regarding the behavior of
the so-called partial (\ell_1) (a variant of the standard (\ell_1) heuristic
often employed for solving under-determined systems of linear equations). As is
well known through our earlier works
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}, the partial (\ell_1)
also exhibits the phase-transition (PT) phenomenon, discovered and well
understood in the context of the standard (\ell_1) through Donoho’s and our own
works \cite{DonohoPol,DonohoUnsigned,StojnicCSetam09,StojnicUpper10}.
\cite{Stojnicl1HidParasymldp} goes much further though and, in addition to
determining the partial (\ell_1)’s phase-transition curves (PT curves)
(which had already been done in
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}), provides a
substantially deeper understanding of the PT phenomena through a study of the
underlying large deviations principles (LDPs). As the PT and LDP phenomena are
by definition tied to large dimensional settings, both sets of our works,
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13} and
\cite{Stojnicl1HidParasymldp}, consider what is typically called the asymptotic
regime. In this paper we move things in a different direction and consider
finite dimensional scenarios. Specifically, we provide explicit performance
characterizations for any given collection of system/parameter dimensions. We
do so for two different variants of the partial (\ell_1): one that we call
exactly the partial (\ell_1) and another, possibly a bit more practical,
that we call the hidden partial (\ell_1).
Partial (\ell_1) optimization in random linear systems – phase transitions and large deviations
Comments: arXiv admin note: text overlap with arXiv:1612.06361 , arXiv:1612.06835
Subjects:
Optimization and Control (math.OC)
; Information Theory (cs.IT); Probability (math.PR)
(\ell_1) optimization is a well-known heuristic often employed for solving
various forms of sparse linear problems. In this paper we look at a variant of
it that we refer to as the \emph{partial} (\ell_1) and discuss its mathematical
properties when used for solving under-determined linear systems of equations.
We focus on large random systems and discuss the phase-transition (PT)
phenomena and how they connect to the large deviation principles (LDPs). Using a
variety of probabilistic and geometric techniques that we have developed in
recent years, we first present general guidelines that conceptually fully
characterize both the PTs and the LDPs. After that, we put an emphasis on
providing a collection of explicit analytical solutions to all of the
underlying mathematical problems. As a nice bonus to the developed concepts,
the forms of the analytical solutions turn out, in our view, to be fairly
elegant as well.
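A minimal sketch of the partial (\ell_1) heuristic as described: only the entries of x outside a known index set K are (\ell_1)-penalized when solving the under-determined system Ax = y. cvxpy is used for brevity; the problem sizes and the set K are illustrative assumptions.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 40, 100, 10
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[:k] = rng.standard_normal(k)    # sparse unknown with known head
    y = A @ x_true

    K = np.arange(5)                       # indices assumed known a priori
    free = np.setdiff1d(np.arange(n), K)
    S = np.eye(n)[free]                    # selects the penalized coordinates

    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.norm1(S @ x)), [A @ x == y])
    prob.solve()
    print(np.linalg.norm(x.value - x_true))  # small when recovery succeeds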