Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra. Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
For artificial general intelligence (AGI) it would be efficient if multiple
users trained the same giant neural network, permitting parameter reuse,
without catastrophic forgetting. PathNet is a first step in this direction. It
is a neural network algorithm that uses agents embedded in the neural network
whose task is to discover which parts of the network to re-use for new tasks.
Agents are pathways (views) through the network which determine the subset of
parameters that are used and updated by the forwards and backwards passes of
the backpropagation algorithm. During learning, a tournament selection genetic
algorithm is used to select pathways through the neural network for replication
and mutation. Pathway fitness is the performance of that pathway measured
according to a cost function. We demonstrate successful transfer learning:
fixing the parameters along a path learned on task A and re-evolving a new
population of paths for task B allows task B to be learned faster than it
could be learned from scratch or after fine-tuning. Paths evolved on task B
re-use parts of the optimal path evolved on task A. Positive transfer was
demonstrated for binary MNIST, CIFAR, and SVHN supervised learning
classification tasks, and a set of Atari and Labyrinth reinforcement learning
tasks, suggesting PathNets have general applicability for neural network
training. Finally, PathNet also significantly improves the robustness to
hyperparameter choices of a parallel asynchronous reinforcement learning
algorithm (A3C).
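To make the pathway-evolution idea concrete, the following is a minimal, illustrative Python sketch of tournament selection over pathways (module selections per layer). It is not the authors' implementation; the layer and module counts, the dummy fitness function and the mutation rate are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)
LAYERS, MODULES, ACTIVE = 3, 10, 3            # assumed network/pathway sizes
GOOD = {0, 1, 2}                              # pretend these modules hold useful features

def random_path():
    # a pathway = the indices of the modules used in each layer
    return [rng.choice(MODULES, size=ACTIVE, replace=False) for _ in range(LAYERS)]

def fitness(path):
    # stand-in for "train the parameters selected by this path and return its reward"
    return sum(len(GOOD.intersection(layer.tolist())) for layer in path)

def mutate(path, p=0.1):
    new = []
    for layer in path:
        layer = layer.copy()
        for i in range(len(layer)):
            if rng.random() < p:
                layer[i] = rng.integers(MODULES)   # re-route this slot to a random module
        new.append(layer)
    return new

population = [random_path() for _ in range(8)]
for _ in range(200):                               # tournaments of size two
    a, b = rng.choice(len(population), size=2, replace=False)
    winner, loser = (a, b) if fitness(population[a]) >= fitness(population[b]) else (b, a)
    population[loser] = mutate(population[winner]) # overwrite the loser with a mutated copy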
Caglar Gulcehre, Sarath Chandar, Yoshua Bengio. Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent empirical results on long-term dependency tasks have shown that neural
networks augmented with an external memory can learn the long-term dependency
tasks more easily and achieve better generalization than vanilla recurrent
neural networks (RNN). We suggest that memory augmented neural networks can
reduce the effects of vanishing gradients by creating shortcut (or wormhole)
connections. Based on this observation, we propose a novel memory augmented
neural network model called TARDIS (Temporal Automatic Relation Discovery in
Sequences). The controller of TARDIS can store a selective set of embeddings of
its own previous hidden states into an external memory and revisit them as and
when needed. For TARDIS, the memory acts as storage for wormhole connections to
the past that propagate gradients more effectively and help to learn
temporal dependencies. The memory structure of TARDIS has similarities to both
Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but
both read and write operations of TARDIS are simpler and more efficient. We use
discrete addressing for read/write operations, which helps to substantially
reduce the vanishing gradient problem with very long sequences. Read and write
operations in TARDIS are tied with a heuristic once the memory becomes full,
which makes the learning problem simpler than for NTM- or D-NTM-type
architectures. We provide a detailed analysis of gradient propagation in
memory-augmented neural networks (MANNs) in general. We evaluate our models on different long-term dependency
tasks and report competitive results in all of them.
Comments: Submitted to The Journal of the Acoustical Society of America
Subjects:
Atmospheric and Oceanic Physics (physics.ao-ph)
; Neural and Evolutionary Computing (cs.NE); Geophysics (physics.geo-ph)
Source localization is solved as a classification problem by training a
feed-forward neural network (FNN) on ocean acoustic data. The pressure received
by a vertical linear array is preprocessed by constructing a normalized sample
covariance matrix (SCM), which is used as input for the FNN. Each neuron of the
output layer represents a discrete source range. The FNN is a data-driven method
that learns features directly from observed acoustic data, unlike model-based
localization methods such as matched-field processing that require accurate
sound propagation modeling. The FNN achieves good performance (mean
absolute percentage error below 10%) when predicting source ranges for vertical
array data from the Noise09 experiment. The effects of varying the parameters
of the method, such as the number of hidden neurons and layers, the number of
output neurons, and the number of snapshots in each input sample, are discussed.
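As an illustration of the preprocessing step described above, the sketch below (Python, not the authors' code) forms a normalized sample covariance matrix from complex array snapshots and flattens it into an input vector; the snapshot layout and the choice to feed the real and imaginary parts of the upper triangle to the network are assumptions.

import numpy as np

def normalized_scm_features(pressure):
    # pressure: complex array of shape (n_sensors, n_snapshots), one column per snapshot
    n_sensors, n_snaps = pressure.shape
    scm = np.zeros((n_sensors, n_sensors), dtype=complex)
    for i in range(n_snaps):
        p = pressure[:, i:i + 1]
        p = p / np.linalg.norm(p)         # normalize each snapshot to unit norm
        scm += p @ p.conj().T             # accumulate rank-one outer products
    scm /= n_snaps
    iu = np.triu_indices(n_sensors)       # the SCM is Hermitian, keep the upper triangle
    return np.concatenate([scm[iu].real, scm[iu].imag])

rng = np.random.default_rng(0)
x = normalized_scm_features(rng.normal(size=(16, 8)) + 1j * rng.normal(size=(16, 8)))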
Comments: Google Scholar Indexed Journal, 5 pages, 10 figures, Journal of Biosensors and Bioelectronics, vol. 7, no. 2, June-Sept 2016
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Neural and Evolutionary Computing (cs.NE)
Face detection and recognition have been popular topics among research scholars, and
diverse approaches have been explored to date to serve this purpose. The
rapid advent of biometric analysis systems, which may be full-body scanners,
iris detection and recognition systems, fingerprint recognition
systems, or surveillance systems deployed for safety and security purposes,
has contributed to this inclination. Advances have been made using the
frontal or lateral view of the face, facial expressions such as
anger, happiness and gloominess, and both still images and video for
detection and recognition. This has led to newer face detection and
recognition methods that aim to achieve accurate results while being economically
feasible and extremely secure. Techniques such as Principal Component Analysis
(PCA), Independent Component Analysis (ICA) and Linear Discriminant Analysis
(LDA) have been the predominant ones in use. But given the improvements needed
over these earlier approaches, neural-network-based recognition has been a boon to
the industry. It enhanced not only the recognition but also the efficiency of
the process. Backpropagation was chosen as the learning method because of
its efficiency in recognizing nonlinear faces, with an acceptance ratio of more
than 90% and an execution time of only a few seconds.
Comments: 6 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Computation and Language (cs.CL)
Today all kinds of information are being digitized, and along with this
digitization, huge archives of various kinds of documents are being digitized
too. Optical Character Recognition is the method through which
newspapers and other paper documents are converted into digital resources, but it
works on text only. As a result, if we try to process
a document that contains non-textual zones, we will get garbage text
as output. In order to digitize documents properly, they should therefore be
preprocessed carefully, and during preprocessing, properly segmenting the document
into regions according to their category is most important. However,
the Optical Character Recognition processes available for the Bangla language have
no algorithm that can categorize a newspaper or book page fully. So we worked
to decompose a document into its several parts such as headlines, sub-headlines,
columns, images, etc., and if the input is skewed or rotated, it is
also deskewed and de-rotated. To decompose a Bangla document we first find the
edges of the input image. Then we find the horizontal and vertical area in which
every pixel lies, and the input image is cut according to
these areas. We then take each sub-image and compute its
height-width ratio and line height, and according to these values the sub-images
are categorized. To deskew the image we find the skew angle and deskew
the image according to this angle. To de-rotate the image we use the line
height, the matra line, and the pixel ratio of the matra line.
Comments: 26 pages, very descriptive figures, comprehensive evaluation on real-life datasets
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Traditional activity recognition systems work on the basis of training,
taking a fixed set of sensors into account. In this article, we focus on the
question of how pattern recognition can leverage new information sources with
minimal or no user input. Thus, we present an approach for opportunistic
activity recognition, where ubiquitous sensors lead to dynamically changing
input spaces. Our method is a variation of well-established principles of
machine learning, relying on unsupervised clustering to discover structure in
data and inferring cluster labels from a small number of labeled data points in a
semi-supervised manner. Elaborating the challenges, evaluations of over 3000
sensor combinations from three multi-user experiments are presented in detail
and show the potential benefit of our approach.
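The cluster-then-label idea mentioned above can be sketched as follows (Python); this is not the authors' pipeline, and the use of scikit-learn k-means, the cluster count and the majority-vote labelling rule are assumptions for illustration.

import numpy as np
from sklearn.cluster import KMeans

def cluster_then_label(X, labeled_idx, labeled_y, n_clusters=5, seed=0):
    # discover structure without labels ...
    cluster_of = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X).labels_
    # ... then infer one label per cluster by majority vote over the few labeled points
    cluster_label = {}
    for c in range(n_clusters):
        votes = [y for i, y in zip(labeled_idx, labeled_y) if cluster_of[i] == c]
        cluster_label[c] = max(set(votes), key=votes.count) if votes else None
    return np.array([cluster_label[c] for c in cluster_of], dtype=object)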
Comments: 40 pages, 16 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The structure from motion (SfM) problem in computer vision is the problem of
recovering the 3D structure of a stationary scene from a set of projective
measurements, represented as a collection of 2D images, via estimation of
motion of the cameras corresponding to these images. In essence, SfM involves
the three main stages of (1) extraction of features in images (e.g., points of
interest, lines, etc.) and matching of these features between images, (2)
camera motion estimation (e.g., using relative pairwise camera poses estimated
from the extracted features), (3) recovery of the 3D structure using the
estimated motion and features (e.g., by minimizing the so-called reprojection
error). This survey mainly focuses on the relatively recent developments in the
literature pertaining to stages (2) and (3). More specifically, after touching
upon the early factorization-based techniques for motion and structure
estimation, we provide a detailed account of some of the recent camera location
estimation methods in the literature, which precedes the discussion of notable
techniques for 3D structure recovery. We also cover the basics of the
simultaneous localization and mapping (SLAM) problem, which can be considered
to be a specific case of the SfM problem. Additionally, a review of the
fundamentals of feature extraction and matching (i.e., stage (1) above),
various recent methods for handling ambiguities in 3D scenes, SfM techniques
involving relatively uncommon camera models and image features, and popular
sources of data and SfM software is included in our survey.
C.-C. Jay Kuo. Subjects: Computer Vision and Pattern Recognition (cs.CV)
There is a resurging interest in developing a neural-network-based solution
to the supervised machine learning problem. The convolutional neural network
(CNN), which is also known as the feedforward neural network and the
multi-layer perceptron (MLP), will be studied in this note. To begin with, we
introduce a RECOS transform as a basic building block of CNNs. The “RECOS” is
an acronym for “REctified-COrrelations on a Sphere”. It consists of two main
concepts: 1) data clustering on a sphere and 2) rectification. Afterwards, we
interpret a CNN as a network that implements the guided multi-layer RECOS
transform with three highlights. First, we compare the traditional single-layer
and modern multi-layer signal analysis approaches, point out key ingredients
that enable the multi-layer approach, and provide a full explanation of the
operating principle of CNNs. Second, we discuss how guidance is provided by
labels through backpropagation in the training. Third, we show that a trained
network can be greatly simplified in the testing stage demanding only one-bit
representation for both filter weights and inputs.
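A single RECOS-style unit, as summarized above (data on the unit sphere, correlation with anchor vectors, rectification), might look like the following minimal Python sketch; the anchor vectors here are random placeholders rather than trained filters.

import numpy as np

def recos_unit(x, anchors):
    # x: input vector; anchors: matrix of unit-norm anchor vectors, one per row
    x = x / (np.linalg.norm(x) + 1e-12)       # place the data point on the unit sphere
    correlations = anchors @ x                # correlations with the anchors
    return np.maximum(correlations, 0.0)      # rectification (ReLU)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
responses = recos_unit(rng.normal(size=16), anchors)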
Comments: 6 pages, 2 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Databases (cs.DB)
Nearest neighbor search is a challenging issue that has been studied
for several decades. Recently, this issue has become more and more pressing as
the big data problem arises in various fields. In this paper, a
scalable solution based on a hill-climbing strategy supported by a k-nearest
neighbor (kNN) graph is presented. Two major issues are considered in the
paper. Firstly, an efficient kNN graph construction method based on a two-means
tree is presented. For the nearest neighbor search, an enhanced hill-climbing
procedure is proposed, which yields a considerable performance boost over the
original procedure. Furthermore, with the support of inverted indexing derived from
residue vector quantization, our method achieves close to 100% recall with high
speed efficiency in two state-of-the-art evaluation benchmarks. In addition, a
comparative study on both the compressional and traditional nearest neighbor
search methods is presented. We show that our method achieves the best
trade-off between search quality, efficiency and memory complexity.
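A bare-bones version of hill-climbing over a prebuilt kNN graph, the core idea named above, is sketched below in Python. It omits the paper's enhancements, the two-means-tree graph construction and the inverted indexing; the graph, the starting vertex and the Euclidean metric are assumptions.

import numpy as np

def hill_climb(query, data, knn_graph, start, max_iters=100):
    # knn_graph[i] lists the ids of the k nearest neighbors of data point i
    best = start
    best_d = np.linalg.norm(data[best] - query)
    for _ in range(max_iters):
        improved = False
        for nb in knn_graph[best]:
            d = np.linalg.norm(data[nb] - query)
            if d < best_d:
                best, best_d, improved = nb, d, True
        if not improved:            # no neighbor is closer: a local optimum was reached
            break
    return best, best_d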
Zhun Zhong, Liang Zheng, Donglin Cao, Shaozi Li. Subjects: Computer Vision and Pattern Recognition (cs.CV)
When considering person re-identification (re-ID) as a retrieval process,
re-ranking is a critical step to improve its accuracy. Yet in the re-ID
community, limited effort has been devoted to re-ranking, especially those
fully automatic, unsupervised solutions. In this paper, we propose a
k-reciprocal encoding method to re-rank the re-ID results. Our hypothesis is
that if a gallery image is similar to the probe in the k-reciprocal nearest
neighbors, it is more likely to be a true match. Specifically, given an image,
a k-reciprocal feature is calculated by encoding its k-reciprocal nearest
neighbors into a single vector, which is used for re-ranking under the Jaccard
distance. The final distance is computed as the combination of the original
distance and the Jaccard distance. Our re-ranking method does not require any
human interaction or any labeled data, so it is applicable to large-scale
datasets. Experiments on the large-scale Market-1501, CUHK03, MARS, and PRW
datasets confirm the effectiveness of our method.
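The k-reciprocal re-ranking idea can be illustrated with the Python sketch below, which builds k-reciprocal neighbor sets and combines a Jaccard distance with the original distance. It is a simplification of the paper's encoding method; the pairwise distance matrix, k and the weighting parameter are assumptions.

import numpy as np

def k_reciprocal_set(dist, i, k):
    knn_i = np.argsort(dist[i])[:k + 1]                        # k-NN of i (includes i itself)
    return {int(j) for j in knn_i if i in np.argsort(dist[j])[:k + 1]}

def rerank(dist, probe=0, k=20, lam=0.3):
    R_probe = k_reciprocal_set(dist, probe, k)
    final = np.empty(dist.shape[0])
    for g in range(dist.shape[0]):
        R_g = k_reciprocal_set(dist, g, k)
        jaccard = 1.0 - len(R_probe & R_g) / max(len(R_probe | R_g), 1)
        final[g] = (1 - lam) * jaccard + lam * dist[probe, g]  # combined final distance
    return np.argsort(final)                                   # re-ranked gallery order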
Comments: An extended version of our ICCV 2015 paper
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We propose a deep convolutional neural network (CNN) for face detection
leveraging facial attribute-based supervision. We observe a phenomenon that
part detectors emerge within a CNN trained to classify attributes from uncropped
face images, without any explicit part supervision. The observation motivates a
new method for finding faces through scoring facial parts responses by their
spatial structure and arrangement. The scoring mechanism is data-driven, and
carefully formulated considering challenging cases where faces are only
partially visible. This consideration allows our network to detect faces under
severe occlusion and unconstrained pose variations. Our method achieves
promising performance on popular benchmarks including FDDB, PASCAL Faces, AFW,
and WIDER FACE.
Martin Thoma. Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper describes the HASYv2 dataset. HASY is a publicly available, free
of charge dataset of single symbols similar to MNIST. It contains 168233
instances of 369 classes. HASY contains two challenges: A classification
challenge with 10 pre-defined folds for 10-fold cross-validation and a
verification challenge.
Comments: 5 pages, 4 figures, IEEE TENCON 2016
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a new method for horizon detection called the multi-scale
cross modal linear feature. This method integrates three different concepts
related to the presence of horizon in maritime images to increase the accuracy
of horizon detection. Specifically, it uses the persistence of the horizon in
multi-scale median filtering, and its detection as a linear feature commonly
detected by two different methods, namely the Hough transform of the edge map and
the intensity gradient. We demonstrate the performance of the method over 13
videos comprising more than 3000 frames and show that the proposed method
detects horizon with small error in most of the cases, outperforming three
state-of-the-art methods.
Comments: AAAI-17
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this paper we present an on-manifold sequence-to-sequence learning
approach to motion estimation using visual and inertial sensors. It is to the
best of our knowledge the first end-to-end trainable method for visual-inertial
odometry which performs fusion of the data at an intermediate
feature-representation level. Our method has numerous advantages over
traditional approaches. Specifically, it eliminates the need for tedious manual
synchronization of the camera and IMU as well as eliminating the need for
manual calibration between the IMU and camera. A further advantage is that our
model naturally and elegantly incorporates domain specific information which
significantly mitigates drift. We show that our approach is competitive with
state-of-the-art traditional methods when accurate calibration data is
available and can be trained to outperform them in the presence of calibration
and synchronization errors.
Habib Ghaffari Hadigheh, Ghazali bin sulong. Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Most research on image forensics has mainly focused on the detection
of artifacts introduced by a single processing tool. This has led to the
development of many specialized algorithms looking for one or more particular
footprints under specific settings. Naturally, the performance of such
algorithms is not perfect, and accordingly the provided output might be noisy,
inaccurate and only partially correct. Furthermore, a forged image in practical
scenarios is often the result of applying several tools available in
image-processing software systems. Therefore, reliable tamper detection
requires developing more powerful tools to deal with various tampering
scenarios. Fusion of forgery detection tools based on a Fuzzy Inference System
has been used before to address this problem. Adjusting the membership
functions and defining proper fuzzy rules to attain better results are
time-consuming processes, which can be considered the main disadvantage of fuzzy
inference systems. In this paper, a Neuro-Fuzzy inference system for the fusion of
forgery detection tools is developed. The neural network characteristic of
these systems provides an appropriate tool for automatically adjusting the
membership functions. Moreover, the initial fuzzy inference system is generated
based on fuzzy clustering techniques. The proposed framework is implemented and
validated on a benchmark image splicing data set in which three forgery
detection tools are fused based on an adaptive Neuro-Fuzzy inference system. The
outcome of the proposed method reveals that applying Neuro-Fuzzy inference
systems could be a better approach for the fusion of forgery detection tools.
Xiaoxia Sun, Nasser M. Nasrabadi, Trac D. Tran. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel multilayer sparse coding network capable of
efficiently adapting its own regularization parameters to a given dataset. The
network is trained end-to-end with a supervised task-driven learning algorithm
via error backpropagation. During training, the network learns both the
dictionaries and the regularization parameters of each sparse coding layer so
that the reconstructive dictionaries are smoothly transformed into increasingly
discriminative representations. We also incorporate a new weighted sparse
coding scheme into our sparse recovery procedure, offering the system more
flexibility to adjust sparsity levels. Furthermore, we have devised a sparse
coding layer utilizing a ‘skinny’ dictionary. Integral to computational
efficiency, these skinny dictionaries compress the high dimensional sparse
codes into lower dimensional structures. The adaptivity and discriminability of
our 13-layer sparse coding network are demonstrated on four benchmark datasets,
namely Cifar-10, Cifar-100, SVHN and MNIST, most of which are considered
difficult for sparse coding models. Experimental results show that our
architecture overwhelmingly outperforms traditional one-layer sparse coding
architectures while using far fewer parameters. Moreover, our multilayer
architecture fuses the benefits of depth with sparse coding’s characteristic
ability to operate on smaller datasets. In such data-constrained scenarios, we
demonstrate our technique can overcome the limitations of deep neural networks
by exceeding the state of the art in accuracy.
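One weighted sparse coding step of the kind described above can be sketched as a weighted ISTA iteration (Python). This is not the authors' end-to-end trained network; the dictionary, the per-coefficient weights and the iteration count are assumptions.

import numpy as np

def weighted_ista(x, D, weights, lam=0.1, n_iters=50):
    # solve min_z 0.5 * ||x - D z||^2 + lam * sum_i weights_i * |z_i|
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)             # gradient of the reconstruction term
        u = z - grad / L
        thresh = lam * weights / L
        z = np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)  # weighted soft-thresholding
    return z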
Comments: 8 pages, 7 figures, 3 tables, accepted for publication in FG2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Generic face detection algorithms do not perform very well in the mobile
domain due to significant presence of occluded and partially visible faces. One
promising technique to handle the challenge of partial faces is to design face
detectors based on facial segments. In this paper two such face detectors
namely, SegFace and DeepSegFace, are proposed that detect the presence of a
face given arbitrary combinations of certain face segments. Both methods use
proposals from facial segments as input that are found using weak boosted
classifiers. SegFace is a shallow and fast algorithm using traditional
features, tailored for situations where real time constraints must be
satisfied. On the other hand, DeepSegFace is a more powerful algorithm based on
a deep convolutional neural network (DCNN) architecture. DeepSegFace offers
certain advantages over other DCNN-based face detectors as it requires
a relatively small amount of data to train, by utilizing a novel data
augmentation scheme, and is very robust to occlusion by design. Extensive
experiments show the superiority of the proposed methods, especially
DeepSegFace, over other state-of-the-art face detectors in terms of
precision-recall and ROC curve on two mobile face datasets.
İlke Çuğu, Eren Şener, Çağrı Erciyes, Burak Balcı, Emre Akın, Itır Önal, Ahmet Oğuz Akyüz. Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel tree classification system called Treelogy, which fuses
deep representations with hand-crafted features obtained from leaf images to
perform leaf-based plant classification. Key to this system are segmentation of
the leaf from an untextured background, using convolutional neural networks
(CNNs) for learning deep representations, extracting hand-crafted features with
a number of image processing techniques, training a linear SVM with feature
vectors, merging SVM and CNN results, and identifying the species from a
dataset of 57 trees. Our classification results show that fusion of deep
representations with hand-crafted features leads to the highest accuracy. The
proposed algorithm is embedded in a smart-phone application, which is publicly
available. Furthermore, our novel dataset, comprising 5408 leaf images, is also
made public for the use of other researchers.
Xudong Sun, Pengcheng Wu, Steven C.H. Hoi. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this report, we present a new face detection scheme using deep learning
and achieve the state-of-the-art detection performance on the well-known FDDB
face detection benchmark evaluation. In particular, we improve the
state-of-the-art faster RCNN framework by combining a number of strategies,
including feature concatenation, hard negative mining, multi-scale training,
model pretraining, and proper calibration of key parameters. As a consequence,
the proposed scheme obtained the state-of-the-art face detection performance,
making it the best model in terms of ROC curves among all the published methods
on the FDDB benchmark.
Comments: Accepted in IET Image Processing, 16 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In Non-Local Means (NLM), each pixel is denoised by performing a weighted
averaging of its neighboring pixels, where the weights are computed using image
patches. We demonstrate that the denoising performance of NLM can be improved
by pruning the neighboring pixels, namely, by rejecting neighboring pixels
whose weights are below a certain threshold lambda. While pruning can
potentially reduce pixel averaging in uniform-intensity regions, we demonstrate
that there is generally an overall improvement in the denoising performance. In
particular, the improvement comes from pixels situated close to edges and
corners. The success of the proposed method strongly depends on the choice of
the global threshold lambda, which in turn depends on the noise level and
the image characteristics. We show how Stein’s unbiased estimator of the
mean-squared error can be used to optimally tune lambda, at a marginal
computational overhead. We present some representative denoising results to
demonstrate the superior performance of the proposed method over NLM and its
variants.
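The pruning step described above can be illustrated with the per-pixel Python sketch below, in which neighbors whose weights fall below the global threshold lambda are simply discarded. The patch size, search window and filtering parameter are assumptions, and the SURE-based tuning of lambda is omitted.

import numpy as np

def pruned_nlm_pixel(img, i, j, f=3, W=10, h=10.0, lam=0.05):
    # denoise pixel (i, j) by a weighted average over a (2W+1)x(2W+1) search window
    pad = np.pad(img.astype(float), f, mode='reflect')
    ref = pad[i:i + 2 * f + 1, j:j + 2 * f + 1]          # patch centered at (i, j)
    num = den = 0.0
    for di in range(-W, W + 1):
        for dj in range(-W, W + 1):
            ii, jj = i + di, j + dj
            if not (0 <= ii < img.shape[0] and 0 <= jj < img.shape[1]):
                continue
            patch = pad[ii:ii + 2 * f + 1, jj:jj + 2 * f + 1]
            w = np.exp(-np.sum((ref - patch) ** 2) / (h ** 2))
            if w < lam:                                   # prune low-weight neighbors
                continue
            num += w * pad[ii + f, jj + f]
            den += w
    return num / den if den > 0 else float(img[i, j])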
Comments: Submitted to CVPR 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
There have been remarkable improvements in the semantic labelling task in
recent years. However, state-of-the-art methods rely on large-scale
pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the
present object classes. Recently, it has been shown that high quality seeds
indicating discriminative object regions can be obtained from image-level
labels. Without additional information, obtaining the full extent of the object
is an inherently ill-posed problem due to co-occurrences. We propose using a
saliency model as additional information and hereby exploit prior knowledge on
the object extent and image statistics. We show how to combine both information
sources in order to recover 80% of the fully supervised performance – which is
the new state of the art in weakly supervised training for pixel-wise semantic
labelling.
Comments: ISSN 2320-088X, 8 pages, 5 figures, 1 table
Journal-ref: Int J. Computer Science and Mobile Computing, vol. 5, issue 5, pp.
288-295 (May 2016)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Detection and recognition of facial images is an intricate
problem which has garnered much attention during recent years due to its ever
increasing applications in numerous fields, and it continues to pose a challenge to
finding a robust solution. Its scope extends to security,
commercial and law enforcement applications. Research on this subject for
more than a decade has brought about remarkable development, with modi operandi
such as human-computer interaction, biometric analysis, and content-based coding of
images, videos and surveillance. The task is trivial for the human brain but
cumbersome to imitate artificially. The commonalities among faces do pose a problem on
various grounds, but features such as skin color and gender differentiate one person
from another. In this paper, facial detection has been carried out using the
Viola-Jones algorithm and recognition of the face has been done using a Back
Propagation Neural Network (BPNN).
Tong Ke, Stergios Roumeliotis. Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we present an algebraic solution to the classical
perspective-3-point (P3P) problem for determining the position and attitude of
a camera from observations of three known reference points. In contrast to
previous approaches, we first directly determine the camera’s attitude by
employing the corresponding geometric constraints to formulate a system of
trigonometric equations. This is then efficiently solved, following an
algebraic approach, to determine the unknown rotation matrix and subsequently
the camera’s position. As compared to recent alternatives, our method avoids
computing unnecessary (and potentially numerically unstable) intermediate
results, and thus achieves higher numerical accuracy and robustness at a lower
computational cost. These benefits are validated through extensive Monte-Carlo
simulations for both nominal and close-to-singular geometric configurations.
Comments: Submitted to ICIP 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Camera trapping is a technique to study wildlife using automatic triggered
cameras. However, camera trapping collects a lot of false positives (images
without animals), which must be segmented before the classification step. This
paper presents a Multi-Layer Robust Principal Component Analysis (RPCA) for
camera-trap image segmentation. Our Multi-Layer RPCA uses histogram
equalization and a Gaussian filter as pre-processing, texture and color
descriptors as features, and morphological filters with active contours as
post-processing. The experiments focus on computing the sparse and low-rank
matrices with different amounts of camera-trap images. We tested the
Multi-Layer RPCA on our camera-trap database. To the best of our knowledge, this paper
is the first work to propose Multi-Layer RPCA and use it for camera-trap
image segmentation.
Peduncle Detection of Sweet Pepper for Autonomous Crop Harvesting – Combined Colour and 3D Information
Comments: 8 pages, 14 figures, Robotics and Automation Letters
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
This paper presents a 3D visual detection method for the challenging task of
detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting
the peduncle cleanly is one of the most difficult stages of the harvesting
process, where the peduncle is the part of the crop that attaches it to the
main stem of the plant. Accurate peduncle detection in 3D space is therefore a
vital step in reliable autonomous harvesting of sweet peppers, as this can lead
to precise cutting while avoiding damage to the surrounding plant. This paper
makes use of both colour and geometry information acquired from an RGB-D sensor
and utilises a supervised-learning approach for the peduncle detection task.
The performance of the proposed method is demonstrated and evaluated using
qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the
detection precision-recall curve). We are able to achieve an AUC of 0.71 for
peduncle detection on field-grown sweet peppers. We release a set of manually
annotated 3D sweet pepper and peduncle images to assist the research community
in performing further research on this topic.
SafeDrive: A Robust Lane Tracking System for Autonomous and Assisted Driving Under Limited Visibility
Junaed Sattar, Jiawei Mo. Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We present an approach towards robust lane tracking for assisted and
autonomous driving, particularly under poor visibility. Autonomous detection of
lane markers improves road safety, and purely visual tracking is desirable for
widespread vehicle compatibility and reducing sensor intrusion, cost, and
energy consumption. However, visual approaches are often ineffective because of
a number of factors, including but not limited to occlusion, poor weather
conditions, and paint wear-off. Our method, named SafeDrive, attempts to
improve visual lane detection approaches in drastically degraded visual
conditions without relying on additional active sensors. In scenarios where
visual lane detection algorithms are unable to detect lane markers, the
proposed approach uses location information of the vehicle to locate and access
alternate imagery of the road and attempts detection on this secondary image.
Subsequently, by using a combination of feature-based and pixel-based
alignment, an estimated location of the lane marker is found in the current
scene. We demonstrate the effectiveness of our system on actual driving data
from locations in the United States with Google Street View as the source of
alternate imagery.
Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala. Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this work we propose a simple unsupervised approach for next frame
prediction in video. Instead of directly predicting the pixels in a frame given
past frames, we predict the transformations needed for generating the next
frame in a sequence, given the transformations of the past frames. This leads
to sharper results, while using a smaller prediction model.
In order to enable a fair comparison between different video frame prediction
models, we also propose a new evaluation protocol. We use generated frames as
input to a classifier trained with ground truth sequences. This criterion
guarantees that models scoring high are those producing sequences which
preserve discriminative features, as opposed to merely penalizing any
deviation, plausible or not, from the ground truth. Our proposed approach
compares favourably against more sophisticated ones on the UCF-101 data set,
while also being more efficient in terms of the number of parameters and
computational cost.
Comments: 4 pages, 4 figures, submitted to IEEE Signal Processing Letters
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV)
Network models play an important role in studying complex systems in many
scientific disciplines. Graph signal processing is receiving growing interest
as to design novel tools to combine the analysis of topology and signals. The
graph Fourier transform, defined as the eigendecomposition of the graph
Laplacian, allows extending conventional signal-processing operations to
graphs. One main feature is to let emerge global organization from local
interactions; i.e., the Fiedler vector has the smallest non-zero eigenvalue and
is key for Laplacian embedding and graph clustering. Here, we introduce the
design of Slepian graph signals, by maximizing energy concentration in a
predefined subgraph for a given spectral bandlimit. We also establish a link
with classical Laplacian embedding and graph clustering, for which the graph
Slepian design can serve as a generalization.
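For readers less familiar with the spectral notions used above, the Python sketch below computes the graph Fourier basis as the eigendecomposition of the Laplacian of a toy graph and extracts the Fiedler vector; the adjacency matrix is an assumed example and the Slepian design itself is not shown.

import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # toy undirected graph
L = np.diag(A.sum(axis=1)) - A                # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                       # eigenvector of the smallest non-zero eigenvalue

def graph_fourier(signal):
    # expand a graph signal in the Laplacian eigenbasis (the graph Fourier transform)
    return eigvecs.T @ signal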
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende. Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The popularity of image sharing on social media reflects the important role
visual context plays in everyday conversation. In this paper, we present a
novel task, Image-Grounded Conversations (IGC), in which natural-sounding
conversations are generated about shared photographic images. We investigate
this task using training data derived from image-grounded conversations on
social media and introduce a new dataset of crowd-sourced conversations for
benchmarking progress. Experiments using deep neural network models trained on
social media data show that the combination of visual and textual context can
enhance the quality of generated conversational turns. In human evaluation, a
gap between human performance and that of both neural and retrieval
architectures suggests that IGC presents an interesting challenge for vision
and language research.
Comments: 12 pages, 4 figures, to appear at the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects:
Information Theory (cs.IT)
; Computer Vision and Pattern Recognition (cs.CV)
This paper considers the problem of sampling and reconstruction of a
continuous-time sparse signal without assuming the knowledge of the sampling
instants or the sampling rate. This topic has its roots in the problem of
recovering multiple echoes of light from its low-pass filtered and
auto-correlated, time-domain measurements. Our work is closely related to the
topic of sparse phase retrieval and in this context, we discuss the advantage
of phase-free measurements. While this problem is ill-posed, cues based on
physical constraints allow for its appropriate regularization. We validate our
theory with experiments based on customized, optical time-of-flight imaging
sensors. What singles out our approach is that our sensing method allows for
temporal phase retrieval as opposed to the usual case of spatial phase
retrieval. Preliminary experiments and results demonstrate a compelling
capability of our phase-retrieval based imaging device.
Comments: 28 pages, 7 illustrations, 4 pseudocodes
Subjects:
Artificial Intelligence (cs.AI)
We introduce new diversification methods for zero-one optimization that
significantly extend strategies previously introduced in the setting of
metaheuristic search. Our methods incorporate easily implemented strategies for
partitioning assignments of values to variables, accompanied by processes
called augmentation and shifting which create greater flexibility and
generality. We then show how the resulting collection of diversified solutions
can be further diversified by means of permutation mappings, which equally can
be used to generate diversified collections of permutations for applications
such as scheduling and routing. These methods can be applied to non-binary
vectors by the use of binarization procedures and by Diversification-Based
Learning (DBL) procedures which also provide connections to applications in
clustering and machine learning. Detailed pseudocode and numerical
illustrations are provided to show the operation of our methods and the
collections of solutions they create.
Redefinition of the concept of fuzzy set based on vague partition from the perspective of axiomatization
Comments: 25 pages
Subjects:
Artificial Intelligence (cs.AI)
Based on an in-depth analysis of the essence and features of vague
phenomena, this paper focuses on establishing the axiomatic foundation of
membership degree theory for vague phenomena, and presents an axiomatic system to
govern membership degrees and their interconnections. On this basis, the
concept of a vague partition is introduced; furthermore, the concept of a fuzzy set,
introduced by Zadeh in 1965, is redefined based on vague partitions from the
perspective of axiomatization. The thesis defended in this paper is that the
relationship among vague attribute values should be the starting point to
recognize and model vague phenomena from a quantitative view.
Jasper De Bock. Subjects: Artificial Intelligence (cs.AI); Probability (math.PR)
A credal network under epistemic irrelevance is a generalised type of
Bayesian network that relaxes its two main building blocks. On the one hand,
the local probabilities are allowed to be partially specified. On the other
hand, the assessments of independence do not have to hold exactly.
Conceptually, these two features turn credal networks under epistemic
irrelevance into a powerful alternative to Bayesian networks, offering a more
flexible approach to graph-based multivariate uncertainty modelling. However,
in practice, they have long been perceived as very hard to work with, both
theoretically and computationally.
The aim of this paper is to demonstrate that this perception is no longer
justified. We provide a general introduction to credal networks under epistemic
irrelevance, give an overview of the state of the art, and present several new
theoretical results. Most importantly, we explain how these results can be
combined to allow for the design of recursive inference methods. We provide
numerous concrete examples of how this can be achieved, and use these to
demonstrate that computing with credal networks under epistemic irrelevance is
most definitely feasible, and in some cases even highly efficient. We also
discuss several philosophical aspects, including the lack of symmetry, how to
deal with probability zero, the interpretation of lower expectations, the
axiomatic status of graphoid properties, and the difference between updating
and conditioning.
Comments: 18 pages, 222 references
Subjects:
Artificial Intelligence (cs.AI)
Automation and computer intelligence to support complex human decisions
become essential to manage large and distributed systems in the Cloud and IoT
era. Understanding the root cause of an observed symptom in a complex system
has been a major problem for decades. As industry dives into the IoT world and
the amount of data generated per year grows at an amazing speed, an important
question is how to find appropriate mechanisms to determine root causes that
can handle huge amounts of data or may provide valuable feedback in real-time.
While many survey papers aim at summarizing the landscape of techniques for
modelling system behavior and inferring the root cause of a problem based on the
resulting models, none of them focuses on analyzing how the different
techniques in the literature fit growing requirements in terms of performance
and scalability. In this survey, we provide a review of root-cause analysis,
focusing on these particular aspects. We also provide guidance to choose the
best root-cause analysis strategy depending on the requirements of a particular
system and application.
Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices
Comments: 13 pages, 13 figures, version accepted to IEEE/ACM TASLP
Subjects:
Artificial Intelligence (cs.AI)
; Sound (cs.SD)
In a recent conference paper, we have reported a rhythm transcription method
based on a merged-output hidden Markov model (HMM) that explicitly describes
the multiple-voice structure of polyphonic music. This model solves a major
problem of conventional methods that could not properly describe the nature of
multiple voices as in polyrhythmic scores or in the phenomenon of loose
synchrony between voices. In this paper we present a complete description of
the proposed model and develop an inference technique, which is valid for any
merged-output HMMs for which output probabilities depend on past events. We
also examine the influence of the architecture and parameters of the method in
terms of accuracies of rhythm transcription and voice separation and perform
comparative evaluations with six other algorithms. Using MIDI recordings of
classical piano pieces, we found that the proposed model outperformed other
methods by more than 12 points in accuracy for polyrhythmic performances
and performed almost as well as the best one for non-polyrhythmic performances.
This reveals the state of the art in rhythm transcription methods for the first
time in the literature. Publicly available source code is also provided for
future comparisons.
Tathagata Chakraborti, Sarath Sreedharan, Yu Zhang, Subbarao Kambhampati. Subjects: Artificial Intelligence (cs.AI)
The ability to explain the rationale behind a planner’s deliberative process
is crucial to the realization of effective human-planner interaction. However,
in the context of human-in-the-loop planning, a significant challenge towards
providing meaningful explanations arises due to the fact that the actor
(planner) and the observer (human) are likely to have different models of the
world, leading to a difference in the expected plan for the same perceived
planning problem. In this paper, for the first time, we formalize this notion
of Multi-Model Planning (MMP) and describe how a planner can provide
explanations of its plans in the context of such model differences.
Specifically, we will pose the multi-model explanation generation problem as a
model reconciliation problem and show how meaningful explanations may be
affected by making corrections to the human model. We will also demonstrate the
efficacy of our approach in randomly generated problems from benchmark planning
domains, and motivate exciting avenues of future research in the MMP paradigm.
Zohreh Shams, Marina De Vos, Julian Padget, Wamberto W. Vasconcelos. Subjects: Artificial Intelligence (cs.AI)
Autonomous software agents operating in dynamic environments need to
constantly reason about actions in pursuit of their goals, while taking into
consideration norms which might be imposed on those actions. Normative
practical reasoning supports agents making decisions about what is best for
them to (not) do in a given situation. What makes practical reasoning
challenging is the interplay between goals that agents are pursuing and the
norms that the agents are trying to uphold. We offer a formalisation to allow
agents to plan for multiple goals and norms in the presence of durative actions
that can be executed concurrently. We compare plans based on decision-theoretic
notions (i.e. utility) such that the utility gain of goals and utility loss of
norm violations are the basis for this comparison. The set of optimal plans
consists of plans that maximise the overall utility, each of which can be
chosen by the agent to execute. We provide an implementation of our proposal in
Answer Set Programming, thus allowing us to state the original problem in terms
of a logic program that can be queried for solutions with specific properties.
The implementation is proven to be sound and complete.
Pan Li, Olgica Milenkovic. Subjects: Artificial Intelligence (cs.AI)
We introduce a new family of minmax rank aggregation problems under two
distance measures, the Kendall tau and the Spearman footrule. As the
problems are NP-hard, we proceed to describe a number of constant-approximation
algorithms for solving them. We conclude with illustrative applications of the
aggregation methods on the Mallows model and genomic data.
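As a reminder of the first distance measure named above, the Python sketch below computes the Kendall tau distance (the number of discordant pairs) between two rankings; the aggregation algorithms themselves are not reproduced here.

from itertools import combinations

def kendall_tau_distance(r1, r2):
    # r1, r2: rankings of the same items, best first
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)   # count discordant pairs

print(kendall_tau_distance(['a', 'b', 'c'], ['c', 'a', 'b']))      # -> 2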
Comments: IEEE Women in Engineering Conference Paper: WIECON-ECE’2017 (Scheduled to appear in IEEE Xplore )
Subjects:
Artificial Intelligence (cs.AI)
; Computers and Society (cs.CY); Software Engineering (cs.SE); Machine Learning (stat.ML)
FOSS is an acronym for Free and Open Source Software. The FOSS 2013 survey
primarily targets FOSS contributors, and the relevant anonymized dataset is publicly
available under a CC BY-SA license. In this study, the dataset is analyzed from a
critical perspective using statistical and clustering techniques (especially
multiple correspondence analysis) with a strong focus on women contributors
towards discovering hidden trends and facts. Important inferences are drawn
about development practices and other facets of the free software and OSS
worlds.
Comments: IEEE Women in Engineering Conference, WIECON-ECE’2017 (Accepted for IEEEXplore)
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Logic in Computer Science (cs.LO); Logic (math.LO)
The study of mereology (parts and wholes) in the context of formal approaches
to vagueness can be approached in a number of ways. In the context of rough
sets, mereological concepts with a set-theoretic or valuation based ontology
acquire complex and diverse behavior. In this research a general rough set
framework called granular operator spaces is extended and the nature of
parthood in it is explored from a minimally intrusive point of view. This is
used to develop counting strategies that help in classifying the framework. The
developed methodologies would be useful for drawing involved conclusions about
the nature of data (and validity of assumptions about it) from antichains
derived from context. The problem addressed also concerns whether counting
procedures help in confirming that the approximations involved in the formation of
data are indeed rough approximations.
Mohamed Anis Bach Tobji, Mohamed Salah Gouider. Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
Maintenance of association rules is an interesting problem. Several
incremental maintenance algorithms have been proposed since the work of (Cheung et
al., 1996). The majority of these algorithms maintain rule bases assuming that the
support threshold does not change. In this paper, we present an incremental
maintenance algorithm that handles support threshold changes. This solution allows
the user to maintain the rule base under any support threshold.
Mohamed Anis Bach Tobji. Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
Since the formulation of the Inductive Database (IDB) problem, several Data Mining
(DM) languages have been proposed, confirming that the KDD process could be
supported via inductive query (IQ) answering. This paper reviews the existing
DM languages. We present the important primitives of DM languages and
classify the languages according to which primitives they satisfy. In addition,
we present each language's syntax and apply each one to a sample database
to test a set of KDD operations. This study allows us to highlight the
languages' capabilities and limits, which is very useful for future work and
perspectives.
Comments: 30 pages, 15 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects:
Robotics (cs.RO)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural Language (NL) is an intuitive medium for transferring knowledge from a human to a robot.
Recently, research on using NL to support human-robot cooperation (HRC) has
received increasing attention in several domains such as robotic daily
assistance, robotic health caregiving, intelligent manufacturing, autonomous
navigation and robot social companionship. However, a high-level review that can
reveal the realization process and the latest methodologies of using NL to
facilitate HRC is missing. In this review, a comprehensive summary about the
methodology development of natural-language-facilitated human-robot cooperation
(NLC) has been made. We first analyzed driving forces for NLC developments.
Then, with a temporal realization order, we reviewed three main steps of NLC:
human NL understanding, knowledge representation, and knowledge-world mapping.
Last, based on our paper review and perspectives, potential research trends in
NLC were discussed.
Comments: 13 pages
Subjects:
Economics (q-fin.EC)
; Artificial Intelligence (cs.AI)
As we know, there is a controversy about decision making under risk
between economists and psychologists. We discuss building a unified theory of
risky choice that would explain both compensatory and non-compensatory
theories. Obviously, the decision strategy is not fixed; it depends on the
objects at hand in real life and on the experimental materials in the laboratory. We
believe that humans have a decision structure, which has constants and variables,
intervals, and concepts of probability and value. Namely, according to cognitive
ability, we argue that people cannot build a continuous and accurate
subjective probability world, but only several intervals of probability perception.
More precisely, decision making is an order-reduction process, which
simplifies the decision structure. However, we are not really sure which
reduction path will occur during the decision making process, which is why preference
reversal always happens when making decisions. The most efficient way to reduce
the order of the decision structure is the mathematical expectation. We also argue that
the deliberation time has at least four parts, which consist of
substitution time, τ''(G) dτ time, τ'(G) dτ time and
calculation time. The decision structure can simply explain the phenomenon of
paradoxes and anomalies. JEL Codes: C10, D03, D81.
Comments: 21 pages, 10 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects:
Robotics (cs.RO)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural-language-facilitated human-robot cooperation (NLC), in which natural
language (NL) is used to share knowledge between a human and a robot for
conducting intuitive human-robot cooperation (HRC), has been developing continuously
over the past decade. Currently, NLC is used in several robotic domains such as
manufacturing, daily assistance and health caregiving. It is necessary to
summarize current NLC-based robotic systems and discuss the future developing
trends, providing helpful information for future NLC research. In this review,
we first analyzed the driving forces behind the NLC research. Regarding a
robot's cognition level during the cooperation, the NLC implementations were then
categorized into four types {NL-based control, NL-based robot training,
NL-based task execution, NL-based social companion} for comparison and
discussion. Last, based on our perspective and comprehensive paper review, the
future research trends were discussed.
Comments: Submitted to ISIT 2017
Subjects:
Information Theory (cs.IT)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the problem of identifying the causal relationship between two
discrete random variables from observational data. We recently proposed a novel
framework called entropic causality that works in a very general functional
model but makes the assumption that the unobserved exogenous variable has small
entropy in the true causal direction.
This framework requires the solution of a minimum entropy coupling problem:
Given marginal distributions of m discrete random variables, each on n states,
find the joint distribution with minimum entropy, that respects the given
marginals. This corresponds to minimizing a concave function of n^m variables
over a convex polytope defined by nm linear constraints, called a
transportation polytope. Unfortunately, it was recently shown that this minimum
entropy coupling problem is NP-hard, even for 2 variables with n states. Even
representing points (joint distributions) over this space can require
exponential complexity (in n, m) if done naively.
In our recent work we introduced an efficient greedy algorithm to find an
approximate solution for this problem. In this paper we analyze this algorithm
and establish two results: our algorithm always finds a local minimum, and it
is within an additive approximation error of the unknown global optimum.
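A minimal Python sketch of the greedy idea described above (repeatedly pairing the largest remaining marginal masses into one joint outcome) is given below; it is an illustration only, not the authors' implementation, and the example marginals are arbitrary.

import numpy as np

def greedy_coupling(marginals):
    """Greedy approximate minimum-entropy coupling (illustrative sketch).

    marginals: list of m 1-D arrays, each a probability vector over n states.
    Returns a dict mapping joint outcomes (tuples of state indices) to mass.
    """
    residual = [np.array(p, dtype=float) for p in marginals]
    coupling = {}
    while max(r.sum() for r in residual) > 1e-12:
        idx = tuple(int(np.argmax(r)) for r in residual)  # most massive state of each marginal
        mass = min(r[i] for r, i in zip(residual, idx))   # largest jointly assignable mass
        coupling[idx] = coupling.get(idx, 0.0) + mass
        for r, i in zip(residual, idx):
            r[i] -= mass                                  # consume that mass from every marginal
    return coupling

def entropy_bits(masses):
    p = np.array([v for v in masses if v > 0])
    return float(-(p * np.log2(p)).sum())

p1 = [0.5, 0.3, 0.2]
p2 = [0.6, 0.25, 0.15]
joint = greedy_coupling([p1, p2])
print(entropy_bits(joint.values()), "bits")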
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh , Chris Brockett , Bill Dolan , Michel Galley , Jianfeng Gao , Georgios P. Spithourakis , Lucy Vanderwende Subjects : Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
The popularity of image sharing on social media reflects the important role
visual context plays in everyday conversation. In this paper, we present a
novel task, Image-Grounded Conversations (IGC), in which natural-sounding
conversations are generated about shared photographic images. We investigate
this task using training data derived from image-grounded conversations on
social media and introduce a new dataset of crowd-sourced conversations for
benchmarking progress. Experiments using deep neural network models trained on
social media data show that the combination of visual and textual context can
enhance the quality of generated conversational turns. In human evaluation, a
gap between human performance and that of both neural and retrieval
architectures suggests that IGC presents an interesting challenge for vision
and language research.
Comments: 8 pages, 13 Figures, 11 Tables
Subjects:
Information Retrieval (cs.IR)
; Learning (cs.LG)
This research presents an innovative way of solving the advertisement
prediction problem, which has been treated as a learning problem over the past
several years. Online advertising is a multi-billion-dollar industry and is
growing rapidly every year. The goal of this research is to enhance the
click-through rate (CTR) of contextual advertisements using Linear Regression.
To address this problem, a new technique is proposed in this paper to predict
the CTR, which will increase the overall revenue of the system by serving
advertisements better suited to viewers, with the help of feature extraction,
and by displaying advertisements based on the publishers' context. The
important steps include data collection, feature extraction, CTR prediction and
advertisement serving. The statistical results show that the Linear Regression
technique with optimized feature selection fits the data very closely.
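A minimal sketch of the basic setup, fitting a linear regression model on contextual features to predict CTR, is shown below; the feature names and values are hypothetical placeholders, and the full system described above involves further steps (data collection and ad serving) not shown here.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical contextual features; column names are illustrative only.
df = pd.DataFrame({
    "publisher_category": ["sports", "news", "sports", "tech"],
    "ad_category":        ["shoes",  "cars", "drinks", "phones"],
    "hour_of_day":        [9, 14, 20, 11],
    "ctr":                [0.031, 0.012, 0.045, 0.022],  # observed click-through rate
})

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["publisher_category", "ad_category"])],
        remainder="passthrough")),
    ("lr", LinearRegression()),
])
model.fit(df.drop(columns="ctr"), df["ctr"])
print(model.predict(df.drop(columns="ctr")[:1]))  # predicted CTR for the first context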
Feature Studies to Inform the Classification of Depressive Symptoms from Twitter Data for Population Health
Danielle Mowery , Craig Bryan , Mike Conway Subjects : Information Retrieval (cs.IR) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The utility of Twitter data as a medium to support population-level mental
health monitoring is not well understood. In an effort to better understand the
predictive power of supervised machine learning classifiers and the influence
of feature sets for efficiently classifying depression-related tweets at a
large scale, we conducted two feature study experiments. In the first
experiment, we assessed the contribution of feature groups such as lexical
information (e.g., unigrams) and emotions (e.g., strongly negative) using a
feature ablation study. In the second experiment, we determined the percentile
of top ranked features that produced the optimal classification performance by
applying a three-step feature elimination approach. In the first experiment, we
observed that lexical features are critical for identifying depressive
symptoms, specifically for depressed mood (-35 points) and for disturbed sleep
(-43 points). In the second experiment, we observed that the percentile of top
ranked features yielding the optimal F1-score varied across classes, e.g., from
fatigue or loss of energy (5th percentile, 288 features) to depressed mood
(55th percentile, 3,168 features), suggesting there is no consistent count of
features for predicting depression-related tweets. We conclude that simple
lexical features and reduced feature sets can produce results comparable to
larger feature sets.
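A minimal sketch of a feature-group ablation loop in the spirit of the first experiment is shown below; the toy texts, the two feature groups and the classifier are placeholders, not the authors' actual feature sets or data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Toy data; real experiments would use annotated tweets.
texts = ["i cannot sleep at all", "great run this morning",
         "feeling so tired and empty", "lovely dinner with friends"] * 10
labels = [1, 0, 1, 0] * 10

feature_groups = {
    "unigrams": CountVectorizer(ngram_range=(1, 1)),
    "bigrams":  CountVectorizer(ngram_range=(2, 2)),
}

def score(groups):
    """Cross-validated F1 for a classifier built from the given feature groups."""
    union = FeatureUnion([(name, vec) for name, vec in groups.items()])
    clf = make_pipeline(union, LinearSVC())
    return cross_val_score(clf, texts, labels, cv=3, scoring="f1").mean()

# Ablation: score the full model, then drop one feature group at a time.
full = score(feature_groups)
for name in feature_groups:
    reduced = {k: v for k, v in feature_groups.items() if k != name}
    print(name, "contribution:", round(full - score(reduced), 3))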
Diego Valsesia , Enrico Magli Subjects : Learning (cs.LG) ; Information Retrieval (cs.IR)
We use some of the largest order statistics of the random projections of a
reference signal to construct a binary embedding that is adapted to signals
correlated with such signal. The embedding is characterized from the analytical
standpoint and shown to provide improved performance on tasks such as
classification in a reduced-dimensionality space.
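A minimal numpy sketch of the idea as read from the abstract is shown below: keep the coordinates where the reference signal's random projections have the largest magnitudes and use the signs at those coordinates as the adapted binary embedding. The exact selection rule and the dimensions are assumptions, not the authors' specification.

import numpy as np

rng = np.random.default_rng(0)
d, m, k = 256, 128, 32            # signal dim, number of projections, retained coordinates

A = rng.standard_normal((m, d))   # random projection matrix
ref = rng.standard_normal(d)      # reference signal

proj_ref = A @ ref
keep = np.argsort(np.abs(proj_ref))[-k:]   # indices of the k largest-magnitude projections

def embed(x):
    """1-bit embedding restricted to the coordinates adapted to the reference."""
    return (A[keep] @ x > 0).astype(np.uint8)

x_similar = ref + 0.1 * rng.standard_normal(d)
x_random = rng.standard_normal(d)
print(np.mean(embed(ref) != embed(x_similar)))  # small Hamming distance expected
print(np.mean(embed(ref) != embed(x_random)))   # larger Hamming distance expected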
Journal-ref: CIKM 2015 Proceedings of the 24th ACM International on Conference
on Information and Knowledge Management Pages 1491-1500
Subjects:
Social and Information Networks (cs.SI)
; Information Retrieval (cs.IR)
Social network analysis is leveraged in a variety of applications such as
identifying influential entities, detecting communities with special interests,
and determining the flow of information and innovations. However, existing
approaches for extracting social networks from unstructured Web content do not
scale well and are only feasible for small graphs. In this paper, we introduce
novel methodologies for query-based search engine mining, enabling efficient
extraction of social networks from large amounts of Web data. To this end, we
use patterns in phrase queries for retrieving entity connections, and employ a
bootstrapping approach for iteratively expanding the pattern set. Our
experimental evaluation in different domains demonstrates that our algorithms
provide high quality results and allow for scalable and efficient construction
of social graphs.
Journal-ref: 20th International Conference on Theory and Practice of Digital
Libraries, TPDL 2016, Proceedings, pp 147-160
Subjects:
Digital Libraries (cs.DL)
; Information Retrieval (cs.IR)
Significant parts of cultural heritage have been produced on the web during
the last decades. While easy accessibility to the current web is a good
baseline, optimal access to the past web faces several challenges. These
include dealing with large-scale web archive collections and the lack of usage
logs that contain the implicit human feedback most relevant for today’s web
search. In this paper, we
propose an entity-oriented search system to support retrieval and analytics on
the Internet Archive. We use Bing to retrieve a ranked list of results from the
current web. In addition, we link retrieved results to the WayBack Machine;
thus allowing keyword search on the Internet Archive without processing and
indexing its raw archived content. Our search system complements existing web
archive search tools through a user-friendly interface, which comes close to
the functionalities of modern web search engines (e.g., keyword search, query
auto-completion and related query suggestion), and provides the great benefit
of also taking user feedback on the current web into account for web archive
search. Through extensive experiments, we conduct quantitative and qualitative
analyses in order to provide insights that enable further research on and
practical applications of web archives.
Comments: 6 pages
Subjects:
Computation and Language (cs.CL)
In this paper, we describe a research method that generates Bangla word
clusters on the basis of semantic and contextual similarity. Word clustering is
important for parts-of-speech (POS) tagging, word sense disambiguation, text
classification, recommender systems, spell checking, grammar checking,
knowledge discovery, and many other Natural Language Processing (NLP)
applications. Efficient word clustering methods have already been implemented
for English and some other languages, but due to a lack of resources, word
clustering for Bangla has not yet been implemented efficiently and is still at
an early stage. Some research on word clustering in English, based on the five
preceding and five following words of a keyword, has reported efficient
results. We therefore implement tri-gram, 4-gram and 5-gram models of word
clustering for Bangla to observe which one performs best. We have started our
research with a fairly large corpus of approximately 100,000 (1 lakh) Bangla
words. We use a machine learning technique in this research, generate word
clusters, and analyze the clusters by testing different threshold values.
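A minimal sketch of clustering words by their surrounding context windows (here the 5-gram setting, i.e., the key word plus two context words on each side) is shown below; English tokens and an off-the-shelf clustering algorithm stand in for the Bangla corpus and the paper's actual method.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering

# Toy corpus; in the paper this would be roughly 100,000 Bangla tokens.
tokens = "the cat sat on the mat the dog sat on the rug".split()
window = 2   # two words on each side of the key word (5-gram model)

vocab = sorted(set(tokens))
contexts = {w: [] for w in vocab}
for i, w in enumerate(tokens):
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    contexts[w].append(" ".join(left + right))

# Bag-of-context-words vector for every key word.
vec = CountVectorizer()
X = vec.fit_transform([" ".join(contexts[w]) for w in vocab]).toarray()

labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
for w, c in zip(vocab, labels):
    print(c, w)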
Comments: 6 pages
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG)
Document categorization is a technique for determining the category of a
document. In this paper, three well-known supervised learning techniques,
Support Vector Machine (SVM), Naive Bayes (NB) and Stochastic Gradient Descent
(SGD), are compared for Bengali document categorization. Besides the
classifier, classification performance also depends on how features are
selected from the dataset. To analyze the classifiers' performance in assigning
a document to one of twelve categories, several feature selection techniques
are also applied in this article, namely the Chi-square distribution and
normalized TF-IDF (term frequency-inverse document frequency) with a word
analyzer. We thus explore the efficiency of these three classification
algorithms using two different feature selection techniques.
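A minimal scikit-learn sketch of such a comparison, combining TF-IDF with a word analyzer, chi-square feature selection and the three classifiers, is shown below; the documents are synthetic stand-ins for the Bengali dataset, and the hyperparameters are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier

# Toy stand-ins for Bengali documents labelled with one of twelve categories.
docs = [f"category{i % 12} word{i} word{i + 1} sample text" for i in range(240)]
labels = [i % 12 for i in range(240)]

classifiers = {"SVM": LinearSVC(), "NB": MultinomialNB(), "SGD": SGDClassifier()}
for name, clf in classifiers.items():
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="word")),  # normalized TF-IDF with a word analyzer
        ("chi2", SelectKBest(chi2, k=50)),            # chi-square feature selection
        ("clf", clf),
    ])
    print(name, cross_val_score(pipe, docs, labels, cv=5).mean())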
Structural Analysis of Hindi Phonetics and A Method for Extraction of Phonetically Rich Sentences from a Very Large Hindi Text Corpus
Comments: 19th Coordination and Standardization of Speech Databases and Assessment Technique (O-COCOSDA) at Bali, Indonesia
Subjects:
Computation and Language (cs.CL)
Automatic speech recognition (ASR) and text-to-speech (TTS) are two prominent
areas of research in human-computer interaction (HCI) nowadays. A set of
phonetically rich sentences is important for developing these two interactive
modules of HCI. Essentially, the set of phonetically rich sentences has to
cover all possible phone units, distributed uniformly. Selecting such a set
from a large corpus while maintaining similarity based on phonetic
characteristics is still a challenging problem. The major objective of this
paper is to devise a criterion for selecting a set of sentences encompassing
all phonetic aspects of a corpus with as small a size as possible. First, this
paper presents a statistical analysis of Hindi phonetics by observing its
structural characteristics. Further, a two-stage algorithm is proposed to
extract phonetically rich sentences with a high variety of triphones from the
EMILLE Hindi corpus. The algorithm uses a distance-measuring criterion to
select a sentence so as to improve the triphone distribution. Moreover, a
special preprocessing method is proposed to score each triphone in terms of
inverse probability in order to speed up the algorithm. The results show that
the approach efficiently builds a uniformly distributed, phonetically rich
corpus with an optimal number of sentences.
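A minimal sketch of greedy sentence selection with inverse-probability triphone scoring, a simplified stand-in for the two-stage algorithm described above, is shown below; characters stand in for phones and the corpus is a toy example.

from collections import Counter
import math

def triphones(phones):
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]

def select_sentences(corpus, n_sentences):
    """corpus: list of sentences, each given as a list of phone symbols."""
    counts = Counter(t for sent in corpus for t in triphones(sent))
    total = sum(counts.values())
    # Rare triphones receive large scores (inverse-probability weighting).
    weight = {t: -math.log(c / total) for t, c in counts.items()}

    covered, chosen = set(), []
    candidates = list(range(len(corpus)))
    for _ in range(min(n_sentences, len(corpus))):
        gain = lambda i: sum(weight[t] for t in set(triphones(corpus[i])) - covered)
        best = max(candidates, key=gain)   # sentence adding the most uncovered triphone weight
        chosen.append(corpus[best])
        covered |= set(triphones(corpus[best]))
        candidates.remove(best)
    return chosen

corpus = [list("namaste"), list("dhanyavaad"), list("shubh prabhat")]
for sent in select_sentences(corpus, 2):
    print("".join(sent))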
Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data
Comments: Workshop of The Australasian Language Technology Association
Subjects:
Computation and Language (cs.CL)
We experiment with graph-based Semi-Supervised Learning (SSL) of Conditional
Random Fields (CRF) for the application of Spoken Language Understanding (SLU)
on unaligned data. The aligned labels for the examples are obtained using the
IBM Model. We adapt a baseline semi-supervised CRF by defining a new feature set and
altering the label propagation algorithm. Our results demonstrate that our
proposed approach significantly improves the performance of the supervised
model by utilizing the knowledge gained from the graph.
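The graph-based label propagation component can be illustrated with scikit-learn's LabelPropagation, as sketched below; this is a generic stand-in, not the authors' adapted propagation algorithm or feature set, and the toy 2-D vectors merely stand in for slot-tagging features.

import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Feature vectors for tokens to be slot-tagged (toy 2-D embeddings here).
X = np.array([[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.85, 0.9], [0.5, 0.5]])
y = np.array([0, -1, 1, -1, -1])   # -1 marks unlabeled examples

model = LabelPropagation(kernel="rbf", gamma=20).fit(X, y)
print(model.transduction_)          # inferred labels for all points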
Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries
Comments: 30 pages, accepted to be published in “Applications of Comparable Corpora”, Berlin: Language Science Press
Subjects:
Computation and Language (cs.CL)
Bilingual dictionaries are very important in various fields of natural
language processing. In recent years, research on extracting new bilingual
lexicons from non-parallel (comparable) corpora has been proposed. Almost all
use a small existing dictionary or other resource to make an initial list
called the “seed dictionary”. In this paper we discuss the use of different
types of dictionaries as the initial starting list for creating a bilingual
Persian-Italian lexicon from a comparable corpus.
Our experiments apply state-of-the-art techniques to three different seed
dictionaries: an existing dictionary, a dictionary created with a pivot-based
schema, and a dictionary extracted from a small Persian-Italian parallel text.
The interesting challenge of our approach is to find a way to combine different
dictionaries together in order to produce a better and more accurate lexicon.
In order to combine seed dictionaries, we propose two different combination
models and examine the effect of our novel combination models on various
comparable corpora that have differing degrees of comparability. We conclude
with a proposal for a new weighting system to improve the extracted lexicon.
The experimental results produced by our implementation show the efficiency of
our proposed models.
Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora
Comments: 30 pages, Accepted to be published in “Applications of Comparable Corpora”, Berlin: Language Science Press
Subjects:
Computation and Language (cs.CL)
The effectiveness of a statistical machine translation (SMT) system depends
heavily on the amount of parallel corpora used in the training phase. For
low-resource language pairs there are not enough parallel corpora to build an
accurate SMT system. In this paper, a novel approach is presented to extract
bilingual Persian-Italian parallel sentences from a non-parallel (comparable)
corpus. In this study, English is used as the pivot language to compute the
matching scores between source and target sentences in the candidate selection
phase. Additionally, a new monolingual sentence similarity metric, Normalized
Google Distance (NGD), is proposed to improve the matching process. Moreover,
some extensions of the baseline system are applied to improve the quality of
the extracted sentences, measured with BLEU. Experimental results show that the
new pivot-based extraction can increase the quality of the bilingual corpus
significantly and consequently improves the performance of the Persian-Italian
SMT system.
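For reference, the Normalized Google Distance used as the monolingual similarity score can be computed from document (or hit) counts as sketched below; the counts in the example are illustrative only and would in practice come from a search engine or corpus index.

import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy : number of documents containing term/sentence x (resp. y)
    fxy    : number of documents containing both
    n      : total number of indexed documents
    """
    if fxy == 0:
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Illustrative counts only.
print(ngd(fx=12_000, fy=8_000, fxy=3_500, n=25_000_000))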
Comments: 10 pages, 3 figures
Subjects:
Computation and Language (cs.CL)
A drug can affect the activity of other drugs administered together, in
either synergistic or antagonistic ways. While synergistic effects can lead to
improved therapeutic outcomes, antagonistic consequences can be
life-threatening, lead to increased healthcare costs, or may even cause death.
Thus, identification of unknown drug-drug interactions (DDIs) is an important
concern for efficient and effective healthcare. Although multiple resources for
DDIs exist, they are often unable to keep pace with the rich amount of
information available in the fast-growing biomedical literature. Most existing
methods model DDI extraction from text as a classification problem and mainly
rely on handcrafted features, some of which further depend on domain-specific
tools. Recently, neural network models using latent features have been shown to
perform similarly to or better than existing models that use handcrafted
features. In this paper, we present three models, namely B-LSTM, AB-LSTM and
Joint AB-LSTM, based on long short-term memory (LSTM) networks. All three
models utilize word and position embeddings as latent features and thus do not
rely on feature engineering. The use of bidirectional LSTM (Bi-LSTM) networks
further allows extracting optimal features from the whole sentence. The AB-LSTM
and Joint AB-LSTM models also use attentive pooling on the output of the
Bi-LSTM layer to assign weights to features. Our experimental results on the
SemEval-2013 DDI extraction dataset show that the Joint AB-LSTM model
outperforms all existing methods, including those relying on handcrafted
features. The other two proposed models also perform competitively with
state-of-the-art methods.
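A minimal PyTorch sketch of an AB-LSTM-style classifier (word and position embeddings, a Bi-LSTM encoder and attentive pooling) is shown below; the layer sizes, class count and other details are assumptions rather than the authors' exact configuration.

import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Word + position embeddings -> Bi-LSTM -> attentive pooling -> DDI class."""
    def __init__(self, vocab_size, max_dist=200, emb=100, pos=10, hidden=128, classes=5):
        super().__init__()
        self.word = nn.Embedding(vocab_size, emb)
        self.pos1 = nn.Embedding(2 * max_dist, pos)    # distance to first drug mention
        self.pos2 = nn.Embedding(2 * max_dist, pos)    # distance to second drug mention
        self.lstm = nn.LSTM(emb + 2 * pos, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, tokens, d1, d2):
        x = torch.cat([self.word(tokens), self.pos1(d1), self.pos2(d2)], dim=-1)
        h, _ = self.lstm(x)                                # (batch, seq, 2*hidden)
        a = torch.softmax(self.att(h).squeeze(-1), dim=1)  # attention over time steps
        pooled = torch.bmm(a.unsqueeze(1), h).squeeze(1)   # attentive pooling
        return self.out(pooled)

model = AttentiveBiLSTM(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 30))
d1 = torch.randint(0, 400, (2, 30))
d2 = torch.randint(0, 400, (2, 30))
print(model(tokens, d1, d2).shape)   # torch.Size([2, 5])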
Anjuli Kannan , Oriol Vinyals Subjects : Computation and Language (cs.CL)
The recent application of RNN encoder-decoder models has resulted in
substantial progress in fully data-driven dialogue systems, but evaluation
remains a challenge. An adversarial loss could be a way to directly evaluate
the extent to which generated dialogue responses sound like they came from a
human. This could reduce the need for human evaluation, while more directly
evaluating on a generative task. In this work, we investigate this idea by
training an RNN to discriminate a dialogue model’s samples from human-generated
samples. Although we find some evidence this setup could be viable, we also
note that many issues remain in its practical application. We discuss both
aspects and conclude that future work is warranted.
Comments: 30 pages, 15 figures, article submitted to Knowledge-based Systems, 2017 Jan
Subjects:
Robotics (cs.RO)
; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Natural Language (NL) can be used to transfer knowledge from a human to a
robot. Recently, research on using NL to support human-robot cooperation (HRC)
has received increasing attention in several domains such as robotic daily
assistance, robotic health caregiving, intelligent manufacturing, autonomous
navigation and robot social companionship. However, a high-level review that
reveals the realization process and the latest methodologies of using NL to
facilitate HRC is missing. In this review, a comprehensive summary of the
methodology development of natural-language-facilitated human-robot cooperation
(NLC) is made. We first analyzed the driving forces behind NLC developments.
Then, following the temporal order of realization, we reviewed the three main
steps of NLC: human NL understanding, knowledge representation, and
knowledge-world mapping. Last, based on our paper review and perspectives,
potential research trends in NLC were discussed.
Comments: 6 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Computation and Language (cs.CL)
Today all kinds of information are being digitized, and along with this
digitization, huge archives of various kinds of documents are being digitized
too. Optical Character Recognition is the method through which newspapers and
other paper documents are converted into digital resources, but this method
works on text only. As a result, if we try to process a document that contains
non-textual zones, we get garbage text as output. That is why, in order to
digitize documents properly, they should be preprocessed carefully, and while
preprocessing, properly segmenting the document into regions according to their
category is most important. However, the Optical Character Recognition
processes available for the Bangla language have no algorithm that can fully
categorize a newspaper or book page. We therefore worked on decomposing a
document into its parts, such as headlines, sub-headlines, columns and images;
if the input was skewed or rotated, it was also deskewed and de-rotated. To
decompose a Bangla document we detected the edges of the input image, then
found the horizontal and vertical region in which every pixel lies, and cut the
input image according to these regions. We then took each sub-image and
computed its height-width ratio and line height, and categorized the sub-images
according to these values. To deskew the image we estimated the skew angle and
deskewed the image according to this angle. To de-rotate the image we used the
line height, the matra line, and the pixel ratio of the matra line.
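A minimal sketch of a generic deskewing step (estimating the skew angle of the foreground pixels and rotating the page back) is shown below; it uses a bounding-box angle estimate rather than the matra-line method described above, the file name is a placeholder, and OpenCV's angle convention varies across versions.

import cv2
import numpy as np

def deskew(gray_page):
    """Estimate the dominant skew angle of dark text on a light page and undo it."""
    ys, xs = np.where(gray_page < 128)                    # foreground pixel coordinates
    pts = np.column_stack([xs, ys]).astype(np.float32)
    angle = cv2.minAreaRect(pts)[-1]                      # angle of the tightest bounding box
    if angle < -45:                                       # older OpenCV angle convention
        angle += 90
    h, w = gray_page.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray_page, M, (w, h),
                          flags=cv2.INTER_NEAREST, borderValue=255)

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)      # placeholder input image
if page is not None:
    cv2.imwrite("page_deskewed.png", deskew(page))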
Comments: 6 pages
Subjects:
Sound (cs.SD)
; Computation and Language (cs.CL)
Various hidden Markov model based phoneme recognition methods for the Bengali
language are reviewed, as is automatic phoneme recognition for Bengali using a
multilayer neural network. The usefulness of a multilayer neural network over a
single-layer neural network is discussed. Bangla phonetic feature table
construction and enhancement for Bengali speech recognition are also discussed,
and a comparison among these methods is given.
Fog-Assisted wIoT: A Smart Fog Gateway for End-to-End Analytics in Wearable Internet of Things
Comments: 5 pages, 4 figures, The 23rd IEEE Symposium on High Performance Computer Architecture HPCA 2017, (Feb. 4, 2017 – Feb. 8, 2017), Austin, Texas, USA
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)
Today, wearable internet-of-things (wIoT) devices continuously flood cloud
data centers with data at an enormous rate. This increases the demand to deploy
edge infrastructure for computing, intelligence, and storage close to the
users. The emerging paradigm of fog computing could play an important role to
make wIoT more efficient and affordable. Fog computing is known as the cloud on
the ground. This paper presents an end-to-end architecture that performs data
conditioning and intelligent filtering for generating smart analytics from
wearable data. In wIoT, wearable sensor devices serve on one end while the
cloud backend offers services on the other end. We developed a prototype of
smart fog gateway (a middle layer) using Intel Edison and Raspberry Pi. We
discussed the role of the smart fog gateway in orchestrating the process of
data conditioning, intelligent filtering, smart analytics, and selective
transfer to the cloud for long-term storage and temporal variability
monitoring. We benchmarked the performance of developed prototypes on
real-world data from smart e-textile gloves. Results demonstrated the usability
and potential of the proposed architecture for converting real-world data into
useful analytics while making use of knowledge-based models. In this way, the
smart fog gateway enhances the end-to-end interaction between wearables (sensor
devices) and the cloud.
Robert V. Lim , Boyana Norris , Allen D. Malony Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Performance (cs.PF)
Optimizing the performance of GPU kernels is challenging for both human
programmers and code generators. For example, CUDA programmers must set thread
and block parameters for a kernel, but might not have the intuition to make a
good choice. Similarly, compilers can generate working code, but may miss
tuning opportunities by not targeting GPU models or performing code
transformations. Although empirical autotuning addresses some of these
challenges, it requires extensive experimentation and search for optimal code
variants. This research presents an approach for tuning CUDA kernels based on
static analysis that considers fine-grained code structure and the specific GPU
architecture features. Notably, our approach does not require any program runs
in order to discover near-optimal parameter settings. We demonstrate the
applicability of our approach in enabling code autotuners such as Orio to
produce competitive code variants comparable with empirical-based methods,
without the high cost of experiments.
Comments: 33 pages. arXiv admin note: substantial text overlap with arXiv:1606.07621
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
The Internet of Things (IoT) is an emerging technology paradigm where
millions of sensors and actuators help monitor and manage physical,
environmental and human systems in real-time. The inherent closed-loop
responsiveness and decision making of IoT applications make them ideal
candidates for using low latency and scalable stream processing platforms.
Distributed Stream Processing Systems (DSPS) hosted on Cloud data-centers are
becoming the vital engine for real-time data processing and analytics in any
IoT software architecture. But the efficacy and performance of contemporary
DSPS have not been rigorously studied for IoT applications and data streams.
Here, we develop RIoTBench, a Realtime IoT Benchmark suite, along with
performance metrics, to evaluate DSPS for streaming IoT applications. The
benchmark includes 27 common IoT tasks classified across various functional
categories and implemented as reusable micro-benchmarks. Further, we propose
four IoT application benchmarks composed from these tasks, and that leverage
various dataflow semantics of DSPS. The applications are based on common IoT
patterns for data pre-processing, statistical summarization and predictive
analytics. These are coupled with four stream workloads sourced from real IoT
observations on smart cities and fitness, with peak streams rates that range
from 500 to 10000 messages/sec and diverse frequency distributions. We validate
the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure
public Cloud, and present empirical observations. This suite can be used by
DSPS researchers for performance analysis and resource scheduling, and by IoT
practitioners to evaluate DSPS platforms.
Comments: 9 pages, 3 figures, accepted for publication in IEEE Consumer Electronics Magazine, July 2017 issue
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a novel integrated fog cloud IoT (IFCIoT) architectural paradigm
that promises increased performance, energy efficiency, reduced latency,
quicker response time, scalability, and better localized accuracy for future
IoT applications. The fog nodes (e.g., edge servers, smart routers, base
stations) receive computation offloading requests and sensed data from various
IoT devices. To enhance performance, energy efficiency, and real-time
responsiveness of applications, we propose a reconfigurable and layered fog
node (edge server) architecture that analyzes the applications’ characteristics
and reconfigures the architectural resources to better meet the peak workload
demands. The layers of the proposed fog node architecture include application
layer, analytics layer, virtualization layer, reconfiguration layer, and
hardware layer. The layered architecture facilitates abstraction and
implementation for fog computing paradigm that is distributed in nature and
where multiple vendors (e.g., applications, services, data and content
providers) are involved. We also elaborate on the potential applications of IFCIoT
architecture, such as smart cities, intelligent transportation systems,
localized weather maps and environmental monitoring, and real-time agricultural
data analytics and control.
Accelerated Computing in Magnetic Resonance Imaging – Real-Time Imaging Using Non-Linear Inverse Reconstruction
Comments: 22 pages, 8 figures, 6 tables
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Medical Physics (physics.med-ph)
Purpose: To develop generic optimization strategies for image reconstruction
using graphical processing units (GPUs) in magnetic resonance imaging (MRI) and
to exemplarily report about our experience with a highly accelerated
implementation of the non-linear inversion algorithm (NLINV) for dynamic MRI
with high frame rates. Methods: The NLINV algorithm is optimized and ported to
run on a multi-GPU single-node server. The algorithm is mapped to multiple
GPUs by decomposing the data domain along the channel dimension. Furthermore,
the algorithm is decomposed along the temporal domain by relaxing a temporal
regularization constraint, allowing the algorithm to work on multiple frames in
parallel. Finally, an autotuning method is presented that is capable of
combining different decomposition variants to achieve optimal algorithm
performance in different imaging scenarios. Results: The algorithm is
successfully ported to a multi-GPU system and allows online image
reconstruction with high frame rates. Real-time reconstruction with low latency
and frame rates up to 30 frames per second is demonstrated. Conclusion: Novel
parallel decomposition methods are presented which are applicable to many
iterative algorithms for dynamic MRI. Using these methods to parallelize the
NLINV algorithm on multiple GPUs it is possible to achieve online image
reconstruction with high frame rates.
Comments: 7 pages, 2 figures, Proceedings of Lattice 2016
Subjects:
High Energy Physics – Lattice (hep-lat)
; Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)
On many parallel machines, the time LQCD applications spend in communication
is a significant contribution to the total wall-clock time, especially in the
strong-scaling limit. We present a novel high-performance communication library
that can be used as a de facto drop-in replacement for MPI in existing
software. Its lightweight nature that avoids some of the unnecessary overhead
introduced by MPI allows us to improve the communication performance of
applications without any algorithmic or complicated implementation changes. As
a first real-world benchmark, we make use of the pMR library in the coarse-grid
solve of the Regensburg implementation of the DD-(\alpha)AMG algorithm. On
realistic lattices, we see an improvement by a factor of 2 in pure
communication time and total execution time savings of up to 20%.
Vinci Chow Subjects : Learning (cs.LG) ; Economics (q-fin.EC); Machine Learning (stat.ML)
In Chinese societies, where superstition is of paramount importance, vehicle
license plates with desirable numbers can fetch very high prices at auction.
Unlike other valuable items, however, license plates do not get an estimated
price before auction. In this paper, I construct a deep recurrent neural
network to predict the prices of vehicle license plates in Hong Kong based on
the characters on a plate. Trained with 13 years of historical auction prices,
the deep RNN outperforms previous models by a significant margin.
Yichen Wang , Grady Williams , Evangelos Theodorou , Le Song Subjects : Learning (cs.LG) ; Social and Information Networks (cs.SI); Systems and Control (cs.SY); Optimization and Control (math.OC)
Temporal point processes are powerful tools to model event occurrences and
have a plethora of applications in social sciences. While the majority of prior
works focus on the modeling and learning of these processes, we consider the
problem of how to design the optimal control policy for general point process
with stochastic intensities, such that the stochastic system driven by the
process is steered to a target state. In particular, we exploit the novel
insight from the information theoretic formulations of stochastic optimal
control. We further propose a novel convex optimization framework and a highly
efficient online algorithm to update the policy adaptively to the current
system state. Experiments on synthetic and real-world data show that our
algorithm can steer the user activities much more accurately than
the state of the art.
Comments: Prepint: 23rd Int. Conf. Pattern Recognition (ICPR). Cancun, Mexico, December 2016
Subjects:
Learning (cs.LG)
Point patterns are sets or multi-sets of unordered elements that can be found
in numerous data sources. However, in data analysis tasks such as
classification and novelty detection, appropriate statistical models for point
pattern data have not received much attention. This paper proposes the
modelling of point pattern data via random finite sets (RFS). In particular, we
propose appropriate likelihood functions, and a maximum likelihood estimator
for learning a tractable family of RFS models. In novelty detection, we propose
novel ranking functions based on RFS models, which substantially improve
performance.
Joost van Amersfoort , Anitha Kannan , Marc'Aurelio Ranzato , Arthur Szlam , Du Tran , Soumith Chintala Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV)
In this work we propose a simple unsupervised approach for next frame
prediction in video. Instead of directly predicting the pixels in a frame given
past frames, we predict the transformations needed for generating the next
frame in a sequence, given the transformations of the past frames. This leads
to sharper results, while using a smaller prediction model.
In order to enable a fair comparison between different video frame prediction
models, we also propose a new evaluation protocol. We use generated frames as
input to a classifier trained with ground truth sequences. This criterion
guarantees that models scoring high are those producing sequences which
preserve discriminative features, as opposed to merely penalizing any
deviation, plausible or not, from the ground truth. Our proposed approach
compares favourably against more sophisticated ones on the UCF-101 data set,
while also being more efficient in terms of the number of parameters and
computational cost.
Comments: 4 pages, 4 figures, submitted to IEEE Signal Processing Letters
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV)
Network models play an important role in studying complex systems in many
scientific disciplines. Graph signal processing is receiving growing interest
as to design novel tools to combine the analysis of topology and signals. The
graph Fourier transform, defined as the eigendecomposition of the graph
Laplacian, allows extending conventional signal-processing operations to
graphs. One main feature is to let emerge global organization from local
interactions; i.e., the Fiedler vector has the smallest non-zero eigenvalue and
is key for Laplacian embedding and graph clustering. Here, we introduce the
design of Slepian graph signals, by maximizing energy concentration in a
predefined subgraph for a given spectral bandlimit. We also establish a link
with classical Laplacian embedding and graph clustering, for which the graph
Slepian design can serve as a generalization.
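A small numerical illustration of the graph Fourier transform and the Fiedler vector mentioned above is shown below, using a toy path graph; it is not the Slepian design itself.

import numpy as np

# Adjacency matrix of a small path graph 0-1-2-3-4.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

L = np.diag(A.sum(axis=1)) - A        # combinatorial graph Laplacian
evals, evecs = np.linalg.eigh(L)      # graph Fourier basis (eigenvectors of L)

fiedler = evecs[:, 1]                 # eigenvector of the smallest non-zero eigenvalue
print("Fiedler vector:", np.round(fiedler, 3))

signal = np.array([0.0, 0.1, 0.9, 1.0, 0.8])
spectrum = evecs.T @ signal           # graph Fourier transform of the signal
print("GFT coefficients:", np.round(spectrum, 3))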
Shi Zong , Branislav Kveton , Shlomo Berkovsky , Azin Ashkan , Nikos Vlassis , Zheng Wen Subjects : Computers and Society (cs.CY) ; Learning (cs.LG)
Weather affects our mood and behaviors, and many aspects of our life. When it
is sunny, most people become happier; but when it rains, some people get
depressed. Despite this evidence and the abundance of data, weather has mostly
been overlooked in machine learning and data science research. This work
presents a causal analysis of how weather affects TV watching patterns. We show
that some weather attributes, such as pressure and precipitation, cause major
changes in TV watching patterns. To the best of our knowledge, this is the
first large-scale causal study of the impact of weather on TV watching
patterns.
Comments: 26 pages, very descriptive figures, comprehensive evaluation on real-life datasets
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Traditional activity recognition systems work on the basis of training,
taking a fixed set of sensors into account. In this article, we focus on the
question how pattern recognition can leverage new information sources without
any, or with minimal user input. Thus, we present an approach for opportunistic
activity recognition, where ubiquitous sensors lead to dynamically changing
input spaces. Our method is a variation of well-established principles of
machine learning, relying on unsupervised clustering to discover structure in
data and inferring cluster labels from a small number of labeled data points
in a semi-supervised manner. After elaborating on the challenges, evaluations
of over 3000
sensor combinations from three multi-user experiments are presented in detail
and show the potential benefit of our approach.
Comments: In Proceedings F-IDE 2016, arXiv:1701.07925
Journal-ref: EPTCS 240, 2017, pp. 20-37
Subjects:
Software Engineering (cs.SE)
; Learning (cs.LG); Logic in Computer Science (cs.LO)
The Why3 IDE and verification system facilitates the use of a wide range of
Satisfiability Modulo Theories (SMT) solvers through a driver-based
architecture. We present Where4: a portfolio-based approach to discharge Why3
proof obligations. We use data analysis and machine learning techniques on
static metrics derived from program source code. Our approach benefits software
engineers by providing a single utility to delegate proof obligations to the
solvers most likely to return a useful result. It does this in a time-efficient
way using existing Why3 and solver installations – without requiring low-level
knowledge about SMT solver operation from the user.
Vincent Cohen-Addad , Chris Schwiegelshohn Subjects : Data Structures and Algorithms (cs.DS) ; Computational Geometry (cs.CG); Learning (cs.LG)
In this paper, we analyze the performance of a simple and standard Local
Search algorithm for clustering on well behaved data. Since the seminal paper
by Ostrovsky, Rabani, Schulman and Swamy [FOCS 2006], much progress has been
made to characterize real-world instances. We distinguish three main
definitions: Distribution Stability (Awasthi, Blum, Sheffet, FOCS 2010),
Spectral Separability (Kumar, Kannan, FOCS 2010), and Perturbation Resilience
(Bilu, Linial, ICS 2010). We show that Local Search performs well on
instances with the aforementioned stability properties. Specifically, for the
(k)-means and (k)-median objectives, we show that Local Search exactly recovers
the optimal clustering if the dataset is (3+\varepsilon)-perturbation
resilient, and is a PTAS for distribution stability and spectral separability.
This implies the first PTAS for instances satisfying the spectral separability
condition. For the distribution stability condition we also go beyond previous
work by showing that the clustering output by the algorithm and the optimal
clustering are very similar. This is a significant step toward understanding
the success of Local Search heuristics in clustering applications and supports
the legitimacy of the stability conditions: They characterize some of the
structure of real-world instances that make Local Search a popular heuristic.
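A minimal single-swap local search for the (k)-median objective, the kind of heuristic analyzed above, is sketched below on synthetic data; it is illustrative only and makes no claim about the paper's exact variant or its guarantees.

import numpy as np

def kmedian_cost(points, centers):
    d = np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=-1)
    return d.min(axis=1).sum()

def local_search_kmedian(points, k, seed=0):
    """Single-swap local search: swap a center with a non-center while cost improves."""
    rng = np.random.default_rng(seed)
    centers = list(rng.choice(len(points), size=k, replace=False))
    improved = True
    while improved:
        improved = False
        best = kmedian_cost(points, centers)
        for i in range(k):                      # try replacing each current center
            for p in range(len(points)):        # ...with each non-center point
                if p in centers:
                    continue
                trial = centers.copy()
                trial[i] = p
                c = kmedian_cost(points, trial)
                if c < best - 1e-12:
                    centers, best, improved = trial, c, True
    return centers

pts = np.vstack([np.random.default_rng(1).normal(m, 0.2, size=(30, 2)) for m in (0, 3, 6)])
print(local_search_kmedian(pts, k=3))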
Habib Ghaffari Hadigheh , Ghazali bin sulong Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Learning (cs.LG)
Most research on image forensics has mainly focused on detecting artifacts
introduced by a single processing tool. This has led to the development of many
specialized algorithms looking for one or more particular footprints under
specific settings. Naturally, the performance of such algorithms is not
perfect, and accordingly the provided output might be noisy, inaccurate and
only partially correct. Furthermore, a forged image in practical scenarios is
often the result of applying several tools available in image-processing
software. Therefore, reliable tamper detection requires developing more
powerful tools to deal with various tampering scenarios. Fusion of forgery
detection tools based on a Fuzzy Inference System has been used before to
address this problem. Adjusting the membership functions and defining proper
fuzzy rules to attain better results are time-consuming processes, which can be
regarded as the main disadvantage of fuzzy inference systems. In this paper, a
Neuro-Fuzzy inference system for the fusion of forgery detection tools is
developed. The neural network characteristic of these systems provides an
appropriate mechanism for automatically adjusting the membership functions.
Moreover, the initial fuzzy inference system is generated using fuzzy
clustering techniques. The proposed framework is implemented and validated on a
benchmark image splicing data set in which three forgery detection tools are
fused by an adaptive Neuro-Fuzzy inference system. The outcome of the proposed
method reveals that applying Neuro-Fuzzy inference systems could be a better
approach for the fusion of forgery detection tools.
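A minimal sketch of a zero-order Sugeno-style fusion of three detector scores is shown below; k-means stands in for the fuzzy clustering used to initialize the rules, the rule consequents are fit by least squares rather than neural tuning of the memberships, and the detector scores and labels are synthetic.

import numpy as np
from sklearn.cluster import KMeans

def init_rules(X, n_rules=4, seed=0):
    """One Gaussian rule per cluster (k-means standing in for fuzzy clustering)."""
    centers = KMeans(n_clusters=n_rules, n_init=10, random_state=seed).fit(X).cluster_centers_
    sigmas = np.full(n_rules, X.std() + 1e-6)
    return centers, sigmas

def rule_strengths(X, centers, sigmas):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigmas[None, :] ** 2))
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)       # normalized firing strengths

# Synthetic scores of three forgery detectors and ground-truth tamper labels.
rng = np.random.default_rng(0)
scores = rng.random((200, 3))
labels = (scores.mean(axis=1) > 0.5).astype(float)

centers, sigmas = init_rules(scores)
W = rule_strengths(scores, centers, sigmas)
consequents, *_ = np.linalg.lstsq(W, labels, rcond=None)    # zero-order Sugeno outputs

fused = rule_strengths(scores, centers, sigmas) @ consequents
print("training accuracy:", ((fused > 0.5) == labels).mean())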
Xueliang Liu Subjects : Quantitative Methods (q-bio.QM) ; Learning (cs.LG); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
As high-throughput biological sequencing becomes faster and cheaper, the need
to extract useful information from sequencing becomes ever more paramount,
often limited by low-throughput experimental characterizations. For proteins,
accurate prediction of their functions directly from their primary amino-acid
sequences has been a long-standing challenge. Here, machine learning using
artificial recurrent neural networks (RNN) was applied towards classification
of protein function directly from primary sequence without sequence alignment,
heuristic scoring or feature engineering. The RNN models containing
long short-term memory (LSTM) units trained on public, annotated datasets from
UniProt achieved high performance for in-class prediction of four important
protein functions tested, particularly compared to other machine learning
algorithms using sequence-derived protein features. RNN models were used also
for out-of-class predictions of phylogenetically distinct protein families with
similar functions, including proteins of the CRISPR-associated nuclease,
ferritin-like iron storage and cytochrome P450 families. Applying the trained
RNN models on the partially unannotated UniRef100 database predicted not only
candidates validated by existing annotations but also currently unannotated
sequences. Some RNN predictions for the ferritin-like iron sequestering
function were experimentally validated, even though their sequences differ
significantly from known, characterized proteins and from each other and cannot
be easily predicted using popular bioinformatics methods. As sequencing and
experimental characterization data increases rapidly, the machine-learning
approach based on RNN could be useful for discovery and prediction of
homologues for a wide range of protein functions.
Alexandre Fotue Tabue , Christophe Mouaha Subjects : Information Theory (cs.IT)
Let (\texttt{R}) be a commutative finite chain ring of invariants ((q,s)). In
this paper, the trace representation of any free cyclic (\texttt{R})-linear
code of length (\ell) is presented, via the (q)-cyclotomic cosets modulo
(\ell), when (\gcd(\ell, q) = 1). The lattice
(\left(\texttt{Cy}(\texttt{R},\ell), +, \cap\right)) of cyclic
(\texttt{R})-linear codes of length (\ell) is investigated. A lower bound on
the Hamming distance of cyclic (\texttt{R})-linear codes of length (\ell) is
established. When (q) is even, a family of MDS and self-orthogonal
(\texttt{R})-linear cyclic codes is constructed.
Alexandre Fotue Tabue , Christophe Mouaha Subjects : Information Theory (cs.IT)
Let (\texttt{R}) be a commutative finite chain ring of invariants ((q,s)) and
(\Gamma(\texttt{R})) the Teichmüller set of (\texttt{R}). In this paper, the
trace representation of cyclic (\texttt{R})-linear codes of length (\ell) is
presented, when (\gcd(\ell, q) = 1). We will show that the contractions of some
cyclic (\texttt{R})-linear codes of length (u\ell) are (\gamma)-constacyclic
(\texttt{R})-linear codes of length (\ell), where
(\gamma \in \Gamma(\texttt{R})) and the multiplicative order of (\gamma) is (u).
Comments: 15 pages
Subjects:
Information Theory (cs.IT)
Muroga [M52] showed how to express the Shannon channel capacity of a discrete
channel with noise [S49] as an explicit function of the transition
probabilities. His method accommodates channels with any finite number of input
symbols, any finite number of output symbols and any transition probability
matrix. Silverman [S55] carried out Muroga’s method in the special case of a
binary channel (and went on to analyse “cascades” of several such binary
channels).
This article is a note on the resulting formula for the capacity C(a, c) of a
single binary channel. We aim to clarify some of the arguments and correct a
small error. In service of this aim, we first formulate several of Shannon’s
definitions and proofs in terms of discrete measure-theoretic probability
theory. We provide an alternative to Silverman’s proof of the feasibility of
the optimal input distribution for a binary channel. For convenience, we also
express C(a, c) in a single expression explicitly dependent on a and c only,
which Silverman stopped short of doing.
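As a numerical illustration only (not Silverman's closed form), the capacity C(a, c) of a binary channel with transition probabilities a and c can be approximated by maximizing the mutual information over input distributions on a fine grid:

import numpy as np

def mutual_information(p, a, c):
    """I(X;Y) in bits for input P(X=1)=p and channel P(Y=1|X=0)=a, P(Y=1|X=1)=c."""
    px = np.array([1 - p, p])
    pyx = np.array([[1 - a, a], [1 - c, c]])   # rows correspond to x=0 and x=1
    pxy = px[:, None] * pyx
    py = pxy.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(pxy > 0, pxy / (px[:, None] * py[None, :]), 1.0)
    return float((pxy * np.log2(ratio)).sum())

def capacity(a, c, grid=10_000):
    ps = np.linspace(0, 1, grid)
    return max(mutual_information(p, a, c) for p in ps)

print(capacity(a=0.1, c=0.3))   # approximate capacity of an asymmetric binary channel, in bits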
Comments: Submitted to IEEE Trans. on Inf. Theory, Jan. 2017
Subjects:
Information Theory (cs.IT)
We introduce the problem of variable-length source resolvability, where a
given target probability distribution is approximated by encoding a
variable-length uniform random number, and the asymptotically minimum average
length rate of the uniform random numbers, called the (variable-length)
resolvability, is investigated. We first analyze the variable-length
resolvability with the variational distance as an approximation measure. Next,
we investigate the case under the divergence as an approximation measure. When
the asymptotically exact approximation is required, it is shown that the
resolvability under the two kinds of approximation measures coincides. We then
extend the analysis to the case of channel resolvability, where the target
distribution is the output distribution via a general channel due to the fixed
general source as an input. The obtained characterization of the channel
resolvability is fully general in the sense that when the channel is just the
identity mapping, the characterization reduces to the general formula for the
source resolvability. We also analyze the second-order variable-length
resolvability.
Comments: 8 pages, 4 figures. A short version of the paper was submitted to ISIT 2017, Aachen, Germany
Subjects:
Information Theory (cs.IT)
; Machine Learning (stat.ML)
In this paper, we study the recovery of a signal from a collection of
unlabeled and possibly noisy measurements via a measurement matrix with random
i.i.d. Gaussian components. We call the measurements unlabeled since their
order is missing, namely, it is not known a priori which elements of the
resulting measurements correspond to which row of the measurement matrix. We
focus on the special case of ordered measurements, where only a subset of the
measurements is kept and the order of the taken measurements is preserved. We
identify a natural duality between this problem and the traditional Compressed
Sensing, where we show that the unknown support (location of nonzero elements)
of a sparse signal in Compressed Sensing corresponds in a natural way to the
unknown location of the measurements kept in unlabeled sensing. While in
Compressed Sensing it is possible to recover a sparse signal from an
under-determined set of linear equations (fewer equations than the dimension of
the signal), successful recovery in unlabeled sensing requires taking more
samples than the dimension of the signal. We develop a low-complexity
alternating minimization algorithm to recover the initial signal from the set
of its unlabeled samples. We also study the behavior of the proposed algorithm
for different signal dimensions and numbers of measurements, both theoretically
and empirically via numerical simulations. The results are reminiscent of the
phase transition that occurs in Compressed Sensing.
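A minimal sketch of one possible alternating minimization of this flavor is given below; it is an illustration under assumptions, not necessarily the authors' algorithm, and all function names are hypothetical. It alternates between a least-squares estimate of the signal and a dynamic program that assigns the kept measurements to rows of the measurement matrix while preserving their order.

    import numpy as np

    def best_ordered_subset(y, A, x):
        """Dynamic program: match the k entries of y to k of the N rows of A,
        preserving order, so as to minimize the squared residual."""
        y = np.asarray(y, dtype=float)
        k, N = len(y), A.shape[0]
        res = (y[:, None] - (A @ x)[None, :]) ** 2   # res[i, j]: cost of matching y_i to row j
        dp = np.full((k + 1, N + 1), np.inf)
        dp[0, :] = 0.0
        take = np.zeros((k + 1, N + 1), dtype=bool)
        for i in range(1, k + 1):
            for j in range(i, N + 1):
                use_j = dp[i - 1, j - 1] + res[i - 1, j - 1]
                if use_j <= dp[i, j - 1]:
                    dp[i, j], take[i, j] = use_j, True
                else:
                    dp[i, j] = dp[i, j - 1]
        rows, i, j = [], k, N                        # trace back the selected rows
        while i > 0:
            if take[i, j]:
                rows.append(j - 1)
                i -= 1
            j -= 1
        return sorted(rows)

    def alternating_min(y, A, iters=20):
        """Alternate between estimating which rows were kept and least squares."""
        y = np.asarray(y, dtype=float)
        x = np.linalg.lstsq(A[:len(y)], y, rcond=None)[0]   # crude initialization
        for _ in range(iters):
            rows = best_ordered_subset(y, A, x)
            x = np.linalg.lstsq(A[rows], y, rcond=None)[0]
        return x, rows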
Ultra Reliable Communication via Optimum Power Allocation for Type-I ARQ in Finite Block-Length
Comments: Accepted IEEE ICC 2017, May 21-25, Paris, France
Subjects:
Information Theory (cs.IT)
We analyze the performance of the type-I automatic repeat request (ARQ)
protocol with ultra-reliability constraints. First, we show that achieving a
very low packet outage probability by using an open loop setup is a difficult
task. Thus, we introduce the ARQ protocol as a solution for achieving the
required low outage probabilities for ultra reliable communication. For this
protocol, we present an optimal power allocation scheme that would allow us to
reach any outage probability target in the finite block-length regime. We
formulate the power allocation problem as minimization of the average
transmitted power under a given outage probability and maximum transmit power
constraint. By utilizing the Karush-Kuhn-Tucker (KKT) conditions, we solve the
optimal power allocation problem and provide a closed form solution. Next, we
analyze the effect of implementing the ARQ protocol on the throughput. We show
that by using the proposed power allocation scheme we can minimize the
throughput loss caused by the retransmissions. Furthermore, we analyze the
effect of the feedback delay length in our scheme.
Maxime Ferreira Da Costa , Wei Dai Subjects : Information Theory (cs.IT)
The line spectral estimation problem consists in recovering the frequencies
of a complex valued time signal that is assumed to be sparse in the spectral
domain from its discrete observations. Unlike the gridding required by the
classical compressed sensing framework, line spectral estimation reconstructs
signals whose spectral supports lie continuously in the Fourier domain. While
recent advances have shown that atomic norm relaxation produces highly robust
estimates in this context, the computational cost of this approach remains the
major obstacle to its application in practical systems.
In this work, we aim to address this complexity issue by studying the atomic
norm minimization problem from low-dimensional projections of the signal
samples. We derive conditions on the sub-sampling matrix under which the
partial atomic norm can be expressed by a low-dimensional semidefinite program.
Moreover, we illustrate the tightness of this relaxation by showing that it is
possible to recover the original signal in poly-logarithmic time for two
specific sub-sampling patterns.
Olivier Rioul Subjects : Information Theory (cs.IT)
We present a simple proof of the entropy-power inequality using an optimal
transportation argument which takes the form of a simple change of variables.
The same argument yields a reverse inequality involving a conditional
differential entropy which has its own interest. For each inequality, the
equality case is easily captured by this method and the proof is formally
identical in one and several dimensions.
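For reference, the inequality in question is the classical entropy-power inequality: for independent random vectors X and Y in R^n with densities and differential entropies h(X), h(Y),

    e^{2h(X+Y)/n} \ge e^{2h(X)/n} + e^{2h(Y)/n},

with equality if and only if X and Y are Gaussian with proportional covariance matrices.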
Comments: Accepted for publication in IEEE International Conference on Communications (ICC), Paris, France, May 2017
Subjects:
Information Theory (cs.IT)
We characterize time and power allocations to optimize the sum-throughput of
a Wireless Powered Communication Network (WPCN) with Non-Orthogonal Multiple
Access (NOMA). In our setup, an Energy Rich (ER) source broadcasts wireless
energy to several devices, which use it to simultaneously transmit data to an
Access Point (AP) on the uplink. Differently from most prior works, in this
paper we consider a generic scenario in which the ER and the AP do not coincide,
i.e., they are two separate entities. We study two NOMA decoding schemes, namely Low
Complexity Decoding (LCD) and Successive Interference Cancellation Decoding
(SICD). For each scheme, we formulate a sum-throughput optimization problem
over a finite horizon. Despite the complexity of the LCD optimization problem,
attributed to its non-convexity, we recast it into a series of geometric
programs. On the other hand, we establish the convexity of the SICD
optimization problem and propose an algorithm to find its optimal solution. Our
numerical results demonstrate the importance of using successive interference
cancellation in WPCNs with NOMA, and show how the energy should be distributed
as a function of the system parameters.
Diego Valsesia , Enrico Magli Subjects : Information Theory (cs.IT)
Predictive coding is attractive for the compression of hyperspectral images
onboard spacecraft in light of the excellent rate-distortion performance
and low complexity of recent schemes. In this letter we propose a rate control
algorithm and integrate it into a lossy extension of the CCSDS-123 lossless
compression recommendation. The proposed rate control algorithm overhauls our previous
scheme by being orders of magnitude faster and simpler to implement, while
still providing the same accuracy in terms of output rate and comparable or
better image quality.
Takafumi Nakano , Tadashi Wadayama Subjects : Information Theory (cs.IT)
This paper studies the zero error capacity of the Nearest Neighbor Error
(NNE) channels with a multilevel alphabet. In the NNE channels, a transmitted
symbol is a (d)-tuple of elements in ({0,1,2,\dots,n-1}). It is assumed
that only a single element error, to a nearest neighbor element, can occur in a
transmitted symbol. The NNE channels can be considered a special type of
limited magnitude error channel, and they are closely related to error models for
flash memories. In this paper, we derive a lower bound of the zero error
capacity of the NNE channels based on a result of the perfect Lee codes. An
upper bound of the zero error capacity of the NNE channels is also derived from
a feasible solution of a linear programming problem defined based on the
confusion graphs of the NNE channels. As a result, a concise formula of the
zero error capacity is obtained using the lower and upper bounds.
Comments: 5 pages, 5 figures
Subjects:
Information Theory (cs.IT)
We consider the problem of distributed computation of the nearest lattice
point for a two dimensional lattice. An interactive model of communication is
considered. We address the problem of reconfiguring a specific rectangular
partition, the nearest-plane (or Babai) partition, into the Voronoi partition.
Expressions are derived for the error probability as a function of the total
number of communicated bits. With an infinite number of allowed communication
rounds, the average cost of achieving zero error probability is shown to be
finite. For the interactive model, with a single round of communication,
expressions are obtained for the error probability as a function of the bits
exchanged. We observe that the error exponent depends on the lattice.
Comments: 5 pages, 6 figures
Subjects:
Information Theory (cs.IT)
We consider the closest lattice point problem in a distributed network
setting and study the communication cost and the error probability for
computing an approximate nearest lattice point, using the nearest-plane
algorithm, due to Babai. Two distinct communication models, centralized and
interactive, are considered. The importance of proper basis selection is
addressed. Assuming a reduced basis for a two-dimensional lattice, we determine
the approximation error of the nearest plane algorithm. The communication cost
for determining the Babai point, or equivalently, for constructing the
rectangular nearest-plane partition, is calculated in the interactive setting.
For the centralized model, an algorithm is presented for reducing the
communication cost of the nearest plane algorithm in an arbitrary number of
dimensions.
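For readers unfamiliar with the Babai point, the following is a minimal sketch of the (centralized) nearest-plane algorithm itself; the distributed and communication-cost aspects studied in the paper are not modeled, and the function names are illustrative.

    import numpy as np

    def gram_schmidt(B):
        """Gram-Schmidt orthogonalization (without normalization) of the rows of B."""
        B = np.asarray(B, dtype=float)
        Bstar = np.zeros_like(B)
        for i in range(B.shape[0]):
            Bstar[i] = B[i]
            for j in range(i):
                mu = np.dot(B[i], Bstar[j]) / np.dot(Bstar[j], Bstar[j])
                Bstar[i] -= mu * Bstar[j]
        return Bstar

    def babai_nearest_plane(B, t):
        """Approximate closest point, to the target t, of the lattice generated
        by the rows of B (Babai's nearest-plane algorithm)."""
        B = np.asarray(B, dtype=float)
        Bstar = gram_schmidt(B)
        b = np.array(t, dtype=float)
        v = np.zeros_like(b)
        for i in range(B.shape[0] - 1, -1, -1):
            # snap to the nearest translate of the span of the earlier basis vectors
            c = np.rint(np.dot(b, Bstar[i]) / np.dot(Bstar[i], Bstar[i]))
            v += c * B[i]
            b -= c * B[i]
        return v

    # Example with a reduced two-dimensional basis, as in the paper's setting:
    # babai_nearest_plane([[1.0, 0.0], [0.5, np.sqrt(3) / 2]], [0.7, 0.9])

The quality of the returned point depends strongly on the basis, which is why basis selection matters in the setting of the paper.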
Steady-state performance analysis of the recursive maximum correntropy algorithm and its application in adaptive beamforming with alpha-stable noise
Lu Lu , Haiquan Zhao Subjects : Information Theory (cs.IT)
As a well-established adaptation criterion, the maximum correntropy criterion
(MCC) has received increased attention due to its robustness against outliers.
In this paper, a new complex recursive maximum correntropy (CRMC) algorithm
without any priori information on the noise characteristics, is proposed under
the MCC. We first study the steady-state excess mean-square-error (EMSE)
behavior of the CRMC algorithm by using energy conservation relation and some
reasonable approximations. Then, the proposed algorithm is introduced to
adaptive beamforming problem, where the desired signal is contaminated by the
impulsive noises. The results obtained from simulation study establish the
effectiveness of this new beamformer.
Comments: Submitted for possible journal publication
Subjects:
Information Theory (cs.IT)
In this paper, we propose a scheme referred to as integer-forcing message
recovering (IFMR) to enable receivers to recover their desirable messages in
interference channels. Compared to the state-of-the-art integer-forcing linear
receiver (IFLR), our proposed IFMR approach needs to decode a considerably
smaller number of messages. In our method, each receiver recovers independent linear
integer combinations of the desirable messages each from two independent
equations. We propose an efficient algorithm to sequentially find the equations
and integer combinations with maximum rates. We evaluate the performance of our
scheme and compare the results with the minimum mean-square error (MMSE) and
zero-forcing (ZF) schemes, as well as the IFLR scheme. The results indicate that our
IFMR scheme considerably outperforms the MMSE and ZF schemes in terms of
achievable rate. Also, compared to IFLR, the IFMR scheme achieves slightly lower
rates at moderate signal-to-noise ratios, with significantly lower
implementation complexity.
Comments: Extended version for the paper submitted to 2017 IEEE International Symposium on Information Theory (ISIT2017)
Subjects:
Information Theory (cs.IT)
In the problem of channel resolvability, where a given output probability
distribution of a channel is approximated by transforming uniform random
numbers, characterizing the asymptotically minimum rate of the size of the
random numbers, called the channel resolvability, has been an open problem. This paper
derives formulas for the channel resolvability for a given general source and
channel pair. We also investigate the channel resolvability in an optimistic
sense. It is demonstrated that the derived general formulas recapture a
single-letter formula for the stationary memoryless source and channel. When
the channel is the identity mapping, the established formulas reduce to an
alternative form of the spectral sup-entropy rates, which play a key role in
information spectrum methods. The analysis is also extended to the second-order
channel resolvability.
Comments: A version of this paper has been submitted to ISIT 2017
Subjects:
Information Theory (cs.IT)
Age of Information is a measure of the freshness of status updates in
monitoring applications and update-based systems. We study a real-time remote
sensing scenario with a sensor which is restricted by time-varying energy
constraints and battery limitations. The sensor sends updates over a packet
erasure channel with no feedback. The problem of finding an age-optimal
threshold policy, with the transmission threshold being a function of the
energy state and the estimated current age, is formulated. The average age is
analyzed for the unit battery scenario under a memoryless energy arrival
process. Somewhat surprisingly, for any finite arrival rate of energy, there is
a positive age threshold for transmission, which corresponds to transmitting
at a rate lower than that dictated by the rate of energy arrivals. A lower
bound on the average age is obtained for general battery size.
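The following toy discrete-time simulation illustrates how an age threshold trades energy for freshness; it rests on illustrative assumptions only (Bernoulli energy arrivals, a unit battery, an independent erasure channel, and a sensor that observes the true age, whereas the paper's feedback-free sensor must rely on an estimate).

    import numpy as np

    def average_age(p=0.3, eps=0.2, threshold=5, T=100_000, seed=0):
        """Average age under a threshold policy: transmit whenever the unit
        battery is charged and the age has reached the threshold."""
        rng = np.random.default_rng(seed)
        battery, age, total_age = 0, 0, 0
        for _ in range(T):
            battery = min(1, battery + rng.binomial(1, p))  # Bernoulli(p) energy arrival
            if battery == 1 and age >= threshold:
                battery = 0                                  # spend the stored unit of energy
                if rng.random() > eps:                       # update survives the erasure channel
                    age = 0
            age += 1
            total_age += age
        return total_age / T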
Yoju Fujino , Tadashi Wadayama Subjects : Information Theory (cs.IT)
In this paper, we propose a construction of non-binary WOM
(Write-Once-Memory) codes for WOM storages such as flash memories. The WOM
codes discussed in this paper are fixed rate WOM codes where messages in a
fixed alphabet of size (M) can be sequentially written in the WOM storage at
least (t^*)-times. In this paper, a WOM storage is modeled by a state
transition graph. The proposed construction has the following two features.
First, it includes a systematic method to determine the encoding regions in the
state transition graph. Second, the proposed construction includes a labeling
method for states by using integer programming. Several novel WOM codes for (q)
level flash memories with 2 cells are constructed by the proposed construction.
They achieve the worst numbers of writes (t^*) that meet the known upper bound
in many cases. In addition, we construct fixed rate non-binary WOM codes with
the capability to reduce the ICI (inter-cell interference) of flash cells. One of
the advantages of the proposed construction is its flexibility: it can be
applied to various storage devices, various dimensions (i.e., numbers of
cells), and various kinds of additional constraints.
On Cooperation and Interference in the Weak Interference Regime (Full Version with Detailed Proofs)
Daniel Zahavi , Ron Dabora Subjects : Information Theory (cs.IT)
Handling interference is one of the main challenges in the design of wireless
networks. In this paper we study the application of cooperation for
interference management in the weak interference (WI) regime, focusing on the
Z-interference channel with a causal relay (Z-ICR), when the channel
coefficients are subject to ergodic phase fading, all transmission powers are
finite, and the relay is full-duplex. In order to provide a comprehensive
understanding of the benefits of cooperation in the WI regime, we characterize,
for the first time, two major performance measures for the ergodic phase fading
Z-ICR in the WI regime: The sum-rate capacity and the maximal generalized
degrees-of-freedom (GDoF). In the capacity analysis, we obtain conditions on
the channel coefficients, subject to which the sum-rate capacity of the ergodic
phase fading Z-ICR is achieved by treating interference as noise at each
receiver, and explicitly state the corresponding sum-rate capacity. In the GDoF
analysis, we derive conditions on the exponents of the magnitudes of the
channel coefficients, under which treating interference as noise achieves the
maximal GDoF, which is explicitly characterized as well. It is shown that under
certain conditions on the channel coefficients, relaying strictly
increases both the sum-rate capacity and the maximal GDoF of the ergodic phase
fading Z-interference channel in the WI regime. Our results demonstrate,
for the first time, the gains from relaying in the presence of interference,
when interference is weak and the relay power is finite, both in
increasing the sum-rate capacity and in increasing the maximal GDoF, compared
to the channel without a relay.
Comments: 5 pages, 3 figures
Subjects:
Information Theory (cs.IT)
We consider explicit constructions of multi-level lattice codes that
universally approach the capacity of the compound block-fading channel.
Specifically, building on algebraic partitions of lattices, we show how to
construct codes with negligible probability of error for any channel
realization and normalized log-density approaching the Poltyrev limit. Capacity
analyses and numerical results on the achievable rates for each partition level
are provided. The proposed codes have several desirable properties, such as
constructiveness and good decoding complexity, as compared to random one-level
codes. Numerical results for finite-dimensional multi-level lattices based on
polar codes are exhibited.
Comments: to appear in IEEE Communications Letters
Subjects:
Information Theory (cs.IT)
Breaking the fronthaul capacity limitations is vital to making the cloud radio
access network (C-RAN) scalable and practical. One promising way is to aggregate
several remote radio units (RRUs) into a cluster that shares a fronthaul link, so as
to enjoy the statistical multiplexing gain brought by the spatial randomness of
the traffic. In this letter, a tractable model is proposed to analyze the
fronthaul statistical multiplexing gain. We first derive the user blocking
probability caused by the limited fronthaul capacity, including its upper and
lower bounds. We then obtain the limits of fronthaul statistical multiplexing
gain when the cluster size approaches infinity. Analytical results reveal that
the user blocking probability decreases exponentially with the average
fronthaul capacity per RRU, and the exponent is proportional to the cluster
size. Numerical results further show considerable fronthaul statistical
multiplexing gain even at a small to medium cluster size.
Comments: Submitted to ISIT 2017
Subjects:
Information Theory (cs.IT)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the problem of identifying the causal relationship between two
discrete random variables from observational data. We recently proposed a novel
framework called entropic causality that works in a very general functional
model but makes the assumption that the unobserved exogenous variable has small
entropy in the true causal direction.
This framework requires the solution of a minimum entropy coupling problem:
Given marginal distributions of m discrete random variables, each on n states,
find the joint distribution with minimum entropy, that respects the given
marginals. This corresponds to minimizing a concave function of (n^m) variables
over a convex polytope defined by (nm) linear constraints, called a
transportation polytope. Unfortunately, it was recently shown that this minimum
entropy coupling problem is NP-hard, even for 2 variables with n states. Even
representing points (joint distributions) over this space can require
exponential complexity (in n, m) if done naively.
In our recent work we introduced an efficient greedy algorithm to find an
approximate solution for this problem. In this paper we analyze this algorithm
and establish two results: that our algorithm always finds a local minimum, and
that it is within an additive approximation error of the unknown global optimum.
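A sketch of a greedy coupling heuristic in this spirit is given below; it reflects one natural reading of the greedy approach and is not claimed to be the authors' exact algorithm. At every step it places as much probability mass as possible on the tuple of currently most likely states, one per marginal, so that the joint distribution keeps a small support and hence a small entropy.

    import numpy as np
    from collections import defaultdict

    def greedy_coupling(marginals):
        """Greedily build a joint distribution consistent with the given marginals
        (a list of 1-D arrays, each summing to 1)."""
        residual = [np.array(p, dtype=float) for p in marginals]
        joint = defaultdict(float)
        mass_left = 1.0
        while mass_left > 1e-12:
            idx = tuple(int(np.argmax(r)) for r in residual)   # jointly most likely remaining states
            step = min(r[i] for r, i in zip(residual, idx))    # largest mass assignable to this cell
            joint[idx] += step
            for r, i in zip(residual, idx):
                r[i] -= step                                   # update the residual marginals
            mass_left -= step
        return dict(joint)

    def coupling_entropy(joint):
        p = np.array([v for v in joint.values() if v > 0])
        return float(-(p * np.log2(p)).sum())

Each step zeroes at least one residual entry, so the returned coupling is supported on at most nm points, which is what keeps the representation tractable.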
Comments: 12 pages, 4 figures, to appear at the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Subjects:
Information Theory (cs.IT)
; Computer Vision and Pattern Recognition (cs.CV)
This paper considers the problem of sampling and reconstruction of a
continuous-time sparse signal without assuming the knowledge of the sampling
instants or the sampling rate. This topic has its roots in the problem of
recovering multiple echoes of light from their low-pass filtered and
auto-correlated time-domain measurements. Our work is closely related to the
topic of sparse phase retrieval and in this context, we discuss the advantage
of phase-free measurements. While this problem is ill-posed, cues based on
physical constraints allow for its appropriate regularization. We validate our
theory with experiments based on customized, optical time-of-flight imaging
sensors. What singles out our approach is that our sensing method allows for
temporal phase retrieval as opposed to the usual case of spatial phase
retrieval. Preliminary experiments and results demonstrate a compelling
capability of our phase-retrieval based imaging device.
Junting Chen , Urbashi Mitra Subjects : Information Theory (cs.IT)
Herein, the problem of simultaneous localization of two sources given a
modest number of samples is examined. In particular, the strategy does not
require knowledge of the target signatures of the sources a priori, nor does it
exploit classical methods based on a particular decay rate of the energy
emitted from the sources as a function of range. General structural properties
of the signatures such as unimodality are exploited. The algorithm localizes
targets based on the rotated eigenstructure of a reconstructed observation
matrix. In particular, the optimal rotation can be found by maximizing the
ratio of the dominant singular value of the observation matrix over the nuclear
norm of the optimally rotated observation matrix. It is shown that this ratio
has a unique local maximum leading to computationally efficient search
algorithms. Moreover, analytical results are developed to show that the squared
localization error decreases at a rate faster than that of the baseline scheme.
Comments: 30 pages
Subjects:
Dynamical Systems (math.DS)
; Information Theory (cs.IT); Probability (math.PR)
Let (G) be a sofic group, and let (\Sigma = (\sigma_n)_{n \geq 1}) be a sofic
approximation to it. For a probability-preserving (G)-system, a variant of the
sofic entropy relative to (\Sigma) has recently been defined in terms of
sequences of measures on its model spaces that `converge’ to the system in a
certain sense. Here we prove that, in order to study this notion, one may
restrict attention to those sequences that have the asymptotic equipartition
property. This may be seen as a sofic-setting relative of the Shannon–McMillan
theorem.
We also give some first applications of this result, including a new formula
for the sofic entropy of a ((G \times H))-system obtained by co-induction from a
(G)-system, where (H) is any other infinite sofic group.
Comments: IEEE Women in Engineering Conference, WIECON-ECE’2017 (Accepted for IEEEXplore)
Subjects:
Artificial Intelligence (cs.AI)
; Information Theory (cs.IT); Logic in Computer Science (cs.LO); Logic (math.LO)
The study of mereology (parts and wholes) in the context of formal approaches
to vagueness can be approached in a number of ways. In the context of rough
sets, mereological concepts with a set-theoretic or valuation based ontology
acquire complex and diverse behavior. In this research a general rough set
framework called granular operator spaces is extended and the nature of
parthood in it is explored from a minimally intrusive point of view. This is
used to develop counting strategies that help in classifying the framework. The
developed methodologies would be useful for drawing involved conclusions about
the nature of data (and the validity of assumptions about it) from antichains
derived from context. The problem addressed also concerns whether counting
procedures help in confirming that the approximations involved in the formation
of data are indeed rough approximations.
Comments: 11 pages, 11 figures
Subjects:
Medical Physics (physics.med-ph)
; Information Theory (cs.IT)
Magnetic Resonance Fingerprinting (MRF) is a relatively new approach that
provides quantitative MRI measures using randomized acquisition. Extraction of
physical quantitative tissue parameters is performed off-line, based on
acquisition with varying parameters and a dictionary generated according to the
Bloch equations. MRF uses hundreds of radio frequency (RF) excitation pulses
for acquisition, and therefore a high under-sampling ratio in the sampling domain
(k-space) is required for a reasonable scanning time. This under-sampling causes
spatial artifacts that hamper the ability to accurately estimate the tissue’s
quantitative values. In this work, we introduce a new approach for quantitative
MRI using MRF, called magnetic resonance Fingerprinting with LOw Rank (FLOR).
We exploit the low rank property of the concatenated temporal imaging
contrasts, on top of the fact that the MRF signal is sparsely represented in
the generated dictionary domain. We present an iterative scheme that consists
of a gradient step followed by a low rank projection using the singular value
decomposition. Experiments on real MRI data, acquired using a spirally-sampled
MRF FISP sequence, demonstrate improved resolution compared to other
compressed-sensing based methods for MRF at 5% sampling ratio.
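As a generic illustration of the iterative scheme described above (a gradient step on the data fit followed by an SVD-based low-rank projection), and not the authors' FLOR implementation, one might write the following; the operator names and parameters are assumptions.

    import numpy as np

    def low_rank_project(X, rank):
        """Project X onto the set of matrices of at most the given rank
        by truncating its singular value decomposition."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[rank:] = 0.0
        return (U * s) @ Vt

    def gradient_low_rank_recovery(y, A, At, shape, rank, step=1.0, iters=50):
        """Generic recovery loop: gradient step on 0.5 * ||A(X) - y||^2,
        then a low-rank projection of the matrix of temporal contrasts.

        y      : under-sampled k-space data
        A, At  : forward sampling operator and its adjoint (callables)
        shape  : shape of the matrix of concatenated temporal contrasts
        """
        X = np.zeros(shape, dtype=complex)
        for _ in range(iters):
            X = X - step * At(A(X) - y)      # gradient step on the data-fit term
            X = low_rank_project(X, rank)    # enforce the low-rank prior
        return X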