Comments: 4 pages, 3 figures
Subjects:
Neural and Evolutionary Computing (cs.NE)
Recurrent neural networks (RNNs) are capable of learning to generate highly
realistic online handwriting in a wide variety of styles from a given text
sequence. Furthermore, the networks can generate handwriting in the style of a
particular writer when the network states are primed with a real sequence of
pen movements from that writer. However, how populations of neurons in the RNN
collectively achieve such performance remains poorly understood. To
tackle this problem, we investigated learned representations in RNNs by
extracting low-dimensional neural trajectories that summarize the activity of
a population of neurons in the network during individual syntheses of
handwriting. The neural trajectories show that different writing styles are
encoded in different subspaces inside an internal space of the network. Within
each subspace, different characters of the same style are represented as
different state dynamics. These results demonstrate the effectiveness of
analyzing the neural trajectory for intuitive understanding of how the RNNs
work.
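As a sketch of the kind of analysis described above, the neural trajectories could be obtained by projecting recorded hidden states onto a few principal components; the array names, shapes, and file name below are hypothetical, not taken from the paper.

import numpy as np
from sklearn.decomposition import PCA

# hidden_states: (T, N) matrix of the RNN's hidden-state vectors recorded over
# T time steps of one handwriting synthesis run (hypothetical data).
hidden_states = np.load("hidden_states.npy")

pca = PCA(n_components=3)                      # low-dimensional summary
trajectory = pca.fit_transform(hidden_states)  # (T, 3) neural trajectory

# Fitting the PCA on runs from several styles and comparing the resulting
# trajectories indicates whether styles occupy different subspaces of the
# network's internal state space.
print(pca.explained_variance_ratio_)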
Comments: 15 pages, 5 figures
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Learning (cs.LG); Machine Learning (stat.ML)
Given the success of the gated recurrent unit, a natural question is whether
all the gates of the long short-term memory (LSTM) network are necessary.
Previous research has shown that the forget gate is one of the most important
gates in the LSTM. Here we show that a forget-gate-only version of the LSTM
with chrono-initialized biases not only provides computational savings but
also outperforms the standard LSTM on multiple benchmark datasets and competes with
some of the best contemporary models. Our proposed network, the JANET, achieves
accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the
standard LSTM, which yields accuracies of 98.5% and 91%.
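A minimal sketch of a forget-gate-only recurrent cell with a chrono-initialized bias, written from the description above (our reading of the idea, not the authors' reference code):

import numpy as np

def chrono_bias(hidden_size, t_max, rng=np.random.default_rng(0)):
    # Chrono initialization: forget-gate bias drawn as log(U(1, T_max - 1)),
    # which biases the gate toward remembering over long time spans.
    return np.log(rng.uniform(1.0, t_max - 1.0, size=hidden_size))

def janet_step(x, h, c, Uf, Wf, bf, Uc, Wc, bc):
    # One step of a forget-gate-only LSTM: a single gate f interpolates
    # between the previous cell state and a candidate update; there is no
    # input or output gate, and the hidden state equals the cell state.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = sigmoid(x @ Uf + h @ Wf + bf)
    candidate = np.tanh(x @ Uc + h @ Wc + bc)
    c_new = f * c + (1.0 - f) * candidate
    return c_new, c_new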
Representing smooth functions as compositions of near-identity functions with implications for deep network optimization
Peter L. Bartlett , Steven N. Evans , Philip M. Long Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Statistics Theory (math.ST); Machine Learning (stat.ML)
We show that any smooth bi-Lipschitz \(h\) can be represented exactly as a
composition \(h_m \circ \dots \circ h_1\) of functions \(h_1,\dots,h_m\) that are close
to the identity in the sense that each \(\left(h_i-\mathrm{Id}\right)\) is
Lipschitz, and the Lipschitz constant decreases inversely with the number \(m\)
of functions composed. This implies that \(h\) can be represented to any accuracy
by a deep residual network whose nonlinear layers compute functions with a
small Lipschitz constant. Next, we consider nonlinear regression with a
composition of near-identity nonlinear maps. We show that, regarding Fréchet
derivatives with respect to the \(h_1,\dots,h_m\), any critical point of a
quadratic criterion in this near-identity region must be a global minimizer. In
contrast, if we consider derivatives with respect to parameters of a fixed-size
residual network with sigmoid activation functions, we show that there are
near-identity critical points that are suboptimal, even in the realizable case.
Informally, this means that functional gradient methods for residual networks
cannot get stuck at suboptimal critical points corresponding to near-identity
layers, whereas parametric gradient methods for sigmoidal residual networks
suffer from suboptimal critical points in the near-identity region.
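Restated compactly in our own notation (an illustration of the claim in the abstract, not the paper's precise theorem statement):

\[
h = h_m \circ \cdots \circ h_1, \qquad h_i = \mathrm{Id} + g_i, \qquad
\operatorname{Lip}(g_i) = \operatorname{Lip}(h_i - \mathrm{Id}) = O\!\left(\tfrac{1}{m}\right),
\]

so each \(h_i\) computes a small perturbation of the identity, which is exactly the form \(x \mapsto x + g_i(x)\) computed by a residual-network layer.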
Comments: 11 pages, 14 figures. Part of the content has been published in IPSJ SIG Technical Report, Vol. 2017-HPC-162, No. 22, pp. 1-9, 2017. (DOI: this http URL )
Subjects:
Learning (cs.LG)
; Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used
in deep learning. Specifically, cuDNN implements several equivalent convolution
algorithms, whose performance and memory footprint may vary considerably,
depending on the layer dimensions. When an algorithm is automatically selected
by cuDNN, the decision is performed on a per-layer basis, and thus it often
resorts to slower algorithms that fit the workspace size constraints. We
present µ-cuDNN, a transparent wrapper library for cuDNN, which divides
layers’ mini-batch computation into several micro-batches. Based on dynamic
programming and integer linear programming, µ-cuDNN enables faster
algorithms by decreasing the workspace requirements. At the same time,
µ-cuDNN keeps the computational semantics unchanged, so that it safely
decouples statistical efficiency from hardware efficiency. We demonstrate the
effectiveness of µ-cuDNN over two frameworks, Caffe and TensorFlow,
achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on a P100-SXM2
GPU. These results indicate that using micro-batches can seamlessly increase
the performance of deep learning, while maintaining the same memory footprint.
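As an illustration of the micro-batching idea (not the µ-cuDNN implementation, which works at the cuDNN API level and chooses split sizes via DP/ILP), the forward pass of a convolution over a mini-batch can be computed in smaller chunks without changing the result:

import numpy as np

def forward_in_microbatches(conv_forward, minibatch, micro_size):
    # Run the layer on micro-batches and concatenate the outputs. The result
    # is identical to a single full-batch call, but each call needs a smaller
    # algorithm workspace, which is what lets faster algorithms fit.
    outputs = [conv_forward(minibatch[i:i + micro_size])
               for i in range(0, len(minibatch), micro_size)]
    return np.concatenate(outputs, axis=0)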
Christoph Treude , Markus Wagner Subjects : Computation and Language (cs.CL) ; Neural and Evolutionary Computing (cs.NE)
To make sense of large amounts of textual data, topic modelling is frequently
used as a text-mining tool for the discovery of hidden semantic structures in
text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model
that aims to explain the structure of a corpus by grouping texts. LDA requires
multiple parameters to work well, and there are only rough and sometimes
conflicting guidelines available on how these parameters should be set. In this
paper, we contribute (i) a broad study of parameters to arrive at good local
optima, (ii) an a-posteriori characterisation of text corpora related to eight
programming languages from GitHub and Stack Overflow, and (iii) an analysis of
corpus feature importance via per-corpus LDA configuration.
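For concreteness, the parameters in question are those of a standard LDA fit; a minimal sketch with scikit-learn is shown below (the parameter values are placeholders, not the configurations recommended by the paper):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = ["..."]   # posts from one corpus, e.g. a GitHub/Stack Overflow dump
X = CountVectorizer(max_df=0.95, min_df=2).fit_transform(documents)

lda = LatentDirichletAllocation(
    n_components=20,        # number of topics
    doc_topic_prior=0.1,    # alpha
    topic_word_prior=0.01,  # beta
    max_iter=50,
    random_state=0,
).fit(X)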
Comments: Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In many computer vision applications, obtaining images of high resolution in
both the spatial and spectral domains is equally important. However, due to
hardware limitations, one can only expect to acquire images of high resolution
in either the spatial or spectral domains. This paper focuses on hyperspectral
image super-resolution (HSI-SR), where a hyperspectral image (HSI) with low
spatial resolution (LR) but high spectral resolution is fused with a
multispectral image (MSI) with high spatial resolution (HR) but low spectral
resolution to obtain HR HSI. Existing deep learning-based solutions are all
supervised, requiring a large training set and the availability of HR HSI,
which is unrealistic. Here, we make the first attempt at solving the HSI-SR
problem using an unsupervised encoder-decoder architecture with the
following unique properties. First, it is composed of two encoder-decoder networks,
coupled through a shared decoder, in order to preserve the rich spectral
information from the HSI network. Second, the network encourages the
representations from both modalities to follow a sparse Dirichlet distribution
which naturally incorporates the two physical constraints of HSI and MSI.
Third, the angular difference between representations is minimized in order to
reduce the spectral distortion. We refer to the proposed architecture as
unsupervised Sparse Dirichlet-Net, or uSDN. Extensive experimental results
demonstrate the superior performance of uSDN as compared to the
state-of-the-art.
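One way to realize the angular-difference term mentioned above is a spectral-angle loss between the two branches' representations; this is a generic formulation, not necessarily the exact loss used in uSDN:

import numpy as np

def spectral_angle_loss(a, b, eps=1e-8):
    # a, b: (..., C) representation vectors from the HSI and MSI branches.
    # Minimizing the mean angle between them encourages the two modalities to
    # agree in direction, limiting spectral distortion.
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))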
Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision
Comments: 12 pages (references included). To appear in the Proceedings of NAACL-HLT 2018
Journal-ref: Proceedings of NAACL-HLT 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
The present work investigates whether different quantification mechanisms
(set comparison, vague quantification, and proportional estimation) can be
jointly learned from visual scenes by a multi-task computational model. The
motivation is that, in humans, these processes underlie the same cognitive,
non-symbolic ability, which allows an automatic estimation and comparison of
set magnitudes. We show that when information about lower-complexity tasks is
available, the higher-level proportional task becomes more accurate than when
performed in isolation. Moreover, the multi-task model is able to generalize to
unseen combinations of target/non-target objects. Consistent with behavioral
evidence showing the interference of absolute number in the proportional task,
the multi-task model no longer works when asked to provide the number of target
objects in the scene.
Convolutional Neural Networks for Skull-stripping in Brain MR Imaging using Consensus-based Silver standard Masks
Oeslle Lucena , Roberto Souza , Leticia Rittner , Richard Frayne , Roberto Lotufo Subjects : Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) for medical imaging are constrained by
the amount of annotated data required in the training stage. Usually, manual
annotation is considered to be the “gold standard”. However, medical imaging
datasets that include expert manual segmentation are scarce as this step is
time-consuming, and therefore expensive. Moreover, single-rater manual
annotation is most often used in data-driven approaches, making the network
optimal with respect to only that single expert. In this work, we propose a CNN
for brain extraction in magnetic resonance (MR) imaging that is fully trained
with what we refer to as silver standard masks. Our method consists of 1)
developing a dataset with “silver standard” masks as input, and implementing
both 2) a tri-planar method using parallel 2D U-Net-based CNNs (referred to as
CONSNet) and 3) an auto-context implementation of CONSNet. The term CONSNet
refers to our integrated approach, i.e., training with silver standard masks
and using a 2D U-Net-based architecture. Our results showed that we
outperformed (i.e., achieved larger Dice coefficients than) the current
state-of-the-art skull-stripping (SS) methods. Our use of silver standard masks reduced the cost of manual
annotation, decreased inter- and intra-rater variability, and avoided CNN
segmentation super-specialization towards one specific manual annotation
guideline that can occur when gold standard masks are used. Moreover, the usage
of silver standard masks greatly enlarges the volume of input annotated data
because we can relatively easily generate labels for unlabeled data. In
addition, our method has the advantage that, once trained, it takes only a few
seconds to process a typical brain image volume using modern hardware, such as
a high-end graphics processing unit. In contrast, many of the other competitive
methods have processing times in the order of minutes.
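A consensus-based silver-standard mask can be as simple as a voxel-wise majority vote over the outputs of several automatic skull-stripping tools; the sketch below illustrates that idea and is not necessarily the consensus rule used in the paper:

import numpy as np

def silver_standard_mask(masks, threshold=0.5):
    # masks: list of binary brain masks (same shape) produced by different
    # automatic skull-stripping tools for one volume.
    stacked = np.stack(masks).astype(np.float32)
    # Keep voxels labeled as brain by at least the given fraction of tools.
    return (stacked.mean(axis=0) >= threshold).astype(np.uint8)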
An efficient deep convolutional Laplacian pyramid architecture for CS reconstruction at low sampling ratios
Comments: 5 pages. Accepted by ICASSP2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Compressed sensing (CS) has been successfully applied to image
compression in the past few years as most image signals are sparse in a certain
domain. Several CS reconstruction models have been proposed and have achieved
superior performance. However, these methods suffer from blocking artifacts or
ringing effects at low sampling ratios in most cases. To address this problem,
we propose a deep convolutional Laplacian Pyramid Compressed Sensing Network
(LapCSNet) for CS, which consists of a sampling sub-network and a
reconstruction sub-network. In the sampling sub-network, we utilize a
convolutional layer to mimic the sampling operator. In contrast to the fixed
sampling matrices used in traditional CS methods, the filters used in our
convolutional layer are jointly optimized with the reconstruction sub-network.
In the reconstruction sub-network, two branches are designed to reconstruct
multi-scale residual images and multi-scale target images progressively using a
Laplacian pyramid architecture. The proposed LapCSNet not only integrates
multi-scale information to achieve better performance but also reduces
computational cost dramatically. Experimental results on benchmark datasets
demonstrate that the proposed method is capable of reconstructing more details
and sharper edges than state-of-the-art methods.
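The sampling sub-network described above can be pictured as a strided convolution whose filters play the role of a learned, block-wise sampling matrix; the block size and sampling ratio below are illustrative, not the paper's settings:

import torch.nn as nn

class LearnedSampler(nn.Module):
    def __init__(self, block=32, ratio=0.04):
        super().__init__()
        n_meas = max(1, int(ratio * block * block))
        # Kernel size = stride = block size: each image block is projected
        # onto n_meas learned measurement filters, jointly optimized with the
        # reconstruction sub-network.
        self.sample = nn.Conv2d(1, n_meas, kernel_size=block,
                                stride=block, bias=False)

    def forward(self, x):
        return self.sample(x)   # (B, n_meas, H/block, W/block) measurements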
Comments: This work was submitted to MIDL 2018 Conference
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Fast and accurate anatomical landmark detection can benefit many medical
image analysis methods. Here, we propose a method to automatically detect
anatomical landmarks in medical images. Automatic landmark detection is
performed with a patch-based fully convolutional neural network (FCNN) that
combines regression and classification. For any given image patch, regression
is used to predict the 3D displacement vector from the image patch to the
landmark. Simultaneously, classification is used to identify patches that
contain the landmark. Under the assumption that patches close to a landmark can
determine the landmark location more precisely than patches farther from it,
only those patches that contain the landmark according to classification are
used to determine the landmark location. The landmark location is obtained by
calculating the average landmark location using the computed 3D displacement
vectors. The method is evaluated using detection of six clinically relevant
landmarks in coronary CT angiography (CCTA) scans: the right and left ostium,
the bifurcation of the left main coronary artery (LM) into the left anterior
descending and the left circumflex artery, and the origin of the right,
non-coronary, and left aortic valve commissure. The proposed method achieved an
average Euclidean distance error of 2.19 mm and 2.88 mm for the right and left
ostium respectively, 3.78 mm for the bifurcation of the LM, and 1.82 mm, 2.10
mm and 1.89 mm for the origin of the right, non-coronary, and left aortic valve
commissure respectively, demonstrating accurate performance. The proposed
combination of regression and classification can be used to accurately detect
landmarks in CCTA scans.
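Our reading of how the regression and classification outputs are combined can be summarized in a few lines; names and shapes are hypothetical:

import numpy as np

def estimate_landmark(patch_centers, displacements, contains_landmark):
    # patch_centers: (N, 3) voxel coordinates of the analyzed patches.
    # displacements: (N, 3) regressed vectors from each patch to the landmark.
    # contains_landmark: (N,) boolean classification output per patch.
    # Only patches classified as containing the landmark vote, and the final
    # location is the average of their predicted positions.
    votes = patch_centers[contains_landmark] + displacements[contains_landmark]
    return votes.mean(axis=0)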
Mariyanayagam Damien , Gurdjos Pierre , Chambon Sylvie , Brunet Florent , Charvillat Vincent Subjects : Computer Vision and Pattern Recognition (cs.CV)
Circular markers are planar markers which offer good performance for
detection and pose estimation. For an uncalibrated camera with an unknown focal
length, the images of at least two coplanar circles are generally
required to recover their poses. Unfortunately, detecting more than one ellipse
in the image can be tricky and time-consuming, especially with concentric
circles. On the other hand, when the camera is calibrated, one circle suffices,
but the solution is twofold and can hardly be disambiguated. Our contribution
is to go beyond this limit by dealing with the uncalibrated case of a camera
seeing one circle and discussing how to remove the ambiguity. We propose a new
problem formulation that shows how to detect geometric configurations
in which the ambiguity can be removed. Furthermore, we introduce the notion of
default camera intrinsics and show, through extensive empirical work, the
surprising observation that very approximate calibration can lead to accurate
circle pose estimation.
Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation
Carolina Redondo-Cabrera , Roberto J. López-Sastre Subjects : Computer Vision and Pattern Recognition (cs.CV)
Training a Convolutional Neural Network (CNN) for semantic segmentation
typically requires collecting a large amount of accurate pixel-level
annotations, a hard and expensive task. In contrast, simple image tags are
easier to gather. With this paper we introduce a novel weakly-supervised
semantic segmentation model able to learn from image labels, and just image
labels. Our model uses the prior knowledge of a network trained for image
recognition, employing these image annotations as an attention mechanism to
identify semantic regions in the images. We then present a methodology that
builds accurate class-specific segmentation masks from these regions, where
neither external objectness nor saliency algorithms are required. We describe
how to incorporate this mask generation strategy into a fully end-to-end
trainable process where the network jointly learns to classify and segment
images. Our experiments on the PASCAL VOC 2012 dataset show that exploiting these
generated class-specific masks in conjunction with our novel end-to-end
learning process outperforms several recent weakly-supervised semantic
segmentation methods that use image tags only, and even some models that
leverage additional supervision or training data.
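One common way to turn a recognition network's image-level knowledge into the kind of attention map described above is a class activation map; the sketch below is generic and may differ from the paper's exact mechanism:

import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    # features: (C, H, W) final convolutional feature maps of the pretrained
    # recognition network; fc_weights: (num_classes, C) classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)   # normalized (H, W) attention map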
Comments: Under review for The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML 2018), Dublin, Ireland, 10-14 September 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Unlike conventional anomaly detection research that focuses on point
anomalies, our goal is to detect anomalous collections of individual data
points. In particular, we perform group anomaly detection (GAD) with an
emphasis on irregular group distributions (e.g. irregular mixtures of image
pixels). GAD is an important task in detecting unusual and anomalous phenomena
in real-world applications such as high energy particle physics, social media,
and medical imaging. In this paper, we take a generative approach by proposing
two deep generative models, the adversarial autoencoder (AAE) and the variational
autoencoder (VAE), for group anomaly detection. Both AAE and VAE detect group
anomalies using point-wise input data where group memberships are known a
priori. We conduct extensive experiments to evaluate our models on real-world
datasets. The empirical results demonstrate that our approach is effective and
robust in detecting group anomalies.
Gül Varol , Duygu Ceylan , Bryan Russell , Jimei Yang , Ersin Yumer , Ivan Laptev , Cordelia Schmid Subjects : Computer Vision and Pattern Recognition (cs.CV)
Human shape estimation is an important task for video editing, animation and
the fashion industry. Predicting 3D human body shape from natural images, however,
is highly challenging due to factors such as variation in human bodies,
clothing and viewpoint. Prior methods addressing this problem typically attempt
to fit parametric body models with certain priors on pose and shape. In this
work we argue for an alternative representation and propose BodyNet, a neural
network for direct inference of volumetric body shape from a single image.
BodyNet is an end-to-end trainable network that benefits from (i) a volumetric
3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate
supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them
results in performance improvement as demonstrated by our experiments. To
evaluate the method, we fit the SMPL model to our network output and show
state-of-the-art results on the SURREAL and Unite the People datasets,
outperforming recent approaches. Besides achieving state-of-the-art
performance, our method also enables volumetric body-part segmentation.
Comments: 25 pages, 14 figures and 1 table
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper studies the problem of blind face restoration from an
unconstrained blurry, noisy, low-resolution, or compressed image (i.e.,
degraded observation). For better recovery of fine facial details, we modify
the problem setting by taking both the degraded observation and a high-quality
guided image of the same identity as input to our guided face restoration
network (GFRNet). However, the degraded observation and guided image generally
are different in pose, illumination and expression, thereby making plain CNNs
(e.g., U-Net) fail to recover fine and identity-aware facial details. To tackle
this issue, our GFRNet model includes both a warping subnetwork (WarpNet) and a
reconstruction subnetwork (RecNet). The WarpNet is introduced to predict flow
field for warping the guided image to correct pose and expression (i.e., warped
guidance), while the RecNet takes the degraded observation and warped guidance
as input to produce the restoration result. Because the ground-truth flow
field is unavailable, a landmark loss together with total variation
regularization is incorporated to guide the learning of WarpNet. Furthermore,
to make the model applicable to blind restoration, our GFRNet is trained on
synthetic data with varied settings of blur kernel, noise level,
downsampling scale factor, and JPEG quality factor. Experiments show that our
GFRNet not only performs favorably against the state-of-the-art image and face
restoration methods, but also generates visually photo-realistic results on
real degraded facial images.
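The total variation term used to regularize the flow field can be written compactly; a generic sketch, assuming a PyTorch flow tensor:

import torch

def total_variation(flow):
    # flow: (B, 2, H, W) flow field predicted by the warping subnetwork.
    # Penalizing neighbouring differences keeps the flow smooth when no
    # ground-truth flow is available for direct supervision.
    dh = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    dw = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
    return dh + dw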
Comments: To appear in CVPR 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this paper we derive and test a probability-based weighting that can
balance residuals of different types in spline fitting. In contrast to previous
formulations, the proposed spline error weighting scheme also incorporates a
prediction of the approximation error of the spline fit. We demonstrate the
effectiveness of the prediction in a synthetic experiment, and apply it to
visual-inertial fusion on rolling shutter cameras. This results in a method
that can estimate 3D structure with metric scale on generic first-person
videos. We also propose a quality measure for spline fitting that can be used
to automatically select the knot spacing. Experiments verify that the obtained
trajectory quality corresponds well with the requested quality. Finally, by
linearly scaling the weights, we show that the proposed spline error weighting
minimizes the estimation errors on real sequences, in terms of scale and
end-point errors.
Ryoichi Ishikawa , Takeshi Oishi , Katsushi Ikeuchi Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)
Robot navigation technology is required to accomplish difficult tasks in
various environments. For navigation, it is necessary to know both the
external environment and the state of the robot within that environment.
Meanwhile, much work has been done on SLAM technology, which is used not only
for navigation but is also applied to devices for Mixed Reality and the
like.
In this paper, we propose a robot-device calibration method for navigation
with a device using SLAM technology on a robot. The calibration is performed by
using the position and orientation information given by the robot and the
device. In the calibration, the most efficient movement pattern is identified
according to the constraints on the robot’s movement. Furthermore, we also show a
method to dynamically correct the position and orientation of the robot so that
the external environment information and the robot’s shape information remain
consistent, in order to reduce the dynamic error that occurs
during navigation.
Our method can be easily used for various kinds of robots, and localization
with sufficient precision for navigation is possible with offline calibration
and online position correction. In the experiments, we verify the parameters
obtained by two types of offline calibration according to the degrees of freedom
of robot movement, and validate the effectiveness of the online correction method
by plotting the localization error during intense robot movement. Finally, we
demonstrate navigation using a SLAM device.
Comments: 17 pages, 7 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The extraction of meaningful features from videos is important as they can be
used in various applications. Despite its importance, video representation
learning has not been studied much, because it is challenging to deal with both
content and motion information. We present a Mutual Suppression network (MSnet)
to learn disentangled motion and content features in videos. The MSnet is
trained in such a way that content features do not contain motion information and
motion features do not contain content information; this is achieved by having
each suppress the other through adversarial training. We utilize the disentangled features from
the MSnet for several tasks, such as frame reproduction, pixel-level video
frame prediction, and dense optical flow estimation, to demonstrate the
strength of MSnet. The proposed model outperforms the state-of-the-art methods
in pixel-level video frame prediction. The source code will be publicly
available.
Comments: This paper is accepted at CVPR 2018 as poster
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Human free-hand sketches have been studied in various contexts including
sketch recognition, synthesis and fine-grained sketch-based image retrieval
(FG-SBIR). A fundamental challenge for sketch analysis is to deal with
drastically different human drawing styles, particularly in terms of
abstraction level. In this work, we propose the first stroke-level sketch
abstraction model, based on the insight that sketch abstraction is a process of
trading off between the recognizability of a sketch and the number of strokes
used to draw it. Concretely, we train a model for abstract sketch generation
through reinforcement learning of a stroke removal policy that learns to
predict which strokes can be safely removed without affecting recognizability.
We show that our abstraction model can be used for various sketch analysis
tasks including: (1) modeling stroke saliency and understanding the decision of
sketch recognition models, (2) synthesizing sketches of variable abstraction
for a given category, or reference object instance in a photo, and (3) training
a FG-SBIR model with photos only, bypassing the expensive photo-sketch pair
collection step.
Haonan Qiu , Yingbin Zheng , Hao Ye , Yao Lu , Feng Wang , Liang He Subjects : Computer Vision and Pattern Recognition (cs.CV)
Locating actions in long untrimmed videos has been a challenging problem in
video content analysis. The performance of existing action localization
approaches remains unsatisfactory in precisely determining the beginning and the
end of an action. Imitating the human perception procedure with observations
and refinements, we propose a novel three-phase action localization framework.
Our framework is embedded with an Actionness Network to generate initial
proposals through frame-wise similarity grouping, and then a Refinement Network
to conduct boundary adjustment on these proposals. Finally, the refined
proposals are sent to a Localization Network for further fine-grained location
regression. The whole process can be deemed as multi-stage refinement using a
novel non-local pyramid feature under various temporal granularities. We
evaluate our framework on THUMOS14 benchmark and obtain a significant
improvement over the state-of-the-art approaches. Specifically, the
performance gain is remarkable under precise localization with high IoU
thresholds. Our proposed framework achieves mAP@IoU=0.5 of 34.2%.
Comments: Project Page: this http URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Given an arbitrary face image and an arbitrary speech clip, the proposed work
attempts to generate a talking face video with accurate lip synchronization
while maintaining a smooth transition of both lip and facial movement over the
entire video clip. Existing works either do not consider temporal dependency
across different video frames, thus easily yielding
noticeable/abrupt facial and lip movement, or are limited to the generation
of talking face videos for a specific person, thus lacking generalization
capacity. We propose a novel conditional video generation network where the
audio input is treated as a condition for the recurrent adversarial network
such that temporal dependency is incorporated to realize smooth transition for
the lip and facial movement. In addition, we deploy a multi-task adversarial
training scheme in the context of video generation to improve both
photo-realism and the accuracy for lip synchronization. Finally, based on the
phoneme distribution information extracted from the audio clip, we develop a
sample selection method that effectively reduces the size of the training
dataset without sacrificing the quality of the generated video. Extensive
experiments on both controlled and uncontrolled datasets demonstrate the
superiority of the proposed approach in terms of visual quality, lip sync
accuracy, and smooth transition of lip and facial movement, as compared to the
state-of-the-art.
Comments: 17 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Motion boundary detection is a crucial yet challenging problem. Prior methods
focus on analyzing the gradients and distributions of optical flow fields, or
use hand-crafted features for motion boundary learning. In this paper, we
propose the first dedicated end-to-end deep learning approach for motion
boundary detection, which we term MoBoNet. We introduce a refinement network
structure which takes source input images, initial forward and backward optical
flows as well as corresponding warping errors as inputs and produces
high-resolution motion boundaries. Furthermore, we show that the obtained
motion boundaries, through a fusion sub-network we design, can in turn guide
the optical flows for removing the artifacts. The proposed MoBoNet is generic
and works with any optical flows. Our motion boundary detection and the refined
optical flow estimation achieve results superior to the state of the art.
Comments: 16 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Images captured by fisheye lenses violate the pinhole camera assumption and
suffer from distortions. Rectification of fisheye images is therefore a crucial
preprocessing step for many computer vision applications. In this paper, we
propose an end-to-end multi-context collaborative deep network for removing
distortions from single fisheye images. In contrast to conventional approaches,
which focus on extracting hand-crafted features from input images, our method
learns high-level semantics and low-level appearance features simultaneously to
estimate the distortion parameters. To facilitate training, we construct a
synthesized dataset that covers various scenes and distortion parameter
settings. Experiments on both synthesized and real-world datasets show that the
proposed model significantly outperforms current state-of-the-art methods. Our
code and synthesized dataset will be made publicly available.
Comments: 17 pages of main paper and 5 pages of supplementary materials
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Cryptography and Security (cs.CR)
As more and more personal photos are shared and tagged in social media,
avoiding privacy risks such as unintended recognition becomes increasingly
challenging. We propose a new hybrid approach to obfuscate identities in photos
by head replacement. Our approach combines state-of-the-art parametric face
synthesis with the latest advances in Generative Adversarial Networks (GANs) for
data-driven image synthesis. On the one hand, the parametric part of our method
gives us control over the facial parameters and allows for explicit
manipulation of the identity. On the other hand, the data-driven aspects allow
for adding fine details and overall realism as well as seamless blending into
the scene context. In our experiments, we show highly realistic output of our
system that improves over the previous state of the art in obfuscation rate
while preserving a higher similarity to the original image content.
Comments: Code: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Unsupervised image-to-image translation is an important and challenging
problem in computer vision. Given an image in the source domain, the goal is to
learn the conditional distribution of corresponding images in the target
domain, without seeing any pairs of corresponding images. While this
conditional distribution is inherently multimodal, existing approaches make an
overly simplified assumption, modeling it as a deterministic one-to-one
mapping. As a result, they fail to generate diverse outputs from a given source
domain image. To address this limitation, we propose a Multimodal Unsupervised
Image-to-image Translation (MUNIT) framework. We assume that the image
representation can be decomposed into a content code that is domain-invariant,
and a style code that captures domain-specific properties. To translate an
image to another domain, we recombine its content code with a random style code
sampled from the style space of the target domain. We analyze the proposed
framework and establish several theoretical results. Extensive experiments with
comparisons to the state-of-the-art approaches further demonstrate the
advantage of the proposed framework. Moreover, our framework allows users to
control the style of translation outputs by providing an example style image.
Code and pretrained models are available at this https URL
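The translation step described above reduces to recombining a content code with a randomly drawn style code; the encoder/decoder interfaces below are hypothetical placeholders for trained MUNIT components:

import torch

def translate_a_to_b(content_encoder_a, decoder_b, image_a, style_dim=8):
    content = content_encoder_a(image_a)              # domain-invariant code
    style = torch.randn(image_a.size(0), style_dim)   # random style of domain B
    return decoder_b(content, style)                  # one of many possible outputs

Sampling several style codes for the same content code yields diverse translations of a single source image.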
Comments: CVPR 2018 (Spotlight). Project Page at this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Deep generative models have demonstrated great performance in image
synthesis. However, results deteriorate in case of spatial deformations, since
they generate images of objects directly, rather than modeling the intricate
interplay of their inherent shape and appearance. We present a conditional
U-Net for shape-guided image generation, conditioned on the output of a
variational autoencoder for appearance. The approach is trained end-to-end on
images, without requiring samples of the same object with varying pose or
appearance. Experiments show that the model enables conditional image
generation and transfer. Therefore, either shape or appearance can be retained
from a query image, while freely altering the other. Moreover, appearance can
be sampled due to its stochastic latent representation, while preserving shape.
In quantitative and qualitative experiments on COCO, DeepFashion, shoes,
Market-1501 and handbags, the approach demonstrates significant improvements
over the state-of-the-art.
Comments: Submitted to IEEE TIP Journal
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In real-world visual recognition problems, the assumption that the training
data (source domain) and test data (target domain) are sampled from the same
distribution is often violated. This is known as the domain adaptation problem.
In this work, we propose a novel domain-adaptive dictionary learning framework
for cross-domain visual recognition. Our method generates a set of intermediate
domains. These intermediate domains form a smooth path and bridge the gap
between the source and target domains. Specifically, we not only learn a common
dictionary to encode the domain-shared features, but also learn a set of
domain-specific dictionaries to model the domain shift. The separation of the
common and domain-specific dictionaries enables us to learn more compact and
reconstructive dictionaries for domain adaptation. These dictionaries are
learned by alternating between domain-adaptive sparse coding and dictionary
updating steps. Meanwhile, our approach gradually recovers the feature
representations of both source and target data along the domain path. By
aligning all the recovered domain data, we derive the final domain-adaptive
features for cross-domain visual recognition. Extensive experiments on three
public datasets demonstrate that our approach outperforms most
state-of-the-art methods.
Ganesh Iyer , J. Krishna Murthy , Gunshi Gupta , K. Madhava Krishna , Liam Paull Subjects : Robotics (cs.RO) ; Computer Vision and Pattern Recognition (cs.CV)
With the success of deep learning based approaches in tackling challenging
problems in computer vision, a wide range of deep architectures have recently
been proposed for the task of visual odometry (VO) estimation. Most of these
proposed solutions rely on supervision, which requires the acquisition of
precise ground-truth camera pose information, collected using expensive motion
capture systems or high-precision IMU/GPS sensor rigs. In this work, we propose
an unsupervised paradigm for deep visual odometry learning. We show that using
a noisy teacher, which could be a standard VO pipeline, and by designing a loss
term that enforces geometric consistency of the trajectory, we can train
accurate deep models for VO that do not require ground-truth labels. We
leverage geometry as a self-supervisory signal and propose “Composite
Transformation Constraints (CTCs)”, that automatically generate supervisory
signals for training and enforce geometric consistency in the VO estimate. We
also present a method of characterizing the uncertainty in VO estimates thus
obtained. To evaluate our VO pipeline, we present exhaustive ablation studies
that demonstrate the efficacy of end-to-end, self-supervised methodologies to
train deep models for monocular VO. We show that leveraging concepts from
geometry and incorporating them into the training of a recurrent neural network
results in performance competitive to supervised deep VO methods.
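A simple instance of a composite transformation constraint is that the pose predicted directly between frames 0 and 2 should match the composition of the predictions for 0 to 1 and 1 to 2; the residual below could serve as a self-supervised loss term (our illustration, not necessarily the paper's exact formulation):

import numpy as np

def ctc_residual(T_01, T_12, T_02):
    # T_xy: 4x4 homogeneous transforms predicted by the VO network.
    composed = T_12 @ T_01          # composed estimate for frame 0 -> frame 2
    return np.linalg.norm(composed - T_02)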
Comments: Submitted to IEEE International Conference on Intelligent Robots and Systems (IROS) 2018
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
3D LiDARs and 2D cameras are increasingly being used alongside each other in
sensor rigs for perception tasks. Before these sensors can be used to gather
meaningful data, however, their extrinsics (and intrinsics) need to be
accurately calibrated, as the performance of the sensor rig is extremely
sensitive to these calibration parameters. A vast majority of existing
calibration techniques require significant amounts of data and/or calibration
targets and human effort, severely impacting their applicability in large-scale
production systems. We address this gap with CalibNet: a self-supervised deep
network capable of automatically estimating the 6-DoF rigid body transformation
between a 3D LiDAR and a 2D camera in real-time. CalibNet alleviates the need
for calibration targets, thereby resulting in significant savings in
calibration efforts. During training, the network only takes as input a LiDAR
point cloud, the corresponding monocular image, and the camera calibration
matrix K. At train time, we do not impose direct supervision (i.e., we do not
directly regress to the calibration parameters, for example). Instead, we train
the network to predict calibration parameters that maximize the geometric and
photometric consistency of the input images and point clouds. CalibNet learns
to iteratively solve the underlying geometric problem and accurately predicts
extrinsic calibration parameters for a wide range of mis-calibrations, without
requiring retraining or domain adaptation. The project page is hosted at
this https URL
Tobias Käfer , Andreas Harth Subjects : Artificial Intelligence (cs.AI) ; Software Engineering (cs.SE)
The W3C’s Web of Things working group is aimed at addressing the
interoperability problem on the Internet of Things using Linked Data as a uniform
interface. While Linked Data paves the way towards combining such devices into
integrated applications, traditional solutions for specifying the control flow
of applications do not work seamlessly with Linked Data. We therefore tackle
the problem of the specification, execution, and monitoring of applications in
the context of Linked Data. We present a novel approach that combines
workflows, semantic reasoning, and RESTful interaction into one integrated
solution. We contribute to the state of the art by (1) defining an ontology for
describing workflow models and instances, (2) providing operational semantics
for the ontology that allows for the execution and monitoring of workflow
instances, (3) presenting a benchmark to evaluate our solution. Moreover, we
showcase how we used the ontology and the operational semantics to monitor
pilots executing workflows in virtual aircraft cockpits.
Roman Václavík , Přemysl Šůcha , Zdeněk Hanzálek Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Optimization and Control (math.OC)
The personnel scheduling problem is a well-known NP-hard combinatorial
problem. Due to the complexity of this problem and the size of the real-world
instances, it is not possible to use exact methods, and thus heuristics,
meta-heuristics, or hyper-heuristics must be employed. The majority of
heuristic approaches are based on iterative search, where the quality of
intermediate solutions must be calculated. Unfortunately, this is
computationally highly expensive because these problems have many constraints
and some are very complex. In this study, we propose a machine learning
technique as a tool to accelerate the evaluation phase in heuristic approaches.
The solution is based on a simple classifier, which is able to determine
whether the changed solution (more precisely, the changed part of the solution)
is better than the original or not. This decision is made much faster than a
standard cost-oriented evaluation process. However, the classification process
cannot guarantee 100% correctness. Therefore, our approach, which is
illustrated using a tabu search algorithm in this study, includes a filtering
mechanism, where the classifier rejects the majority of the potentially bad
solutions and the remaining solutions are then evaluated in a standard manner.
We also show how boosting algorithms can improve the quality of the final
solution compared with a simple classifier. We verified our proposed approach
and premises, based on standard and real-world benchmark instances, to
demonstrate the significant speedup obtained with comparable solution quality.
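A sketch of the filtering mechanism described above, with a hypothetical classifier interface: the cheap learned classifier discards moves predicted to be worse, and only the survivors go through the expensive constraint-based evaluation.

def filtered_evaluation(candidate_moves, classifier, full_cost):
    # classifier.predicts_improvement(move) is a fast learned check
    # (hypothetical interface); full_cost(move) is the standard, expensive
    # cost evaluation of the roster after applying the move.
    promising = [m for m in candidate_moves if classifier.predicts_improvement(m)]
    return min(promising, key=full_cost) if promising else None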
Representing smooth functions as compositions of near-identity functions with implications for deep network optimization
Peter L. Bartlett , Steven N. Evans , Philip M. Long Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Statistics Theory (math.ST); Machine Learning (stat.ML)
We show that any smooth bi-Lipschitz \(h\) can be represented exactly as a
composition \(h_m \circ \dots \circ h_1\) of functions \(h_1,\dots,h_m\) that are close
to the identity in the sense that each \(\left(h_i-\mathrm{Id}\right)\) is
Lipschitz, and the Lipschitz constant decreases inversely with the number \(m\)
of functions composed. This implies that \(h\) can be represented to any accuracy
by a deep residual network whose nonlinear layers compute functions with a
small Lipschitz constant. Next, we consider nonlinear regression with a
composition of near-identity nonlinear maps. We show that, regarding Fréchet
derivatives with respect to the \(h_1,\dots,h_m\), any critical point of a
quadratic criterion in this near-identity region must be a global minimizer. In
contrast, if we consider derivatives with respect to parameters of a fixed-size
residual network with sigmoid activation functions, we show that there are
near-identity critical points that are suboptimal, even in the realizable case.
Informally, this means that functional gradient methods for residual networks
cannot get stuck at suboptimal critical points corresponding to near-identity
layers, whereas parametric gradient methods for sigmoidal residual networks
suffer from suboptimal critical points in the near-identity region.
Comments: 6 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1804.02657 and arXiv:1804.03994
Journal-ref: Proc. of IEEE 7th International Workshop on Computational
Intelligence and Applications (IWCIA2014)
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
An emotion-orientated intelligent interface consists of Emotion Generating
Calculations (EGC) and a Mental State Transition Network (MSTN). We have
developed Android EGC application software in which the agent works to
evaluate the feelings in a conversation. In this paper, we develop a
tourist information system which can estimate the user’s feelings at a
sightseeing spot. The system can recommend sightseeing spots and local
food corresponding to the user’s feelings. The system calculates the
recommendation list with an estimation function which consists of Google search
results, the importance of a term on the sightseeing website, and the
emotion aroused by EGC. In order to show the effectiveness, this paper
describes experimental results for several situations during Hiroshima
sightseeing.
Sam Ganzfried , Austin Nowak , Joannier Pinales Subjects : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Creating strong agents for games with more than two players is a major open
problem in AI. Common approaches are based on approximating game-theoretic
solution concepts such as Nash equilibrium, which have strong theoretical
guarantees in two-player zero-sum games, but no guarantees in non-zero-sum
games or in games with more than two players. We describe an agent that is able
to defeat a variety of realistic opponents using an exact Nash equilibrium
strategy in a 3-player imperfect-information game. This shows that, despite a
lack of theoretical guarantees, agents based on Nash equilibrium strategies can
be successful in multiplayer games after all.
Shaojun Zhu , David Surovik , Kostas E. Bekris , Abdeslam Boularias Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
This paper aims to identify in a practical manner unknown physical
parameters, such as mechanical models of actuated robot links, which are
critical in dynamical robotic tasks. Key features include the use of an
off-the-shelf physics engine and the Bayesian optimization framework. The task
being considered is locomotion with a high-dimensional, compliant Tensegrity
robot. A key insight, in this case, is the need to project the model
identification challenge into an appropriate lower dimensional space for
efficiency. Comparisons with alternatives indicate that the proposed method can
identify the parameters more accurately within the given time budget, which
also results in more precise locomotion control.
Comments: 14 pages. arXiv admin note: text overlap with arXiv:1703.04247
Subjects:
Information Retrieval (cs.IR)
; Learning (cs.LG); Machine Learning (stat.ML)
Learning sophisticated feature interactions behind user behaviors is critical
in maximizing CTR for recommender systems. Despite great progress, existing
methods have a strong bias towards low- or high-order interactions, or rely on
expert feature engineering. In this paper, we show that it is possible to
derive an end-to-end learning model that emphasizes both low- and high-order
feature interactions. The proposed framework, DeepFM, combines the power of
factorization machines for recommendation and deep learning for feature
learning in a new neural network architecture. Compared to the latest Wide &
Deep model from Google, DeepFM has a shared raw feature input to both its
“wide” and “deep” components, with no need of feature engineering besides raw
features. DeepFM, as a general learning framework, can incorporate various
network architectures in its deep component. In this paper, we study two
instances of DeepFM where its “deep” component is a DNN and a PNN respectively,
which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are
conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the
existing models for CTR prediction, on both benchmark data and commercial data.
We conduct an online A/B test in the Huawei App Market, which reveals that DeepFM-D
leads to more than a 10% improvement in click-through rate in the production
environment, compared to a well-engineered LR model. We also cover related
practice in deploying our framework in the Huawei App Market.
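A compact PyTorch sketch of the shared-input structure described above (field sizes, embedding dimension and layer widths are placeholders; this follows the general DeepFM recipe rather than the authors' exact code):

import torch
import torch.nn as nn

class DeepFMSketch(nn.Module):
    def __init__(self, field_dims, embed_dim=8):
        super().__init__()
        # One embedding table per feature field, shared by the FM ("wide")
        # part and the deep part -- no manual feature engineering needed.
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_dims)
        self.linear = nn.ModuleList(nn.Embedding(n, 1) for n in field_dims)
        self.deep = nn.Sequential(
            nn.Linear(len(field_dims) * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, x):                    # x: (B, num_fields) of feature ids
        e = torch.stack([emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1)
        first_order = sum(lin(x[:, i]) for i, lin in enumerate(self.linear))
        square_of_sum = e.sum(dim=1) ** 2
        sum_of_square = (e ** 2).sum(dim=1)
        second_order = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
        deep_out = self.deep(e.flatten(start_dim=1))
        return torch.sigmoid(first_order + second_order + deep_out)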
Comments: 12 pages, Accepted in the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018
Subjects:
Information Retrieval (cs.IR)
; Learning (cs.LG)
Is it possible to extract malicious IP addresses reported in security forums
in an automatic way? This is the question at the heart of our work. We focus on
security forums, where security professionals and hackers share knowledge and
information, and often report misbehaving IP addresses. So far, there have only
been a few efforts to extract information from such security forums. We propose
RIPEx, a systematic approach to identify and label IP addresses in security
forums by utilizing a cross-forum learning method. In more detail, the
challenge is twofold: (a) identifying IP addresses from other numerical
entities, such as software version numbers, and (b) classifying the IP address
as benign or malicious. We propose an integrated solution that tackles both
these problems. A novelty of our approach is that it does not require training
data for each new forum. Our approach performs knowledge transfer across forums: we
use a classifier from our source forums to identify seed information for
training a classifier on the target forum. We evaluate our method using data
collected from five security forums with a total of 31K users and 542K posts.
First, RIPEx can distinguish IP addresses from other numeric expressions with 95%
precision and above 93% recall on average. Second, RIPEx identifies malicious
IP addresses with an average precision of 88% and over 78% recall, using our
cross-forum learning. Our work is a first step towards harnessing the wealth of
useful information that can be found in security forums.
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Comments: 10 pages, 4 figures
Journal-ref: Proceedings of the ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL), Jun. 2018, Fort Worth, USA
Subjects:
Digital Libraries (cs.DL)
; Information Retrieval (cs.IR)
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.
Comments: 6 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1804.02657 and arXiv:1804.03994
Journal-ref: Proc. of IEEE 7th International Workshop on Computational
Intelligence and Applications (IWCIA2014)
Subjects:
Human-Computer Interaction (cs.HC)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
An emotion-orientated intelligent interface consists of Emotion Generating
Calculations (EGC) and a Mental State Transition Network (MSTN). We have
developed Android EGC application software in which the agent works to
evaluate the feelings in a conversation. In this paper, we develop a
tourist information system which can estimate the user’s feelings at a
sightseeing spot. The system can recommend sightseeing spots and local
food corresponding to the user’s feelings. The system calculates the
recommendation list with an estimation function which consists of Google search
results, the importance of a term on the sightseeing website, and the
emotion aroused by EGC. In order to show the effectiveness, this paper
describes experimental results for several situations during Hiroshima
sightseeing.
Chaochao Chen , Ziqi Liu , Peilin Zhao , Longfei Li , Jun Zhou , Xiaolong Li Subjects : Learning (cs.LG) ; Information Retrieval (cs.IR); Machine Learning (stat.ML)
Collaborative filtering, especially the latent factor model, has been widely
used in personalized recommendation. The latent factor model aims to learn user and
item latent factors from user-item historical behaviors. To apply it to real
big data scenarios, efficiency becomes the first concern, including offline
model training efficiency and online recommendation efficiency. In this paper,
we propose a Distributed Collaborative Hashing (DCH) model which can
significantly improve both efficiencies. Specifically, we first propose a
distributed learning framework, following the state-of-the-art parameter server
paradigm, to learn the offline collaborative model. Our model can be learnt
efficiently by distributedly computing subgradients in minibatches on workers
and updating model parameters on servers asynchronously. We then adopt hashing
technique to speedup the online recommendation procedure. Recommendation can be
quickly made through exploiting lookup hash tables. We conduct thorough
experiments on two real large-scale datasets. The experimental results
demonstrate that, comparing with the classic and state-of-the-art (distributed)
latent factor models, DCH has comparable performance in terms of recommendation
accuracy but has both fast convergence speed in offline model training
procedure and realtime efficiency in online recommendation procedure.
Furthermore, the encouraging performance of DCH is also shown for several
real-world applications in Ant Financial.
Comments: To appear at NAACL 2018 Industry Track
Subjects:
Computation and Language (cs.CL)
Neural machine translation has achieved levels of fluency and adequacy that
would have been surprising a short time ago. Output quality is extremely
relevant for industry purposes; however, it is equally important to produce
results in the shortest time possible, mainly for latency-sensitive
applications and to control cloud hosting costs. In this paper we show the
effectiveness of translating with 8-bit quantization for models that have been
trained using 32-bit floating point values. Results show that 8-bit translation
makes a non-negligible impact in terms of speed with no degradation in accuracy
and adequacy.
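A generic symmetric int8 quantization of a trained float32 weight matrix looks as follows; this illustrates the idea of 8-bit inference, not necessarily the exact scheme used in the paper:

import numpy as np

def quantize_int8(weights):
    # Map float32 weights to int8 with a single per-tensor scale; the scale
    # is kept so values can be dequantized at inference time.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale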
Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition
Comments: 21 pages, 6 figures
Subjects:
Computation and Language (cs.CL)
Clinical Named Entity Recognition (CNER) aims to identify and classify
clinical terms such as diseases, symptoms, treatments, exams, and body parts in
electronic health records, which is a fundamental and crucial task for clinical
and translational research. In recent years, deep neural networks have achieved
significant success in named entity recognition and many other Natural Language
Processing (NLP) tasks. Most of these algorithms are trained end to end and
can automatically learn features from large-scale labeled datasets. However,
these data-driven methods typically lack the capability of processing rare or
unseen entities. Previous statistical methods and feature engineering practice
have demonstrated that human knowledge can provide valuable information for
handling rare and unseen cases. In this paper, we address the problem by
incorporating dictionaries into deep neural networks for the Chinese CNER task.
Two different architectures that extend the Bi-directional Long Short-Term
Memory (Bi-LSTM) neural network and five different feature representation
schemes are proposed to handle the task. Computational results on the CCKS-2017
Task 2 benchmark dataset show that the proposed method achieves highly
competitive performance compared with state-of-the-art deep learning
methods.
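One simple way to expose dictionary knowledge to a character-level tagger is sketched below. This is only one hypothetical variant of a dictionary feature (the paper proposes five representation schemes; the n-gram match flag here is an assumption for illustration): each character receives a binary flag marking whether it lies inside any dictionary entry, and such flags can be concatenated to character embeddings before the Bi-LSTM.

    # Hypothetical dictionary-match feature for character-level Chinese NER.
    def dictionary_flags(sentence, dictionary, max_len=6):
        flags = [0] * len(sentence)
        for i in range(len(sentence)):
            for l in range(2, max_len + 1):
                if sentence[i:i + l] in dictionary:
                    for j in range(i, i + l):
                        flags[j] = 1          # character j is covered by an entry
        return flags

    dictionary = {"糖尿病", "血糖"}           # toy clinical dictionary
    sentence = "患者有糖尿病史"
    print(list(zip(sentence, dictionary_flags(sentence, dictionary))))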
Comments: 9 pages, 27 figures, goes to 1st Financial Narrative Processing Workshop @ LREC 7-12 May 2018, Miyazaki, Japan
Subjects:
Computation and Language (cs.CL)
Keeping the dialogue state in dialogue systems is a notoriously difficult
task. We introduce an ontology-based dialogue manager (OntoDM) that keeps the
state of the conversation, provides a basis for anaphora
resolution and drives the conversation via domain ontologies. The banking and
finance area promises great potential for disambiguating the context via a rich
set of products and specificity of proper nouns, named entities and verbs. We
used ontologies both as a knowledge base and a basis for the dialogue manager;
the knowledge base component and dialogue manager components coalesce in a
sense. Domain knowledge is used to track Entities of Interest, i.e. nodes
(classes) of the ontology which happen to be products and services. In this way
we also introduce a form of conversation memory and attention. We blend
linguistic methods, domain-driven keyword ranking and domain ontologies to
create domain-driven conversation. The proposed framework is used in our
in-house German-language banking and finance chatbots. We also discuss general
challenges of German language processing and of language models and lexicons
for finance and banking chatbots. This work is still in progress, hence no
success metrics have been reported yet.
Christoph Treude , Markus Wagner Subjects : Computation and Language (cs.CL) ; Neural and Evolutionary Computing (cs.NE)
To make sense of large amounts of textual data, topic modelling is frequently
used as a text-mining tool for the discovery of hidden semantic structures in
text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model
that aims to explain the structure of a corpus by grouping texts. LDA requires
multiple parameters to work well, and there are only rough and sometimes
conflicting guidelines available on how these parameters should be set. In this
paper, we contribute (i) a broad study of parameters to arrive at good local
optima, (ii) an a-posteriori characterisation of text corpora related to eight
programming languages from GitHub and Stack Overflow, and (iii) an analysis of
corpus feature importance via per-corpus LDA configuration.
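A minimal sketch of such a parameter study is given below, assuming scikit-learn's LDA implementation and a log-likelihood score as a crude quality proxy; the corpora, search procedure, and evaluation used in the study above are its own and differ from this toy example.

    # Minimal LDA parameter-sweep sketch (illustrative only).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["parse the abstract syntax tree", "train the neural network model",
            "garbage collector pauses the heap", "gradient descent updates weights"]
    X = CountVectorizer().fit_transform(docs)

    best = None
    for n_topics in (2, 3, 4):
        for doc_topic_prior in (0.1, 0.5, 1.0):
            lda = LatentDirichletAllocation(n_components=n_topics,
                                            doc_topic_prior=doc_topic_prior,
                                            random_state=0).fit(X)
            score = lda.score(X)          # approximate log-likelihood
            if best is None or score > best[0]:
                best = (score, n_topics, doc_topic_prior)
    print("best configuration:", best)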
Comments: 13 pages, 1 figure
Journal-ref: Information Processing Letters, 116(2):100-106 (2016)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
This paper investigates a variant of the work-stealing algorithm that we call
the localized work-stealing algorithm. The intuition behind this variant is
that because of locality, processors can benefit from working on their own
work. Consequently, when a processor is free, it makes a steal attempt to get
back its own work. We call this type of steal a steal-back. We show that the
expected running time of the algorithm is \(T_1/P + O(T_\infty P)\), and that under
the “even distribution of free agents assumption”, the expected running time of
the algorithm is \(T_1/P + O(T_\infty \lg P)\). In addition, we obtain another
running-time bound based on ratios between the sizes of serial tasks in the
computation. If \(M\) denotes the maximum ratio between the largest and the
smallest serial tasks of a processor after removing a total of \(O(P)\) serial
tasks across all processors from consideration, then the expected running time
of the algorithm is \(T_1/P + O(T_\infty M)\).
Demetrios Coutinho , Samuel Xavier-de-Souza , Daniel Aloise Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
We present a shared-memory parallel implementation of the Simplex tableau
algorithm for dense large-scale Linear Programming (LP) problems. We present
the general scheme and explain each parallelization step of the standard
simplex algorithm, emphasizing important solutions for solving performance
bottlenecks. We analyzed the speedup and the parallel efficiency for the
proposed implementation relative to the standard Simplex algorithm using a
shared-memory system with 64 processing cores. The experiments were performed
for several different problems, with up to 8192 variables and constraints, in
their primal and dual formulations. The results show that performance is
generally much better when the formulation has more variables than inequality
constraints. They also show that the parallelization strategies applied to
avoid bottlenecks allow the implementation to scale well with problem size and
core count, up to a certain problem size. Further analysis showed that this
limit was an effect of resource limitation. Even so, our implementation was
able to reach speedups on the order of 19x.
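The step that shared-memory simplex implementations typically distribute across threads is the tableau row update of each pivot. The sketch below is a serial, vectorized illustration of that single pivot step (a generic textbook tableau, not the authors' code); a parallel version would assign blocks of rows of the elimination loop to different workers.

    import numpy as np

    # One pivot of the standard simplex tableau, vectorized with NumPy.
    def pivot(T, row, col):
        T = T.astype(float).copy()
        T[row] /= T[row, col]                       # normalize the pivot row
        for r in range(T.shape[0]):                 # eliminate the pivot column
            if r != row:
                T[r] -= T[r, col] * T[row]
        return T

    T = np.array([[ 2.0, 1.0, 1.0, 0.0, 18.0],
                  [ 1.0, 3.0, 0.0, 1.0, 42.0],
                  [-3.0,-2.0, 0.0, 0.0,  0.0]])     # toy tableau, last row = objective
    print(pivot(T, row=0, col=0))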
Comments: 11 pages
Subjects:
Cryptography and Security (cs.CR)
; Distributed, Parallel, and Cluster Computing (cs.DC)
It is very easy to run applications in Docker. Docker offers an ecosystem
and a platform for packaging, distributing and managing applications within
containers. However, the Docker platform is not yet mature. At present, Docker
is less secure than virtual machines (VMs) and most other cloud technologies.
The key reason for Docker's inadequate security is that containers share the
Linux kernel, which can lead to a risk of privilege escalation. This research
outlines some major security vulnerabilities in Docker and countermeasures to
neutralize such attacks. Security attacks come in several varieties, including
insider and outsider attacks; this research outlines both types and their
mitigation strategies, since taking precautionary measures can prevent serious
damage. This research also presents Docker secure deployment guidelines, which
suggest configurations for deploying Docker containers in a more secure way.
Comments: 11 pages, 14 figures. Part of the content have been published in IPSJ SIG Technical Report, Vol. 2017-HPC-162, No. 22, pp. 1-9, 2017. (DOI: this http URL )
Subjects:
Learning (cs.LG)
; Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used
in deep learning. Specifically, cuDNN implements several equivalent convolution
algorithms, whose performance and memory footprint may vary considerably,
depending on the layer dimensions. When an algorithm is automatically selected
by cuDNN, the decision is performed on a per-layer basis, and thus it often
resorts to slower algorithms that fit the workspace size constraints. We
present μ-cuDNN, a transparent wrapper library for cuDNN, which divides
layers' mini-batch computation into several micro-batches. Based on Dynamic
Programming and Integer Linear Programming, μ-cuDNN enables faster
algorithms by decreasing the workspace requirements. At the same time,
μ-cuDNN keeps the computational semantics unchanged, so that it safely
decouples statistical efficiency from hardware efficiency. We demonstrate the
effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,
achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on a P100-SXM2
GPU. These results indicate that using micro-batches can seamlessly increase
the performance of deep learning, while maintaining the same memory footprint.
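The micro-batching idea itself can be shown with a small, framework-agnostic sketch. The layer function below is a dummy stand-in (μ-cuDNN works transparently inside cuDNN convolution calls, which is not reproduced here): a mini-batch is split into chunks that are processed independently and concatenated, so each per-chunk call needs less temporary workspace while the result is unchanged.

    import numpy as np

    # Conceptual micro-batching sketch with a dummy layer function.
    def layer_forward(x):
        return np.maximum(x @ np.ones((x.shape[1], 8)), 0.0)

    def microbatched_forward(x, micro_batch_size):
        outputs = [layer_forward(x[i:i + micro_batch_size])
                   for i in range(0, x.shape[0], micro_batch_size)]
        return np.concatenate(outputs, axis=0)

    x = np.random.randn(256, 32)
    assert np.allclose(layer_forward(x), microbatched_forward(x, 64))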
Robail Yasrab Subjects : Cryptography and Security (cs.CR) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud computing has brought a revolution to the field of information
technology and has improved the efficiency of computational resources. It
offers computing as a service, enabling large cost and resource savings.
Despite its advantages, certain security issues still hinder organizations and
enterprises from adopting it. This study focuses mainly on the security of
Platform-as-a-Service (PaaS) and on the most critical security issues that have
been documented for PaaS infrastructure. The prime outcome of this study is a
proposed security model for mitigating security vulnerabilities of PaaS. The
model consists of a number of tools, techniques and guidelines to mitigate and
neutralize PaaS security issues. The security vulnerabilities, along with
mitigation strategies, are discussed to offer vendors and clients a deep
insight into PaaS security that may facilitate the design of secure PaaS
platforms.
Cheng Daning , Xia Fen , Li Shigang , Zhang Yunquan Subjects : Learning (cs.LG) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
With the development of big data technology, the Gradient Boosting Decision
Tree (GBDT) has become one of the most important machine learning algorithms
because of its accuracy. However, training a GBDT requires a large amount of
computational resources and time. To accelerate GBDT training, we propose the
asynchronous parallel sampling gradient boosting decision tree (asynch-SGBDT)
in this paper. By introducing sampling, we recast the numerical optimization in
traditional GBDT training as a stochastic optimization problem and use
asynchronous parallel stochastic gradient descent to accelerate training. We
also provide a theoretical analysis of asynch-SGBDT. Experimental results show
that asynch-SGBDT accelerates GBDT training; our asynchronous parallel strategy
achieves an almost linear speedup, especially for high-dimensional sparse
datasets.
Comments: Submitted to ECML-PKDD 2018
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
The one-class Support Vector Machine (OC-SVM) has long been one of the most
effective anomaly detection methods and is widely adopted in both research and
industrial applications. The biggest issue for OC-SVM, however, is its limited
capability to operate on large and high-dimensional datasets due to inefficient
features and optimization complexity. These problems might be mitigated via
dimensionality reduction techniques such as manifold learning or auto-encoders.
However, previous work often treats representation learning and anomaly
prediction separately. In this paper, we propose the autoencoder-based
one-class SVM (AE-1SVM), which brings OC-SVM into a deep learning context: with
the aid of random Fourier features to approximate the radial basis kernel, we
combine OC-SVM with a representation learning architecture and jointly exploit
stochastic gradient descent to obtain end-to-end training. Interestingly, this
also opens up the possible use of gradient-based attribution methods to explain
the decision making for anomaly detection, which has been challenging because
of the implicit mapping between the input space and the kernel space. To the
best of our knowledge, this is the first work to study the interpretability of
deep learning in anomaly detection. We evaluate our method on a wide range of
unsupervised anomaly detection tasks, in which our end-to-end training
architecture achieves performance significantly better than previous work that
uses separate training.
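The kernel approximation that makes this end-to-end training possible is the classical random Fourier feature map of Rahimi and Recht. The sketch below shows only that building block in NumPy (outside any training graph, so it is not the AE-1SVM implementation itself): random features whose inner products approximate the RBF kernel.

    import numpy as np

    # Random Fourier features approximating k(x, y) = exp(-gamma * ||x - y||^2).
    def rff(X, n_features, gamma, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    X = np.random.randn(5, 3)
    Z = rff(X, n_features=2000, gamma=0.5)
    approx = Z @ Z.T                                  # approximate kernel matrix
    exact = np.exp(-0.5 * np.square(X[:, None] - X[None, :]).sum(-1))
    print("max kernel approximation error:", np.abs(approx - exact).max())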
Connie Kou , Hwee Kuan Lee , Teck Khim Ng Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
We introduce our Distribution Regression Network (DRN) which performs
regression from input probability distributions to output probability
distributions. Compared to existing methods, DRN learns with fewer model
parameters and easily extends to multiple input and multiple output
distributions. On synthetic and real-world datasets, DRN performs similarly or
better than the state-of-the-art. Furthermore, DRN generalizes the conventional
multilayer perceptron (MLP). In the framework of MLP, each node encodes a real
number, whereas in DRN, each node encodes a probability distribution.
Takuma Oda , Carlee Joe-Wong Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Modern vehicle fleets, e.g., for ridesharing platforms and taxi companies,
can reduce passengers’ waiting times by proactively dispatching vehicles to
locations where pickup requests are anticipated in the future. Yet it is
unclear how to best do this: optimal dispatching requires optimizing over
several sources of uncertainty, including vehicles’ travel times to their
dispatched locations, as well as coordinating between vehicles so that they do
not attempt to pick up the same passenger. While prior works have developed
models for this uncertainty and used them to optimize dispatch policies, in
this work we introduce a model-free approach. Specifically, we propose MOVI, a
Deep Q-network (DQN)-based framework that directly learns the optimal vehicle
dispatch policy. Since DQNs scale poorly with a large number of possible
dispatches, we streamline our DQN training and suppose that each individual
vehicle independently learns its own optimal policy, ensuring scalability at
the cost of less coordination between vehicles. We then formulate a centralized
receding-horizon control (RHC) policy to compare with our DQN policies. To
compare these policies, we design and build MOVI as a large-scale realistic
simulator based on 15 million taxi trip records that simulates policy-agnostic
responses to dispatch decisions. We show that the DQN dispatch policy reduces
the number of unserviced requests by 76% compared to without dispatch and 20%
compared to the RHC approach, emphasizing the benefits of a model-free approach
and suggesting that there is limited value to coordinating vehicle actions.
This finding may help to explain the success of ridesharing platforms, for
which drivers make individual decisions.
Marysia Winkels , Taco S. Cohen Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
Convolutional Neural Networks (CNNs) require a large amount of annotated data
to learn from, which is often difficult to obtain in the medical domain. In
this paper we show that the sample complexity of CNNs can be significantly
improved by using 3D roto-translation group convolutions (G-Convs) instead of
the more conventional translational convolutions. These 3D G-CNNs were applied
to the problem of false positive reduction for pulmonary nodule detection, and
proved to be substantially more effective in terms of performance, sensitivity
to malignant nodules, and speed of convergence compared to a strong and
comparable baseline architecture with regular convolutions, data augmentation
and a similar number of parameters. For every dataset size tested, the G-CNN
achieved a FROC score close to that of the CNN trained on ten times more data.
Comments: 10 pages, 8 figures
Subjects:
Instrumentation and Methods for Astrophysics (astro-ph.IM)
; Learning (cs.LG)
We present the results of various automated classification methods, based on
machine learning (ML), of objects from data releases 6 and 7 (DR6 and DR7) of
the Sloan Digital Sky Survey (SDSS), primarily distinguishing stars from
quasars. We provide a careful scrutiny of approaches available in the
literature and have highlighted the pitfalls in those approaches based on the
nature of data used for the study. The aim is to investigate the
appropriateness of the application of certain ML methods. The manuscript argues
convincingly in favor of the efficacy of asymmetric AdaBoost for classifying
photometric data. The paper presents a critical review of existing studies and
puts forward an application of asymmetric AdaBoost as an outcome of that
exercise.
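A common way to emulate the asymmetry of asymmetric AdaBoost with off-the-shelf tools is to upweight the rare class via per-sample weights. The sketch below is only that approximation (the paper's asymmetric variant modifies the boosting loss itself, and the data here are synthetic, not SDSS photometry):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # Crude stand-in for asymmetric boosting: standard AdaBoost with sample weights
    # that penalize misclassifying the rare class more than the common class.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 1.5).astype(int)   # imbalanced labels

    cost_ratio = 5.0                          # extra cost for missing the rare class
    sample_weight = np.where(y == 1, cost_ratio, 1.0)

    clf = AdaBoostClassifier(n_estimators=200, random_state=0)
    clf.fit(X, y, sample_weight=sample_weight)
    print("recall on rare class:", (clf.predict(X[y == 1]) == 1).mean())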
Joshua Saxe , Richard Harang , Cody Wild , Hillary Sanders Subjects : Cryptography and Security (cs.CR) ; Learning (cs.LG); Machine Learning (stat.ML)
Malicious web content is a serious problem on the Internet today. In this
paper we propose a deep learning approach to detecting malevolent web pages.
While past work on web content detection has relied on syntactic parsing or on
emulation of HTML and Javascript to extract features, our approach operates
directly on a language-agnostic stream of tokens extracted directly from static
HTML files with a simple regular expression. This makes it fast enough to
operate in high-frequency data contexts like firewalls and web proxies, and
allows it to avoid the attack surface exposure of complex parsing and emulation
code. Unlike well-known approaches such as bag-of-words models, which ignore
spatial information, our neural network examines content at hierarchical
spatial scales, allowing our model to capture locality and yielding superior
accuracy compared to bag-of-words baselines. Our proposed architecture achieves
a 97.5% detection rate at a 0.1% false positive rate, and classifies
small-batched web pages at a rate of over 100 per second on commodity hardware.
The speed and accuracy of our approach makes it appropriate for deployment to
endpoints, firewalls, and web proxies.
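The front end of such a detector, extracting a language-agnostic token stream from raw HTML with a single regular expression, can be sketched as follows. The specific pattern, bucket count, and hashing here are illustrative assumptions, not the paper's exact pipeline:

    import re

    # Split the raw byte stream into alphanumeric tokens and hash them into a fixed
    # vocabulary, so the model never parses or executes the page. Note: Python's
    # hash() is process-salted; a real system would use a stable hash function.
    TOKEN_RE = re.compile(rb"[A-Za-z0-9_]+")

    def token_ids(html_bytes, n_buckets=1024):
        return [hash(tok) % n_buckets for tok in TOKEN_RE.findall(html_bytes)]

    page = b"<html><script>eval(atob('ZG9jdW1lbnQ='))</script></html>"
    ids = token_ids(page)
    print(len(ids), "tokens ->", ids[:8])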
Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision
Comments: 12 pages (references included). To appear in the Proceedings of NAACL-HLT 2018
Journal-ref: Proceedings of NAACL-HLT 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
The present work investigates whether different quantification mechanisms
(set comparison, vague quantification, and proportional estimation) can be
jointly learned from visual scenes by a multi-task computational model. The
motivation is that, in humans, these processes underlie the same cognitive,
non-symbolic ability, which allows an automatic estimation and comparison of
set magnitudes. We show that when information about lower-complexity tasks is
available, the higher-level proportional task becomes more accurate than when
performed in isolation. Moreover, the multi-task model is able to generalize to
unseen combinations of target/non-target objects. Consistently with behavioral
evidence showing the interference of absolute number in the proportional task,
the multi-task model no longer works when asked to provide the number of target
objects in the scene.
Sainyam Galhotra , Arya Mazumdar , Soumyabrata Pal , Barna Saha Subjects : Discrete Mathematics (cs.DM) ; Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)
Random geometric graphs are the simplest, and perhaps the earliest possible
random graph model of spatial networks, introduced by Gilbert in 1961. In the
most basic setting, a random geometric graph \(G(n,r)\) has \(n\) vertices. Each
vertex of the graph is assigned a real number in \([0,1]\) randomly and
uniformly. There is an edge between two vertices if the corresponding two
random numbers differ by at most \(r\) (to mitigate the boundary effect, let us
consider the Lee distance here, \(d_L(u,v) = \min\{|u-v|, 1-|u-v|\}\)). It is
well-known that the connectivity threshold regime for random geometric graphs
is at \(r \approx \frac{\log n}{n}\). In particular, if \(r = \frac{a\log n}{n}\),
then a random geometric graph is connected with high probability if and only if
\(a > 1\). Consider \(G(n,\frac{(1+\epsilon)\log n}{n})\) for any \(\epsilon > 0\) to
satisfy the connectivity requirement and delete half of its edges which have
distance at most \(\frac{\log n}{2n}\). It is natural to believe that the
resultant graph will be disconnected. Surprisingly, we show that the graph
still remains connected!
Formally, generalizing random geometric graphs, we define a random annulus
graph \(G(n, [r_1, r_2])\), \(r_1 < r_2\), with \(n\) vertices. Each vertex of the graph
is assigned a real number in \([0,1]\) randomly and uniformly as before. There is
an edge between two vertices if the Lee distance between the corresponding two
random numbers is between \(r_1\) and \(r_2\), \(0 < r_1 < r_2\). Let us assume \(r_1 =
\frac{b \log n}{n}\) and \(r_2 = \frac{a \log n}{n}\), \(0 < b < a\). We show that this
graph is connected with high probability if and only if \(a - b > \frac12\) and \(a
> 1\). That is, \(G(n, [0, \frac{0.99\log n}{n}])\) is not connected but
\(G(n, [\frac{0.50 \log n}{n}, \frac{(1+\epsilon)\log n}{n}])\) is.
This result is then used to give improved lower and upper bounds on the
recovery threshold of the geometric block model.
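The two regimes in the statement can be explored with a small simulation. The sketch below builds a random annulus graph on the unit circle with the Lee distance and checks connectivity with networkx; since n is finite, this only illustrates the phenomenon and proves nothing about the asymptotic theorem above.

    import numpy as np
    import networkx as nx

    # Simulate G(n, [r1, r2]) on the unit circle and test connectivity.
    def random_annulus_graph(n, r1, r2, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(0.0, 1.0, size=n)
        G = nx.Graph()
        G.add_nodes_from(range(n))
        for i in range(n):
            for j in range(i + 1, n):
                d = abs(x[i] - x[j])
                d = min(d, 1.0 - d)               # Lee (circular) distance
                if r1 <= d <= r2:
                    G.add_edge(i, j)
        return G

    n = 2000
    c = np.log(n) / n
    print("G(n,[0, 0.99 log n / n]) connected:",
          nx.is_connected(random_annulus_graph(n, 0.0, 0.99 * c)))
    print("G(n,[0.5 log n / n, 1.1 log n / n]) connected:",
          nx.is_connected(random_annulus_graph(n, 0.5 * c, 1.1 * c)))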
Roman Václavík , Přemysl Šůcha , Zdeněk Hanzálek Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Optimization and Control (math.OC)
The personnel scheduling problem is a well-known NP-hard combinatorial
problem. Due to the complexity of this problem and the size of the real-world
instances, it is not possible to use exact methods, and thus heuristics,
meta-heuristics, or hyper-heuristics must be employed. The majority of
heuristic approaches are based on iterative search, where the quality of
intermediate solutions must be calculated. Unfortunately, this is
computationally highly expensive because these problems have many constraints
and some are very complex. In this study, we propose a machine learning
technique as a tool to accelerate the evaluation phase in heuristic approaches.
The solution is based on a simple classifier, which is able to determine
whether the changed solution (more precisely, the changed part of the solution)
is better than the original or not. This decision is made much faster than a
standard cost-oriented evaluation process. However, the classification process
cannot guarantee 100% correctness. Therefore, our approach, which is
illustrated using a tabu search algorithm in this study, includes a filtering
mechanism, where the classifier rejects the majority of the potentially bad
solutions and the remaining solutions are then evaluated in a standard manner.
We also show how the boosting algorithms can improve the quality of the final
solution compared with a simple classifier. We verified our proposed approach
and premises, based on standard and real-world benchmark instances, to
demonstrate the significant speedup obtained with comparable solution quality.
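The filtering mechanism inside the local-search loop can be sketched abstractly as follows; the functions are placeholders rather than the paper's tabu search or its boosted classifier. A cheap learned filter discards candidate moves predicted to be worse, and only the survivors pay for the exact cost evaluation.

    # Sketch of classifier-assisted evaluation in a local-search step (illustrative).
    def search_step(current, current_cost, neighbours, classifier, exact_cost):
        best, best_cost = current, current_cost
        for candidate in neighbours(current):
            if not classifier(current, candidate):   # predicted "not better": skip
                continue
            cost = exact_cost(candidate)             # expensive evaluation, rarely run
            if cost < best_cost:
                best, best_cost = candidate, cost
        return best, best_cost

    # Toy instantiation: minimize x**2 over integer moves from the current solution.
    exact_cost = lambda x: x * x
    neighbours = lambda x: (x - 1, x + 1)
    classifier = lambda cur, cand: abs(cand) <= abs(cur)   # stand-in for the learned model
    print(search_step(5, 25, neighbours, classifier, exact_cost))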
Comments: 6 pages, ICRA 2018
Subjects:
Computers and Society (cs.CY)
; Learning (cs.LG); Machine Learning (stat.ML)
Unintentional falls can cause severe injuries and even death, especially if
no immediate assistance is given. The aim of Fall Detection Systems (FDSs) is
to detect an occurring fall. This information can be used to trigger the
necessary assistance in case of injury. This can be done by using either
ambient-based sensors, e.g. cameras, or wearable devices. The aim of this work
is to study the technical aspects of FDSs based on wearable devices and
artificial intelligence techniques, in particular Deep Learning (DL), to
implement an effective algorithm for on-line fall detection. The proposed
classifier is based on a Recurrent Neural Network (RNN) model with underlying
Long Short-Term Memory (LSTM) blocks. The method is tested on the publicly
available SisFall dataset, with extended annotation, and compared with the
results obtained by the SisFall authors.
Comments: 14 pages. arXiv admin note: text overlap with arXiv:1703.04247
Subjects:
Information Retrieval (cs.IR)
; Learning (cs.LG); Machine Learning (stat.ML)
Learning sophisticated feature interactions behind user behaviors is critical
in maximizing CTR for recommender systems. Despite great progress, existing
methods have a strong bias towards low- or high-order interactions, or rely on
expert feature engineering. In this paper, we show that it is possible to
derive an end-to-end learning model that emphasizes both low- and high-order
feature interactions. The proposed framework, DeepFM, combines the power of
factorization machines for recommendation and deep learning for feature
learning in a new neural network architecture. Compared to the latest Wide &
Deep model from Google, DeepFM has a shared raw feature input to both its
“wide” and “deep” components, with no need of feature engineering besides raw
features. DeepFM, as a general learning framework, can incorporate various
network architectures in its deep component. In this paper, we study two
instances of DeepFM where the “deep” component is a DNN and a PNN respectively,
which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are
conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over
existing models for CTR prediction, on both benchmark and commercial data. We
conduct an online A/B test in the Huawei App Market, which reveals that DeepFM-D
leads to more than a 10% improvement in click-through rate in the production
environment, compared to a well-engineered LR model. We also cover practical
aspects of deploying our framework in the Huawei App Market.
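The factorization-machine component that DeepFM shares its embeddings with computes second-order feature interactions in linear time via the standard identity sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * [ (sum_i v_i x_i)^2 - sum_i (v_i x_i)^2 ], summed over the latent dimensions. The NumPy sketch below shows this generic FM computation, not the paper's production implementation:

    import numpy as np

    # Second-order factorization machine term, computed in O(n k).
    def fm_second_order(x, V):
        xv = x[:, None] * V                  # (n, k): each feature's scaled embedding
        sum_sq = np.square(xv.sum(axis=0))   # (sum_i v_i x_i)^2, per latent dim
        sq_sum = np.square(xv).sum(axis=0)   # sum_i (v_i x_i)^2, per latent dim
        return 0.5 * (sum_sq - sq_sum).sum()

    x = np.array([1.0, 0.0, 1.0, 1.0])       # sparse feature vector
    V = np.random.randn(4, 8)                # latent embeddings, k = 8
    print(fm_second_order(x, V))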
Vikas Sindhwani , Stephen Tu , Mohi Khansari Subjects : Robotics (cs.RO) ; Learning (cs.LG); Machine Learning (stat.ML)
We propose a new non-parametric framework for learning incrementally stable
dynamical systems x’ = f(x) from a set of sampled trajectories. We construct a
rich family of smooth vector fields induced by certain classes of matrix-valued
kernels, whose equilibria are placed exactly at a desired set of locations and
whose local contraction and curvature properties at various points can be
explicitly controlled using convex optimization. With curl-free kernels, our
framework may also be viewed as a mechanism to learn potential fields and
gradient flows. We develop large-scale techniques using randomized kernel
approximations in this context. We demonstrate our approach, called contracting
vector fields (CVF), on imitation learning tasks involving complex
point-to-point human handwriting motions.
Comments: 13 pages. Submitted to IEEE JSTSP Special Issue on Data Science: Robust Subspace Learning and Tracking: Theory, Algorithms, and Applications
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
Robust PCA, the problem of PCA in the presence of outliers, has been
extensively investigated in the last few years. Here we focus on robust PCA in
the column-sparse outlier model. Existing methods for the column-sparse outlier
model assume knowledge of either the dimension of the lower-dimensional
subspace or the fraction of outliers in the system. However, in many
applications knowledge of these parameters is not available. Motivated by this,
we propose a parameter-free outlier identification method for robust PCA which
a) does not require knowledge of the outlier fraction, b) does not require
knowledge of the dimension of the underlying subspace, and c) is computationally
simple and fast. Further, analytical guarantees are derived for outlier
identification, and the performance of the algorithm is compared with existing
state-of-the-art methods.
Wutao Wei , Bowei Xi , Murat Kantarcioglu Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)
Nowadays more and more data are gathered for detecting and preventing cyber
attacks. In cyber security applications, data analytics techniques have to deal
with active adversaries that try to deceive the data analytics models and avoid
being detected. The existence of such adversarial behavior motivates the
development of robust and resilient adversarial learning techniques for various
tasks. Most of the previous work focused on adversarial classification
techniques, which assumed the existence of a reasonably large amount of
carefully labeled data instances. However, in practice, labeling the data
instances often requires costly and time-consuming human expertise and becomes
a significant bottleneck. Meanwhile, a large number of unlabeled instances can
also be used to understand the adversaries’ behavior. To address the above
mentioned challenges, in this paper, we develop a novel grid based adversarial
clustering algorithm. Our adversarial clustering algorithm is able to identify
the core normal regions, and to draw defensive walls around the centers of the
normal objects utilizing game theoretic ideas. Our algorithm also identifies
sub-clusters of attack objects, the overlapping areas within clusters, and
outliers which may be potential anomalies.
Chihiro Watanabe , Kaoru Hiramatsu , Kunio Kashino Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)
A layered neural network is now one of the most common choices for the
prediction of high-dimensional practical data sets, where the relationship
between input and output data is complex and cannot be represented well by
simple conventional models. Its effectiveness is shown in various tasks,
however, the lack of interpretability of the trained result by a layered neural
network has limited its application area.
In our previous studies, we proposed methods for extracting a simplified
global structure of a trained layered neural network by classifying the units
into communities according to their connection patterns with adjacent layers.
These methods provided us with knowledge about the strength of the relationship
between communities from the existence of bundled connections, which are
determined by threshold processing of the connection ratio between pairs of
communities.
However, it has been difficult to understand the role of each community
quantitatively by observing the modular structure. We could only know to which
sets of the input and output dimensions each community was mainly connected, by
tracing the bundled connections from the community to the input and output
layers. Another problem is that the finally obtained modular structure changes
greatly depending on the setting of the threshold hyperparameter used for
determining bundled connections.
In this paper, we propose a new method for interpreting quantitatively the
role of each community in inference, by defining the effect of each input
dimension on a community, and the effect of a community on each output
dimension. We show experimentally that our proposed method can reveal the role
of each part of a layered neural network by applying the neural networks to
three types of data sets, extracting communities from the trained network, and
applying the proposed method to the community structure.
Comments: 12 pages, Accepted in n 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018
Subjects:
Information Retrieval (cs.IR)
; Learning (cs.LG)
Is it possible to extract malicious IP addresses reported in security forums
in an automatic way? This is the question at the heart of our work. We focus on
security forums, where security professionals and hackers share knowledge and
information, and often report misbehaving IP addresses. So far, there have only
been a few efforts to extract information from such security forums. We propose
RIPEx, a systematic approach to identify and label IP addresses in security
forums by utilizing a cross-forum learning method. In more detail, the
challenge is twofold: (a) identifying IP addresses from other numerical
entities, such as software version numbers, and (b) classifying the IP address
as benign or malicious. We propose an integrated solution that tackles both
these problems. A novelty of our approach is that it does not require training
data for each new forum. Our approach does knowledge transfer across forums: we
use a classifier from our source forums to identify seed information for
training a classifier on the target forum. We evaluate our method using data
collected from five security forums with a total of 31K users and 542K posts.
First, RIPEx can distinguish IP addresses from other numeric expressions with 95%
precision and above 93% recall on average. Second, RIPEx identifies malicious
IP addresses with an average precision of 88% and over 78% recall, using our
cross-forum learning. Our work is a first step towards harnessing the wealth of
useful information that can be found in security forums.
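The first of the two sub-tasks, telling IP addresses apart from other dotted numbers such as version strings, starts from simple candidate extraction. The sketch below shows only a naive regex-plus-range-check stage (RIPEx itself uses a learned, cross-forum classifier on contextual features, which is not reproduced here):

    import re

    # Naive candidate extraction: find dotted quads and keep those whose octets are
    # in 0..255, so strings like "2.4.299.1" with out-of-range fields are rejected.
    CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

    def extract_ip_candidates(text):
        out = []
        for m in CANDIDATE_RE.findall(text):
            if all(0 <= int(part) <= 255 for part in m.split(".")):
                out.append(m)
        return out

    post = "Blocked 203.0.113.45 again, running apache 2.4.299.1 on that box."
    print(extract_ip_candidates(post))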
Comments: Code: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Unsupervised image-to-image translation is an important and challenging
problem in computer vision. Given an image in the source domain, the goal is to
learn the conditional distribution of corresponding images in the target
domain, without seeing any pairs of corresponding images. While this
conditional distribution is inherently multimodal, existing approaches make an
overly simplified assumption, modeling it as a deterministic one-to-one
mapping. As a result, they fail to generate diverse outputs from a given source
domain image. To address this limitation, we propose a Multimodal Unsupervised
Image-to-image Translation (MUNIT) framework. We assume that the image
representation can be decomposed into a content code that is domain-invariant,
and a style code that captures domain-specific properties. To translate an
image to another domain, we recombine its content code with a random style code
sampled from the style space of the target domain. We analyze the proposed
framework and establish several theoretical results. Extensive experiments with
comparisons to state-of-the-art approaches further demonstrate the advantage of
the proposed framework. Moreover, our framework allows users to
control the style of translation outputs by providing an example style image.
Code and pretrained models are available at this https URL
Arash Rahnama , Khalique Newaz , Panos J. Antsaklis , Tijana Milenkovic Subjects : Molecular Networks (q-bio.MN) ; Learning (cs.LG); Machine Learning (stat.ML)
Experimental determination of protein function is resource-consuming. As an
alternative, computational prediction of protein function has received
attention. In this context, protein structural classification (PSC) can help,
by allowing for determining structural classes of currently unclassified
proteins based on their features, and then relying on the fact that proteins
with similar structures have similar functions. Existing PSC approaches rely on
sequence-based or direct (“raw”) 3-dimensional (3D) structure-based protein
features. Instead, we first model 3D structures as protein structure networks
(PSNs). Then, we use (“processed”) network-based features for PSC. We are the
first ones to do so. We propose the use of graphlets, state-of-the-art features
in many domains of network science, in the task of PSC. Moreover, because
graphlets can deal only with unweighted PSNs, and because accounting for edge
weights when constructing PSNs could improve PSC accuracy, we also propose a
deep learning framework that automatically learns network features from the
weighted PSNs. When evaluated on a large set of 9,509 CATH and 11,451 SCOP
protein domains, our proposed approaches are superior to existing PSC
approaches in terms of both accuracy and running time.
Shaojun Zhu , David Surovik , Kostas E. Bekris , Abdeslam Boularias Subjects : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
This paper aims to identify in a practical manner unknown physical
parameters, such as mechanical models of actuated robot links, which are
critical in dynamical robotic tasks. Key features include the use of an
off-the-shelf physics engine and the Bayesian optimization framework. The task
being considered is locomotion with a high-dimensional, compliant Tensegrity
robot. A key insight, in this case, is the need to project the model
identification challenge into an appropriate lower dimensional space for
efficiency. Comparisons with alternatives indicate that the proposed method can
identify the parameters more accurately within the given time budget, which
also results in more precise locomotion control.
Comments: 5 pages, 7 figures, submitted to ISITA 2018
Subjects:
Information Theory (cs.IT)
Random access is a fundamental scenario in which users transmit through a
shared channel and cannot coordinate with each other. In recent years,
successive interference cancellation (SIC) has been introduced into random
access schemes: SIC makes it possible to decode transmitted packets from
collided packets. Coded slotted ALOHA (CSA) is a random access scheme that uses
SIC and encodes each packet with a local code prior to transmission; it is
known to achieve excellent throughput. On the other hand, it has been reported
in coding theory that time shifts improve the decoding performance of
packet-oriented erasure correcting codes. In this paper, we propose a random
access scheme that applies time shifts to CSA in order to achieve better
throughput. Numerical examples show that the proposed scheme achieves better
throughput and packet loss rate than CSA.
Comments: 6 pages, 1 figure, 3 tables, submitted to ISITA 2018
Subjects:
Information Theory (cs.IT)
This paper proposes an erasure correcting code, and its systematic form, for
distributed storage systems. The proposed codes are encoded by exclusive OR and
bit-level shift operations; because of the shift operation, the encoded packets
are slightly longer than the source packets. This paper evaluates this extra
length, called the overhead, and shows that the proposed codes have smaller
overheads than the zigzag decodable code, an existing code that also uses
exclusive OR and bit-level shift operations.
Comments: 5 pages, submitted to ISITA 2018
Subjects:
Information Theory (cs.IT)
This paper constructs a non-binary code correcting a single \(b\)-burst of
insertions or deletions. This paper also proposes a decoding algorithm for this
code and evaluates a lower bound on the cardinality of this code. Moreover, we
evaluate an asymptotic upper bound on the cardinality of codes which can
correct a single burst of insertions or deletions.
Comments: 26 pages, 10 figures
Subjects:
Information Theory (cs.IT)
In this paper, we study an aerial drone base station (DBS) assisted cellular
network that consists of a single ground macro base station (MBS), multiple
DBSs, and multiple ground terminals (GT). We assume that the MBS transmits to
the DBSs and the GTs in the licensed band while the DBSs use a separate
unlicensed band (e.g. Wi-Fi) to transmit to the GTs. For the utilization of the
DBSs, we propose a cooperative decode–forward (DF) protocol in which multiple
DBSs assist the terminals simultaneously while maintaining a predetermined
interference level on the coexisting unlicensed band users. For our network
setup, we formulate a joint optimization problem for minimizing the aggregate
gap between the target rates and the throughputs of terminals by optimizing
over the 3D positions of the DBSs and the resources (power, time, bandwidth) of
the network. To solve the optimization problem, we propose an efficient nested
structured algorithm based on particle swarm optimization and convex
optimization methods. Extensive numerical evaluations of the proposed algorithm
are performed, considering various aspects, to demonstrate the performance of
our algorithm and the gain from utilizing DBSs.
Comments: Submitted to IEEE
Subjects:
Networking and Internet Architecture (cs.NI)
; Information Theory (cs.IT)
The grand objective of 5G wireless technology is to support services with
vastly heterogeneous requirements. Network slicing, in which each service
operates within an exclusive slice of allocated resources, is seen as a way to
cope with this heterogeneity. However, the shared nature of the wireless
channel allows non-orthogonal slicing, where services use overlapping slices of
resources at the cost of interference. This paper investigates the performance
of orthogonal and non-orthogonal slicing of radio resources for the
provisioning of the three generic services of 5G: enhanced mobile broadband
(eMBB), massive machine-type communications (mMTC), and ultra-reliable
low-latency communications (URLLC). We consider uplink communications from a
set of eMBB, mMTC and URLLC devices to a common base station. A
communication-theoretic model is proposed that accounts for the heterogeneous
requirements and characteristics of the three services. For non-orthogonal
slicing, different decoding architectures are considered, such as puncturing
and successive interference cancellation. The concept of reliability diversity
is introduced here as a design principle that takes advantage of the vastly
different reliability requirements across the services. This study reveals that
non-orthogonal slicing can lead, in some regimes, to significant gains in terms
of performance trade-offs among the three generic services compared to
orthogonal slicing.
Rotem Mulayoff , Tomer Michaeli Subjects : Signal Processing (eess.SP) ; Information Theory (cs.IT)
Sparse representation over redundant dictionaries constitutes a good model
for many classes of signals (e.g., patches of natural images, segments of
speech signals, etc.). However, despite its popularity, very little is known
about the representation capacity of this model. In this paper, we study how
redundant a dictionary must be so as to allow any vector to admit a sparse
approximation with a prescribed sparsity and a prescribed level of accuracy. We
address this problem both in a worst-case setting and in an average-case one.
For each scenario we derive lower and upper bounds on the minimal required
overcompleteness. Our bounds have simple closed-form expressions that allow one
to easily deduce the asymptotic behavior in large dimensions. In particular, we
find that the required overcompleteness grows exponentially with the sparsity
level and polynomially with the allowed representation error. This implies that
universal sparse representation is practical only at moderate sparsity levels,
but can be achieved at relatively high accuracy. As a side effect of our
analysis, we obtain a tight lower bound on the regularized incomplete beta
function, which may be interesting in its own right. We illustrate the validity
of our results through numerical simulations, which support our findings.
Martin Genzel , Alexander Stollenwerk Subjects : Statistics Theory (math.ST) ; Information Theory (cs.IT)
This work theoretically studies the problem of estimating a structured
high-dimensional signal \(x_0 \in \mathbb{R}^n\) from noisy 1-bit Gaussian
measurements. Our recovery approach is based on a simple convex program which
uses the hinge loss function as data fidelity term. While such a risk
minimization strategy is very natural to learn binary output models, such as in
classification, its capacity to estimate a specific signal vector is largely
unexplored. A major difficulty is that the hinge loss is just piecewise linear,
so that its “curvature energy” is concentrated in a single point. This is
substantially different from other popular loss functions considered in signal
estimation, e.g., the square or logistic loss, which are at least locally
strongly convex. It is therefore somewhat unexpected that we can still prove
very similar types of recovery guarantees for the hinge loss estimator, even in
the presence of strong noise. More specifically, our non-asymptotic error
bounds show that stable and robust reconstruction of \(x_0\) can be achieved with
the optimal oversampling rate \(O(m^{-1/2})\) in terms of the number of
measurements \(m\). Moreover, we permit a wide class of structural assumptions on
the ground truth signal, in the sense that \(x_0\) can belong to an arbitrary
bounded convex set \(K \subset \mathbb{R}^n\). The proofs of our main results
rely on some recent advances in statistical learning theory due to Mendelson.
In particular, we invoke an adapted version of Mendelson’s small ball method
that allows us to establish a quadratic lower bound on the error of the first
order Taylor approximation of the empirical hinge loss function.
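For concreteness, the estimator described above can be summarized by a standard hinge-loss program of the following form; this is written out here as a formulation consistent with the description (binary observations, Gaussian measurement vectors, constraint set \(K\)), and the exact scaling or normalization used in the paper may differ:

    \min_{x \in K} \;\; \frac{1}{m} \sum_{i=1}^{m}
        \max\bigl(0,\; 1 - y_i \langle a_i, x \rangle\bigr),
    \qquad y_i \in \{-1, +1\}, \quad a_i \sim \mathcal{N}(0, I_n).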
Comments: submitted to SPAWC 2018
Subjects:
Signal Processing (eess.SP)
; Information Theory (cs.IT)
We examine the usability of deep neural networks for multiple-input
multiple-output (MIMO) user positioning solely based on the orthogonal
frequency division multiplex (OFDM) complex channel coefficients. In contrast
to other indoor positioning systems (IPSs), the proposed method does not
require any additional piloting overhead or any other changes in the
communications system itself as it is deployed on top of an existing OFDM MIMO
system. Supported by actual measurements, we are mainly interested in the more
challenging non-line of sight (NLoS) scenario. However, gradient descent
optimization is known to require a large amount of data-points for training,
i.e., the required database would be too large when compared to conventional
methods. Thus, we propose a two-step training procedure, with training on
simulated line-of-sight (LoS) data in the first step and fine-tuning on measured
NLoS positions in the second step. This turns out to reduce the number of
required measured training positions and thus the effort for data acquisition.
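The two-step procedure can be sketched as follows in PyTorch. The network shape, feature dimension, and data here are placeholders (assumptions for illustration, not the paper's model or measurement campaign): pretrain on abundant simulated LoS channel features, then fine-tune the same network on a small set of measured NLoS samples with a smaller learning rate.

    import torch
    import torch.nn as nn

    # Two-step training sketch with assumed shapes.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                          nn.Linear(256, 128), nn.ReLU(),
                          nn.Linear(128, 2))          # output: 2-D position estimate
    loss_fn = nn.MSELoss()

    def train(model, features, positions, epochs, lr):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(features), positions)
            loss.backward()
            opt.step()

    # Step 1: large simulated LoS dataset (placeholder tensors).
    sim_x, sim_y = torch.randn(50000, 128), torch.randn(50000, 2)
    train(model, sim_x, sim_y, epochs=20, lr=1e-3)

    # Step 2: fine-tune on few measured NLoS positions with a smaller learning rate.
    meas_x, meas_y = torch.randn(500, 128), torch.randn(500, 2)
    train(model, meas_x, meas_y, epochs=50, lr=1e-4)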