An Investigation of Environmental Influence on the Benefits of Adaptation Mechanisms in Evolutionary Swarm Robotics
Comments: In GECCO 2017
Subjects:
Neural and Evolutionary Computing (cs.NE)
A robotic swarm that is required to operate for long periods in a potentially
unknown environment can use both evolution and individual learning methods in
order to adapt. However, the role played by the environment in influencing the
effectiveness of each type of learning is not well understood. In this paper,
we address this question by analysing the performance of a swarm in a range of
simulated, dynamic environments where a distributed evolutionary algorithm for
evolving a controller is augmented with a number of different individual
learning mechanisms. The learning mechanisms themselves are defined by
parameters which can be either fixed or inherited. We conduct experiments in a
range of dynamic environments whose characteristics are varied so as to present
different opportunities for learning. Results enable us to map environmental
characteristics to the most effective learning algorithm.
Evolution of a Functionally Diverse Swarm via a Novel Decentralised Quality-Diversity Algorithm
Comments: In GECCO 2018
Subjects:
Neural and Evolutionary Computing (cs.NE)
The presence of functional diversity within a group has been demonstrated to
lead to greater robustness, higher performance and increased problem-solving
ability in a broad range of studies that includes insect groups, human groups
and swarm robotics. Evolving group diversity however has proved challenging
within Evolutionary Robotics, requiring reproductive isolation and careful
attention to population size and selection mechanisms. To tackle this issue, we
introduce a novel, decentralised, variant of the MAP-Elites illumination
algorithm which is hybridised with a well-known distributed evolutionary
algorithm (mEDEA). The algorithm simultaneously evolves multiple diverse
behaviours for multiple robots, with respect to a simple token-gathering task.
Each robot in the swarm maintains a local archive, defined by two pre-specified functional traits, which is shared with robots it comes into contact with. We
investigate four different strategies for sharing, exploiting and combining
local archives and compare results to mEDEA. Experimental results show that in
contrast to previous claims, it is possible to evolve a functionally diverse
swarm without geographical isolation, and that the new method outperforms mEDEA
in terms of the diversity, coverage and precision of the evolved swarm.
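As a reading aid, the per-robot archive logic can be sketched as follows; the grid resolution, the trait normalisation, and the cell-by-cell merge rule are illustrative assumptions, not the paper's exact implementation (which compares four sharing strategies).

```python
# Minimal sketch of a per-robot MAP-Elites-style local archive over a 2D grid
# of two pre-specified functional traits. All names are illustrative.
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

class LocalArchive:
    def __init__(self, bins: int = 10):
        self.bins = bins
        self.elites: Dict[Cell, Tuple[float, List[float]]] = {}  # cell -> (fitness, genome)

    def _cell(self, traits: Tuple[float, float]) -> Cell:
        # Discretise each trait (assumed normalised to [0, 1]) into a grid cell.
        return tuple(min(int(t * self.bins), self.bins - 1) for t in traits)

    def add(self, genome: List[float], traits: Tuple[float, float], fitness: float) -> None:
        cell = self._cell(traits)
        # Keep the genome only if its cell is empty or it beats the current elite.
        if cell not in self.elites or fitness > self.elites[cell][0]:
            self.elites[cell] = (fitness, genome)

    def merge(self, other: "LocalArchive") -> None:
        # One possible sharing strategy: on contact, adopt the other robot's
        # elites cell by cell.
        for cell, (fitness, genome) in other.elites.items():
            if cell not in self.elites or fitness > self.elites[cell][0]:
                self.elites[cell] = (fitness, genome)
```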
Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression
Comments: 2017 Asilomar Conference on Signals, Systems and Computers
Subjects:
Neural and Evolutionary Computing (cs.NE)
Deep learning algorithms have shown tremendous success in many recognition
tasks; however, these algorithms typically include a deep neural network (DNN)
structure and a large number of parameters, which makes it challenging to
implement them on power/area-constrained embedded platforms. To reduce the
network size, several studies investigated compression by introducing
element-wise or row-/column-/block-wise sparsity via pruning and
regularization. In addition, many recent works have focused on reducing
precision of activations and weights with some reducing down to a single bit.
However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks has not been comprehensively explored. In this work, we present design techniques for minimum-area/-energy
DNN hardware with minimal degradation in accuracy. During training, both
binarization/low-precision and structured sparsity are applied as constraints
to find the smallest memory footprint for a given deep learning algorithm. The
DNN model for the CIFAR-10 dataset, with a 50X weight memory reduction, exhibits accuracy comparable to that of its floating-point counterpart. Area, performance, and energy results of DNN hardware in 40nm CMOS are reported for the MNIST dataset. The optimized DNN that combines 8X structured compression and 3-bit weight precision achieves 98.4% accuracy at 20nJ per classification.
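To make the two training-time constraints concrete, here is a hedged PyTorch sketch; the 3-bit uniform quantiser, the straight-through estimator, and the 4x4 block mask are illustrative choices rather than the authors' exact recipe.

```python
# Sketch of applying low-precision and structured-sparsity constraints during
# training, in the spirit described above. Assumes 2D weights with dimensions
# divisible by the block size.
import torch

def quantize_3bit(w: torch.Tensor) -> torch.Tensor:
    # Uniform 3-bit quantisation with a straight-through estimator: the
    # forward pass uses quantised weights, the backward pass is identity.
    scale = w.abs().max() / 3.0 + 1e-8
    q = torch.clamp(torch.round(w / scale), -4, 3) * scale
    return w + (q - w).detach()

def block_mask(w: torch.Tensor, block: int = 4, keep_ratio: float = 0.125) -> torch.Tensor:
    # Block-wise structured sparsity: rank block x block tiles by L1 norm and
    # keep only the strongest fraction (keep_ratio = 1/8 gives 8X compression).
    rows, cols = w.shape
    tiles = w.reshape(rows // block, block, cols // block, block)
    norms = tiles.abs().sum(dim=(1, 3))
    k = max(1, int(norms.numel() * keep_ratio))
    thresh = norms.flatten().topk(k).values.min()
    mask = (norms >= thresh).float()[:, None, :, None]
    return (tiles * mask).reshape(rows, cols)

# In a training loop, one would use block_mask(quantize_3bit(layer.weight))
# in place of the full-precision weights for the forward pass.
```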
Comments: conference paper
Subjects:
Quantum Physics (quant-ph)
; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we propose a simple neural net that requires only $O(n \log_2 k)$ quantum gates and qubits: here, $n$ is the number of input parameters, and $k$ is the number of weights applied to these input parameters in the proposed neural net. We describe the network in terms of a quantum circuit, and then draw its equivalent classical neural net, which involves $O(k^n)$ nodes in the hidden layer. Then, we show that the network uses a
periodic activation function of cosine values of the linear combinations of the
inputs and weights. The steps of the gradient descent are described, and then
Iris and Breast cancer datasets are used for the numerical simulations. The
numerical results indicate the network can be used in machine learning problems
and it may provide exponential speedup over the same structured classical
neural net.
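As a reading aid, the stated functional form of the classical equivalent is easy to write down; a minimal sketch of that forward pass (not the quantum circuit itself), with illustrative shapes:

```python
# Classical analogue of the described network: each unit outputs a periodic
# (cosine) activation of a linear combination of inputs and weights.
import numpy as np

def cosine_layer(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    # x: (n,) input parameters; W: (k, n) weight matrix (illustrative shapes).
    return np.cos(W @ x)
```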
Comments: CVPR 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We address the computational problem of novel human pose synthesis. Given an
image of a person and a desired pose, we produce a depiction of that person in
that pose, retaining the appearance of both the person and background. We
present a modular generative neural network that synthesizes unseen poses using
training pairs of images and poses taken from human action videos. Our network
separates a scene into different body part and background layers, moves body
parts to new locations and refines their appearances, and composites the new
foreground with a hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single target image as a
supervised label. We use an adversarial discriminator to force our network to
synthesize realistic details conditioned on pose. We demonstrate image
synthesis results on three action classes: golf, yoga/workouts and tennis, and
show that our method produces accurate results within action classes as well as
across action classes. Given a sequence of desired poses, we also produce
coherent videos of actions.
Comments: 10 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
While deep neural networks have proven to be a powerful tool for many
recognition and classification tasks, their stability properties are still not
well understood. In the past, image classifiers have been shown to be
vulnerable to so-called adversarial attacks, which are created by additively
perturbing the correctly classified image.
In this paper, we propose the ADef algorithm to construct a different kind of
adversarial attack created by iteratively applying small deformations to the
image, found through a gradient descent step. We demonstrate our results on
MNIST with a convolutional neural network and on ImageNet with Inception-v3 and
ResNet-101.
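To make the construction concrete, here is a hedged single-iteration sketch in the spirit of ADef; the actual algorithm uses a first-order approximation toward a target label, so the step rule and names below are illustrative.

```python
# One deformation step: instead of adding a perturbation to pixel intensities,
# pixels are moved along a small vector field tau chosen by a gradient step
# on the classification loss.
import torch
import torch.nn.functional as F

def deform_step(model, image, label, step_px=0.5):
    n, _, h, w = image.shape
    # Identity sampling grid in normalised [-1, 1] coordinates.
    theta = torch.eye(2, 3).unsqueeze(0).repeat(n, 1, 1)
    base = F.affine_grid(theta, list(image.shape), align_corners=False)
    tau = torch.zeros_like(base, requires_grad=True)  # deformation field
    warped = F.grid_sample(image, base + tau, align_corners=False)
    loss = F.cross_entropy(model(warped), label)
    loss.backward()
    # Move each pixel by roughly step_px pixels (2/max(h, w) is one pixel in
    # normalised coordinates) in the direction that increases the loss.
    delta = step_px * (2.0 / max(h, w)) * tau.grad.sign()
    return F.grid_sample(image, base + delta, align_corners=False).detach()
```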
Comments: 23 pages, includes appendix
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Existing deep learning based image inpainting methods use a standard
convolutional network over the corrupted image, using convolutional filter
responses conditioned on both valid pixels and the substitute values in
the masked holes (typically the mean value). This often leads to artifacts such
as color discrepancy and blurriness. Post-processing is usually used to reduce
such artifacts, but it is expensive and may fail. We propose the use of partial
convolutions, where the convolution is masked and renormalized to be
conditioned on only valid pixels. We further include a mechanism to
automatically generate an updated mask for the next layer as part of the
forward pass. Our model outperforms other methods for irregular masks. We show
qualitative and quantitative comparisons with other methods to validate our
approach.
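A minimal sketch of the mechanism, assuming stride-1 convolutions and a single-channel mask; this is illustrative, not the authors' reference implementation.

```python
# Partial convolution: the convolution is evaluated over valid pixels only,
# renormalised by the valid-pixel count, and the mask is updated for the
# next layer as part of the forward pass.
import torch
import torch.nn.functional as F

def partial_conv(x, mask, weight, bias=None, padding=1):
    # x: (N, C, H, W) features; mask: (N, 1, H, W) with 1 = valid pixel.
    ones = torch.ones(1, 1, *weight.shape[2:])
    # Count of valid input pixels under each sliding window.
    valid = F.conv2d(mask, ones, padding=padding)
    out = F.conv2d(x * mask, weight, padding=padding)
    # Renormalise so the response is conditioned only on valid pixels.
    out = out * (ones.numel() / valid.clamp(min=1))
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    # A window with at least one valid pixel becomes valid in the next layer.
    new_mask = (valid > 0).float()
    return out * new_mask, new_mask
```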
Comments: Accepted in CVPR 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We propose TAL-Net, an improved approach to temporal action localization in
video that is inspired by the Faster R-CNN object detection framework. TAL-Net
addresses three key shortcomings of existing approaches: (1) we improve
receptive field alignment using a multi-scale architecture that can accommodate
extreme variation in action durations; (2) we better exploit the temporal
context of actions for both proposal generation and action classification by
appropriately extending receptive fields; and (3) we explicitly consider
multi-stream feature fusion and demonstrate that fusing motion late is
important. We achieve state-of-the-art performance for both action proposal and
localization on the THUMOS’14 detection benchmark and competitive performance on the ActivityNet challenge.
One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach
Journal-ref: 17th International Conference on Autonomous Agents and Multiagent
Systems (AAMAS 2018)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Although deep learning is very successful nowadays, it traditionally needs very large amounts of labeled data to perform well on classification tasks. In an attempt to solve this problem, the one-shot learning paradigm,
which makes use of just one labeled sample per class and prior knowledge,
becomes increasingly important. In this paper, we propose a new one-shot
learning method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform
classification. Complementary to prior studies, MoVAE represents a shift of
paradigm in comparison with the usual one-shot learning methods, as it does not
use any prior knowledge. Instead, it starts from zero knowledge and one labeled
sample per class. Afterward, by using unlabeled data and the generalization
learning concept (in a way, more as humans do), it is capable of gradually improving its performance by itself. Moreover, even if no unlabeled data are available, MoVAE can still perform well in one-shot learning classification. We
demonstrate empirically the efficiency of our proposed approach on three
datasets, i.e. the handwritten digits (MNIST), fashion products
(Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE
outperforms state-of-the-art one-shot learning algorithms.
Comments: To be submitted to SPL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG)
In this paper, we present a class of extremely efficient CNN models called
MobileFaceNets, which use no more than 1 million parameters and are specifically tailored for high-accuracy real-time face verification on mobile and embedded devices. We also present a simple analysis of the weakness of common mobile
networks for face verification. The weakness has been well overcome by our
specifically designed MobileFaceNets. Under the same experimental conditions,
our MobileFaceNets achieve significantly superior accuracy as well as more than
2 times actual speedup over MobileNetV2. After being trained with ArcFace loss on the
refined MS-Celeb-1M from scratch, our single MobileFaceNet model of 4.0MB size
achieves 99.55% face verification accuracy on LFW and 92.59% TAR (FAR1e-6) on
MegaFace Challenge 1, which is even comparable to state-of-the-art big CNN models of hundreds of MB in size. The fastest of our MobileFaceNets has an actual
inference time of 18 milliseconds on a mobile phone. Our experiments on LFW,
AgeDB, and MegaFace show that our MobileFaceNets achieve significantly improved
efficiency compared with the state-of-the-art lightweight and mobile CNNs for
face verification.
Zicheng Liao, Kevin Karsch, Hongyi Zhang, David Forsyth. Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an object relighting system that allows an artist to select an
object from an image and insert it into a target scene. Through simple
interactions, the system can adjust illumination on the inserted object so that
it appears naturally in the scene. To support image-based relighting, we build an object model from the image and propose a perceptually-inspired approximate shading model for the relighting. It decomposes the shading field
into (a) a rough shape term that can be reshaded, (b) a parametric shading
detail that encodes missing features from the first term, and (c) a geometric
detail term that captures fine-scale material properties. With this
decomposition, the shading model combines 3D rendering and image-based
composition and allows more flexible compositing than image-based methods.
Quantitative evaluation and a set of user studies suggest our method is a
promising alternative to existing methods of object insertion.
Zhiwen Fan, Huafeng Wu, Xueyang Fu, Yue Huang, Xinghao Ding. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Single image rain streaks removal is extremely important since rainy images
adversely affect many computer vision systems. Deep learning based methods have
found great success in image deraining tasks. In this paper, we propose a novel
residual-guide feature fusion network, called ResGuideNet, for single image
deraining that progressively predicts high-quality reconstructions. Specifically,
we propose a cascaded network and adopt residuals generated from shallower
blocks to guide deeper blocks. By using this strategy, we can obtain a coarse
to fine estimation of negative residual as the blocks go deeper. The outputs of
different blocks are merged into the final reconstruction. We adopt recursive
convolution to build each block and apply supervision to all intermediate
results, which enables our model to achieve promising performance on synthetic and real-world data while using fewer parameters than previously required.
ResGuideNet is detachable to meet different rainy conditions. For images with
light rain streaks and limited computational resource at test time, we can
obtain a decent performance even with several building blocks. Experiments
validate that ResGuideNet can benefit other low- and high-level vision tasks.
Comments: 3 pages, 3 figures, 2 tables
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The seam-driven approach has been proven fairly effective for
parallax-tolerant image stitching, whose strategy is to search for an invisible
seam from finite representative hypotheses of local alignment. In this paper,
we propose a graph-based hypothesis generation and a seam-guided local
alignment for improving the effectiveness and the efficiency of the seam-driven
approach. Experiments demonstrate a significant reduction in the number of hypotheses and improved naturalness of the final stitching results, compared to the state-of-the-art method SEAGULL.
Comments: To appear in CVPR 2018 Workshops
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
One of the most critical topics in autonomous driving or ride-sharing
technology is to accurately localize vehicles in the world frame. In addition
to common multi-view camera systems, localization usually also relies on industrial-grade sensors such as LiDAR, differential GPS, and high-precision IMUs. In this
paper, we develop an approach to provide an effective solution to this problem.
We propose a method to train a geo-spatial deep neural network (CNN+LSTM) to
predict accurate geo-locations (latitude and longitude) using only ordinary
ground imagery and low accuracy phone-grade GPS. We evaluate our approach on
the open dataset released during ACM Multimedia 2017 Grand Challenge. Having
ground truth locations for training, we are able to reach nearly lane-level
accuracy. We also evaluate the proposed method on our own images collected in the downtown San Francisco area, often described as a “downtown canyon”, where consumer GPS signals are extremely inaccurate. The results show the model can predict
quality locations that suffice in real business applications, such as
ride-sharing, only using phone-grade GPS. Unlike classic visual localization or
recent PoseNet-like methods that may work well in indoor environments or
small-scale outdoor environments, we avoid using a map or an SFM
(structure-from-motion) model at all. More importantly, the proposed method can
be scaled up without concerns over the potential failure of 3D reconstruction.
Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao. Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Discriminative Correlation Filters (DCF)-based tracking algorithms exploiting
conventional handcrafted features have achieved impressive results both in
terms of accuracy and robustness. Template handcrafted features have shown
excellent performance, but they perform poorly when the appearance of the target changes rapidly, as with fast motion and fast deformation. In contrast, statistical handcrafted features are insensitive to fast state changes, but
they yield inferior performance in the scenarios of illumination variations and
background clutters. In this work, to achieve an efficient tracking
performance, we propose a novel visual tracking algorithm, named MFCMT, based
on a complementary ensemble model with multiple features, including Histogram
of Oriented Gradients (HOGs), Color Names (CNs) and Color Histograms (CHs).
Additionally, to improve tracking results and prevent target drift, we
introduce an effective fusion method by exploiting relative entropy to coalesce
all basic response maps and get an optimal response. Furthermore, we suggest a
simple but efficient update strategy to boost tracking performance.
Comprehensive evaluations are conducted on two tracking benchmarks, and the experimental results demonstrate that our method is competitive with numerous state-of-the-art trackers. Our tracker achieves impressive performance at faster speed on these benchmarks.
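The fusion idea can be sketched as follows; the exponential weighting and the consensus reference are illustrative assumptions, since the abstract does not spell out the exact relative-entropy rule.

```python
# Relative-entropy-guided fusion of basic response maps: maps that deviate
# strongly from the consensus receive smaller weights. Maps are assumed
# non-negative (e.g. shifted correlation outputs).
import numpy as np

def fuse_responses(maps, eps=1e-12):
    probs = [m / (m.sum() + eps) for m in maps]   # normalise to distributions
    mean = sum(probs) / len(probs)                # consensus distribution
    kls = np.array([np.sum(p * np.log((p + eps) / (mean + eps))) for p in probs])
    w = np.exp(-kls)                              # large divergence -> small weight
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, maps))  # fused response estimate
```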
Comments: To appear in CVPR 2018
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Generating a novel image by manipulating two input images is an interesting
research problem in the study of generative adversarial networks (GANs). We
propose a new GAN-based network that generates a fusion image with the identity
of input image x and the shape of input image y. Our network can simultaneously
train on more than two image datasets in an unsupervised manner. We define an
identity loss LI to catch the identity of image x and a shape loss LS to get
the shape of y. In addition, we propose a novel training method called
Min-Patch training to focus the generator on crucial parts of an image, rather
than its entirety. We show qualitative results on the VGG Youtube Pose dataset,
Eye dataset (MPIIGaze and UnityEyes), and the Photo-Sketch-Cartoon dataset.
Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Skeleton-based human action recognition has recently attracted increasing
attention thanks to the accessibility and the popularity of 3D skeleton data.
One of the key challenges in skeleton-based action recognition lies in the
large view variations when capturing data. In order to alleviate the effects of
view variations, this paper introduces a novel view adaptation scheme, which
automatically determines the virtual observation viewpoints in a learning based
data driven manner. We design two view adaptive neural networks, i.e., VA-RNN
based on an RNN, and VA-CNN based on a CNN. For each network, a novel view
adaptation module learns and determines the most suitable observation
viewpoints, and transforms the skeletons to those viewpoints for the end-to-end
recognition with a main classification network. Ablation studies find that the
proposed view adaptive models are capable of transforming the skeletons of
various viewpoints to much more consistent virtual viewpoints, which largely eliminates the influence of viewpoint. In addition, we design a two-stream scheme
(referred to as VA-fusion) that fuses the scores of the two networks to provide
the fused prediction. Extensive experimental evaluations on five challenging
benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches.
Comments: 11 pages, 11 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
In this paper we present a large-scale visual object detection and tracking
benchmark, named VisDrone2018, aiming at advancing visual understanding tasks
on the drone platform. The images and video sequences in the benchmark were
captured over various urban/suburban areas of 14 different cities across China
from north to south. Specifically, VisDrone2018 consists of 263 video clips and
10,209 images (no overlap with video clips) with rich annotations, including
object bounding boxes, object categories, occlusion, truncation ratios, etc.
Through an intensive annotation effort, our benchmark has more than 2.5 million
annotated instances in 179,264 images/video frames. Being the largest such
dataset ever published, the benchmark enables extensive evaluation and
investigation of visual analysis algorithms on the drone platform. In
particular, we design four popular tasks with the benchmark, including object
detection in images, object detection in videos, single object tracking, and
multi-object tracking. All these tasks are extremely challenging in the
proposed dataset due to factors such as occlusion, large scale and pose
variation, and fast motion. We hope the benchmark will largely boost research and development in visual analysis on drone platforms.
Arvind Balachandrasekaran, Merry Mani, Mathews Jacob. Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a structured low rank algorithm for the calibration-free
compensation of field inhomogeneity artifacts in Echo Planar Imaging (EPI) MRI
data. We acquire the data using two EPI readouts that differ in echo-time (TE).
Using time segmentation, we reformulate the field inhomogeneity compensation
problem as the recovery of an image time series from highly undersampled
Fourier measurements. The temporal profile at each pixel is modeled as a single
exponential, which is exploited to fill in the missing entries. We show that
the exponential behavior at each pixel, along with the spatial smoothness of
the exponential parameters, can be exploited to derive a 3D annihilation
relation in the Fourier domain. This relation translates to a low rank property
on a structured multi-fold Toeplitz matrix, whose entries correspond to the
measured k-space samples. We introduce a fast two-step algorithm for the
completion of the Toeplitz matrix from the available samples. In the first
step, we estimate the null space vectors of the Toeplitz matrix using only its
fully sampled rows. The null space is then used to estimate the signal
subspace, which facilitates the efficient recovery of the time series of
images. We finally demonstrate the proposed approach on spherical MR phantom
data and human data and show that the artifacts are significantly reduced. The
proposed approach could potentially be used to compensate for time varying
field map variations in dynamic applications such as functional MRI.
Comments: 3DV 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
The research in dense online 3D mapping is mostly focused on the geometrical
accuracy and spatial extent of the reconstructions. Their color appearance is
often neglected, leading to inconsistent colors and noticeable artifacts. We
rectify this by extending a state-of-the-art SLAM system to accumulate colors
in HDR space. We replace the simplistic pixel intensity averaging scheme with
HDR color fusion rules tailored to the incremental nature of SLAM and a noise
model suitable for off-the-shelf RGB-D cameras. Our main contribution is a
map-aware exposure time controller. It makes decisions based on the global
state of the map and predicted camera motion, attempting to maximize the
information gain of each observation. We report a set of experiments
demonstrating the improved texture quality and advantages of using the custom
controller that is tightly integrated in the mapping loop.
Yuqian Zhou, Ding Liu, Thomas Huang. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face detection is a well-explored problem. Many challenges for face detectors, such as extreme pose, illumination, low resolution, and small scale, have been studied in previous work. However, previously proposed models are mostly trained and tested on good-quality images, which is not always the case in practical applications such as surveillance systems. In this paper, we first review the
current state-of-the-art face detectors and their performance on benchmark
dataset FDDB, and compare the design protocols of the algorithms. Secondly, we
investigate their performance degradation while testing on low-quality images
with different levels of blur, noise, and contrast. Our results demonstrate
that both hand-crafted and deep-learning based face detectors are not robust
enough for low-quality images. This should inspire researchers to produce more robust designs for face detection in the wild.
Unsupervised Representation Adversarial Learning Network: from Reconstruction to Generation
Yuqian Zhou, Kuangxiao Gu, Thomas Huang. Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
A good representation for arbitrarily complicated data should have the
capability of semantic generation, clustering and reconstruction. Previous
research has already achieved impressive performance on each of these individually. This paper
aims at learning a disentangled representation effective for all of them in an
unsupervised way. To achieve all the three tasks together, we learn the forward
and inverse mapping between data and representation on the basis of a symmetric
adversarial process. In theory, we jointly minimize an upper bound on the two conditional entropies between the latent variables and the observations to achieve cycle consistency. The newly proposed RepGAN is tested
on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised or
semi-supervised classification, generation and reconstruction tasks. The result
demonstrates that RepGAN is able to learn a useful and competitive
representation. To the authors’ knowledge, our work is the first to achieve
both a high unsupervised classification accuracy and low reconstruction error
on MNIST.
Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard. Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Audio-visual representation learning is an important task from the
perspective of designing machines with the ability to understand complex
events. To this end, we propose a novel multimodal framework that instantiates
multiple instance learning. We show that the learnt representations are useful
for classifying events and localizing their characteristic audio-visual
elements. The system is trained using only video-level event labels without any
timing information. An important feature of our method is its capacity to learn
from unsynchronized audio-visual events. We achieve state-of-the-art results on
a large-scale dataset of weakly-labeled audio event videos. Visualizations of
localized visual regions and audio segments substantiate our system’s efficacy,
especially when dealing with noisy situations where modality-specific cues
appear asynchronously.
Comments: 14 pages
Subjects:
Signal Processing (eess.SP)
; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Ultrasound localization microscopy has enabled super-resolution vascular
imaging in laboratory environments through precise localization of individual
ultrasound contrast agents across numerous imaging frames. However, analysis of
high-density regions with significant overlaps among the agents’ point spread
responses yields high localization errors, constraining the technique to
low-concentration conditions. As such, long acquisition times are required to
sufficiently cover the vascular bed. In this work, we present a fast and
precise method for obtaining super-resolution vascular images from high-density
contrast-enhanced ultrasound imaging data. This method, which we term Deep
Ultrasound Localization Microscopy (Deep-ULM), exploits modern deep learning
strategies and employs a convolutional neural network to perform localization
microscopy in dense scenarios. This end-to-end fully convolutional neural
network architecture is trained effectively using on-line synthesized data,
enabling robust inference in-vivo under a wide variety of imaging conditions.
We show that deep learning attains super-resolution with challenging
contrast-agent concentrations (microbubble densities), both in-silico as well
as in-vivo, as we go from ultrasound scans of a rodent spinal cord in an
experimental setting to standard clinically-acquired recordings in a human
prostate. Deep-ULM achieves high quality sub-diffraction recovery, and is
suitable for real-time applications, resolving about 135 high-resolution
64×64 patches per second on a standard PC. Exploiting GPU computation, this
number increases to 2500 patches per second.
Comments: Published in IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2018
Subjects:
Instrumentation and Methods for Astrophysics (astro-ph.IM)
; Computer Vision and Pattern Recognition (cs.CV)
The total amount of solar irradiance falling on the earth’s surface is an
important area of study amongst photo-voltaic (PV) engineers and remote
sensing analysts. The received solar irradiance impacts the total amount of
generated solar energy. However, this generation is often hindered by the high
degree of solar irradiance variability. In this paper, we study the main
factors behind such variability with the assistance of Global Positioning
System (GPS) and ground-based, high-resolution sky cameras. This analysis will
also be helpful for understanding cloud phenomenon and other events in the
earth’s atmosphere.
Dominic Masters, Carlo Luschi. Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Modern deep neural network training is typically based on mini-batch
stochastic gradient optimization. While the use of large mini-batches increases
the available computational parallelism, small batch training has been shown to
provide improved generalization performance and allows a significantly smaller
memory footprint, which might also be exploited to improve machine throughput.
In this paper, we review common assumptions on learning rate scaling and
training duration, as a basis for an experimental comparison of test
performance for different mini-batch sizes. We adopt a learning rate that
corresponds to a constant average weight update per gradient calculation (i.e.,
per unit cost of computation), and point out that this results in a variance of
the weight updates that increases linearly with the mini-batch size $m$.
The collected experimental results for the CIFAR-10, CIFAR-100 and ImageNet
datasets show that increasing the mini-batch size progressively reduces the
range of learning rates that provide stable convergence and acceptable test
performance. On the other hand, small mini-batch sizes provide more up-to-date
gradient calculations, which yields more stable and reliable training. The best
performance has been consistently obtained for mini-batch sizes between $m = 2$ and $m = 32$, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
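For concreteness, the scaling argument can be written out; a sketch under the stated linear-scaling assumption, for plain SGD with per-example gradients $g_i$ treated as i.i.d. and base step $\eta_0$:

```latex
% Plain SGD over a mini-batch of size m with learning rate \eta = m \eta_0,
% which keeps the mean weight update per gradient calculation constant:
\[
\Delta w = -\frac{\eta}{m}\sum_{i=1}^{m} g_i = -\eta_0 \sum_{i=1}^{m} g_i,
\qquad
\mathbb{E}[\Delta w] = -m\,\eta_0\,\mathbb{E}[g],
\qquad
\operatorname{Var}[\Delta w] = m\,\eta_0^{2}\,\operatorname{Var}[g].
\]
% The variance of the weight updates thus grows linearly with m, as stated.
```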
Akash Ganesan, Divyansh Pal, Karthik Muthuraman, Shubham Dash. Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The primary aim of this project is to build a contextual Question-Answering
model for videos. Current methodologies provide robust models for image-based Question-Answering, but we aim to generalize this approach to videos. We propose a graphical representation of video which is able to handle several types of queries across the whole video. For example, if a frame shows a man and a cat sitting, it should be able to handle queries like “Where is the cat sitting with respect to the man?” or “What is the man holding in his hand?”. It should also be able to answer queries about temporal relationships.
Comments: First version. Submitted to ECCV 2018
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
There has recently been a concerted effort to derive mechanisms in vision and
machine learning systems to offer uncertainty estimates of the predictions they
make. Clearly, there are enormous benefits to a system that is not only
accurate but also has a sense for when it is not sure. Existing proposals
center around Bayesian interpretations of modern deep architectures — these
are effective but can often be computationally demanding. We show how classical
ideas in the literature on exponential families on probabilistic networks
provide an excellent starting point to derive uncertainty estimates in Gated
Recurrent Units (GRU). Our proposal directly quantifies uncertainty
deterministically, without the need for costly sampling-based estimation. We
demonstrate how our model can be used to quantitatively and qualitatively
measure uncertainty in unsupervised image sequence prediction. To our
knowledge, this is the first result describing sampling-free uncertainty
estimation for powerful sequential models such as GRUs.
Juan Afanador, Nir Oren, Murilo S. Baptista. Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Delegation allows an agent to request that another agent completes a task. In
many situations the task may be delegated onwards, and this process can repeat
until it is eventually, successfully or unsuccessfully, performed. We consider
policies to guide an agent in choosing who to delegate to when such recursive
interactions are possible. These policies, based on quitting games and
multi-armed bandits, were empirically tested for effectiveness. Our results
indicate that the quitting game based policies outperform those which do not
explicitly account for the recursive nature of delegation.
Mayukh Das, Phillip Odom, Md. Rakibul Islam, Janardhan Rao (Jana)
Comments: Under Review at Knowledge-Based Systems (Elsevier); “Extended Abstract” accepted and to appear at AAMAS 2018
Subjects:
Artificial Intelligence (cs.AI)
Planning with preferences has been employed extensively to quickly generate
high-quality plans. However, it may be difficult for the human expert to supply
this information without knowledge of the reasoning employed by the planner and
the distribution of planning problems. We consider the problem of actively
eliciting preferences from a human expert during the planning process.
Specifically, we study this problem in the context of the Hierarchical Task
Network (HTN) planning framework as it allows easy interaction with the human.
Our experimental results on several diverse planning domains show that the
preferences gathered using the proposed approach improve the quality and speed
of the planner, while reducing the burden on the human expert.
Comments: v7
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI)
Dialogue policy transfer enables us to build dialogue policies in a target
domain with little data by leveraging knowledge from a source domain with
plenty of data. Dialogue sentences are usually represented by speech-acts and
domain slots, and the dialogue policy transfer is usually achieved by assigning
a slot mapping matrix based on human heuristics. However, existing dialogue
policy transfer methods cannot transfer across dialogue domains with different
speech-acts, for example, between systems built by different companies. Also,
they depend on either common slots or slot entropy, which are not available
when the source and target slots are totally disjoint and no database is
available to calculate the slot entropy. To solve this problem, we propose a
Policy tRansfer across dOMaIns and SpEech-acts (PROMISE) model, which is able
to transfer dialogue policies across domains with different speech-acts and
disjoint slots. The PROMISE model can learn to align different speech-acts and
slots simultaneously, and it does not require common slots or the calculation
of the slot entropy. Experiments on both real-world dialogue data and
simulations demonstrate that the PROMISE model can effectively transfer dialogue
policies across domains with different speech-acts and disjoint slots.
Comments: Paper accepted for publication on IJCNN 2018
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In Machine Learning, ensemble methods have been receiving a great deal of
attention. Techniques such as Bagging and Boosting have been successfully
applied to a variety of problems. Nevertheless, such techniques are still
susceptible to the effects of noise and outliers in the training data. We
propose a new method for the generation of pools of classifiers based on
Bagging, in which the probability of an instance being selected during the
resampling process is inversely proportional to its instance hardness, which
can be understood as the likelihood of an instance being misclassified,
regardless of the choice of classifier. The goal of the proposed method is to
remove noisy data without sacrificing the hard instances which are likely to be
found on class boundaries. We evaluate the performance of the method on nineteen public data sets and compare it to the performance of the Bagging and
Random Subspace algorithms. Our experiments show that in high noise scenarios
the accuracy of our method is significantly better than that of Bagging.
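The resampling rule itself is a one-liner; a minimal sketch, assuming a precomputed hardness estimate per instance and NumPy arrays (the smoothing constant is an illustrative guard against division by zero):

```python
# Hardness-aware bootstrap: the chance of drawing an instance is inversely
# proportional to its (precomputed) instance hardness.
import numpy as np

def hardness_weighted_bootstrap(X, y, hardness, seed=0):
    rng = np.random.default_rng(seed)
    w = 1.0 / (hardness + 1e-3)        # small floor avoids division by zero
    idx = rng.choice(len(X), size=len(X), replace=True, p=w / w.sum())
    return X[idx], y[idx]              # one resampled bag for one classifier
```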
Comments: 9 pages, Published in Proceedings of NAACL workshop on stylistic variation (2018)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI)
Social media features substantial stylistic variation, raising new challenges
for syntactic analysis of online writing. However, this variation is often
aligned with author attributes such as age, gender, and geography, as well as
more readily-available social network metadata. In this paper, we report new
evidence on the link between language and social networks in the task of
part-of-speech tagging. We find that tagger error rates are correlated with
network structure, with high accuracy in some parts of the network, and lower
accuracy elsewhere. As a result, tagger accuracy depends on training from a
balanced sample of the network, rather than training on texts from a narrow
subcommunity. We also describe our attempts to add robustness to stylistic
variation, by building a mixture-of-experts model in which each expert is
associated with a region of the social network. While prior work found that
similar approaches yield performance improvements in sentiment analysis and
entity linking, we were unable to obtain performance improvements in
part-of-speech tagging, despite strong evidence for the link between
part-of-speech error rates and social network structure.
Comments: 10 pages, 3 figures, 6 tables, presented at SPIE Defense + Commercial Sensing: Next Generation Analyst (2018)
Subjects:
Information Retrieval (cs.IR)
Personalized search provides a potentially powerful tool; however, it is
limited due to the large number of roles that a person has: parent, employee,
consumer, etc. We present the role-relevance algorithm: a search technique that
favors search results relevant to the user’s current role. The role-relevance
algorithm uses three factors to score documents: (1) the number of keywords
each document contains; (2) each document’s geographic relevance to the user’s
role (if applicable); and (3) each document’s topical relevance to the user’s
role (if applicable). Topical relevance is assessed using a novel extension to
Latent Dirichlet Allocation (LDA) that allows standard LDA to score document
relevance to user-defined topics. Overall results on a pre-labeled corpus show
an average improvement in search precision of approximately 20% compared to
keyword search alone.
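A hedged sketch of the three-factor score; the linear combination, the weights, and the precomputed geographic and topical scores are illustrative assumptions (the paper derives the topical factor from its extended LDA).

```python
# Three-factor role-relevance scoring with illustrative weights.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    geo_score: float    # geographic relevance to the user's role, in [0, 1]
    topic_score: float  # topical relevance to the user's role, in [0, 1]

def role_relevance(doc: Doc, keywords, w_kw=1.0, w_geo=1.0, w_topic=1.0):
    kw = sum(doc.text.lower().count(k.lower()) for k in keywords)   # factor (1)
    return w_kw * kw + w_geo * doc.geo_score + w_topic * doc.topic_score  # (2), (3)
```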
Comments: 8 pages, 7 figures, about to submit for review
Subjects:
Social and Information Networks (cs.SI)
; Information Retrieval (cs.IR)
This paper presents twAwler, a lightweight twitter crawler that targets
language-specific communities of users. twAwler takes advantage of multiple
endpoints of the twitter API to explore user relations and quickly recognize
users belonging to the targetted set. It performs a complete crawl for all
users, discovering many standard user relations, including the retweet graph,
mention graph, reply graph, quote graph, follow graph, etc. twAwler respects
all twitter policies and rate limits, while being able to monitor large communities
of active users.
twAwler was used between August 2016 and March 2018 to generate an extensive
dataset of close to all Greek-speaking twitter accounts (about 330 thousand)
and their tweets and relations. In total, the crawler has gathered 750 million
tweets of which 424 million are in Greek; 750 million follow relations;
information about 300 thousand lists, their members (119 million member
relations) and subscribers (27 thousand subscription relations); 705 thousand
trending topics; information on 52 million users in total of which 292 thousand
have been since suspended, 141 thousand have deleted their account, and 3.5
million are protected and cannot be crawled. twAwler mines the collected tweets
for the retweet, quote, reply, and mention graphs, which, in addition to the
follow relation crawled, offer vast opportunities for analysis and further
research.
Comments: 13 pages, 11 figures, 6 tables
Subjects:
Databases (cs.DB)
; Information Retrieval (cs.IR)
We present a novel natural language query interface, the FactChecker, aimed
at text summaries of relational data sets. The tool focuses on natural language
claims that translate into an SQL query and a claimed query result. Similar in
spirit to a spell checker, the FactChecker marks up text passages that seem to
be inconsistent with the actual data. At the heart of the system is a
probabilistic model that reasons about the input document in a holistic
fashion. Based on claim keywords and the document structure, it maps each text
claim to a probability distribution over associated query translations. By
efficiently executing tens to hundreds of thousands of candidate translations
for a typical input document, the system maps text claims to correctness
probabilities. This process becomes practical via a specialized processing
backend, avoiding redundant work via query merging and result caching.
Verification is an interactive process in which users are shown tentative
results, enabling them to take corrective actions if necessary.
Our system was tested on a set of 53 public articles containing 392 claims.
Our test cases include articles from major newspapers, summaries of survey
results, and Wikipedia articles. Our tool revealed erroneous claims in roughly
a third of test cases. A detailed user study shows that users using our tool
are on average six times faster at checking text summaries, compared to generic
SQL interfaces. In fully automated verification, our tool achieves
significantly higher recall and precision than baselines from the areas of
natural language query interfaces and fact checking.
Comments: PhD thesis, 2017
Subjects:
Computation and Language (cs.CL)
; Information Retrieval (cs.IR)
Verifiability is one of the core editing principles in Wikipedia, where
editors are encouraged to provide citations for the added statements.
Statements can be any arbitrary piece of text, ranging from a sentence up to a
paragraph. However, in many cases, citations are either outdated, missing, or
link to non-existing references (e.g. dead URLs, moved content, etc.). In 20% of the cases, such citations refer to news articles, which represent the second most cited source. Even in cases where citations are provided, there are
no explicit indicators for the span of a citation for a given piece of text. In
addition to issues related to the verifiability principle, many Wikipedia
entity pages are incomplete, with relevant information that is already
available in online news sources missing. Even for the already existing
citations, there is often a delay between the news publication time and the
reference time.
In this thesis, we address the aforementioned issues and propose automated
approaches that enforce the verifiability principle in Wikipedia, and suggest
relevant and missing news references for further enriching Wikipedia entity
pages.
Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2
Journal-ref: Future Generation Computer Systems, Elsevier, 2018, 85, pp.60-75.
https://www.sciencedirect.com/science/article/pii/S0167739X17323580
Subjects:
Databases (cs.DB)
; Information Retrieval (cs.IR)
Top-k keyword and top-k document extraction are very popular text analysis
techniques. Top-k keywords and documents are often computed on-the-fly, but
they exploit weighted vocabularies that are costly to build. To compare
competing weighting schemes and database implementations, benchmarking is
customary. To the best of our knowledge, no benchmark currently addresses these
problems. Hence, in this paper, we present T^2K^2, a top-k keywords and documents benchmark, and its decision-support-oriented evolution T^2K^2D^2. Both benchmarks feature a real tweet dataset and queries
with various complexities and selectivities. They help evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate our benchmarks’ relevance and genericity, we successfully ran
performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand,
and on different relational (Oracle, PostgreSQL) and document-oriented
(MongoDB) database implementations, on the other hand.
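For orientation, the kind of top-k keyword computation the benchmarks exercise can be sketched with plain TF-IDF; this is illustrative and not the benchmark's actual query set.

```python
# Plain TF-IDF top-k keyword extraction over a small corpus of documents.
import math
from collections import Counter

def top_k_keywords(docs, k=10):
    tokenised = [d.lower().split() for d in docs]   # assumes non-empty docs
    n = len(docs)
    df = Counter(w for doc in tokenised for w in set(doc))  # document frequency
    scores = Counter()
    for doc in tokenised:
        for w, f in Counter(doc).items():
            # Accumulate per-document TF-IDF into a corpus-level keyword weight.
            scores[w] += (f / len(doc)) * math.log(n / df[w])
    return scores.most_common(k)
```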
Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato. Subjects: Computation and Language (cs.CL)
Machine translation systems achieve near human-level performance on some
languages, yet their effectiveness strongly relies on the availability of large
amounts of bitexts, which hinders their applicability to the majority of
language pairs. This work investigates how to learn to translate when having
access to only large monolingual corpora in each language. We propose two model
variants, a neural and a phrase-based model. Both versions leverage automatic
generation of parallel data by backtranslating with a backward model operating
in the other direction, and the denoising effect of a language model trained on
the target side. These models are significantly better than methods from the
literature, while being simpler and having fewer hyper-parameters. On the
widely used WMT14 English-French and WMT16 German-English benchmarks, our
models respectively obtain 27.1 and 23.6 BLEU points without using a single
parallel sentence, outperforming the state of the art by more than 11 BLEU
points.
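The back-translation loop at the core of both variants can be sketched schematically; model training and the denoising language-model term are elided, and all names are illustrative.

```python
# One round of iterative back-translation. The two direction models are passed
# in as callables, and 'train_step' stands in for a full optimisation step.
def backtranslation_round(src_mono, tgt_mono, src2tgt, tgt2src, train_step):
    # Back-translate genuine monolingual text with the current backward model
    # to synthesise (noisy input, clean output) training pairs.
    pairs_s2t = [(tgt2src(t), t) for t in tgt_mono]  # train src->tgt on these
    pairs_t2s = [(src2tgt(s), s) for s in src_mono]  # train tgt->src on these
    for x, y in pairs_s2t:
        train_step(src2tgt, x, y)
    for x, y in pairs_t2s:
        train_step(tgt2src, x, y)
```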
Comments: 10 pages, 8 Figures, 6 Tables
Subjects:
Computation and Language (cs.CL)
We present a novel approach to learn representations for sentence-level
semantic similarity using conversational data. Our method trains an
unsupervised model to predict conversational input-response pairs. The
resulting sentence embeddings perform well on the semantic textual similarity
(STS) benchmark and SemEval 2017’s Community Question Answering (CQA) question
similarity subtask. Performance is further improved by introducing multitask
training combining the conversational input-response prediction task and a
natural language inference task. Extensive experiments show the proposed model
achieves the best performance among all neural models on the STS benchmark and
is competitive with the state-of-the-art feature engineered and mixed systems
in both tasks.
Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Edouard Grave. Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Continuous word representations, learned on different languages, can be
aligned with remarkable precision. Using a small bilingual lexicon as training
data, learning the linear transformation is often formulated as a regression
problem using the square loss. The obtained mapping is known to suffer from the
hubness problem, when used for retrieval tasks (e.g. for word translation). To
address this issue, we propose to use a retrieval criterion instead of the
square loss for learning the mapping. We evaluate our method on word
translation, showing that our loss function leads to state-of-the-art results,
with the biggest improvements observed for distant language pairs such as
English-Chinese.
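For reference, the least-squares baseline mentioned above can be written compactly; a sketch, where the columns of X and Y hold the paired source and target word vectors from the bilingual lexicon:

```latex
% Square-loss (regression) formulation of the linear mapping, which the
% paper replaces with a retrieval criterion to mitigate hubness:
\[
W^{\star} = \arg\min_{W}\; \lVert W X - Y \rVert_{F}^{2}
\]
```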
Comments: 6 pages
Subjects:
Computation and Language (cs.CL)
The current trend of extractive question answering (QA) heavily relies on the
joint encoding of the document and the question. In this paper, we formalize a
new modular variant of extractive QA, Phrase-Indexed Question Answering
(PI-QA), that enforces complete independence of the document encoder from the
question by building the standalone representation of the document discourse, a
key research goal in machine reading comprehension. That is, the document
encoder generates an index vector for each answer candidate phrase in the
document; at inference time, each question is mapped to the same vector space
and the answer with the nearest index vector is obtained. The formulation also
implies a significant scalability advantage since the index vectors can be
pre-computed and hashed offline for efficient retrieval. We experiment with
baseline models for the new task, which achieve a reasonable accuracy but
significantly underperform unconstrained QA models. We invite the QA research
community to engage in PI-QA for closing the gap.
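The inference-time contract is simple to sketch; the encoders are passed in as callables, and the inner-product similarity is an illustrative choice.

```python
# Phrase-indexed QA inference sketch: document-side index vectors are built
# once, offline; each question is then answered by nearest-neighbour lookup.
import numpy as np

def build_index(phrases, phrase_encoder):
    # phrase_encoder: candidate phrase -> vector; run once per document.
    return phrases, np.stack([phrase_encoder(p) for p in phrases])

def answer(question, index, question_encoder):
    phrases, vecs = index
    q = question_encoder(question)            # mapped into the same vector space
    return phrases[int(np.argmax(vecs @ q))]  # nearest index vector wins
```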
Kris Cao, Stephen Clark. Subjects: Computation and Language (cs.CL)
Generating from Abstract Meaning Representation (AMR) is an underspecified
problem, as many syntactic decisions are not specified by the semantic graph.
We learn a sequence-to-sequence model that generates possible constituency
trees for an AMR graph, and then train another model to generate text
realisations conditioned on both an AMR graph and a constituency tree. We show
that factorising the model this way lets us effectively use parse information,
obtaining competitive BLEU scores on self-generated parses and impressive BLEU
scores with oracle parses. We also demonstrate that we can generate
meaning-preserving syntactic paraphrases of the same AMR graph.
Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave. Subjects: Computation and Language (cs.CL)
It is often the case that the best performing language model is an ensemble
of a neural language model with n-grams. In this work, we propose a method to
improve how these two models are combined. By using a small network which
predicts the mixture weight between the two models, we adapt their relative
importance at each time step. Because the gating network is small, it trains
quickly on small amounts of held out data, and does not add overhead at scoring
time. Our experiments carried out on the One Billion Word benchmark show a
significant improvement over the state of the art ensemble without retraining
of the basic modules.
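A minimal sketch of the gating idea in PyTorch; the gate's input features (e.g. the neural LM's hidden state) are an illustrative assumption.

```python
# A small gating network predicts, at each time step, how to weigh the neural
# LM against the n-gram LM when combining their next-word distributions.
import torch
import torch.nn as nn

class GatedLMMixture(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, p_neural, p_ngram, features):
        # p_neural, p_ngram: (batch, vocab) next-word distributions;
        # features: (batch, feat_dim), e.g. the neural LM's hidden state.
        g = self.gate(features)                  # per-time-step mixture weight
        return g * p_neural + (1 - g) * p_ngram  # convex mix stays a distribution
```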
Comments: 11 pages, 4 figures, accepted as long paper of NAACL HLT 2018
Subjects:
Computation and Language (cs.CL)
How to identify, extract, and use phrasal knowledge is a crucial problem for
the task of Recognizing Textual Entailment (RTE). To solve this problem, we
propose a method for detecting paraphrases via natural deduction proofs of
semantic relations between sentence pairs. Our solution relies on a graph
reformulation of partial variable unifications and an algorithm that induces
subgraph alignments between meaning representations. Experiments show that our
method can automatically detect various paraphrases that are absent from
existing paraphrase databases. In addition, the detection of paraphrases using
proof information improves the accuracy of RTE tasks.
Comments: Check-worthiness; Fact-Checking; Veracity; Community-Question Answering; Neural Networks; Arabic; English
Journal-ref: NAACL-2018
Subjects:
Computation and Language (cs.CL)
We present ClaimRank, an online system for detecting check-worthy claims.
While originally trained on political debates, the system can work for any kind
of text, e.g., interviews or regular news articles. Its aim is to facilitate
manual fact-checking efforts by prioritizing the claims that fact-checkers
should consider first. ClaimRank supports both Arabic and English, it is
trained on actual annotations from nine reputable fact-checking organizations
(PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and
Washington Post), and thus it can mimic the claim selection strategies for each
and any of them, as well as for the union of them all.
Comments: NAACL-2018; Stance detection; Fact-Checking; Veracity; Memory networks; Neural Networks; Distributed Representations
Subjects:
Computation and Language (cs.CL)
We present a novel end-to-end memory network for stance detection, which
jointly (i) predicts whether a document agrees, disagrees, discusses or is
unrelated with respect to a given target claim, and also (ii) extracts snippets
of evidence for that prediction. The network operates at the paragraph level
and integrates convolutional and recurrent neural networks, as well as a
similarity matrix as part of the overall architecture. The experimental
evaluation on the Fake News Challenge dataset shows state-of-the-art
performance.
Comments: this https URL
Subjects:
Computation and Language (cs.CL)
For natural language understanding (NLU) technology to be maximally useful,
both practically and as a scientific object of study, it must be general: it
must be able to process language in a way that is not exclusively tailored to
any one specific task or dataset. In pursuit of this objective, we introduce
the General Language Understanding Evaluation benchmark (GLUE), a tool for
evaluating and analyzing the performance of models across a diverse range of
existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing
knowledge across tasks because certain tasks have very limited training data.
We further provide a hand-crafted diagnostic test suite that enables detailed
linguistic analysis of NLU models. We evaluate baselines based on current
methods for multi-task and transfer learning and find that they do not
immediately give substantial improvements over the aggregate performance of
training a separate model per task, indicating room for improvement in
developing general and robust NLU systems.
Comments: Accepted as a conference paper at NAACL HLT 2018
Subjects:
Computation and Language (cs.CL)
Sentence simplification aims to simplify the content and structure of complex
sentences, and thus make them easier to interpret for human readers, and easier
to process for downstream NLP applications. Recent advances in neural machine
translation have paved the way for novel approaches to the task. In this paper,
we adapt an architecture with augmented memory capacities called Neural
Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our
experiments demonstrate the effectiveness of our approach on different
simplification datasets, both in terms of automatic evaluation measures and
human judgments.
Akash Ganesan , Divyansh Pal , Karthik Muthuraman , Shubham Dash Subjects : Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)
The primary aim of this project is to build a contextual Question-Answering
model for videos. Current methodologies provide robust models for image-based
Question-Answering, but we aim to generalize this approach to videos. We
propose a graphical representation of video which is able to handle several
types of queries across the whole video. For example, if a frame shows a man
and a cat sitting, the model should be able to handle queries like 'where is
the cat sitting with respect to the man?' or 'what is the man holding in his
hand?'. It should also be able to answer queries about temporal relationships.
Comments: NAACL 2018 Workshop on Computational Models of Reference, Anaphora, and Coreference (CRAC). New Orleans, LA
Subjects:
Computation and Language (cs.CL)
Notional anaphors are pronouns which disagree with their antecedents’
grammatical categories for notional reasons, such as plural to singular
agreement in: ‘the government … they’. Since such cases are rare and conflict
with evidence from strictly agreeing cases (‘the government … it’), they
present a substantial challenge to both coreference resolution and referring
expression generation. Using the OntoNotes corpus, this paper takes an ensemble
approach to predicting English notional anaphora in context on the basis of the
largest empirical data to date. In addition to state of the art prediction
accuracy, the results suggest that theoretical approaches positing a plural
construal at the antecedent’s utterance are insufficient, and that
circumstances at the anaphor’s utterance location, as well as global factors
such as genre, have a strong effect on the choice of referring expression.
Comments: 9 pages, Published in Proceedings of NAACL workshop on stylistic variation (2018)
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI)
Social media features substantial stylistic variation, raising new challenges
for syntactic analysis of online writing. However, this variation is often
aligned with author attributes such as age, gender, and geography, as well as
more readily-available social network metadata. In this paper, we report new
evidence on the link between language and social networks in the task of
part-of-speech tagging. We find that tagger error rates are correlated with
network structure, with high accuracy in some parts of the network, and lower
accuracy elsewhere. As a result, tagger accuracy depends on training from a
balanced sample of the network, rather than training on texts from a narrow
subcommunity. We also describe our attempts to add robustness to stylistic
variation, by building a mixture-of-experts model in which each expert is
associated with a region of the social network. While prior work found that
similar approaches yield performance improvements in sentiment analysis and
entity linking, we were unable to obtain performance improvements in
part-of-speech tagging, despite strong evidence for the link between
part-of-speech error rates and social network structure.
Comments: NAACL 2018
Subjects:
Computation and Language (cs.CL)
We present a novel approach for determining learners’ second language
proficiency which utilizes behavioral traces of eye movements during reading.
Our approach provides stand-alone eyetracking based English proficiency scores
which reflect the extent to which the learner’s gaze patterns in reading are
similar to those of native English speakers. We show that our scores correlate
strongly with standardized English proficiency tests. We also demonstrate that
gaze information can be used to accurately predict the outcomes of such tests.
Our approach yields the strongest performance when the test taker is presented
with a suite of sentences for which we have eyetracking data from other
readers. However, it remains effective even using eyetracking with sentences
for which eye movement data have not been previously collected. By deriving
proficiency as an automatic byproduct of eye movements during ordinary reading,
our approach offers a potentially valuable new tool for second language
proficiency assessment. More broadly, our results open the door to future
methods for inferring reader characteristics from the behavioral traces of
reading.
Comments: 14 pages, 6 figures, has been submitted for review
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Social Graph Analytics applications are very often built using off-the-shelf
analytics frameworks. These, however, are profiled and optimized for the
general case and have to perform for all kinds of graphs. This paper
investigates how knowledge of the application and the dataset can help optimize
performance with minimal effort. We concentrate on the impact of partitioning
strategies on the performance of computations on social graphs. We evaluate six
graph partitioning algorithms on a set of six social graphs, using four
standard graph algorithms by measuring a set of five partitioning metrics.
We analyze the performance of each partitioning strategy with respect to (i)
the properties of the graph dataset, (ii) each analytics computation, and
(iii) the number of partitions. We discover that communication cost is the
best predictor of performance for most, but not all, analytics computations.
We also find that
the best partitioning strategy for a particular kind of algorithm may not be
the best for another, and that optimizing for the general case of all
algorithms may not select the optimal partitioning strategy for a given graph
algorithm. We conclude with insights on selecting the right data partitioning
strategy, which has significant impact on the performance of large graph
analytics computations; certainly enough to warrant optimization of the
partitioning strategy to the computation and to the dataset.
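As an illustration of the kind of partitioning metric involved, the following minimal Python sketch (our own, not the paper's code) computes the edge cut of a partitioning, a common proxy for communication cost; the modulo-based partitioner is purely illustrative:

```python
def edge_cut_fraction(edges, part):
    """edges: iterable of (u, v) pairs; part: dict mapping node -> partition id."""
    edges = list(edges)
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return cut / len(edges)

# Toy usage with a modulo-based partitioner over 4 partitions (illustrative only).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part = {v: v % 4 for v in {u for e in edges for u in e}}
print(edge_cut_fraction(edges, part))
```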
Comments: 12 pages, 7 figures, ICCSA 2018, submitted to Lecture Notes in Computer Science (Springer Verlag)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Computational Engineering, Finance, and Science (cs.CE)
Usage of GPUs as co-processors is a well-established approach to accelerate
costly algorithms operating on matrices and vectors.
We aim to further improve the performance of the Global Neutrino Analysis
framework (GNA) by adding GPU support in a way that is transparent to the end
user. To achieve our goal we use CUDA, a state-of-the-art technology providing
GPGPU programming methods.
In this paper we describe new features of GNA related to CUDA support. Some
specific framework features that influence GPGPU integration are also
explained. The paper investigates the feasibility of applying GPU technology
and shows an example of the acceleration achieved for an algorithm implemented
within the framework. Benchmarks show a significant performance increase when using
GPU transformations.
The project is currently in the development phase. Our plans include
implementing the set of transformations necessary for data analysis in the GNA
framework and testing the expediency of GPUs in the complete analysis chain.
Max J. Friese , Thorsten Ehlers , Dirk Nowotka Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
The computation of a cyber-physical system’s reaction to a stimulus typically
involves the execution of several tasks. The delay between stimulus and
reaction thus depends on the interaction of these tasks and is subject to
timing constraints. Such constraints exist for a number of reasons and range
from possible impacts on customer experiences to safety requirements. We
present a technique to determine end-to-end latencies of such task sequences.
The technique is demonstrated on the example of electronic control units (ECUs)
in automotive embedded real-time systems. Our approach is able to deal with
multi-core architectures and supports four different activation patterns,
including interrupts. It is the first formal analysis approach making use of
load assumptions in order to exclude infeasible data propagation paths without
the knowledge of worst-case execution times or worst-case response times. We
employ a constraint programming solver to compute bounds on end-to-end
latencies.
OpenFPM: A scalable open framework for particle and particle-mesh codes on parallel computers
Comments: 32 pages, 12 figures
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Mathematical Software (cs.MS); Software Engineering (cs.SE); Computational Physics (physics.comp-ph)
Scalable and efficient numerical simulations continue to gain importance, as
computation is firmly established as the third pillar of discovery, alongside
theory and experiment. Meanwhile, the performance of computing hardware grows
through increasing heterogeneous parallelism, enabling simulations of ever more
complex models. However, efficiently implementing scalable codes on
heterogeneous, distributed hardware systems becomes the bottleneck. This
bottleneck can be alleviated by intermediate software layers that provide
higher-level abstractions closer to the problem domain, hence allowing the
computational scientist to focus on the simulation. Here, we present OpenFPM,
an open and scalable framework that provides an abstraction layer for numerical
simulations using particles and/or meshes. OpenFPM provides transparent and
scalable infrastructure for shared-memory and distributed-memory
implementations of particles-only and hybrid particle-mesh simulations of both
discrete and continuous models, as well as non-simulation codes. This
infrastructure is complemented with portable implementations of frequently used
numerical routines, as well as interfaces to third-party libraries. We present
the architecture and design of OpenFPM, detail the underlying abstractions, and
benchmark the framework in applications ranging from Smoothed-Particle
Hydrodynamics (SPH) to Molecular Dynamics (MD), Discrete Element Methods (DEM),
Vortex Methods, stencil codes, high-dimensional Monte Carlo sampling (CMA-ES),
and Reaction-Diffusion solvers, comparing it to the current state of the art
and existing software frameworks.
Ludwig Dierks , Ian Kash , Sven Seuken Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Performance (cs.PF)
Cloud computing providers must handle customer workloads that wish to scale
their use of resources such as virtual machines up and down over time.
Currently, this is often done using simple threshold policies to reserve large
parts of each cluster. This leads to low utilization of the cluster on average.
In this paper, we propose more sophisticated policies for controlling admission
to a cluster and demonstrate that our policies significantly increase cluster
utilization. We first introduce a model and fit its parameters on a data trace
from Microsoft Azure. We then design policies that estimate moments of each
workload’s distribution of future resource usage. Via simulations we show that,
while estimating the first moments of workloads leads to a substantial
improvement over the simple threshold policy, also taking the second moments
into account yields another improvement in utilization. We then evaluate how
much further this can be improved with learned or elicited prior information
and how to incentivize users to provide this information.
Jesper Larsson Träff Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Standard implementations of 2-way, parallel, distributed memory Quicksort
algorithms exchange partitioned data elements at each level of the recursion.
This is not necessary: It suffices to exchange only the chosen pivots, while
postponing element redistribution to the bottom of the recursion. This reduces
the total volume of data exchanged from $O(n \log p)$ to $O(n)$, $n$ being the
total number of elements to be sorted and $p$ a power-of-two number of
processors, while preserving the flavor, characteristics and properties of a
Quicksort implementation. We give a template implementation based on this
observation, and compare against a standard, 2-way parallel Quicksort
implementation as well as other recent Quicksort implementations. We show
substantial, and considerably better, absolute speed-up on a medium-large
InfiniBand cluster.
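To make the idea concrete, here is a hedged, sequential Python simulation of the scheme (a toy sketch under our own simplifications, not the paper's template implementation): elements stay on their original "processors" while only per-group pivots are exchanged, and a single redistribution happens at the bottom of the recursion:

```python
import random

def pivot_postponing_quicksort(blocks):
    """Toy sequential simulation of 2-way parallel Quicksort that exchanges
    only pivots during the recursion and redistributes elements once."""
    p = len(blocks)  # must be a power of two
    elems = [(x, 0) for blk in blocks for x in blk]  # (value, current group id)
    groups = 1
    while groups < p:
        # One pivot per group, drawn from the group's elements; only these
        # pivots would be communicated in the distributed-memory algorithm.
        pivots = {}
        for g in range(groups):
            members = [x for x, gid in elems if gid == g]
            pivots[g] = random.choice(members) if members else 0
        # Elements below the pivot belong to the lower half of the group's
        # processors, the rest to the upper half; no data is moved yet.
        elems = [(x, 2 * gid + (0 if x < pivots[gid] else 1)) for x, gid in elems]
        groups *= 2
    # Single O(n)-volume redistribution replaces the O(n log p) exchanges.
    return [sorted(x for x, gid in elems if gid == i) for i in range(p)]

blocks = [[random.randrange(1000) for _ in range(8)] for _ in range(4)]
print(pivot_postponing_quicksort(blocks))
```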
Enrique Fynn , Fernando Pedone Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)
Blockchain has received much attention in recent years. This immense
popularity has raised a number of concerns, scalability of blockchain systems
being a common one. In this paper, we seek to understand how Ethereum, a
well-established blockchain system, would respond to sharding. Sharding is a
prevalent technique to increase the scalability of distributed systems. To
understand how sharding would affect Ethereum, we model Ethereum blockchain as
a graph and evaluate five methods to partition the graph. We analyze the
results using three metrics: the balance among shards, the number of
transactions that would involve multiple shards, and the amount of data that
would be relocated across shards upon a repartitioning of the system.
Comments: 9 pages, 6 figures. Package available at this https URL
Subjects:
Instrumentation and Methods for Astrophysics (astro-ph.IM)
; Distributed, Parallel, and Cluster Computing (cs.DC)
We investigate the performances of Apache Spark, a cluster computing
framework, for analyzing data from future LSST-like galaxy surveys. Apache
Spark's attempts to address big data problems have hitherto proved successful
in industry, but its main use is often limited to naively structured data. We
show how to manage more complex binary data structures such as those handled in
astrophysics experiments, within a distributed environment. To this purpose, we
first designed and implemented a Spark connector to handle sets of arbitrarily
large FITS files, called spark-fits. The user interface is such that a simple
file “drag-and-drop” to a cluster gives full advantage of the framework. We
demonstrate the very high scalability of spark-fits using the LSST fast
simulation tool, CoLoRe, and present the methodologies for measuring and tuning
the performance bottlenecks for the workloads, scaling up to terabytes of FITS
data on the Cloud@VirtualData, located at Université Paris-Sud. We also
evaluate its performance on Cori, a High-Performance Computing system located
at NERSC, and widely used in the scientific community.
Mansoor Ahmed , Kari Kostiainen Subjects : Cryptography and Security (cs.CR) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Decentralized currencies and similar blockchain applications require
consensus. Bitcoin achieves eventual consensus in a fully-decentralized
setting, but provides very low throughput and high latency with excessive
energy consumption. In this paper, we propose identity aging as a novel and
more efficient consensus approach. Our main idea is to establish reliable,
long-term identities and choose the oldest identity as the miner on each round.
Based on this approach, we design two blockchain systems. Our first system,
SCIFER, leverages Intel’s SGX attestation for identity bootstrapping in a
partially-decentralized setting, where blockchain is permissionless, but we
trust Intel for attestation. Our second system, DIFER, creates new identities
through a novel mining mechanism and provides consensus in a
fully-decentralized setting, similar to Bitcoin. One of the main benefits of
identity aging is that it does not require constant computation. Our analysis
and experiments show that identity aging provides significant performance
improvements over Bitcoin with strong security guarantees.
Modelling customer online behaviours with neural networks: applications to conversion prediction and advertising retargeting
Yanwei Cui , Rogatien Tobossi , Olivia Vigouroux Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)
In this paper, we apply neural networks to the digital marketing world for the
purpose of better targeting potential customers. To do so, we model customer
online behaviours using dedicated neural network architectures. Starting from
the keywords a user searches in a search engine, through the landing page and
the following pages, until the user leaves the site, we model the whole
visited journey with a Recurrent Neural Network (RNN), together with
Convolutional Neural Networks (CNNs) that take into account the semantic
meaning of the searched keywords and the names of the visited pages. With such
a model, we use Monte Carlo simulation to estimate the conversion rate of each
potential customer on future visits. We believe our concept and the
preliminary promising results in this paper enable the use of largely
available customer online behaviour data for advanced digital marketing
analysis.
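As a toy illustration of the Monte Carlo step, the sketch below replaces the RNN/CNN journey model with a hypothetical Markov chain over page names (the transition table and page names are invented for illustration) and estimates the conversion rate by simulating future journeys:

```python
import random

# Hypothetical page names and transition probabilities; a Markov chain stands
# in for the trained RNN/CNN journey model.
NEXT = {
    "landing": [("product", 0.5), ("exit", 0.5)],
    "product": [("checkout", 0.2), ("landing", 0.3), ("exit", 0.5)],
}

def estimate_conversion_rate(start="landing", runs=10_000, seed=0):
    rng = random.Random(seed)
    conversions = 0
    for _ in range(runs):
        page = start
        while page not in ("checkout", "exit"):
            pages, probs = zip(*NEXT[page])
            page = rng.choices(pages, weights=probs)[0]
        conversions += page == "checkout"
    return conversions / runs

print(estimate_conversion_rate())  # Monte Carlo estimate for this journey start
```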
Dominic Masters , Carlo Luschi Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Modern deep neural network training is typically based on mini-batch
stochastic gradient optimization. While the use of large mini-batches increases
the available computational parallelism, small batch training has been shown to
provide improved generalization performance and allows a significantly smaller
memory footprint, which might also be exploited to improve machine throughput.
In this paper, we review common assumptions on learning rate scaling and
training duration, as a basis for an experimental comparison of test
performance for different mini-batch sizes. We adopt a learning rate that
corresponds to a constant average weight update per gradient calculation (i.e.,
per unit cost of computation), and point out that this results in a variance of
the weight updates that increases linearly with the mini-batch size $m$.
The collected experimental results for the CIFAR-10, CIFAR-100 and ImageNet
datasets show that increasing the mini-batch size progressively reduces the
range of learning rates that provide stable convergence and acceptable test
performance. On the other hand, small mini-batch sizes provide more up-to-date
gradient calculations, which yields more stable and reliable training. The best
performance has been consistently obtained for mini-batch sizes between $m = 2$
and $m = 32$, which contrasts with recent work advocating the use of mini-batch
sizes in the thousands.
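The variance claim is easy to check numerically. The following sketch (our own toy check, with synthetic per-example gradients) scales the learning rate as eta = eta0 * m, so the average update per gradient calculation is constant, and shows that the variance of a single weight update then grows roughly linearly with m:

```python
import numpy as np

rng = np.random.default_rng(0)
grads = rng.normal(loc=1.0, scale=2.0, size=100_000)  # toy per-example gradients
eta0 = 0.01  # constant average update per example (our normalization)
for m in [2, 8, 32, 128]:
    batch_means = grads[: (len(grads) // m) * m].reshape(-1, m).mean(axis=1)
    updates = eta0 * m * batch_means  # one SGD update with eta = eta0 * m
    # Var(update) = eta0^2 * m * sigma^2: linear growth in the batch size m.
    print(m, updates.var())
```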
Comments: 23 pages, 9 figures
Subjects:
Learning (cs.LG)
; Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
We present ElPiGraph, a method for approximating data distributions having
non-trivial topological features such as the existence of excluded regions or
branching structures. Unlike many existing methods, ElPiGraph is not based on
the construction of a k-nearest neighbour graph, a procedure that can perform
poorly in the case of multidimensional and noisy data. Instead, ElPiGraph
constructs elastic principal graphs in a more robust way by minimizing elastic
energy, applying graph grammars and explicitly controlling topological
complexity. Using a trimmed approximation error function makes ElPiGraph
extremely robust to the presence of background noise without decreasing
computational performance and allows it to deal with complex cases of manifold
learning (for example, ElPiGraph can learn disconnected intersecting
manifolds). Thanks to the quasi-quadratic nature of the elastic function,
ElPiGraph performs almost as fast as a simple k-means clustering and,
therefore, is much more scalable than alternative methods, and can work on
large datasets containing millions of high dimensional points on a personal
computer. The excellent performance of the method opens the possibility to
apply resampling and to approximate complex data structures via principal graph
ensembles which can be used to construct consensus principal graphs. ElPiGraph
is currently implemented in five programming languages and accompanied by a
graphical user interface, which makes it a versatile tool to deal with complex
data in various fields from molecular biology, where it can be used to infer
pseudo-time trajectories from single-cell RNASeq, to astronomy, where it can be
used to approximate complex structures in the distribution of galaxies.
Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization
Journal-ref: International Journal of Data Science and Analytics 2018
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Credit card fraud detection is a very challenging problem because of the
specific nature of transaction data and the labeling process. The transaction
data are peculiar because they are obtained in a streaming fashion, are
strongly imbalanced, and are prone to non-stationarity. The labeling is the outcome
of an active learning process, as every day human investigators contact only a
small number of cardholders (associated with the riskiest transactions) and
obtain the class (fraud or genuine) of the related transactions. An adequate
selection of the set of cardholders is therefore crucial for an efficient fraud
detection process. In this paper, we present a number of active learning
strategies and we investigate their fraud detection accuracies. We compare
different criteria (supervised, semi-supervised and unsupervised) to query
unlabeled transactions. Finally, we highlight the existence of an
exploitation/exploration trade-off for active learning in the context of fraud
detection, which has so far been overlooked in the literature.
Comments: Paper accepted for publication on IJCNN 2018
Subjects:
Learning (cs.LG)
; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In Machine Learning, ensemble methods have been receiving a great deal of
attention. Techniques such as Bagging and Boosting have been successfully
applied to a variety of problems. Nevertheless, such techniques are still
susceptible to the effects of noise and outliers in the training data. We
propose a new method for the generation of pools of classifiers based on
Bagging, in which the probability of an instance being selected during the
resampling process is inversely proportional to its instance hardness, which
can be understood as the likelihood of an instance being misclassified,
regardless of the choice of classifier. The goal of the proposed method is to
remove noisy data without sacrificing the hard instances which are likely to be
found on class boundaries. We evaluate the performance of the method in
nineteen public data sets, and compare it to the performance of the Bagging and
Random Subspace algorithms. Our experiments show that in high noise scenarios
the accuracy of our method is significantly better than that of Bagging.
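A minimal sketch of the idea follows (our own reading, not the authors' code): instance hardness is estimated as the fraction of diverse classifiers that misclassify an instance under cross-validation, and the bootstrap sampling weights are taken inversely proportional to hardness, with a smoothing constant of our choosing to avoid division by zero:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
# Instance hardness: fraction of diverse classifiers that misclassify an
# instance under 5-fold cross-validation.
clfs = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0), GaussianNB()]
hardness = np.mean([cross_val_predict(c, X, y, cv=5) != y for c in clfs], axis=0)
# Sampling weights inversely proportional to hardness (epsilon is our choice).
weights = 1.0 / (hardness + 1e-2)
weights /= weights.sum()
rng = np.random.default_rng(0)
pool = []  # Bagging-style pool grown from hardness-aware bootstrap samples
for _ in range(10):
    idx = rng.choice(len(X), size=len(X), p=weights)
    pool.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
```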
Byung-Hak Kim , Ethan Vizitei , Varun Ganapathi Subjects : Learning (cs.LG) ; Computers and Society (cs.CY); Machine Learning (stat.ML)
Student performance prediction – where a machine forecasts the future
performance of students as they interact with online coursework – is a
challenging problem. Reliable early-stage predictions of a student’s future
performance could be critical to facilitate timely educational interventions
during a course. However, very few prior studies have explored this problem
from a deep learning perspective. In this paper, we recast the student
performance prediction problem as a sequential event prediction problem and
propose a new deep learning based algorithm, termed GritNet, which builds upon
the bidirectional long short term memory (BLSTM). Our results, from real
Udacity students’ graduation predictions, show that the GritNet not only
consistently outperforms the standard logistic-regression based method, but
that improvements are substantially pronounced in the first few weeks when
accurate predictions are most challenging.
Comments: First version. Submitted to ECCV 2018
Subjects:
Learning (cs.LG)
; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
There has recently been a concerted effort to derive mechanisms in vision and
machine learning systems to offer uncertainty estimates of the predictions they
make. Clearly, there are enormous benefits to a system that is not only
accurate but also has a sense for when it is not sure. Existing proposals
center around Bayesian interpretations of modern deep architectures; these
are effective but can often be computationally demanding. We show how classical
ideas in the literature on exponential families and probabilistic networks
provide an excellent starting point to derive uncertainty estimates in Gated
Recurrent Units (GRU). Our proposal directly quantifies uncertainty
deterministically, without the need for costly sampling-based estimation. We
demonstrate how our model can be used to quantitatively and qualitatively
measure uncertainty in unsupervised image sequence prediction. To our
knowledge, this is the first result describing sampling-free uncertainty
estimation for powerful sequential models such as GRUs.
Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems
Alec Koppel , Ekaterina Tolstaya , Ethan Stump , Alejandro Ribeiro Subjects : Learning (cs.LG) ; Systems and Control (cs.SY); Machine Learning (stat.ML)
We consider Markov Decision Problems defined over continuous state and action
spaces, where an autonomous agent seeks to learn a map from its states to
actions so as to maximize its long-term discounted accumulation of rewards. We
address this problem by considering Bellman’s optimality equation defined over
action-value functions, which we reformulate into a nested non-convex
stochastic optimization problem defined over a Reproducing Kernel Hilbert Space
(RKHS). We develop a functional generalization of stochastic quasi-gradient
method to solve it, which, owing to the structure of the RKHS, admits a
parameterization in terms of scalar weights and past state-action pairs which
grows proportionately with the algorithm iteration index. To ameliorate this
complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the
sequence of kernel weights and dictionaries, which yields a controllable error
in the descent direction of the underlying optimization method. We prove that
the resulting algorithm, called KQ-Learning, converges with probability 1 to a
stationary point of this problem, yielding a fixed point of the Bellman
optimality operator under the hypothesis that it belongs to the RKHS. Under
constant learning rates, we further obtain convergence to a small Bellman error
that depends on the chosen learning rates. Numerical evaluation on the
Continuous Mountain Car and Inverted Pendulum tasks yields convergent
parsimonious learned action-value functions, policies that are competitive with
the state of the art, and exhibit reliable, reproducible learning behavior.
Armand Joulin , Piotr Bojanowski , Tomas Mikolov , Edouard Grave Subjects : Computation and Language (cs.CL) ; Learning (cs.LG)
Continuous word representations, learned on different languages, can be
aligned with remarkable precision. Using a small bilingual lexicon as training
data, learning the linear transformation is often formulated as a regression
problem using the square loss. The obtained mapping is known to suffer from the
hubness problem, when used for retrieval tasks (e.g. for word translation). To
address this issue, we propose to use a retrieval criterion instead of the
square loss for learning the mapping. We evaluate our method on word
translation, showing that our loss function leads to state-of-the-art results,
with the biggest improvements observed for distant language pairs such as
English-Chinese.
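For context, the sketch below implements only the square-loss baseline that the paper improves on (synthetic embeddings; nearest-neighbour retrieval at test time). The paper's contribution is to replace this regression objective with a retrieval criterion, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 50, 1000
X = rng.normal(size=(n_pairs, d))                      # source-language vectors
W_true = np.linalg.qr(rng.normal(size=(d, d)))[0]      # hidden ground-truth map
Y = X @ W_true + 0.1 * rng.normal(size=(n_pairs, d))   # noisy target vectors

# Square-loss (regression) solution on the seed lexicon: the baseline whose
# retrieval suffers from hubness, motivating the paper's retrieval criterion.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
query = X[0] @ W
sims = (Y @ query) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(query))
print("retrieved translation of word 0:", int(np.argmax(sims)))
```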
Comments: 10 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
While deep neural networks have proven to be a powerful tool for many
recognition and classification tasks, their stability properties are still not
well understood. In the past, image classifiers have been shown to be
vulnerable to so-called adversarial attacks, which are created by additively
perturbing the correctly classified image.
In this paper, we propose the ADef algorithm to construct a different kind of
adversarial attack created by iteratively applying small deformations to the
image, found through a gradient descent step. We demonstrate our results on
MNIST with a convolutional neural network and on ImageNet with Inception-v3 and
ResNet-101.
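The following PyTorch sketch conveys the flavour of a deformation-based attack (an ADef-like loop under our own simplifications, not the authors' exact algorithm): a small displacement field is updated by gradient steps and applied to the image by warping rather than by additive noise:

```python
import torch
import torch.nn.functional as F

def deform(img, tau):
    """Warp img of shape (1, C, H, W) by the displacement field tau (1, H, W, 2)."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # identity sampling grid
    return F.grid_sample(img, base + tau, align_corners=True)

def adef_like_attack(model, img, label, steps=50, step_size=0.01):
    tau = torch.zeros(1, img.shape[2], img.shape[3], 2, requires_grad=True)
    for _ in range(steps):
        logits = model(deform(img, tau))
        if logits.argmax(1).item() != label:   # decision flipped: stop early
            break
        loss = F.cross_entropy(logits, torch.tensor([label]))
        loss.backward()
        with torch.no_grad():                  # normalized gradient step on tau
            tau += step_size * tau.grad / (tau.grad.norm() + 1e-12)
            tau.grad.zero_()
    return deform(img, tau).detach()

# Toy usage with an untrained stand-in classifier (illustrative only).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
adv = adef_like_attack(model, torch.rand(1, 1, 28, 28), label=3)
```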
Comments: 10 pages, 5 figures and 3 tables, under review in MIDL 2018
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
In this paper, we propose a novel unsupervised learning method to learn the
brain dynamics using a deep learning architecture named residual D-net. As it
is often the case in medical research, in contrast to typical deep learning
tasks, the size of the resting-state functional Magnetic Resonance Image
(rs-fMRI) datasets for training is limited. Thus, the available data should be
very efficiently used to learn the complex patterns underneath the brain
connectivity dynamics. To address this issue, we use residual connections to
alleviate the training complexity through recurrent multi-scale representation.
We conduct two classification tasks to differentiate early and late stage Mild
Cognitive Impairment (MCI) from Normal healthy Control (NC) subjects. The
experiments verify that our proposed residual D-net indeed learns the brain
connectivity dynamics, leading to significantly higher classification accuracy
compared to previously published techniques.
One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach
Journal-ref: 17th International Conference on Autonomous Agents and Multiagent
Systems (AAMAS 2018)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG); Machine Learning (stat.ML)
Deep learning, even if very successful nowadays, traditionally needs very
large amounts of labeled data to perform excellently on classification tasks.
In an attempt to solve this problem, the one-shot learning paradigm, which
makes use of just one labeled sample per class and prior knowledge, becomes
increasingly important. In this paper, we propose a new one-shot learning
method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform
classification. Complementary to prior studies, MoVAE represents a shift of
paradigm in comparison with the usual one-shot learning methods, as it does
not use any prior knowledge. Instead, it starts from zero knowledge and one
labeled sample per class. Afterward, by using unlabeled data and the
generalization learning concept (in a way, more as humans do), it is capable
of gradually improving its performance by itself. Moreover, even if no
unlabeled data are available, MoVAE can still perform well in one-shot
learning classification. We
demonstrate empirically the efficiency of our proposed approach on three
datasets, i.e. the handwritten digits (MNIST), fashion products
(Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE
outperforms state-of-the-art one-shot learning algorithms.
Comments: conference paper
Subjects:
Quantum Physics (quant-ph)
; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we propose a simple neural net that requires only
$O(n \log_2 k)$ quantum gates and qubits: here, $n$ is the number of input
parameters, and $k$ is the number of weights applied to these input parameters
in the proposed neural net. We describe the network in terms of a quantum
circuit, and then draw its equivalent classical neural net, which involves
$O(k^n)$ nodes in the hidden layer. Then, we show that the network uses a
periodic activation function of cosine values of the linear combinations of the
inputs and weights. The steps of the gradient descent are described, and then
Iris and Breast cancer datasets are used for the numerical simulations. The
numerical results indicate the network can be used in machine learning problems
and it may provide exponential speedup over the same structured classical
neural net.
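To illustrate the classical analogue described above, here is a hedged NumPy sketch of a single-hidden-layer network with a cosine activation over linear combinations of inputs and weights, trained by plain gradient descent on a toy problem (the sigmoid readout and all hyperparameters are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy two-class labels
W = rng.normal(size=(4, 8))                     # input-to-hidden weights
v = rng.normal(size=8)                          # hidden-to-output weights
lr = 0.1
for _ in range(500):
    H = np.cos(X @ W)                           # periodic (cosine) activation
    p = 1.0 / (1.0 + np.exp(-(H @ v)))          # sigmoid readout (our choice)
    g = (p - y) / len(y)                        # grad of mean cross-entropy
    v -= lr * (H.T @ g)
    W -= lr * (X.T @ (np.outer(g, v) * -np.sin(X @ W)))
print("training accuracy:", ((p > 0.5) == y).mean())
```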
Comments: To be submitted to SPL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Learning (cs.LG)
In this paper, we present a class of extremely efficient CNN models called
MobileFaceNets, which use no more than 1 million parameters and are
specifically tailored for high-accuracy real-time face verification on mobile
and embedded devices. We also briefly analyze the weaknesses of common mobile
networks for face verification, which are well overcome by our specifically
designed MobileFaceNets. Under the same experimental conditions, our
MobileFaceNets achieve significantly superior accuracy as well as more than
2 times actual speedup over MobileNetV2. After being trained with the ArcFace
loss on the refined MS-Celeb-1M from scratch, our single 4.0 MB MobileFaceNet
model achieves 99.55% face verification accuracy on LFW and 92.59% TAR
(FAR = 1e-6) on MegaFace Challenge 1, which is even comparable to
state-of-the-art big CNN models of hundreds of megabytes in size. The fastest
of our MobileFaceNets has an actual
inference time of 18 milliseconds on a mobile phone. Our experiments on LFW,
AgeDB, and MegaFace show that our MobileFaceNets achieve significantly improved
efficiency compared with the state-of-the-art lightweight and mobile CNNs for
face verification.
Two Use Cases of Machine Learning for SDN-Enabled IP/Optical Networks: Traffic Matrix Prediction and Optical Path Performance Prediction
Gagan Choudhury , David Lynch , Gaurav Thakur , Simon Tse Subjects : Networking and Internet Architecture (cs.NI) ; Learning (cs.LG); Machine Learning (stat.ML)
We describe two applications of machine learning in the context of IP/Optical
networks. The first one allows agile management of resources at a core
IP/Optical network by using machine learning for short-term and long-term
prediction of traffic flows and joint global optimization of IP and optical
layers using colorless/directionless (CD) flexible ROADMs. Multilayer
coordination allows for significant cost savings, flexible new services to meet
dynamic capacity needs, and improved robustness by being able to proactively
adapt to new traffic patterns and network conditions. The second application is
important as we migrate our metro networks to Open ROADM networks, to allow
physical routing without the need for detailed knowledge of optical parameters.
We discuss a proof-of-concept study, where detailed performance data for
wavelengths on a current flexible ROADM network is used for machine learning to
predict the optical performance of each wavelength. Both applications can be
efficiently implemented by using a SDN (Software Defined Network) controller.
Unsupervised Representation Adversarial Learning Network: from Reconstruction to Generation
Yuqian Zhou , Kuangxiao Gu , Thomas Huang Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Learning (cs.LG); Machine Learning (stat.ML)
A good representation for arbitrarily complicated data should have the
capability of semantic generation, clustering and reconstruction. Previous
research has already achieved impressive performance on either one. This paper
aims at learning a disentangled representation effective for all of them in an
unsupervised way. To achieve all the three tasks together, we learn the forward
and inverse mapping between data and representation on the basis of a symmetric
adversarial process. In theory, we minimize the upper bound of the two
conditional entropy loss between the latent variables and the observations
together to achieve the cycle consistency. The newly proposed RepGAN is tested
on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised or
semi-supervised classification, generation and reconstruction tasks. The result
demonstrates that RepGAN is able to learn a useful and competitive
representation. To the authors' knowledge, our work is the first one to achieve
both a high unsupervised classification accuracy and low reconstruction error
on MNIST.
Randomized ICA and LDA Dimensionality Reduction Methods for Hyperspectral Image Classification
Comments: Submitted IEEE JSTARS
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
Dimensionality reduction is an important step in processing the hyperspectral
images (HSI) to overcome the curse of dimensionality problem. Linear
dimensionality reduction methods such as Independent component analysis (ICA)
and Linear discriminant analysis (LDA) are commonly employed to reduce the
dimensionality of HSI. These methods fail to capture non-linear dependencies
in the HSI data, as the data lie on a nonlinear manifold. To handle this,
nonlinear transformation techniques based on kernel methods were introduced
for dimensionality reduction of HSI. However, kernel methods involve cubic
computational complexity in computing the kernel matrix, and thus their
potential cannot be explored when the number of pixels (samples) is large. In
the literature, a smaller number of pixels are randomly selected to overcome
this issue; however, this sub-optimal strategy might neglect important
information in the HSI. In this paper, we propose randomized solutions to the
ICA and LDA dimensionality reduction methods using Random Fourier features,
and we label them RFFICA and RFFLDA. Our proposed methods overcome the
scalability issue and handle the non-linearities present in the data more
efficiently. Experiments conducted on two real-world hyperspectral datasets
demonstrate that our proposed randomized methods outperform the conventional
kernel ICA and kernel LDA in terms of overall and per-class accuracies and
computational time.
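The core trick can be sketched in a few lines (ours, not the authors' implementation): random Fourier features approximate an RBF kernel at linear cost, so ordinary LDA can then be run in the mapped space instead of forming the cubic-cost kernel matrix; the synthetic data stands in for HSI pixels:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def rff(X, n_features=500, gamma=1.0, seed=0):
    """Random Fourier features approximating the kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 100))                         # stand-in for HSI pixels
y = (np.linalg.norm(X[:, :10], axis=1) > 3).astype(int)  # toy nonlinear labels
Z = rff(X, gamma=0.01)
print(LinearDiscriminantAnalysis().fit(Z, y).score(Z, y))
```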
Comments: Conference paper, 6 pages, 5 figures
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
Importance-weighting is a popular and well-researched technique for dealing
with sample selection bias and covariate shift. It has desirable
characteristics such as unbiasedness, consistency and low computational
complexity. However, weighting can have a detrimental effect on an estimator as
well. In this work, we empirically show that the sampling distribution of an
importance-weighted estimator can be skewed. For sample selection bias
settings, and for small sample sizes, the importance-weighted risk estimator
produces overestimates for datasets in the body of the sampling distribution,
i.e. the majority of cases, and large underestimates for datasets in the tail
of the sampling distribution. These over- and underestimates of the risk lead
to suboptimal regularization parameters when used for importance-weighted
validation.
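A toy version of the empirical setup is easy to reproduce (our own illustration, not the paper's experiments): estimate a risk under covariate shift from many small samples using importance weights, and inspect the sampling distribution of the weighted estimator, whose mean and median visibly diverge when it is skewed:

```python
import numpy as np

rng = np.random.default_rng(0)
estimates = []
for _ in range(5000):
    x = rng.normal(0.0, 1.0, size=20)      # small sample from the source N(0, 1)
    w = np.exp(x - 0.5)                    # density ratio of target N(1, 1) to source
    estimates.append(np.mean(w * x ** 2))  # importance-weighted estimate of E_t[x^2]
estimates = np.array(estimates)
# The true target risk is 2.0; a skewed sampling distribution shows up as a
# gap between the mean and the median of the estimates.
print("mean:", estimates.mean(), "median:", np.median(estimates), "true:", 2.0)
```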
Comments: 8 pages, 11 figures
Subjects:
Sound (cs.SD)
; Learning (cs.LG); Audio and Speech Processing (eess.AS)
A model of music needs to have the ability to recall past details and have a
clear, coherent understanding of musical structure. Detailed in the paper is a
neural network architecture that predicts and generates polyphonic music
aligned with musical rules. The probabilistic model presented is a Bi-axial
LSTM trained with a kernel reminiscent of a convolutional kernel. When analyzed
quantitatively and qualitatively, this approach performs well in composing
polyphonic music. A link to the code is provided.
Comments: 12 pages, 11 figures, submitted to IEEE Transactions on Wireless Communications
Subjects:
Information Theory (cs.IT)
Mobile edge computing (MEC) has been introduced to provide computing
capabilities at the edge of networks and thereby improve the latency
performance of wireless networks. In this paper, we provide a novel framework
for MEC-enabled heterogeneous networks (HetNets), composed of multi-tier
networks with access points (APs) (i.e., MEC servers) that have different
transmission powers and different computing capabilities. In this framework,
we also consider multiple types of mobile users with different computation
task sizes, who offload their tasks to a MEC server and receive the resulting
data from the server. We derive the successful edge computing probability,
considering both computation and communication performance, using queueing
theory and stochastic geometry. We then analyze the effects of network
parameters and of the bias factors in MEC server association on the successful
edge computing probability. We show how the optimal bias factors, in terms of
successful edge computing probability, change according to the user type and
MEC tier, and how they differ from conventional ones that do not consider
computing capabilities and task sizes. It is also shown how the optimal bias
factors change when minimizing the mean latency instead of the successful edge
computing probability. This study provides design insights for the optimal
configuration of MEC-enabled HetNets.
Achievable Information Rates for Nonlinear Fiber Communication via End-to-end Autoencoder Learning
Comments: 3 pages, 4 figures, submitted to ECOC 2018
Subjects:
Information Theory (cs.IT)
; Machine Learning (stat.ML)
Machine learning is used to compute achievable information rates (AIRs) for a
simplified fiber channel. The approach jointly optimizes the input distribution
(constellation shaping) and the auxiliary channel distribution to compute AIRs
without explicit channel knowledge in an end-to-end fashion.
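A hedged PyTorch sketch of end-to-end autoencoder learning follows, with a plain AWGN channel standing in for the simplified fiber model of the paper (architecture, power constraint, and hyperparameters are our own choices): the encoder learns the constellation, the decoder a posterior over messages, and log2(M) minus the cross-entropy gives an AIR estimate:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

M, n, sigma = 16, 2, 0.1                      # messages, channel uses, noise std
enc = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n))
dec = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, M))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(2000):
    msgs = torch.randint(0, M, (256,))
    x = enc(F.one_hot(msgs, M).float())
    x = x / x.pow(2).sum(1, keepdim=True).sqrt()   # per-symbol power constraint
    y = x + sigma * torch.randn_like(x)            # AWGN stand-in for the fiber
    loss = F.cross_entropy(dec(y), msgs)           # cross-entropy in nats
    opt.zero_grad()
    loss.backward()
    opt.step()
air_bits = (math.log(M) - loss.item()) / math.log(2.0)
print("AIR estimate (bits per symbol):", air_bits)
```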
Comments: 16 pages, 6 figures, To appear in the IEEE Journal on Selected Areas in Communications
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
A large-scale content-centric mobile ad hoc network employing
subpacketization is studied in which each mobile node having finite-size cache
moves according to the reshuffling mobility model and requests a content object
from the library independently at random according to the Zipf popularity
distribution. Instead of assuming that one content object is transferred in a
single time slot, we consider a more challenging scenario where the size of
each content object is considerably large and thus only a subpacket of a file
can be delivered during one time slot, which is motivated by a fast mobility
scenario. Under our mobility model, we consider a single-hop-based content
delivery and characterize the fundamental trade-offs between throughput and
delay. The order-optimal throughput-delay trade-off is analyzed by presenting
the following two content reception strategies: the sequential reception for
uncoded caching and the random reception for maximum distance separable
(MDS)-coded caching. We also perform numerical evaluation to validate our
analytical results. In particular, we conduct performance comparisons between
the uncoded caching and the MDS-coded caching strategies by identifying the
regimes in which the performance difference between the two caching strategies
becomes prominent with respect to system parameters such as the Zipf exponent
and the number of subpackets. In addition, we extend our study to the random
walk mobility scenario and show that our main results are essentially the same
as those in the reshuffling mobility model.
Yu Han , Shi Jin , Jun Zhang , Jiayi Zhang , Kai-Kit Wong Subjects : Information Theory (cs.IT)
This paper considers the discrete Fourier transform (DFT) based hybrid
beamforming multiuser system and studies the use of analog beam selection
schemes. We first analyze the uplink ergodic achievable rates of the
zero-forcing (ZF) receiver and the maximum-ratio combining (MRC) receiver under
Ricean fading conditions. We then examine the downlink ergodic achievable rates
for the ZF and maximum-ratio transmitting (MRT) precoders. The long-term and
short-term normalization methods are introduced, which utilize long-term and
instantaneous channel state information (CSI) to implement the downlink power
normalization, respectively. Also, approximations and asymptotic expressions of
both the uplink and downlink rates are obtained, which facilitate the analog
beam selection solutions to maximize the achievable rates. An exhaustive search
provides the optimal results but, to reduce time consumption, we resort to
the derived rate limits and propose a second selection scheme based on the
projected power of the line-of-sight (LoS) paths. We then combine the
advantages of the two schemes and propose a two-step scheme that achieves
near-optimal performance with much less time consumption than the exhaustive
search.
Numerical results confirm the analytical results of the ergodic achievable rate
and reveal the effectiveness of the proposed two-step method.
Dynamic Power Splitting for SWIPT with Nonlinear Energy Harvesting in Ergodic Fading Channel
Comments: 15 pages, 4 figures
Subjects:
Information Theory (cs.IT)
We study the dynamic power splitting for simultaneous wireless information
and power transfer (SWIPT) in the ergodic fading channel. Considering the
nonlinearity of practical energy harvesting circuits, we adopt the realistic
nonlinear energy harvesting (EH) model rather than the idealistic linear EH
model. To characterize the optimal rate-energy (R-E) tradeoff, we consider the
problem of maximizing the R-E region, which is nonconvex. We solve this
challenging problem for two different cases of the channel state information
(CSI): (i) when the CSI is known only at the receiver (CSIR case) and (ii) when
the CSI is known at both the transmitter and the receiver (CSIT case). First,
for the case of CSIR, we develop the optimal dynamic power splitting scheme. To
address the complexity issue of the optimal scheme, we also propose a
suboptimal scheme with low complexity. Comparing the proposed schemes to the
existing schemes, we provide various useful and interesting insights into the
dynamic power splitting for the nonlinear EH. Second, we present the optimal
and suboptimal schemes for the case of CSIT, and we obtain further insights.
Numerical results demonstrate that the proposed schemes significantly
outperform the existing schemes and the proposed suboptimal scheme works very
close to the optimal scheme at a much lower complexity.
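For reference, a widely used sigmoidal (logistic) nonlinear EH model of the kind contrasted with the linear model can be sketched as follows; the circuit-fit parameters below are chosen arbitrarily for illustration and are not taken from the paper:

```python
import numpy as np

def harvested_power(p_in, m_sat=0.02, a=150.0, b=0.014):
    """Logistic nonlinear EH model: saturates at m_sat and is zero at zero input."""
    omega = 1.0 / (1.0 + np.exp(a * b))
    logistic = m_sat / (1.0 + np.exp(-a * (p_in - b)))
    return (logistic - m_sat * omega) / (1.0 - omega)

for p_in in [0.0, 0.005, 0.01, 0.02, 0.05]:     # input RF power in watts
    print(p_in, harvested_power(p_in))          # saturates, unlike a linear model
```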
Comments: 6 pages; conference publication
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
Quality of service (QoS) provisioning in next-generation mobile
communications systems entails a deep understanding of the delay performance.
The delay in wireless networks is strongly affected by the traffic arrival
process and the service process, which in turn depends on the medium access
protocol and the signal-to-interference-plus-noise ratio (SINR) distribution.
In this work, we characterize the conditional distribution of the service
process given the point process in Poisson bipolar networks. We then provide an
upper bound on the delay violation probability combining tools from stochastic
network calculus and stochastic geometry. Furthermore, we analyze the delay
performance under statistical queueing constraints using the effective capacity
formulation. The impact of QoS requirements, network geometry and link distance
on the delay performance is identified. Our results provide useful insights for
guaranteeing stringent delay requirements in large wireless networks.
Comments: 6 pages, 3 figures
Subjects:
Information Theory (cs.IT)
; Social and Information Networks (cs.SI)
Connectivity of wireless sensor networks (WSNs) is a fundamental global
property expected to be maintained even though some sensor nodes are at fault.
In this paper, we investigate the connectivity of random geometric graphs
(RGGs) in the node fault model as an abstract model of ad hoc WSNs with
unreliable nodes. In the model, each node is assumed to be stochastically at
fault, i.e., removed from a graph. As a measure of reliability, the network
breakdown probability is then defined as the average probability that a
resulting survival graph is disconnected over RGGs. We examine RGGs with
general connection functions as an extension of a conventional RGG model and
provide two mathematical analyses: the asymptotic analysis for infinite RGGs
that reveals the phase transition thresholds of connectivity, and the
non-asymptotic analysis for finite RGGs that provides a useful approximation
formula. Those analyses are supported by numerical simulations in the Rayleigh
SISO model reflecting a practical wireless channel.
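The network breakdown probability can be estimated by straightforward Monte Carlo simulation, as in the hedged sketch below (our own, using the hard-disk connection function for simplicity, whereas the paper treats general connection functions):

```python
import random
import networkx as nx

def breakdown_probability(n=200, radius=0.15, q=0.1, trials=200, seed=0):
    """Fraction of trials in which the survival graph is disconnected."""
    rng = random.Random(seed)
    failures = 0
    for t in range(trials):
        g = nx.random_geometric_graph(n, radius, seed=seed + t)
        survivors = [v for v in g if rng.random() > q]  # each node fails w.p. q
        sub = g.subgraph(survivors)
        if len(sub) == 0 or not nx.is_connected(sub):
            failures += 1
    return failures / trials

print(breakdown_probability())
```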
Luc Le Magoarou (IRT b-com), Stéphane Paquelet (IRT b-com) Subjects : Signal Processing (eess.SP) ; Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Channel estimation is challenging in multi-antenna communication systems,
because of the large number of parameters to estimate. It is possible to
facilitate this task by using a physical model describing the multiple paths
constituting the channel, in the hope of reducing the number of unknowns in the
problem. Adjusting the number of estimated paths leads to a bias-variance
tradeoff. This paper explores this tradeoff, aiming to find the optimal number
of paths to estimate. Moreover, the approach based on a physical model is
compared to the classical least squares and Bayesian techniques. Finally, the
impact of channel estimation error on the system data rate is assessed.
Matthieu Roy (IRT b-com), Stéphane Paquelet (IRT b-com), Luc Le Magoarou (IRT b-com), Matthieu Crussière (IETR, IRT b-com) Subjects : Networking and Internet Architecture (cs.NI) ; Information Theory (cs.IT)
In a multiple-input-multiple-output (MIMO) communication system, the
multipath fading is averaged over radio links. This well-known channel
hardening phenomenon plays a central role in the design of massive MIMO
systems. The aim of this paper is to study channel hardening using a physical
channel model in which the influences of propagation rays and antenna array
topologies are highlighted. A measure of channel hardening is derived through
the coefficient of variation of the channel gain. Our analyses and closed-form
results based on the physical model used are consistent with those in the
literature relying on more abstract Rayleigh fading models, but offer further
insight into the relationship with channel characteristics.
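The hardening effect itself is easy to demonstrate numerically. The sketch below (our own illustration, using i.i.d. Rayleigh fading instead of the paper's ray-based physical model) shows the coefficient of variation of the channel gain shrinking roughly as 1/sqrt(M) with the number of antennas M:

```python
import numpy as np

rng = np.random.default_rng(0)
for M in [1, 4, 16, 64, 256]:
    h = (rng.normal(size=(10_000, M)) + 1j * rng.normal(size=(10_000, M))) / np.sqrt(2)
    gain = np.sum(np.abs(h) ** 2, axis=1)        # channel gain ||h||^2 per draw
    print(M, gain.std() / gain.mean())           # coefficient of variation ~ 1/sqrt(M)
```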