Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (stat.ML)
The MAP-Elites algorithm produces a set of high-performing solutions that
vary according to features defined by the user. This technique has the
potential to be a powerful tool for design space exploration, but is limited by
the need for numerous evaluations. The Surrogate-Assisted Illumination
algorithm (SAIL), introduced here, integrates approximative models and
intelligent sampling of the objective function to minimize the number of
evaluations required by MAP-Elites.
The ability of SAIL to efficiently produce both accurate models and diverse
high performing solutions is illustrated on a 2D airfoil design problem. The
search space is divided into bins, each holding a design with a different
combination of features. In each bin SAIL produces a better performing solution
than MAP-Elites, and requires several orders of magnitude fewer evaluations.
The CMA-ES algorithm was used to produce an optimal design in each bin: with
the same number of evaluations required by CMA-ES to find a near-optimal
solution in a single bin, SAIL finds solutions of similar quality in every bin.
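The core loop lends itself to a compact sketch. Below is a minimal, hypothetical Python illustration of surrogate-assisted illumination (a toy 2D problem, not the paper's airfoil domain): a Gaussian-process surrogate scores candidates with an optimistic upper-confidence bound, and a MAP-Elites-style archive keeps the best-scoring design per feature bin. In full SAIL the elites would then be evaluated for real and the surrogate retrained; the acquisition coefficient and bin definition here are illustrative assumptions.

```python
# Minimal sketch of surrogate-assisted illumination (hypothetical toy problem,
# not the paper's airfoil setup). A GP surrogate scores candidates and a
# MAP-Elites archive keeps the best-scoring design per feature bin.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
fitness = lambda x: -np.sum((x - 0.5) ** 2)      # true (expensive) objective
feature = lambda x: min(int(x[0] * 10), 9)       # maps a design to 1 of 10 bins

# Seed the surrogate with a few true evaluations
X = rng.random((20, 2))
y = np.array([fitness(x) for x in X])
gp = GaussianProcessRegressor().fit(X, y)

archive = {}                                     # bin -> (design, acquisition)
for _ in range(2000):                            # cheap surrogate illumination
    x = rng.random(2)
    mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
    ucb = mu[0] + 1.0 * sigma[0]                 # optimistic acquisition (UCB)
    b = feature(x)
    if b not in archive or ucb > archive[b][1]:
        archive[b] = (x, ucb)

# The elites (one per bin) would then be evaluated for real and used to
# retrain the GP, repeating the loop with a growing set of true samples.
print({b: round(v, 3) for b, (x, v) in sorted(archive.items())})
```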
Comments: Accepted in DAC 2017
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
Synapse crossbars are an elementary structure in Neuromorphic Computing Systems (NCS). However, the limited size of crossbars and heavy routing congestion impede NCS implementations of big neural networks. In this paper, we propose a two-step framework (namely, group scissor) to scale NCS designs to big neural networks. The first step is rank clipping, which integrates low-rank approximation into the training to reduce total crossbar area. The second step is group connection deletion, which structurally prunes connections to reduce routing congestion between crossbars. Tested on the convolutional neural networks LeNet on the MNIST database and ConvNet on the CIFAR-10 database, our experiments show significant reductions of crossbar area and routing area in NCS designs. Without accuracy loss, rank clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs of LeNet and ConvNet, respectively. Following rank clipping, group connection deletion further reduces the routing area of LeNet and ConvNet to 8.1% and 52.06%, respectively.
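Although the paper's rank clipping is woven into training, the underlying low-rank idea can be sketched with a plain truncated SVD: one large crossbar is factored into two smaller ones. The sizes and rank below are hypothetical, not the paper's settings.

```python
# Illustrative sketch of the low-rank idea behind rank clipping: replace an
# m-by-n weight matrix (one big crossbar) with two rank-r factors (two
# smaller crossbars). Numbers here are hypothetical, not from the paper.
import numpy as np

m, n, r = 256, 128, 16
W = np.random.randn(m, n)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]          # m x r crossbar
B = Vt[:r, :]                 # r x n crossbar

full_area = m * n
clipped_area = m * r + r * n  # area of the two factored crossbars
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"area ratio {clipped_area / full_area:.2%}, relative error {err:.3f}")
```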
Comments: 8 pages, 5 figures
Subjects:
Neural and Evolutionary Computing (cs.NE)
A growing number of nature-inspired metaheuristic algorithms are being applied to real-world optimization problems, as they offer advantages over classical numerical optimization methods. This paper proposes a new nature-inspired metaheuristic, called the Whale Swarm Algorithm, for function optimization; it is inspired by whales’ behavior of communicating with each other via ultrasound while hunting. The proposed Whale Swarm Algorithm is compared with several popular metaheuristic algorithms on comprehensive performance metrics. According to the experimental results, the Whale Swarm Algorithm achieves quite competitive performance compared with the other algorithms.
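The abstract does not give the update rule, but one commonly described reading of the ultrasound metaphor is that each whale takes a random step toward its better-and-nearest neighbor, with attraction intensity decaying exponentially with distance. The sketch below follows that assumption; the constants rho0 and eta are illustrative, not the paper's values.

```python
# Hedged sketch of one Whale Swarm Algorithm iteration, assuming the commonly
# described rule: each whale takes a random step toward its better-and-nearest
# neighbor, damped exponentially in distance (ultrasound attenuation).
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sum(x ** 2, axis=-1)            # minimization objective
whales = rng.uniform(-5, 5, size=(30, 2))
rho0, eta = 2.0, 0.005                           # illustrative constants

for _ in range(100):
    fit = f(whales)
    for i in range(len(whales)):
        better = np.where(fit < fit[i])[0]       # whales with better fitness
        if better.size == 0:
            continue                             # current best stays put
        d = np.linalg.norm(whales[better] - whales[i], axis=1)
        j = better[np.argmin(d)]                 # better AND nearest whale
        step = rng.uniform(0, rho0 * np.exp(-eta * d.min()), size=2)
        whales[i] += step * (whales[j] - whales[i])

print("best value:", f(whales).min())
```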
Comments: Under review for CVPR 2017. Project webpage: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
We introduce a neural architecture for navigation in novel environments. Our
proposed architecture learns to map from first-person viewpoints and plans a
sequence of actions towards goals in the environment. The Cognitive Mapper and
Planner (CMP) is based on two key ideas: a) a unified joint architecture for
mapping and planning, such that the mapping is driven by the needs of the
planner, and b) a spatial memory with the ability to plan given an incomplete
set of observations about the world. CMP constructs a top-down belief map of
the world and applies a differentiable neural net planner to produce the next
action at each time step. The accumulated belief of the world enables the agent
to track visited regions of the environment. Our experiments demonstrate that
CMP outperforms both reactive strategies and standard memory-based
architectures and performs well in novel environments. Furthermore, we show
that CMP can also achieve semantically specified goals, such as ‘go to a
chair’.
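A much-simplified sketch of a differentiable grid planner of this kind (value iteration via repeated local max-propagation, in the spirit of value iteration networks) is shown below; the actual CMP planner is hierarchical and trained end to end, and the reward map here is a toy assumption.

```python
# Simplified sketch of a differentiable, grid-based planner in the spirit of
# CMP's value-iteration module: repeated local max-propagation of value over
# a top-down map. The real CMP planner is hierarchical and learned.
import numpy as np

H = W = 16
reward = np.full((H, W), -0.01)   # small step cost everywhere
reward[12, 12] = 1.0              # goal cell (hypothetical)
V = np.zeros((H, W))
gamma = 0.95

for _ in range(50):               # value iteration with 4-neighbor moves
    padded = np.pad(V, 1, constant_values=-np.inf)
    neighbors = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
    V = reward + gamma * neighbors.max(axis=0)

# Greedy next action at a state = move toward the best-valued neighbor.
print("value at start cell (0, 0):", round(V[0, 0], 3))
```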
Comments: 10 pages, 10 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Segmenting human left ventricle (LV) in magnetic resonance imaging (MRI)
images and calculating its volume are important for diagnosing cardiac
diseases. In 2016, Kaggle organized a competition to estimate the volume of LV
from MRI images. The dataset consisted of a large number of cases, but only
provided systole and diastole volumes as labels. We designed a system based on
neural networks to solve this problem. It began with a detector combined with a
neural network classifier for detecting regions of interest (ROIs) containing
LV chambers. Then a deep neural network named hypercolumns fully convolutional
network was used to segment LV in ROIs. The 2D segmentation results were
integrated across different images to estimate the volume. With ground-truth
volume labels, this model was trained end-to-end. To improve the result, an
additional dataset with only segmentation labels was used. The model was trained
alternately on these two datasets with different types of teaching signals. We
also proposed a variance estimation method for the final prediction. Our
algorithm ranked 4th on the test set in this competition.
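The volume-integration step can be sketched simply: sum the segmented LV area on each short-axis slice and multiply by the slice spacing (a Riemann approximation). The spacing values below are hypothetical placeholders, not the competition's metadata.

```python
# Sketch of the volume-integration step: sum the segmented LV area on each
# 2D slice and multiply by slice spacing. Spacing values are hypothetical.
import numpy as np

def lv_volume_ml(masks, pixel_spacing_mm=(1.4, 1.4), slice_gap_mm=8.0):
    """masks: list of binary 2D arrays, one per short-axis slice."""
    pixel_area = pixel_spacing_mm[0] * pixel_spacing_mm[1]   # mm^2 per pixel
    areas = [m.sum() * pixel_area for m in masks]            # mm^2 per slice
    return sum(areas) * slice_gap_mm / 1000.0                # mm^3 -> mL

masks = [np.ones((10, 10), dtype=int) for _ in range(8)]     # toy segmentations
print(f"{lv_volume_ml(masks):.1f} mL")
```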
Comments: 8 Pages, 8 Figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We introduce a novel, accurate and practical system for real-time people
tracking and identification. We used a Kinect V2 sensor for tracking that
generates a body skeleton for up to six people in the view. We perform
identification using both Kinect and passive RFID, by first measuring the
velocity vector of a person’s skeleton and of their RFID tag using the position
of the RFID reader antennas as reference points and then finding the best match
between skeletons and tags. We introduce a method for synchronizing Kinect
data, which is captured regularly, with irregular or missing RFID data
readouts. Our experiments show centimeter-level people tracking resolution with
80% average identification accuracy for up to six people in indoor
environments, which meets the needs of many applications. Our system can
preserve user privacy and work under different lighting conditions.
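The matching step can be viewed as an assignment problem: compare each skeleton's velocity vector with each tag's velocity estimate and pick the best one-to-one matching. The toy sketch below uses cosine similarity and the Hungarian algorithm; the velocities are made up, and the real system derives tag velocities from the RFID antenna geometry.

```python
# Sketch of the skeleton-to-tag matching step: compare velocity estimates and
# solve the best one-to-one assignment. Velocity values here are toy numbers.
import numpy as np
from scipy.optimize import linear_sum_assignment

skel_vel = np.array([[0.5, 0.1], [-0.3, 0.4], [0.0, -0.6]])  # m/s per skeleton
tag_vel = np.array([[-0.25, 0.45], [0.45, 0.05], [0.05, -0.55]])

# Cost = negative cosine similarity between velocity vectors
norm = lambda v: v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-9)
cost = -norm(skel_vel) @ norm(tag_vel).T

rows, cols = linear_sum_assignment(cost)         # Hungarian algorithm
for s, t in zip(rows, cols):
    print(f"skeleton {s} <-> tag {t}")
```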
An Efficient Decomposition Framework for Discriminative Segmentation with Supermodular Losses
Jiaqian Yu, Matthew B. Blaschko Subjects: Computer Vision and Pattern Recognition (cs.CV)
Several supermodular losses have been shown to improve the perceptual quality
of image segmentation in a discriminative framework such as a structured output
support vector machine (SVM). These loss functions do not necessarily have the
same structure as the one used by the segmentation inference algorithm, and in
general, we may have to resort to generic submodular minimization algorithms
for loss augmented inference. Although these come with polynomial time
guarantees, they are not practical to apply to image scale data. Many
supermodular losses come with strong optimization guarantees, but are not
readily incorporated in a loss augmented graph cuts procedure. This motivates
our strategy of employing the alternating direction method of multipliers
(ADMM) decomposition for loss augmented inference. In doing so, we create a new
API for the structured SVM that separates the maximum a posteriori (MAP)
inference of the model from the loss augmentation during training. In this way,
we gain computational efficiency, making new choices of loss functions
practical for the first time, while simultaneously making the inference
algorithm employed during training closer to the test time procedure. We show
improvement both in accuracy and computational performance on the Microsoft
Research Grabcut database and a brain structure segmentation task, empirically
validating the use of several supermodular loss functions during training, and
the improved computational properties of the proposed ADMM approach over the
Fujishige-Wolfe minimum norm point algorithm.
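The decomposition principle is easy to demonstrate on a toy problem: ADMM splits min f(x) + g(z) subject to x = z into two independent subproblems coordinated by a dual variable, mirroring how the paper separates MAP inference from loss augmentation. The quadratic f and g below are illustrative stand-ins, not structured-SVM terms.

```python
# Toy ADMM decomposition: minimize f(x) + g(z) subject to x = z, where f and
# g are handled by separate solvers. Here f, g are simple quadratics so each
# subproblem has a closed form; the structured-SVM case is analogous.
import numpy as np

a, b, rho = 3.0, -1.0, 1.0           # f(x) = (x-a)^2, g(z) = (z-b)^2
x = z = u = 0.0                      # u is the scaled dual variable
for _ in range(50):
    x = (2 * a + rho * (z - u)) / (2 + rho)   # argmin_x f(x) + rho/2 (x-z+u)^2
    z = (2 * b + rho * (x + u)) / (2 + rho)   # argmin_z g(z) + rho/2 (x-z+u)^2
    u += x - z                                # dual ascent on the constraint
print(f"x = {x:.4f}, z = {z:.4f} (optimum of f+g is at {(a + b) / 2})")
```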
Unsupervised temporal context learning using convolutional neural networks for laparoscopic workflow analysis
Sebastian Bodenstedt (1), Martin Wagner (2), Darko Katić (1), Patrick Mietkowski (2), Benjamin Mayer (2), Hannes Kenngott (2), Beat Müller-Stich (2), Rüdiger Dillmann (1), Stefanie Speidel (1) ((1) Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, (2) Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg) Subjects: Computer Vision and Pattern Recognition (cs.CV)
Computer-assisted surgery (CAS) aims to provide the surgeon with the right
type of assistance at the right moment. Such assistance systems are especially
relevant in laparoscopic surgery, where CAS can alleviate some of the drawbacks
that surgeons incur. For many assistance functions, e.g. displaying the
location of a tumor at the appropriate time or suggesting what instruments to
prepare next, analyzing the surgical workflow is a prerequisite. Since
laparoscopic interventions are performed via endoscope, the video signal is an
obvious sensor modality to rely on for workflow analysis.
Image-based workflow analysis tasks in laparoscopy, such as phase
recognition, skill assessment, video indexing or automatic annotation, require
a temporal distinction between video frames. Generally, computer-vision-based
methods that generalize from previously seen data are used. For training such
methods, large amounts of annotated data are necessary. Annotating surgical
data requires expert knowledge, therefore collecting a sufficient amount of
data is difficult, time-consuming and not always feasible.
In this paper, we address this problem by presenting an unsupervised method
for training a convolutional neural network (CNN) to differentiate between
laparoscopic video frames on a temporal basis. We extract video frames at
regular intervals from 324 unlabeled laparoscopic interventions, resulting in a
dataset of approximately 2.2 million images. From this dataset, we extract
image pairs from the same video and train a CNN to determine their temporal
order. To solve this problem, the CNN has to extract features that are relevant
for comprehending laparoscopic workflow.
Furthermore, we demonstrate that such a CNN can be adapted for surgical
workflow segmentation. We performed image-based workflow segmentation on a
publicly available dataset of 7 cholecystectomies and 9 colorectal
interventions.
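The self-supervised labeling described above reduces to a simple pair-construction routine, sketched below with placeholder frame identifiers standing in for real laparoscopic frames.

```python
# Sketch of the self-supervised pair construction: sample two frames from the
# same video and label the pair by their true temporal order. A CNN trained
# on these labels must learn workflow-relevant features to succeed.
import random

def make_pairs(video_frames, n_pairs):
    """video_frames: list of frames in temporal order."""
    pairs = []
    for _ in range(n_pairs):
        i, j = random.sample(range(len(video_frames)), 2)
        label = int(i < j)                      # 1 if first frame comes earlier
        pairs.append((video_frames[i], video_frames[j], label))
    return pairs

frames = [f"frame_{t:04d}" for t in range(1000)]  # placeholder frames
for a, b, y in make_pairs(frames, 3):
    print(a, b, "earlier-first" if y else "later-first")
```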
Comments: 14 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Underwater cameras are widely used to observe the sea floor. They are usually
included in autonomous underwater vehicles, unmanned underwater vehicles, and
in situ ocean sensor networks. Although they are important sensors for monitoring underwater scenes, current underwater camera sensors suffer from many issues. Because of the way light propagates in water and because of biological activity on the sea floor, acquired underwater images often exhibit scattering and large amounts of noise. Over the last five years, many
methods have been proposed to overcome traditional underwater imaging problems.
This paper aims to review the state-of-the-art techniques in underwater image
processing by highlighting the contributions and challenges presented in over
40 papers. We present an overview of various underwater image processing
approaches, such as underwater image descattering, underwater image color
restoration, and underwater image quality assessments. Finally, we summarize
the future trends and challenges in designing and processing underwater imaging
sensors.
Comments: 19 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
As a result of several successful applications in computer vision and image
processing, sparse representation (SR) has attracted significant attention in
multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs)
that presume the basis functions, SR learns an over-complete dictionary from a
set of training images for image fusion, and it achieves more stable and
meaningful representations of the source images. By doing so, the SR-based
fusion methods generally outperform the traditional MST-based image fusion
methods in both subjective and objective tests. In addition, they are less
susceptible to mis-registration among the source images, thus facilitating the
practical applications. This survey paper proposes a systematic review of the
SR-based multi-sensor image fusion literature, highlighting the pros and cons
of each category of approaches. Specifically, we start by performing a
theoretical investigation of the entire system from three key algorithmic
aspects, (1) sparse representation models; (2) dictionary learning methods; and
(3) activity levels and fusion rules. Subsequently, we show how the existing
works address these scientific problems and design the appropriate fusion rules
for each application, such as multi-focus image fusion and multi-modality
(e.g., infrared and visible) image fusion. Finally, we carry out some
experiments to evaluate the impact of these three algorithmic components on the
fusion performance when dealing with different applications. This article is
expected to serve as a tutorial and source of reference for researchers
preparing to enter the field or who desire to employ the sparse representation
theory in other fields.
Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) have demonstrated remarkable results in
image classification tasks for benchmark and practical uses. CNNs with deeper architectures have recently achieved higher performance thanks to their numerous parameters and the resulting high expressive ability. However, CNNs have limited robustness to geometric transformations of objects in images, such as scaling and rotation. This problem is considered to limit the performance improvement of deep CNNs, but there is no established solution.
This study focuses on scale transformation and proposes a novel network
architecture called weight-shared multi-stage network (WSMS-Net), which enables
the existing deep CNNs, such as ResNet and DenseNet, to acquire robustness to
scaling of objects. The WSMS-Net architecture consists of multiple stages of
CNNs and is easily combined with existing deep CNNs. This study demonstrates that existing deep CNNs combined with the proposed WSMS-Net achieve higher accuracy on image classification tasks with only a small increase in the number of parameters.
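A single-filter toy sketch of the weight-sharing idea follows: the same kernel is applied to the input at several scales ("stages") and the per-stage features are merged, so a rescaled object can excite the same filters. This is our simplification, not the full multi-stage ResNet/DenseNet construction.

```python
# Sketch of the weight-sharing idea in WSMS-Net: ONE convolution kernel is
# applied to the input at several scales ("stages"), and the resulting
# feature maps are merged. A single-filter toy, not the full architecture.
import numpy as np
from scipy.ndimage import convolve, zoom

image = np.random.rand(32, 32)
kernel = np.random.randn(3, 3)            # one set of weights, shared

features = []
for scale in (1.0, 0.5, 0.25):            # three stages of the network
    scaled = zoom(image, scale, order=1)  # bilinear downscaling
    fmap = convolve(scaled, kernel)       # same kernel at every stage
    features.append(fmap.mean())          # global pooling per stage

merged = np.concatenate([np.atleast_1d(f) for f in features])
print("per-stage pooled features:", merged.round(3))
```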
Comments: 10 pages, 5 figures
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art methods for 3D hand pose estimation from depth images
require large amounts of annotated training data. We propose to model the
statistical relationships of 3D hand poses and corresponding depth images using
two deep generative models with a shared latent space. By design, our
architecture allows for learning from unlabeled image data in a semi-supervised
manner. Assuming a one-to-one mapping between a pose and a depth map, any given
point in the shared latent space can be projected into both a hand pose and a
corresponding depth map. Regressing the hand pose can then be done by learning
a discriminator to estimate the posterior of the latent pose given some depth
map. To improve generalization and to better exploit unlabeled depth maps, we
jointly train a generator and a discriminator. At each iteration, the generator
is updated with the back-propagated gradient from the discriminator to
synthesize realistic depth maps of the articulated hand, while the
discriminator benefits from an augmented training set of synthesized and
unlabeled samples. The proposed discriminator network architecture is highly
efficient and runs at 90 FPS on the CPU, with accuracies comparable to or better than the state of the art on 3 publicly available benchmarks.
Comments: 10 pages, 10 figures, submitted to ICIP2017 (extension version)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper proposes an extension to Generative Adversarial Networks (GANs), named ArtGAN, to synthetically generate more challenging and complex images, such as artwork with abstract characteristics. This is in contrast to most current solutions, which focus on generating natural images such as room interiors, birds, flowers and faces. The key innovation of our work is to allow back-propagation of the loss function w.r.t. the labels (randomly assigned to each generated image) from the discriminator to the generator. With this feedback from the label information, the generator is able to learn faster and achieve better generated image quality. Empirically, we show that the proposed ArtGAN is capable of creating realistic artwork, as well as generating compelling real-world images that globally look natural, with clear shapes, on CIFAR-10.
Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth
Comments: Accepted article to appear in IEEE Transactions on Medical Imaging 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
When integrating computational tools such as automatic segmentation into
clinical practice, it is of utmost importance to be able to assess the level of
accuracy on new data, and in particular, to detect when an automatic method
fails. However, this is difficult to achieve due to the absence of ground truth.
Segmentation accuracy on clinical data might be different from what is found
through cross-validation because validation data is often used during
incremental method development, which can lead to overfitting and unrealistic
performance expectations. Before deployment, performance is quantified using
different metrics, for which the predicted segmentation is compared to a
reference segmentation, often obtained manually by an expert. But little is
known about the real performance after deployment when a reference is
unavailable. In this paper, we introduce the concept of reverse classification
accuracy (RCA) as a framework for predicting the performance of a segmentation
method on new data. In RCA we take the predicted segmentation from a new image
to train a reverse classifier which is evaluated on a set of reference images
with available ground truth. The hypothesis is that if the predicted
segmentation is of good quality, then the reverse classifier will perform well
on at least some of the reference images. We validate our approach on
multi-organ segmentation with different classifiers and segmentation methods.
Our results indicate that it is indeed possible to predict the quality of
individual segmentations, in the absence of ground truth. Thus, RCA is ideal
for integration into automatic processing pipelines in clinical routine and as
part of large-scale image analysis studies.
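The RCA procedure itself is compact, as the toy sketch below shows on 1D "images": train a reverse classifier on the new image with its predicted segmentation as pseudo ground truth, evaluate it on references with known ground truth, and take the best score as the quality proxy. The decision-tree reverse classifier and synthetic data are our stand-ins for the paper's segmentation models.

```python
# Sketch of Reverse Classification Accuracy on toy 1D "images": train a
# classifier on (new image, predicted segmentation), evaluate it on reference
# images with known ground truth, and use the best Dice as a quality proxy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dice(a, b):
    return 2 * np.sum(a & b) / (np.sum(a) + np.sum(b) + 1e-9)

rng = np.random.default_rng(0)
new_img = rng.normal(0, 1, 200) + np.r_[np.zeros(100), 3 * np.ones(100)]
pred_seg = np.r_[np.zeros(100, int), np.ones(100, int)]   # prediction to assess

# Reverse classifier: learn to reproduce the predicted segmentation
rc = DecisionTreeClassifier(max_depth=2).fit(new_img.reshape(-1, 1), pred_seg)

scores = []
for _ in range(5):                                         # reference database
    ref_img = rng.normal(0, 1, 200) + np.r_[np.zeros(100), 3 * np.ones(100)]
    ref_gt = np.r_[np.zeros(100, int), np.ones(100, int)]
    ref_pred = rc.predict(ref_img.reshape(-1, 1))
    scores.append(dice(ref_pred.astype(bool), ref_gt.astype(bool)))

print("RCA quality estimate (max Dice over references):", round(max(scores), 3))
```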
Comments: Submitted for ICIP 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel automatic face recognition approach based on
local binary patterns (LBP). The LBP descriptor considers a local neighbourhood of a pixel to compute features, but it is not very robust to image noise, variance, and differing illumination conditions. In this paper, we address these issues and extend the original LBP operator by considering more pixels and different neighbourhoods when computing the feature vector. The proposed method is evaluated on two benchmark corpora, namely the UFI and FERET face datasets. We show experimentally that our approach is very effective: it significantly outperforms several other state-of-the-art methods, particularly in realistic conditions where the above-mentioned issues are pronounced.
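For reference, the baseline 8-neighbor LBP operator that the paper extends can be written in a few lines: each pixel is encoded by thresholding its 3x3 neighborhood against the center value and reading the bits as a byte; histograms of these codes then form the face descriptor.

```python
# The baseline 8-neighbor LBP operator: threshold each pixel's 3x3
# neighborhood against the center value and pack the bits into a byte.
import numpy as np

def lbp_codes(img):
    c = img[1:-1, 1:-1]                       # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise neighbors
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code

img = (np.random.rand(8, 8) * 255).astype(np.uint8)
print(lbp_codes(img))
```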
Amarjot Singh, Nick Kingsbury Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a Deep Scattering network that utilizes Dual-Tree
complex wavelets to extract translation invariant representations from an input
signal. The computationally efficient Dual-Tree wavelets decompose the input
signal into densely spaced representations over scales. Translation invariance
is introduced in the representations by applying a non-linearity over a region
followed by averaging. The discriminatory information in the densely spaced,
locally smooth, signal representations aids the learning of the classifier. The
proposed network is shown to outperform Mallat’s ScatterNet in classification accuracy on four datasets with different modalities.
Distributed Mapping with Privacy and Communication Constraints: Lightweight Algorithms and Object-based Models
Comments: preprint for IJRR submission
Subjects:
Robotics (cs.RO)
; Computer Vision and Pattern Recognition (cs.CV)
We consider the following problem: a team of robots is deployed in an unknown
environment and it has to collaboratively build a map of the area without a
reliable infrastructure for communication. The backbone for modern mapping
techniques is pose graph optimization, which estimates the trajectory of the
robots, from which the map can be easily built. The first contribution of this
paper is a set of distributed algorithms for pose graph optimization: rather
than sending all sensor data to a remote sensor fusion server, the robots
exchange very partial and noisy information to reach an agreement on the pose
graph configuration. Our approach can be considered as a distributed
implementation of the two-stage approach of Carlone et al., where we use the
Successive Over-Relaxation (SOR) and the Jacobi Over-Relaxation (JOR) as
workhorses to split the computation among the robots. As a second contribution,
we extend the proposed distributed algorithms to work with object-based map models. The use of object-based models
avoids the exchange of raw sensor measurements (e.g., point clouds) further
reducing the communication burden. Our third contribution is an extensive
experimental evaluation of the proposed techniques, including tests in
realistic Gazebo simulations and field experiments in a military test facility.
Abundant experimental evidence suggests that one of the proposed algorithms
(the Distributed Gauss-Seidel method or DGS) has excellent performance. The DGS
requires minimal information exchange, has an anytime flavor, scales well to
large teams, is robust to noise, and is easy to implement. Our field tests show
that the combined use of our distributed algorithms and object-based models
reduces the communication requirements by several orders of magnitude and
enables distributed mapping with large teams of robots in real-world problems.
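The flavor of these splitting methods can be sketched on a toy linear system: under a Jacobi-style update, each "robot" re-estimates only its own unknown from its neighbors' current values, so one sweep corresponds to one communication round. The 3x3 system below is illustrative, not a linearized pose graph.

```python
# Sketch of the Jacobi-style splitting behind the distributed solvers: each
# robot repeatedly re-estimates its own block from neighbors' current
# estimates, so solving A x = b needs only local exchanges.
import numpy as np

A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])  # diag. dominant
b = np.array([3., 2., 3.])
x = np.zeros(3)                       # each entry "owned" by one robot

for _ in range(30):                   # one communication round per sweep
    x_new = x.copy()
    for i in range(3):                # robot i uses only neighbors' values
        x_new[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    x = x_new                         # Jacobi: all robots update in parallel

print("estimate:", x.round(4), "residual:", np.linalg.norm(A @ x - b).round(6))
```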
Comments: 7
Subjects:
Artificial Intelligence (cs.AI)
; Computation and Language (cs.CL)
Natural language sentence matching is a fundamental technology for a variety
of tasks. Previous approaches either match sentences from a single direction or
only apply single granular (word-by-word or sentence-by-sentence) matching. In
this work, we propose a bilateral multi-perspective matching (BiMPM) model
under the “matching-aggregation” framework. Given two sentences (P) and (Q),
our model first encodes them with a BiLSTM encoder. Next, we match the two
encoded sentences in two directions, (P \rightarrow Q) and (P \leftarrow Q). In
each matching direction, each time step of one sentence is matched against all
time-steps of the other sentence from multiple perspectives. Then, another
BiLSTM layer is utilized to aggregate the matching results into a fixed-length
matching vector. Finally, based on the matching vector, the decision is made
through a fully connected layer. We evaluate our model on three tasks:
paraphrase identification, natural language inference and answer sentence
selection. Experimental results on standard benchmark datasets show that our
model achieves the state-of-the-art performance on all tasks.
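The multi-perspective matching operation at the core of the model can be sketched as follows (our reading of the abstract: each perspective reweights the two vectors with its own learned weights before a cosine similarity). Random weights stand in for trained parameters.

```python
# Sketch of the multi-perspective cosine match: each of l "perspectives"
# reweights the two vectors with its own weights before taking a cosine
# similarity, yielding an l-dimensional match vector.
import numpy as np

def multi_perspective_match(v1, v2, W):
    """W: (l, d) trainable weights; returns l cosine similarities."""
    a, b = W * v1, W * v2                      # elementwise reweighting, (l, d)
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
    return num / den

d, l = 8, 5
rng = np.random.default_rng(0)
m = multi_perspective_match(rng.normal(size=d), rng.normal(size=d),
                            rng.normal(size=(l, d)))
print("match vector:", m.round(3))
```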
Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Robert A. Kłopotek, Elżbieta A. Kłopotek Subjects: Artificial Intelligence (cs.AI)
In a previous paper, we simplified the proof of a theorem on personalized random walks that is fundamental to clustering graph nodes, and generalized it to bipartite graphs for the specific case where the probability of a random jump is proportional to the number of links of the "personally preferred" nodes. In this paper, we turn to the more complex issue of graphs in which the random jump follows a uniform distribution.
Mieczysław Kłopotek Subjects: Artificial Intelligence (cs.AI)
This paper investigates the application of consensus clustering and
meta-clustering to the set of all possible partitions of a data set. We show
that when using a “complement” of the Rand Index as a measure of cluster similarity, the total-separation partition, which puts each element in a separate set, is chosen.
Comments: 27 pages, 5 figures, 11 tables
Subjects:
Artificial Intelligence (cs.AI)
A lack of diversity in a genetic algorithm’s population may lead to poor performance of the genetic operators, since there is no equilibrium between exploration and exploitation. In such cases, genetic algorithms converge quickly and prematurely.
In this paper we develop a novel hybrid genetic algorithm which attempts to
obtain a balance between exploration and exploitation. It confronts the
diversity problem using an operator we call greedy diversification. Furthermore, the proposed algorithm applies a competition between parents and children so as to exploit high-quality visited solutions. These operators are complemented by a simple selection mechanism designed to preserve and take advantage of the population diversity.
Additionally, we extend our proposal to the field of memetic algorithms,
obtaining an improved model with outstanding results in practice.
The experimental study shows the validity of the approach, as well as how important it is to take the concepts of exploration and exploitation into account when designing an evolutionary algorithm.
Benedikt Bünz, Matthew Lamm Subjects: Artificial Intelligence (cs.AI)
In this paper we explore whether or not deep neural architectures can learn
to classify Boolean satisfiability (SAT). We devote considerable time to
discussing the theoretical properties of SAT. Then, we define a graph
representation for Boolean formulas in conjunctive normal form, and train
neural classifiers over general graph structures called Graph Neural Networks,
or GNNs, to recognize features of satisfiability. To the best of our knowledge, this has never been tried before. Our preliminary findings are potentially profound: in a weakly-supervised setting, that is, without problem-specific feature engineering, Graph Neural Networks can learn features of satisfiability.
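One plausible graph encoding of a CNF formula (our assumption; the paper defines its own representation) is a clause-literal graph, sketched below: a node per clause and per literal, membership edges, and edges linking complementary literals. A GNN would then message-pass over this structure.

```python
# Sketch of one common graph encoding of a CNF formula (an assumption, not
# necessarily the paper's variant): a bipartite graph with a node per clause
# and per literal, edges for membership, plus edges linking x and not-x.
# Formula: (x1 or not x2) and (x2 or x3)
cnf = [[1, -2], [2, 3]]                      # ints encode literals

nodes, edges = set(), []
for ci, clause in enumerate(cnf):
    c = f"c{ci}"
    nodes.add(c)
    for lit in clause:
        nodes.add(lit)
        edges.append((c, lit))               # clause-membership edge
for var in {abs(l) for cl in cnf for l in cl}:
    if var in nodes and -var in nodes:
        edges.append((var, -var))            # connect complementary literals

print(sorted(nodes, key=str), edges)         # a GNN would message-pass on this
```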
Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
A considerable number of machine learning algorithms take matrices as their input. As such, they cannot directly analyze time series data, due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity, since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient
representation learning framework that is able to convert a set of time series
with equal or unequal lengths to a matrix format. In particular, we guarantee
that the pairwise similarities between time series are well preserved after the
transformation. Therefore, the learned feature representation is particularly
suitable to the class of learning problems that are sensitive to data
similarities. Given a set of (n) time series, we first construct an (n \times n) partially observed similarity matrix by randomly sampling (O(n \log n)) pairs
of time series and computing their pairwise similarities. We then propose an
extremely efficient algorithm that solves a highly non-convex and NP-hard
problem to learn new features based on the partially observed similarity
matrix. We use the learned features to conduct experiments on both data
classification and clustering tasks. Our extensive experimental results
demonstrate that the proposed framework is both effective and efficient.
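The pipeline can be sketched end to end with a generic solver standing in for the paper's specialized one: sample O(n log n) pairs, fill a partial similarity matrix, and fit low-rank features F so that F F^T matches the observed entries. The toy similarity function and plain gradient descent below are our assumptions.

```python
# Sketch of the pipeline: sample O(n log n) time-series pairs, fill a partial
# similarity matrix, then fit low-rank features F (n x r) to the observed
# entries. Gradient descent stands in for the paper's specialized solver.
import numpy as np

rng = np.random.default_rng(0)
n, r = 60, 4
series = [rng.normal(size=50).cumsum() for _ in range(n)]
sim = lambda a, b: np.exp(-np.mean((a - b) ** 2) / 100.0)   # toy similarity

n_pairs = int(n * np.log(n) * 4)
obs = {}
for _ in range(n_pairs):                       # O(n log n) sampled entries
    i, j = rng.integers(0, n, 2)
    obs[(i, j)] = obs[(j, i)] = sim(series[i], series[j])

F = 0.1 * rng.normal(size=(n, r))              # learned feature matrix
for _ in range(500):                           # minimize sum (F_i.F_j - s_ij)^2
    grad = np.zeros_like(F)
    for (i, j), s in obs.items():
        err = F[i] @ F[j] - s
        grad[i] += err * F[j]
    F -= 0.01 * grad
loss = np.mean([(F[i] @ F[j] - s) ** 2 for (i, j), s in obs.items()])
print(f"mean squared fit error on observed pairs: {loss:.4f}")
```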
Karan Goel, Shreya Rajpal, Mausam Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)
Managing micro-tasks on crowdsourcing marketplaces involves balancing
conflicting objectives — the quality of work, total cost incurred and time to
completion. Previous agents have focused on cost-quality or cost-time tradeoffs, limiting their real-world applicability. As a step toward managing all three objectives jointly, we present Octopus, the first AI agent that does so in tandem. Octopus is based on a computationally tractable, multi-agent formulation consisting of three components: one that sets the price per ballot to adjust the rate of task completion, another that optimizes each task for quality, and a third that performs task selection. We demonstrate that Octopus outperforms existing state-of-the-art approaches in both simulation and experiments with real data. We also
deploy Octopus on Amazon Mechanical Turk to establish its ability to manage
tasks in a real-world, dynamic setting.
Comments: Report version of the AI Journal article "Best-first fixed-depth minimax algorithms" (1996)
Subjects:
Artificial Intelligence (cs.AI)
This paper has three main contributions to our understanding of fixed-depth
minimax search: (A) A new formulation for Stockman’s SSS* algorithm, based on
Alpha-Beta, is presented. It solves all the perceived drawbacks of SSS*,
finally transforming it into a practical algorithm. In effect, we show that
SSS* = Alpha-Beta + transposition tables. The crucial step is the realization
that transposition tables contain so-called solution trees, structures that are
used in best-first search algorithms like SSS*. Having created a practical
version, we present performance measurements with tournament game-playing
programs for three different minimax games, yielding results that contradict a
number of publications. (B) Based on the insights gained in our attempts at
understanding SSS*, we present a framework that facilitates the construction of
several best-first fixed-depth game-tree search algorithms, known and new. The
framework is based on depth-first null-window Alpha-Beta search, enhanced with
storage to allow for the refining of previous search results. It focuses
attention on the essential differences between algorithms. (C) We present a new
instance of the framework, MTD(f). It is well-suited for use with iterative
deepening, and performs better than algorithms that are currently used in most
state-of-the-art game-playing programs. We provide experimental evidence to
explain why MTD(f) performs better than the other fixed-depth minimax
algorithms.
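The MTD(f) driver described in (C) is short enough to sketch in full: a sequence of null-window Alpha-Beta calls that converges on the minimax value. For brevity the transposition table, which makes the re-searches cheap, is omitted here; MTD(f) remains correct without it, just slower. The toy game tree is hypothetical.

```python
# Sketch of the MTD(f) driver: repeated null-window Alpha-Beta searches home
# in on the minimax value. Memory (transposition table) omitted for brevity.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf = {"a1": 3, "a2": 12, "b1": 8, "b2": 2}

def alphabeta(node, alpha, beta, maximizing):
    if node in leaf:
        return leaf[node]
    if maximizing:
        v = -10**9
        for child in tree[node]:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:
                break                          # beta cutoff
        return v
    v = 10**9
    for child in tree[node]:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)
        if alpha >= beta:
            break                              # alpha cutoff
    return v

def mtdf(root, f):
    g, lower, upper = f, -10**9, 10**9
    while lower < upper:
        beta = max(g, lower + 1)               # null window (beta-1, beta)
        g = alphabeta(root, beta - 1, beta, True)
        if g < beta:
            upper = g                          # search failed low
        else:
            lower = g                          # search failed high
    return g

print("minimax value:", mtdf("root", f=0))     # f is the first guess
```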
Comments: Accepted to conference track at ICLR 2017
Subjects:
Computation and Language (cs.CL)
; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Bilingual word vectors are usually trained “online”. Mikolov et al. showed that they can also be found “offline”, whereby two pre-trained embeddings are
aligned with a linear transformation, using dictionaries compiled from expert
knowledge. In this work, we prove that the linear transformation between two
spaces should be orthogonal. This transformation can be obtained using the
singular value decomposition. We introduce a novel “inverted softmax” for
identifying translation pairs, with which we improve the precision @1 of
Mikolov’s original mapping from 34% to 43%, when translating a test set
composed of both common and rare English words into Italian. Orthogonal
transformations are more robust to noise, enabling us to learn the
transformation without expert bilingual signal by constructing a
“pseudo-dictionary” from the identical character strings which appear in both
languages, achieving 40% precision on the same test set. Finally, we extend our
method to retrieve the true translations of English sentences from a corpus of
200k Italian sentences with a precision @1 of 68%.
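The paper's core construction, the orthogonal map obtained via SVD (orthogonal Procrustes), can be sketched in a few lines; toy random embeddings with a hidden rotation stand in for real English/Italian vectors, and the inverted softmax used for retrieval is not shown.

```python
# Sketch of the orthogonal map between two embedding spaces: the optimal
# orthogonal W comes from the SVD of the dictionary correlation matrix
# (orthogonal Procrustes). Toy embeddings stand in for real word vectors.
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 50, 300
X = rng.normal(size=(n_pairs, d))              # source-language vectors
R = np.linalg.qr(rng.normal(size=(d, d)))[0]   # hidden "true" rotation
Y = X @ R.T + 0.01 * rng.normal(size=(n_pairs, d))  # target-language vectors

U, _, Vt = np.linalg.svd(Y.T @ X)              # Procrustes: W = U V^T
W = U @ Vt                                     # orthogonal by construction

err = np.linalg.norm(X @ W.T - Y) / np.linalg.norm(Y)
print(f"orthogonality check: {np.allclose(W @ W.T, np.eye(d))}, "
      f"relative alignment error: {err:.4f}")
```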
Stefano Nichele, Magnus S. Gundersen Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)
The Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a
reservoir, and a linear classifier, i.e., a read-out layer, to process data
from sequential classification tasks. In this paper the usage of Cellular
Automata (CA) as a reservoir is investigated. The use of CA in RC has been
showing promising results. In this paper, selected state-of-the-art experiments
are reproduced. It is shown that some CA-rules perform better than others, and
the reservoir performance is improved by increasing the size of the CA
reservoir itself. In addition, the usage of parallel loosely coupled
CA-reservoirs, where each reservoir has a different CA-rule, is investigated.
The experiments performed on quasi-uniform CA reservoir provide valuable
insights in CA reservoir design. The results herein show that some rules do not
work well together, while other combinations work remarkably well. This
suggests that non-uniform CA could represent a powerful tool for novel CA
reservoir implementations.
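A minimal CA reservoir can be sketched as follows: the input seeds a binary state, a fixed elementary rule is iterated, and the concatenated states over time form the feature vector a linear readout would consume. Rule 90 and the step count are illustrative assumptions, since the paper compares many rules.

```python
# Sketch of a cellular automaton reservoir: iterate a fixed elementary rule
# (rule 90 here, an illustrative choice) on a binary state seeded by the
# input; the concatenated states become features for a linear readout.
import numpy as np

def eca_step(state, rule=90):
    """One step of an elementary CA with periodic boundaries."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right           # 3-bit neighborhood code
    table = (rule >> np.arange(8)) & 1           # rule's lookup table
    return table[idx]

def reservoir_features(bits, steps=8):
    state = np.array(bits, dtype=np.int64)
    states = [state]
    for _ in range(steps):
        state = eca_step(state)
        states.append(state)
    return np.concatenate(states)                # readout sees all time steps

x = reservoir_features([0, 1, 0, 0, 1, 1, 0, 1])
print(x.reshape(9, -1))                          # rows = CA evolution in time
```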
Patrick Glauner, Angelo Migliosi, Jorge Meira, Eric Aislan Antonelo, Petko Valtchev, Radu State, Franck Bettinger Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Non-technical losses (NTL) occur during the distribution of electricity in
power grids and include, but are not limited to, electricity theft and faulty
meters. In emerging countries, they may range up to 40% of the total
electricity distributed. In order to detect NTLs, machine learning methods are
used that learn irregular consumption patterns from customer data and
inspection results. The Big Data paradigm followed in modern machine learning
reflects the desire of deriving better conclusions from simply analyzing more
data, without the necessity of looking at theory and models. However, the
sample of inspected customers may be biased, i.e. it does not represent the
population of all customers. As a consequence, machine learning models trained
on these inspection results are biased as well and therefore lead to unreliable
predictions of whether customers cause NTL or not. In machine learning, this
issue is called covariate shift and has not been addressed in the literature on
NTL detection yet. In this work, we present a novel framework for quantifying
and visualizing covariate shift. We apply it to a commercial data set from
Brazil that consists of 3.6M customers and 820K inspection results. We show
that some features have a stronger covariate shift than others, making
predictions less reliable. In particular, previous inspections focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers. This framework is about to be
deployed in a commercial product for NTL detection.
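One common way to quantify covariate shift, offered here as a hedged stand-in for the paper's framework, is to train a classifier to separate the inspected sample from the general population feature by feature: an AUC near 0.5 indicates little shift, higher values indicate stronger shift. The data below is synthetic.

```python
# A common covariate-shift measure (our stand-in, not necessarily the paper's
# exact method): train a discriminator between the inspected sample and the
# population on one feature; AUC near 0.5 means little shift.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
population = rng.normal(0.0, 1.0, size=(5000, 1))   # e.g., a location feature
inspected = rng.normal(0.8, 1.0, size=(800, 1))     # biased toward some areas

X = np.vstack([population, inspected])
y = np.r_[np.zeros(len(population)), np.ones(len(inspected))]

clf = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"shift score (AUC): {auc:.3f}  # 0.5 = no detectable covariate shift")
```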
Comments: To appear in EACL 2017 (short papers)
Subjects:
Computation and Language (cs.CL)
We explore the problem of translating speech to text in low-resource
scenarios where neither automatic speech recognition (ASR) nor machine
translation (MT) are available, but we have training data in the form of audio
paired with text translations. We present the first system for this problem
applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English
speech translation corpus. Our approach uses unsupervised term discovery (UTD)
to cluster repeated patterns in the audio, creating a pseudotext, which we pair
with translations to create a parallel text and train a simple bag-of-words MT
model. We identify the challenges faced by the system, finding that the
difficulty of cross-speaker UTD results in low recall, but that our system is
still able to correctly translate some content words in test data.
Daniele Bonadiman, Antonio Uva, Alessandro Moschitti Subjects: Computation and Language (cs.CL)
In this paper, we develop a deep neural network (DNN) that learns to simultaneously solve the three tasks of the cQA challenge proposed in SemEval-2016 Task 3, i.e., question-comment similarity, question-question
similarity and new question-comment similarity. The latter is the main task,
which can exploit the previous two for achieving better results. Our DNN is
trained jointly on all the three cQA tasks and learns to encode questions and
comments into a single vector representation shared across the multiple tasks.
The results on the official challenge test set show that our approach produces
higher accuracy and faster convergence rates than the individual neural
networks. Additionally, our method, which does not use any manual feature
engineering, approaches the state of the art established with methods that make
heavy use of it.
Comments: 6 pages, 1 figure, Thirtieth AAAI Conference on Artificial Intelligence (2016)
Subjects:
Computation and Language (cs.CL)
Agglutinative languages such as Turkish, Finnish and Hungarian require
morphological disambiguation before further processing due to the complex
morphology of words. A morphological disambiguator is used to select the
correct morphological analysis of a word. Morphological disambiguation is
important because it generally is one of the first steps of natural language
processing and its performance affects subsequent analyses. In this paper, we
propose a system that uses deep learning techniques for morphological
disambiguation. Many of the state-of-the-art results in computer vision, speech
recognition and natural language processing have been obtained through deep
learning models. However, applying deep learning techniques to morphologically
rich languages is not well studied. In this work, while we focus on Turkish
morphological disambiguation we also present results for French and German in
order to show that the proposed architecture achieves high accuracy with no language-specific feature engineering or additional resources. In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French, respectively.
Akiko Eriguchi, Yoshimasa Tsuruoka, Kyunghyun Cho Subjects: Computation and Language (cs.CL)
There has been relatively little attention to incorporating linguistic priors into neural machine translation, and much of the previous work was further constrained to considering linguistic priors on the source side. In this paper, we propose a hybrid model, called NMT+RG, that learns to parse and translate by incorporating a recurrent neural network grammar into attention-based neural machine translation. Our approach encourages the neural machine translation model to incorporate linguistic priors during training, and lets it translate on
its own afterward. Extensive experiments with four language pairs show the
effectiveness of the proposed NMT+RG.
Ehsan Sherkat, Evangelos Milios Subjects: Computation and Language (cs.CL)
Using deep learning for various machine learning tasks such as image classification and word embedding has recently gained much attention. Its
appealing performance reported across specific Natural Language Processing
(NLP) tasks in comparison with other approaches is the reason for its
popularity. Word embedding is the task of mapping words or phrases to a low
dimensional numerical vector. In this paper, we use deep learning to embed
Wikipedia Concepts and Entities. The English version of Wikipedia contains more
than five million pages, which suggests its capacity to cover many English Entities, Phrases, and Concepts. Each Wikipedia page is considered as a
concept. Some concepts correspond to entities, such as a person’s name, an
organization or a place. Contrary to word embedding, Wikipedia Concept Embedding is not ambiguous: there are different vectors for concepts with a similar surface form but different meanings. We propose several approaches and evaluate their performance on Concept Analogy and Concept Similarity tasks. The results show that the proposed approaches achieve performance comparable to, and in some cases higher than, the state-of-the-art methods.
Comments: 2016 IEEE Workshop on Spoken Language Technology
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG)
Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome from a sequence of inputs. Recurrent hidden cells, as in Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks, allow these DNN-based models to manage long-term dependencies. Nevertheless, these RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM) directions, while most of the information available nowadays comes from multiple streams or multimedia documents and requires RNNs to process it synchronously during training. This paper presents an original LSTM-based
architecture, named Parallel LSTM (PLSTM), that carries out multiple parallel
synchronized input sequences in order to predict a common output. The proposed
PLSTM method could be used for parallel sequence classification purposes. The
PLSTM approach is evaluated on an automatic telecast genre sequences
classification task and compared with different state-of-the-art architectures.
Results show that the proposed PLSTM method outperforms the baseline n-gram
models as well as the state-of-the-art LSTM approach.
Walid Shalaby, Wlodek Zadrozny Subjects: Computation and Language (cs.CL)
Explicit concept space models have proven efficacy for text representation in
many natural language and text mining applications. The idea is to embed
textual structures into a semantic space of concepts which captures the main
topics of these structures. This so-called bag-of-concepts representation suffers from data sparsity, causing low similarity scores between similar texts due to low concept overlap. In this paper, we propose two neural embedding
models in order to learn continuous concept vectors. Once learned, we propose
an efficient vector aggregation method to generate fully dense bag-of-concepts
representations. Empirical results on a benchmark dataset for measuring entity
semantic relatedness show superior performance over other concept embedding
models. In addition, by utilizing our efficient aggregation method, we demonstrate the effectiveness of the densified vector representation over typical sparse representations for dataless classification, where we achieve the same or better accuracy with far fewer dimensions.
Comments: This a draft version of the paper. We welcome any comments you may have regarding the content and presentation
Subjects:
Computation and Language (cs.CL)
Many language technology applications would benefit from the ability to
represent negation and its scope on top of widely-used linguistic resources. In
this paper, we investigate the possibility of obtaining a first-order logic
representation with negation scope marked using Universal Dependencies. To do
so, we enhance UDepLambda, a framework that converts dependency graphs to
logical forms. The resulting UDepLambda(\lnot) is able to handle phenomena related to scope by means of a higher-order type theory, relevant not only to negation but also to universal quantification and other complex semantic phenomena. The initial conversion we did for English is promising, in that one can represent the scope of negation even in the presence of more complex phenomena such as universal quantifiers.
Comments: 2 pages, 1 figure, 1 table, IEEE Letter
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
The advent of High Performance Computing (HPC) has provided the computational
capacity required for power system operators (SOs) to obtain timely solutions to highly complex applications such as Unit Commitment (UC). The UC
problem, which attempts to schedule the least-cost combination of generating
units to meet the load, is increasing in complexity and problem size due to
deployments of renewable resources and smart grid technologies. The current approach to solving the UC problem relies on in-house HPC infrastructure, which experiences issues at scale and demands high maintenance and capital expenditures. On the other hand, cloud computing is an ideal substitute due to
its powerful computational capacity, rapid scalability, and high
cost-effectiveness. In this work, the benefits and challenges of outsourcing
the UC application to the cloud are explored. A quantitative analysis of the
computational performance gain is explored for a large-scale UC problem solved
on the cloud and compared to traditional in-house HPC infrastructure. The
results show substantial reduction in solve time when outsourced to the cloud.
Iqra Altaf Gillani, Pooja Vyavahare, Amitabha Bagchi Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
We study the in-network computation of arbitrary functions whose computation
schema is a complete binary tree, i.e., we assume that the network wants to
compute a function of (K) operands, each of which is available at a distinct
node in the network, and rather than simply collecting the (K) operands at a
single sink node and computing the function, we want to compute the function
during the process of moving the data towards the sink. Such settings have been
studied in the literature, but largely only for symmetric functions, e.g., average, parity, etc., which have the specific property that the output is invariant to permutations of the operands. To the best of our knowledge, we
present the first decentralised algorithms for arbitrary functions. We propose
two algorithms, Fixed Metropolis-Compute and Flexible Metropolis-Compute, for
this problem, both of which use random walks on the network as their basic
primitive. Assuming that time is slotted, we provide upper bounds on the time taken to compute the function, characterising this time in terms of the fundamental parameters of the random walk on a graph: the hitting time in the case of Fixed
Metropolis-Compute, and the mixing time in the case of Flexible
Metropolis-Compute. Assuming a stochastic model for the generation of streams
of data at each source, we also provide lower and upper bounds on the rate at which Fixed Metropolis-Compute is able to compute the stream of associated function values.
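The abstract names Metropolis-style random walks as the basic primitive without giving formulas; the standard construction, which we assume here, picks transition probabilities from local degree information only, so that the walk's stationary distribution is uniform over nodes.

```python
# The standard Metropolis random walk on a graph (our assumption for the
# algorithms' primitive): P(u,v) = 1/max(deg(u), deg(v)) for each edge, with
# leftover probability kept as a self-loop; the stationary distribution is
# then uniform over nodes, using only local degree information.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}   # toy network

def metropolis_probs(adj, u):
    deg = {v: len(nb) for v, nb in adj.items()}
    probs = {v: 1.0 / max(deg[u], deg[v]) for v in adj[u]}
    probs[u] = 1.0 - sum(probs.values())             # lazy self-loop mass
    return probs

for u in adj:
    print(u, metropolis_probs(adj, u))
```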
Comments: 6 pages, 3 figures, 1 table
Journal-ref: Proc. Euromicro Symposium on Digital Systems Design –
Architectures, Methods and Tools, September 4-6 2001, Warsaw, Poland, pp.
152-157
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
The aim of this paper is to demonstrate how the COSMA environment can be used
for system modeling. This environment is a set of tools based on the Concurrent State Machines paradigm and is developed at the Institute of Computer Science at the Warsaw University of Technology. Our demonstration example is a distributed brake control system dedicated to railway transport. The paper briefly introduces COSMA and then shows how the example model can be validated by our temporal logic analyzer.
Barun Gorain, Andrzej Pelc Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Leader election is a basic symmetry breaking problem in distributed
computing. All nodes of a network have to agree on a single node, called the
leader. If the nodes of the network have distinct labels, then agreeing on a
single node means that all nodes have to output the label of the elected
leader.
If the nodes are anonymous, the task of leader election is formulated as
follows: every node of the network must output a simple path starting at it,
which is coded as a sequence of port numbers, such that all these paths end at
a common node, the leader. In this paper, we study deterministic leader
election in anonymous trees.
Our goal is to establish tradeoffs between the allocated time (\tau) and the amount of information that has to be given a priori to the nodes of a network to enable leader election in time (\tau). Following the framework of algorithms with advice, this information is provided to all nodes at the start by an oracle knowing the entire tree, in the form of binary strings assigned to the nodes. There are two possible variants of formulating this advice assignment: either the strings provided to all nodes are identical, or strings assigned to different nodes may differ, i.e., advice can be customized. As opposed to previous papers on leader election with advice, in this paper we consider the latter option. The maximum length of all assigned binary strings is called the size of advice.
For a given time (\tau) allocated to leader election, we give upper and lower bounds on the minimum size of advice sufficient to perform leader election in time (\tau). All our bounds except one pair are tight up to multiplicative
constants, and in this one exceptional case, the gap between the upper and the
lower bound is very small.
Matthias Fischer, Daniel Jung, Friedhelm Meyer auf der Heide Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Robotics (cs.RO)
We consider a swarm of (n) autonomous mobile robots, distributed on a
2-dimensional grid. A basic task for such a swarm is the gathering process: all
robots have to gather at one (not predefined) place. The work in this paper is
motivated by the following insight: On one side, for swarms of robots
distributed in the 2-dimensional Euclidean space, several gathering algorithms
are known for extremely simple robots that are oblivious, have bounded viewing
radius, no compass, and no “flags” to communicate a status to others. On the
other side, in the case of the 2-dimensional grid, the only known gathering algorithms for robots with bounded viewing radius and no compass need to memorize a constant number of rounds and need flags.
In this paper we contribute the, to the best of our knowledge, first
gathering algorithm on the grid, which works for anonymous, oblivious robots
with bounded viewing range, without any further means of communication and
without any memory. We prove its correctness and an (O(n^2)) time bound. This
time bound matches those of the best known algorithms for the Euclidean plane
mentioned above.
Comments: 6 pages
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
Mass customization refers to providing uniquely designed products and services
to every customer through high process integration and flexibility. It has
been used as a competitive approach by many companies. Adequate resource
implementation in mass customization, particularly in terms of resource usage,
is therefore important for meeting customers' requirements for responsiveness
and deadlines while offering high scalability. We illustrate an architecture
for solving the resource allocation issue in a mass-customized flexible
manufacturing system by putting in place advance reservation systems and
scheduling algorithms for effective resource usage.
Comments: 5 pages
Subjects:
Learning (cs.LG)
; Distributed, Parallel, and Cluster Computing (cs.DC)
In this work, we propose to train a deep neural network by distributed
optimization over a graph. Two nonlinear functions are considered: the
rectified linear unit (ReLU) and a linear unit with both lower and upper
cutoffs (DCutLU). The problem reformulation over a graph is realized by
explicitly representing ReLU or DCutLU using a set of slack variables. We then
apply the alternating direction method of multipliers (ADMM) to update the
weights of the network layerwise by solving subproblems of the reformulated
problem. Empirical results suggest that, with proper parameter selection, the
ADMM-based method converges considerably faster than the gradient descent
method.
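The reformulation is easiest to see in a scaled-down form. Below is a minimal
numpy sketch, assuming a one-hidden-layer ReLU network on toy data:
pre-activations and activations become slack variables, weights are updated
layerwise by least squares, and the ReLU constraint is handled by an
elementwise closed-form slack update. It uses quadratic penalties only (dual
variables omitted), so it is a simplified alternating scheme in the spirit of
the paper, not the authors' exact ADMM method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: X is (d, N), y is (m, N).
d, h, m, N = 5, 8, 1, 200
X = rng.normal(size=(d, N))
y = np.sin(X.sum(axis=0, keepdims=True))

W1 = 0.1 * rng.normal(size=(h, d))
W2 = 0.1 * rng.normal(size=(m, h))
z = W1 @ X                    # slack variables: pre-activations
a = np.maximum(z, 0)          # slack variables: activations
rho = 1.0                     # penalty weight (illustrative)

def z_update(a, w):
    """Elementwise argmin_z (a - relu(z))^2 + (z - w)^2, in closed form."""
    z_pos = np.maximum((a + w) / 2.0, 0.0)   # candidate on the branch z >= 0
    z_neg = np.minimum(w, 0.0)               # candidate on the branch z < 0
    f_pos = (a - z_pos) ** 2 + (z_pos - w) ** 2
    f_neg = a ** 2 + (z_neg - w) ** 2
    return np.where(f_pos <= f_neg, z_pos, z_neg)

for it in range(50):
    W1 = z @ np.linalg.pinv(X)               # layer-1 least-squares subproblem
    W2 = y @ np.linalg.pinv(a)               # layer-2 least-squares subproblem
    # activation subproblem: (W2^T W2 + rho I) a = W2^T y + rho relu(z)
    a = np.linalg.solve(W2.T @ W2 + rho * np.eye(h),
                        W2.T @ y + rho * np.maximum(z, 0))
    z = z_update(a, W1 @ X)                  # ReLU handled via the slack variable

err = np.linalg.norm(W2 @ np.maximum(W1 @ X, 0) - y) / np.linalg.norm(y)
print("relative fit error:", err)
```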
Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction
Comments: 11 pages, 3 figures, 4 tables, submitted to ISMB/ECCB 2017
Subjects:
Learning (cs.LG)
; Biomolecules (q-bio.BM)
Recently developed deep learning techniques have significantly improved the
accuracy of various speech and image recognition systems. In this paper we show
how to adapt some of these techniques to create a novel chained convolutional
architecture with next-step conditioning for improving performance on protein
sequence prediction problems. We explore its value by demonstrating its ability
to improve performance on eight-class secondary structure prediction. We first
establish a state-of-the-art baseline by adapting recent advances in
convolutional neural networks which were developed for vision tasks. This model
achieves 70.0% per amino acid accuracy on the CB513 benchmark dataset without
use of standard performance-boosting techniques such as ensembling or multitask
learning. We then improve upon this state-of-the-art result using a novel
chained prediction approach which frames the secondary structure prediction as
a next-step prediction problem. This sequential model achieves 70.3% Q8
accuracy on CB513 with a single model; an ensemble of these models produces
71.4% Q8 accuracy on the same test set, improving upon the previous overall
state of the art for the eight-class secondary structure problem. Our models
are implemented using TensorFlow, an open-source machine learning software
library available at TensorFlow.org; we aim to release the code for these
experiments as part of the TensorFlow repository.
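The chained next-step conditioning can be illustrated independently of the
convolutional architecture. The sketch below, with a placeholder dataset and a
logistic-regression model standing in for the paper's network, shows only the
framing: teacher forcing with the true previous label at training time,
sequential decoding with the predicted previous label at test time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy sequence task: per-position features x_t and labels y_t (standing in
# for amino-acid features and 8-class secondary structure labels).
T, d, n_classes = 500, 10, 8
X = rng.normal(size=(T, d))
y = (X[:, 0] > 0).astype(int) * 4 + (X[:, 1] > 0).astype(int)

def with_prev_label(x_t, prev_label):
    onehot = np.zeros(n_classes)
    onehot[prev_label] = 1.0
    return np.concatenate([x_t, onehot])

# Training with teacher forcing: condition on the *true* previous label.
X_train = np.array([with_prev_label(X[t], y[t - 1] if t > 0 else 0)
                    for t in range(T)])
clf = LogisticRegression(max_iter=1000).fit(X_train, y)

# Sequential decoding: condition on the *predicted* previous label.
preds, prev = [], 0
for t in range(T):
    p = int(clf.predict(with_prev_label(X[t], prev)[None, :])[0])
    preds.append(p)
    prev = p
print("accuracy:", np.mean(np.array(preds) == y))
```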
Comments: 29 pages
Subjects:
Learning (cs.LG)
; Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of
Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is
added to an unbiased estimate of the gradient at each iteration. This modest
change allows SGLD to escape local minima and suffices to guarantee asymptotic
convergence to global minimizers for sufficiently regular non-convex objectives
(Gelfand and Mitter, 1991).
The present work provides a nonasymptotic analysis in the context of
non-convex learning problems: SGLD requires (\tilde{O}(\varepsilon^{-4}))
iterations to sample (\tilde{O}(\varepsilon))-approximate minimizers of both
empirical and population risk, where (\tilde{O}(\cdot)) hides polynomial
dependence on a temperature parameter, the model dimension, and a certain
spectral gap parameter.
As in the asymptotic setting, our analysis relates the discrete-time SGLD
Markov chain to a continuous-time diffusion process. A new tool that drives the
results is the use of weighted transportation cost inequalities to quantify the
rate of convergence of SGLD to a stationary distribution in the Euclidean
(2)-Wasserstein distance.
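The SGLD update itself is a one-line change to SGD. Below is a minimal numpy
sketch on a two-well scalar objective, with an illustrative inverse
temperature beta; the objective and constants are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_estimate(theta):
    # Unbiased noisy gradient of the non-convex objective
    # f(theta) = (theta^2 - 1)^2, which has global minima at +/- 1.
    return 4 * theta * (theta ** 2 - 1) + rng.normal(scale=0.5)

theta, eta, beta = 2.0, 1e-3, 10.0   # init, step size, inverse temperature
samples = []
for k in range(50_000):
    # SGLD: an SGD step plus properly scaled isotropic Gaussian noise.
    theta += -eta * grad_estimate(theta) + np.sqrt(2 * eta / beta) * rng.normal()
    samples.append(theta)

# Late iterates approximately sample the Gibbs measure exp(-beta * f).
print("mean of last iterates:", np.mean(samples[-10_000:]))
```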
Patrick Glauner , Angelo Migliosi , Jorge Meira , Eric Aislan Antonelo , Petko Valtchev , Radu State , Franck Bettinger Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Non-technical losses (NTL) occur during the distribution of electricity in
power grids and include, but are not limited to, electricity theft and faulty
meters. In emerging countries, they may range up to 40% of the total
electricity distributed. In order to detect NTLs, machine learning methods are
used that learn irregular consumption patterns from customer data and
inspection results. The Big Data paradigm followed in modern machine learning
reflects the desire to derive better conclusions by simply analyzing more
data, without the necessity of looking at theory and models. However, the
sample of inspected customers may be biased, i.e. it does not represent the
population of all customers. As a consequence, machine learning models trained
on these inspection results are biased as well and therefore lead to unreliable
predictions of whether customers cause NTL or not. In machine learning, this
issue is called covariate shift and has not been addressed in the literature on
NTL detection yet. In this work, we present a novel framework for quantifying
and visualizing covariate shift. We apply it to a commercial data set from
Brazil that consists of 3.6M customers and 820K inspection results. We show
that some features have a stronger covariate shift than others, making
predictions less reliable. In particular, previous inspections were focused on
certain neighborhoods or customer classes and were not sufficiently spread
among the population of customers. This framework is about to be
deployed in a commercial product for NTL detection.
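The abstract does not spell out the framework's internals; one standard way to
quantify per-feature covariate shift between an inspected sample and the full
population is a two-sample test, sketched here on synthetic data with a
deliberately biased selection rule.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins: features of all customers vs. the inspected subset.
population = rng.normal(size=(100_000, 3))
inspected = population[population[:, 0] > 0.5][:5_000]  # selection biased on feature 0

for j in range(population.shape[1]):
    stat, pval = ks_2samp(population[:, j], inspected[:, j])
    print(f"feature {j}: KS statistic = {stat:.3f} (larger = stronger shift)")
```

On this toy data, feature 0 shows by far the largest statistic, mirroring the
paper's observation that some features carry a much stronger shift than others.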
Comments: 11 pages, 20 figures
Subjects:
Learning (cs.LG)
; Data Structures and Algorithms (cs.DS)
Kernel regression is an essential and ubiquitous tool for non-parametric data
analysis, particularly popular among time series and spatial data. However, the
central operation which is performed many times, evaluating a kernel on the
data set, takes linear time. This is impractical for modern large data sets.
In this paper we describe coresets for kernel regression: compressed data
sets which can be used as proxy for the original data and have provably bounded
worst case error. The size of the coresets is independent of the raw number of
data points; rather, it depends only on the error guarantee and, in some cases,
the size of the domain and the amount of smoothing. We evaluate our methods on very
large time series and spatial data, and demonstrate that they incur negligible
error, can be constructed extremely efficiently, and allow for great
computational gains.
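The coreset construction is the paper's contribution and is not reproduced
here; the sketch below only shows the computation it accelerates,
Nadaraya-Watson kernel regression, with a uniform random subsample standing in
for the coreset.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_regress(xq, x, y, h=0.2):
    """Nadaraya-Watson estimate at queries xq, Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((xq[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

n = 20_000
x = rng.uniform(0, 10, size=n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)
xq = np.linspace(0, 10, 200)

full = kernel_regress(xq, x, y)                # evaluates the kernel n times per query
idx = rng.choice(n, size=500, replace=False)   # uniform subsample as a proxy "coreset"
small = kernel_regress(xq, x[idx], y[idx])     # 40x fewer kernel evaluations
print("max abs deviation:", np.abs(full - small).max())
```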
You Lin , Ming Yang , Can Wan , Jianhui Wang , Yonghua Song Subjects : Learning (cs.LG) ; Applications (stat.AP)
Short-term probabilistic wind power forecasting can provide critical
quantified uncertainty information of wind generation for power system
operation and control. Owing to the complicated characteristics of wind power
prediction error, it is difficult to develop a universal forecasting model
that dominates all alternative models. Therefore, a novel multi-model
combination (MMC) approach for short-term probabilistic wind generation
forecasting is proposed in this paper to exploit the advantages of different
forecasting models. The proposed approach can combine different forecasting
models that provide different kinds of probability density functions to
improve the probabilistic forecast accuracy. Three probabilistic forecasting
models based on the sparse Bayesian learning, kernel density estimation and
beta distribution fitting are used to form the combined model. The parameters
of the MMC model are solved based on Bayesian framework. Numerical tests
illustrate the effectiveness of the proposed MMC approach.
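The combination step can be sketched generically: evaluate each component
predictive density and mix them with weights. Here the three components are
rough stand-ins for the sparse-Bayesian, KDE, and beta-fitting models, and the
weights are fixed rather than learned in a Bayesian framework as in the paper.

```python
import numpy as np
from scipy.stats import beta, gaussian_kde, norm

rng = np.random.default_rng(0)

# Synthetic normalized wind-power history in (0, 1).
hist = np.clip(rng.beta(2, 5, size=2_000), 1e-3, 1 - 1e-3)

# Component predictive densities (stand-ins for the SBL, KDE, and beta models).
kde = gaussian_kde(hist)
a_hat, b_hat, _, _ = beta.fit(hist, floc=0, fscale=1)
components = [
    lambda p: kde(p),
    lambda p: beta.pdf(p, a_hat, b_hat),
    lambda p: norm.pdf(p, hist.mean(), hist.std()),
]
weights = np.array([0.5, 0.3, 0.2])   # fixed here; learned via Bayes in the paper

def combined_pdf(p):
    return sum(w * f(p) for w, f in zip(weights, components))

grid = np.linspace(0.005, 0.995, 199)
mass = (combined_pdf(grid) * (grid[1] - grid[0])).sum()
print("combined density mass on (0,1):", round(float(mass), 3))
```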
Comments: Accepted by AISTATS 2017
Subjects:
Learning (cs.LG)
; Data Structures and Algorithms (cs.DS)
In the Best-(k)-Arm problem, we are given (n) stochastic bandit arms, each
associated with an unknown reward distribution. We are required to identify the
(k) arms with the largest means by taking as few samples as possible. In this
paper, we make progress towards a complete characterization of the
instance-wise sample complexity bounds for the Best-(k)-Arm problem. On the
lower bound side, we obtain a novel complexity term to measure the sample
complexity that every Best-(k)-Arm instance requires. This is derived by an
interesting and nontrivial reduction from the Best-(1)-Arm problem. We also
provide an elimination-based algorithm that matches the instance-wise lower
bound within doubly-logarithmic factors. The sample complexity of our algorithm
strictly dominates the state of the art for Best-(k)-Arm (modulo constant
factors).
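The paper's algorithm and complexity term are more refined than the following,
but the generic elimination template it builds on looks like this: sample all
surviving arms, accept arms that are clearly in the top (k), reject arms that
clearly are not, and shrink the confidence radius each round. The confidence
schedule below is illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

means = np.array([0.9, 0.8, 0.7, 0.5, 0.3, 0.2])   # unknown to the algorithm
n, k = len(means), 2

active = list(range(n))
accepted = []
mu = np.zeros(n)     # running empirical means
cnt = np.zeros(n)    # pull counts
r = 1
while active and len(accepted) < k:
    t = 2 ** r                                     # per-round budget doubles
    for i in active:
        x = rng.normal(means[i], 1.0, size=t)
        mu[i] = (mu[i] * cnt[i] + x.sum()) / (cnt[i] + t)
        cnt[i] += t
    eps = np.sqrt(np.log(8 * n * r ** 2) / t)      # illustrative confidence radius
    rem = k - len(accepted)                        # top arms still needed
    order = sorted(active, key=lambda i: mu[i], reverse=True)
    kth = mu[order[rem - 1]]
    k1th = mu[order[rem]] if rem < len(active) else -np.inf
    for i in order:
        if rem > 0 and mu[i] - eps > k1th + eps:   # clearly within the top k
            accepted.append(i)
            active.remove(i)
            rem -= 1
        elif mu[i] + eps < kth - eps:              # clearly outside the top k
            active.remove(i)
    r += 1

print("accepted arms:", sorted(accepted))
```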
Comments: First version
Subjects:
Learning (cs.LG)
Incremental learning with concept drift has often been tackled by ensemble
methods, where models built in the past can be re-trained to attain new models
for the current data. Two design questions need to be addressed in developing
ensemble methods for incremental learning with concept drift, i.e., which
historical (i.e., previously trained) models should be preserved and how to
utilize them. A novel ensemble learning method, namely Diversity and Transfer
based Ensemble Learning (DTEL), is proposed in this paper. Given newly arrived
data, DTEL uses each preserved historical model as an initial model and further
trains it with the new data via transfer learning. Furthermore, DTEL preserves
a diverse set of historical models, rather than a set of historical models that
are merely accurate in terms of classification accuracy. Empirical studies on
15 synthetic data streams and 4 real-world data streams (all with concept
drifts) demonstrate that DTEL can handle concept drift more effectively than 4
other state-of-the-art methods.
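DTEL itself uses decision trees; the sketch below approximates its transfer
step with warm-started SGD classifiers (copy each preserved model, continue
training it on the new chunk) and reduces diversity-based preservation to a
recency placeholder, so it shows only the overall loop structure, not the
paper's method.

```python
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_chunk(angle, n=300):
    """Binary-labeled chunk whose decision boundary rotates as the concept drifts."""
    X = rng.normal(size=(n, 2))
    w = np.array([np.cos(angle), np.sin(angle)])
    return X, (X @ w > 0).astype(int)

preserved = []     # historical models; the paper keeps a *diverse* set of them
max_models = 5

for step in range(10):
    X, y = make_chunk(angle=0.3 * step)
    candidates = []
    for old in preserved:
        new = copy.deepcopy(old)    # transfer: start from a historical model
        new.partial_fit(X, y)       # ... and continue training on the new data
        candidates.append(new)
    candidates.append(SGDClassifier(loss="log_loss").fit(X, y))  # from scratch
    best = max(candidates, key=lambda mdl: mdl.score(X, y))
    print(f"chunk {step}: accuracy {best.score(X, y):.2f}")
    preserved = (preserved + [best])[-max_models:]  # recency stands in for diversity pruning
```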
Comments: 9 pages
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
A generative model based on training deep architectures is proposed. The
model consists of K networks that are trained together to learn the underlying
distribution of a given data set. The process starts with dividing the input
data into K clusters and feeding each of them into a separate network. After
a few iterations of training the networks separately, we use an EM-like
algorithm to
train the networks together and update the clusters of the data. We call this
model Mixture of Networks. The proposed model is a platform that can be used
for any deep structure and be trained by any conventional objective function
for distribution modeling. As the components of the model are neural networks,
it has high capability in characterizing complicated data distributions as well
as clustering data. We apply the algorithm on MNIST hand-written digits and
Yale face datasets. We also demonstrate the clustering ability of the model
using some real-world and toy examples.
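Below is a hedged sketch of the EM-like loop, with per-cluster PCA models
standing in for the K networks: assign each point to the component that
reconstructs it best, then refit the components. The data, rank, and
initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic clusters lying near different 1-dimensional subspaces of R^5.
def cluster(n=200):
    basis = rng.normal(size=(1, 5))
    return rng.normal(size=(n, 1)) @ basis + rng.normal(scale=0.05, size=(n, 5))

X = np.vstack([cluster(), cluster()])
K, r = 2, 1
assign = np.arange(len(X)) % K          # balanced initial "clustering"

def fit_component(Xc):
    mu = Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
    return mu, Vt[:r]                   # rank-r PCA model for this component

def recon_error(X, model):
    mu, V = model
    z = (X - mu) @ V.T
    return np.linalg.norm(X - mu - z @ V, axis=-1)

for it in range(20):
    # M-step analogue: refit each component on its assigned points
    # (a production version should guard against empty components).
    models = [fit_component(X[assign == k]) for k in range(K)]
    # E-step analogue: hard-assign each point to the best-reconstructing component.
    errs = np.stack([recon_error(X, mdl) for mdl in models])
    assign = errs.argmin(axis=0)

print("cluster sizes:", np.bincount(assign, minlength=K))
```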
Comments: Under review for CVPR 2017. Project webpage: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
We introduce a neural architecture for navigation in novel environments. Our
proposed architecture learns to map from first-person viewpoints and plans a
sequence of actions towards goals in the environment. The Cognitive Mapper and
Planner (CMP) is based on two key ideas: a) a unified joint architecture for
mapping and planning, such that the mapping is driven by the needs of the
planner, and b) a spatial memory with the ability to plan given an incomplete
set of observations about the world. CMP constructs a top-down belief map of
the world and applies a differentiable neural net planner to produce the next
action at each time step. The accumulated belief of the world enables the agent
to track visited regions of the environment. Our experiments demonstrate that
CMP outperforms both reactive strategies and standard memory-based
architectures and performs well in novel environments. Furthermore, we show
that CMP can also achieve semantically specified goals, such as ‘go to a
chair’.
Hong Yu , Zheng-Hua Tan , Zhanyu Ma , Jun Guo Subjects : Sound (cs.SD) ; Cryptography and Security (cs.CR); Learning (cs.LG)
With the development of speech synthesis techniques, automatic speaker
verification systems face the serious challenge of spoofing attack. In order to
improve the reliability of speaker verification systems, we develop a new
filter bank based cepstral feature, deep neural network filter bank cepstral
coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The
deep neural network filter bank is automatically generated by training a filter
bank neural network (FBNN) using natural and synthetic speech. By adding
restrictions on the training rules, the learned weight matrix of FBNN is
band-limited and sorted by frequency, similar to the normal filter bank. Unlike
the manually designed filter bank, the learned filter bank has different filter
shapes in different channels, which can capture the differences between natural
and synthetic speech more effectively. The experimental results on the ASVspoof
2015 database show that the Gaussian mixture model maximum-likelihood
(GMM-ML) classifier trained by the new feature performs better than the
state-of-the-art linear frequency cepstral coefficients (LFCC) based
classifier, especially on detecting unknown attacks.
Qi Lei , Jinfeng Yi , Roman Vaculin , Lingfei Wu , Inderjit S. Dhillon Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG)
A considerable amount of machine learning algorithms take matrices as their
inputs. As such, they cannot directly analyze time series data due to its
temporal nature, usually unequal lengths, and complex properties. This is a
great pity since many of these algorithms are effective, robust, efficient, and
easy to use. In this paper, we bridge this gap by proposing an efficient
representation learning framework that is able to convert a set of time series
with equal or unequal lengths to a matrix format. In particular, we guarantee
that the pairwise similarities between time series are well preserved after the
transformation. Therefore, the learned feature representation is particularly
suitable to the class of learning problems that are sensitive to data
similarities. Given a set of (n) time series, we first construct an (n \times n)
partially observed similarity matrix by randomly sampling (O(n \log n)) pairs
of time series and computing their pairwise similarities. We then propose an
extremely efficient algorithm that solves a highly non-convex and NP-hard
problem to learn new features based on the partially observed similarity
matrix. We use the learned features to conduct experiments on both data
classification and clustering tasks. Our extensive experimental results
demonstrate that the proposed framework is both effective and efficient.
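The paper's solver handles a hard non-convex problem; as a structural sketch,
one can factorize the partially observed similarity matrix by plain SGD so
that inner products of the learned features match the sampled similarities.
The similarity function below is a cheap stand-in, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# n "time series" of unequal lengths; the similarity is a stand-in for a
# genuine time-series similarity such as DTW.
n = 100
series = [rng.normal(loc=rng.uniform(-2, 2), size=rng.integers(30, 80))
          for _ in range(n)]
def sim(i, j):
    return float(np.exp(-(series[i].mean() - series[j].mean()) ** 2))

# Randomly sample O(n log n) entries of the n x n similarity matrix.
m = int(4 * n * np.log(n))
obs = [(i, j, sim(i, j))
       for i, j in zip(rng.integers(n, size=m), rng.integers(n, size=m))]

# Learn features F (n x r) such that F_i . F_j ~ s_ij on the observed entries.
r, lr = 8, 0.05
F = 0.1 * rng.normal(size=(n, r))
for epoch in range(200):
    for i, j, s in obs:
        e = float(F[i] @ F[j] - s)
        F[i], F[j] = F[i] - lr * e * F[j], F[j] - lr * e * F[i]

print("true vs recovered similarity:", sim(0, 1), float(F[0] @ F[1]))
```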
Sandy H. Huang , David Held , Pieter Abbeel , Anca D. Dragan Subjects : Robotics (cs.RO) ; Learning (cs.LG)
Our ultimate goal is to efficiently enable end-users to correctly anticipate
a robot’s behavior in novel situations. This behavior is often a direct result
of the robot’s underlying objective function. Our insight is that end-users
need to have an accurate mental model of this objective function in order to
understand and predict what the robot will do. While people naturally develop
such a mental model over time through observing the robot act, this
familiarization process may be lengthy. Our approach reduces this time by
having the robot model how people infer objectives from observed behavior, and
then selecting those behaviors that are maximally informative. The problem of
computing a posterior over objectives from observed behavior is known as
Inverse Reinforcement Learning (IRL), and has been applied to robots learning
human objectives. We consider the problem where the roles of human and robot
are swapped. Our main contribution is to recognize that unlike robots, humans
will not be \emph{exact} in their IRL inference. We thus introduce two factors
to define candidate approximate-inference models for human learning in this
setting, and analyze them in a user study in the autonomous driving domain. We
show that certain approximate-inference models lead to the robot generating
example behaviors that better enable users to anticipate what the robot will do
in test situations. Our results also suggest, however, that additional research
is needed in modeling how humans extrapolate from examples of robot behavior.
Comments: This is the appendix to the paper “A Collective, Probabilistic Approach to Schema Mapping” accepted to ICDE 2017
Subjects:
Databases (cs.DB)
; Learning (cs.LG)
In this appendix we provide additional supplementary material to “A
Collective, Probabilistic Approach to Schema Mapping.” We include an additional
extended example, supplementary experiment details, and proof for the
complexity result stated in the main paper.
Comments: 2016 IEEE Workshop on Spoken Language Technology
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG)
Recently, machine learning methods have provided a broad spectrum of original
and efficient algorithms based on Deep Neural Networks (DNN) to automatically
predict an outcome with respect to a sequence of inputs. Recurrent hidden
cells, as used in Recurrent Neural Networks (RNN) and Long Short-Term Memory
(LSTM) networks, allow these DNN-based models to manage long-term dependencies.
Nevertheless, these RNNs process a single input stream in one (LSTM) or two
(Bidirectional LSTM) directions. However, most of the information available
nowadays comes from multiple streams or multimedia documents, and requires
RNNs to process this information synchronously during training. This paper
presents an original LSTM-based
architecture, named Parallel LSTM (PLSTM), that carries out multiple parallel
synchronized input sequences in order to predict a common output. The proposed
PLSTM method could be used for parallel sequence classification purposes. The
PLSTM approach is evaluated on an automatic telecast genre sequences
classification task and compared with different state-of-the-art architectures.
Results show that the proposed PLSTM method outperforms the baseline n-gram
models as well as the state-of-the-art LSTM approach.
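Below is a minimal Keras sketch of the parallel-streams idea, assuming three
synchronized input streams, one LSTM per stream, and a shared softmax after
concatenation; the sizes and fusion layer are illustrative, not the paper's
exact PLSTM configuration.

```python
import numpy as np
from tensorflow.keras import layers, Model

T, dims, n_classes = 20, [16, 8, 4], 5   # 3 synchronized streams

inputs, encoded = [], []
for d in dims:
    x = layers.Input(shape=(T, d))       # one parallel input stream
    inputs.append(x)
    encoded.append(layers.LSTM(32)(x))   # one LSTM per stream

merged = layers.concatenate(encoded)     # fuse the parallel encodings
out = layers.Dense(n_classes, activation="softmax")(merged)
model = Model(inputs=inputs, outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Smoke test on random data.
X = [np.random.randn(64, T, d).astype("float32") for d in dims]
y = np.random.randint(n_classes, size=64)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict([x[:2] for x in X], verbose=0).shape)  # (2, n_classes)
```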
Comments: International Conference on Learning Representations (ICLR) 2017
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
We study reinforcement learning of chatbots with recurrent neural network
architectures when the rewards are noisy and expensive to obtain. For instance,
a chatbot used in automated customer service support can be scored by quality
assurance agents, but this process can be expensive, time consuming and noisy.
Previous reinforcement learning work for natural language processing uses
on-policy updates and/or is designed for on-line learning settings. We
demonstrate empirically that such strategies are not appropriate for this
setting and develop an off-policy batch policy gradient method (BPG). We
demonstrate the efficacy of our method via a series of synthetic experiments
and an Amazon Mechanical Turk experiment on a restaurant recommendations
dataset.
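The core ingredient of any off-policy batch policy gradient is an
importance-weighted gradient over logged data. Below is a
contextual-bandit-style numpy sketch; the paper's estimator for sequential
dialogue is more involved, and the clipping constant here is an arbitrary
choice for variance control.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d, A = 4, 3
theta = np.zeros((d, A))                 # target policy parameters

# Logged batch: contexts, actions from a uniform behavior policy, noisy rewards.
X = rng.normal(size=(2_000, d))
acts = rng.integers(A, size=len(X))
rewards = (acts == (X[:, 0] > 0).astype(int)) + rng.normal(scale=0.3, size=len(X))
behavior_prob = np.full(len(X), 1.0 / A)

for step in range(200):
    pi = softmax(X @ theta)                           # target policy probabilities
    w = pi[np.arange(len(X)), acts] / behavior_prob   # importance weights
    w = np.minimum(w, 5.0)                            # clip for variance control
    # grad log pi(a|x) = x (1{a} - pi)^T; accumulate the weighted REINFORCE gradient.
    ind = np.eye(A)[acts]
    g = X.T @ ((ind - pi) * (w * rewards)[:, None]) / len(X)
    theta += 0.5 * g

greedy = (X[:, 0] > 0).astype(int)
print("avg target-policy prob of rewarded action:",
      softmax(X @ theta)[np.arange(len(X)), greedy].mean())
```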
Syed Hassan Raza Naqvi , Andrea Matera , Lorenzo Combi , Umberto Spagnolini Subjects : Information Theory (cs.IT)
Centralized Radio Access Network (C-RAN) architecture is the only viable
solution to handle the complex interference scenario generated by massive
antennas and small cells deployment as required by next generation (5G) mobile
networks. In conventional C-RAN, the fronthaul links used to exchange the
signal between Base Band Units (BBUs) and Remote Antenna Units (RAUs) are based
on digital baseband (BB) signals over optical fibers due to the huge bandwidth
required. In this paper we evaluate the transport capability of a copper-based,
all-analog fronthaul architecture called Radio over Copper (RoC), which
leverages the pre-existing LAN cables already deployed in buildings and
enterprises. In particular, the main contribution of the paper is to evaluate
the number of independent BB signals for a multiple-antenna system that can be
transported over multi-pair Cat-5/6/7 cables under a predefined fronthauling
transparency condition in terms of maximum BB signal degradation. The MIMO-RoC
proves to be a complementary solution to optical fiber for the last 200m toward
the RAUs, mostly to reuse the existing LAN cables and to power-supply the RAUs
over the same cable.
Comments: Extended version of a paper submitted to ISIT 2017
Subjects:
Information Theory (cs.IT)
In this paper, we study the impact of locality on the decoding of binary
cyclic codes under two approaches, namely ordered statistics decoding (OSD) and
trellis decoding. Given a binary cyclic code having locality or availability,
we suitably modify the OSD to obtain gains in terms of the signal-to-noise
ratio (SNR), for a given reliability and essentially the same level of decoder
complexity. With regard to trellis decoding, we show that careful introduction
of locality results in the creation of cyclic subcodes having lower maximum
state complexity. We also present a simple upper-bounding technique on the
state complexity profile, based on the zeros of the code. Finally, it is shown
how the decoding speed can be significantly increased in the presence of
locality, in the moderate-to-high SNR regime, by making use of a quick-look
decoder that often returns the ML codeword.
Comments: 22 pages, 1 figure, submitted to ISIT 2017
Subjects:
Information Theory (cs.IT)
In this work, we consider a complete covert communication system, which
includes the source-model of a stealthy secret key generation (SSKG) as the
first phase. The generated key will be used for the covert communication in the
second phase of the current round and also in the first phase of the next
round. We investigate the stealthy SK rate performance of the first phase. The
derived results show that the SK capacity lower and upper bounds of the
source-model SKG are not affected by the additional stealth constraint. This
result implies that we can attain the SSKG capacity for free when the sequences
observed by the three terminals Alice ((X^n)), Bob ((Y^n)) and Willie ((Z^n))
follow a Markov chain relationship, i.e., (X^n-Y^n-Z^n). We then prove that the
sufficient condition to attain both the SK capacity and the SSK capacity can
be relaxed from physical to stochastic degradedness. In order to
underline the practical relevance, we also derive a sufficient condition to
attain the degradedness by the usual stochastic order for Maurer’s fast fading
Gaussian (satellite) model for the source of common randomness.
On Muting Mobile Terminals for Uplink Interference Mitigation in HetNets — System-Level Analysis via Stochastic Geometry
Comments: 16 pages, 12 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects:
Information Theory (cs.IT)
We investigate the performance of a scheduling algorithm where the Mobile
Terminals (MTs) may be turned off if they cause a level of interference greater
than a given threshold. This approach, which is referred to as Interference
Aware Muting (IAM), may be regarded as an interference-aware scheme aimed at
reducing the level of interference. We analyze its performance with the
aid of stochastic geometry and compare it against other interference-unaware
and interference-aware schemes, where the level of interference is kept under
control in the power control scheme itself rather than in the scheduling
process. IAM is studied in terms of average transmit power, mean and variance
of the interference, coverage probability, Spectral Efficiency (SE), and Binary
Rate (BR), which accounts for the amount of resources allocated to the typical
MT. Simplified expressions of SE and BR for adaptive modulation and coding
schemes are proposed, which better characterize practical communication
systems. Our system-level analysis unveils that IAM increases the BR and
reduces the mean and variance of the interference. It is proved that an
operating regime exists, where the performance of IAM is independent of the
cell association criterion, which simplifies the joint design of uplink and
downlink transmissions.
Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Wireless Communications, Nov. 2016
Subjects:
Information Theory (cs.IT)
The Internet of Things paradigm envisages the presence of many
battery-powered sensors, and this entails the design of energy-aware protocols.
Source coding techniques make it possible to save energy by compressing the
packets sent over the network, but at the cost of poorer accuracy in the
representation of the data. This paper addresses the problem of designing
efficient policies to jointly perform processing and transmission tasks. In
particular, we aim at defining an optimal scheduling strategy with the twofold
ultimate goal of extending the network lifetime and guaranteeing a low overall
distortion of the transmitted data. We propose a Time Division Multiple Access
(TDMA)-based access scheme that optimally allocates resources to heterogeneous
nodes. We use realistic rate-distortion curves to quantify the impact of
compression on the data quality and propose a complete energy model that
includes the energy spent for processing and transmitting the data, as well as
the circuitry costs. Both full knowledge and statistical knowledge of the
wireless channels are considered, and optimal policies are derived for both
cases. The overall problem is structured in a modular fashion and solved
through convex and alternate programming techniques. Finally, we thoroughly
evaluate the proposed algorithms and the influence of the design variables on
the system performance adopting parameters of real sensors.
Lina Mohjazi , Sami Muhaidat , Mehrdad Dianati , Mahmoud Al-Qutayri Subjects : Information Theory (cs.IT)
Simultaneous wireless information and power transfer (SWIPT) relay networks
represent a paradigm shift in the development of wireless networks, enabling
simultaneous radio frequency (RF) energy harvesting (EH) and information
processing. Different from conventional SWIPT relaying schemes, which typically
assume the availability of perfect channel state information (CSI), here we
consider the application of noncoherent modulation in order to avoid the need
of instantaneous CSI estimation/tracking and minimise the energy consumption.
We propose a unified and comprehensive analytical framework for the analysis of
time switching (TS) and power splitting (PS) receiver architectures with the
amplify-and-forward (AF) protocol. In particular, we adopt a moments-based
approach to derive novel expressions for the outage probability, achievable
throughput, and average symbol error rate (ASER) of the considered SWIPT
system. We quantify the impact of several system parameters, involving relay
location, energy conversion efficiency, and TS and PS ratio assumptions,
imposed on the EH relay terminal. Our results reveal that the throughput
performance of the TS protocol is superior to that of the PS protocol at lower
receive signal-to-noise (SNR) values, which is in contrast to the
point-to-point SWIPT systems. An extensive Monte Carlo simulation study is
presented to corroborate the proposed analysis.
Andre Wibisono , Varun Jog , Po-Ling Loh Subjects : Information Theory (cs.IT) ; Statistics Theory (math.ST)
We study the relationship between information- and estimation-theoretic
quantities in time-evolving systems. We focus on the Fokker-Planck channel
defined by a general stochastic differential equation, and show that the time
derivatives of entropy, KL divergence, and mutual information are characterized
by estimation-theoretic quantities involving an appropriate generalization of
the Fisher information. Our results vastly extend De Bruijn’s identity and the
classical I-MMSE relation.
Comments: 32 pages, 2 figures
Subjects:
Information Theory (cs.IT)
In some applications, the variance of measurement noise depends on the signal
that we aim to measure. For instance, additive Gaussian signal-dependent noise
(AGSDN) channel models are used in molecular and optical communication. Herein
we provide lower and upper bounds on the capacity of additive signal dependent
noise (ASDN) channels. We also provide sufficient conditions under which the
capacity becomes infinity.
Bhavya Kailkhura , Lakshmi Narasimhan Theagarajan , Pramod K. Varshney Subjects : Information Theory (cs.IT)
In this letter, we extend the well-known index coding problem to exploit the
structure in the source-data to improve system throughput. In many
applications, the data to be transmitted may lie (or can be well approximated)
in a low-dimensional subspace. We exploit this low-dimensional structure of the
data using an algebraic framework to solve the index coding problem (referred
to as subspace-aware index coding) as opposed to the traditional index coding
problem which is subspace-unaware. Also, we propose an efficient algorithm
based on the alternating minimization approach to obtain near optimal index
codes for both subspace-aware and -unaware cases. Our simulations indicate that
under certain conditions, a significant throughput gain (about 90%) can be
achieved by subspace-aware index codes over conventional subspace-unaware index
codes.
Comments: 13 pages, 9 figures, 4 tables
Subjects:
Information Theory (cs.IT)
We determine lower and upper bounds on the capacity of bandlimited optical
intensity channels (BLOIC) with white Gaussian noise. Three types of input
power constraints are considered: 1) only an average power constraint, 2) only
a peak power constraint, and 3) an average and a peak power constraint.
Capacity lower bounds are derived by a two-step process including 1) for each
type of constraint, designing admissible pulse amplitude modulated input
waveform ensembles, and 2) lower bounding the maximum achievable information
rates of the designed input ensembles. Capacity upper bounds are derived by
exercising constraint relaxations and utilizing known results on discrete-time
optical intensity channels. We obtain degrees-of-freedom-optimal (DOF-optimal)
lower bounds which have the same pre-log factor as the upper bounds, thereby
characterizing the high SNR capacity of BLOIC to within a finite gap. We
further derive intersymbol-interference-free (ISI-free) signaling based lower
bounds, which perform well for all practical SNR values. In particular, the
ISI-free signaling based lower bounds outperform the DOF-optimal lower bound
when the SNR is below 10 dB.
Sense-and-Predict: Opportunistic MAC Based on Spatial Interference Correlation for Cognitive Radio Networks
Jeemin Kim , Seung-Woo Ko , Han Cha , Seong-Lyun Kim Subjects : Information Theory (cs.IT)
Opportunity detection at secondary transmitters (TXs) is a key technique
enabling cognitive radio (CR) networks. Such detection however cannot guarantee
reliable communication at secondary receivers (RXs), especially when their
association distance is long. To cope with the issue, this paper proposes a
novel MAC called sense-and-predict (SaP), where each secondary TX decides
whether to access or not based on the prediction of the interference level at
RX. Firstly, we provide the spatial interference correlation in a probabilistic
form using stochastic geometry, and utilize it to maximize the area spectral
efficiency (ASE) for secondary networks while guaranteeing the service quality
of primary networks. Through simulations and testbed experiments using USRP,
SaP is shown to always achieve ASE improvement compared with the conventional
TX based sensing.
Classical-Quantum Arbitrarily Varying Wiretap Channel: Secret Message Transmission under Jamming Attacks
Holger Boche , Minglai Cai , Christian Deppe , Janis Nötzel Subjects : Information Theory (cs.IT)
We analyze arbitrarily varying classical-quantum wiretap channels. These
channels are subject to two attacks at the same time: one passive
(eavesdropping), and one active (jamming). We improve on previous works by
introducing a reduced class of allowed codes that fulfills a more stringent
secrecy requirement than earlier definitions. In addition, we prove that
non-symmetrizability of the legal link is sufficient for equality of the
deterministic and the common randomness assisted secrecy capacities. Finally,
we focus on analytic properties of both secrecy capacities: we completely
characterize their discontinuity points and their super-activation properties.
Comments: 13 pages, 5 figures, 2 tables; submitted to a journal
Subjects:
Information Theory (cs.IT)
; Hardware Architecture (cs.AR)
Massive multiuser (MU) multiple-input multiple-output (MIMO) will be a core
technology in fifth-generation (5G) wireless systems as it offers significant
improvements in spectral efficiency compared to existing multi-antenna
technologies. The presence of hundreds of antenna elements at the base station
(BS), however, results in excessively high hardware costs and power
consumption, and requires high interconnect throughput between the
baseband-processing unit and the radio unit. Massive MU-MIMO that uses
low-resolution analog-to-digital and digital-to-analog converters (DACs) has
the potential to address all these issues. In this paper, we focus on downlink
precoding for massive MU-MIMO systems with 1-bit DACs at the BS. The objective
is to design precoders that simultaneously mitigate multi-user interference
(MUI) and quantization artifacts. We propose two nonlinear 1-bit precoding
algorithms and corresponding very-large scale integration (VLSI) designs. Our
algorithms rely on biconvex relaxation, which enables the design of efficient
1-bit precoding algorithms that achieve superior error-rate performance
compared to that of linear precoding algorithms followed by quantization. To
showcase the efficacy of our algorithms, we design VLSI architectures that
enable efficient 1-bit precoding for massive MU-MIMO systems in which hundreds
of antennas serve tens of user equipments. We present corresponding
field-programmable gate array (FPGA) implementations to demonstrate that 1-bit
precoding enables reliable and high-rate downlink data transmission in
practical systems.
Dmitry Batenkov , Yaniv Romano , Michael Elad Subjects : Information Theory (cs.IT) ; Machine Learning (stat.ML)
The traditional sparse modeling approach, when applied to inverse problems
with large data such as images, essentially assumes a sparse model for small
overlapping data patches. While producing state-of-the-art results, this
methodology is suboptimal, as it does not attempt to model the entire global
signal in any meaningful way – a nontrivial task by itself. In this paper we
propose a way to bridge this theoretical gap by constructing a global model
from the bottom up. Given local sparsity assumptions in a dictionary, we show
that the global signal representation must satisfy a constrained
underdetermined system of linear equations, which can be solved efficiently by
modern optimization methods such as Alternating Direction Method of Multipliers
(ADMM). We investigate conditions for unique and stable recovery, and provide
numerical evidence corroborating the theory.
The Connectivity of Millimeter-Wave Networks in Urban Environments Modeled Using Random Lattices
Comments: 28 pages, 10 figures, submitted to IEEE Trans. on Wireless Communications
Subjects:
Information Theory (cs.IT)
Millimeter-wave (mmWave) communication opens up tens of giga-hertz (GHz)
spectrum in the mmWave band for use by next-generation wireless systems,
thereby solving the problem of spectrum scarcity. Maintaining connectivity
stands out as a key design challenge for mmWave networks deployed in urban
regions due to the blockage effect characterizing mmWave propagation.
Specifically, mmWave signals can be blocked by buildings and other large urban
objects. In this paper, we set out to investigate the blockage effect on the
connectivity of mmWave networks in a Manhattan-type urban region modeled using
a random regular lattice while base stations (BSs) are Poisson distributed in
the plane. In particular, we analyze the connectivity probability that a
typical user is within the transmission range of a BS and connected by a
line-of-sight. Using random lattice and stochastic geometry theories, different
lower bounds on the connectivity probability are derived as functions of the
buildings’ size and probability of a lattice cell being occupied by a building
as well as BS density and transmission range. The asymptotic connectivity
probability is also derived for cases of dense buildings. Last, the results are
extended to heterogeneous networks. Our study yields closed-form relations
between the parameters of the building process and the BS process, providing
useful guidelines for practical mmWave network deployment.
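The model lends itself to a small Monte Carlo check. The sketch below, with
arbitrary parameter values, occupies lattice cells independently with
probability p, drops a Poisson number of uniformly placed BSs, and declares
the central user connected if some in-range BS has an unblocked segment to it.

```python
import numpy as np

rng = np.random.default_rng(0)

def los_clear(p_user, p_bs, occupied, cell, steps=100):
    """True if the user-BS segment crosses no occupied lattice cell."""
    ts = np.linspace(0.0, 1.0, steps)
    pts = p_user[None, :] + ts[:, None] * (p_bs - p_user)[None, :]
    cells = np.floor(pts / cell).astype(int)
    return not any(occupied.get((i, j), False) for i, j in map(tuple, cells))

def trial(region=500.0, cell=10.0, p_occ=0.3, bs_density=1e-3, reach=150.0):
    n_cells = int(region / cell)
    occupied = {(i, j): rng.random() < p_occ
                for i in range(n_cells) for j in range(n_cells)}
    user = np.array([region / 2, region / 2])
    occupied[tuple(np.floor(user / cell).astype(int))] = False  # user outside buildings
    n_bs = rng.poisson(bs_density * region ** 2)                # Poisson number of BSs
    bs = rng.uniform(0, region, size=(n_bs, 2))
    d = np.linalg.norm(bs - user, axis=1)
    return any(los_clear(user, b, occupied, cell) for b in bs[d < reach])

print("connectivity probability ~", np.mean([trial() for _ in range(100)]))
```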
Comments: This work has been accepted in IEEE TWC
Subjects:
Information Theory (cs.IT)
This paper jointly optimizes the precoding matrices and the set of active
remote radio heads (RRHs) to minimize the network power consumption (NPC) for a
user-centric cloud radio access network (C-RAN), where both the RRHs and users
have multiple antennas and each user is served by its nearby RRHs. Both users’
rate requirements and per-RRH power constraints are considered. Due to these
conflicting constraints, this optimization problem may be infeasible. In this
paper, we propose to solve this problem in two stages. In Stage I, a
low-complexity user selection algorithm is proposed to find the largest subset
of feasible users. In Stage II, a low-complexity algorithm is proposed to solve
the optimization problem with the users selected from Stage I. Specifically,
the re-weighted (l_1)-norm minimization method is used to transform the
original problem with non-smooth objective function into a series of weighted
power minimization (WPM) problems, each of which can be solved by the weighted
minimum mean square error (WMMSE) method. The solution obtained by the WMMSE
method is proved to satisfy the Karush-Kuhn-Tucker (KKT) conditions of the WPM
problem. Moreover, a low-complexity algorithm based on Newton’s method and the
gradient descent method is developed to update the precoder matrices in each
iteration of the WMMSE method. Simulation results demonstrate the rapid
convergence of the proposed algorithms and the benefits of equipping multiple
antennas at the user side. Moreover, the proposed algorithm is shown to achieve
near-optimal performance in terms of NPC.
Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization
Comments: 19 pages, 5 figures
Subjects:
Numerical Analysis (math.NA)
; Information Theory (cs.IT); Optimization and Control (math.OC)
In this paper, we consider a family of Jacobi-type algorithms for the
simultaneous orthogonal diagonalization problem of symmetric tensors. For the
Jacobi-based algorithm of [SIAM J. Matrix Anal. Appl., 2(34):651–672, 2013],
we prove its global convergence for simultaneous orthogonal diagonalization of
symmetric matrices and 3rd-order tensors. We also propose a new Jacobi-based
algorithm in the general setting and prove its global convergence for a wide
range of tensor problems (including Tucker approximation).
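For the matrix case, one Jacobi-type sweep is easy to write down. Below is a
numpy sketch using the classical Cardoso-Souloumiac rotation angle, a
reconstruction of the standard algorithm rather than the authors' code,
verified on commuting symmetric matrices that are exactly jointly
diagonalizable.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_diagonalize(mats, sweeps=20):
    """Jacobi sweeps: one Givens rotation per index pair, applied to all matrices."""
    n = mats[0].shape[0]
    Q = np.eye(n)
    A = [M.copy() for M in mats]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Optimal angle from the principal eigenvector of a 2x2 Gram matrix.
                h = np.array([[M[p, p] - M[q, q], 2 * M[p, q]] for M in A])
                G = h.T @ h
                _, V = np.linalg.eigh(G)
                x, y = V[:, -1]
                if x < 0:
                    x, y = -x, -y
                c = np.sqrt((x + 1) / 2)   # cos of the rotation angle
                s = y / (2 * c)            # sin of the rotation angle
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = -s, s
                A = [J.T @ M @ J for M in A]
                Q = Q @ J
    return Q, A

# Commuting symmetric matrices (shared eigenbasis): exactly jointly diagonalizable.
U, _ = np.linalg.qr(rng.normal(size=(5, 5)))
mats = [U @ np.diag(rng.normal(size=5)) @ U.T for _ in range(3)]
Q, A = joint_diagonalize(mats)
off = sum(np.abs(M - np.diag(np.diag(M))).sum() for M in A)
print("residual off-diagonal mass:", off)
```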