Comments: 10 + 3 pages, under review for ICLR 2017
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI); Learning (cs.LG)
The past year saw the introduction of new architectures such as Highway
networks and Residual networks which, for the first time, enabled the training
of feedforward networks with dozens to hundreds of layers using simple gradient
descent. While depth of representation has been posited as a primary reason for
their success, there are indications that these architectures defy a popular
view of deep learning as a hierarchical computation of increasingly abstract
features at each layer.
In this report, we argue that this view is incomplete and does not adequately
explain several recent findings. We propose an alternative viewpoint based on
unrolled iterative estimation—a group of successive layers iteratively refine
their estimates of the same features instead of computing an entirely new
representation. We demonstrate that this viewpoint directly leads to the
construction of Highway and Residual networks. Finally, we provide preliminary
experiments to discuss the similarities and differences between the two
architectures.
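To make the iterative-estimation reading concrete, here is a toy numpy sketch
(not the paper's experiments) of the residual form y = x + F(x): with small
weights, each layer only perturbs the estimate it receives, so successive
representations stay close to one another.

    import numpy as np

    def residual_layer(x, W):
        # Refine the current estimate instead of computing a new one: y = x + F(x).
        return x + np.tanh(x @ W)

    rng = np.random.default_rng(0)
    d, depth = 16, 10
    x = rng.normal(size=d)
    Ws = [0.1 * rng.normal(size=(d, d)) for _ in range(depth)]  # small corrections

    estimates = [x]
    for W in Ws:
        estimates.append(residual_layer(estimates[-1], W))

    # Successive layers change the representation only slightly:
    print([round(float(np.linalg.norm(b - a)), 3)
           for a, b in zip(estimates, estimates[1:])])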
Comments: 15 pages, 8 figures, 7 tables
Subjects:
Neural and Evolutionary Computing (cs.NE)
; Artificial Intelligence (cs.AI)
In order to better understand the advantages and disadvantages of a
constrained multi-objective evolutionary algorithm (CMOEA), it is important to
understand the nature of difficulty of a constrained multi-objective
optimization problem (CMOP) that the CMOEA is going to deal with. In this
paper, we first propose three primary types of difficulty to characterize the
constraints in CMOPs, including feasibility-hardness, convergence-hardness and
diversity-hardness. We then develop a general toolkit to construct difficulty
adjustable CMOPs with three types of parameterized constraint functions
according to the proposed three primary types of difficulty. In fact, combining
the three primary constraint functions with different parameters can generate a
large variety of CMOPs and constrained many-objective optimization problems
(CMaOPs), whose difficulty is uniquely defined by a triplet, each parameter of
which specifies the level of one primary difficulty type. Based on this
toolkit, we suggest fifteen difficulty adjustable CMOPs, named DAC-MOP1-15,
with different types and levels of difficulty. To study the effectiveness of
DAC-MOP1-15, two popular CMOEAs, MOEA/D-CDP and NSGA-II-CDP, are adopted to
test their performance on them. Furthermore, this toolkit is also able to scale
the number of objectives. Nine difficulty adjustable CMaOPs, named DAC-MaOP1-9,
which are scalable in the number of objectives, are also proposed using this
toolkit. Two constrained many-objective evolutionary algorithms (CMaOEAs),
CNSGA-III and CMOEA/DD, are applied to test their performance on three-, five-,
seven- and ten-objective DAC-MaOP1-9 with different difficulty levels and
types.
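As a purely illustrative sketch of the triplet idea (the toolkit's actual
constraint functions are in the paper; everything below is a hypothetical
stand-in), one can imagine one parameterized constraint per primary difficulty
type, with a triplet (eta, zeta, gamma) dialing each level:

    import numpy as np

    def difficulty_constraints(x, f, eta, zeta, gamma):
        # Hypothetical constraints, c <= 0 means feasible:
        # eta   -> feasibility-hardness: shrinks the feasible region,
        # zeta  -> convergence-hardness: keeps solutions away from the front,
        # gamma -> diversity-hardness: cuts the front into feasible segments.
        c_feas = np.sum(x[1:] ** 2) - (1.0 - eta)
        c_conv = (1.0 + zeta) - np.sum(f)
        c_div = 0.5 - np.sin(gamma * np.pi * f[0]) if gamma > 0 else -1.0
        return np.array([c_feas, c_conv, c_div])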
Comments: this http URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
We address the problem of continuously observing and forecasting long-term
semantic activities of a first-person camera wearer: what the person will do,
where they will go, and what goal they are seeking. In contrast to prior work
in trajectory forecasting and short-term activity forecasting, our algorithm,
DARKO, reasons about the future position, future semantic state, and future
high-level goals of the camera-wearer at arbitrary spatial and temporal
horizons defined only by the wearer’s behaviors. DARKO learns and forecasts
online from first-person observations of the user’s daily behaviors. We derive
novel mathematical results that enable efficient forecasting of different
semantic quantities of interest. We apply our method to a dataset of 5
large-scale environments with 3 different environment types, collected from 3
different users, and experimentally validate DARKO on forecasting tasks.
Xin Li , Fuxin Li Subjects : Computer Vision and Pattern Recognition (cs.CV)
Deep learning has greatly improved visual recognition in recent years.
However, recent research has shown that there exist many adversarial examples
that can negatively impact the performance of such architectures. This paper
focuses on detecting those adversarial examples by analyzing whether they come
from the same distribution as the normal examples. Instead of directly training
a deep neural network to detect adversarials, a much simpler approach is
proposed based on statistics on outputs from convolutional layers. A cascade
classifier is designed to efficiently detect adversarials. Furthermore, a
classifier trained on one particular adversarial-generating mechanism can
successfully detect adversarials from a completely different mechanism as
well. After detecting adversarial examples, we show that many of them can be
recovered by simply applying a small average filter to the image. These
findings should prompt further thought about the classification mechanisms of
deep convolutional neural networks.
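The recovery step lends itself to a two-line sketch; the 3x3 window below is
an assumption, since the abstract only says "a small average filter":

    import numpy as np
    from scipy.ndimage import uniform_filter

    def recover(image, size=3):
        # Mean-filter each channel of an H x W x C image.
        return np.stack([uniform_filter(image[..., c], size=size)
                         for c in range(image.shape[-1])], axis=-1)

    img = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in adversarial image
    print(recover(img).shape)  # (32, 32, 3)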
A. Vakhitov, A. Kuzmin, V. Lempitsky. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Internet image search engines have long been considered as a promising tool
for handling open-vocabulary textual user queries to unannotated image
datasets. However, systems that use this tool have to deal with multi-modal and
noisy image sets returned by search engines, especially for polysemous queries.
Generally, for many queries, only a small part of the returned sets can be
relevant to the user intent.
In this work, we suggest an approach that explicitly accounts for the complex
and noisy structure of the image sets returned by Internet image search
engines. Similarly to a considerable number of previous image retrieval works,
we train a deep convolutional network that maps images to high-dimensional
descriptors. To model image sets obtained from the Internet, our approach then
fits a simple probabilistic model that accounts for multi-modality and noise
(e.g. a Gaussian mixture model) to the deep descriptors of the images in this
set. Finally, the resulting distribution model can be used to search in the
unannotated image dataset by evaluating likelihoods of individual images.
As our main contribution, we develop an end-to-end training procedure that
tunes the parameters of a deep network using an annotated training set, while
accounting for the distribution fitting and the subsequent matching. In the
experiments, we show that such an end-to-end approach boosts the accuracy of
the Internet-based image retrieval for hold-out concepts, as compared to
retrieval systems that fit similar distribution models to pre-trained features
and to simpler end-to-end trained baselines.
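The fit-then-score pipeline can be sketched with scikit-learn; the descriptor
dimension, component count, and random data below are stand-ins, and the
end-to-end tuning of the network is the part this sketch omits:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    query_set = rng.normal(size=(200, 128))   # deep descriptors of the noisy result set
    dataset = rng.normal(size=(10000, 128))   # descriptors of the unannotated dataset

    # Fit a simple multi-modal model to the result set; some components
    # can absorb noise and irrelevant modes.
    gmm = GaussianMixture(n_components=3, covariance_type='diag').fit(query_set)

    # Rank the unannotated dataset by likelihood under the fitted model.
    top10 = np.argsort(-gmm.score_samples(dataset))[:10]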
Comments: 9 pages, 7 tables and 9 figures; first place on KITTI Road Segmentation; Code on GitHub (coming soon)
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Robotics (cs.RO)
While most approaches to semantic reasoning have focused on improving
performance, in this paper we argue that computational times are very important
in order to enable real time applications such as autonomous driving. Towards
this goal, we present an approach to joint classification, detection and
semantic segmentation via a unified architecture where the encoder is shared
amongst the three tasks. Our approach is very simple, can be trained end-to-end
and performs extremely well on the challenging KITTI dataset, outperforming the
state-of-the-art in the road segmentation task. Our approach is also very
efficient, taking less than 100 ms to perform all tasks.
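The shared-encoder idea can be sketched in a few lines of PyTorch; the layer
sizes and head designs below are illustrative assumptions, not the paper's
architecture:

    import torch
    import torch.nn as nn

    class JointNet(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(                 # shared by all three tasks
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
            self.classify = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(64, n_classes))
            self.detect = nn.Conv2d(64, 5, 1)             # per-cell box + confidence
            self.segment = nn.Conv2d(64, 2, 1)            # road / not-road logits

        def forward(self, x):
            h = self.encoder(x)                           # computed once
            return self.classify(h), self.detect(h), self.segment(h)

    cls_out, det_out, seg_out = JointNet()(torch.randn(1, 3, 128, 384))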
Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Machine learning plays a critical role in extracting meaningful information
out of the zettabytes of sensor data collected every day. For some applications,
the goal is to analyze and understand the data to identify trends (e.g.,
surveillance, portable/wearable electronics); in other applications, the goal
is to take immediate action based on the data (e.g., robotics/drones, self-driving
cars, smart Internet of Things). For many of these applications, local embedded
processing near the sensor is preferred over the cloud due to privacy or
latency concerns, or limitations in the communication bandwidth. However, at
the sensor there are often stringent constraints on energy consumption and cost
in addition to throughput and accuracy requirements. Furthermore, flexibility
is often required such that the processing can be adapted for different
applications or environments (e.g., update the weights and model in the
classifier). In many applications, machine learning often involves transforming
the input data into a higher dimensional space, which, along with programmable
weights, increases data movement and consequently energy consumption. In this
paper, we will discuss how these challenges can be addressed at various levels
of hardware design ranging from architecture, hardware-friendly algorithms,
mixed-signal circuits, and advanced technologies (including memories and
sensors).
Deng Cai. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Approximate Nearest Neighbor (ANN) search is a fundamental problem in many
areas of machine learning and data mining. During the past decade, numerous
hashing algorithms have been proposed to solve this problem, and every proposed
algorithm claims to outperform other state-of-the-art methods. However, there
are serious drawbacks in the evaluation of existing hashing papers, and most of
the claims in these papers should be re-examined. 1) Most of the existing
papers failed to correctly measure the search time, which is essential for the
ANN search problem. 2) As a result, most of the papers report that performance
increases as the code length increases, which is wrong if the search time is
measured correctly. 3) The performance of some hashing algorithms (e.g., LSH)
can easily be boosted by using multiple hash tables, an important factor that
should be considered in the evaluation, yet most of the papers failed to do
so. In this paper, we carefully revisit many popular hashing algorithms and
suggest one possible promising direction. For the sake of reproducibility, all
code used in the paper is released on GitHub and can be used as a testing
platform to fairly compare various hashing algorithms.
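The evaluation point boils down to reporting recall together with measured
search time, rather than accuracy at a fixed code length alone. A sketch,
where index_search stands for any hashing-based ANN routine (a hypothetical
interface):

    import time
    import numpy as np

    def evaluate(index_search, queries, ground_truth):
        t0 = time.perf_counter()
        results = [index_search(q) for q in queries]
        per_query_time = (time.perf_counter() - t0) / len(queries)
        recall = np.mean([len(set(r) & set(g)) / len(g)
                          for r, g in zip(results, ground_truth)])
        return recall, per_query_time   # report the trade-off, not recall alone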
Comments: 28 pages
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art handwriting recognition methods are based on Long Short-Term
Memory (LSTM) recurrent neural networks (RNNs) coupled with the use of
linguistic knowledge. LSTM RNNs offer high raw performance and interesting
training properties that allow us to break with the standard state-of-the-art
method. We present a simple and efficient way to extract, from a single
training run, a large number of complementary LSTM RNNs, called a cohort,
combined in a cascade architecture with a lexical verification. This process
does not require fine-tuning, making it easy to use. Our verification allows us
to deal quickly and efficiently with a gigantic lexicon (over 3 million words).
We achieve state-of-the-art results for isolated word recognition with a very
large lexicon and present novel results for an unprecedentedly gigantic
lexicon.
Comments: DCC 2017 Poster
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
This work addresses the problem of extracting deeply learned features directly
from compressive measurements; there has been no prior work in this area.
Existing deep learning tools give good results only when applied to the full
signal, and usually only after preprocessing; these techniques require the
signal to be reconstructed first. In this work we show that considerably
better results can be obtained by learning directly in the compressed domain.
This work extends the recently proposed framework of deep matrix factorization
in combination with blind compressed sensing, hence the term deep blind
compressed sensing. Simulation experiments have been carried out on imaging
via a single-pixel camera, under-sampled biomedical signals arising in
wireless body area networks, and compressive hyperspectral imaging. In all
cases, the superiority of the proposed deep blind compressed sensing is
evident.
Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks
Comments: CVPR submission, main paper and supplementary material
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Indoor scene understanding is central to applications such as robot
navigation and human companion assistance. Over the last few years, data-driven
deep neural networks have outperformed many traditional approaches thanks to
their representation learning capabilities. One of the bottlenecks in training
for better representations is the amount of available per-pixel ground truth
data that is required for core scene understanding tasks such as semantic
segmentation, normal prediction, and object edge detection. To address this
problem, a number of works proposed using synthetic data. However, a systematic
study of how such synthetic data is generated is missing. In this work, we
introduce a large-scale synthetic dataset with 400K physically-based rendered
images from 45K realistic 3D indoor scenes. We study the effects of rendering
methods and scene lighting on training for three computer vision tasks: surface
normal prediction, semantic segmentation, and object boundary detection. This
study provides insights into the best practices for training with synthetic
data (more realistic rendering is worth it) and shows that pretraining with our
new synthetic dataset can improve results beyond the current state of the art
on all three tasks.
Comments: Accepted at WACV 2017
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
; Multimedia (cs.MM)
This paper studies the joint learning of action recognition and temporal
localization in long, untrimmed videos. We employ a multi-task learning
framework that performs the three highly related steps of action proposal,
action recognition, and action localization refinement in parallel instead of
the standard sequential pipeline that performs the steps in order. We develop a
novel temporal actionness regression module that estimates what proportion of a
clip contains action. We use it for temporal localization but it could have
other applications like video retrieval, surveillance, summarization, etc. We
also introduce random shear augmentation during training to simulate viewpoint
change. We evaluate our framework on three popular video benchmarks. Results
demonstrate that our joint model is efficient in terms of storage and
computation in that we do not need to compute and cache dense trajectory
features, and that it is several times faster than its sequential ConvNets
counterpart. Yet, despite being more efficient, it outperforms state-of-the-art
methods with respect to accuracy.
Jhony-Heriberto Giraldo-Zuluaga, Geman Diez, Alexander Gomez, Tatiana Martinez, Mariana Peñuela Vasquez, Jesus Francisco Vargas Bonilla, Augusto Salazar. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Microalgae counting is used to measure biomass quantity. Usually, it is
performed in a manual way using a Neubauer chamber and expert criterion, with
the risk of a high error rate. This paper addresses the methodology for
automatic identification of Scenedesmus microalgae (used in the methane
production and food industry) and applies it to images captured by a digital
microscope. The use of contrast-adaptive histogram equalization for
pre-processing and of active contours for segmentation is presented. The
calculation of statistical features (Histogram of Oriented Gradients, Hu and
Zernike moments) together with texture features (Haralick and Local Binary
Patterns descriptors) is proposed for algae characterization. Scenedesmus algae
can build coenobia consisting of 1, 2, 4 and 8 cells. The number of algae of
each coenobium type helps to determine the amount of lipids, proteins, and
other substances in a given sample of an algae crop. Knowledge of the quantity
of those elements improves the quality of bioprocess applications.
Classification of coenobia achieves accuracies of 98.63% and 97.32% with a
Support Vector Machine (SVM) and an Artificial Neural Network (ANN),
respectively. According to the results, it is possible to consider the proposed
methodology as an alternative to the traditional technique for algae counting.
The database used in this paper is publicly available for download.
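A minimal sketch of the feature step with scikit-image and scikit-learn (HOG
plus an LBP histogram only; the Hu/Zernike moments and Haralick features are
omitted, and all parameter values are assumptions):

    import numpy as np
    from skimage.feature import hog, local_binary_pattern
    from sklearn.svm import SVC

    def describe(gray):
        h = hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
        lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
        lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        return np.concatenate([h, lbp_hist])

    # X: grayscale crops of segmented algae, y: coenobium size (1, 2, 4 or 8)
    # clf = SVC(kernel='rbf').fit([describe(x) for x in X], y)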
Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko. Subjects: Computer Vision and Pattern Recognition (cs.CV)
Top-down saliency methods based on deep neural networks have recently been
proposed for task-driven visual search. However, existing methods focus on
object or scene classification tasks and cannot be used to compute saliency
heatmaps using a natural language sentence as the top-down input. Current
state-of-the-art image and video captioning models can generate accurate
sentence captions but are difficult to understand, as they do not expose the
internal process by which spatial and temporal regions are mapped to predicted
words. In this paper, we expose this mapping and demonstrate that
spatio-temporal saliency is learned implicitly by the combination of CNN and
LSTM parts of modern encoder-decoder networks. Our approach, which we call
Caption-Guided Visual Saliency, can produce spatial or spatio-temporal heatmaps
for both given input sentences and sentences predicted by our model. Unlike
recent efforts that introduce explicit “attention” layers to selectively attend
to certain inputs while generating each word, our approach recovers saliency
without the overhead of explicit attention layers, and can be used to analyze a
variety of existing model architectures and improve their design. We evaluate
our approach on large scale video and image datasets and make several
interesting discoveries about the inner workings of captioning models. The
source code is publicly available at
github.com/VisionLearningGroup/caption-guided-saliency.
Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem. Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The task of generating natural language descriptions from images has received
a lot of attention in recent years. Consequently, it is becoming increasingly
important to evaluate such image captioning approaches in an automatic manner.
In this paper, we provide an in-depth evaluation of the existing image
captioning metrics through a series of carefully designed experiments.
Moreover, we explore the utilization of the recently proposed Word Mover’s
Distance (WMD) document metric for the purpose of image captioning. Our
findings outline the differences and/or similarities between metrics and their
relative robustness by means of extensive correlation, accuracy and distraction
based evaluations. Our results also demonstrate that WMD provides strong
advantages over other metrics.
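Scoring a candidate caption against a reference with WMD takes a few lines
with gensim, assuming pretrained word vectors are available (the vector file
below is a placeholder):

    from gensim.models import KeyedVectors

    wv = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)  # placeholder path
    reference = "a man is riding a horse on the beach".split()
    candidate = "a person rides a horse near the ocean".split()
    distance = wv.wmdistance(reference, candidate)  # lower = more similar caption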
Hai Ye, Wenhan Chao, Zhunchen Luo. Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In distant supervised relation extraction, the connection between relations
of one entity tuple, which we call class ties, is common. Exploiting this
connection may be promising for relation extraction. However, this property is
seldom considered by previous work. In this work, to leverage class ties, we
propose to perform joint relation extraction with a unified model that
integrates a convolutional neural network with a general pairwise ranking
framework, in which two novel ranking loss functions are introduced. In
addition, an effective method is proposed to relieve the impact of the relation
NR (not relation) on model training and testing. Experimental results on a
widely used dataset show that: (1) our model is clearly superior to the
baselines, achieving state-of-the-art performance; (2) leveraging class ties,
joint extraction is indeed better than separate extraction; (3) relieving the
impact of NR significantly boosts model performance; and (4) our model deals
well with the wrong-labeling problem.
Solving Set Optimization Problems by Cardinality Optimization via Weak Constraints with an Application to Argumentation
Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
Subjects:
Artificial Intelligence (cs.AI)
Optimization – minimization or maximization – in the lattice of subsets is a
frequent operation in Artificial Intelligence tasks. Examples are
subset-minimal model-based diagnosis, nonmonotonic reasoning by means of
circumscription, or preferred extensions in abstract argumentation. Finding the
optimum among many admissible solutions is often harder than finding admissible
solutions with respect to both computational complexity and methodology. This
paper addresses the former issue by means of an effective method for finding
subset-optimal solutions. It is based on the relationship between
cardinality-optimal and subset-optimal solutions, and the fact that many
logic-based declarative programming systems provide constructs for finding
cardinality-optimal solutions, for example maximum satisfiability (MaxSAT) or
weak constraints in Answer Set Programming (ASP). Clearly each
cardinality-optimal solution is also a subset-optimal one, and if the language
also allows for the addition of particular restricting constructs (both MaxSAT
and ASP do) then all subset-optimal solutions can be found by an iterative
computation of cardinality-optimal solutions. As a showcase, the computation of
preferred extensions of abstract argumentation frameworks using the proposed
method is studied.
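The iterative scheme is easy to state in code. Below, a brute-force search
stands in for the MaxSAT/ASP call that returns a cardinality-minimal
admissible set (every such set is also subset-minimal); each found solution is
then blocked together with all of its supersets:

    from itertools import combinations

    def subset_minimal(universe, admissible):
        blocked, found = [], []
        while True:
            # Cardinality-minimal admissible set avoiding all blocked supersets;
            # in practice this is one MaxSAT/weak-constraint solver call.
            sol = next((set(c) for k in range(len(universe) + 1)
                        for c in combinations(universe, k)
                        if admissible(set(c))
                        and not any(b <= set(c) for b in blocked)), None)
            if sol is None:
                return found
            found.append(sol)
            blocked.append(sol)   # excludes sol and every superset of it

    # Example: subset-minimal hitting sets of {{1, 2}, {2, 3}}.
    print(subset_minimal([1, 2, 3],
                         lambda s: bool(s & {1, 2}) and bool(s & {2, 3})))
    # -> [{2}, {1, 3}]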
The SP Theory of Intelligence as a Foundation for the Development of a General, Human-Level Thinking Machine
J Gerard Wolff Subjects : Artificial Intelligence (cs.AI)
This paper summarises how the “SP theory of intelligence” and its realisation
in the “SP computer model” simplifies and integrates concepts across artificial
intelligence and related areas, and thus provides a promising foundation for
the development of a general, human-level thinking machine, in accordance with
the main goal of research in artificial general intelligence.
The key to this simplification and integration is the powerful concept of
“multiple alignment”, borrowed and adapted from bioinformatics. This concept
has the potential to be the “double helix” of intelligence, with as much
significance for human-level intelligence as has DNA for biological sciences.
Strengths of the SP system include: versatility in the representation of
diverse kinds of knowledge; versatility in aspects of intelligence (including:
strengths in unsupervised learning; the processing of natural language; pattern
recognition at multiple levels of abstraction that is robust in the face of
errors in data; several kinds of reasoning (including: one-step ‘deductive’
reasoning; chains of reasoning; abductive reasoning; reasoning with
probabilistic networks and trees; reasoning with ‘rules’; nonmonotonic
reasoning and reasoning with default values; Bayesian reasoning with
‘explaining away’; and more); planning; problem solving; and more); seamless
integration of diverse kinds of knowledge and diverse aspects of intelligence
in any combination; and potential for application in several areas (including:
helping to solve nine problems with big data; helping to develop human-level
intelligence in autonomous robots; serving as a database with intelligence and
with versatility in the representation and integration of several forms of
knowledge; serving as a vehicle for medical knowledge and as an aid to medical
diagnosis; and several more).
Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain
Subjects:
Artificial Intelligence (cs.AI)
; Learning (cs.LG); Machine Learning (stat.ML)
This paper investigates a type of instability that is linked to the greedy
policy improvement in approximated reinforcement learning. We show empirically
that non-deterministic policy improvement can stabilize methods like LSPI by
controlling the improvements’ stochasticity. Additionally we show that a
suitable representation of the value function also stabilizes the solution to
some degree. The presented approach is simple and should also be easily
transferable to more sophisticated algorithms like deep reinforcement learning.
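One natural instance of a non-deterministic improvement step (an illustrative
choice, not necessarily the paper's exact scheme) replaces the greedy argmax
with a Boltzmann distribution over Q-values, whose temperature controls the
stochasticity:

    import numpy as np

    def soft_policy_improvement(Q, tau=1.0):
        # Q: |S| x |A| action values; tau -> 0 recovers the greedy update.
        z = Q / tau
        z -= z.max(axis=1, keepdims=True)           # numerical stability
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)     # stochastic policy

    policy = soft_policy_improvement(np.random.rand(5, 3), tau=0.5)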
Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016
Subjects:
Logic in Computer Science (cs.LO)
; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
While the solution counting problem for propositional satisfiability (#SAT)
has received renewed attention in recent years, this research trend has not
affected other AI solving paradigms like answer set programming (ASP). Although
ASP solvers are designed to enumerate all solutions, and counting can therefore
be easily done, the involved materialization of all solutions is a clear
bottleneck for the counting problem of ASP (#ASP). In this paper we propose
dynamic programming-based #ASP algorithms that exploit the structure of the
underlying (ground) ASP program. Experimental results for a prototype
implementation show promise when compared to existing solvers.
Jose M. Peña, Marcus Bendtsen. Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)
We introduce a new family of graphical models that consists of graphs with
possibly directed, undirected and bidirected edges but without directed cycles.
We show that these models are suitable for representing causal models with
additive error terms. We provide a set of sufficient graphical criteria for the
identification of arbitrary causal effects when the new models contain directed
and undirected edges but no bidirected edge. We also provide a necessary and
sufficient graphical criterion for the identification of the causal effect of a
single variable on the rest of the variables. Moreover, we develop an exact
algorithm for learning the new models from observational and interventional
data via answer set programming. Finally, we introduce gated models for causal
effect identification, a new family of graphical models that exploits context
specific independences to identify additional causal effects.
Journal-ref: Sindh Univ. Res. Jour. (Sci. Ser.) Vol. 48 (4D) 05-08 (2016)
Subjects:
Computers and Society (cs.CY)
; Information Retrieval (cs.IR); Software Engineering (cs.SE)
In this fast-developing world of information, the amount of medical knowledge
is rising at an exponential rate. The UMLS (Unified Medical Language System) is
a rich knowledge base consisting of files and software that provide many health
and biomedical vocabularies and standards. A web service is a web solution that
facilitates machine-to-machine interaction over a network. A few UMLS web
services are currently available for portable devices, but most of them lack
efficiency and performance. We propose to develop Android-based web services
for healthcare systems built on the rich knowledge source of the UMLS. An
experimental evaluation was conducted to analyse efficiency and performance
with and without the designed prototype. Understandability of and interaction
with the prototype were greater than for users who relied on alternative
sources to obtain the answers to their questions. The overall performance
indicates that the system is convenient and easy to use. The results of the
evaluation clearly show that the designed system retrieves pertinent
information better than syntactic searches.
Comments: EACL 2017; the first two authors contributed equally to this work
Subjects:
Computation and Language (cs.CL)
In this paper, we address two different types of noise in information
extraction models: noise from distant supervision and noise from pipeline input
features. Our target tasks are entity typing and relation extraction. For the
first noise type, we introduce multi-instance multi-label learning algorithms
using neural network models, and apply them to fine-grained entity typing for
the first time. This gives our models comparable performance with the
state-of-the-art supervised approach which uses global embeddings of entities.
For the second noise type, we propose ways to improve the integration of noisy
entity type predictions into relation extraction. Our experiments show that
probabilistic predictions are more robust than discrete predictions and that
joint training of the two tasks performs best.
Robert Östling, Jörg Tiedemann. Subjects: Computation and Language (cs.CL)
Most existing models for multilingual natural language processing (NLP) treat
language as a discrete category, and make predictions for either one language
or the other. In contrast, we propose using continuous vector representations
of language. We show that these can be learned efficiently with a
character-based neural language model, and used to improve inference about
language varieties not seen during training. In experiments with 1303 Bible
translations into 990 different languages, we empirically explore the capacity
of multilingual language models, and also show that the language vectors
capture genetic relationships between languages.
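A sketch of the conditioning idea in PyTorch: a learned language embedding is
concatenated to every character embedding before the recurrent layer (one
plausible arrangement; the exact architecture and sizes here are assumptions):

    import torch
    import torch.nn as nn

    class CharLM(nn.Module):
        def __init__(self, n_chars, n_langs, c_dim=64, l_dim=16, hid=256):
            super().__init__()
            self.chars = nn.Embedding(n_chars, c_dim)
            self.langs = nn.Embedding(n_langs, l_dim)   # the language vectors
            self.rnn = nn.LSTM(c_dim + l_dim, hid, batch_first=True)
            self.out = nn.Linear(hid, n_chars)

        def forward(self, char_ids, lang_id):
            c = self.chars(char_ids)                            # B x T x c_dim
            l = self.langs(lang_id)[:, None, :].expand(-1, c.size(1), -1)
            h, _ = self.rnn(torch.cat([c, l], dim=-1))
            return self.out(h)                                  # next-char logits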
Comments: 13 pages, 2 figures
Subjects:
Computation and Language (cs.CL)
; Learning (cs.LG)
We develop a new model for Interactive Question Answering (IQA), using
Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and
questions, and another GRU as a decoder for outputs. Distinct from previous
work, our approach employs context-dependent word-level attention for more
accurate statement representations and question-guided sentence-level attention
for better context modeling. Employing these mechanisms, our model accurately
understands when it can output an answer or when it requires generating a
supplementary question for additional input. When available, the user’s feedback is
encoded and directly applied to update sentence-level attention to infer the
answer. Extensive experiments on QA and IQA datasets demonstrate quantitatively
the effectiveness of our model with significant improvement over conventional
QA models.
Comments: Published in ACM Transactions on Architecture and Code Optimization (TACO) 13, 4
Journal-ref: ACM Trans. Archit. Code Optim. 13, 4, Article 52 (December 2016)
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
This paper presents Pot, a system that leverages the concept of preordered
transactions to achieve deterministic multithreaded execution of programs that
use Transactional Memory. Preordered transactions eliminate the root cause of
nondeterminism in transactional execution: they provide the illusion of
executing in a deterministic serial order, unlike traditional transactions
which appear to execute in a nondeterministic order that can change from
execution to execution. Pot uses a new concurrency control protocol that
exploits the serialization order to distinguish between fast and speculative
transaction execution modes in order to mitigate the overhead of imposing a
deterministic order. We build two Pot prototypes: one using STM and another
using off-the-shelf HTM. To the best of our knowledge, Pot enables
deterministic execution of programs using off-the-shelf HTM for the first time.
An experimental evaluation shows that Pot achieves deterministic execution of
TM programs with low overhead, sometimes even outperforming nondeterministic
executions, and clearly outperforming the state of the art.
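The core of the preordered-transaction idea fits in a small sketch:
transactions may execute concurrently, but their effects become visible in a
predetermined serial order (by ticket number below), so every run yields the
same state. This toy uses a lock and a condition variable, not the paper's
STM/HTM machinery:

    import threading

    class PreorderedCommit:
        def __init__(self):
            self.turn = 0
            self.cv = threading.Condition()

        def commit(self, ticket, apply_writes):
            with self.cv:
                while self.turn != ticket:      # wait for this transaction's slot
                    self.cv.wait()
                apply_writes()                  # effects become visible in order
                self.turn += 1
                self.cv.notify_all()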
An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures
Comments: 20 pages, 7 figures
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Mathematical Software (cs.MS); Numerical Analysis (cs.NA)
In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for
symmetric tridiagonal matrices, based on ScaLAPACK and hierarchically
semiseparable (HSS) matrices, an important type of rank-structured matrix. Most
of the DC algorithm's time is spent computing the eigenvectors via
matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC)
algorithm, MMM is accelerated by HSS matrix techniques when the intermediate
matrix is large. All the HSS algorithms are implemented via the STRUMPACK
package. PHDC has been tested on many different matrices. Compared with the DC
implementation in MKL, PHDC can be faster for some matrices with few deflations
when using hundreds of processes. However, the gains decrease as the number of
processes increases. The comparisons of PHDC with ELPA (the Eigenvalue soLvers
for Petascale Applications library) are similar. PHDC is usually slower than
MKL and ELPA when using 300 or more processes on the Tianhe-2 supercomputer.
Comments: 6 pages, 7 figures, 1 table
Subjects:
Distributed, Parallel, and Cluster Computing (cs.DC)
; Computers and Society (cs.CY)
In this digitalised world where all information is stored, data are growing
exponentially; it is estimated that the volume of data doubles every two years.
Geospatial data are one of the prime contributors to the big data scenario.
There are numerous big data analytics tools, but not all of them are capable of
handling geospatial big data. This paper discusses two recent and popular
open-source geospatial big data analytical tools, SpatialHadoop and GeoSpark,
which can be used to analyse and process geospatial big data efficiently. It
compares the architectures of SpatialHadoop and GeoSpark and, through this
architectural comparison, summarises the merits and demerits of these tools
according to the execution times and the volume of data used.
Comments: 6 pages, 5 figures
Subjects:
Databases (cs.DB)
; Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We study distributed graph algorithms that adopt an iterative vertex-centric
framework for graph processing, popularized by Google’s Pregel system. Since
then, there have been several attempts to implement many graph algorithms in a
vertex-centric framework, as well as efforts to design optimization techniques
for improving the efficiency. However, to the best of our knowledge, there has
not been any systematic study comparing these vertex-centric implementations
with their sequential counterparts. Our paper addresses this gap in two ways.
(1) We analyze the computational complexity of such implementations with the
notion of time-processor product, and benchmark several vertex-centric graph
algorithms to see whether they perform more work than their best-known
sequential solutions. (2) Employing the concept of balanced practical Pregel
algorithms, we study whether these implementations suffer from imbalanced
workloads and a large number of iterations. Our findings illustrate that, with
the exception of the Euler tour tree algorithm, all other algorithms either
perform more work than their best-known sequential approach, suffer from
imbalanced workloads or a large number of iterations, or both. We also
highlight graph algorithms that are fundamentally difficult to express in
vertex-centric frameworks, and conclude by discussing the road ahead for
distributed graph processing.
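For readers unfamiliar with the model, a toy superstep loop in the
vertex-centric style (BFS distances) also shows the two quantities the
analysis tracks: the number of supersteps and the total work summed over
active vertices:

    import math

    def pregel_bfs(adj, source):
        dist = {v: math.inf for v in adj}
        dist[source] = 0
        active, supersteps, work = {source}, 0, 0
        while active:
            supersteps += 1
            work += len(active)                       # one unit per active vertex
            msgs = {}
            for u in active:                          # "send" dist+1 to neighbors
                for v in adj[u]:
                    msgs[v] = min(msgs.get(v, math.inf), dist[u] + 1)
            active = {v for v, d in msgs.items() if d < dist[v]}
            for v in active:                          # "receive" and update
                dist[v] = msgs[v]
        return dist, supersteps, work

    print(pregel_bfs({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0))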
Comments: 14 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Since 2006, deep learning (DL) has become a rapidly growing research
direction, redefining state-of-the-art performances in a wide range of areas
such as object recognition, image segmentation, speech recognition and machine
translation. In modern manufacturing systems, data-driven machine health
monitoring is gaining in popularity due to the widespread deployment of
low-cost sensors and their connection to the Internet. Meanwhile, deep learning
provides useful tools for processing and analyzing these big machinery data.
The main purpose of this paper is to review and summarize the emerging research
work of deep learning on machine health monitoring. After the brief
introduction of deep learning techniques, the applications of deep learning in
machine health monitoring systems are reviewed mainly from the following
aspects: Auto-encoder (AE) and its variants, Restricted Boltzmann Machines and
their variants including Deep Belief Network (DBN) and Deep Boltzmann Machines
(DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
Finally, some new trends of DL-based machine health monitoring methods are
discussed.
Comments: 4 pages technical short note
Subjects:
Learning (cs.LG)
In this paper we improve the existing function approximation error bound for
the policy evaluation algorithm when the aim is to find the risk-sensitive cost
represented using exponential utility.
Johannes Blömer, Sascha Brauer, Kathrin Bujna. Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
In this paper, we present coreset constructions for the fuzzy K-means problem.
First, we show that one can construct a weak coreset for fuzzy K-means. Second,
we show that there are coresets for fuzzy K-means with respect to balanced
fuzzy K-means solutions. Third, we use these coresets to develop a randomized
approximation algorithm whose runtime is polynomial in the number of the given
points and the dimension of these points.
Comments: DCC 2017 poster
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Currently there are two predominant ways to train deep neural networks. The
first uses the restricted Boltzmann machine (RBM) and the second autoencoders.
RBMs are stacked in layers to form a deep belief network (DBN); the final
representation layer is attached to the target to complete the deep neural
network. Autoencoders are nested one inside the other to form stacked
autoencoders; once the stacked autoencoder is learnt, the decoder portion is
detached and the target attached to the deepest layer of the encoder to form
the deep neural network. This work proposes a new approach to training deep
neural networks using dictionary learning as the basic building block; the idea
is to use the features from the shallower layer as inputs for training the next
deeper layer. One can use any type of dictionary learning (unsupervised,
supervised, discriminative, etc.) as the basic unit up to the pre-final layer.
In the final layer, one needs to use the label-consistent dictionary learning
formulation for classification. We compare our proposed framework with existing
state-of-the-art deep learning techniques on benchmark problems; we are always
within the top 10 results. On the practical problems of age and gender
classification, we are better than the best known techniques.
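A minimal sketch of the layer-wise scheme with scikit-learn's
DictionaryLearning (the label-consistent final layer and all hyperparameters
are omitted or assumed):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    def greedy_dictionary_stack(X, layer_sizes):
        codes, layers = X, []
        for k in layer_sizes:
            dl = DictionaryLearning(n_components=k,
                                    transform_algorithm='lasso_lars')
            codes = dl.fit_transform(codes)   # sparse codes feed the next layer
            layers.append(dl)
        return layers, codes

    layers, feats = greedy_dictionary_stack(np.random.rand(100, 64), [32, 16])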
Microstructure Representation and Reconstruction of Heterogeneous Materials via Deep Belief Network for Computational Material Design
Comments: 27 pages, 17 figures
Subjects:
Learning (cs.LG)
; Machine Learning (stat.ML)
Integrated Computational Materials Engineering (ICME) aims to accelerate
optimal design of complex material systems by integrating material science and
design automation. For tractable ICME, it is required that (1) a structural
feature space be identified to allow reconstruction of new designs, and (2) the
reconstruction process be property-preserving. The majority of existing
structural representation schemes rely on the designer’s understanding of
specific material systems to identify geometric and statistical features, which
could be biased and insufficient for reconstructing physically meaningful
microstructures of complex material systems. In this paper, we develop a
feature learning mechanism based on a convolutional deep belief network to
automate a two-way conversion between microstructures and their
lower-dimensional feature representations, and to achieve a 1000-fold dimension
reduction from the microstructure space. The proposed model is applied to a
wide spectrum of heterogeneous material systems with distinct microstructural
features, including Ti-6Al-4V alloy, Pb63-Sn37 alloy, Fontainebleau sandstone,
and spherical colloids, to produce material
reconstructions that are close to the original samples with respect to 2-point
correlation functions and mean critical fracture strength. This capability is
not achieved by existing synthesis methods that rely on the Markovian
assumption of material microstructures.
Charmgil Hong, Milos Hauskrecht. Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Despite tremendous progress in outlier detection research in recent years,
the majority of existing methods are designed only to detect unconditional
outliers that correspond to unusual data patterns expressed in the joint space
of all data attributes. Such methods are not applicable when we seek to detect
conditional outliers that reflect unusual responses associated with a given
context or condition. This work focuses on multivariate conditional outlier
detection, a special type of the conditional outlier detection problem, where
data instances consist of multi-dimensional input (context) and output
(responses) pairs. We present a novel outlier detection framework that
identifies abnormal input-output associations in data with the help of a
decomposable conditional probabilistic model that is learned from all data
instances. Since components of this model can vary in their quality, we combine
them with the help of weights reflecting their reliability in assessment of
outliers. We study two ways of calculating the component weights: global that
relies on all data, and local that relies only on instances similar to the
target instance. Experimental results on data from various domains demonstrate
the ability of our framework to successfully identify multivariate conditional
outliers.
Comments: 7 pages, 3 figures, 3 tables
Subjects:
High Energy Physics – Phenomenology (hep-ph)
; Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Machine learning (ML) algorithms have been employed in the problem of
classifying signal and background events with high accuracy in particle
physics. In this paper, we use a widespread ML technique, namely stacked
generalization, for the task of discovering a new neutral Higgs boson in gluon
fusion. We found that, while demanding far less computational effort, stacking
ML algorithms performs almost as well as deep neural networks (DNN) trained
exclusively on kinematic distributions for the same task, by building either a
highly discriminating linear model or a shallower neural network on the stacked
ML outputs. We also show that it is possible to outperform DNN in this channel
by partially exploring correlations among the classifier outputs in a
multivariate statistical analysis.
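The stacking recipe itself is compact; a sketch with scikit-learn on stand-in
kinematic features, using out-of-fold base-model outputs and a linear combiner
(the base models and sizes below are assumptions):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X = np.random.rand(500, 10)               # kinematic features (stand-in)
    y = np.random.randint(0, 2, 500)          # signal vs. background

    bases = [RandomForestClassifier(n_estimators=100),
             GradientBoostingClassifier()]
    meta_X = np.column_stack([                # out-of-fold probabilities
        cross_val_predict(b, X, y, cv=5, method='predict_proba')[:, 1]
        for b in bases])
    stacker = LogisticRegression().fit(meta_X, y)   # the linear combiner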
Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson. Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep
learning model able to predict structured sequences of data. Precisely, GCRN is
a generalization of classical recurrent neural networks (RNN) to data
structured by an arbitrary graph. Such structured sequences can represent
series of frames in videos, spatio-temporal measurements on a network of
sensors, or random walks on a vocabulary graph for natural language modeling.
The proposed model combines convolutional neural networks (CNN) on graphs to
identify spatial structures and RNN to find dynamic patterns. We study two
possible architectures of GCRN, and apply the models to two practical problems:
predicting moving MNIST data, and modeling natural language with the Penn
Treebank dataset. Experiments show that exploiting simultaneously graph spatial
and dynamic information about data can improve both precision and learning
speed.
Comments: 9 pages, 4 tables, 1 figure
Subjects:
Machine Learning (stat.ML)
; Learning (cs.LG)
In many data exploration tasks it is meaningful to identify groups of
attribute interactions that are specific to a variable of interest. These
interactions are also useful in several practical applications, for example, to
gain insight into the structure of the data, in feature selection, and in data
anonymisation. We present a novel method, based on statistical significance
testing, that can be used to verify whether the data set has been created by a
given factorized class-conditional joint distribution, where the distribution
is parametrized by a partition of its attributes. Furthermore, we provide a
method,
named ASTRID, to automatically find a partition of attributes that describes
the distribution that has generated the data. The state-of-the-art classifiers
are utilized to capture the interactions present in the data by systematically
breaking attribute interactions and observing the effect of this breaking on
classifier performance. We empirically demonstrate the utility of the proposed
method with real and synthetic data as well as with usage examples.
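The interaction-breaking step can be sketched as a grouped, within-class
permutation: it preserves each group's class-conditional distribution while
destroying between-group interactions, and the drop in cross-validated
accuracy measures how much the classifier relied on what was broken (a sketch
of the idea, not ASTRID's full significance-testing procedure):

    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import cross_val_score

    def broken_interaction_score(clf, X, y, groups, seed=0):
        rng = np.random.default_rng(seed)
        Xp = X.copy()
        for g in groups:                        # e.g. [[0, 1], [2], [3, 4]]
            for c in np.unique(y):
                idx = np.flatnonzero(y == c)    # permute rows within each class
                Xp[np.ix_(idx, g)] = X[np.ix_(rng.permutation(idx), g)]
        return cross_val_score(clone(clf), Xp, y, cv=5).mean()

    # drop = full-data CV accuracy - broken_interaction_score(clf, X, y, groups)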
Comments: 5 pages, 2 figures
Subjects:
Sound (cs.SD)
; Learning (cs.LG); Machine Learning (stat.ML)
Most of the existing studies on voice conversion (VC) are conducted in
acoustically matched conditions between source and target signal. However, the
robustness of VC methods in the presence of mismatch remains unknown. In this
paper, we report a comparative analysis of different VC techniques under
mismatched conditions. Extensive experiments with five different VC techniques
on the CMU ARCTIC corpus suggest that the performance of VC methods
substantially degrades in noisy conditions. We have found that bilinear
frequency warping with amplitude scaling (BLFWAS) outperforms other methods in
most of the noisy conditions. We further explore the suitability of different
speech enhancement techniques for robust conversion. The objective evaluation
results indicate that spectral subtraction and log minimum mean square error
(logMMSE) based speech enhancement techniques can be used to improve the
performance in specific noisy conditions.
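Spectral subtraction, one of the enhancement methods found helpful above, has
a compact textbook form (the frame size and noise-estimation window below are
assumptions):

    import numpy as np
    from scipy.signal import istft, stft

    def spectral_subtraction(x, fs, noise_frames=10, nperseg=512):
        f, t, Z = stft(x, fs=fs, nperseg=nperseg)
        noise_mag = np.abs(Z[:, :noise_frames]).mean(axis=1, keepdims=True)
        mag = np.maximum(np.abs(Z) - noise_mag, 0.0)     # subtract, floor at zero
        _, x_hat = istft(mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=nperseg)
        return x_hat                                     # keeps the noisy phase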
Amir Daneshmand, Gesualdo Scutari, Francisco Facchinei. Subjects: Optimization and Control (math.OC); Learning (cs.LG)
The paper studies distributed Dictionary Learning (DL) problems where the
learning task is distributed over a multi-agent network with time-varying
(nonsymmetric) connectivity. This formulation is relevant, for instance, in
big-data scenarios where massive amounts of data are collected/stored in
different spatial locations and it is unfeasible to aggregate and/or process
all the data in a fusion center, due to resource limitations, communication
overhead or privacy considerations. We develop a general distributed
algorithmic framework for the (nonconvex) DL problem and establish its
asymptotic convergence. The new method hinges on Successive Convex
Approximation (SCA) techniques coupled with i) a gradient tracking mechanism
instrumental to locally estimate the missing global information; and ii) a
consensus step, as a mechanism to distribute the computations among the agents.
To the best of our knowledge, this is the first distributed algorithm with
provable convergence for the DL problem and, more generally, for bi-convex
optimization problems over (time-varying) directed graphs.
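A toy sketch of the two mechanisms on a consensus-optimization problem (a
generic gradient-tracking scheme over a fixed doubly stochastic mixing matrix,
not the paper's exact DL updates):

    import numpy as np

    def gradient_tracking(W, x0, grad, iters=200, step=0.05):
        n = W.shape[0]
        x = x0.copy()                                   # one local copy per agent
        y = np.array([grad(i, x[i]) for i in range(n)]) # tracks the global gradient
        for _ in range(iters):
            x_new = W @ x - step * y                    # consensus + descent
            y = W @ y + np.array([grad(i, x_new[i]) - grad(i, x[i])
                                  for i in range(n)])   # gradient-tracking update
            x = x_new
        return x

    # Each agent i holds f_i(x) = 0.5 * ||x - a_i||^2; the optimum is mean(a).
    a = np.random.rand(4, 3)
    W = np.full((4, 4), 0.25)                           # doubly stochastic mixing
    print(gradient_tracking(W, np.zeros((4, 3)), lambda i, x: x - a[i])[0])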
Comments: This has an extra Appendix section giving more information about Remark 2 of the original paper published in the 52nd Annual Allerton Conference on Communication, Control and Computing, 2014
Subjects:
Information Theory (cs.IT)
We consider a directed acyclic network with multiple sources and multiple
terminals where each terminal is interested in decoding the sum of independent
sources generated at the source nodes. We describe a procedure whereby a simple
undirected graph can be used to construct such a sum-network and demonstrate an
upper bound on its computation rate. Furthermore, we show sufficient conditions
for the construction of a linear network code that achieves this upper bound.
Our procedure allows us to construct sum-networks that have any arbitrary
computation rate p/q (where p and q are non-negative integers). Our work
significantly generalizes a previous approach for constructing sum-networks
with arbitrary capacities. Specifically, we answer an open question in prior
work by demonstrating sum-networks with a significantly smaller number of
sources and terminals.
Comments: Patent no: US8929390 Methods And Apparatuses For Channel Estimation In Wireless Networks
Subjects:
Information Theory (cs.IT)
; Networking and Internet Architecture (cs.NI)
Radio channels are typically sparse in the delay domain and thus ideal for
compressed sensing. A new compressed sensing algorithm called eX-OMP is
developed that yields performance similar to that of the optimal MMSE
estimator. The new algorithm relies on a small amount of additional data. Both
eX-OMP and the MMSE estimator adaptively balance channel tracking and noise
reduction. They perform better than simple estimators such as the linear
interpolator, which fixes this trade-off a priori. Some wideband
measurements are examined, and the channels are found to be represented by a
few delays.
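Since the abstract does not spell out the eX-OMP modifications, the sketch below shows only the baseline greedy OMP recovery of a sparse delay-domain channel that such estimators build on; the dictionary A and the tap budget are assumptions.

    import numpy as np

    def omp(A, y, max_taps, tol=1e-6):
        # Greedy recovery of a sparse channel h from y = A h + noise.
        m, n = A.shape
        r, support = y.copy(), []
        h = np.zeros(n, dtype=A.dtype)
        for _ in range(max_taps):
            k = int(np.argmax(np.abs(A.conj().T @ r)))   # strongest delay tap
            if k not in support:
                support.append(k)
            h_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # LS re-fit
            r = y - A[:, support] @ h_s
            h[:] = 0
            h[support] = h_s
            if np.linalg.norm(r) < tol:                  # residual small enough
                break
        return h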
Comments: 45 pages, 10 figures, submitted to IEEE Transactions on Information Theory
Subjects:
Information Theory (cs.IT)
We consider a wireless device-to-device (D2D) caching network where n nodes
are placed on a regular grid of area A(n). Each node caches L_C*F (coded) bits
from a library of size L*F bits, where L is the number of files and F is the
size of each file. Each node requests a file from the library independently
according to a popularity distribution. Under a commonly used “physical model”
and Zipf popularity distribution, we characterize the optimal per-node capacity
scaling law for extended networks (i.e., networks whose area (A(n)) grows
linearly with (n)). Moreover, we propose a
cache-induced hierarchical cooperation scheme and associated cache content
placement optimization algorithm to achieve the optimal per-node capacity
scaling law. When the path loss exponent (\alpha < 3), the optimal per-node
capacity scaling law achieved by the cache-induced hierarchical cooperation can
be significantly better than that achieved by the existing state-of-the-art
schemes. To the best of our knowledge, this is the first work that completely
characterizes the per-node capacity scaling law for wireless caching networks
under the physical model and Zipf distribution with an arbitrary skewness
parameter (\tau). While scaling-law analysis yields clean results, it may not
accurately reflect the throughput performance of a large network with a finite
number of nodes. Therefore, we also analyze the throughput of the proposed
cache-induced hierarchical cooperation for networks of practical size. The
analysis and simulations verify that cache-induced hierarchical cooperation can
also achieve a large throughput gain over the cache-assisted multihop scheme
for networks of practical size.
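For reference, the Zipf popularity model used throughout this line of work assigns the file of rank f a request probability proportional to f^(-tau); a small sketch with illustrative parameter values:

    import numpy as np

    L, tau = 1000, 0.8                     # library size and skewness (illustrative)
    p = np.arange(1, L + 1, dtype=float) ** -tau
    p /= p.sum()                           # normalized Zipf popularity distribution
    rng = np.random.default_rng(0)
    requests = rng.choice(L, size=10_000, p=p)  # independent per-node requests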
Comments: 5 pages, 5 figures
Subjects:
Information Theory (cs.IT)
For greedy block-sparse recovery where the sparsity level is unknown, we
derive a condition for terminating the iteration process. Focusing on the block
orthogonal matching pursuit (BOMP) algorithm, we model the energy of residual
signals at each iteration from a probabilistic perspective. At the iteration
when the last supporting block is detected, the resulting energy of residual
signals is expected to exhibit a marked decrease. Based on this, we stop the
iteration process when the energy of the residual signals falls below a given
threshold. Compared with other approaches, our derived condition works well for
BOMP recovery. Moreover, we extend our approach to the interference
cancellation based BOMP (ICBOMP) recovery of paper [1]. Simulation results show
that our derived condition saves many unnecessary iterations while
guaranteeing favorable recovery accuracy, for both the BOMP and ICBOMP
recoveries.
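A minimal sketch of BOMP equipped with a residual-energy stopping rule of the kind derived here; the block layout, the threshold, and the safeguards are illustrative assumptions.

    import numpy as np

    def bomp(A, y, block_size, energy_threshold):
        m, n = A.shape
        blocks = [list(range(s, min(s + block_size, n)))
                  for s in range(0, n, block_size)]
        chosen, cols = [], []
        x, r = np.zeros(n), y.copy()
        for _ in range(len(blocks)):
            if r @ r <= energy_threshold:   # derived stopping condition
                break
            corr = A.T @ r
            # pick the block with the largest correlation energy
            b = max(range(len(blocks)),
                    key=lambda i: np.linalg.norm(corr[blocks[i]]))
            if b in chosen:                 # no further progress possible
                break
            chosen.append(b)
            cols = [j for i in chosen for j in blocks[i]]
            x_s, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            r = y - A[:, cols] @ x_s
            x[:] = 0.0
            x[cols] = x_s
        return x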
Comments: Globecom 2016. arXiv admin note: text overlap with arXiv:1602.08010 , arXiv:1601.00608
Subjects:
Information Theory (cs.IT)
An uplink cognitive radio system with a single primary user (PU) and multiple
secondary users (SUs) is considered. The SUs have an individual average delay
constraint and an aggregate average interference constraint to the PU. If the
interference channels between the SUs and the PU are statistically different
due to the different physical locations of the SUs, the SUs will experience
different delay performances. This is because SUs located closer to the PU
transmit with lower power levels. A dynamic scheduling-and-power-allocation
policy based on dynamic programming is proposed. The proposed policy can
provide the required average delay guarantees to all SUs irrespective of their
location as well as protect the PU from harmful interference. The policy is
shown to be asymptotically delay optimal in the light traffic regime.
Motivated by the high complexity of the dynamic programming algorithm in the
optimal policy, we exploit the structure of the problem’s solution to present an
alternative suboptimal policy. Through simulations we show that, in both light
and heavy traffic regimes, the delay performance of the suboptimal policy is
within 0.3% of that of the optimal policy, and both outperform existing methods.
Comments: 38 pages, 5 figures
Subjects:
Probability (math.PR)
; Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the statistical limits of both detecting and estimating a rank-one
deformation of a symmetric random Gaussian tensor. We establish upper and lower
bounds on the critical signal-to-noise ratio, under a variety of priors for the
planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. (\pm 1)
entries, and (iii) a sparse vector where a constant fraction (\rho) of entries
are i.i.d. (\pm 1) and the rest are zero. For each of these cases, our upper
and lower bounds match up to a (1+o(1)) factor as the order (d) of the tensor
becomes large. For sparse signals (iii), our bounds are also asymptotically
tight in the sparse limit (\rho \to 0) for any fixed (d) (including the (d=2)
case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon
reminiscent of the work of Baik, Ben Arous and Péché: an ‘eigenvalue’ of a
perturbed tensor emerges from the bulk at a strictly lower signal-to-noise
ratio than when the perturbation itself exceeds the bulk; we quantify the size
of this effect. We also provide some general results for larger classes of
priors. In particular, the large (d) asymptotics of the threshold location
differs between problems with discrete priors versus continuous priors.
Finally, for priors (i) and (ii) we carry out the replica prediction from
statistical physics, which is conjectured to give the exact
information-theoretic threshold for any fixed (d).
Of independent interest, we introduce a new improvement to the second moment
method for contiguity, on which our lower bounds are based. Our technique
conditions away from rare ‘bad’ events that depend on interactions between the
signal and noise. This enables us to close (\sqrt{2})-factor gaps present in
several previous works.
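As a worked example of the model being studied, the sketch below builds an order-3 spiked tensor Y = lambda * v⊗v⊗v + W with symmetrized Gaussian noise and a uniformly random unit spike (prior (i)); the noise normalization is one common convention and is an assumption here.

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n, lam = 30, 5.0
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                 # uniformly random unit vector

    G = rng.standard_normal((n, n, n))
    W = sum(np.transpose(G, p) for p in itertools.permutations(range(3))) / 6.0

    Y = lam * np.einsum('i,j,k->ijk', v, v, v) + W   # rank-one deformation

    # correlation of the planted spike with the observation; concentrates near lam
    print(np.einsum('ijk,i,j,k->', Y, v, v, v))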
Comments: arXiv admin note: text overlap with arXiv:1612.06344
Subjects:
Optimization and Control (math.OC)
; Information Theory (cs.IT); Probability (math.PR)
In this paper we provide a complementary set of results to those we present
in our companion work \cite{Stojnicl1HidParasymldp} regarding the behavior of
the so-called partial (\ell_1) (a variant of the standard (\ell_1) heuristic
often employed for solving under-determined systems of linear equations). As is
well known through our earlier works
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}, the partial (\ell_1)
also exhibits the phase-transition (PT) phenomenon, discovered and well
understood in the context of the standard (\ell_1) through Donoho’s and our own
works \cite{DonohoPol,DonohoUnsigned,StojnicCSetam09,StojnicUpper10}.
\cite{Stojnicl1HidParasymldp} goes much further though and, in addition to
determining the partial (\ell_1)’s phase-transition curves (PT curves)
(which had already been done in
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}), provides a
substantially deeper understanding of the PT phenomena through a study of the
underlying large deviations principles (LDPs). As the PT and LDP phenomena are
by definition tied to large dimensional settings, both sets of our works,
\cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13} and
\cite{Stojnicl1HidParasymldp}, consider what is typically called the asymptotic
regime. In this paper we move things in a different direction and consider
finite dimensional scenarios. Specifically, we provide explicit performance
characterizations for any given collection of system/parameter dimensions. We
do so for two different variants of the partial (\ell_1): one that we call
exactly the partial (\ell_1) and another, possibly a bit more practical,
that we call the hidden partial (\ell_1).
Partial (\ell_1) optimization in random linear systems – phase transitions and large deviations
Comments: arXiv admin note: text overlap with arXiv:1612.06361 , arXiv:1612.06835
Subjects:
Optimization and Control (math.OC)
; Information Theory (cs.IT); Probability (math.PR)
(\ell_1) optimization is a well-known heuristic often employed for solving
various forms of sparse linear problems. In this paper we look at a variant of
it that we refer to as the \emph{partial} (\ell_1) and discuss its mathematical
properties when used for solving under-determined linear systems of equations.
We focus on large random systems and discuss the phase-transition (PT)
phenomena and how they connect to the large deviation principles (LDPs). Using a
variety of probabilistic and geometric techniques that we have developed in
recent years, we first present general guidelines that conceptually fully
characterize both the PTs and the LDPs. After that, we put an emphasis on
providing a collection of explicit analytical solutions to all of the
underlying mathematical problems. As a nice bonus to the developed concepts,
the forms of the analytical solutions turn out, in our view, to be fairly
elegant as well.
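A minimal sketch of the partial (\ell_1) heuristic as described: only the entries of x outside a known index set K are (\ell_1)-penalized when solving the under-determined system Ax = y. cvxpy is used for brevity; the problem sizes and the set K are illustrative assumptions.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 40, 100, 10
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[:k] = rng.standard_normal(k)    # sparse unknown with known head
    y = A @ x_true

    K = np.arange(5)                       # indices assumed known a priori
    free = np.setdiff1d(np.arange(n), K)
    S = np.eye(n)[free]                    # selects the penalized coordinates

    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.norm1(S @ x)), [A @ x == y])
    prob.solve()
    print(np.linalg.norm(x.value - x_true))  # small when recovery succeeds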