Repost

arXiv Paper Daily: Mon, 12 Dec 2016

Neural and Evolutionary Computing

Field-Programmable Crossbar Array (FPCA) for Reconfigurable Computing

Mohammed A. Zidan , YeonJoo Jeong , Jong Hong Shin , Chao Du , Wei D. Lu Subjects : Emerging Technologies (cs.ET) ; Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)

For decades, advances in electronics were directly related to the scaling of

CMOS transistors according to Moore’s law. However, both the CMOS scaling and

the classical computer architecture are approaching fundamental and practical

limits, and new computing architectures based on emerging devices, such as

non-volatile memories, e.g., resistive memory (RRAM) devices, are expected to

sustain the exponential growth of computing capability. Here we propose a novel

memory-centric, reconfigurable, general purpose computing platform to handle

the exploding amount of data in a fast and energy-efficient manner. The

proposed computing architecture is based on a single physical resistive

memory-centric fabric that can be optimally reconfigured and utilized to

perform different computing and data storage tasks in a massively parallel

approach. The system can be tailored to achieve maximal energy efficiency based

on the data flow by dynamically allocating the basic computing fabric to

storage, arithmetic, and analog (including neuromorphic) computing tasks.

A Review of Intelligent Practices for Irrigation Prediction

Hans Krupakar , Akshay Jayakumar , Dhivya G

Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)

Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Population growth and increasing droughts are creating unprecedented strain

on the continued availability of water resources. Since irrigation is a major

consumer of fresh water, wastage of resources in this sector could have strong

consequences. To address this issue, irrigation water management and prediction

techniques need to be employed effectively and should be able to account for

the variabilities present in the environment. The different techniques surveyed

in this paper can be classified into two categories: computational and

statistical. Computational methods deal with scientific correlations between

physical parameters whereas statistical methods involve specific prediction

algorithms that can be used to automate the process of irrigation water

prediction. These algorithms interpret semantic relationships between parameters such as temperature, pressure, and evapotranspiration, and store them as precomputed numerical entities specific to the conditions and the area represented in the training corpus. We focus on reviewing the computational methods used to determine evapotranspiration and its implications. We compare the efficiencies of different data mining and machine learning methods applied to irrigation prediction, such as Logistic Regression, Decision Tree Classifiers, SysFor, Support Vector Machines (SVM), Fuzzy Logic techniques, Artificial Neural Networks (ANNs), and various hybrids of Genetic Algorithms (GA). We also recommend a candidate technique based on its superior results in other time series analysis tasks.

Computer Vision and Pattern Recognition

Panoptic Studio: A Massively Multiview System for Social Interaction Capture

Hanbyul Joo , Tomas Simon , Xulong Li , Hao Liu , Lei Tan , Lin Gui , Sean Banerjee , Timothy Godisart , Bart Nabbe , Iain Matthews , Takeo Kanade , Shohei Nobuhara , Yaser Sheikh

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an approach to capture the 3D motion of a group of people engaged

in a social interaction. The core challenges in capturing social interactions

are: (1) occlusion is functional and frequent; (2) subtle motion needs to be

measured over a space large enough to host a social group; (3) human appearance

and configuration variation is immense; and (4) attaching markers to the body

may prime the nature of interactions. The Panoptic Studio is a system organized

around the thesis that social interactions should be measured through the

integration of perceptual analyses over a large variety of view points. We

present a modularized system designed around this principle, consisting of

integrated structural, hardware, and software innovations. The system takes, as

input, 480 synchronized video streams of multiple people engaged in social

activities, and produces, as output, the labeled time-varying 3D structure of

anatomical landmarks on individuals in the space. Our algorithm is designed to

fuse the “weak” perceptual processes in the large number of views by

progressively generating skeletal proposals from low-level appearance cues, and

a framework for temporal refinement is also presented by associating body parts to a reconstructed dense 3D trajectory stream. Our system and method are the first to reconstruct the full-body motion of more than five people engaged in social interactions without using markers. We also empirically demonstrate the

impact of the number of views in achieving this goal.

Feature Pyramid Networks for Object Detection

Tsung-Yi Lin , Piotr Dollár , Ross Girshick , Kaiming He , Bharath Hariharan , Serge Belongie Subjects : Computer Vision and Pattern Recognition (cs.CV)

Feature pyramids are a basic component in recognition systems for detecting

objects at different scales. But recent deep learning object detectors have

avoided pyramid representations, in part because they are compute and memory

intensive. In this paper, we exploit the inherent multi-scale, pyramidal

hierarchy of deep convolutional networks to construct feature pyramids with

marginal extra cost. A top-down architecture with lateral connections is

developed for building high-level semantic feature maps at all scales. This

architecture, called a Feature Pyramid Network (FPN), shows significant

improvement as a generic feature extractor in several applications. Using FPN

in a basic Faster R-CNN system, our method achieves state-of-the-art

single-model results on the COCO detection benchmark without bells and

whistles, surpassing all existing single-model entries including those from the

COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU

and thus is a practical and accurate solution to multi-scale object detection.

Code will be made publicly available.
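
To make the top-down pathway with lateral connections concrete, here is a minimal PyTorch sketch of an FPN-style feature fusion module; the channel counts, nearest-neighbor upsampling, and layer names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal FPN-style top-down pathway with lateral connections (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions reduce each backbone stage to a common width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions smooth each merged map.
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: backbone maps ordered from finest (C2) to coarsest (C5).
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: upsample the coarser map and add it to the next lateral map.
        for i in range(len(laterals) - 2, -1, -1):
            up = F.interpolate(laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
            laterals[i] = laterals[i] + up
        # P2..P5 pyramid levels, all with out_channels channels.
        return [s(m) for s, m in zip(self.smooth, laterals)]

# Example with random backbone-like feature maps.
feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (64, 32, 16, 8))]
pyramid = SimpleFPN()(feats)
print([p.shape for p in pyramid])
```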

Quantifying and Predicting Image Scenicness

Scott Workman , Richard Souvenir , Nathan Jacobs Subjects : Computer Vision and Pattern Recognition (cs.CV)

Capturing the beauty of outdoor scenes in an image motivates many amateur and

professional photographers and serves as the basis for many image sharing

sites. While natural beauty is often considered a subjective property of

images, in this paper, we take an objective approach and provide methods for

quantifying and predicting the scenicness of an image. Using a dataset

containing hundreds of thousands of outdoor images captured throughout Great

Britain with crowdsourced ratings of natural beauty, we propose an approach to

predict scenicness which explicitly accounts for the variance of human raters.

We demonstrate that quantitative measures of scenicness can benefit semantic

image understanding, content-aware image processing, and a novel application of

cross-view mapping, where the sparsity of labeled ground-level images can be

addressed by incorporating unlabeled aerial images in the training and

prediction steps. For each application, our methods for scenicness prediction

result in quantitative and qualitative improvements over baseline approaches.

Shape-aware Instance Segmentation

Zeeshan Hayder , Xuming He , Mathieu Salzmann Subjects : Computer Vision and Pattern Recognition (cs.CV)

We address the problem of instance-level semantic segmentation, which aims at

jointly detecting, segmenting and classifying every individual object in an

image. In this context, existing methods typically propose candidate objects,

usually as bounding boxes, and directly predict a binary mask within each such

proposal. As a consequence, they cannot recover from errors in the object

candidate generation process, such as too small or shifted boxes. In this

paper, we introduce a novel object segment representation based on the distance

transform of the object masks. We then design an object mask network (OMN) with

a new residual-deconvolution architecture that infers such a representation and

decodes it into the final binary object mask. This allows us to predict masks

that go beyond the scope of the bounding boxes and are thus robust to

inaccurate object candidates. We integrate our OMN into a Multitask Network

Cascade framework, and learn the resulting shape-aware instance segmentation

(SAIS) network in an end-to-end manner. Our experiments on the PASCAL VOC 2012

and the CityScapes datasets demonstrate the benefits of our approach, which

outperforms the state-of-the-art in both object proposal generation and

instance segmentation.
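
As a small illustration of the distance-transform representation of object masks described above (the encode/decode scheme here is a simplified assumption, not the paper's OMN formulation), one can encode a binary mask as a truncated distance transform and recover the mask by thresholding:

```python
# Encode a binary mask as a (truncated) distance transform and decode it back.
import numpy as np
from scipy.ndimage import distance_transform_edt

def encode_mask(mask, truncate=10.0):
    # Distance of each foreground pixel to the nearest background pixel,
    # clipped so the representation stays bounded.
    dist = distance_transform_edt(mask.astype(bool))
    return np.minimum(dist, truncate)

def decode_mask(dist):
    # Any pixel with positive distance lies inside the object.
    return dist > 0

mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:45, 15:50] = 1
dist = encode_mask(mask)
recovered = decode_mask(dist)
print(np.array_equal(recovered, mask.astype(bool)))  # True for this toy example
```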

Following Gaze Across Views

Adrià Recasens , Carl Vondrick , Aditya Khosla , Antonio Torralba

Comments: 9 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Following the gaze of people inside videos is an important signal for

understanding people and their actions. In this paper, we present an approach

for following gaze across views by predicting where a particular person is

looking throughout a scene. We collect VideoGaze, a new dataset which we use as

a benchmark to both train and evaluate models. Given one view with a person in

it and a second view of the scene, our model estimates a density for gaze

location in the second view. A key aspect of our approach is an end-to-end

model that solves the following sub-problems: saliency, gaze pose, and

geometric relationships between views. Although our model is supervised only

with gaze, we show that the model learns to solve these subproblems

automatically without supervision. Experiments suggest that our approach

follows gaze better than standard baselines and produces plausible results for

everyday situations.

ActionFlowNet: Learning Motion Representation for Action Recognition

Joe Yue-Hei Ng , Jonghyun Choi , Jan Neumann , Larry S. Davis Subjects : Computer Vision and Pattern Recognition (cs.CV)

Even with the recent advances in convolutional neural networks (CNN) in

various visual recognition tasks, state-of-the-art action recognition systems still rely on hand-crafted motion features such as optical flow to achieve the best performance. We propose a multitask learning model

ActionFlowNet to train a single stream network directly from raw pixels to

jointly estimate optical flow while recognizing actions with convolutional

neural networks, capturing both appearance and motion in a single model. We

additionally provide insights into how the quality of the learned optical flow affects action recognition. Our model not only improves action recognition accuracy by a large margin (17%) compared to

state-of-the-art CNN-based action recognition models trained without external

large scale data and additional optical flow input, but also produces the

optical flow as a side product.
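
A joint objective of the kind described, combining an action classification term with an optical-flow regression term, could look like the following sketch; the loss weighting and the endpoint-error formulation are assumptions, not the authors' exact training recipe.

```python
# Illustrative multitask loss: action classification + optical flow regression.
import torch
import torch.nn.functional as F

def multitask_loss(action_logits, action_labels, flow_pred, flow_target, flow_weight=0.5):
    # Cross-entropy for the recognized action.
    cls_loss = F.cross_entropy(action_logits, action_labels)
    # Average endpoint error between predicted and target flow fields (B, 2, H, W).
    epe = torch.norm(flow_pred - flow_target, dim=1).mean()
    return cls_loss + flow_weight * epe

logits = torch.randn(4, 101)                 # e.g. 101 action classes (assumption)
labels = torch.randint(0, 101, (4,))
flow_pred = torch.randn(4, 2, 56, 56)
flow_target = torch.randn(4, 2, 56, 56)
print(multitask_loss(logits, labels, flow_pred, flow_target))
```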

Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection

Maurilio Di Cicco , Ciro Potena , Giorgio Grisetti , Alberto Pretto Subjects : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)

Selective weeding is one of the key challenges in the field of agriculture

robotics: in order to accomplish this task, a farm robot should be able to accurately detect plants and to distinguish between crop and weeds. In

this paper, we face this problem by proposing a novel and effective approach

that aims to dramatically minimize the human intervention needed to train the

detection and classification algorithms. The idea is to synthetically generate

the training datasets by using state-of-the-art computer graphics methods and

tools. We explicitly model the relevant aspects of the target environment

(i.e., crop and weeds species, type of soil, light conditions): by changing the

model parameters, it is possible to render a huge number of photo-realistic

views of the environment, to be used as training data. The proposed methodology

has been validated exploiting a very recent deep learning based image

segmentation architecture called SegNet [Kendall et al.]. We trained and tested

different custom-built variants of SegNet using both real and synthetically generated datasets; the reported results confirm the effectiveness and the potential of our approach.

Facial Expression Recognition using Convolutional Neural Networks: State of the Art

Christopher Pramerdorfer , Martin Kampel Subjects : Computer Vision and Pattern Recognition (cs.CV)

The ability to recognize facial expressions automatically enables novel

applications in human-computer interaction and other areas. Consequently, there

has been active research in this field, with several recent works utilizing

Convolutional Neural Networks (CNNs) for feature extraction and inference.

These works differ significantly in terms of CNN architectures and other

factors. Based on the reported results alone, the performance impact of these

factors is unclear. In this paper, we review the state of the art in

image-based facial expression recognition using CNNs and highlight algorithmic

differences and their performance impact. On this basis, we identify existing

bottlenecks and consequently directions for advancing this research field.

Furthermore, we demonstrate that overcoming one of these bottlenecks – the

comparatively basic architectures of the CNNs utilized in this field – leads to

a substantial performance increase. By forming an ensemble of modern deep CNNs,

we obtain a FER2013 test accuracy of 75.2%, outperforming previous works

without requiring auxiliary training data or face registration.

Gesture-based Bootstrapping for Egocentric Hand Segmentation

Yubo Zhang , Vishnu Naresh Boddeti , Kris M. Kitani Subjects : Computer Vision and Pattern Recognition (cs.CV)

Accurately identifying hands in images is a key sub-task for human activity

understanding with wearable first-person point-of-view cameras. Traditional

hand segmentation approaches rely on a large corpus of manually labeled data to

generate robust hand detectors. However, these approaches still face challenges

as the appearance of the hand varies greatly across users, tasks, environments

or illumination conditions. A key observation in the case of many wearable applications and interfaces is that it is only necessary to accurately detect

the user’s hands in a specific situational context. Based on this observation,

we introduce an interactive approach to learn a person-specific hand

segmentation model that does not require any manually labeled training data.

Our approach proceeds in two steps: an interactive bootstrapping step for identifying moving hand regions, followed by learning a personalized, user-specific hand appearance model. Concretely, our approach uses two convolutional

neural networks: (1) a gesture network that uses pre-defined motion information

to detect the hand region; and (2) an appearance network that learns a person-specific model of the hand region based on the output of the gesture network. During training, to make the appearance network robust to errors in the gesture network, the loss function of the appearance network incorporates the confidence of the gesture network. Experiments demonstrate the robustness of

our approach with an F1 score over 0.8 on all challenging datasets across a

wide range of illumination and hand appearance variations, improving over a

baseline approach by over 10%.
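
The confidence-weighted training signal described above could be sketched as follows; treating the gesture network's per-pixel foreground probability as both pseudo-label source and confidence weight is an assumption made for illustration.

```python
# Illustrative confidence-weighted segmentation loss for the appearance network.
import torch
import torch.nn.functional as F

def confidence_weighted_loss(appearance_logits, gesture_probs):
    # gesture_probs: per-pixel hand probability from the gesture network (B, H, W).
    # Pseudo-labels come from thresholding the gesture network's output.
    pseudo_labels = (gesture_probs > 0.5).long()
    # Confidence is higher the further the probability is from 0.5.
    confidence = (gesture_probs - 0.5).abs() * 2.0
    per_pixel = F.cross_entropy(appearance_logits, pseudo_labels, reduction="none")
    return (confidence * per_pixel).mean()

logits = torch.randn(2, 2, 32, 32)     # background/hand logits
gesture_probs = torch.rand(2, 32, 32)
print(confidence_weighted_loss(logits, gesture_probs))
```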

Fast Fourier single-pixel imaging using binary illumination

Zibang Zhang , Xueying Wang , Jingang Zhong Subjects : Computer Vision and Pattern Recognition (cs.CV)

Fourier single-pixel imaging (FSI) has proven capable of reconstructing

high-quality two-dimensional and three-dimensional images. The utilization of

the sparsity of natural images in Fourier domain allows high-resolution images

to be reconstructed from far fewer measurements than effective image pixels.

However, applying the original FSI in a digital micro-mirror device (DMD) based high-speed imaging system turns out to be challenging, because the original FSI uses grayscale Fourier basis patterns for illumination while DMDs generate grayscale patterns at a relatively low rate. A DMD is a binary device that can only generate a black-and-white pattern at each instant. In this paper, we

adopt binary Fourier patterns for illumination to achieve DMD-based high-speed

single-pixel imaging. Binary Fourier patterns are generated by upsampling and

then applying error diffusion based dithering to the grayscale patterns.

Experiments demonstrate that the proposed technique is able to achieve static imaging with high quality and dynamic imaging in real time. The proposed technique

potentially allows high-quality and high-speed imaging over broad wavebands.
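
A rough sketch of the binarization step described above: upsample a grayscale Fourier basis pattern and apply error-diffusion dithering. The Floyd-Steinberg kernel and the 4x nearest-neighbor upsampling are assumptions chosen for illustration; the paper's exact parameters may differ.

```python
# Upsample a grayscale Fourier basis pattern and binarize it by error diffusion.
import numpy as np

def fourier_pattern(size, fx, fy, phase=0.0):
    # Grayscale sinusoidal pattern in [0, 1] for spatial frequency (fx, fy).
    y, x = np.mgrid[0:size, 0:size] / size
    return 0.5 + 0.5 * np.cos(2 * np.pi * (fx * x + fy * y) + phase)

def floyd_steinberg(img):
    # Classic error-diffusion dithering; returns a binary (0/1) pattern.
    img = img.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = 1.0 if img[i, j] >= 0.5 else 0.0
            err = img[i, j] - out[i, j]
            if j + 1 < w:
                img[i, j + 1] += err * 7 / 16
            if i + 1 < h and j > 0:
                img[i + 1, j - 1] += err * 3 / 16
            if i + 1 < h:
                img[i + 1, j] += err * 5 / 16
            if i + 1 < h and j + 1 < w:
                img[i + 1, j + 1] += err * 1 / 16
    return out

gray = fourier_pattern(64, fx=3, fy=2)
upsampled = np.kron(gray, np.ones((4, 4)))   # simple nearest-neighbor upsampling
binary = floyd_steinberg(upsampled)
print(binary.shape, np.unique(binary))
```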

Exploiting 2D Floorplan for Building-scale Panorama RGBD Alignment

Erik Wijmans , Yasutaka Furukawa Subjects : Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel algorithm that utilizes a 2D floorplan to align

panorama RGBD scans. While effective panorama RGBD alignment techniques exist,

such a system requires extremely dense RGBD image sampling. Our approach can

significantly reduce the number of necessary scans with the aid of a floorplan

image. We formulate a novel Markov Random Field inference problem as a scan

placement over the floorplan, as opposed to the conventional scan-to-scan

alignment. The technical contributions lie in multi-modal image correspondence

cues (between scans and schematic floorplan) as well as a novel coverage

potential avoiding an inherent stacking bias. The proposed approach has been

evaluated on five challenging large indoor spaces. To the best of our

knowledge, we present the first effective system that utilizes a 2D floorplan

image for building-scale 3D pointcloud alignment. The source code and the data

will be shared with the community to further enhance indoor mapping research.

Deep TEN: Texture Encoding Network

Hang Zhang , Jia Xue , Kristin Dana Subjects : Computer Vision and Pattern Recognition (cs.CV)

We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding

Layer integrated on top of convolutional layers, which ports the entire

dictionary learning and encoding pipeline into a single model. Current methods

build from distinct components, using standard encoders with separate

off-the-shelf features such as SIFT descriptors or pre-trained CNN features for

material recognition. Our new approach provides an end-to-end learning

framework, where the inherent visual vocabularies are learned directly from the

loss function. The features, dictionaries and the encoding representation for

the classifier are all learned simultaneously. The representation is orderless

and therefore is particularly useful for material and texture recognition. The

Encoding Layer generalizes robust residual encoders such as VLAD and Fisher

Vectors, and has the property of discarding domain specific information which

makes the learned convolutional features easier to transfer. Additionally,

joint training using multiple datasets of varied sizes and class labels is

supported resulting in increased recognition performance. The experimental

results show superior performance as compared to state-of-the-art methods using

gold-standard databases such as MINC-2500, Flickr Material Database,

KTH-TIPS-2b, and two recent databases 4D-Light-Field-Material and GTOS. The

source code for the complete system is publicly available.
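
As a rough illustration of the residual encoding idea described above, here is a simplified, hypothetical layer in the spirit of VLAD-style soft assignment; it is not the paper's exact Encoding Layer, and the dimensions, initialization, and scaling are assumptions.

```python
# Minimal learned residual encoding layer (VLAD-like soft assignment), illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodingLayer(nn.Module):
    def __init__(self, dim, num_codewords):
        super().__init__()
        self.codewords = nn.Parameter(torch.randn(num_codewords, dim) * 0.1)
        self.scale = nn.Parameter(torch.ones(num_codewords))  # per-codeword smoothing

    def forward(self, x):
        # x: (B, N, D) descriptors, e.g. flattened convolutional features.
        residuals = x.unsqueeze(2) - self.codewords.unsqueeze(0).unsqueeze(0)  # (B, N, K, D)
        # Soft assignment from negative scaled squared distances.
        weights = F.softmax(-self.scale * residuals.pow(2).sum(-1), dim=2)     # (B, N, K)
        # Aggregate weighted residuals over descriptors into a fixed-length code.
        encoded = (weights.unsqueeze(-1) * residuals).sum(1)                   # (B, K, D)
        return encoded.flatten(1)

features = torch.randn(2, 49, 128)   # e.g. a 7x7 conv map with 128 channels
print(EncodingLayer(128, 32)(features).shape)  # torch.Size([2, 4096])
```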

A series of maximum entropy upper bounds of the differential entropy

Frank Nielsen , Richard Nock

Comments: 18 pages

Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present a series of closed-form maximum entropy upper bounds for the

differential entropy of a continuous univariate random variable and study the

properties of that series. We then show how to use those generic bounds for

upper bounding the differential entropy of Gaussian mixture models. This requires calculating, in closed form, the raw moments and raw absolute moments of Gaussian mixtures, which may also be handy in statistical machine learning and information theory. We report on our experiments and discuss the tightness of those bounds.
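
As a concrete instance of such a bound (only the classical fact that the Gaussian maximizes differential entropy for a fixed variance, not the full series from the paper), a small sketch for a univariate Gaussian mixture:

```python
# Maximum-entropy (Gaussian) upper bound on the differential entropy of a 1D Gaussian mixture.
import numpy as np

def gmm_entropy_upper_bound(weights, means, variances):
    # Mixture mean and variance from the raw moments of the components.
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m1 = np.sum(weights * means)
    m2 = np.sum(weights * (variances + means**2))
    var = m2 - m1**2
    # The Gaussian with the same variance maximizes differential entropy,
    # so h(X) <= 0.5 * log(2 * pi * e * var)  (in nats).
    return 0.5 * np.log(2 * np.pi * np.e * var)

print(gmm_entropy_upper_bound([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]))
```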

Artificial Intelligence

Measuring Adverse Drug Effects on Multimorbidity using Tractable Bayesian Networks

Jessa Bekker , Arjen Hommersom , Martijn Lappenschaar , Jesse Davis

Comments: Machine Learning for Health @ NIPS 2016

Subjects: Artificial Intelligence (cs.AI)

Managing patients with multimorbidity often results in polypharmacy: the

prescription of multiple drugs. However, the long-term effects of specific

combinations of drugs and diseases are typically unknown. In particular, drugs

prescribed for one condition may result in adverse effects for the other. To

investigate which types of drugs may affect the further progression of

multimorbidity, we query models of diseases and prescriptions that are learned

from primary care data. State-of-the-art tractable Bayesian network

representations, on which such complex queries can be computed efficiently, are

employed for these large medical networks. Our results confirm that

prescriptions may lead to unintended negative consequences in further

development of multimorbidity in cardiovascular diseases. Moreover, a drug

treatment for one disease group may affect diseases of another group.

GOTM: a Goal-oriented Framework for Capturing Uncertainty of Medical Treatments

Davoud Mougouei , David Powers

Comments: Idea Paper

Subjects: Artificial Intelligence (cs.AI)

It has been widely recognized that uncertainty is an inevitable aspect of

diagnosis and treatment of medical disorders. Such uncertainties hence need to be considered in computerized medical models. The existing medical modeling techniques, however, have mainly focused on capturing uncertainty associated

with diagnosis of medical disorders while ignoring uncertainty of treatments.

To tackle this issue, we have proposed using a fuzzy-based modeling and

description technique for capturing uncertainties in treatment plans. We have

further contributed a formal framework which allows for goal-oriented modeling

and analysis of medical treatments.

Learning representations through stochastic gradient descent in cross-validation error

Richard S. Sutton , Vivek Veeriah

Comments: preliminary draft

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Representations are fundamental to Artificial Intelligence. Typically, the

performance of a learning system depends on its data representation. These data

representations are usually hand-engineered based on some prior domain

knowledge regarding the task. More recently, the trend is to learn these representations through deep neural networks, as these can produce dramatic performance improvements over hand-engineered data representations. In this

paper, we present a new incremental learning algorithm, called crossprop, for

learning representations based on prior learning experiences. Unlike

backpropagation, crossprop minimizes the cross-validation error. Specifically,

our algorithm considers the influences of all the past weights on the current

squared error, and uses this gradient for incrementally learning the weights in

a neural network. This idea is similar to that of tuning the learning system

through an offline cross-validation procedure. Crossprop is applicable to

incremental learning tasks, where a sequence of examples is encountered by the learning system and the examples need to be processed one by one and then discarded. The learning system can use each example only once and can spend only a limited amount of computation per example. From our preliminary experiments, we conclude that crossprop is a promising alternative to backprop for representation learning.

Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down

Kim Peter Wabersich , Marc Toussaint

Comments: Long version of accepted NIPS BayesOpt 2016 paper

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Bayesian Optimization (BO) has become a core method for solving expensive

black-box optimization problems. While much research has focused on the choice of the acquisition function, we focus on online length-scale adaptation and the choice of kernel function. Instead of choosing hyperparameters in view of

maximum likelihood on past data, we propose to use the acquisition function to

decide on hyperparameter adaptation more robustly and in view of the future

optimization progress. Further, we propose a particular kernel function that

includes non-stationarity and local anisotropy and thereby implicitly

integrates the efficiency of local convex optimization with global Bayesian

optimization. Comparisons to state-of-the-art BO methods underline the efficiency of these mechanisms on global optimization benchmarks.
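
For readers unfamiliar with the BO setup being modified here, the following is a generic GP-based Bayesian optimization loop with an expected-improvement acquisition (standard components only; it does not implement the MGL kernel or the length-scale cool-down proposed in the paper, and the toy objective is an assumption):

```python
# Generic Bayesian optimization loop with a GP surrogate and Expected Improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    return np.sin(3 * x) + 0.5 * x**2  # toy 1D black-box function to minimize

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))
y = objective(X).ravel()
candidates = np.linspace(-2, 2, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next))

print("best x:", X[np.argmin(y)].item(), "best value:", y.min())
```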

Distributed, Parallel, and Cluster Computing

Clipper: A Low-Latency Online Prediction Serving System

Daniel Crankshaw , Xin Wang , Giulio Zhou , Michael J. Franklin , Joseph E. Gonzalez , Ion Stoica Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Learning (cs.LG)

Machine learning is being deployed in a growing number of applications which

demand real-time, accurate, and robust predictions under heavy query load.

However, most machine learning frameworks and systems only address model

training and not deployment.

In this paper, we introduce Clipper, the first general-purpose low-latency

prediction serving system. Interposing between end-user applications and a wide

range of machine learning frameworks, Clipper introduces a modular architecture

to simplify model deployment across frameworks. Furthermore, by introducing

caching, batching, and adaptive model selection techniques, Clipper reduces

prediction latency and improves prediction throughput, accuracy, and robustness

without modifying the underlying machine learning frameworks. We evaluate

Clipper on four common machine learning benchmark datasets and demonstrate its

ability to meet the latency, accuracy, and throughput demands of online serving

applications. Finally, we compare Clipper to the TensorFlow Serving system and

demonstrate comparable prediction throughput and latency on a range of models

while enabling new functionality, improved accuracy, and robustness.
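
To illustrate one of the serving-layer ideas mentioned above (caching predictions in front of a framework-agnostic model), here is a hypothetical minimal wrapper; it is not Clipper's actual API, and it omits batching and model selection, showing only the caching idea under assumed names.

```python
# Hypothetical sketch: a framework-agnostic prediction wrapper with an LRU cache.
from functools import lru_cache

class CachedModel:
    """Wraps any model exposing predict(features) behind a small prediction cache."""

    def __init__(self, model, cache_size=1024):
        self._model = model
        # Cache keyed on a hashable version of the features.
        self._cached_predict = lru_cache(maxsize=cache_size)(self._predict_tuple)

    def _predict_tuple(self, features_tuple):
        return self._model.predict(list(features_tuple))

    def predict(self, features):
        return self._cached_predict(tuple(features))

class ToyModel:
    def predict(self, features):
        return sum(features)  # stand-in for a real framework's inference call

serving = CachedModel(ToyModel())
print(serving.predict([1.0, 2.0, 3.0]))  # computed
print(serving.predict([1.0, 2.0, 3.0]))  # served from cache
```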

Tuple spaces implementations and their efficiency

Vitaly Buravlev , Rocco De Nicola , Claudio Antares Mezzina Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Among the paradigms for parallel and distributed computing, the one

popularized with Linda and based on tuple spaces is one of the least used, despite being intuitive and easy to understand and use. A tuple space is a repository where processes can add, withdraw, or read tuples by

means of atomic operations. Tuples may contain different values, and processes

can inspect their content via pattern matching. The lack of a reference implementation for this paradigm has prevented its widespread adoption. In this paper,

first we perform an extensive analysis of a number of actual implementations of

the tuple space paradigm and summarise their main features. Then, we select

four such implementations and compare their performances on four different case

studies that aim at stressing different aspects of computing, such as communication, data manipulation, and CPU usage. After reasoning on the strengths

and weaknesses of the four implementations, we conclude with some

recommendations for future work towards building an effective implementation of

the tuple space paradigm.
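
For readers unfamiliar with the paradigm, a toy tuple space with the three classic operations, out (add), rd (read), and in (withdraw), plus wildcard pattern matching might look like the sketch below; real implementations add atomicity, blocking semantics, and distribution, which are omitted here.

```python
# Toy tuple space with out/rd/in and wildcard pattern matching (no concurrency control).
ANY = object()  # wildcard used in patterns

class TupleSpace:
    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Add a tuple to the space."""
        self._tuples.append(tuple(tup))

    def _matches(self, pattern, tup):
        return len(pattern) == len(tup) and all(
            p is ANY or p == v for p, v in zip(pattern, tup))

    def rd(self, pattern):
        """Read (without removing) the first tuple matching the pattern."""
        return next((t for t in self._tuples if self._matches(pattern, t)), None)

    def in_(self, pattern):
        """Withdraw (remove and return) the first tuple matching the pattern."""
        t = self.rd(pattern)
        if t is not None:
            self._tuples.remove(t)
        return t

ts = TupleSpace()
ts.out(("temperature", "room1", 21.5))
ts.out(("temperature", "room2", 19.0))
print(ts.rd(("temperature", ANY, ANY)))    # ('temperature', 'room1', 21.5)
print(ts.in_(("temperature", "room2", ANY)))
```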

Solidus: An Incentive-compatible Cryptocurrency Based on Permissionless Byzantine Consensus

Ittai Abraham , Dahlia Malkhi , Kartik Nayak , Ling Ren , Alexander Spiegelman Subjects : Cryptography and Security (cs.CR) ; Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

The decentralized cryptocurrency Bitcoin has experienced great success but

also encountered many challenges. One of the challenges has been the long

confirmation time and low transaction throughput. Another challenge is the lack

of incentives at certain steps of the protocol, raising concerns for

transaction withholding, selfish mining, etc. To address these challenges, we

propose Solidus, a decentralized cryptocurrency based on permissionless

Byzantine consensus. A core technique in Solidus is to use proof of work for

leader election to adapt the Practical Byzantine Fault Tolerance (PBFT)

protocol to a permissionless setting. We also design Solidus to be incentive

compatible and to mitigate selfish mining. Solidus improves on Bitcoin in

confirmation time and provides safety and liveness assuming Byzantine players

and the largest coalition of rational players collectively control less than

one-third of the computation power.

Learning

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

Constantinos Daskalakis , Qinxuan Pan Subjects : Learning (cs.LG) ; Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)

We show that the square Hellinger distance between two Bayesian networks on the same directed graph $G$ is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance $H^2(P,Q)$ between $P$ and $Q$ is upper bounded by the sum $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$ of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields.

As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\mathrm{TV}}(P,Q) > \epsilon$ can be performed from $\tilde{O}(|\Sigma|^{3/4(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n, \epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.
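
Restating the main inequality from the abstract in display form (a transcription only; $\Pi_v$ denotes the parent set of node $v$):

```latex
% Square Hellinger subadditivity for Bayesian networks P, Q defined on a common DAG G,
% where Pi_v denotes the parent set of node v.
\[
  H^{2}(P, Q) \;\le\; \sum_{v} H^{2}\!\left( P_{\{v\} \cup \Pi_v},\; Q_{\{v\} \cup \Pi_v} \right)
\]
```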

Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down

Kim Peter Wabersich , Marc Toussaint

Comments: Long version of accepted NIPS BayesOpt 2016 paper

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Bayesian Optimization (BO) has become a core method for solving expensive

black-box optimization problems. While much research has focused on the choice of the acquisition function, we focus on online length-scale adaptation and the choice of kernel function. Instead of choosing hyperparameters in view of

maximum likelihood on past data, we propose to use the acquisition function to

decide on hyperparameter adaptation more robustly and in view of the future

optimization progress. Further, we propose a particular kernel function that

includes non-stationarity and local anisotropy and thereby implicitly

integrates the efficiency of local convex optimization with global Bayesian

optimization. Comparisons to state-of-the-art BO methods underline the efficiency of these mechanisms on global optimization benchmarks.

Analytical Stacked Gaussian Process Model

Kareem Abdelfatah , Junshu Bao , Gabriel Terejanu Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

A probabilistic model is proposed by stacking a set of independently trained

Gaussian processes to obtain predictions of quantities of interest that require composition of functions. Analytical derivations are provided for first- and second-order moments of the stacked Gaussian process using RBF and polynomial

kernels. The StackedGP model can be extended to any number of layers and nodes

per layer, and it provides flexibility in kernel selection for each node. The

proposed nonparametric stacked model is validated using different synthetic

datasets and its performance is measured in two real-world applications.
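
A naive way to compose independently trained GPs is shown below, propagating only the mean of the first GP into the second with scikit-learn; this ignores the analytical moment propagation derived in the paper and is only an illustrative baseline on a made-up composition g(f(x)).

```python
# Naive stacked-GP baseline: propagate only the mean of the first GP into the second.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30).reshape(-1, 1)
f = np.sin(2 * np.pi * x).ravel()           # hidden intermediate quantity
g = f**2 + 0.05 * rng.normal(size=f.shape)  # observed final quantity g(f(x))

gp_f = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6).fit(x, f)
gp_g = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6).fit(f.reshape(-1, 1), g)

x_test = np.array([[0.37]])
f_mean = gp_f.predict(x_test).reshape(-1, 1)
g_mean, g_std = gp_g.predict(f_mean, return_std=True)
print(g_mean, g_std)  # mean-propagated prediction; the uncertainty of gp_f is discarded
```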

A Review of Intelligent Practices for Irrigation Prediction

Hans Krupakar , Akshay Jayakumar , Dhivya G

Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)

Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Population growth and increasing droughts are creating unprecedented strain

on the continued availability of water resources. Since irrigation is a major

consumer of fresh water, wastage of resources in this sector could have strong

consequences. To address this issue, irrigation water management and prediction

techniques need to be employed effectively and should be able to account for

the variabilities present in the environment. The different techniques surveyed

in this paper can be classified into two categories: computational and

statistical. Computational methods deal with scientific correlations between

physical parameters whereas statistical methods involve specific prediction

algorithms that can be used to automate the process of irrigation water

prediction. These algorithms interpret semantic relationships between parameters such as temperature, pressure, and evapotranspiration, and store them as precomputed numerical entities specific to the conditions and the area represented in the training corpus. We focus on reviewing the computational methods used to determine evapotranspiration and its implications. We compare the efficiencies of different data mining and machine learning methods applied to irrigation prediction, such as Logistic Regression, Decision Tree Classifiers, SysFor, Support Vector Machines (SVM), Fuzzy Logic techniques, Artificial Neural Networks (ANNs), and various hybrids of Genetic Algorithms (GA). We also recommend a candidate technique based on its superior results in other time series analysis tasks.

Testing Bayesian Networks

Clement Canonne , Ilias Diakonikolas , Daniel Kane , Alistair Stewart Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

This work initiates a systematic investigation of testing high-dimensional structured distributions by focusing on testing Bayesian networks, the prototypical family of directed graphical models. A Bayesian

network is defined by a directed acyclic graph, where we associate a random

variable with each node. The value at any particular node is conditionally

independent of all the other non-descendant nodes once its parents are fixed.

Specifically, we study the properties of identity testing and closeness testing

of Bayesian networks. Our main contribution is the first non-trivial efficient

testing algorithms for these problems and corresponding information-theoretic

lower bounds. For a wide range of parameter settings, our testing algorithms

have sample complexity sublinear in the dimension and are sample-optimal,

up to constant factors.

Optimal mean-based algorithms for trace reconstruction

Anindya De , Ryan O'Donnell , Rocco Servedio Subjects : Computational Complexity (cs.CC) ; Data Structures and Algorithms (cs.DS); Learning (cs.LG)

In the (deletion-channel) trace reconstruction problem, there is an unknown $n$-bit source string $x$. An algorithm is given access to independent traces of $x$, where a trace is formed by deleting each bit of $x$ independently with probability $\delta$. The goal of the algorithm is to recover $x$ exactly (with high probability), while minimizing samples (number of traces) and running time.

Previously, the best known algorithm for the trace reconstruction problem was due to Holenstein et al.; it uses $\exp(\tilde{O}(n^{1/2}))$ samples and running time for any fixed $0 < \delta < 1$. It is also what we call a “mean-based algorithm”, meaning that it only uses the empirical means of the individual bits of the traces. Holenstein et al. also gave a lower bound, showing that any mean-based algorithm must use at least $n^{\tilde{\Omega}(\log n)}$ samples.

In this paper we improve both of these results, obtaining matching upper and lower bounds for mean-based trace reconstruction. For any constant deletion rate $0 < \delta < 1$, we give a mean-based algorithm that uses $\exp(O(n^{1/3}))$ time and traces; we also prove that any mean-based algorithm must use at least $\exp(\Omega(n^{1/3}))$ traces. In fact, we obtain matching upper and lower bounds even for $\delta$ subconstant and $\rho := 1-\delta$ subconstant: when $(\log^3 n)/n \ll \delta \leq 1/2$ the bound is $\exp(\Theta(\delta n)^{1/3})$, and when $1/\sqrt{n} \ll \rho \leq 1/2$ the bound is $\exp(\Theta(n/\rho)^{1/3})$.

Our proofs involve estimates for the maxima of Littlewood polynomials on complex disks. We show that these techniques can also be used to perform trace reconstruction with random insertions and bit-flips in addition to deletions. We also find a surprising result: for deletion probabilities $\delta > 1/2$, the presence of insertions can actually help with trace reconstruction.
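
To make the setup concrete, here is a small simulation of the deletion channel and of the empirical bit means that a "mean-based algorithm" is restricted to; only the data-generation side is shown, and the reconstruction itself is not implemented.

```python
# Simulate deletion-channel traces of an n-bit string and compute the empirical bit means.
import numpy as np

def make_trace(x, delta, rng):
    # Each bit of x survives independently with probability 1 - delta.
    keep = rng.random(len(x)) >= delta
    return x[keep]

rng = np.random.default_rng(1)
n, delta, num_traces = 32, 0.3, 20000
x = rng.integers(0, 2, size=n)

traces = [make_trace(x, delta, rng) for _ in range(num_traces)]

# Mean-based statistics: the average value at each position j over all traces
# (traces shorter than j+1 contribute a 0 at that position).
max_len = max(len(t) for t in traces)
padded = np.zeros((num_traces, max_len))
for i, t in enumerate(traces):
    padded[i, :len(t)] = t
bit_means = padded.mean(axis=0)
print(bit_means[:8])
```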

Testing Ising Models

Constantinos Daskalakis , Nishanth Dikkala , Gautam Kamath Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

Given samples from an unknown multivariate distribution (p), is it possible

to distinguish whether (p) is the product of its marginals versus (p) being far

from every product distribution? Similarly, is it possible to distinguish

whether (p) equals a given distribution (q) versus (p) and (q) being far from

each other? These problems of testing independence and goodness-of-fit have

received enormous attention in statistics, information theory, and theoretical

computer science, with sample-optimal algorithms known in several interesting

regimes of parameters. Unfortunately, it has also been understood that these

problems become intractable in large dimensions, necessitating exponential

sample complexity.

Motivated by the exponential lower bounds for general distributions as well

as the ubiquity of Markov Random Fields (MRFs) in the modeling of

high-dimensional distributions, we initiate the study of distribution testing

on structured multivariate distributions, and in particular the prototypical

example of MRFs: the Ising Model. We demonstrate that, in this structured

setting, we can avoid the curse of dimensionality, obtaining sample and time

efficient testers for independence and goodness-of-fit. Along the way, we

develop new tools for establishing concentration of functions of the Ising

model, using the exchangeable pairs framework developed by Chatterjee, and

improving upon this framework. In particular, we prove tighter concentration

results for multi-linear functions of the Ising model in the high-temperature

regime.

Phase transitions in Restricted Boltzmann Machines with generic priors

Adriano Barra , Giuseppe Genovese , Peter Sollich , Daniele Tantari

Comments: 5 pages, 4 figures

Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We study generalised restricted Boltzmann machines with generic priors for

units and weights, interpolating between Boolean and Gaussian variables. We

present a complete analysis of the replica symmetric phase diagram of these

models, which can be regarded as generalised Hopfield models. We show the way

the paramagnetic phase boundary is directly related to the optimal size of the

training set necessary for good generalisation in a teacher-student scenario.

Moreover we underline the role of the retrieval phase for both inference and

learning processes. We show that retrieval is robust for a large class of

weight and unit priors, beyond the standard Hopfield scenario.

Clipper: A Low-Latency Online Prediction Serving System

Daniel Crankshaw , Xin Wang , Giulio Zhou , Michael J. Franklin , Joseph E. Gonzalez , Ion Stoica Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Learning (cs.LG)

Machine learning is being deployed in a growing number of applications which

demand real-time, accurate, and robust predictions under heavy query load.

However, most machine learning frameworks and systems only address model

training and not deployment.

In this paper, we introduce Clipper, the first general-purpose low-latency

prediction serving system. Interposing between end-user applications and a wide

range of machine learning frameworks, Clipper introduces a modular architecture

to simplify model deployment across frameworks. Furthermore, by introducing

caching, batching, and adaptive model selection techniques, Clipper reduces

prediction latency and improves prediction throughput, accuracy, and robustness

without modifying the underlying machine learning frameworks. We evaluate

Clipper on four common machine learning benchmark datasets and demonstrate its

ability to meet the latency, accuracy, and throughput demands of online serving

applications. Finally, we compare Clipper to the TensorFlow Serving system and

demonstrate comparable prediction throughput and latency on a range of models

while enabling new functionality, improved accuracy, and robustness.

BaTFLED: Bayesian Tensor Factorization Linked to External Data

Nathan H Lazar , Mehmet Gönen , Kemal Sönmez

Comments: 4 main pages with 14 supplemental pages

Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

The vast majority of current machine learning algorithms are designed to

predict single responses or a vector of responses, yet many types of response

are more naturally organized as matrices or higher-order tensor objects where

characteristics are shared across modes. We present a new machine learning

algorithm BaTFLED (Bayesian Tensor Factorization Linked to External Data) that

predicts values in a three-dimensional response tensor using input features for

each of the dimensions. BaTFLED uses a probabilistic Bayesian framework to

learn projection matrices mapping input features for each mode into latent

representations that multiply to form the response tensor. By utilizing a

Tucker decomposition, the model can capture weights for interactions between

latent factors for each mode in a small core tensor. Priors that encourage

sparsity in the projection matrices and core tensor allow for feature selection

and model regularization. This method is shown to far outperform elastic net

and neural net models on ‘cold start’ tasks from data simulated in a three-mode

structure. Additionally, we apply the model to predict dose-response curves in

a panel of breast cancer cell lines treated with drug compounds that was used

as a Dialogue for Reverse Engineering Assessments and Methods (DREAM)

challenge.
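
The Tucker structure underlying the model, mode-specific projection matrices combined through a small core tensor, can be sketched with a single einsum; the dimensions below and the absence of priors and sparsity are simplifying assumptions, not the BaTFLED implementation.

```python
# Tucker-style reconstruction of a 3-mode response tensor from projected input features.
import numpy as np

rng = np.random.default_rng(0)

I, J, K = 20, 15, 10        # sizes of the three response modes
P, Q, R = 8, 7, 6           # raw input feature dimensions per mode
L1, L2, L3 = 4, 3, 2        # latent dimensions per mode

X1, X2, X3 = rng.normal(size=(I, P)), rng.normal(size=(J, Q)), rng.normal(size=(K, R))
A1, A2, A3 = rng.normal(size=(P, L1)), rng.normal(size=(Q, L2)), rng.normal(size=(R, L3))
core = rng.normal(size=(L1, L2, L3))   # small core tensor of latent interactions

# Project input features into latent space for each mode, then contract with the core.
U1, U2, U3 = X1 @ A1, X2 @ A2, X3 @ A3
response = np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)
print(response.shape)  # (20, 15, 10)
```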

A series of maximum entropy upper bounds of the differential entropy

Frank Nielsen , Richard Nock

Comments: 18 pages

Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present a series of closed-form maximum entropy upper bounds for the

differential entropy of a continuous univariate random variable and study the

properties of that series. We then show how to use those generic bounds for

upper bounding the differential entropy of Gaussian mixture models. This requires calculating, in closed form, the raw moments and raw absolute moments of Gaussian mixtures, which may also be handy in statistical machine learning and information theory. We report on our experiments and discuss the tightness of those bounds.

Learning representations through stochastic gradient descent in cross-validation error

Richard S. Sutton , Vivek Veeriah

Comments: preliminary draft

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Representations are fundamental to Artificial Intelligence. Typically, the

performance of a learning system depends on its data representation. These data

representations are usually hand-engineered based on some prior domain

knowledge regarding the task. More recently, the trend is to learn these representations through deep neural networks, as these can produce dramatic performance improvements over hand-engineered data representations. In this

paper, we present a new incremental learning algorithm, called crossprop, for

learning representations based on prior learning experiences. Unlike

backpropagation, crossprop minimizes the cross-validation error. Specifically,

our algorithm considers the influences of all the past weights on the current

squared error, and uses this gradient for incrementally learning the weights in

a neural network. This idea is similar to that of tuning the learning system

through an offline cross-validation procedure. Crossprop is applicable to

incremental learning tasks, where a sequence of examples is encountered by the learning system and the examples need to be processed one by one and then discarded. The learning system can use each example only once and can spend only a limited amount of computation per example. From our preliminary experiments, we conclude that crossprop is a promising alternative to backprop for representation learning.

Tensor-Factor Analysis

Andrew Stevens , Yunchen Pu , Yannan Sun , Greg Spell , Lawrence Carin

Comments: Rejected from ICML 2016

Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A nonparametric factor analysis framework is developed for tensor-variate

data. The dictionary elements (factor loadings) are inferred as tensors, as

opposed to vectors. Our tensor-factor analysis (TFA) framework is developed in

the context of the beta-process factor analysis (BPFA) using a nonparametric

tensor decomposition for each dictionary element. We extend the multiplicative

gamma prior to allow inference of the shape parameter, which can help control

model complexity during learning. Moreover, we extend the TFA model to include

translation invariance and a multi-layer structure—a deep convolutional TFA.

We test our approach on 2D & 3D image denoising, inpainting, and image

classification. Our TFA models provide more diverse dictionaries. In

particular, TFA provides state-of-the-art results for simultaneous image

inpainting & denoising and on the Caltech 101 benchmark.

Robust Low-Complexity Randomized Methods for Locating Outliers in Large Matrices

Xingguo Li , Jarvis Haupt

Comments: 16 pages, 4 figures

Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

This paper examines the problem of locating outlier columns in a large,

otherwise low-rank matrix, in settings where the data are noisy, or where

the overall matrix has missing elements. We propose a randomized two-step

inference framework, and establish sufficient conditions on the required sample

complexities under which these methods succeed (with high probability) in

accurately locating the outliers for each task. Comprehensive numerical

experimental results are provided to verify the theoretical bounds and

demonstrate the computational efficiency of the proposed algorithm.

Information Theory

Low-rank matrix recovery via rank one tight frame measurements

Holger Rauhut , Ulrich Terstiege

Comments: 3 pages

Subjects: Information Theory (cs.IT); Probability (math.PR)

The task of reconstructing a low rank matrix from incomplete linear

measurements arises in areas such as machine learning, quantum state tomography

and in the phase retrieval problem. In this note, we study the particular setup

that the measurements are taken with respect to rank one matrices constructed

from the elements of a random tight frame. We consider a convex optimization

approach and show both robustness of the reconstruction with respect to noise

on the measurements as well as stability with respect to passing to

approximately low rank matrices. This is achieved by establishing a version of

the null space property of the corresponding measurement map.

Blind Identification of SM and Alamouti STBC-OFDM Signals

Yahia A. Eldemerdash , Octavia A. Dobre , Bruce J. Liao Subjects : Information Theory (cs.IT)

This paper proposes an efficient identification algorithm for spatial

multiplexing (SM) and Alamouti (AL) coded orthogonal frequency division

multiplexing (OFDM) signals. The cross-correlation between the received signals

from different antennas is exploited to provide a discriminating feature to

identify SM-OFDM and AL-OFDM signals. The proposed algorithm requires neither

estimation of the channel coefficients and noise power, nor the modulation of

the transmitted signal. Moreover, it does not need space-time block code (STBC)

or OFDM block synchronization. The effectiveness of the proposed algorithm is

demonstrated through extensive simulation experiments in the presence of

diverse transmission impairments, such as time and frequency offsets, Doppler

frequency, and spatially correlated fading.

Hybrid Analog-Digital Transceiver Designs for Cognitive Large-Scale Antenna Array Systems

Christos G. Tsinos , Sina Maleki , Symeon Chatzinotas , Bjorn Ottersten

Comments: Millimeter wave (mmWave), Cognitive Radio, Multiple-Input Multiple-Output (MIMO), Large-scale antenna arrays, Hybrid Pre-coding, Alternating Direction Method of Multipliers (ADMM)

Subjects: Information Theory (cs.IT)

Millimeter wave (mmWave) band mobile communications can be a solution to the continuously increasing traffic demand in modern wireless systems. Even though mmWave bands are scarcely occupied, the design of a prospective transceiver should

guarantee the efficient coexistence with the incumbent services in these bands.

To that end, in this paper, two underlay cognitive transceiver designs are

proposed that enable the mmWave spectrum access while controlling the

interference to the incumbent users. MmWave systems usually require large

antenna arrays to achieve satisfactory performance and thus, they cannot

support fully digital transceiver designs due to high demands in hardware

complexity and power consumption. Thus, in order to develop efficient

solutions, the proposed approaches are based on a hybrid analog-digital

pre-coding architecture. In such hybrid designs, the overall beamformer can be

factorized in a low dimensional digital counterpart applied in the baseband and

in an analog one applied in the RF domain. The first cognitive solution

developed in this paper designs the cognitive hybrid pre-coder by maximizing

the mutual information between its two ends subject to interference, power and

hardware constraints related to the analog counterpart. The second solution

aims at reduced complexity requirements and thus derives the hybrid pre-coder by minimizing the Frobenius norm of its difference from the optimal digital-only one. A novel solution for the post-coder at the cognitive receiver part is

further proposed here based on a hardware constrained Minimum Mean Square Error

criterion. Simulations show that the performance of both the proposed hybrid

approaches is very close to the one of the fully digital solution for typical

wireless environments.

A series of maximum entropy upper bounds of the differential entropy

Frank Nielsen , Richard Nock

Comments: 18 pages

Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We present a series of closed-form maximum entropy upper bounds for the

differential entropy of a continuous univariate random variable and study the

properties of that series. We then show how to use those generic bounds for

upper bounding the differential entropy of Gaussian mixture models. This requires calculating, in closed form, the raw moments and raw absolute moments of Gaussian mixtures, which may also be handy in statistical machine learning and information theory. We report on our experiments and discuss the tightness of those bounds.

MIMO Channel Reconstruction from Lower Dimensional Multiple Antenna Measurements

Rimvydas Aleksiejunas

Comments: 16 pages, 6 figures

Subjects: Information Theory (cs.IT)

A method for reconstructing multiple-input multiple-output (MIMO) channel

correlation matrices from lower dimensional channel measurements is presented.

Exploiting the symmetry of correlation matrix structure enables reproducing

higher dimensional MIMO channel matrices from available lower order

measurements. This leads to practically important applications allowing

prediction of higher dimensional MIMO system capacity. In particular, we study

Kronecker-type MIMO channels suitable for reconstructing full channel matrices

from partial information about transmit-receive fading in spatial and

polarimetric domains and analyze validity conditions for such models. One of

the important channel conditions is Doppler frequency related to

non-stationarity in the environment. We present simulations of cluster-type

scattering model using 2×2 MIMO channel correlation matrices to predict

performance of 2×4 MIMO system including recovery of angular power spectrum. An

example of dual circular polarized 2×4 MIMO land mobile satellite measurements in the 2.5 GHz frequency band illustrates the applicability of the method to

reconstruct spatial and polarimetric channel correlation matrices for

estimating ergodic channel capacity from single-antenna or uni-polarized

measurements.
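
The Kronecker-type channel model referred to above combines separate transmit and receive correlation matrices into a full channel correlation; a small numpy sketch follows. The exponential correlation model, matrix sizes, SNR, and the Monte Carlo capacity estimate are illustrative assumptions, not the paper's measurement-based procedure.

```python
# Kronecker MIMO channel model: full correlation from transmit/receive correlations,
# with a Monte Carlo estimate of ergodic capacity.
import numpy as np

def exp_corr(n, rho):
    # Exponential correlation model, a common parametric choice (assumption here).
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(0)
nt, nr = 2, 4                        # 2x4 MIMO example
R_tx, R_rx = exp_corr(nt, 0.5), exp_corr(nr, 0.3)
R_full = np.kron(R_tx, R_rx)         # Kronecker-type full channel correlation

# Ergodic capacity estimate: H = R_rx^{1/2} G R_tx^{1/2} with i.i.d. Gaussian G.
sqrt_tx = np.linalg.cholesky(R_tx)
sqrt_rx = np.linalg.cholesky(R_rx)
snr = 10.0                           # linear SNR
caps = []
for _ in range(2000):
    G = (rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))) / np.sqrt(2)
    H = sqrt_rx @ G @ sqrt_tx.conj().T
    det = np.real(np.linalg.det(np.eye(nr) + (snr / nt) * H @ H.conj().T))
    caps.append(np.log2(det))
print(R_full.shape, np.mean(caps))   # full correlation size and ergodic capacity (bit/s/Hz)
```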

A Compressive Method for Centralized PSD Map Construction

Mohammad Eslami , Farah Torkamani-Azar , Esfandiar Mehrshahi

Comments: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), Ph.D. Forum

Subjects: Information Theory (cs.IT)

Spectrum resources are facing huge demands and cognitive radio (CR) can

improve spectrum utilization. Recently, the power spectral density (PSD) map has been defined to enable the CR to reuse frequency resources with respect to the area. To this end, the PSDs sensed by distributed sensors in the area are collected and fused by a Fusion Center (FC). However, for a given zone, the PSDs sensed by neighboring CR sensors may contain a shared common component for a while. This component can be exploited, following the theory of distributed source coding (DSC), to compress the sensed data further. In this paper, a method based on distributed compressive sensing (DCS) is proposed to compress and reconstruct the PSDs of the sensors when the data transmission is slightly imperfect. Simulation results show the advantages of the proposed method in compression, overhead reduction, and PSD recovery. The proposed method can be used to develop a framework when the holding times of the users are large in comparison with the spectrum sensing rate.

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

Constantinos Daskalakis , Qinxuan Pan Subjects : Learning (cs.LG) ; Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)

We show that the square Hellinger distance between two Bayesian networks on the same directed graph $G$ is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance $H^2(P,Q)$ between $P$ and $Q$ is upper bounded by the sum $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$ of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields.

As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\mathrm{TV}}(P,Q) > \epsilon$ can be performed from $\tilde{O}(|\Sigma|^{3/4(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n, \epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.

Testing Bayesian Networks

Clement Canonne , Ilias Diakonikolas , Daniel Kane , Alistair Stewart Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

This work initiates a systematic investigation of testing high-dimensional structured distributions by focusing on testing Bayesian networks, the prototypical family of directed graphical models. A Bayesian

network is defined by a directed acyclic graph, where we associate a random

variable with each node. The value at any particular node is conditionally

independent of all the other non-descendant nodes once its parents are fixed.

Specifically, we study the properties of identity testing and closeness testing

of Bayesian networks. Our main contribution is the first non-trivial efficient

testing algorithms for these problems and corresponding information-theoretic

lower bounds. For a wide range of parameter settings, our testing algorithms

have sample complexity sublinear in the dimension and are sample-optimal,

up to constant factors.

Testing Ising Models

Constantinos Daskalakis , Nishanth Dikkala , Gautam Kamath Subjects : Data Structures and Algorithms (cs.DS) ; Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

Given samples from an unknown multivariate distribution (p), is it possible

to distinguish whether (p) is the product of its marginals versus (p) being far

from every product distribution? Similarly, is it possible to distinguish

whether (p) equals a given distribution (q) versus (p) and (q) being far from

each other? These problems of testing independence and goodness-of-fit have

received enormous attention in statistics, information theory, and theoretical

computer science, with sample-optimal algorithms known in several interesting

regimes of parameters. Unfortunately, it has also been understood that these

problems become intractable in large dimensions, necessitating exponential

sample complexity.

Motivated by the exponential lower bounds for general distributions as well

as the ubiquity of Markov Random Fields (MRFs) in the modeling of

high-dimensional distributions, we initiate the study of distribution testing

on structured multivariate distributions, and in particular the prototypical

example of MRFs: the Ising Model. We demonstrate that, in this structured

setting, we can avoid the curse of dimensionality, obtaining sample and time

efficient testers for independence and goodness-of-fit. Along the way, we

develop new tools for establishing concentration of functions of the Ising

model, using the exchangeable pairs framework developed by Chatterjee, and

improving upon this framework. In particular, we prove tighter concentration

results for multi-linear functions of the Ising model in the high-temperature

regime.


Original article: https://www.52ml.net/21401.html