10th International Conference on Computational Management

HEC Montréal, 1 — 3 May 2013

TA4 Statistical Computing

May 2, 2013 10:30 AM – 12:30 PM

Location: Serge-Saucier

Chaired by Jean-François Plante

4 Presentations

  • 10:30 AM - 11:00 AM

    Using BIRCH to Compute Approximate Rank Statistics on Massive Datasets

    • Lysiane Charest, Woozworld.com
    • Jean-François Plante, presenter, GERAD - HEC Montréal

    The BIRCH algorithm handles massive datasets by reading the data file only once, clustering the data as it is read, and retaining only a few clustering features to summarize the data read so far. Using BIRCH makes it possible to analyse datasets that are too large to fit in the computer's main memory. We propose estimates of Spearman’s rho and Kendall’s tau that are calculated from a BIRCH output and assess their performance through Monte Carlo studies. The numerical results show that the BIRCH-based estimates can achieve the same efficiency as the usual estimates of rho and tau while using only a fraction of the memory otherwise required.
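
    To make the one-pass idea concrete, here is a minimal sketch of BIRCH-style clustering features (illustrative only; the names `ClusteringFeature` and `stream_into_cfs` and the flat threshold rule are assumptions, a crude stand-in for BIRCH's CF-tree, and this is not the authors' estimator of rho or tau). Each subcluster keeps only its count, linear sum, and sum of squares, so a single pass over the file suffices and memory stays bounded by the number of subclusters.

    ```python
    # Sketch of BIRCH-style clustering features: each subcluster is summarized
    # by CF = (N, LS, SS) = (count, linear sum, sum of squared norms), which can
    # be updated in one pass without keeping the raw points.
    import numpy as np

    class ClusteringFeature:
        def __init__(self, dim):
            self.n = 0                   # number of points absorbed
            self.ls = np.zeros(dim)      # linear sum of the points
            self.ss = 0.0                # sum of squared norms

        def add(self, x):
            self.n += 1
            self.ls += x
            self.ss += float(x @ x)

        def centroid(self):
            return self.ls / self.n

        def radius(self):
            # RMS distance to the centroid, recoverable from (N, LS, SS) alone
            c = self.centroid()
            return np.sqrt(max(self.ss / self.n - float(c @ c), 0.0))

    def stream_into_cfs(stream, dim, threshold):
        """Absorb each point into the nearest CF, or open a new CF when the
        nearest centroid is farther than `threshold` (a simplification of
        BIRCH's CF-tree insertion)."""
        cfs = []
        for x in stream:
            if cfs:
                dists = [np.linalg.norm(cf.centroid() - x) for cf in cfs]
                i = int(np.argmin(dists))
                if dists[i] <= threshold:
                    cfs[i].add(x)
                    continue
            cf = ClusteringFeature(dim)
            cf.add(x)
            cfs.append(cf)
        return cfs
    ```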

  • 11:00 AM - 11:30 AM

    Learning from Massive Data, A Bayesian Approach

    • Vahid Partovi Nia, presenter, Polytechnique Montréal

    Due to recent advances in information technology, collecting and saving data has become quite cheap, producing massive data in biology, medicine, industry, satellite imaging, social networks, and many other domains. The main question then is: what useful information can we extract from such large data sets? Learning from data without a clear objective is a complex task; often the first step is to group the data, called clustering or unsupervised learning. The common algorithms for unsupervised learning of massive data were developed in computer science, and such methods do not assume a model for the data. Without a model, it is difficult to investigate the uncertainty associated with the resulting grouping. By contrast, Bayesian clustering requires specifying a prior distribution for the data grouping and a likelihood for the observed data given a grouping, i.e. treating the unknown grouping as a random variable. The immediate consequence of this specification is that all uncertainties are summarized in a unique and coherent measure, the posterior. This talk introduces the basic concepts of the Bayesian approach to learning and discusses related computational and visualization tools such as Markov chain Monte Carlo, the Bayesian dendrogram, and the consensus clustering diagram.
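
    As a toy illustration of treating the grouping as a random variable (an assumed example, not the speaker's code), the Gibbs sampler below targets the posterior over labels in a finite Gaussian mixture with known variances. Co-clustering frequencies computed from the label draws are invariant to label switching and are the basic ingredient of a consensus clustering diagram.

    ```python
    # Toy Bayesian clustering by Gibbs sampling.
    # Model: y_i | z_i = k ~ N(mu_k, sigma^2), mu_k ~ N(0, tau^2), z_i uniform.
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_clustering(y, K=2, sigma=1.0, tau=5.0, iters=2000):
        n = len(y)
        z = rng.integers(K, size=n)          # initial random grouping
        mu = rng.normal(0.0, tau, size=K)
        samples = []
        for _ in range(iters):
            # update cluster means given the grouping (conjugate normal)
            for k in range(K):
                yk = y[z == k]
                prec = len(yk) / sigma**2 + 1.0 / tau**2
                mean = (yk.sum() / sigma**2) / prec
                mu[k] = rng.normal(mean, 1.0 / np.sqrt(prec))
            # update each label from its full conditional
            for i in range(n):
                logp = -0.5 * ((y[i] - mu) / sigma) ** 2
                p = np.exp(logp - logp.max())
                z[i] = rng.choice(K, p=p / p.sum())
            samples.append(z.copy())
        return np.array(samples)             # posterior draws of the grouping

    # Posterior co-clustering probabilities, the input to a consensus diagram:
    y = np.concatenate([rng.normal(-3, 1, 20), rng.normal(3, 1, 20)])
    draws = gibbs_clustering(y)
    coclust = (draws[:, :, None] == draws[:, None, :]).mean(axis=0)
    ```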

  • 11:30 AM - 12:00 PM

    A Comparison of Maximum Likelihood and Markov Chain Monte Carlo Approaches in Fitting Hierarchical Longitudinal and Cross-Sectional Binary Data: A Simulation Study

    • Ali Fotouhi, presenter, University of the Fraser Valley

    It is conjectured in the literature that maximum likelihood estimates (MLE) are not suitable, or may not even exist, in some complicated problems. An alternative to MLE for parameter estimation is the Bayesian method, and the Markov chain Monte Carlo (MCMC) simulation procedure is designed to fit Bayesian models. Like the classical method, the Bayesian method has advantages and disadvantages. In this research we compare the classical method (MLE) and the Bayesian method (MCMC), and investigate the effect of sample size, cluster size, level of heterogeneity, and initial values on the performance of these procedures. We compare the methods using bias, mean square error, deviance, and CPU time in analyzing longitudinal binary data through a simulation study.
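
    For concreteness, here is a hedged sketch (hypothetical function names, not the author's study code) of the kind of data-generating process used in such comparisons, a random-intercept logit model for clustered binary responses, together with the Monte Carlo bias and mean-square-error criteria.

    ```python
    # Simulate hierarchical binary data and score repeated estimates.
    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_clustered_binary(n_clusters=100, cluster_size=5,
                                  beta0=-0.5, beta1=1.0, sigma_u=1.0):
        """y_ij ~ Bernoulli(logit^{-1}(beta0 + beta1 * x_ij + u_i)),
        with cluster effect u_i ~ N(0, sigma_u^2) driving the heterogeneity."""
        u = rng.normal(0.0, sigma_u, size=n_clusters)
        x = rng.normal(size=(n_clusters, cluster_size))
        eta = beta0 + beta1 * x + u[:, None]
        y = rng.random((n_clusters, cluster_size)) < 1 / (1 + np.exp(-eta))
        return x, y.astype(int)

    def bias_and_mse(estimates, truth):
        """Monte Carlo bias and mean square error over repeated simulations."""
        estimates = np.asarray(estimates, dtype=float)
        return estimates.mean() - truth, ((estimates - truth) ** 2).mean()
    ```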

  • 12:00 PM - 12:30 PM

    Mixed Effects Trees and Random Forests

    • Ahlem Hajjem, presenter, UQAM ESG

    I'll present the mixed effects regression tree and random forest methods. These are extensions of tree-based methods to clustered data, where the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The mixed effects approach allows for unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. These extensions are implemented using standard tree and forest algorithms within the framework of the EM algorithm. Simulation results show that the proposed methods provide substantial improvements over standard trees and forests when the random effects are non-negligible. We illustrate these methods using a real dataset on first-week box office revenues of movies.
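
    A minimal sketch of the EM-style alternation, simplified to a single random intercept with fixed variance components (an assumption; the full method also updates the variance components inside the loop, and this is not the presenter's implementation): fit a standard regression tree to the responses with the current random effects subtracted, then shrink each cluster's mean residual to update its intercept.

    ```python
    # EM-style loop behind a mixed effects regression tree (random intercept).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def mixed_effects_tree(X, y, cluster, n_iter=10, sigma2=1.0, tau2=1.0):
        """cluster: integer cluster id per observation; sigma2/tau2 are the
        residual and random-intercept variances (held fixed here)."""
        clusters = np.unique(cluster)
        b = {c: 0.0 for c in clusters}           # cluster random intercepts
        tree = DecisionTreeRegressor(max_depth=4)
        for _ in range(n_iter):
            # fit the fixed (tree) part to responses with random effects removed
            y_fixed = y - np.array([b[c] for c in cluster])
            tree.fit(X, y_fixed)
            resid = y - tree.predict(X)
            # BLUP-like shrinkage of each cluster's mean residual
            for c in clusters:
                r = resid[cluster == c]
                b[c] = tau2 * r.sum() / (len(r) * tau2 + sigma2)
        return tree, b
    ```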
