CUHK Symposium on Statistics

Title: S.-Y. Lee’s Lagrange Multiplier Test in Structural Modeling: Still Useful?

Abstract: Professor S.-Y. Lee introduced constrained estimation of parameters subject to nonlinear restrictions in structural equation models (SEM) (Lee & Bentler, 1980). Based on the normal theory generalized least squares (GLS) function, Lee proved a variety of relevant theorems such as consistency and asymptotic normality of estimators and their asymptotic equivalence to maximum likelihood estimators. He also developed GLS chi-square tests of the adequacy of the model and of differences between nested sets of restrictions, as well as a Lagrange Multiplier (LM) test for evaluating the correctness of model restrictions. This talk reviews Lee's results and presents an overview of further developments and trends in constrained SEM model testing. Although Lee developed the LM test as a confirmatory parameter testing methodology, its main contemporary use in SEM seems to be as an exploratory tool for adding parameters to improve statistically inadequate models. In that context, its use is standard.

Presentation File (1.2MB PDF)

Peter BENTLER
University of California, Los Angeles

Title: On Some Functional Characterizations of (Fuzzy) Set-valued Random Elements

Abstract: Numerous experimental studies involve semi-quantitative expert information, or quantities measured imprecisely, which can be modeled with interval (fluctuations, grouped data, etc.) or fuzzy (ratings, opinions, perceptions, etc.) data. A general framework to analyze these kinds of inexact data with statistical tools developed for Hilbertian random variables will be presented. The space of nonempty convex and compact (fuzzy) subsets of R^p has traditionally been used to handle this kind of imprecise data. Mathematically, these elements can be characterized via the support function, which agrees with the usual Minkowski addition and naturally embeds the considered space into a cone of a separable Hilbert space. The support function embedding has interesting properties, but it lacks an intuitive interpretation for imprecise data. Moreover, although the Minkowski addition is very natural when p = 1, if p > 1 the shapes obtained when two sets are aggregated are apparently unrelated to the original sets, because the addition tends to convexify. An alternative and more intuitive functional representation will be introduced in order to circumvent these difficulties. The imprecise data will be modeled by using star-shaped sets on R^p. These sets will be characterized through a center and the corresponding polar coordinates, which have a clear interpretation in terms of location and imprecision, and lead to a natural directional extension of the Minkowski addition.
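As a rough illustration of the statement that the support function agrees with Minkowski addition, the following sketch checks numerically that h_{A+B}(u) = h_A(u) + h_B(u) for two convex polygons in R^2. The polygons, the function names and the choice of twelve directions are assumptions made for this example only; they are not taken from the talk, and the star-shaped representation proposed in the abstract is not implemented here.

```python
import math
from itertools import product

def support(points, u):
    """Support function h_A(u) = max over a in A of <a, u>.

    `points` are the vertices of a convex polygon; the maximum of a
    linear functional over the polygon is attained at a vertex.
    """
    return max(a[0] * u[0] + a[1] * u[1] for a in points)

def minkowski_sum(A, B):
    """Pairwise vertex sums; their convex hull is the Minkowski sum A + B."""
    return [(a[0] + b[0], a[1] + b[1]) for a, b in product(A, B)]

# Two hypothetical convex sets in R^2, given by their vertices.
triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]

# Twelve equally spaced unit directions on the circle.
directions = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
              for k in range(12)]

total = minkowski_sum(triangle, square)

# Largest discrepancy between h_{A+B}(u) and h_A(u) + h_B(u) over the directions.
max_gap = max(
    abs(support(total, u) - (support(triangle, u) + support(square, u)))
    for u in directions
)
print(max_gap)
```

The printed gap should be zero up to floating-point rounding, which is the additivity that lets sets be treated as elements of a function space.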

Ana COLUBI
Justus Liebig University Giessen

Title: Statistical Inference on Membership Profiles in Large Networks

Abstract: Network data is prevalent in many contemporary big data applications in which a common interest is to unveil important latent links between different pairs of nodes. The nodes can be defined broadly, such as individuals, economic entities, documents, or medical disorders in social, economic, text, or health networks. Yet the simple question of how to precisely quantify the statistical uncertainty associated with the identification of latent links remains largely unexplored. In this talk, we suggest the method of statistical inference on membership profiles in large networks (SIMPLE) in the setting of the degree-corrected mixed membership model, where the null hypothesis assumes that the pair of nodes share the same profile of community memberships. In the simpler case of no degree heterogeneity, the model reduces to the mixed membership model and an alternative, more robust test is proposed. Under some mild regularity conditions, we establish the exact limiting distributions of the two forms of SIMPLE test statistics under the null hypothesis and their asymptotic properties under the alternative hypothesis. Both forms of SIMPLE tests are pivotal and have asymptotic size at the desired level and asymptotic power one. The advantages and practical utility of our new method in terms of both size and power are demonstrated through several simulation examples and real network applications.
(Joint work with Yingying Fan and Jinchi Lv)

Presentation File (2.5MB PDF)

Jianqing FAN
Princeton University

Title: Group Inference in High Dimensions with Applications to Hierarchical Testing

Abstract: Group inference has been a long-standing question in statistics and the development of high-dimensional group inference is an essential part of statistical methods for analyzing complex data sets, including hierarchical testing, tests of interaction, detection of heterogeneous treatment effects and local heritability. Group inference in regression models can be measured with respect to a weighted quadratic functional of the regression sub-vector corresponding to the group. Asymptotically unbiased estimators of these weighted quadratic functionals are constructed and a procedure using these estimators for inference is proposed. We derive its asymptotic Gaussian distribution, which allows us to construct asymptotically valid confidence intervals and tests that perform well in terms of length or power. The results simultaneously address four challenges encountered in the literature: controlling coverage or type I error even when the variables inside the group are highly correlated, achieving good power when there are many small coefficients inside the group, computational efficiency even for a large group, and no requirements on the group size. We apply the methodology to several interesting statistical problems and demonstrate its strength and usefulness on simulated and real data.

This is based on joint work with Claude Renaux, Peter Bühlmann and T. Tony Cai.

Presentation File (1.4MB PDF)

Zijian GUO
Rutgers University

Title: Separation of Inter-individual Differences, Intra-individual Changes, and Time-specific Effects in Intensive Longitudinal Data using the NDLC-SEM Framework

Abstract: In this talk, we propose a nonlinear dynamic latent class structural equation model (NDLC-SEM; Kelava & Brandt, 2019). It can be used to examine intra-individual processes of observed or latent variables. These processes are decomposed into parts which include individual- and time-specific components. Unobserved heterogeneity of the intra-individual processes is modeled via a latent Markov process that can be predicted by individual-specific and time-specific variables as random effects. We discuss examples of sub-models which are special cases of the more general NDLC-SEM framework. Furthermore, we provide empirical examples and illustrate how to estimate this model in a Bayesian framework. Finally, we discuss essential properties of the proposed framework, give recommendations for applications, and highlight some general problems in the estimation of parameters in comprehensive frameworks for intensive longitudinal data.

Kelava, A. & Brandt, H. (2019). A nonlinear dynamic latent class structural equation model. Structural Equation Modeling: A Multidisciplinary Journal, 26(4), 509-528. doi: 10.1080/10705511.2018.1555692

Presentation File (750KB PDF)

Augustin KELAVA
University of Tübingen

Title: Computing the Best Subset Regression Model

Abstract: Several regression-tree strategies for computing all subset regression models are presented. Branch-and-bound techniques are employed to reduce the number of generated nodes. To improve the efficiency of the branch-and-bound algorithms, the variables can be pre-ordered in the root node or in nodes deeper inside the tree. Approximation algorithms make it possible to tackle large-scale problems while giving guarantees on the error bounds. If the desired subset sizes are known in advance, the recursive structure of the regression tree can be exploited to generate a minimal covering subtree. Given a pre-determined statistical search criterion, the various algorithms can be adapted to select the single best subset model, drastically reducing the number of generated nodes and thus improving execution times. An R package which efficiently implements the algorithms is described and its performance assessed.
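To make the all-subsets problem concrete, here is a minimal sketch that finds the lowest-RSS subset of each size by plain exhaustive enumeration. It is not the regression-tree or branch-and-bound method of the talk, nor the R package mentioned above; it is the brute-force baseline whose 2^p - 1 model fits those algorithms are designed to avoid. The data, the function names and the no-intercept model are assumptions made for illustration.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit of y on the chosen columns."""
    cols = [[row[j] for row in X] for j in subset]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(gram, rhs)  # normal equations (X_S' X_S) beta = X_S' y
    fitted = [sum(bj * row[j] for bj, j in zip(beta, subset)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def best_subsets(X, y):
    """For each subset size k, return the (rss, subset) pair with the smallest RSS."""
    p = len(X[0])
    return {k: min((rss(X, y, s), s) for s in combinations(range(p), k))
            for k in range(1, p + 1)}

# Hypothetical data: three predictors, response built as y = 2*x0 - x2.
X = [[1.0, 1.0, 0.0], [2.0, 0.0, 1.0], [3.0, 1.0, 0.0],
     [4.0, 0.0, 2.0], [5.0, 2.0, 1.0], [6.0, 1.0, 3.0]]
y = [2 * row[0] - row[2] for row in X]

result = best_subsets(X, y)
for k, (value, subset) in result.items():
    print(k, subset, round(value, 6))
```

Because the number of candidate models doubles with every added variable, this enumeration is only practical for small p, which is the motivation for pruning the search tree.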

E. J. KONTOGHIORGHES
Cyprus University of Technology / Birkbeck, University of London, UK

Title: On Matrix Factor Models

Abstract: Some recently proposed time series models for the so-called realized volatility matrices (RCOV) are introduced. RCOVs estimated from high-frequency trading data are a promising measure of the underlying covariance structure of low-frequency returns. This motivates the need for modeling and forecasting the RCOVs. A Bayesian approach to the factor model used in the finance literature, proposed by S.-Y. Lee et al. (2007), is reviewed. The Bayesian approach could have great potential for factor models defined for the realized volatility matrices.

Presentation File (183KB PDF)

Wai Keung LI
The Education University of Hong Kong

Title: Financial Systemic Risk Prediction with Non-Gaussian Orthogonal-GARCH Models

Abstract: There are several aspects of financial asset portfolio construction relevant for success. First, the methodology should be applicable to a reasonably large number of assets, at least on the order of 100. Second, calculations should be computationally feasible, straightforward, and fast. Third, realistic transaction costs need to be taken into account for the modeling paradigm to be genuinely applicable. Fourth, and arguably most importantly, the proposed methods should demonstrably outperform benchmark models such as the equally weighted portfolio, Markowitz IID and Markowitz using the DCC-GARCH model. A fifth "icing on the cake" is that the underlying stochastic process assumption is mathematically elegant, statistically coherent, and allows analytic computation of relevant risk measures for both passive and active risk management. The model structure to be shown, referred to as "COMFORT", satisfies all these criteria. Various potential new ideas will also be discussed, with the aim of enticing and motivating other researchers to collaborate and/or improve upon the shown investment vehicles.

Presentation File (7.4MB PDF)

Marc PAOLELLA
University of Zurich

Title: Modelling Function-valued Processes with Non-separable and/or Non-stationary Covariance Structure

Abstract: Separability of the covariance structure is a common assumption for function-valued processes defined on two- or higher-dimensional domains. This assumption is often made to obtain an interpretable model or due to difficulties in modelling a potentially complex covariance structure, especially in the case of sparse designs. We propose using Gaussian processes with flexible parametric covariance kernels which allow interactions between the inputs in the covariance structure. With suitable covariance kernels, the leading eigen-surfaces of the covariance operator explain well the main modes of variation in the functional data, including the interactions between the inputs. The results are demonstrated by simulation studies and by applications to real-world data.

Presentation File (5.3MB PDF)

Jian Qing SHI
Newcastle University and
The Alan Turing Institute

Title: Differential Item Functioning Analysis without A Priori Information on Anchor Items: Scree Plots and Graphical Test

Abstract: The detection of differential item functioning (DIF) is an important step in establishing the validity of measurements. Most traditional methods detect DIF using an item-by-item strategy, via anchor items that are assumed DIF-free. If anchor items are contaminated, the methods will yield misleading results due to biased scales. In this article, based on the fact that the item’s relative change of difficulty difference (RCD) does not depend on the mean ability of individual groups, a new DIF detection method (RCD-DIF) is proposed under the true null hypothesis, without a priori knowledge of anchor items. The RCD-DIF method consists of an RCD-scree plot that facilitates visual examination of DIF, and an RCD confidence interval that facilitates a formal test of DIF at the test level. Two simulation studies indicate that the RCD confidence interval performs better than three widely used methods in controlling the Type I error rate and has greater power, especially under unbalanced DIF conditions. Moreover, the RCD-scree plot displays the results graphically, thereby visually revealing the overall pattern of DIF in the test and the size of DIF for each item. A real data analysis is conducted to illustrate the rationality and effectiveness of the RCD-DIF method.

Presentation File (1.8MB PDF)

Ke-Hai YUAN
University of Notre Dame

Title: Challenges in Analyzing Two-sided Market and Its Application on Ridesourcing Platform

Abstract: In this talk, we will introduce a general analytical framework for large-scale data obtained from two-sided markets, especially ride-sourcing platforms like DiDi. This framework integrates classical methods including Experiment Design, Causal Inference and Reinforcement Learning, with modern machine learning methods, such as Graph Convolutional Models, Deep Learning, Transfer Learning and Generative Adversarial Networks. We aim to develop fast and efficient approaches to address five major challenges for ride-sharing platforms: demand-supply forecasting, demand-supply diagnosis, MDP-based policy optimization, A-B testing, and business operation simulation. Each challenge requires substantial methodological developments and inspires many researchers from both industry and academia to participate in this endeavor. Based on our preliminary results for the policy optimization challenge, we received the Daniel Wagner Prize for Excellence in Operations Research Practice in 2019. All the research accomplishments presented in this talk are joint work by a group of researchers at Didi Chuxing and our international collaborators.

Hongtu ZHU
University of North Carolina at Chapel Hill

Last Updated: 13 December 2019