Boosting Data Analytics with Synthetic Volume Expansion

Topic:	Boosting Data Analytics with Synthetic Volume Expansion
Date:	11 January 2024
Time:	2:30 pm - 3:30 pm
Venue:	Hui Yeung Shing Building G05
Category:	Latest Seminars and Events
Speaker:	Professor Xiaotong Shen
PDF:	Prof.-Xiaotong-Shen_11-JAN.pdf
Details:	Abstract: Synthetic data generation heralds a paradigm shift in data science, addressing the challenges of data scarcity and privacy and enabling unprecedented performance. As synthetic data gains prominence, questions arise regarding the accuracy of statistical methods compared to their application on raw data. Addressing this, we introduce the Synthetic Data Generation for Analytics framework, which applies statistical methods to high-fidelity synthetic data produced by advanced generative models like tabular diffusion models. These models, trained using raw data, are enriched with insights from relevant studies. A significant finding within this framework is the generational effect: the error of a statistical method initially decreases with the integration of synthetic data but may subsequently increase. This phenomenon, rooted in the complexities of replicating raw data distributions, introduces the "reflection point," an optimal threshold of synthetic data defined by specific error metrics. Through three case studies (sentiment analysis, predictive modeling, and inference of tabular data), we demonstrate the effectiveness of this framework. This work is joint with Y. Liu and R. Shen.

The Chinese University of Hong Kong