Computer Programming

Machine Learning

Ethics, Data Contextualization, Security

Projects & Publications: 
Data Processing
A Comprehensive Approach to Evaluating Usability and Hyperparameter Selection for Synthetic Data Generation

Data is the key component of every machine learning algorithm. Without sufficient quantities of quality data, the vast majority of machine learning algorithms fail to perform. Acquiring the data necessary to feed these algorithms, however, is a universal challenge. Recently, synthetic data production methods have become increasingly relevant as a way of addressing a variety of data issues. Synthetic data allows researchers to produce supplemental data from an existing dataset. Furthermore, synthetic data can anonymize a dataset without losing functionality. To advance the field of synthetic data production, however, measuring the quality of the synthetic data produced is an essential step. Although methods for evaluating synthetic data quality exist, they tend to address only a limited set of aspects of data quality. In addition, synthetic data evaluation varies immensely from one study to another, which makes comparing quality across studies even harder. Finally, although tools exist to automatically tune hyperparameters, these tools focus on traditional machine learning applications. Thus, identifying ideal hyperparameters for individual synthetic data generation use cases is also an ongoing challenge.




Fault Prediction Demo

A brief demonstration of how XAI (explainable AI) tools can be applied to root cause analysis of time-series data, using open-source tools and datasets.


