Project SyDaPro: Synthetic Data within Production

Welcome to the website of the SyDaPro project. Here you will find information about the project and its contacts.

Challenge

Artificial intelligence (AI), and machine learning (ML) in particular, is becoming increasingly important in industrial production plants. It enables, for example, predictive maintenance and the control and optimization of production processes. A key challenge in using ML is providing the large volumes of data needed to learn process models, which can then be used, for example, to simulate and optimize production processes. In the manufacturing industry in particular, data is often not available in sufficient quantity or quality.

 

Solution approach

This project addresses the generation of synthetic data in production. The starting point is stochastic models that are built from real data using deep learning. These models generate random data with the same statistical characteristics as the real data. Beyond supplying training data for ML, this is intended to enable data-based optimization of production processes and the avoidance of anomalies. Generating and using synthetic production data involves the following sub-aspects, which the SyDaPro project addresses as follows:

  • Creation of stochastic process models: The project develops stochastic models that are trained on real data. The basis is a GAN-VAE framework, which combines the advantages of variational autoencoders (VAEs) and generative adversarial networks (GANs). This class of models induces a probability distribution in latent space, which is mapped to a manifold in data space. Time series are modeled with NARX (Nonlinear AutoRegressive eXogenous) models for short-term phenomena and with recurrence for long-term phenomena. The project also considers discrete events, whose emission distributions are described using Bayesian networks.
  • Synthesis of artificial production data: The stochastic models generate synthetic data by sampling random time series. Many nodes in the underlying directed graphical model follow a normal or multinomial distribution, which makes them easy to sample. For more complex distributions, the No-U-Turn Sampler (NUTS) can be used. These techniques also allow synthetic data to be sampled under specified boundary conditions. Unbiased sampling of personal production data (for example, work rate) ensures anonymization and privacy.
  • Exploiting physical knowledge in the synthesis of production data: In machine learning, feature generation through input transformations is central. Generative models use the dual counterpart: suitable output transformations. Based on prior physical or control-engineering knowledge, the project develops transformations for typical behavior in production in order to improve the generative models. For example, for strongly correlated quantities such as current and voltage, an adapted coordinate system lets the stochastic model predict the correlated component (signal) and the uncorrelated component (noise, anomaly) independently of each other. Fourier transforms form the basis for control-engineering approaches in which frequencies and amplitudes are learned instead of the raw signals.
  • Using synthetic data to optimize production processes: Beyond the training of ML models, the synthetic data will also be tested in the optimization of production processes. On the one hand, a subset of the data channels is given, for example those describing a new production process, and the power consumption is predicted by sampling (inference). On the other hand, a target variable is specified and synthetic data is generated to match it.
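
The current and voltage example above can be illustrated with a minimal sketch. It is not the project's GAN-VAE model: it assumes a simple linear-Gaussian stand-in, invented measurement values (a resistance of 2 ohms and small sensor noise), and hypothetical function names (fit_adapted_coordinates, sample_synthetic). The sketch rotates the data into a coordinate system aligned with its principal axes, so that the correlated component (signal) and the remaining component (noise) can be sampled independently, and then maps the samples back to current and voltage.

```python
import numpy as np


def fit_adapted_coordinates(data):
    """Estimate the mean, principal axes, and per-axis standard deviations.

    The columns of the returned `axes` matrix span the adapted coordinate
    system, ordered from largest to smallest variance.
    """
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder: largest variance first
    stds = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return mean, eigvecs[:, order], stds


def sample_synthetic(mean, axes, stds, n, rng):
    """Sample each adapted coordinate independently, then rotate back."""
    z = rng.normal(size=(n, len(stds))) * stds  # independent normal components
    return z @ axes.T + mean


rng = np.random.default_rng(0)

# Invented "real" data (assumption): voltage = 2 * current + small noise.
current = rng.normal(10.0, 1.0, size=5000)
voltage = 2.0 * current + rng.normal(0.0, 0.1, size=5000)
real = np.column_stack([current, voltage])

mean, axes, stds = fit_adapted_coordinates(real)

# In the adapted coordinates the two components are uncorrelated.
coords = (real - mean) @ axes
coords_corr = np.corrcoef(coords, rowvar=False)[0, 1]

# Synthetic data reproduces the current/voltage correlation of the real data.
synthetic = sample_synthetic(mean, axes, stds, 5000, rng)
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]

print("std along signal axis:", stds[0])
print("std along noise axis: ", stds[1])
print("correlation real / synthetic:", real_corr, synth_corr)
```

In this toy setting the first axis carries almost all of the variance and the second axis carries only the noise, which is what makes independent sampling of the two components reasonable. A deep generative model would replace the linear rotation and the normal distributions with learned, nonlinear counterparts.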

 

Profile

Project Title: SyDaPro: Synthetische Daten in der Produktion
Duration: 1 October 2021 - 30 September 2023
Funding: German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF)
Goal: Generation of synthetic data in production