Multi-View Symbolic Regression
Abstract
Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. In practice, however, researchers are frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression because the parameters of each experiment can differ. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; theta) capable of accurately fitting all datasets simultaneously. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. On real-world data, it is able to capture the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
AI-Generated Overview
Overview of "Multi-View Symbolic Regression"
- Research Focus: The study introduces Multi-View Symbolic Regression (MvSR), a method to derive analytical expressions that represent relationships across multiple datasets obtained from varied experimental setups.
- Methodology: MvSR evaluates each candidate expression on all datasets at once. The expression is treated as a general parametric model, its parameters are fitted independently to each dataset, and the resulting per-dataset fitness scores are aggregated with a function such as the maximum or the average.
- Results: MvSR was tested on both synthetic datasets and real-world data from astronomy, chemistry, and economics, demonstrating higher accuracy and robustness than traditional symbolic regression approaches, particularly in the presence of noise.
- Key Contribution(s): The authors adapt existing symbolic regression algorithms to support multi-view learning, allowing for more flexible model formulation and improved outcomes in complex data scenarios by leveraging the aggregate information from multiple sources.
- Significance: This research highlights the potential of MvSR as a powerful tool for uncovering generalizable scientific models across various fields when faced with complex data structures, enhancing the ability to interpret relationships that may not be evident from single-source datasets.
- Broader Applications: MvSR's framework can be applied across numerous scientific disciplines requiring data analysis, including but not limited to epidemiology, econometrics, and astrophysics, facilitating the understanding of multifaceted phenomena through a unified modeling approach.
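
The multi-view evaluation described under Methodology can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, it assumes every candidate has the restricted form f(x; a, b) = a * g(x) + b, so the per-dataset fit has a closed-form least-squares solution, whereas a real SR engine would fit arbitrary expressions with a nonlinear optimizer. The names `fit_view`, `multi_view_loss`, `views`, and `candidates` are hypothetical and chosen for illustration only.

```python
import math
from statistics import mean


def fit_view(g, xs, ys):
    """Least-squares fit of y ~ a * g(x) + b on one view; returns (a, b, mse)."""
    zs = [g(x) for x in xs]
    z_bar, y_bar = mean(zs), mean(ys)
    szz = sum((z - z_bar) ** 2 for z in zs)
    szy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    a = szy / szz if szz > 0 else 0.0  # slope of the regression of y on g(x)
    b = y_bar - a * z_bar              # intercept
    mse = mean((a * z + b - y) ** 2 for z, y in zip(zs, ys))
    return a, b, mse


def multi_view_loss(g, views, aggregate=max):
    """Fit (a, b) independently on each view, then aggregate the per-view errors."""
    return aggregate(fit_view(g, xs, ys)[2] for xs, ys in views)


# Three synthetic "experiments": same functional form, different parameters.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
views = [
    (xs, [2.0 * x**2 + 1.0 for x in xs]),
    (xs, [-0.5 * x**2 + 3.0 for x in xs]),
    (xs, [4.0 * x**2 - 2.0 for x in xs]),
]

# Candidate shapes g(x); each defines the family a * g(x) + b.
candidates = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x**2,
    "a*exp(-x) + b": lambda x: math.exp(-x),
}

# Score each candidate by its worst-case error across views (aggregate=max).
scores = {name: multi_view_loss(g, views, aggregate=max) for name, g in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Passing `aggregate=mean` instead of `max` scores a candidate by its average error across views. The maximum is the stricter choice, since a candidate that fits most datasets well but one poorly is still penalized.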