### 1 — 16:20 — Management of categorical variables in a mixed-variable (surrogate-based) evolutionary algorithm (Part I)

The development of methods combining numerical simulation tools and advanced optimization algorithms plays a crucial role in exploring new concepts. In practice, various industrial design problems have continuous, discrete, and categorical variables. From an engineering point of view, categorical variables are of great practical interest because they can represent the choice of a material or an engine architecture, the shape of a cross-section, etc. Our objective is to manage these different types of variables efficiently and concurrently within optimization processes, using either an evolutionary algorithm or a combination of surrogate models with this algorithm. The developments presented are carried out in Minamo, the in-house design space exploration and multi-disciplinary optimization platform of Cenaero.

In this first part, a mixed-variable encoding, first introduced in [Wang et al., 2021] for a particle swarm optimization (PSO) algorithm, is adapted for an evolutionary algorithm (EA). In short, each possible choice of a categorical variable, named an "attribute", is associated with a real value in [0,1]. This value is updated during the optimization process and is exploited when the genetic operators are applied to the categorical variables involved. The higher the value of an attribute, the more this attribute is favored by these operators. The main difference from the method proposed in [Wang et al., 2021] is the use of these values inside the crossover operator of an EA. Additionally, new ways to update the value associated with each attribute are introduced, which also take into account the fitness of the points sharing this attribute.
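The encoding described above can be illustrated with a minimal Python sketch. It is not the Minamo implementation nor the exact rule of [Wang et al., 2021]: the function names `biased_crossover` and `update_values`, the proportional selection rule, the learning `rate`, and the fitness normalization (minimization assumed) are all hypothetical choices made for illustration.

```python
import random


def biased_crossover(attr_a, attr_b, values, rng):
    """Choose one parent's attribute, favoring the higher attribute value.

    Assumption: selection probability is proportional to the stored value.
    """
    wa, wb = values[attr_a], values[attr_b]
    if wa + wb == 0.0:
        # No information yet: fall back to a uniform choice.
        return rng.choice([attr_a, attr_b])
    return attr_a if rng.random() < wa / (wa + wb) else attr_b


def update_values(values, population, fitness, rate=0.5):
    """Move each attribute value toward the mean normalized fitness of the
    points that share this attribute (hypothetical rule, minimization).

    `population` holds the attribute carried by each point and `fitness`
    the matching objective values. Values remain in [0, 1].
    """
    lo, hi = min(fitness), max(fitness)
    span = hi - lo
    updated = dict(values)
    for attr in values:
        # Score 1.0 for the best point, 0.0 for the worst one.
        scores = [
            (1.0 - (f - lo) / span) if span > 0 else 0.5
            for a, f in zip(population, fitness)
            if a == attr
        ]
        if scores:  # attributes absent from the population are unchanged
            target = sum(scores) / len(scores)
            updated[attr] = (1.0 - rate) * values[attr] + rate * target
    return updated


values = {"steel": 0.5, "aluminium": 0.5, "titanium": 0.5}
values = update_values(values, ["steel", "steel", "aluminium"], [1.0, 3.0, 5.0])
child = biased_crossover("steel", "aluminium", values, random.Random(0))
```

In this toy run, "steel" is carried by the two best points, so its value rises and the crossover becomes more likely to pass it on to the child.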

To analyze the influence of the proposed techniques, a benchmark built from structural design test problems is presented. It is used to compare the performance of the proposed variants for handling categorical variables with that of the current EA used within Minamo. Moreover, a PSO algorithm with the original method proposed in [Wang et al., 2021] is added to this benchmark, in order to compare its optimization performance with that of the methods proposed for the EA. Finally, the open-source algorithm NOMAD [Le Digabel, 2011] is also included in this benchmark.

### 2 — 16:50 — Management of categorical variables in a mixed-variable (surrogate-based) evolutionary algorithm (Part II)

The development of methods combining numerical simulation tools and advanced optimization algorithms plays a crucial role in exploring new concepts. In practice, various industrial design problems have continuous, discrete, and categorical variables. From an engineering point of view, categorical variables are of great practical interest because they can represent the choice of a material or an engine architecture, the shape of a cross-section, etc. Our objective is to manage these different types of variables efficiently and concurrently within optimization processes, using either a genetic algorithm or a combination of surrogate models with this algorithm. The developments presented are carried out in Minamo, the in-house design space exploration and multi-disciplinary optimization platform of the research center Cenaero.

In this second part, a refinement of the notion of distance between variables is proposed. Since the construction of the surrogate models and the optimization search require the comparison of points with variables of different types, an adequate approach is to define a specific distance, here the Heterogeneous Euclidean-Overlap Metric, as proposed in [McCane and Albert, 2008] and [Wilson and Martinez, 1997]. In our approach, we propose to redefine the notion of distance between the possible string values of a categorical variable (named "attributes") through the original concept of "affinity". Affinities can be interpreted as weighted relationships between attributes. They are usually defined based on the physical intuition of the designer, but we also consider here how to set them mathematically, using clustering or projection methods. Indeed, affinities are generally implicitly associated with one or several outputs (objectives or constraints) that behave similarly for various attributes. Some attributes can therefore be declared "close to each other" because they have a similar impact on one or more quantities (even if these attributes can be far apart for other quantities). We therefore slightly modify the overlap distance to account for affinities between attributes of categorical variables.

In order to study the impact of affinities on the quality of the trained surrogate models and on the convergence performance of a surrogate-based optimization process, specific test problems from structural design, on which these affinities can be defined, are studied, and numerical results are presented.
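As a rough illustration of the idea, the sketch below computes a Heterogeneous Euclidean-Overlap Metric in which the 0/1 overlap term for categorical variables is softened by an affinity. The abstract does not give the exact formula, so the rule used here (distance = 1 - affinity, with affinities in [0,1] and 0 when none is declared), the function name `heom_with_affinity`, and the data layout are assumptions.

```python
import math


def heom_with_affinity(x, y, ranges, affinity):
    """HEOM-style distance between two mixed-variable points.

    x, y     : sequences mixing numbers and strings (categorical attributes)
    ranges   : dict {index: value range} for the numerical variables
    affinity : dict {frozenset({attr_1, attr_2}): value in [0, 1]}
               (assumed layout; missing pairs have affinity 0)
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if isinstance(xi, str):
            if xi == yi:
                d = 0.0  # identical attributes, as in the overlap distance
            else:
                # Assumed rule: a higher affinity brings attributes closer.
                d = 1.0 - affinity.get(frozenset((xi, yi)), 0.0)
        else:
            # Range-normalized difference for numerical variables.
            d = abs(xi - yi) / ranges[i]
        total += d * d
    return math.sqrt(total)


# Hypothetical structural example: beam thickness and cross-section shape.
x = (2.0, "I-beam")
y = (4.0, "H-beam")
ranges = {0: 4.0}
affinity = {frozenset(("I-beam", "H-beam")): 0.5}

d_plain = heom_with_affinity(x, y, ranges, {})           # classical overlap
d_affine = heom_with_affinity(x, y, ranges, affinity)    # softened overlap
```

With an empty affinity table the function reduces to the classical HEOM, which makes it easy to compare surrogate quality with and without affinities on the same data.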

### 3 — 17:20 — A graph-structured distance for heterogeneous datasets with meta variables

Heterogeneous datasets emerge in various machine learning or optimization applications that feature different data sources, various data types and complex interrelationships between variables. In practice, heterogeneous datasets are often partitioned into smaller well-behaved ones that are easier to process. However, some applications involve expensive-to-generate or limited-size datasets, which motivates methods that utilize heterogeneous datasets in their entirety. This is particularly important for blackbox (or simulation-based) optimization, in which objective functions and constraints may require hours, or even days, to evaluate. The first main contribution of this work is a graph-structured modelling framework that generalizes state-of-the-art hierarchical, tree-structured, or variable-size frameworks. This framework models domains that involve heterogeneous datasets in which variables may be continuous, integer, or categorical, with some identified as meta if their values determine the inclusion/exclusion or affect constraints of other so-called decreed variables. Excluded variables are introduced to manage variables that are included in some points, but excluded in others. The second main contribution is the graph-structured distance that compares extended points with any combination of included and excluded variables: any pair of points can be compared, which makes it possible to work directly in heterogeneous datasets with meta variables. The contributions are illustrated with some regression experiments, in which the performance of a multilayer perceptron w.r.t. its hyperparameters is modeled with inverse distance weighting and K-nearest neighbors models.
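The following Python sketch gives a simplified picture of comparing points whose variables may be excluded, and of feeding such a distance to inverse distance weighting. It is not the graph-structured distance of the talk: marking excluded variables with `None`, the fixed `penalty` between an included and an excluded value, and the names `var_distance`, `point_distance`, and `idw_predict` are assumptions chosen for illustration, and the accuracy values are made up.

```python
import math

EXCLUDED = None  # assumed marker for a decreed variable that is excluded


def var_distance(a, b, scale=1.0, penalty=1.0):
    """Distance between two values of one variable (assumed rules)."""
    if a is EXCLUDED and b is EXCLUDED:
        return 0.0
    if a is EXCLUDED or b is EXCLUDED:
        return penalty  # included vs. excluded: fixed, hypothetical cost
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0  # overlap distance for categories
    return abs(a - b) / scale


def point_distance(p, q, scales):
    """Euclidean aggregation over all variables of two extended points."""
    return math.sqrt(
        sum(var_distance(p[k], q[k], scales.get(k, 1.0)) ** 2 for k in p)
    )


def idw_predict(query, points, targets, scales, power=2.0):
    """Inverse distance weighting prediction with the distance above."""
    num = den = 0.0
    for p, t in zip(points, targets):
        d = point_distance(query, p, scales)
        if d == 0.0:
            return t  # the query coincides with a known point
        w = 1.0 / d ** power
        num += w * t
        den += w
    return num / den


# Hypothetical MLP hyperparameters: n_layers is a meta variable that
# decrees whether units_2 is included.
p1 = {"n_layers": 1, "units_1": 32, "units_2": EXCLUDED, "activation": "relu"}
p2 = {"n_layers": 2, "units_1": 32, "units_2": 64, "activation": "relu"}
scales = {"n_layers": 2, "units_1": 64, "units_2": 64}

query = {"n_layers": 1, "units_1": 32, "units_2": EXCLUDED, "activation": "tanh"}
prediction = idw_predict(query, [p1, p2], [0.8, 0.9], scales)
```

Because every pair of points receives a finite distance, the one-layer and two-layer configurations can contribute to the same prediction instead of being split into separate datasets.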