Eden, in Computer Aided Chemical Engineering, 2018

4 Computational Experiments

The sampling methods used to generate training data from the challenge functions were Latin hypercube sampling (LHS), the Sobol sequence, and the Halton sequence. LHS is a stratified sampling technique: it splits the range of each input variable into N intervals of equal probability, where N is the number of sample points. Each of the N intervals is then sampled once, and the samples are randomly combined across variables. If the number of strata is one, this reduces to random sampling over the entire sample space. Sobol and Halton sequences are both quasi-random low-discrepancy sequences, which seek to distribute samples uniformly across the input space. These three methods were selected because they have been shown to sample the input space uniformly with limited sample sizes for functions of up to ten dimensions (Dife and Diwekar, 2016).

From each challenge function, input-output pairs were generated using the three sampling methods for nine sample sizes, and each surrogate model was trained on these data sets. The performance of every surrogate model was then evaluated on a data set of 100,000 input-output pairs generated with the Sobol sampling technique. The error, i.e., the difference between the surrogate model prediction and the actual function output, was calculated for each of these 100,000 input-output pairs. The errors and the number of model parameters (when necessary) were then used to calculate the performance metrics: MaAE, RMSE, the Akaike information criterion, the Bayesian information criterion, R-squared, and adjusted R-squared (Mathworks, 2017). Additionally, the training time was recorded during the phase of the program where the model was trained, and the evaluation time was recorded during the phase when the final outputs were generated. All computations were carried out on an HP Spectre X360 x64-based PC with 16 GB RAM using MATLAB 2017b.
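The LHS procedure described above (equal-probability intervals per variable, one draw per interval, random pairing across variables) can be sketched in a few lines. The study itself ran in MATLAB; this pure-Python version, with hypothetical function names, is only illustrative.

```python
import random

def latin_hypercube(n, dim, rng=None):
    """n LHS points in the unit hypercube [0, 1)^dim: each axis is cut
    into n equal-probability intervals, each interval is sampled once,
    and the per-axis samples are shuffled before being combined."""
    rng = rng or random.Random(0)
    columns = []
    for _ in range(dim):
        # one uniform draw inside each of the n strata of this axis
        column = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)  # random pairing across dimensions
        columns.append(column)
    return list(zip(*columns))  # transpose into n points
```

Mapping a point from [0, 1) onto each input variable's actual range is a straightforward affine rescaling and is omitted here.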
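A Halton point, by contrast, is deterministic: each coordinate is the radical inverse of the sample index in a distinct (typically prime) base. The sketch below is the textbook construction, not the study's implementation.

```python
def radical_inverse(index, base):
    """Reflect the base-b digits of index about the radix point,
    giving the index-th van der Corput value in (0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_points(n, bases=(2, 3)):
    """First n Halton points; one coprime (prime) base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n + 1)]
```

In base 2 the first coordinates come out as 1/2, 1/4, 3/4, 1/8, ..., which is what spreads early samples evenly across the interval even for small sample sizes.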
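For concreteness, the error-based metrics listed above can be computed as follows. This is only a sketch: the study relied on MATLAB's own routines (Mathworks, 2017), the AIC/BIC here use a common least-squares form, n·ln(SSE/n) plus a parameter penalty, and MaAE is interpreted as the maximum absolute error; all three choices are assumptions.

```python
import math

def performance_metrics(y_true, y_pred, k):
    """Error metrics for a surrogate model with k fitted parameters.

    AIC/BIC use the least-squares form n*ln(SSE/n) + penalty, and MaAE
    is taken to be the maximum absolute error (both are assumptions,
    not necessarily the exact MATLAB formulations used in the study).
    """
    n = len(y_true)
    errors = [yp - yt for yt, yp in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean_y = sum(y_true) / n
    sst = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1.0 - sse / sst
    return {
        "MaAE": max(abs(e) for e in errors),  # assumed: max absolute error
        "RMSE": math.sqrt(sse / n),
        "AIC": n * math.log(sse / n) + 2 * k,
        "BIC": n * math.log(sse / n) + k * math.log(n),
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1),
    }
```

Note that only AIC, BIC, and adjusted R-squared consume the parameter count k, which is why the text says the number of model parameters is needed only "when necessary."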