SPE-182808-MS

Reservoir Simulation Assisted History Matching: From Theory to Design


M. Shams, Amal Petroleum Company

Copyright 2016, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition held in Dammam, Saudi Arabia, 25–28
April 2016.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Petroleum reservoirs are geologically large and complex. To technically and economically optimize the
exploitation of these hydrocarbon reserves, which represent multimillion dollar investments, reliable
numerical reservoir simulation models should be constructed to predict reservoir performance and response
under different production scenarios. Numerical reservoir simulation models can only be trusted after
good calibration against actual historical data. The model is considered calibrated if it is able to
reproduce the historical data of the reservoir it represents. This calibration process is called history
matching, and it is the most time-consuming phase in any numerical reservoir simulation study.
Traditional history matching is carried out through a trial-and-error approach of adjusting model
parameters until a satisfactory match is obtained. The biggest challenge facing the simulation engineer
during this critical phase is that several combinations of history matching parameters may satisfactorily
match the past dynamic behavior of the system, which makes the process ill-posed because of the
non-uniqueness of the solution. Traditional history matching is accordingly a time-consuming, expensive,
and often frustrating procedure; as a consequence, the assisted history matching technique has arisen.
In the assisted history matching technique, the simulated data is compared to the historical data by means
of a misfit (objective) function. The history matching problem is translated into an optimization
problem in which the misfit function is an objective function bounded by the model constraints. The
objective function is minimized using an appropriate optimization algorithm, and the results are the
model parameters that best approximate the fluid rates and pressure data recorded during the reservoir life.
The objectives of this paper are to clarify the idea behind the assisted history matching process, discuss
its different aspects (experimental design, proxy models, and optimization algorithms), and show how these
aspects are integrated to overcome the frustrating nature of the history matching problem. Finally,
guidelines are provided to enhance the design of the assisted history matching process.

Introduction
Understanding the underlying theory allows the designer to select the optimal tools to use
depending on the problem at hand. In this paper the main aspects of the assisted history matching
concept are introduced and discussed. Assisted history matching is presented as being composed of
three main topics: experimental design, proxy modeling, and optimization theory. The idea is to guide
reservoir simulation engineers towards a good understanding of the assisted history matching concept
and to open a new window on the literature for powerful techniques that are widely used in other
industrial designs. The idea of assisted history matching is inspired by optimization theory, which
was first introduced by Leonid Kantorovich (1939) in the area of mathematical programming.
Generally, optimization is a very powerful concept which could potentially be applied to
any engineering discipline.
A general definition of optimization, as per the Oxford English Dictionary (2008), is that
"optimization is the action or process of making the best of something; (also) the action or process
of rendering optimal; the state or condition of being optimal". In any optimization problem, a
predefined function called the objective or cost function is to be optimized through minimization or
maximization, and hence the variables that lead to an optimal value of the objective function are
determined. Switching to reservoir simulation history matching, we seek the set of input parameters of
the numerical model that yields a satisfactory match between the simulated and historical
observed data. The objective function in this situation takes some form representing the difference
between the simulated and observed data; minimizing this objective function means that we are
approaching the match between the model and the actual results. The set of input parameters is
explored using experimental design techniques. As the name implies, experimental design is the
technique used to guide the choice of the experiments to be conducted in an efficient way. Using
experimental design techniques, samples are chosen in the design space in order to get the maximum
amount of information from the lowest number of samples. The chosen samples are evaluated in
terms of the objective function via numerical simulations. Once the numerical simulation runs are
conducted, the objective function is computed. A proxy model (a response surface model that
interpolates the information coming from the experimental design) is built. The proxy model is an
analytical function that is then minimized using the proper optimization algorithm. The results of
optimizing the proxy model are the set of input parameters yielding a history matched reservoir
model or models. Fig. 1 shows a flowchart of the overall optimization process as described above.

Figure 1—Optimization overall process flowchart


Simple or complex constraints can be added on the input variables in an attempt to reduce the
uncertain search space. Simple constraints may be defined by setting the upper and lower bounds
for each variable, whereas more complex constraints can be defined using probability distribution
functions for each variable. In the general optimization process, and consequently in the assisted
history matching process, it is possible to optimize more than one objective function at the same time.
This approach is called multi-objective optimization. The multi-objective optimization concept can be
used when different simulation outputs are optimized or when the optimization process
is conducted at both the full-field and individual-well scales at the same time.

Experimental Design
From the point of view of optimization theory, an experiment can be defined as a set of
several tests in which the design factors are changed according to a predefined rule, so that the
selected samples cover the design space and provide the maximum amount of information using the
minimum amount of resources.
Experimental design was first introduced by Ronald Aylmer Fisher (1925). Box and Wilson (1951) applied
the idea to several industrial experiments and developed the response surface model concept.
Goodwin (1988), Watkins and Parish (1992a), Watkins and Parish (1992b),
Parish et al. (1993), Parish and Little (1994), Parish and Little (1997), and Craig et al. (1997) all
introduced different experimental design concepts and techniques into reservoir engineering and
especially into assisted history matching.
NIST/SEMATECH (2006) state that replication, randomization, and blocking are the basic
principles of experimental design. Replication is the process of repeating the experiment in
an attempt to obtain a more rigorous result (sample mean value) and to determine the experimental
error (sample standard deviation). Randomization means that the experiments are run in
random order; the conditions in one run should be independent of the conditions of the previous
run and should not predict the conditions of subsequent runs. Blocking is the process
that aims to eliminate a bias effect that could obscure the main effects; it involves
arranging the experiments in groups that are similar to one another, so that the sources
of variability are detected and reduced. In the following sections, some experimental designs are
presented and discussed.

Full Factorial Design


Full factorial is probably the most common experimental design technique. Simply, the samples are
determined by all possible combinations of the values of the predetermined factor levels.
If there are k design factors and L = 2 levels for each factor, the sample size is N = 2^k.
Evaluating all possible combinations of the factor levels ensures that the effect of each factor on
the response variable is not confounded with the other factors. In the literature, the central point
of the design space, in which all the design factors take the average value between their high and
low levels, is sometimes added to the full factorial samples. Table 1 shows a two-level full
factorial design for three factors, X1, X2, and X3. The two levels are denoted "+1" (high) and
"-1" (low).

Table 1—Example of a 2^3 full factorial experimental design

Experiment Number    X1    X2    X3
1                    -1    -1    -1
2                    -1    -1    +1
3                    -1    +1    -1
4                    -1    +1    +1
5                    +1    -1    -1
6                    +1    -1    +1
7                    +1    +1    -1
8                    +1    +1    +1

The 2^k full factorial experimental design concept can be extended to the general case of k design factors,
each with L levels; the total number of samples is then N = L^k. Fig. 2 shows graphical representations
of L^k full factorial experimental designs.

Figure 2—Examples of L^k full factorial experimental designs

The advantage of full factorial designs over other designs is that they efficiently use all the data and do
not confound the effects of the parameters, so the main and interaction effects can be evaluated easily.
The main disadvantage is that the sample size grows exponentially with the number of parameters and the
number of levels.
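
As an illustration (not part of the original paper's workflow), a two-level full factorial sample can be
enumerated in a few lines; the factor names and coded levels below are hypothetical.

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Enumerate all combinations of the given factor levels (L^k runs)."""
    names = list(levels_per_factor)
    return [dict(zip(names, combo))
            for combo in product(*levels_per_factor.values())]

# Hypothetical history matching factors with coded low/high (-1/+1) levels.
factors = {"PERM_MULT": [-1, +1], "PORO_MULT": [-1, +1], "AQUIFER_SIZE": [-1, +1]}
design = full_factorial(factors)
print(len(design))  # 2^3 = 8 runs
for run in design:
    print(run)
```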
Fractional Factorial Design
As stated in the previous section, as the number of parameters and their levels increases, a full factorial
design becomes very crowded and a huge amount of time is needed to run all the experiments using numerical
simulators. As a result, the fractional factorial design concept arises, in which only a subset of the full
factorial design is chosen. The fractional factorial sample size can be one-half, one-quarter, etc., of the
full factorial. Fairly good information about the main effects and some information about the interaction
effects can still be obtained using a fractional factorial design. Fig. 3 shows graphical examples of
fractional factorial designs. Montgomery (2000) and NIST/SEMATECH (2006) presented lists of tables
for the most common fractional factorial designs.

Figure 3—Graphical examples of fractional factorial experimental design

The fractional factorial samples must be chosen in a manner that yields a balanced and orthogonal
sample. A balanced sample means that each factor has the same number of samples for each of its levels.
An orthogonal sample is one in which the scalar product of the columns of any two factors is zero.
Plackett-Burman
Plackett and Burman (1946) developed the most economical two-level experimental design. The Plackett-
Burman design yields the lowest number of samples and hence is very useful when only the main
effects are of interest. A Plackett-Burman design is used to study a definite number of factors, k = N - 1,
where the sample size N is a multiple of four with a maximum of thirty-six. Fig. 4 shows a graphical
example of a two-level, six-factor Plackett-Burman design.

Figure 4 —Plackett-Burman design: two levels and six factors

Box-Behnken
Box and Behnken (1960) introduced an incomplete three-level fractional factorial design. Following the
same concept as the fractional factorial design, the Box-Behnken design was introduced to limit the sample
size as the number of parameters grows. The Box-Behnken design requires at least three factors and is
suitable for constructing quadratic proxy models. The sample consists of points at the middle of the edges
(mean levels) of the design hypercube plus its center point. Fig. 5 shows a graphical representation of the
three-factor Box-Behnken design.

Figure 5—Box-Behnken experimental design for three factors

The size of the Box-Behnken sample is determined based on the number of design factors to meet the
criterion of rotatability. A design is considered rotatable if the variance of the predicted response at any
point is a function only of the distance from the central point; as a consequence, there is no general rule
to define the sample size of the Box-Behnken design.

Central Composite
A central composite design is simply a two-level full factorial plus a central point and star points. The
star points are the points in which all the factors are set at their mean level except one factor, whose
value is determined in terms of its distance from the central point. This distance can be chosen in four
different ways, based on the ratio between the star-point distance and the distance of the full factorial
points from the central point:
● If the ratio is equal to 1, all the samples are located on a hypersphere centered on the central point
and the design is called central composite circumscribed, or CCC. The CCC design requires five levels
for each factor.
● If the ratio is equal to 1/√k, the design factors remain on the same levels as the 2^k full
factorial and the design is called central composite faced, or CCF. The CCF method requires three
levels for each factor.
● If a CCC sample is required but the limits specified for the levels cannot be exceeded, the CCC
design can be scaled down by a factor of 1/√k so that all the samples remain within the specified
limits. This sample is called central composite inscribed, or CCI. The CCI method, like the CCC,
requires five levels for each factor.
● If the distance is set to any other value, a central composite scaled design, or CCS, is obtained.
The method requires five levels for each factor.
For k factors, the sample size of the central composite design equals 2^k + 2k + 1, where the 2^k full
factorial points are added to 2k star points and one central point. Fig. 6 shows graphical examples of the
central composite designs.

Figure 6 —Graphical examples of central composite experimental designs
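
To make the sample construction concrete, the sketch below generates the face-centred (CCF) variant,
assuming coded units in [-1, +1]; it is an illustration rather than the paper's own procedure.

```python
from itertools import product

def central_composite_faced(k):
    """Face-centred central composite design (CCF) in coded units:
    2^k factorial corners, 2k star points on the faces, and one centre point,
    giving N = 2^k + 2k + 1 samples."""
    corners = [list(p) for p in product([-1.0, 1.0], repeat=k)]
    stars = []
    for axis in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[axis] = sign          # star distance of 1 places the point on a face
            stars.append(point)
    centre = [[0.0] * k]
    return corners + stars + centre

design = central_composite_faced(3)
print(len(design))  # 2^3 + 2*3 + 1 = 15 samples
```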

Space Filling Designs


Space filling designs are based on spreading the sample points around the operating space and do
not follow a particular model form. Simply, space filling techniques fill out the space dimensions with
regularly spaced samples. The algorithm divides the probability distribution of an uncertain variable into
areas of equal probability, which ensures that the sample values for each parameter are distributed over the
entire range of that parameter. Accordingly, space filling designs are not based on the concept of
factor levels, and the number of generated samples is predefined by the designer and does not depend on
the number of problem factors. The most important feature of space filling designs is that they can
be used in cases where there is little or no information available about the effects of the design factors on
the responses. On the other hand, space filling techniques cannot investigate the main and interaction
effects of the design factors as well as factorial experimental techniques can. Several efficient space
filling designs, the Halton sequence, the Sobol sequence, and the Latin hypercube, are based on random
number generators, as illustrated in the following sections.
Halton Sequence Design
Halton (1960) introduced a space filling technique in which the quasi-random, low-discrepancy,
one-dimensional Van der Corput sequence is used. The Van der Corput (1935) concept is to subdivide the
search space into sub-areas and locate a sample in each. The Halton sequence design uses the Van der
Corput sequence with base two for the first dimension, base three for the second dimension, base five
for the third dimension, etc.
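
A minimal sketch of the idea, assuming the standard Van der Corput radical-inverse construction; the
sample count and bases are illustrative only.

```python
def van_der_corput(n, base):
    """n-th element (1-indexed) of the Van der Corput sequence in the given base."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value

def halton(n_samples, bases=(2, 3)):
    """Quasi-random Halton points: one Van der Corput base per dimension."""
    return [[van_der_corput(i, b) for b in bases] for i in range(1, n_samples + 1)]

points = halton(5)  # five 2-D points on the unit square, bases 2 and 3
print(points)
```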
Sobol Sequence Design
Sobol (1967) introduced another space filling design technique that uses a single-base Van der Corput
sequence for all design space dimensions and a different permutation of the vector elements for each
dimension. The Sobol sequence behaves better in high-dimensional cases, where the Halton sequence
suffers from degradation.
Latin Hypercube Design
The Latin hypercube experimental design is considered one of the most efficient experimental design
techniques. This technique involves dividing the design space into N grid elements of the same length per
factor; within the resulting multi-dimensional grid, N sub-volumes are then selected so that along each
column and row of the grid only one sub-volume is randomly chosen. Fig. 7 shows two Latin hypercube
design examples: a two-factor, four-sample design and a three-factor, three-sample design.

Figure 7—Examples of Latin hypercube designs
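
The stratification idea can be sketched as follows on the unit hypercube; the sample and factor counts
are illustrative, and in practice a library generator would normally be used.

```python
import random

def latin_hypercube(n_samples, n_factors, seed=0):
    """Latin hypercube sample on the unit hypercube: each factor's range is split
    into n_samples equal strata and each stratum is used exactly once per factor."""
    rng = random.Random(seed)
    # For every factor, a random permutation of the strata indices.
    strata = [rng.sample(range(n_samples), n_samples) for _ in range(n_factors)]
    sample = []
    for i in range(n_samples):
        point = [(strata[f][i] + rng.random()) / n_samples for f in range(n_factors)]
        sample.append(point)
    return sample

design = latin_hypercube(n_samples=4, n_factors=2)
for p in design:
    print([round(v, 3) for v in p])
```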

Fig. 8 shows a comparison between Sobol sequence and Latin hypercube space filling experimental
design techniques on a case with two factors and a thousand samples. It is obvious that the Sobol sequence
gives more uniformly distributed samples than the Latin hypercube technique.

Figure 8—A comparison between Sobol and Latin hypercube experimental design techniques for k = 2, N = 1,000

Optimal Design
Hardin and Sloane (1993) and Kappele (1998) introduced the concept of the optimal design technique. They
stated that optimal design is very useful whenever the classical experimental design techniques fail due to
the existence of special constraints on the design search space. The most important feature of the optimal
design is that its output greatly depends on the proxy modeling technique that will be used later. An
initial set of samples is needed to start the design; this is usually given by a full factorial design
including many levels for each design factor. The optimal design then searches the given set of samples
for the subset that minimizes a certain objective function. Optimal design is an iterative method that
involves a lot of computation and therefore a lot of time, so the calculations are stopped after a certain
number of iterations and the best solution found is considered the optimal one. If we have k design factors
with L levels each, the number of possible combinations of N samples in the set is of the order of
(L^k)^N/N!. Different optimal design methods are introduced in the literature depending on different
predefined optimality criteria. The most efficient is the I-optimal design, which targets minimization of
the normalized average variance of the predicted response.

Proxy Modeling
A proxy is a substitute or representation of a real simulation; the concept was first introduced by Box
and Wilson (1951). In the literature it is also called a surrogate reservoir model (SRM), a response surface
model (RSM), or a meta-model. It is constructed from an equation that fits the results of the simulated
sample values of the studied parameters. The proxy can then estimate results for un-sampled values of the
same parameters very quickly, as it eliminates the need to run full simulations.
Generally, experimental design is followed by proxy modeling in an attempt to create an
approximation of the response variable over the design space. As a consequence, the proxy model can
be used to find the set of studied variables that yields a predefined optimal response. Optimizing the
proxy model, an analytical function, is a very fast process and eliminates the huge time and
effort required for additional computer simulations. Obviously, if the variable design space is poorly
explored by the experimental design, the results of the proxy model optimization will be poorly
estimated and cannot be relied on. Different approaches, approximation or interpolation, are introduced in
the literature for proxy model construction.
Coming back to our area of interest, reservoir simulation assisted history matching, the idea is to create
an approximating or interpolating n-dimensional curve or surface in the (n + 1)-dimensional space given
by the n variables plus the predefined objective function. After constructing the proxy model, an
optimization technique is applied to determine the values of the set of studied reservoir parameters that
yield a matched model.
Objective Function
The objective function, misfit or cost function, is a mathematical expression that usually takes a form
representing the difference between the observed and simulated data. The predefined objective function
is minimized using an appropriate optimization algorithm. The objective function can be expressed as a
single or as a multi-objective function depending on the approach used in the assisted history matching
process. One popular form of the single objective function is represented by:

O(x) = Σ_i α_i · [f_i(x) − y_i]² / σ_i²

where i represents each set of data to be calibrated, f_i(x) represents the simulated data resulting from
the simulation runs using the set of parameters x, y_i represents the observed data, σ_i² is the variance
of the observed values, and α_i are the weights assigned to each set of data. Variable constraints can
easily be added to the objective function during the optimization process. The multi-objective function
can take the form of:

O(x) = [O_1(x), O_2(x), . . ., O_m(x)]

where the index m runs over the various objective functions that are intended to be optimized simultaneously.
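
A minimal sketch of such a weighted misfit, interpreting the difference for each data set as a sum over
report times and assuming the simulated and observed series are already aligned; the data values, weights,
and variances below are hypothetical.

```python
def misfit(alpha, sigma2, simulated, observed):
    """Weighted misfit: sum over data sets i of alpha_i * sum_t (sim - obs)^2 / sigma_i^2."""
    total = 0.0
    for i, (sim_i, obs_i) in enumerate(zip(simulated, observed)):
        sq = sum((s - o) ** 2 for s, o in zip(sim_i, obs_i))
        total += alpha[i] * sq / sigma2[i]
    return total

# Hypothetical example: two data series (e.g. oil rate and BHP) over three report steps.
observed  = [[1000.0, 950.0, 900.0], [250.0, 245.0, 240.0]]
simulated = [[1020.0, 940.0, 910.0], [252.0, 246.0, 238.0]]
print(misfit(alpha=[1.0, 0.5], sigma2=[100.0, 4.0],
             simulated=simulated, observed=observed))
```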
Proxy Modeling Techniques
Box and Wilson (1951), as stated earlier in this paper, were the first to introduce proxy modeling
to the literature. They suggested that a first-degree polynomial model would be very suitable to
approximate a variable response. Nocedal (1992), Nocedal and Wright (1999), Byrd et al. (1995), and Zhu
et al. (1997) stated that since 1970 proxy models have been used to enhance and accelerate
optimization algorithms. Some of the most common proxy modeling techniques in the area of reservoir
simulation history matching are presented in the following sections.
Least Squares Method
The least squares method was developed by Gauss (1825) as a data fitting technique. The idea is to minimize
the sum of the squared residuals between the scattered points and the function values. Edwards (1984) and
Bates and Watts (1988) subdivided least squares methods into linear and nonlinear sub-categories.
Linear least squares problems are solved in closed form; they are not very accurate and are proper only
for guessing the major trends of the variable response. Nonlinear least squares problems are solved
iteratively: initial values of the coefficients are chosen and then updated iteratively. One important
point to consider here is that the sample size resulting from the experimental design, N, should equal
the number of least squares coefficients to safely and accurately interpolate the experimental design data.
If the sample size is lower than the number of coefficients, the least squares model will not correctly fit
the experimental design data and the resulting proxy model cannot be relied on. When the sample size is
greater than the number of coefficients, the resulting proxy model is over-determined.
The quality of the generated proxy model can be evaluated by a regression parameter called R². This
regression parameter is defined so that its values fall within the range [0, 1]; the closer R² is to one,
the better the model is expected to be. R² is expressed by:

R² = 1 − SS_res / SS_tot

where SS_res = Σ_i [y_i − f(x_i; β)]² is the sum of the squared errors of the generated proxy model at all
sample points, β being the regression (interaction) coefficients, and:

SS_tot = Σ_i (y_i − ȳ)²

is the total sum of squares about the mean response ȳ. As these equations show, R² measures how much of
the variability of the sampled responses is captured by the generated proxy model.
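
For illustration, the sketch below fits a quadratic proxy to hypothetical design results by linear least
squares and evaluates R² as described above; the sampled values are invented for the example.

```python
import numpy as np

# Hypothetical experimental design results: sampled factor values x and the
# objective-function responses y computed from the simulation runs.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([4.1, 1.2, 0.3, 1.1, 3.9])

# Quadratic proxy y ~ b0 + b1*x + b2*x^2 fitted by linear least squares.
A = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# Goodness of fit: R^2 = 1 - SS_res / SS_tot.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(beta, round(r2, 4))
```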
Kriging Method
Krige (1951) introduced the Kriging method for prediction problems in geostatistics. Kriging is a
Bayesian methodology. Simply, Bayes' theorem describes the probability of an event occurring given
conditions that are related to that event. For example, suppose we are interested in the probability of
rainfall and we know the wind speed. If rainfall is related to wind speed then, using Bayes' theorem,
information about the wind speed can be used to assess the probability of rainfall more accurately.
The Kriging method can be defined as an interpolation method that is based on regression against the
values of the surrounding observed data points, weighted according to a spatial covariance model.
The mechanism of the Kriging method can be summarized as follows:
1. Kriging assumes that all points are spatially correlated to each other.
2. The extent of correlation, or spatial continuity, is expressed by a covariance function (or a
variogram model).
3. This covariance model is used to estimate the spatial correlation between the sampled and
unsampled points and determines the weight of each sampled point in the estimation.
4. The more spatially correlated a previously sampled value is with the estimation location, the more
weight it will have at this location.
Therefore Kriging is data exact, such that it will reproduce the observed value at a sampled location.
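
A minimal one-dimensional sketch of this mechanism, assuming a known constant mean and a Gaussian
covariance model (simple kriging); the data and covariance parameters are hypothetical.

```python
import numpy as np

def simple_kriging(x_obs, y_obs, x_new, sill=1.0, length=1.0):
    """1-D simple kriging: the estimate at x_new is a weighted sum of the observed
    values, with weights obtained from the spatial covariances between points."""
    def cov(h):
        return sill * np.exp(-(h / length) ** 2)   # Gaussian covariance model

    mean = y_obs.mean()                                    # assumed known constant mean
    C = cov(np.abs(x_obs[:, None] - x_obs[None, :]))       # data-to-data covariances
    c0 = cov(np.abs(x_obs - x_new))                        # data-to-target covariances
    weights = np.linalg.solve(C, c0)
    return mean + weights @ (y_obs - mean)

x_obs = np.array([0.0, 1.0, 2.5, 4.0])
y_obs = np.array([1.0, 2.0, 0.5, 1.5])
print(simple_kriging(x_obs, y_obs, x_new=1.0))   # data exact: reproduces the observed 2.0
print(simple_kriging(x_obs, y_obs, x_new=1.7))   # interpolated estimate
```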
Thin Plate Splines Method
As the name indicates, the thin plate splines method refers to the analogy of bending a thin plate of metal.
Thin plate spline functions are piecewise polynomials of order K; the joint points of the pieces are
usually called knots. The concept of this method is that the function values and the first (K − 1)
derivatives agree at the knots. Given a set of data points, a weighted combination of thin plate splines
centered on each data point gives the interpolation function that passes through all the points exactly.
It is not simple to decide the number and position of the knots and the order of the polynomial in each
segment. Wold (1974) suggests that there should be as few knots as possible, with at least four or five
data points per segment, and that there should be no more than one extreme point and one point of
inflexion per segment. One important point to consider here is that the great flexibility of spline
functions makes it very easy to overfit the data.
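
As an illustrative sketch, a two-dimensional thin plate spline interpolant with the usual r² log r kernel
and an affine term can be fitted by solving one linear system; the sample points and values below are
hypothetical.

```python
import numpy as np

def _tps_kernel(r):
    """Thin plate spline radial kernel U(r) = r^2 log(r), with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r**2 * np.log(r), 0.0)

def tps_fit(points, values):
    """Fit f(x, y) = sum_i w_i U(|p - p_i|) + a0 + a1*x + a2*y through the data exactly."""
    n = len(points)
    K = _tps_kernel(np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1))
    P = np.column_stack([np.ones(n), points])              # affine (polynomial) part
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    b = np.concatenate([values, np.zeros(3)])
    coeffs = np.linalg.solve(A, b)
    return coeffs[:n], coeffs[n:]                          # kernel weights, affine coefficients

def tps_eval(points, w, a, x):
    r = np.linalg.norm(points - x, axis=-1)
    return w @ _tps_kernel(r) + a[0] + a[1] * x[0] + a[2] * x[1]

# Hypothetical scattered responses at five design points.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
vals = np.array([0.0, 1.0, 1.0, 2.0, 0.2])
w, a = tps_fit(pts, vals)
print(tps_eval(pts, w, a, np.array([0.5, 0.5])))    # passes through the data point: 0.2
print(tps_eval(pts, w, a, np.array([0.25, 0.75])))  # interpolated estimate
```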
Artificial Neural Network, ANN
Freeman and Skapura (1991), Fausett (1993), and Veelenturf (1995) introduced the artificial neural
network as a fitting technique. An artificial neural network is an algorithm inspired by the learning
process of the human brain, which includes modification of the connections between neurons based
on experiential knowledge. The human brain incorporates nearly 10 billion neurons and 60 trillion
connections (synapses) between them. By using multiple neurons simultaneously, the human brain
performs its functions much faster than the fastest computers in existence today. Figs. 9 and 10 present
schematics of the human and the artificial neuron, respectively. In the human neuron, the dendrites receive
information, electrical signals, through special connections called synapses. The soma (cell nucleus)
receives information from the dendrites and, when the accumulated signals exceed a threshold value,
transmits to connected neurons through the axon. Switching to the artificial neuron, the dendrites
correspond to the input data whereas the axon and synapses correspond to the output and weights,
respectively.

Figure 9 —A schematic of the human neuron

Figure 10 —A schematic of the artificial single neuron

Like the human brain, the ANN's knowledge about the learning task is given in the form of examples called
training examples. ANNs adjust their structure based on input and output information
during the learning phase. Fig. 11 shows an example of a Feed Forward Neural Network (FFNN), which is a
more general network architecture in which there are hidden layers between the input and output layers.
Hidden nodes do not directly receive inputs nor send outputs to the external environment. Hidden layers
build up an internal representation of the data so they can handle non-linearly separable learning tasks.

Figure 11—An example of an ANN

The mechanism of the ANNs can be summarized as follows:

1. Initial random weights are assigned to the inputs.
2. Training data is presented to the ANN and its output is observed.
3. If the output is incorrect, the weights are adjusted using the following formula:

w_i(p + 1) = w_i(p) + α · x_i(p) · [Y_d(p) − Y(p)]

where p is the iteration number (1, 2, 3, etc.), α is the learning rate (a positive constant less than
unity), x_i(p) is the ANN input, Y(p) is the ANN output, and Y_d(p) is the ANN desired output.
4. Once the modification to the weights has taken place, the next piece of training data is used in the
same way.
5. The learning process continues until the desired convergence is acquired and all the weights are
corrected.
Rojas (1996) stated that the most common training process is back-propagation, or backwards
propagation. The back-propagation algorithm searches for weight values that minimize the total error of
the network over the set of training examples. Back-propagation is an iterative learning process and
is usually terminated when the sum of squares of the errors of the output values for all training data is
less than some predetermined threshold value. It is worth highlighting the following important ANN
controlling parameters:
1. Input signal weights:
● Initial weights are randomly chosen, with typical values between -1 and 1.
● If some inputs are much larger than others, random initialization may bias the network to give
much more importance to the larger inputs.

2. Learning rate: it is application dependent; values between 0.1 and 0.9 have been used in many
applications.

3. Number of layers:
● It is application dependent and determined by a trial and error process.
● One can either start from a large network and successively remove neurons and links
until the network performance degrades, or begin with a small network and introduce new neurons
until the performance is satisfactory.

4. Number of training examples: as a rule of thumb, the number of training examples should be at
least five to ten times the number of weights of the network.
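
A minimal sketch of the single-neuron training loop described above (steps 1-5), using the weight-update
rule with random initial weights in (-1, 1); the training data is a hypothetical toy example, not a
reservoir data set.

```python
import random

def train_single_neuron(data, alpha=0.2, epochs=100, seed=1):
    """Single-neuron (perceptron) training following
    w_i(p+1) = w_i(p) + alpha * x_i(p) * (Yd(p) - Y(p))."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]   # random initial weights in (-1, 1)
    bias = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        for x, yd in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias >= 0 else 0
            error = yd - y                                   # Yd(p) - Y(p)
            w = [wi + alpha * xi * error for wi, xi in zip(w, x)]
            bias += alpha * error
    return w, bias

# Hypothetical training data: logical AND of two binary inputs.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_single_neuron(data)
print(w, b)
```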

Optimization Algorithms
From the mathematical point of view, optimization is the search for the maximum or the minimum
value of a certain response. In reservoir simulation assisted history matching, we use optimization
algorithms to minimize the objective function in an attempt to minimize the misfit between observed and
simulated data. Optimization algorithms can be classified according to several approaches. Based on the
need to compute the gradients of the objective function, optimization algorithms are classified into
deterministic and stochastic.
Deterministic Optimization
Deterministic optimization is the classical approach of optimization, which relies completely on linear
algebra as it requires the computation of the gradients of the mathematical model with respect to the
parameterization in order to minimize the objective function. The deterministic approach is based on
inverse problem theory: from the model parameters available, information or data is extracted. The
main advantage of deterministic optimization algorithms is that convergence to the solution is
much faster than with stochastic optimization algorithms. Owing to their deterministic nature, these
algorithms search for stationary points in the space of the response variable, and as a consequence the
optimal solution could be a local optimum and not the global optimum. A local optimum, either a minimum
or a maximum, is not the correct optimal solution. When the optimization algorithm fails to determine the
global optimum it is described by the widely used expression 'stuck in a local minimum or maximum'.
In our area of interest, and due to the non-unique nature of the history matching problem, we expect to
encounter several local minima, so deterministic algorithms are expected to perform poorly. For this
reason, in this paper we only list the different deterministic algorithms without going into details.
Liang (2007) listed the most common deterministic optimization algorithms in the area of reservoir
simulation assisted history matching as follows:
● Steepest Descent.
● Gauss-Newton.
● Levenberg-Marquardt.
● Singular Value Decomposition.
● Conjugate Gradient.
● Quasi-Newton.
● Gradual Deformation.
Stochastic Optimization
As the name implies, stochastic optimization algorithms depend on the presence of randomness in the
search procedure. A stochastic optimization method is based on the forward problem, using random input
to eventually obtain a satisfactory outcome. Most stochastic optimization algorithms are population-based:
during the solution of the optimization problem, a set of initial samples evolves up to convergence.
Stochastic optimization does not require the computation of the objective function gradients, as is the
case in the deterministic approach. Although a slower rate of solution convergence is expected, stochastic
optimization methods avoid convergence to local minima. Most stochastic optimization algorithms are
inspired by randomized search methods that come from concepts in biology, simplified models of some
natural phenomena, the behavior of some animals in evolution, etc. A very important point to be mentioned
here is that a balance should be struck between the need to improve the algorithm robustness
by widely exploring the design space and the need to reach the solution convergence criteria in
a reasonable amount of time. This balance can be achieved by tuning the algorithm using some
assigned controlling parameters.
In the following sections, the concepts behind the stochastic optimization algorithms are presented,
together with the most important controlling parameters of the most popular stochastic optimization
algorithms.

Genetic Algorithm
Genetic algorithms were developed in 1975 by the work of Holland (1975) and became popular
through the work of his student Goldberg (1989). The genetic algorithm is inspired by Darwin's theory
that the strongest species is the one that survives. Darwin also stated that "the survival of an organism
can be maintained through the process of reproduction, crossover and mutation". In a genetic algorithm the
input variables are encoded into binary strings representing real-valued input variables. Switching to
the language of the genetic algorithm, a chromosome represents a solution generated by the genetic
algorithm, and a population is a collection of chromosomes. A chromosome is composed of genes
whose values can be numerical, binary, or characters. Fig. 12 shows the graphical representation
of one chromosome containing four genes.

Figure 12—One chromosome four genes

The mechanism of the genetic algorithm can be summarized as follows:

1. Determine the initial solutions (scoping-run variable values).
2. Evaluate the fitness of each chromosome by calculating the objective function.
3. Select the highest-fitness chromosomes for the next generation (roulette wheel, etc.).
4. Reproduce chromosomes through crossover and mutation.
5. Re-evaluate the reproduced chromosomes until satisfactory solutions or a predefined number of
iterations is reached.
Fig. 13 presents a diagram of the genetic algorithm mechanism.

Figure 13—Genetic algorithm mechanism diagram

Chromosome reproduction in the genetic algorithm is conducted through two processes, crossover and
mutation. As indicated in Fig. 14, the crossover process involves the random selection of a gene position
in the parents' chromosomes and then the exchange of the sub-chromosomes. The crossover process is
controlled by a crossover rate; not all chromosomes undergo crossover.

Figure 14 —Graphical representation of the cross over process

On the other side, the mutation process involves replacing the gene at a random position with a new
value, as shown in Fig. 15. The mutation process is controlled by a mutation rate; only a few chromosomes
undergo mutation.

Figure 15—Graphical representation of the mutation process

Based on the above discussion, the controlling parameters of the genetic algorithm are the population
size, the number of generations, the crossover rate, and the mutation rate.
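
A minimal real-coded sketch of this mechanism is given below; the selection scheme (tournament), the
variable bounds, and the test misfit are illustrative choices, not the paper's specific implementation.

```python
import random

def genetic_minimize(fitness, bounds, pop_size=30, generations=50,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm: selection, single-point crossover, and
    random-reset mutation of the chromosome genes."""
    rng = random.Random(seed)
    n = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament()[:], tournament()[:]
            if rng.random() < crossover_rate and n > 1:
                cut = rng.randint(1, n - 1)                 # exchange sub-chromosomes
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            for child in (p1, p2):
                for g, (lo, hi) in enumerate(bounds):
                    if rng.random() < mutation_rate:        # replace gene with a new value
                        child[g] = rng.uniform(lo, hi)
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return min(pop, key=fitness)

# Hypothetical misfit with its minimum at (1, 2) inside the variable bounds.
best = genetic_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2,
                        bounds=[(-5, 5), (-5, 5)])
print(best)
```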
Memetic Algorithm
The memetic algorithm is similar to the genetic algorithm, but the elements that form a chromosome are
called memes, not genes. The unique aspect of the memetic algorithm is that all chromosomes and offspring
are allowed to gain some experience through a local search before being involved in the evolutionary
process. After the local search, crossover and mutation operations are applied. The generated offspring are
then subjected to the local search again to always maintain local optimality, as presented in Fig. 16.

Figure 16 —Graphical representation of local experience provided to the offspring in memetic algorithm

The mechanism of the memetic algorithm can be summarized as follows:

1. Several local-search algorithms can be applied; the one used can be designed to suit the problem nature.
2. The local search can be conducted by adding or subtracting an incremental value from
every gene and testing the chromosome's fitness; the change is kept if the chromosome fitness
improves.
The controlling parameters of the memetic algorithm are the same as those of the genetic algorithm, in
addition to the local-search mechanism.
Particle Swarm Algorithm
The particle swarm algorithm was developed by Kennedy and Eberhart (1995). The algorithm is inspired by
the social behavior of a flock of birds (particles) trying to reach an unknown destination. Each bird
searches in a specific direction; the birds then communicate together and determine which bird is in the
best location. Accordingly, each bird speeds towards the best bird and then continues the search from its
current location. The process is repeated until the flock reaches the desired destination. The main
component of the particle swarm algorithm is the particle, which is a solution analogous to a chromosome.
Each particle monitors three values: its current position, its best previous position, and its flying
velocity.
The mechanism can be summarized as follows: in each cycle, each particle updates its position based
on its private thinking, by comparing its current position with its own best, and on social
collaboration among the other particles, by comparing its position with that of the best particle,
according to the following updating concept:

v_i(t + 1) = ω · v_i(t) + c_1 · r_1 · [pbest_i − x_i(t)] + c_2 · r_2 · [gbest − x_i(t)]
x_i(t + 1) = x_i(t) + v_i(t + 1)

where v_i and x_i are the velocity and position of particle i, pbest_i is its best previous position,
gbest is the best position found by the swarm, c_1 and c_2 are acceleration constants, and r_1 and r_2
are random numbers between 0 and 1. An upper limit on the particle velocity is specified to control the
change of particles' velocities. The controlling parameters of the particle swarm are the population size,
the number of generations, the maximum change in particle velocity, and the inertia weight factor, ω,
which is responsible for improving the current particle velocity considering its history of velocities.
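
A minimal sketch of this update loop is given below, assuming the standard inertia-weight formulation;
the acceleration constants, velocity limit, and test misfit are illustrative values only.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, iterations=100,
                 w=0.7, c1=1.5, c2=1.5, v_max=1.0, seed=0):
    """Minimal particle swarm optimizer following the velocity/position update above."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = pbest_val.index(min(pbest_val))
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, vel[i][d]))   # velocity limit
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical misfit with its minimum at (1, 2).
print(pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2,
                   bounds=[(-5, 5), (-5, 5)]))
```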

Conclusions and Guidelines


● Several experimental design techniques are available for use in reservoir assisted history
matching. There is no single best choice; however, a recommended workflow can be introduced.
● A recommended workflow to better explore the uncertain search space is to apply a cheap
experimental design technique like Plackett-Burman as a preliminary study for estimating the main
effects. For more precise computation of the main and some interaction effects, the Sobol and Latin
hypercube space filling experimental design techniques perform better than the other techniques.
● The artificial neural network fitting technique has great potential for creating proxy models,
provided the networks are well trained.
● If there is not sufficient training data to operate the artificial neural network well, thin
plate splines and kriging work better than polynomial and least squares methods.
● If the computational effort of building several proxy models is not an issue, it is recommended to
build as many proxy models as possible using the different methods and to compare them based on
goodness of fit.
● Focusing solely on stochastic optimization algorithms makes it possible to handle multi-objective
optimization problems and to avoid the issue of getting stuck in local minima.

Acknowledgments
This work is a part of the literature review chapter of the author’s Ph.D. dissertion that is, in the time of
publishing this paper, performed in Cairo University under the supervision of Drs. Helmy Sayyouh and
Ahmed El-Banbi. Hence, the author would like to acknowledge his supervisors for their monitoring and
guiding this work.

References
Box, G. E. P., & Wilson, K. B. (1951). Experimental attainment of optimum conditions. Journal of the Royal Statistical
Society, 13, 1–45.
Box, G. E. P., & Behnken, D. (1960). Some new three level designs for the study of quantitative variables. Technometrics,
2, 455–475.
Byrd, R. H., Lu, P., Nocedal, J., and Zhu, C. (1995). A limited memory algorithm for bound constrained optimization.
SIAM Journal on Scientific Computing, 16(5), 1190–1208.
Bates, D. M., & Watts D. G. (1988). Nonlinear regression and its applications. New York: Wiley.
Craig, P. S., Goldstein, M., Seheult, A. H., and Smith, J. A. (1997). Pressure matching for hydrocarbon reservoirs: a case
study in the use of Bayes linear strategies for large computer experiments. Case Studies in Bayesian Statistics.
Springer New York, 37–93.
Edwards, L. A. (1984). An introduction to linear regression and correlation (2nd ed.). San Francisco: Freeman.
Fausett, L. (1993). Fundamentals of neural networks: Architecture, algorithms, and applications. Englewood Cliffs:
Prentice Hall.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Freeman, J. A., & Skapura, D. M. (1991). Neural networks. Algorithms, applications, and programming techniques.
Reading: Addison-Wesley.
Goodwin, N. H. (1988). The application of multi-objective optimisation to problems in reservoir engineering and reservoir
description. Internal report, Scientific Software Intercomp.
Gauss, J. C. F. (1825). Combinationis observationum erroribus minimis obnoxiae. Gottingen: University of Gottingen.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading: Addison-Wesley.
Halton, J. H. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional
integrals. Numerische Mathematik, 2(1), 84–90.
Hardin, R. H., & Sloane, N. J. A. (1993). A new approach to the construction of optimal designs. Technical report, AT&T
Bell Laboratories.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology,
control, and artificial intelligence. Ann Arbor: University of Michigan.
Kappele, W. D. (1998). Using I-optimal designs for narrower confidence limits. In Proceedings of the IASI Conference,
Orlando, FL, February 1998.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In IEEE International Conference on Neural
Networks, Perth, November/December 1995.
Krige, D. G. (1951). A statistical approach to some basic mine valuation problems on the witwatersrand. Journal of the
Chemical, Metallurgical and Mining Society of South Africa, 52(6), 119 –139.
Liang, B. (2007). An Ensemble Kalman Filter module for automatic history matching. Dissertation presented to the
Faculty of the Graduate School, University of Texas at Austin.
Kantorovich, L. V. (1939).
Mostaghim, S., Branke, J., & Schmeck, H. (2006). Multi-objective particle swarm optimization on computer grids. In
Proceedings of the 9th annual conference on genetic and evolutionary optimization, London.
Nocedal, J. (1992). Theory of algorithms for unconstrained optimization. Acta Numerica, 1, 199–242.
Nocedal, J., and Wright, S. J. (1999). Numerical optimization. Vol. 2. New York: Springer.
NIST/SEMATECH (2006). NIST/SEMATECH e-handbook of statistical methods. http://www.itl.nist.gov/div898/handbook/.
Oxford English Dictionary (2008). Oxford: Oxford University Press.
Parish, R. G., and Little, A. J. (1997). Statistical Tools to Improve the Process of History Matching Reservoirs. Bahrain,
MEOS. SPE-37730-MS.
Parish, R. G., Watkins, A., Muggeridge, A., and Calderbank, V. (1993). Effective History Matching: The Application of
Advanced Software Techniques to the History Matching Process. SPE 25250.
Parish, R. G., and Little, A. J. (1994). A Complete Methodology for History Matching Reservoirs. 6th ADIPEC.
Plackett, R. L., & Burman, J. P. (1946). The design of optimum multifactorial experiments. Biometrika, 33(4), 305–325.
Quasi-Monte Carlo simulation. Pontifícia Universidade Católica do Rio de Janeiro. http://www.sphere.rdc.puc-rio.br/marco.ind/quasi_mc.html.
Rojas, R. (1996). Neural networks. Berlin: Springer.
Sobol’ I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. USSR
Computational Mathematics and Mathematical Physics, 7(4), 86 –112.
Taguchi, G., & Wu, Y. (1980). Introduction to off-line quality control. Nagoya: Central Japan Quality Control Association.
Veelenturf, L. P. J. (1995). Analysis and applications of artificial neural networks. Englewood Cliffs: Prentice Hall.
Watkins, A. J., and Parish, R. G. (1992a). A Stochastic Role for Engineering Input to Reservoir History Matching. LAPEC
II; Caracas, Venezuela. SPE-23738-MS.
Watkins, A. J., and Parish, R. G. (1992b). Computational Aids to Reservoir History Matching. Petroleum Computer
Conference, Houston, Texas. SPE-24435-MS.
Van der Corput, J. G. (1935). Verteilungsfunktionen. Proceedings of the Koninklijke Nederlandse Akademie van
Wetenschappen, 38, 813–821.
Zhu, C., Byrd, R. H., Lu, P., and Nocedal, J. (1997). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale
bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 23(4), 550–560.
