April 12, 2020
Evolutionary algorithms are search-based global optimization algorithms inspired by natural phenomena. They belong to a broader family of optimization methods:
local search, such as gradient descent, simplex method, line minimization, Newton's method
non-population-based global search, such as grid search, random walk, exhaustive search, simulated annealing
population-based global search, such as genetic algorithms, evolution strategies, ant colony optimization, particle swarm optimization, differential evolution
Evolutionary algorithms are very good at approximating solutions, but computational complexity and cost can be a limiting factor in production environments (see Evolved antenna).
An optimization problem can be solved numerically by adjusting the input (decision variables) to some function to find its minimum or maximum output. The aim is to find the decision variables that give the minimum or maximum output.
A minimization problem is the inverse of maximization, so finding y = MAX[x(10 - x)] is equivalent to y = MIN[-x(10 - x)] (as illustrated below).
y = lambda x: x * (10 - x)

def MAX(fn, res, range_):
    # evaluate fn at each point and collect (x, y) pairs
    for n in range_:
        res.append((n, fn(n)))
    # return the pair with the largest output
    return max(res, key=lambda pair: pair[1])

print(MAX(y, [], range(100)))  # (5, 25)
y = lambda x: -x * (10 - x)

def MIN(fn, res, range_):
    for n in range_:
        res.append((n, fn(n)))
    # return the pair with the smallest output
    return min(res, key=lambda pair: pair[1])

print(MIN(y, [], range(100)))  # (5, -25)
Optimization problems can be function-based (cost function) or trial-and-error, static or dynamic (subject to change), constrained or unconstrained (most optimization problems are constrained).
A constrained optimization problem can be solved with the Lagrange multiplier method: MIN[f(x,y,z)] subject to c = g(x,y,z), where the Lagrange function is L(x,y,z,l) = f(x,y,z) + l(g(x,y,z) - c), l is the Lagrange multiplier, and stationary points are found where the gradient of L(x,y,z,l) is 0.
Exhaustive search
An exhaustive search method, or brute-force, is a numerical method that can be used to search a large but finite solution space using combinations of different variables.
Brute-force methods do not get stuck in local minima given fine enough sampling, and they work with both continuous and discontinuous functions (no derivatives needed). However, brute-force search can take a long time to find the global best, which can also be missed entirely due to under-sampling, and it is only practical for a small number of variables in a limited search space.
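Below is a minimal sketch of brute-force grid search over two variables, assuming a made-up objective f(x, y) = (x - 3) ** 2 + (y + 1) ** 2 with a known minimum at (3, -1):

import itertools

# made-up objective with known minimum at (3, -1)
f = lambda x, y: (x - 3) ** 2 + (y + 1) ** 2

def grid_search(fn, xs, ys):
    # evaluate fn at every combination of sample points and keep the best
    return min(((x, y, fn(x, y)) for x, y in itertools.product(xs, ys)),
               key=lambda t: t[2])

steps = [i / 10 for i in range(-100, 101)]  # fine sampling of [-10, 10]
print(grid_search(f, steps, steps))  # (3.0, -1.0, 0.0)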
Nelder-Mead method
The Nelder-Mead method, or downhill simplex method, can be used to find a local minimum of a function with several variables (a simplex is an elementary shape formed by n + 1 points in dimension n, such as a triangle in 2D), where the points are referred to as best, B, good, G, and worst, W.
The search for the local minimum is repeated until the error is acceptable, and the points B, G, W are updated as better points are found.
For example: f(x,y) = x + y with x = 2 and y = 3 gives f(2,3) = 5, and point W = (4,5) gives f(4,5) = 9, so move towards (2,3), which is smaller (minimum):
move towards the local minimum (reflection, expansion), where the reflection point is R = M + (M - W) = 2M - W and the expansion point is E = R + (R - M) = 2R - M
contract around the minimum (contraction, shrinking), where the contraction points are C1 = (W + M) / 2 and C2 = (M + R) / 2, and shrinking moves towards B, so the shrink point is S = (B + W) / 2, where the midpoint is M = (B + G) / 2
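Below is a simplified sketch of repeated Nelder-Mead steps built from the points above; the quadratic cost function and starting simplex are made up for illustration, and a full implementation would also shrink G towards B:

import numpy as np

f = lambda p: p[0] ** 2 + p[1] ** 2  # made-up cost, minimum at (0, 0)

def nelder_mead_step(f, simplex):
    B, G, W = sorted(simplex, key=f)  # best, good, worst
    M = (B + G) / 2                   # midpoint
    R = 2 * M - W                     # reflection point
    if f(R) < f(B):
        E = 2 * R - M                 # expansion point
        W = E if f(E) < f(R) else R
    elif f(R) < f(W):
        W = R                         # keep the reflection
    else:
        C = (W + M) / 2               # contraction point C1 (C2 = (M + R) / 2)
        W = C if f(C) < f(W) else (B + W) / 2  # otherwise shrink towards B
    return [B, G, W]

simplex = [np.array(p, float) for p in [(4, 5), (2, 3), (0, 1)]]
for _ in range(50):
    simplex = nelder_mead_step(f, simplex)
print(min(simplex, key=f))  # approaches the minimum at (0, 0)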
Line minimization
A line minimization method, or line search, starts at a random point and moves in a selected direction until the cost function changes (increases or decreases, depending on the goal).
Line search methods are generally robust and fast to converge, but they use gradients, so the function needs to be differentiable, and they are not guaranteed to find the global minimum (see the sketch after the steps):
initialise starting point
compute search direction
compute step length
update, x[k+1] = x[k] - f'(x[k]) / f''(x[k])
check convergence, or repeat from step 2
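A minimal sketch of these steps with the Newton update rule, assuming a smooth one-dimensional cost function f(x) = (x - 2) ** 2 + 1 for illustration:

# derivatives of the made-up cost function f(x) = (x - 2) ** 2 + 1
f1 = lambda x: 2 * (x - 2)  # f'(x)
f2 = lambda x: 2.0          # f''(x)

x = 10.0                    # step 1: initialise starting point
for _ in range(20):
    step = f1(x) / f2(x)    # steps 2-3: compute direction and length
    x = x - step            # step 4: update
    if abs(step) < 1e-8:    # step 5: check convergence
        break
print(x)  # 2.0, the minimizer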
Random walk
The random walk method uses random numbers to move in different directions, which is particularly useful to avoid getting stuck in a local minimum, to explore different regions to locate additional minima, or to concentrate the search on a smaller region to refine a solution (see the sketch below):
initialise starting point and best point (this can be the same point)
evaluate starting point
generate random correction
update, x[k+1] = x[k] + D[k] * h[k], where D is a random search direction matrix and h is a step size vector
evaluate function at point and update best point
return best point, or repeat step 3
Random walk and other methods using random numbers work with discrete and continuous variables, discontinuous functions, and non-differentiable functions (no derivatives needed), and they are not as sensitive to initial point selection, but more sensitive to control parameters (direction and step size).
Note that solutions found using random numbers are not repeatable, so multiple runs are needed to verify results.
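A minimal sketch of the random walk steps above, assuming a made-up cost function with minimum at (0, 0); as noted, results differ between runs:

import random

f = lambda x: x[0] ** 2 + x[1] ** 2  # made-up cost, minimum at (0, 0)

def random_walk(f, x0, h=0.5, iters=5000):
    x, best = x0[:], x0[:]                         # step 1: starting and best point
    for _ in range(iters):
        D = [random.gauss(0, 1) for _ in x]        # step 3: random direction
        x = [xi + h * di for xi, di in zip(x, D)]  # step 4: x[k+1] = x[k] + D * h
        if f(x) < f(best):                         # step 5: update best point
            best = x[:]
    return best                                    # step 6: return best point

print(random_walk(f, [8.0, -6.0]))  # best point found, varies per run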
Genetic algorithms describe population-based methods that are inspired by natural selection. Darwin's theory of evolution states that an offspring has many characteristics of its parents, which implies the population is stable, and that variation in characteristics between individuals passed from one generation to the next often comes from inherited characteristics, where some percentage of offspring survive to adulthood.
Binary genetic algorithms (BGA) work well with continuous and discrete variables and can handle a large number of decision variables. The decision variables are evaluated with cost functions. BGA is also less likely to get stuck in a local minimum.
Notation
N[var] | number of decision variables in a chromosome
N[bits] | total number of bits in a chromosome
N[pop] | population size
N[pop]N[bits] | total number of bits in a population
X | selection rate in the natural selection step
N[keep] = XN[pop] | number of chromosomes kept for each generation
N[pop] - N[keep] | number of chromosomes to be discarded
x[low] | lower bound of variable x
x[high] | upper bound of variable x
P[n] | probability that chromosome n in the mating pool N[keep] is chosen
c[n] | cost of chromosome n
C[n] | normalised cost of chromosome n
mu | mutation rate, which is the probability of mutation
Decision variables are represented as chromosomes, such as [v[1], v[2], ..., v[N[var]]], where each gene is coded by m bits, so the total number of bits per chromosome is N[bits] = mN[var]. Cost is evaluated by some cost function, and the result is presented as a sorted table, or cost table (most fit at the top).
A population, N[pop], is a group of chromosomes, each representing a potential solution to a function, such as f(x,y), where (x, y) is represented by some chromosome, such as [1100011, 0011001].
Natural selection
The selection process imitates natural selection, where only the best potential solutions are selected. A selection occurs each generation, or iteration, and the selection rate, X, is the fraction of the population that survives.
The number of chromosomes that are kept is N[keep], where the best are kept and the worst are discarded (replaced by offspring). With N[pop] = 8 and X = 0.5, then N[keep] = X * N[pop] = 8 * 0.5 = 4, so keep the best four chromosomes.
Thresholding is computationally cheaper than using a selection rate: chromosomes with costs lower than the threshold survive, and those with higher costs are discarded (for minimization). A new population is generated when no chromosomes are within the threshold (the threshold can be updated with each iteration).
Binary encoding and decoding
Below is an example of binary encoding and decoding:
convert the integer part of 25.3125 (base 10) to binary by repeatedly dividing the integer part by 2 (rounded down) until 0, where a non-even result gives 1, otherwise 0, then flip (read from the decimal point outwards), giving 11001
25 / 2 | 12.5 | 1
12 / 2 | 6 | 0
6 / 2 | 3 | 0
3 / 2 | 1.5 | 1
1 / 2 | 0.5 | 1
convert the fractional part of 25.3125 (base 10) to binary by repeatedly multiplying the fractional part by 2 until it reaches 1, where a result greater than or equal to 1 gives 1, otherwise 0, giving 0101
2 * 0.3125 | 0.625 | 0
2 * 0.625 | 1.25 | 1
2 * 0.25 | 0.5 | 0
2 * 0.5 | 1 | 1
convert the integer part of 11001.0101 (base 2) to decimal: print(1 * (2 ** 4) + 1 * (2 ** 3) + 0 * (2 ** 2) + 0 * (2 ** 1) + 1 * (2 ** 0)) should give 25
convert the fractional part of 11001.0101 (base 2) to decimal: print(0 * (2 ** -1) + 1 * (2 ** -2) + 0 * (2 ** -3) + 1 * (2 ** -4)) should give 0.3125
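The same conversions as a small sketch; the helper names and the fixed four fraction bits are assumptions:

def to_binary(value, frac_bits=4):
    # integer part by repeated division, fraction part by repeated multiplication
    integer, fraction = divmod(value, 1)
    bits = ''
    for _ in range(frac_bits):
        fraction *= 2
        bits += '1' if fraction >= 1 else '0'
        fraction %= 1
    return bin(int(integer))[2:] + '.' + bits

def from_binary(bits):
    int_part, frac_part = bits.split('.')
    value = int(int_part, 2)
    for i, b in enumerate(frac_part, start=1):
        value += int(b) * 2 ** -i  # weights 2 ** -1, 2 ** -2, ...
    return value

print(to_binary(25.3125))         # 11001.0101
print(from_binary('11001.0101')) # 25.3125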
Bits required to achieve precision
Below is a method to find the number of bits, m, required to achieve a precision of d decimal places given a range: (x[high] - x[low]) / 10 ** -d <= 2 ** m - 1. With x[low] = 25, x[high] = 100, and d = 2, then (100 - 25) / 10 ** -2 <= 2 ** m - 1 gives m >= 12.8729, so about 13 bits.
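Rearranged for m, this can be checked directly:

import math

# m >= log2((x_high - x_low) / 10 ** -d + 1), rounded up to whole bits
x_low, x_high, d = 25, 100, 2
m = math.log2((x_high - x_low) / 10 ** -d + 1)
print(m, math.ceil(m))  # 12.8729..., 13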
The selection process involves selecting two chromosomes from the mating pool to produce two new offspring, and the offspring are either kept or discarded based on parameters:
pairing from top to bottom, chromosomes are paired two at a time starting from the top until N[keep] chromosomes are selected for mating; simple and easy to implement
random pairing, a uniform random number generator selects chromosomes; all chromosomes have a chance to mate, which introduces diversity into the population
Weighted random pairing
Weighted random pairing, or roulette wheel weighting, assigns probabilities that are inversely proportional to cost to the chromosomes in the mating pool, where chromosomes with the lowest cost have the greatest chance to mate:
rank weighting is problem independent and probabilities are calculated once, so it is computationally cheaper, but a small population has a higher probability of selecting the same chromosome
cost weighting is cost function dependent, tends to weight top chromosomes more when there is a large spread in cost between top and bottom chromosomes, tends to weight chromosomes evenly when chromosomes have about the same cost, and probabilities are calculated each generation, so it is computationally expensive
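A minimal sketch of rank weighting, assuming a mating pool already sorted best-first (the chromosome strings are placeholders):

import random

def rank_weighted_pick(pool):
    # the n-th best chromosome gets weight N_keep - n + 1, so its
    # probability is (N_keep - n + 1) / (1 + 2 + ... + N_keep)
    n_keep = len(pool)
    weights = [n_keep - n for n in range(n_keep)]  # [N_keep, ..., 1]
    return random.choices(pool, weights=weights, k=1)[0]

pool = ['1100011', '0011001', '1010101', '0001111']  # sorted by cost
print(rank_weighted_pick(pool), rank_weighted_pick(pool))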
Tournament selection
Tournament selection is problem independent and works best with larger population sizes, since sorting is computationally expensive; chromosomes of good quality have a higher chance of being selected: randomly pick a small subset of chromosomes from the mating pool in N[keep], where the chromosome with the lowest cost in the subset becomes a parent.
The crossover process creates offspring by exchanging information between the parents selected in the selection process (see the sketch after the list):
single-point crossover, bits to one side of a randomly selected crossover point are swapped between parents, and the offspring replace the N[pop] - N[keep] chromosomes that were discarded
double-point crossover, segments between two randomly generated crossover points are swapped between parents
uniform crossover, bits are randomly selected for swapping between parents
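A minimal sketch of the three crossover operators on bit strings; the parent strings are placeholders:

import random

def single_point(p1, p2):
    # swap the bits to the right of a random crossover point
    i = random.randint(1, len(p1) - 1)
    return p1[:i] + p2[i:], p2[:i] + p1[i:]

def double_point(p1, p2):
    # swap the segment between two random crossover points
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2):
    # each bit is randomly selected from one of the parents
    pairs = [(a, b) if random.random() < 0.5 else (b, a) for a, b in zip(p1, p2)]
    return ''.join(a for a, b in pairs), ''.join(b for a, b in pairs)

print(single_point('1100011', '0011001'))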
Mutation
A mutation is a random alteration of certain bits in the cost table of chromosomes, which allows the algorithm to explore a wider cost surface by introducing new information (see the sketch after the steps):
choose a mutation rate, mu, which represents the probability of mutation
determine the number of bits to be mutated: mu(N[pop] - 1) * N[bits] with elitism (the best chromosome is protected), or mu(N[pop]) * N[bits] without
flip the bits (at random or otherwise)
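A minimal sketch of the mutation step on a small population of bit strings (the values are placeholders):

import random

def mutate(population, mu, elitism=True):
    # flip mu * (N_pop - 1) * N_bits bits, skipping the best chromosome
    # (row 0) when elitism is used; mu * N_pop * N_bits without elitism
    n_pop, n_bits = len(population), len(population[0])
    n_flips = round(mu * (n_pop - (1 if elitism else 0)) * n_bits)
    for _ in range(n_flips):
        row = random.randint(1 if elitism else 0, n_pop - 1)
        col = random.randrange(n_bits)
        bits = list(population[row])
        bits[col] = '1' if bits[col] == '0' else '0'  # flip the bit
        population[row] = ''.join(bits)
    return population

print(mutate(['1100011', '0011001', '1010101', '0001111'], mu=0.2))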
The selection-crossover-mutation process should continue until the population meets some stopping criteria (convergence), such as finding an acceptable solution, exceeding the maximum number of iterations, no changes to chromosomes, no changes to cost, or population statistics on mean and minimum cost.
Continuous genetic algorithms (CGA) are similar to binary genetic algorithms, except that the chromosome is represented by a string of real numbers, and the crossover and mutation processes are different. It is more logical to represent machine precision in floating point, and it results in faster execution (no encoding and decoding).
CGAs can deal with complex problems of high dimensionality and are only limited by internal precision errors in computers.
Crossover is similar to BGA, but exchanges variables instead of bits, so the crossover point is a random number between the first and last variable of the parent chromosomes:
blending is a linear process done for all variables left and right of the crossover point; blending ensures that no more extreme variables are introduced than are already present in the population (sometimes a disadvantage): O[1] = B * P[1][m] + (1 - B) * P[2][m] and O[2] = (1 - B) * P[1][m] + B * P[2][m], where B is a random variable between 0 and 1 (blending), O[n] is an offspring, P[n] is a parent chromosome, and m is the m-th variable in the parent
extrapolation allows the introduction of variables outside of the parent values (offspring can be discarded if outside the allowed range): O[1] = P[1][m] - B(P[1][m] - P[2][m]) and O[2] = P[2][m] + B(P[1][m] - P[2][m]), where B is a random variable
In CGA, the total number of variables that can be mutated in the population is mu * (N[pop] - 1) * N[var] (with elitism), and chosen variables are mutated with P'[n] = P[n] + C * N[n](0, 1), where C is a constant and N[n](0, 1) is the normal distribution.
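A minimal sketch of blending crossover and mutation on real-valued chromosomes; for simplicity blending is applied to every variable, and the parent values and C are made up:

import random

def blend(p1, p2):
    # O[1] = B * P[1][m] + (1 - B) * P[2][m], with O[2] mirrored
    B = random.random()
    o1 = [B * a + (1 - B) * b for a, b in zip(p1, p2)]
    o2 = [(1 - B) * a + B * b for a, b in zip(p1, p2)]
    return o1, o2

def mutate(chromosome, C=0.1):
    # P'[n] = P[n] + C * N(0, 1) on one randomly chosen variable
    i = random.randrange(len(chromosome))
    chromosome[i] += C * random.gauss(0, 1)
    return chromosome

o1, o2 = blend([2.5, 7.1, 0.3], [4.0, 1.2, 9.8])
print(mutate(o1), o2)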
A complicated cost function can be computationally expensive and time consuming to evaluate. Here are a few approaches to reduce the number of function evaluations:
do not evaluate identical chromosomes more than once (duplicate evaluation)
only allow unique chromosomes in the population
discard duplicate offspring (when search time is less than evaluation time)
only evaluate the cost of offspring with mutated chromosomes
Multiple-objective optimization refers to optimizing a set of cost functions whose objectives conflict:
optimizing one cost function degrades another (it is not always possible to optimize all cost functions)
the goal is to find a feasible solution (such as maximizing yield or minimizing cost)
The weighted sum approach is a simple approach to multiple-objective optimization, where cost functions are combined into a single cost function by weighting each cost function, such as g(x) = SUM[w * f(x)], where w > 0 and SUM[w] = 1.
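A small sketch with two made-up cost functions and w = 0.7; the single-objective minimum lands between the two individual minima:

f1 = lambda x: (x - 2) ** 2              # first objective, minimum at 2
f2 = lambda x: (x - 6) ** 2              # second objective, minimum at 6
g = lambda x, w=0.7: w * f1(x) + (1 - w) * f2(x)

# brute-force the combined cost over [0, 10)
print(min(((x / 10, g(x / 10)) for x in range(100)), key=lambda t: t[1]))
# approximately (3.2, 3.36), a compromise between both objectives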
Gray codes are used to reduce the probability of crossover resulting in extreme offspring, since only one bit changes between two consecutive numbers in the code. The Hamming distance is the number of bits that differ between two numbers in the code.
A permutation problem is a problem where the order of variables is important, such as the traveling salesman problem, so normal crossover and mutation are not suitable.
Partially matched crossover
In partially matched crossover (PMX), a segment between two crossover points is swapped between the parents, and duplicates outside the segment are repaired using the mapping between the swapped genes:
P[1] = [3|46|215]
P[2] = [4|15|326]
swap the segments: O[1] = [3|15|215], O[2] = [4|46|326]
repair duplicates outside the segment using the mapping (4 <-> 1, 6 <-> 5): O[1] = [3|15|2(4)(6)], O[2] = [(1)|46|32(5)]
Ordered crossover
In ordered crossover (OX), a segment between two crossover points is swapped between the parents, duplicates of the received segment are marked as holes, and the remaining genes are filled in parent order starting after the second crossover point:
P[1] = [34|62|15]
P[2] = [41|53|26]
swap the segments: O[1] = [34|53|15], O[2] = [41|62|26]
mark duplicates of the received segment as holes in each parent: P[1] = [X4|62|1X], P[2] = [41|53|XX]
keep the swapped segments: O[1] = [??|53|??], O[2] = [??|62|??]
fill the holes with the remaining genes in parent order, starting after the second crossover point: O[1] = [62|53|14], O[2] = [53|62|41]
Cycle crossover
In cycle crossover (CX), genes are swapped between the parents one position at a time, following the cycle of duplicates until the offspring contain no repeated genes:
P[1] = [3|46215]
P[2] = [4|15326]
swap the first gene: O[1] = [4|46215], O[2] = [3|15326]
O[1] now has two 4s, so swap position 2: O[1] = [4|1|6215], O[2] = [3|4|5326]
O[1] now has two 1s, so swap position 5: O[1] = [4162|2|5], O[2] = [3453|1|6]
O[1] now has two 2s, so swap position 4: O[1] = [416|3|25], O[2] = [345|2|16]
A penalty function is a function added to the cost function to penalise infeasible solutions, such as solutions that violate constraints: fP(x) = f(x) + SUM[m](C(x)), where fP(x) is the penalised cost function, f(x) the unpenalised cost function, m the number of constraints, and C(x) the penalty function imposed on violating constraints.
Genetic programming is a method used to evolve computer programs, where a program is represented as a tree structure, and the tree is evolved by crossover and mutation. A terminal set specifies all variables and constants, and a function set contains all functions that can be applied to elements of the terminal set. An evolved program is evaluated to measure its performance within the problem domain (fitness).
In a tree-based representation, each chromosome is a program. Adaptive chromosomes have variable length and size (tree depth), and their shape is the branching factor of nodes in the tree. The domain-specific grammar includes a terminal set (variables and constants, leaf nodes in the tree), a function set (operators, such as arithmetic, boolean, and decision structures), and semantic rules (to create correct programs).
The genetic programming process:
initialize population (random or otherwise)
evaluate performance with a cost function
crossover to exchange information between parents
mutate to explore new information
The schema theorem uses a template to represent a subset of binary strings with symbols (* matches any bit):
*1100 => { 01100, 11100 }
*110* => { 01100, 01101, 11100, 11101 }
Properties of a schema:
order, o(S), is the number of fixed positions (length of template minus number of * symbols)
defining length, d(S), is the distance between the first and last fixed positions
S[1] = ***001*110 => o(S[1]) = 6, d(S[1]) = 10 - 4 = 6
S[2] = ****00**0* => o(S[2]) = 3, d(S[2]) = 9 - 5 = 4
The 1 + 1 evolutionary strategy, or (1 + 1)-ES, has one parent and one offspring, so the population is a single individual, X = [x_1, ..., x_n]. There is no recombination process (crossover), but mutation can be applied with x'(t) = x(t) + a(t) * N_j(0,1), where a is a step-size parameter and N_j(0,1) is Gaussian noise.
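A minimal sketch of (1 + 1)-ES on a made-up sphere function; the step-size decay rule is an assumption (the 1/5 success rule is a common choice):

import random

f = lambda x: sum(v * v for v in x)  # made-up cost, minimum at the origin

def one_plus_one_es(f, x, a=1.0, iters=2000):
    for _ in range(iters):
        # mutation only, no crossover: x' = x + a * N(0, 1) per component
        child = [v + a * random.gauss(0, 1) for v in x]
        if f(child) <= f(x):
            x = child    # offspring survives if not worse
        else:
            a *= 0.99    # assumed simple step-size decay
    return x

print(one_plus_one_es(f, [5.0, -3.0]))  # close to [0, 0]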
The (mu + 1)-ES has mu parents and one offspring, so the population is greater than one (mu > 1). Uniform crossover is used, where two random parents are selected and each element of the offspring is randomly selected between the parents:
X(t) = [x_1(t), ..., x_n(t)]
x(t) = { x_1(t) if r <= 0.5, x_2(t) otherwise }
The offspring is mutated similarly to (1 + 1)-ES, and the best mu individuals from the mu + 1 population survive to the next generation.
The (mu + lambda)-ES has mu parents and lambda offspring, where the best mu individuals from the combined mu + lambda population survive to the next generation. Elitism can be applied for faster convergence and better exploitation.
The (mu, lambda)-ES generates lambda offspring from mu parents, where lambda is greater than or equal to mu, which is greater than or equal to 1. The best mu individuals from the lambda offspring replace the parents. Elitism is not applied (better exploration and larger diversity).
A crossover operation can be local or global. A local crossover is two randomly selected parents producing one offspring. A global crossover is more than two randomly selected parents producing one offspring.
Discrete recombination uses actual components of the parents to generate the offspring. Intermediate recombination uses a weighted average of the components of the parents. No crossover selects a parent as the offspring.
Example of local, discrete recombination:
x_j(t) = { x_1_j(t) if r_j <= 0.5, x_2_j(t) otherwise }
a_j(t) = { a_1_j(t) if r_j <= 0.5, a_2_j(t) otherwise }
r: random number [0,1]
parent 1: [x_11, x_12, x_13], [a_11, a_12, a_13]
parent 2: [x_21, x_22, x_23], [a_21, a_22, a_23]
offspring: [x_11, x_22, x_13], [a_11, a_22, a_13]
Example of local, intermediate recombination:
x_j(t) = r * x_1_j(t) + (1 - r) * x_2_j(t)
a_j(t) = r * a_1_j(t) + (1 - r) * a_2_j(t)
r: random number [0,1]
parent 1: [x_11, x_12, x_13], [a_11, a_12, a_13]
parent 2: [x_21, x_22, x_23], [a_21, a_22, a_23]
offspring: [r * x_11 + (1 - r) * x_21, r * x_12 + (1 - r) * x_22, r * x_13 + (1 - r) * x_23], [r * a_11 + (1 - r) * a_21, r * a_12 + (1 - r) * a_22, r * a_13 + (1 - r) * a_23]
Swarm intelligence describes systems with collective behaviour, where unsophisticated agents interacting locally with their environment cause global patterns to emerge.
A swarm provides the basis for exploring collective or distributed problem solving without centralised control or a global model.
Examples of swarms in nature:
group of fish swimming in the same direction
ants working together to find food and bring back to nest
bird flocking to decrease energy consumption
animal herding to defend against predators
Social colonies
A social colony is flexible enough to respond to internal perturbations and external challenges, robust enough to complete tasks even if some individuals fail, decentralised (no central control in colony), and self-organised (paths to solutions are emergent).
Self-organisation
Self-organisation is a set of dynamic mechanisms where structures appear at the global level from interactions of lower-level components (such as ants):
positive feedback is a form of amplification that shows the right direction towards an optimal solution, or is used to reinforce portions of good solutions that contribute to solution quality
negative feedback is used to introduce a time scale into algorithms through pheromone evaporation, which can prevent premature convergence, act as a counter-balance, and improve stabilisation (anti-stagnation)
amplification of fluctuations introduces randomness so individuals can find new sources, i.e. explore (better when amplified by positive feedback)
multiple interactions are direct or indirect communication
Stigmergy is a mechanism that mediates animal-to-animal interactions, a form of indirect communication mediated by modifications of the environment. Signs observed by an individual trigger a specific response or action (reinforcing or modifying signals to influence the actions of others, positively or negatively).
There are two forms of stigmergy:
sematectonic stigmergy, communication via changes in physical characteristics of the environment
sign-based stigmergy, communication via signalling mechanism (chemical compound deposited by ants)
An artificial pheromone can be an artificial stigmergy, such as indirect communication mediated by numeric modifications of environment states locally accessible to agents, or a stigmergic variable, which encapsulates information used by artificial ants to communicate indirectly.
Ants appeared on Earth about 100 million years ago and have an estimated population of 10 ** 16 individuals. They are social, live in colonies of 30 to millions of individuals, and show collective behaviour (foraging, division of labour, cooperative support, construction).
Ants are stimulus-response agents, where an individual performs simple and basic actions based on local information. Simple actions appear to have a large random component.
Examples of ant colony optimization applications:
search for an optimal path in a graph
find the shortest path between nest and food source, without central and active coordination mechanisms
Binary bridge experiment
The binary bridge experiment is a simple and elegant experiment to study the foraging behaviour of ants:
ants deposit chemical pheromone while walking
ants have a larger probability of following the path with the higher pheromone trail
the shortest path is selected over time
probability to choose path A: P_A(t + 1) = ((c + n_A(t)) ** a) / (((c + n_A(t)) ** a) + ((c + n_B(t)) ** a)) = 1 - P_B(t + 1)
n_A: number of ants on path A
n_B: number of ants on path B
c: degree of attraction of an unexplored branch
a: bias using pheromone deposits in decision process
Example artificial ant decision process algorithm (see the sketch below):
generate a random number, r in [0,1]
choose values for c and a
for each path A, calculate P_A(t + 1)
if r <= P_A, then follow path A
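A small sketch of the decision process; the values c = 20 and a = 2 are assumptions (common in the literature):

import random

def p_a(n_a, n_b, c=20, a=2):
    # probability of choosing path A given the ants on each path
    return (c + n_a) ** a / ((c + n_a) ** a + (c + n_b) ** a)

def choose_path(n_a, n_b):
    r = random.random()                        # step 1: random number r in [0,1]
    return 'A' if r <= p_a(n_a, n_b) else 'B'  # steps 3-4

print(p_a(0, 0))            # 0.5, both paths equally likely at the start
print(choose_path(50, 10))  # usually A, since P_A is about 0.84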
Simple Ant Colony Optimization (SACO)
The simple ant colony optimisation method works on a graph ((i, j) is the edge from node i to node j), where pi_ij(t) is the pheromone concentration associated with an edge at generation t (pi_ij(0) is assigned a small initial random value) and L_k(t) is the length of the path from source to destination constructed by ant k.
The stopping criteria is a maximum number of iterations, an acceptable solution, or most ants following the same path:
each ant k at node i selects the next node with a transition probability (roulette wheel selection method): p_k_ij(t) = { pi_a_ij(t) / sum[pi_a_iu(t)] if j in N_k_i(t), 0 otherwise }
N_k_i(t): set of feasible nodes connected to node i with respect to ant k
a: constant greater than zero
pheromone evaporates on each edge (i, j): pi_ij(t) = (1 - rho) * pi_ij(t)
rho: evaporation rate (random number between 0 and 1, not inclusive)
pheromone is deposited on the edges of completed paths: pi_ij(t + 1) = pi_ij(t) + sum[delta(pi_ij(t))]
delta(pi_ij(t)) = { Q / f(x_k(t)) if edge (i, j) in path x_k(t), 0 otherwise }
x_k(t): solution of ant k
f(x_k(t)): quality of solution
Q: constant greater than 0
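A minimal sketch of the SACO transition and pheromone updates; the graph dictionary and parameter values are made up:

import random

def next_node(i, feasible, pheromone, a=1.0):
    # roulette wheel selection over the feasible neighbours of node i
    weights = [pheromone[(i, j)] ** a for j in feasible]
    return random.choices(feasible, weights=weights, k=1)[0]

def update_pheromone(pheromone, paths, rho=0.1, Q=1.0):
    for edge in pheromone:             # evaporation on every edge
        pheromone[edge] *= 1 - rho
    for path in paths:                 # deposit on each ant's path
        deposit = Q / (len(path) - 1)  # f(x_k) taken as path length in edges
        for edge in zip(path, path[1:]):
            pheromone[edge] += deposit
    return pheromone

pheromone = {(0, 1): 0.2, (0, 2): 0.4, (1, 3): 0.3, (2, 3): 0.1}
print(next_node(0, [1, 2], pheromone))
print(update_pheromone(pheromone, [[0, 1, 3]]))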
Ant Colony System (ACS)
An ant system (AS) is an improvement on SACO that adds heuristic information to the transition probability, a set of feasible nodes (immediate neighbours, excluding visited nodes to prevent loops), different update strategies for pheromone intensity, and elitism.
An ant colony system (ACS) is an improvement on AS which includes a different transition rule (favouring specific nodes) and a different pheromone update rule:
local and global pheromone update rules (updated for all edges, reinforce on edges of best path)
local update rule (pheromone evaporation)
Particle swarm optimization is a numerical, population-based optimisation technique that discovers optimal regions of a high-dimensional search space through the collective behaviour of individuals:
particle, an individual (potential solution)
swarm, a population
update rule, rules to update individuals to move towards optimal region
Particle swarms expand the ant-swarm behaviour model to an N-dimensional psychosocial space without the constraints of physical laws. The swarm consists of a set of particles, and each particle represents a potential solution to some N-dimensional optimization problem.
The velocity is updated as previous velocity + cognitive bias (c_1) + social bias (c_2). The stopping criteria is a maximum number of iterations, an acceptable solution, no improvement observed over a period of iterations, or a swarm radius close to zero.
Global best
The global best particle swarm method uses a star topology, with the whole swarm as neighbours, and particles are updated based on social information from all particles in the swarm (the best position found by the swarm, see the sketch below):
global: Y(t) = [Y_1(t), ..., Y_n(t)]
personal: y_i(t) = [y_i_1(t), ..., y_i_n(t)]
v_ij(t + 1) = v_ij(t) + (c_1 * r_1_j(t)) * (y_ij(t) - x_ij(t)) + (c_2 * r_2_j(t)) * (Y_j(t) - x_ij(t))
c_n: acceleration constants
r_n: random numbers [0,1]
x_i(t + 1) = x_i(t) + v_i(t + 1)
x_i(t): [x_i_1, ..., x_i_n]
v_i: [v_i_1, ..., v_i_n]
y_i(t + 1) = { y_i(t) if f(x_i(t + 1)) >= f(y_i(t)), x_i(t + 1) otherwise }
global best: Y(t + 1) = min[f(y_1(t + 1)), ..., f(y_n(t + 1))]
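A minimal sketch of global best PSO on a made-up sphere function; an inertia weight w (covered further down) is included to keep velocities stable:

import random

f = lambda x: sum(v * v for v in x)  # made-up cost, minimum at the origin

n, dim, w, c1, c2 = 20, 2, 0.7, 1.5, 1.5
x = [[random.uniform(-10, 10) for _ in range(dim)] for _ in range(n)]
v = [[0.0] * dim for _ in range(n)]
y = [p[:] for p in x]        # personal best positions
Y = min(y, key=f)[:]         # global best position

for t in range(200):
    for i in range(n):
        for j in range(dim):
            r1, r2 = random.random(), random.random()
            # previous velocity + cognitive bias + social bias
            v[i][j] = (w * v[i][j]
                       + c1 * r1 * (y[i][j] - x[i][j])
                       + c2 * r2 * (Y[j] - x[i][j]))
            x[i][j] += v[i][j]   # position update
        if f(x[i]) < f(y[i]):    # update personal best
            y[i] = x[i][:]
            if f(y[i]) < f(Y):   # update global best
                Y = y[i][:]
print(Y)  # close to [0, 0]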
Local best
The local best particle swarm method uses a ring topology, with a small number of particles as neighbours, and particles are updated based on social information exchanged within the neighbourhood of the particle (the local best position within the neighbourhood):
neighbourhood best: Y_i(t) = [Y_i_1(t), ..., Y_i_n(t)]
personal: y_i(t) = [y_i_1(t), ..., y_i_n(t)]
v_ij(t + 1) = v_ij(t) + (c_1 * r_1_j(t)) * (y_ij(t) - x_ij(t)) + (c_2 * r_2_j(t)) * (Y_ij(t) - x_ij(t))
c_n: acceleration constants
r_n: random numbers [0,1]
x_i(t + 1) = x_i(t) + v_i(t + 1)
min[f(x(t))]
Different methods for selecting neighbourhoods:
particle indices, computationally cheaper, promotes the spread of information irrespective of a particle's position in the search space
spatial similarity, computationally expensive, information from similar particles can be used for local search
overlapping, the same particle is a member of more than one neighbourhood, and the interconnection can promote information sharing (faster convergence)
Velocity clamping is when particle velocity is adjusted before updating particle positions.
v'_ij(t + 1) = { v_ij(t + 1) if v_ij(t + 1) < V_max, V_max otherwise }
V_max: limit velocity to user-defined maximum value
An advantage of velocity clamping is that explosion is controlled, where otherwise velocity values could quickly be updated to large values (pushing particles to the boundaries). The disadvantages are that the search direction of a particle is changed, and a particle searches a hypercube when velocities reach the maximum.
Dynamic velocity approach
The dynamic velocity approach is a method that changes the maximum velocity when the global best position does not improve over a period of iterations (t').
V_max(t + 1) = { D * V_max(t) if f(Y(t)) >= f(Y(t - t')), V_max(t) otherwise }
V_max: limit velocity to user-defined maximum value
D: decreases linearly from 1 to 0.01
Example of exponential decay in maximum velocity:
V_max(t + 1) = (1 - (t / n) ** a) * V_max(t)
a >= 0
n: maximum number of iterations
An inertia weight is a mechanism to control the exploration and exploitation abilities of the swarm without clamping velocity.
v_ij(t + 1) = w * v_ij(t) + (c_1 * r_1_j(t)) * (y_ij(t) - x_ij(t)) + (c_2 * r_2_j(t)) * (Y_j(t) - x_ij(t))
w: inertia weight, w > 0
c_1: cognitive bias
c_2: social bias
Other methods to adjust weight:
random adjustment, w is a random number in [0,1]
linear decreasing, w(t) = (w(0) - w(n)) * ((n - t) / n) + w(n), where w(0) >= w(n)
nonlinear decreasing, w(t + 1) = ((w(t) - 0.4) * (n - t)) / (n + 0.4), where w(0) = 0.9
A differential evolution method is a stochastic, population-based search strategy where distance and direction information from the population is used to guide the search (difference vectors).
A difference vector provides information about the position and fitness of individuals in the landscape. A large distance between individuals means larger step sizes (explore more of the search space), and a small distance between individuals means smaller step sizes (exploit local areas).
initialise the control parameters B (scale factor) and p, the counter t, and the population C(0):
[X_1, X_2, ..., X_ns] = [[x_11, ..., x_1nx], ..., [x_ns1, ..., x_nsnx]]
ns: size of population
nx: number of decision variables
C(t) = [x_1(t), ..., x_ns(t)]
evaluate cost function
mutation, produce a trial vector for each selected individual (mutating the target vector with a weighted difference vector)
u_i(t) = x_i_1(t) + B(x_i_2(t) - x_i_3(t))
u_i(t): trial vector
x_i_1: target vector
x_i_2, x_i_3: randomly selected individuals
B: scale factor
crossover, combine elements from the trial vector and the parent: x'_ij(t) = { u_ij(t) if j in J, x_ij(t) otherwise }
J: set of element indices to undergo crossover (crossover points)
selection to determine if parent or offspring will survive to next generation
repeat from step 2 until stopping criteria is met
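A minimal sketch of this loop for the basic strategy (DE/rand/1 with binomial crossover, described next) on a made-up sphere function; parameter values are assumptions:

import random

f = lambda x: sum(v * v for v in x)  # made-up cost, minimum at the origin

ns, nx, B, p = 20, 2, 0.5, 0.9       # population size, variables, scale factor, p
pop = [[random.uniform(-10, 10) for _ in range(nx)] for _ in range(ns)]

for t in range(200):
    for i in range(ns):
        # mutation: target vector plus weighted difference vector
        i1, i2, i3 = random.sample([k for k in range(ns) if k != i], 3)
        u = [pop[i1][j] + B * (pop[i2][j] - pop[i3][j]) for j in range(nx)]
        # binomial crossover: element j is a crossover point with probability p
        child = [u[j] if random.random() < p else pop[i][j] for j in range(nx)]
        # selection: the offspring survives only if it is not worse
        if f(child) <= f(pop[i]):
            pop[i] = child
print(min(pop, key=f))  # close to [0, 0]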
A differential evolution strategy is a variation to basic differential evolution with different ways to select target vectors, different number of difference vectors, and different ways to select crossover points.
Notation (DE/x/y/z)
x | method of selecting the target vector
y | number of difference vectors
z | crossover method (binomial, exponential)
DE/rand/1/z
A random individual from the current population is selected as the target vector, with one difference vector, any crossover method, and the trial vector (same as basic differential evolution):
u_i(t) = x_i_1(t) + B(x_i_2(t) - x_i_3(t))
DE/best/1/z
The best individual from the current population is selected as the target vector, with one difference vector, any crossover method, and the trial vector:
u_i(t) = x_best(t) + B(x_i_2(t) - x_i_3(t))
DE/x/nv/z
Any method of selecting an individual from the current population as the target vector, with nv difference vectors, any crossover method, and the trial vector:
u_i(t) = x_i_1(t) + B * sum_nv[x_i_2_n(t) - x_i_3_n(t)]
DE/rand-to-best/nv/z
A combination of the random and best strategies, with nv difference vectors, any crossover method, and the trial vector:
u_i(t) = A * x_best(t) + (1 - A) * x_i_1(t) + B * sum_nv[x_i_2_n(t) - x_i_3_n(t)]
A: random number [0,1] (1 is more exploitation, 0 is more exploration)
DE/current-to-best/1+nv/z
The parent vector is selected as the target vector, with 1 + nv difference vectors (the first difference vector is formed by the parent vector and the best individual from the current population), any crossover method, and the trial vector:
u_i(t) = x_i(t) + B * (x_best(t) - x_i(t)) + B * sum_nv[x_i_2_n(t) - x_i_3_n(t)]