Journal of Computational Finance


Fast stochastic forward sensitivities in Monte Carlo simulations using stochastic automatic differentiation (with applications to initial margin valuation adjustments)

Christian Fries

  • In this paper, we apply stochastic (backward) algorithmic differentiation to calculate stochastic forward sensitivities, ie, the random variables representing sensitivities at future points in time.
  • A typical application of stochastic forward sensitivities is the exact calculation of an initial margin valuation adjustment (MVA), assuming that the initial margin is determined from a sensitivity-based risk model.
  • We demonstrate that these forward sensitivities can be obtained in a single stochastic automatic differentiation sweep. Our test case generates 5 million sensitivities in seconds.

In this paper, we apply stochastic (backward) automatic differentiation to calculate stochastic forward sensitivities. A forward sensitivity is a sensitivity at a future point in time, conditional on future states (ie, it is a random variable). A typical application of stochastic forward sensitivities is the exact calculation of an initial margin valuation adjustment, assuming the initial margin is determined from a sensitivity-based risk model. The ISDA Standard Initial Margin Model is an example of such a model. We demonstrate that these forward sensitivities can be obtained in a single stochastic (backward) automatic differentiation sweep with an additional conditional expectation step. Although the additional conditional expectation step represents a computational burden, it enables us to utilize expected stochastic (backward) automatic differentiation, a modified version of stochastic (backward) automatic differentiation. As a test case, we consider a hedge simulation requiring the numerical calculation of 5 million sensitivities. This calculation, showing the accuracy of the sensitivities, requires approximately 10 seconds on a 2014 laptop. However, in real applications the performance may be even more impressive, since 90% of the computation time is consumed by the conditional expectation regression, which does not scale with the number of products.

1 Introduction

We consider a Monte Carlo simulation of state variables $X_j(t)$ (possibly given by an Euler discretization of a stochastic differential equation (SDE), eg, a London Interbank Offered Rate (Libor) market model), modeled over a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\})$. Here, the $X_j$ are our model primitives (and adapted processes). The model is usually specified by model parameters and the initial values $X_j(0)$.

Let $V(t)$ denote the time-$t$ value of a financial product (eg, a derivative) under the given model. Then, $\partial V(0)/\partial X_j(0)$ is called the sensitivity of $V(0)$ with respect to $X_j(0)$.

Likewise, $\partial V(t)/\partial X_j(t)$ is called the time-$t$ forward sensitivity of $V$ with respect to $X_j$. That is, the forward sensitivities are random variables representing the on-path sensitivities of $V(t;\omega)$ with respect to the initial values $X_j(t;\omega)$. (Although the exposition only considers sensitivities with respect to initial values (ie, deltas), sensitivities with respect to model parameters (eg, vegas) are also covered, because they can formally be seen as additional components of the vector stochastic process.)

The numerical valuation of forward sensitivities in a Monte Carlo simulation is demanding for two reasons.

  • In a Monte Carlo simulation, the value $V(t)$ is often hard to obtain. Instead, we often deal with random variables $U(t)$ such that $V(t)$ is the time-$t$ conditional expectation of $U(t)$, ie, $V(t) = \mathrm{E}(U(t) \mid \mathcal{F}_t)$. (An example of $U(t)$ is the sum of the discounted future cashflows of a swap.)

  • The numerical valuation of forward sensitivities via a standard finite-difference approximation of the partial derivative (bump-and-revalue) would require a huge number of revaluations, namely one for each time $t = t_i$ and each path $\omega = \omega_k$.

To solve the first issue, one may utilize analytic formulas or analytic approximations for V(t) in terms of Xj(t). If this is not possible, one usually relies on estimation methods (regression, local regression, etc). These methods, sometimes referred to as American Monte Carlo, are well elaborated and fairly standard (see Fries 2007).
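To make this estimation step concrete, the following is a minimal sketch of such a least-squares regression estimator (our own illustration: the class name, the monomial basis and the plain normal-equations solver are assumptions, not the finmath-lib API; a production implementation would rather use a singular value decomposition, cf Section 3.3):

    // Minimal sketch of a least-squares ("American Monte Carlo") conditional
    // expectation estimator: estimate E(u | S = s) pathwise by regressing the
    // samples u[i] on monomial basis functions of the state samples s[i].
    public final class RegressionConditionalExpectation {

        /** Returns the pathwise estimate of E(u | S) evaluated at each s[i]. */
        public static double[] estimate(double[] s, double[] u, int order) {
            int numPaths = s.length, numBasis = order + 1;
            double[][] a = new double[numBasis][numBasis];   // normal equations B^T B
            double[] b = new double[numBasis];               // right-hand side  B^T u
            for (int i = 0; i < numPaths; i++) {
                double[] phi = basis(s[i], numBasis);
                for (int j = 0; j < numBasis; j++) {
                    b[j] += phi[j] * u[i];
                    for (int k = 0; k < numBasis; k++) a[j][k] += phi[j] * phi[k];
                }
            }
            double[] beta = solve(a, b);                     // tiny dense system
            double[] estimate = new double[numPaths];
            for (int i = 0; i < numPaths; i++) {
                double[] phi = basis(s[i], numBasis);
                for (int j = 0; j < numBasis; j++) estimate[i] += beta[j] * phi[j];
            }
            return estimate;
        }

        private static double[] basis(double x, int numBasis) {
            double[] phi = new double[numBasis];             // 1, x, x^2, ...
            phi[0] = 1.0;
            for (int j = 1; j < numBasis; j++) phi[j] = phi[j - 1] * x;
            return phi;
        }

        // Gaussian elimination with partial pivoting; sufficient for the small
        // regression systems used here (order + 1 unknowns).
        private static double[] solve(double[][] a, double[] b) {
            int n = b.length;
            for (int p = 0; p < n; p++) {
                int pivot = p;
                for (int r = p + 1; r < n; r++)
                    if (Math.abs(a[r][p]) > Math.abs(a[pivot][p])) pivot = r;
                double[] rowTmp = a[p]; a[p] = a[pivot]; a[pivot] = rowTmp;
                double tmp = b[p]; b[p] = b[pivot]; b[pivot] = tmp;
                for (int r = p + 1; r < n; r++) {
                    double factor = a[r][p] / a[p][p];
                    b[r] -= factor * b[p];
                    for (int c = p; c < n; c++) a[r][c] -= factor * a[p][c];
                }
            }
            double[] x = new double[n];
            for (int p = n - 1; p >= 0; p--) {
                double sum = b[p];
                for (int c = p + 1; c < n; c++) sum -= a[p][c] * x[c];
                x[p] = sum / a[p][p];
            }
            return x;
        }
    }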

To solve the second issue, one may also utilize analytic formulas. If this is not possible, one may rely on numerical methods, eg, automatic (algorithmic) differentiation (AAD; see Capriotti and Giles 2011; Giles and Glasserman 2006; Homescu 2011).

However, a subtle problem is that, for products that involve stochastic operators (like conditional expectations) in their valuation (eg, Bermudan options), the direct application of backward AAD appears to be nontrivial (see Antonov 2017; Capriotti et al 2016). In Fries (2017b), the automatic differentiation of such products was greatly simplified by using expected stochastic AAD.

In the following, we first reformulate the calculation of the forward sensitivities such that they are represented by a single backward automatic differentiation; we then show that expected stochastic AAD can be used for forward sensitivities as well.

2 Stochastic automatic differentiation for forward sensitivities

Stochastic automatic differentiation (Fries 2017b) is the reformulation of the automatic differentiation algorithms for random variables, with a special treatment of some stochastic operators.

The algorithm allows us to numerically calculate

  $\dfrac{\partial y}{\partial x_k},$

where $y$ and $x_k$ are random variables.

In the notation of Fries (2017b), the backward automatic differentiation is derived for an algorithm calculating $y$ depending on some inputs $x_0, \ldots, x_{n-1}$, ie,

  $y = f(x_0, \ldots, x_{n-1}),$

where $y$ and the $x_i$ are (discretized) random variables, by considering intermediate results, ie, calculation steps. Let $y = x_N$, where, for $n \le m \le N$,

  $x_m := f_m(x_{i_1(m)}, \ldots, x_{i_{k(m)}(m)})$   (2.1)

denote intermediate results, where $f_m$ is an operator of $k(m)$ arguments specified by the argument index list $(i_1(m), \ldots, i_{k(m)}(m))$.

Returning to our application of the calculation of forward sensitivities, we have that our model primitives $X_j(t_i)$ are intermediate results of the Monte Carlo simulation, ie, $X_j(t_i) = x_k$ for some $k = k(i,j)$. However, we are not in the situation of considering a single dependent variable $y$. Instead, we are interested in the differentiation of the variables $V(t_i)$ for $i = 0, 1, 2, \ldots$.

We now assume that the valuation of the derivative is given by random variables $C(t_i)$ with

  $V(t) = N(t)\, \mathrm{E}\Bigl( \sum_{t_k > t} C(t_k) \Bigm| \mathcal{F}_t \Bigr),$   (2.2)

where the $C(t_k)$ are $\mathcal{F}_{t_k}$-measurable random variables.

This representation is very common in Monte Carlo valuations, where the $C(t_i)$ are numéraire-relative future cashflows and $N$ is the numéraire. (For example, the implementation design in finmath.net (n.d.-b) is such that, in a Monte Carlo simulation, all financial derivatives are represented in this form.)

To achieve the valuation of all forward sensitivities in a single automatic differentiation step, we assume (or observe) that

  $\dfrac{\partial C(t_k)}{\partial X_j(t_i)} = 0 \quad \text{for } t_k < t_i.$   (2.3)

The property (2.3) is natural, since a time-$t_k$ cashflow cannot depend on a later, ie, time-$t_i$, model variable; otherwise, $C(t_k)$ would not be $\mathcal{F}_{t_k}$-measurable. Hence, we can define the random variable

  $Z := \sum_{k=0}^{n} C(t_k)$   (2.4)

(note that then $V(0) = N(0)\, \mathrm{E}(Z)$) and get

  $\begin{aligned}
  \frac{\partial V(t_i)}{\partial X_j(t_i)} &\overset{(2.2)}{=} \frac{\partial}{\partial X_j(t_i)} \Bigl( N(t_i)\, \mathrm{E}\Bigl( \sum_{t_k > t_i} C(t_k) \Bigm| \mathcal{F}_{t_i} \Bigr) \Bigr) \\
  &\overset{*}{=} N(t_i)\, \mathrm{E}\Bigl( \sum_{t_k > t_i} \frac{\partial C(t_k)}{\partial X_j(t_i)} \Bigm| \mathcal{F}_{t_i} \Bigr) + \frac{\partial N(t_i)}{\partial X_j(t_i)}\, \frac{V(t_i)}{N(t_i)} \\
  &\overset{(2.3)}{=} N(t_i)\, \mathrm{E}\Bigl( \sum_{k=0}^{n} \frac{\partial C(t_k)}{\partial X_j(t_i)} \Bigm| \mathcal{F}_{t_i} \Bigr) + \frac{\partial N(t_i)}{\partial X_j(t_i)}\, \frac{V(t_i)}{N(t_i)} \\
  &\overset{(2.4)}{=} N(t_i)\, \mathrm{E}\Bigl( \frac{\partial Z}{\partial X_j(t_i)} \Bigm| \mathcal{F}_{t_i} \Bigr) + \frac{\partial N(t_i)}{\partial X_j(t_i)}\, \frac{V(t_i)}{N(t_i)}
  \end{aligned}$

for all state variables $j = 0, 1, \ldots$. (The equality $\overset{*}{=}$ may require additional regularity assumptions in general, but it is trivial for a Monte Carlo simulation on a discrete sample space.)

The additional term $(\partial N(t_i)/\partial X_j(t_i))(V(t_i)/N(t_i))$ is just the theta of the derivative with respect to the numéraire accrual account. In cases where the numéraire is a model primitive state variable, eg, $X_0 := N$, and/or we are not interested in the theta, we get the even simpler expression

  $\left. \dfrac{\partial V(t_i)}{\partial X_j(t_i)} \right|_{N(t_i)} = N(t_i)\, \mathrm{E}\Bigl( \dfrac{\partial Z}{\partial X_j(t_i)} \Bigm| \mathcal{F}_{t_i} \Bigr).$

As we will show in Section 2.2.1, we may replace $Z$ by its unconditional expectation $\mathrm{E}(Z)$ and have

  $\left. \dfrac{\partial V(t_i)}{\partial X_j(t_i)} \right|_{N(t_i)} = N(t_i)\, \mathrm{E}\Bigl( \dfrac{\partial\, \mathrm{E}(Z)}{\partial X_j(t_i)} \Bigm| \mathcal{F}_{t_i} \Bigr).$

This is possible because the differential operator is applied with respect to $\mathcal{F}_{t_i}$-measurable random variables. Hence, it is possible to determine all forward sensitivities from a single stochastic automatic differentiation of $y = \mathrm{E}(Z)$ with respect to the calculation nodes $x_k = X_j(t_i)$. In summary, we require two steps.

  • Calculate $\partial y/\partial x_k$ using the stochastic automatic differentiation, where $y = \mathrm{E}(Z)$ and $x_k = X_j(t_i)$ (for some $k = k(i,j)$).

  • For each time step $t_i$, apply the conditional expectation operator, or an estimator for the conditional expectation, to $\partial y/\partial x_k$.

The numerical costs of this calculation are stunningly low. First, note that all partial derivatives y/xk for k=0,,N-1 are calculated in a single stochastic backward automatic differentiation sweep. Second, note that at each time step we can reuse the same conditional expectation estimator, which is a simple projection operator. Hence, the effort to calculate all forward sensitivities is comparable to a single stochastic backward automatic differentiation sweep plus a set of conditional expectation estimators, which, in turn, is comparable to a single valuation of a Bermudan option (which requires the estimation of conditional expectations at each time step).
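As an illustration of these two steps, consider the following sketch (hypothetical glue code reusing the regression estimator sketched in Section 1; none of the signatures are the finmath-lib API):

    // Sketch: one backward AAD sweep has already produced the pathwise
    // derivatives gradient[i][j] = dE(Z)/dX_j(t_i); all that remains per time
    // step is the conditional expectation and the numeraire factor, cf the
    // expression for dV(t_i)/dX_j(t_i) above.
    public final class ForwardSensitivities {

        /**
         * @param gradient  gradient[i][j][p] = pathwise dE(Z)/dX_j(t_i) on path p
         * @param regressor regressor[i][p]   = regression variable at t_i on path p
         * @param numeraire numeraire[i][p]   = N(t_i) on path p
         */
        public static double[][][] conditionOnFiltration(
                double[][][] gradient, double[][] regressor, double[][] numeraire) {
            double[][][] sensitivity = new double[gradient.length][][];
            for (int i = 0; i < gradient.length; i++) {
                sensitivity[i] = new double[gradient[i].length][];
                for (int j = 0; j < gradient[i].length; j++) {
                    // E( dE(Z)/dX_j(t_i) | F_{t_i} ): the same estimator (same
                    // basis at time t_i) is reused for every state variable j.
                    double[] conditional = RegressionConditionalExpectation
                            .estimate(regressor[i], gradient[i][j], 2);
                    for (int p = 0; p < conditional.length; p++)
                        conditional[p] *= numeraire[i][p];   // multiply by N(t_i)
                    sensitivity[i][j] = conditional;
                }
            }
            return sensitivity;
        }
    }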

2.1 Expected stochastic automatic differentiation

In Fries (2017b), it was shown that a modification of the backward automatic differentiation algorithm can be used to calculate an expectation of the stochastic differentiation. This modification is important for more complex derivatives, eg, Bermudan options. The theorem in Fries (2017b) also holds if a conditional expectation operator is applied to the derivative. Hence, we can apply this theorem here.

Theorem 1 from Fries (2017b) reads as follows.

Theorem 2.1.

(Expected stochastic backward automatic differentiation)   Let $\mathcal{G}$ denote a family of self-adjoint linear operators, ie, for any two random variables $A$, $B$ and any $G \in \mathcal{G}$, we have

  $\mathrm{E}(A \cdot G(B)) = \mathrm{E}(G(A) \cdot B).$

Further, let the operators $f_m$ be sufficiently regular such that $\frac{\partial}{\partial x_j} G(f_m) = G\bigl(\frac{\partial f_m}{\partial x_j}\bigr)$. (Although this assumption may be nontrivial in general, it is trivial if we consider $\Omega$ to be a discrete (Monte Carlo) sampling space.) Then, the modified backward automatic differentiation algorithm, defined as follows:

  • Initialize $D_N^{\mathcal{G}} = 1$ and $D_m^{\mathcal{G}} = 0$ for $m \ne N$.

  • For all $m = N, N-1, \ldots, 0$ (iterating backward through the operator list),

    • for all $j = 1, \ldots, k(m)$ (iterating through the argument list)

        $D_{i_j(m)}^{\mathcal{G}} \leftarrow \begin{cases} D_{i_j(m)}^{\mathcal{G}} + D_m^{\mathcal{G}} \cdot \dfrac{\partial f_m}{\partial x_{i_j(m)}}(x_{i_1(m)}, \ldots, x_{i_{k(m)}(m)}) & \text{if } f_m \notin \mathcal{G}, \\ D_{i_j(m)}^{\mathcal{G}} + G(D_m^{\mathcal{G}}) & \text{if } f_m = G \in \mathcal{G}, \end{cases}$

gives

  $\mathrm{E}\Bigl( \dfrac{\partial y}{\partial x_i} \Bigr) = \mathrm{E}(D_i^{\mathcal{G}}).$

The operator $\mathrm{E}$ in this theorem can be replaced by a conditional expectation operator $\mathrm{E}(\,\cdot \mid \mathcal{F}_{t_i})$ to get

  $\dfrac{\partial V(t_i)}{\partial X_j(t_i)} = \mathrm{E}(D_k^{\mathcal{G}} \mid \mathcal{F}_{t_i}),$

given that the index $k$ corresponds to the calculation of $X_j(t_i)$, ie, $k = k(i,j)$ (and given that $X_j(t_i)$ is an $\mathcal{F}_{t_i}$-measurable random variable).

This result is important, because it allows us to use the modified backward automatic differentiation presented in Fries (2017b) for forward sensitivities too.
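For orientation, the modified backward sweep of Theorem 2.1 can be sketched on a pathwise operator tape as follows (an illustration of the algorithm only; the interfaces are assumptions and do not correspond to the finmath-lib classes):

    import java.util.Arrays;
    import java.util.List;

    public final class ExpectedBackwardSweep {

        public interface Node {
            int[] argumentIndices();                  // i_1(m), ..., i_{k(m)}(m)
            boolean isInOperatorFamily();             // is f_m an element of the family G?
            double[] partialDerivative(int argument); // pathwise df_m/dx_{i_j(m)}
            double[] applyOperator(double[] adjoint); // G(D_m), eg a (conditional) expectation
        }

        /** Returns the adjoints D_m, with E(dy/dx_m) = E(D_m) for y the last node. */
        public static double[][] sweep(List<Node> tape, int numPaths) {
            double[][] d = new double[tape.size()][numPaths];
            Arrays.fill(d[tape.size() - 1], 1.0);          // D_N = 1, all other D_m = 0
            for (int m = tape.size() - 1; m >= 0; m--) {   // backward through the tape
                Node node = tape.get(m);
                int[] arguments = node.argumentIndices();
                for (int j = 0; j < arguments.length; j++) {
                    if (node.isInOperatorFamily()) {
                        // modified rule: propagate G(D_m) instead of differentiating G
                        double[] g = node.applyOperator(d[m]);
                        for (int p = 0; p < numPaths; p++) d[arguments[j]][p] += g[p];
                    } else {
                        // classical pathwise chain rule: D += D_m * df_m/dx
                        double[] df = node.partialDerivative(j);
                        for (int p = 0; p < numPaths; p++) d[arguments[j]][p] += d[m][p] * df[p];
                    }
                }
            }
            return d;
        }
    }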

The result also holds in more general cases, where one does not have a decomposition of the product into a sum of discounted cashflows. In fact, the method also works for Bermudan options, where the random variable $Z$ is constructed by a backward algorithm. We will investigate this case later, after we have summarized the result as a theorem.

2.2 Main theorem

Theorem 2.2.

(Forward sensitivities via expected stochastic backward automatic differentiation)   Let the time-$t$ value of a financial derivative be given by

  $V(t) = N(t)\, \mathrm{E}\Bigl( \sum_{t_k > t} C(t_k) \Bigm| \mathcal{F}_t \Bigr),$   (2.5)

where the $C(t_k)$ do not depend on $X_j(t_i)$ for $t_i > t_k$. Let the random variable $Z$ be given by

  $Z := \sum_{k=0}^{n} C(t_k).$

Assume that $Z$ is constructed from the model quantities $X_j(t_i)$ by an algorithm given by operators $f_m$ and intermediate results $x_m$, defined in (2.1), with $X_j(t_i) = x_k$ for $k = k(i,j)$ and $Z = y = x_N$. Let the $f_m$ fulfill the assumptions of Theorem 2.1 with $\mathrm{E} = \mathrm{E}(\,\cdot \mid \mathcal{F}_{t_i})$; then, we have

  $\dfrac{\partial V(t_i)}{\partial X_j(t_i)} = N(t_i)\, \mathrm{E}(D_k^{\mathcal{G}} \mid \mathcal{F}_{t_i}) + \dfrac{\partial \log(N(t_i))}{\partial X_j(t_i)}\, V(t_i),$

where the $D_k^{\mathcal{G}}$ are constructed using the modified backward automatic differentiation algorithm from Theorem 2.1.

Remark 2.3.

Alternatively, we may consider the process

  $\tilde{V}(t) = V(t) + \sum_{t_k \le t} C(t_k)\, N(t) = N(t)\, \mathrm{E}(Z \mid \mathcal{F}_t).$

The difference between $\tilde{V}$ and $V$ is that $\tilde{V}$ contains the past cashflows accrued by the numéraire $N$ (such that $\tilde{V}/N$ is a martingale).

2.2.1 Proof of the main theorem

Proof.

Let $V^* = N(t) Z$ denote the random variable of aggregated (discounted) future cashflows such that the time-$t$ value is

  $V(t) = \mathrm{E}(V^* \mid \mathcal{F}_t).$   (2.6)

Let $X$ denote an $\mathcal{F}_t$-measurable state variable. This implies that we can interpret $X(\omega)$ as an initial value of the (nested Monte Carlo) simulation on the (future) paths $\omega_i \in \sigma(t, \omega)$, where $\sigma(t, \omega)$ is the smallest set $A \in \mathcal{F}_t$ such that $\omega \in A$. Recall that $\mathcal{F}_t$-measurability is equivalent to requiring that $X$ is constant on the set $\sigma(t, \omega)$ for any fixed $\omega$.

The definition of $V^*$ as aggregated future cashflows implies that $V^*(\omega_i)$ does not depend on $X(\omega_j)$ for $\omega_i \notin \sigma(t, \omega_j)$. (This property is best visualized in a nested Monte Carlo simulation, although we will later approximate conditional expectations in non-nested simulations. Note that we first derive the exact result and then apply approximation methods to it.) That is, we have

  $\dfrac{\partial V^*(\omega_i)}{\partial X(\omega_j)} = 0 \quad \text{for } \omega_i \notin \sigma(t, \omega_j).$   (2.7)

We are interested in the derivative of the conditional expectation of the future cashflows of our derivatives, that is,

  $V(t) = \mathrm{E}(V^* \mid \mathcal{F}_t).$   (2.8)

For a given $\omega$, let $A = \sigma(t, \omega)$. So, on path $\omega$, we are interested in calculating the derivative of

  $V_A(t) = \mathrm{E}(V^* \mid A).$   (2.9)

At this point, we might just apply the backward differentiation to each conditioning node $A$. However, this would require multiple backward differentiation sweeps, namely initializing the differentials with the indicator functions $\mathbf{1}_A$.

However, due to (2.7) we can consider an unconditional expectation of $V^*$, given that we differentiate with respect to the direction $\mathbf{1}_A$.

Since $X$ is $\mathcal{F}_t$-measurable, it has to stay constant on the sets $\sigma(t, \omega)$. Hence,

  $\dfrac{\partial V(t)}{\partial X(t)}(\omega) = \mathrm{E}\Bigl( \dfrac{\partial V(\omega)}{\partial X(\omega)} \Bigm| \sigma(t, \omega) \Bigr)$

and

  $\dfrac{\partial V(\omega)}{\partial X(\omega)} = \mathrm{E}\Bigl( \dfrac{\partial V^*(\omega)}{\partial X(\omega)} \Bigm| \sigma(t, \omega) \Bigr).$

Due to (2.7), we may take the expectation over all paths instead of just $\sigma(t, \omega)$ (because the other paths do not depend on the value of $X(\omega)$), that is, we can replace the inner conditional expectation with an unconditional one:

  $\dfrac{\partial V(t)}{\partial X(t)}(\omega) = \mathrm{E}\Bigl( \dfrac{\partial\, \mathrm{E}(V^*)}{\partial X(\omega)} \Bigm| \sigma(t, \omega) \Bigr).$

This puts us in a position to calculate the pathwise derivative of the unconditional expectation of $V^*$, followed by a conditional expectation. The result has a simple interpretation: the conditional expectation can be replaced by an unconditional one, since the paths in the complement of $A = \sigma(t, \omega)$ have no dependency on the initial value on $A$. This is intuitive if we depict the simulation as a nested Monte Carlo simulation.

This proves the result, since the $D_k^{\mathcal{G}}$ are the backward differentiations of the unconditional expectation of $V^*$. ∎

2.2.2 Interpretation

The result can be easily illustrated using matrix notation. Given the pathwise derivative matrix $M = (\partial V^*(\omega_i)/\partial X(\omega_j))$, we apply the projection operator $P = F F^{\mathsf{T}}$ (corresponding to the conditional expectation) and have

  $\dfrac{\partial V(t)}{\partial X}(\omega) = P M P = F F^{\mathsf{T}} M F F^{\mathsf{T}}.$

Here, the projection on the left-hand side is due to $V$ being a conditional expectation, and the projection on the right-hand side is due to considering directional derivatives with respect to an $\mathcal{F}_t$-measurable random variable. The matrix $F$ consists of columns representing the renormalized indicator functions $\mathbf{1}_{A_i}$ of the sets $A_i \in \{\sigma(t, \omega) : \omega \in \Omega\}$ generating the filtration $\mathcal{F}_t$. The matrix $F F^{\mathsf{T}}$ is block diagonal with blocks of renormalized unit vectors. The matrix $F^{\mathsf{T}} F$ is an $m \times m$ identity matrix (where $m$ is the number of sets $A_i$).
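For concreteness, a minimal example (our own illustration): for $n = 4$ paths with $\mathcal{F}_t$ generated by $A_1 = \{\omega_1, \omega_2\}$ and $A_2 = \{\omega_3, \omega_4\}$, we have

  $F = \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad F^{\mathsf{T}} F = I_2, \qquad F F^{\mathsf{T}} = \dfrac{1}{2} \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix},$

so $F F^{\mathsf{T}}$ averages within each set $A_i$, ie, it realizes $\mathrm{E}(\,\cdot \mid \mathcal{F}_t)$, and (2.7) forces $M$ to vanish outside the two diagonal $2 \times 2$ blocks.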

Now, (2.7) implies that $M$ is a block-diagonal matrix, where each block corresponds to a column $f_i$ of $F$, that is, $f_i^{\mathsf{T}} M f_j = 0$ for $i \ne j$. In other words, $F^{\mathsf{T}} M F$ is a diagonal matrix. Using the vector $e = (1/\sqrt{n}) \sum_i \sqrt{|A_i|}\, f_i = (1/\sqrt{n})(1, \ldots, 1)^{\mathsf{T}}$ (the projection vector corresponding to the unconditional expectation), this implies

  $f_i^{\mathsf{T}} M f_i = e^{\mathsf{T}} M f_i,$

that is,

  $f_i^{\mathsf{T}} M F F^{\mathsf{T}} = e^{\mathsf{T}} M F F^{\mathsf{T}},$   (2.10)

that is,

  $F F^{\mathsf{T}} M F F^{\mathsf{T}} = e e^{\mathsf{T}} M F F^{\mathsf{T}}.$   (2.11)

This last equation is the backward differentiation (vector multiplication from left to right) of an unconditional expectation ($e^{\mathsf{T}}$) carried out as a pathwise backward differentiation ($M$) followed by a conditional expectation on the row vector of results ($F F^{\mathsf{T}}$).

These results are exact for a fully nested Monte Carlo simulation. If we replace the conditional expectation operators by some approximation (eg, regression methods, where the matrix of regression functions does not generate $\mathcal{F}_t$), the different approaches may give different approximations of the derivative. The reason for this is simple: the differentiation of an approximation is not the same as the approximation of a differentiation.

Such problems are not specific to our application. A similar example would be comparing the calculation of a sensitivity via pathwise differentiation to that calculation via likelihood ratio (the dual method) and applying a Monte Carlo approximation to both: the approximation errors are different.

2.3 Forward sensitivity via forward differentiation

We may also derive the corresponding result for forward differentiation. Using (2.7), we have

  $f_i^{\mathsf{T}} M f_i = f_i^{\mathsf{T}} M e,$

that is, in correspondence with (2.10), we have

  $F F^{\mathsf{T}} M f_i = F F^{\mathsf{T}} M e.$   (2.12)

This means that we can calculate the forward sensitivity $(\partial V(t)/\partial X(t))(\omega)$ by “bumping” the input $X$ simultaneously on all paths, followed by a single conditional expectation on the output values $V^*$. For Markovian processes $V$, this result can be improved further (see Fries 2018).

Note, however, that in many applications forward mode differentiation may be less efficient, since we may be interested in the dependency of a single $V$ on multiple $\mathcal{F}_t$-measurable random variables $X$.
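A sketch of this forward-mode variant (our own illustration; the pathwise valuation map, the bump size and the regression order are assumptions):

    import java.util.function.UnaryOperator;

    public final class ForwardModeSensitivity {

        /**
         * Per (2.12): bump the F_t-measurable input X by h on ALL paths
         * simultaneously, revalue once, and apply a single conditional
         * expectation to the pathwise difference quotient.
         */
        public static double[] forwardSensitivity(UnaryOperator<double[]> pathwiseValuation,
                double[] x, double[] regressorAtT, double h) {
            double[] xBumped = x.clone();
            for (int p = 0; p < xBumped.length; p++) xBumped[p] += h;
            double[] valueBase = pathwiseValuation.apply(x);        // pathwise V*
            double[] valueBump = pathwiseValuation.apply(xBumped);  // pathwise V* after the bump
            double[] quotient = new double[x.length];
            for (int p = 0; p < x.length; p++)
                quotient[p] = (valueBump[p] - valueBase[p]) / h;
            // single conditional expectation E( . | F_t ) of the difference quotient
            return RegressionConditionalExpectation.estimate(regressorAtT, quotient, 2);
        }
    }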

2.4 Bermudan options

The theorem also applies to Bermudan options or other products incorporating additional conditional expectation operators. For a Bermudan option, the valuation $V(0)$ can be represented by a random variable $U$ such that $V(0) = N(0)\, \mathrm{E}(U)$, where $U$ is constructed using the backward algorithm (see Fries 2007, 2017a), ie, the Snell envelope.

In this case, the process

  $V^*(t) = N(t)\, \mathrm{E}(U \mid \mathcal{F}_t)$

has a natural interpretation: it represents the value of the Bermudan option including the accrual of past cashflows and/or the accrual of the value at a past exercise. In other words, if $\tau$ is the exercise time of a Bermudan option, then for $\tau < t$ we have $V^*(t) = V^*(\tau)\, N(t)/N(\tau)$.

Note that the sensitivity of $V^*$ is exactly the quantity required in an xVA (margin valuation adjustment (MVA)) simulation, because the value includes the information of a past exercise.

2.4.1 Conditional expectation estimator: choice of basis functions

Assuming we use a least-squares regression for the estimation of the conditional expectation $\mathrm{E}(\,\cdot \mid \mathcal{F}_t)$, there is a pitfall in the naive application of the theorem to a Bermudan option: a poor choice of regression basis functions.

To illustrate this choice-of-basis-functions issue, consider the example of a Bermudan option where we have the choice to receive $K$ in $T_1$ or $S(T_2)$ in $T_2$. If $\tau$ denotes the optimal exercise time, we have that, conditional on $\tau > T_1$, the product has a delta of 1. Indeed, we would find

  $\dfrac{\partial V(t, \omega)}{\partial S(t, \omega)} = \begin{cases} 1 & \text{for } \tau = T_2, \\ 0 & \text{for } \tau = T_1. \end{cases}$

Now, consider that we use a regression with basis functions that are functions of $S(t)$ (this is a common approach). Assume, for simplicity, that our basis function is just the constant 1: in this case, the conditional expectation estimator is the unconditional expectation, and we get $\mathrm{E}(\partial V(t, \omega)/\partial S(t, \omega)) = P(\{\tau = T_2\})$ instead of $\mathrm{E}(\partial V(t, \omega)/\partial S(t, \omega) \mid \mathcal{F}_t) = 1$ (on the paths with $\tau = T_2$). Further, a function of $S(t)$ with $t > T_1$ cannot precisely capture the $T_1$ exercise boundary, because $S(t) - S(T_1)$ is random.

From this example, it is obvious that the conditional expectation estimator can be improved in a simple way by including the information $\tau > T_1$ in the basis functions; that is, our basis functions are multiplied by the indicator $\mathbf{1}_{\tau > T_1}$. We do not include the paths with $\tau = T_1$ in the regression, because the delta is known for these paths. Indeed, with this modification even the trivial basis function $1 \times \mathbf{1}_{\tau > T_1}$ results in the correct estimate.

We illustrate the effect for the Bermudan option in Figure 7.
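A sketch of this improved estimator (our own illustration, for the two-exercise example above, reusing the regression estimator sketched in Section 1; the method name and the regression order are assumptions):

    // Regress only on the not-yet-exercised paths, ie, use basis functions
    // phi_k(S(t)) * 1_{tau > T1}; on the exercised paths the delta is known
    // (here: 0, since the payoff K is constant).
    public static double[] deltaEstimate(double[] s, double[] pathwiseDelta, boolean[] notExercised) {
        int numPaths = s.length, numLive = 0;
        for (boolean live : notExercised) if (live) numLive++;
        double[] sLive = new double[numLive], dLive = new double[numLive];
        for (int p = 0, q = 0; p < numPaths; p++)
            if (notExercised[p]) { sLive[q] = s[p]; dLive[q] = pathwiseDelta[p]; q++; }
        double[] conditionalLive = RegressionConditionalExpectation.estimate(sLive, dLive, 2);
        double[] conditional = new double[numPaths];   // exercised paths keep the known delta 0
        for (int p = 0, q = 0; p < numPaths; p++)
            conditional[p] = notExercised[p] ? conditionalLive[q++] : 0.0;
        return conditional;
    }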

2.5 Discontinuous payoffs, differentiation of indicator functions

It is known that in a Monte Carlo simulation the presence of discontinuous payoffs will lead to high Monte Carlo errors for finite difference approximations of derivatives (Glasserman 2003). The expected stochastic automatic differentiation allows us to improve the derivative of discontinuities (see Fries 2017a).

Let us quickly mention that this representation can be used for the forward sensitivities without change. If we are only interested in the conditional expectation of the final result, it is sufficient to consider

  $\mathrm{E}\Bigl( A\, \dfrac{\partial \mathbf{1}(X > 0)}{\partial X} \Bigm| \mathcal{F}_t \Bigr),$

which evaluates to a conditional expectation of the adjoint $A$ concentrated on the set $\{X = 0\}$ and which can be approximated by

  $\mathrm{E}\Bigl( A\, \dfrac{1}{2\delta}\, \mathbf{1}(|X| < \delta) \Bigm| \mathcal{F}_t \Bigr).$   (2.13)

That is, the differentiation of the indicator can be represented as a conditional expectation of the adjoint derivative A. Since our algorithm allows us to handle conditional expectation operators, this result opens up new ways to approximate the differentiation of the indicator function. For the above approximation, we find that, if we are only interested in the conditional expectation of the final result, we can approximate

  $\dfrac{\partial \mathbf{1}(X > 0)}{\partial X} \approx \dfrac{1}{2\delta}\, \mathbf{1}(|X| < \delta).$

This is the same approximation as for the time-zero sensitivities, so no special treatment is required for forward sensitivities.

It is important to note that our implementation allows us to adapt the handling of the differentiation individually for each indicator function (on a per-operator basis); see Fries (2017a) for an example.
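As an illustration, the per-operator contribution of (2.13), before the conditional expectation is taken, may be sketched as follows (the width $\delta$ being a per-operator choice; illustrative code, not the finmath-lib API):

    // Regularized derivative of the indicator 1(X > 0): the adjoint A picks up
    // the factor (1/(2 delta)) * 1_{|X| < delta} pathwise, cf (2.13).
    public static double[] indicatorAdjoint(double[] adjointA, double[] x, double delta) {
        double[] contribution = new double[x.length];
        for (int p = 0; p < x.length; p++)
            contribution[p] = (Math.abs(x[p]) < delta) ? adjointA[p] / (2.0 * delta) : 0.0;
        return contribution;
    }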

2.5.1 Indicator functions at exercise boundary

Indicator functions also occur in the representation of the exercise boundary of a Bermudan option, for example. In the case of an optimal exercise, the first-order derivative contribution of the exercise indicator is known to be zero (Piterbarg 2004). Hence, it can – theoretically – be neglected. That said, numerical errors, eg, an improper estimation of the exercise boundary, may lead to a positive contribution from the differentiation of the exercise boundary.

Since our method allows us to assess this effect, it may be used to check the optimality of the exercise boundary (see Fries 2017a).

The implementation allows us to enable or disable differentiation of the indicator function on an individual (per-operator) basis, eg, allowing us to avoid differentiating an optimal exercise.

3 Numerical results

A typical application of stochastic forward sensitivities is the exact calculation of an initial MVA, assuming that the initial margin is determined from a sensitivity-based risk model. The ISDA Standard Initial Margin Model (ISDA SIMM) is an example of such a model.

However, presenting results for an MVA calculation (which is straightforward now) is not a good test case, since we lack a benchmark.

Instead, we analyze the hedge error of a delta hedge in a hedge simulation under a model for which we know the analytic solution.

3.1 Hedge performance of a delta hedge (using stochastic AAD forward sensitivities)

We consider the valuation of a derivative $V$, eg, a European option with $V(T) = \max(S(T) - K, 0)$, given the model SDE

  $\mathrm{d}S(t) = r\, S(t)\, \mathrm{d}t + \sigma(t)\, S(t)\, \mathrm{d}W(t), \qquad S(0) = S_0,$
  $\mathrm{d}N(t) = r\, N(t)\, \mathrm{d}t, \qquad N(0) = N_0,$

for the asset $S$ and the bank account $N$.

Under this model, using a time discretization $0 = t_0 < t_1 < \cdots < t_n = T$, we consider the delta hedge portfolio $\Pi(t_i)$ given by

  $\Pi(t_i) = \phi_1(t_i)\, S(t_i) + \phi_0(t_i)\, N(t_i),$

where

  $\phi_1(t_i) := \dfrac{\partial V(t_i)}{\partial S(t_i)}$   (delta),
  $\phi_0(t_i) := \phi_0(t_{i-1}) - \dfrac{(\phi_1(t_i) - \phi_1(t_{i-1}))\, S(t_i)}{N(t_i)}$ for $i > 0$   (self-financing condition),
  $\phi_0(t_0) := \dfrac{V(t_0) - \phi_1(t_0)\, S(t_0)}{N(t_0)}$   (initial value).

Note that the initial value implies $\Pi(t_0) = V(t_0)$.
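A sketch of the resulting hedge simulation loop (our own illustration; the benchmark implementation is the class DeltaHedgedPortfolioWithAAD referenced in Section 3.4):

    // Time-discrete, self-financing delta hedge: s[i][p], n[i][p], delta[i][p]
    // are the pathwise values of S(t_i), N(t_i) and dV(t_i)/dS(t_i); v0 = V(t_0).
    public static double[] replicationPortfolio(double[][] s, double[][] n, double[][] delta, double v0) {
        int numTimes = s.length, numPaths = s[0].length;
        double[] phi1 = delta[0].clone();                  // position in the asset S
        double[] phi0 = new double[numPaths];              // position in the bank account N
        for (int p = 0; p < numPaths; p++)
            phi0[p] = (v0 - phi1[p] * s[0][p]) / n[0][p];  // such that Pi(t_0) = V(t_0)
        for (int i = 1; i < numTimes; i++) {
            for (int p = 0; p < numPaths; p++) {
                double phi1New = delta[i][p];
                // self-financing rebalancing: the change of the asset position
                // is financed through the bank account
                phi0[p] -= (phi1New - phi1[p]) * s[i][p] / n[i][p];
                phi1[p] = phi1New;
            }
        }
        double[] portfolio = new double[numPaths];         // Pi(T) pathwise
        for (int p = 0; p < numPaths; p++)
            portfolio[p] = phi1[p] * s[numTimes - 1][p] + phi0[p] * n[numTimes - 1][p];
        return portfolio;
    }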

The time-discrete delta hedge results in a replication portfolio with

  $\Pi(T) \approx V(T).$

The hedge error $V(T) - \Pi(T)$ depends on two aspects:

  • the frequency of the hedge, ie, the time-step size $t_i - t_{i-1}$; and

  • the accuracy of the calculation of the sensitivities $\partial V(t_i)/\partial S(t_i)$.

We perform a time-discrete delta hedge on 50 000 paths with 100 time steps. Note that $\partial V(t_i)/\partial S(t_i)$ is a random variable. Hence, the simulation requires the calculation of 5 million forward sensitivities. In this case, the numerical calculation of the forward sensitivities using stochastic automatic differentiation required 10 seconds on a standard MacBook Pro (Mid 2014, 2.8 GHz Core i7 (I7-4980HQ)). The implementation was performed in Java. The source code is available at finmath.net (n.d.-b).

3.1.1 Delta hedge of a European option

Under the given model, we have analytic expressions for the forward sensitivities $\partial V(t_i)/\partial S(t_i)$ of European options. Hence, we can benchmark the hedge simulation using sensitivities obtained by our numerical method against the hedge simulation using analytic sensitivities. Note that both methods will show a residual error due to the time discretization.

We test the method using a European option with maturity $T = 5.0$ and strike $K = S_0 \exp(rT)$.

The results are presented in Figures 1–3.

In Figure 1, we show the final time-T value of the option payoff V(T) and the replication portfolio Π(T). The delta hedge reproduces the final payoff with only small errors.

In Figure 2, we show the error distribution using the analytic formula for delta (blue) and the numerical (AAD) calculation for delta (green). From this, we see that the residual errors correspond to the error expected from the time discretization. We also depict the result if we omit the conditional expectation step in the calculation of forward sensitivity (red). We obtain wrong sensitivities and the hedge has a huge error.

Figure 1: Final payoff $V(T) = \max(S(T) - K, 0)$ and the replication portfolio $\Pi(T)$ as a function of the underlying value. The plot shows the points $(S(T, \omega), V(T, \omega))$ (green) and $(S(T, \omega), \Pi(T, \omega))$ (blue) for a set of sampling paths $\omega = \omega_1, \omega_2, \ldots$. The replication portfolio is determined using a delta hedge, where delta is calculated using expected stochastic automatic backward differentiation.
Figure 2: Distribution of the hedge error. We show the distribution of the hedge error V(T)-Π(T) for the delta hedge of a European option using analytic deltas (blue) as well as numerical deltas calculated using expected stochastic automatic differentiation (green). We also depict the wrong result we get if we omit the conditional expectation step in the calculation of forward sensitivity (red).

In Figure 3, we show the error distribution using the analytic formula for delta and the numerical (AAD) calculation for delta with different numbers of paths. For the analytic method the number of paths is irrelevant. For the numerical method the number of paths enters into the accuracy of the estimation of the conditional expectation operator.

Figure 3: Distribution of the hedge error. We show the distribution of the hedge error V(T)-Π(T) for the delta hedge of a European option using analytic deltas (blue) as well as numerical deltas calculated using expected stochastic automatic differentiation (green, orange). The delta hedges using numerical estimates of the deltas are shown for different numbers of paths.

3.1.2 Delta hedge of a Bermudan option

We test the delta hedge of a Bermudan option. As we do not have analytic expressions for the forward sensitivities $\partial V(t_i)/\partial S(t_i)$, we are limited to performing the replication using forward sensitivities obtained from the expected stochastic automatic differentiation. We can then analyze the terminal hedge error of the replication portfolio and the (accrued) derivative payoffs, eg, comparing them to the hedge error distribution of the European option analyzed in the previous section.

The exact specification of the test product is given in Table 1.

Table 1: Specification of the Bermudan option test product ($K = S_0 \exp(rT)$, $T = 5.0$).

  Exercise time     Payoff upon exercise   Strike          Exercise probability
  τ = T1 = 2.0      S(T1) − K1             K1 = 0.7 K      P(τ = T1) = 0.16
  τ = T2 = 3.0      S(T2) − K2             K2 = 0.75 K     P(τ = T2) = 0.20
  τ = T3 = 4.0      S(T3) − K3             K3 = 0.8 K      P(τ = T3) = 0.07
  τ = T4 = 5.0      S(T4) − K4             K4 = K          P(τ = T4) = 0.06
  τ > T4 = 5.0      0                      —               P(τ > T4) = 0.51

The results are presented in Figures 4–6.

In Figure 4, we show the final time-$T$ value of the option payoff $V(T)$ and the replication portfolio $\Pi(T)$ as a function of the underlying $S(\tau)$ at the exercise time $\tau$. (This is a nice illustration of the payoff of the Bermudan option, because it somewhat overlays the payoffs at the different exercise times. Note that the difference between $V(\tau)$ and $V(T)$ is just a deterministic accrual factor.) Apparently, the delta hedge reproduces the final payoff with only small errors.

Figure 4: Final payoff $V(T)$ and the replication portfolio $\Pi(T)$ as a function of the underlying value $S(\tau)$ at the exercise time $\tau$. The plot shows the points $(S(\tau(\omega), \omega), V(T, \omega))$ (green) and $(S(\tau(\omega), \omega), \Pi(T, \omega))$ (blue) for a set of sampling paths $\omega = \omega_1, \omega_2, \ldots$. The replication portfolio is determined using a delta hedge, where delta is calculated using expected stochastic automatic backward differentiation.
Figure 5: Distribution of the hedge error. We show the distribution of the hedge error V(T)-Π(T) for the delta hedge of a European option (blue) and the delta hedge of the Bermudan option (green), both calculated using the expected stochastic automatic differentiation. We also depict the wrong result we get if we omit the conditional expectation step in the calculation of forward sensitivity (red).
Figure 6: Distribution of the hedge error. We show the distribution of the hedge error V(T)-Π(T) for the delta hedge of a European option (blue) and the delta hedge of the Bermudan option (green, orange), both calculated using expected stochastic automatic differentiation. For the European option we use 50 000 paths; for the Bermudan option we use different numbers of paths.

In Figure 5, we show the error distribution of the hedge error using the numerical (AAD) calculation for delta. We compare the hedge error of the European option (blue) with that of the Bermudan option (green). From this, we see that the residual errors correspond to the error expected from the European option. In this figure, we also depict the result if we omit the conditional expectation step in the calculation of forward sensitivity (red). We obtain wrong sensitivities and the hedge has a huge error. We see that the hedge error for the Bermudan option is slightly smaller than that of the longest-maturity European option. This is due to the shorter-maturity options embedded in the Bermudan option having smaller hedge errors.

In Figure 6, we show the error distribution of the delta hedge of the Bermudan option using the numerical (AAD) calculation for delta with different numbers of paths. We also show the error distribution for the corresponding European option.

3.2 Choice of basis functions

In Figure 7, we compare the final derivative value $V(T)$ (including the accrual of past cashflows) and the corresponding replication portfolio $\Pi(T)$, varying the basis functions used in the conditional expectation estimation of the stochastic derivative.

We see that, for a Bermudan option, including the information of past exercises (Figure 7(b)) gives good results, whereas not including this information gives very poor results (Figure 7(a)).

Figure 7: Final payoff $V(T)$ and the replication portfolio $\Pi(T)$ as a function of the underlying value $S(\tau)$ at exercise time $\tau$. (a) Poor choice of basis functions. (b) Smarter choice of basis functions. The plot shows the points $(S(\tau(\omega), \omega), V(T, \omega))$ (green) and $(S(\tau(\omega), \omega), \Pi(T, \omega))$ (blue) for a set of sampling paths $\omega = \omega_1, \omega_2, \ldots$. The replication portfolio is determined using a delta hedge, where delta is calculated using expected stochastic automatic backward differentiation. In part (a), the basis functions for the conditional expectation $\mathrm{E}(\,\cdot \mid \mathcal{F}_t)$ are functions of $S(t)$. In part (b), the basis functions for the conditional expectation $\mathrm{E}(\,\cdot \mid \mathcal{F}_t)$ are functions of $S(t)\, \mathbf{1}_{\tau > t}$.

3.3 Performance results

We now summarize some results from the performance of the algorithm. The algorithm was implemented in Java (Java 8 update 121), using finmath.net (n.d.-a,b) running on a MacBook Pro (Mid 2014, 2.8 GHz Core i7 (I7-4980HQ)). The results are summarized in Table 2.

Table 2: Some performance numbers. (EU denotes European option; BER denotes Bermudan option. RMSE is root mean square error.)

                          Analytic           Stochastic AAD
  Product                 EU      EU      EU      EU       EU      BER     BER
  No. of paths            50 000  50 000  50 000  100 000  50 000  50 000  100 000
  No. of time steps       100     200     100     100      200     100     100
  Valuation               —       —       0.30s   0.59s    0.59s   0.46s   0.93s
  (model simulation)
  Stoch. derivatives      —       —       0.08s   0.15s    0.17s   0.16s   0.33s
  (5 or 10 million)
  Derivatives             —       —       11s     21s      23s     14s     31s
  (cond. expectation)
  Total                   3s      5s      12s     22s      24s     15s     32s
  (calculation time)
  Accuracy                0.029   0.020   0.034   0.033    0.028   0.031   0.025
  (hedge RMSE)

The performance results are interesting with respect to their scaling properties. Apparently, the calculation of the 5 million or 10 million stochastic sensitivities is comparable to a single valuation. Here, the valuation also includes the model simulation and the building of the operator tree. The major part of the calculation time is used up by the conditional expectation step. Note, however, that the conditional expectation step – at least for noncallable products – does not scale with the number of derivative products in a portfolio, because the conditional expectation estimator is constructed from a model-dependent singular value decomposition, which may be shared among products or applied to an aggregate (as long as the basis functions are the same). This implies that, at least for some products, one may reuse the conditional expectation estimator.

3.4 Benchmark implementation

The results presented in this section were produced with version 0.7.0 of finmath.net (n.d.-a). The delta hedge is implemented in the package

  net.finmath.montecarlo.assetderivativevaluation.products  

in the class

  DeltaHedgedPortfolioWithAAD  

To reproduce the results of this section, run the unit test

  DeltaHedgedPortfolioWithAADTest  

(in the same package). More results can be found at finmath.net (n.d.-b).

3.5 Calculation of an exact MVA based on ISDA SIMM

The ISDA SIMM requires forward sensitivities to calculate an initial margin. Using the forward sensitivities derived from our AAD algorithm, we get the ISDA SIMM initial margin by transforming from model sensitivities to SIMM sensitivities. This is just an additional step in the chain rule. For details on this additional step and additional performance improvements, see Fries et al (2018). We summarize some results taken from this paper.

The first step toward an MVA is to simulate the stochastic process of the forward initial margin, ie, the initial margin IM(t,ω) at the future time t on path ω. The ISDA SIMM model gives this value in terms of on-path sensitivities, ie, forward sensitivities. Hence, it is an application of the algorithm presented in the previous sections. Given IM(t,ω), the MVA is defined as the funding costs of the initial margin, which is given by aggregating IM(t,ω) and taking the expectation

  $\mathrm{MVA}(t_0) = \mathrm{IM}(t_0) + N^{\mathrm{fd}}(t_0)\, \mathrm{E}\Bigl( \int_{t_0}^{\infty} \dfrac{1}{N^{\mathrm{fd}}(t)}\, \mathrm{d}\mathrm{IM}(t) \Bigm| \mathcal{F}_{t_0} \Bigr),$

where $N^{\mathrm{fd}}$ is the funding numéraire. (The intuition behind this formula is simple: consider a constant initial margin $\mathrm{IM}(t) = \mathrm{IM}(0)$ for $t \le T$, dropping to zero after maturity $T$, ie, $\mathrm{IM}(t) = 0$ for $t > T$. Then $\mathrm{MVA}(t_0) = \mathrm{IM}(t_0) + N^{\mathrm{fd}}(t_0)\, \mathrm{E}\bigl( 1/N^{\mathrm{fd}}(T) \bigm| \mathcal{F}_{t_0} \bigr)\, (0 - \mathrm{IM}(t_0)) = \mathrm{IM}(t_0)\, (1 - P^{\mathrm{fd}}(T; t_0))$, where $P^{\mathrm{fd}}(T; t_0)$ is the (funding) zero-coupon bond with maturity $T$. That is, we borrow the initial margin at the funding rate.)
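In a time-discrete simulation, a natural approximation of this integral is (our own sketch; the evaluation of the discount factor at $t_{i+1}$ is a convention):

  $\mathrm{MVA}(t_0) \approx \mathrm{IM}(t_0) + N^{\mathrm{fd}}(t_0) \sum_{i} \mathrm{E}\Bigl( \dfrac{\mathrm{IM}(t_{i+1}) - \mathrm{IM}(t_i)}{N^{\mathrm{fd}}(t_{i+1})} \Bigm| \mathcal{F}_{t_0} \Bigr),$

where the $\mathrm{IM}(t_i, \omega)$ are the pathwise forward initial margins obtained from the forward sensitivities.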

Figure 8: Forward initial margin of a Bermudan callable using a Libor market model for the simulation, stochastic AAD for the forward sensitivities and ISDA SIMM for the initial margin. At call dates, the Bermudan exercises into a swap, and on those paths the initial margin jumps to the classic initial margin of a swap. We depict a small selection of the sample paths in blue, with the average in red and the standard deviation in gray.

In Figure 8, we depict the paths of $t \mapsto \mathrm{IM}(t, \omega)$ for selected values of $\omega$ (blue). For information purposes, we also depict the expected forward initial margin $\mathrm{E}(\mathrm{IM}(t, \omega))$ (red) as well as the 5% to 95% quantiles of $\mathrm{IM}(t)$ (gray). Note, however, that for the calculation of the MVA, the expectation and the integration do not commute.

4 Conclusion

In this paper, we presented the calculation of stochastic forward sensitivities using stochastic (backward) automatic differentiation.

Utilizing the common representation of the derivative value as a sum of numéraire-relative future payoffs, we represented all forward sensitivities by a single backward automatic differentiation, applying only a time-dependent conditional expectation operator.

Due to the presence of the conditional expectation operator, we were able to utilize the expected stochastic (backward) automatic differentiation from Fries (2017b) such that we derived forward sensitivities for complex derivatives, where the valuation algorithm included conditional expectation operators (eg, callable products; see Fries (2017a)). Thus, the method is completely general and can be applied to options with early exercise features and path-dependency without any modification.

An important application of this result is the fast and efficient calculation of an MVA, eg, when initial margins are based on sensitivities (like for the ISDA SIMM).

Declaration of interest

The views expressed in this work are the personal views of the author and do not necessarily reflect the views or policies of current or previous employers. Feedback is welcomed at email@christian-fries.de.

References

  • Antonov, A. (2017). Algorithmic differentiation for callable exotics. Working Paper, April 4, Social Science Research Network (https://doi.org/10.2139/ssrn.2839362).
  • Capriotti, L., and Giles, M. (2011). Algorithmic differentiation: adjoint Greeks made easy. Working Paper, April 2, Social Science Research Network (https://doi.org/10.2139/ssrn.1801522).
  • Capriotti, L., Jiang, Y., and Macrina, A. (2016). AAD and least squares Monte Carlo: fast Bermudan-style options and XVA Greeks. Working Paper, September 23, Social Science Research Network (https://doi.org/10.2139/ssrn.2842631).
  • finmath.net (n.d.-a). finmath-lib automatic differentiation extensions: enabling finmath lib to utilise automatic differentiation algorithms (eg, AAD). URL: http://finmath.net/finmath-lib-automaticdifferentiationextensions.
  • finmath.net (n.d.-b). finmath-lib: mathematical finance library – algorithms and methodologies related to mathematical finance. URLs: http://finmath.net/finmath-lib, https://github.com/finmath/finmath-lib.
  • Fries, C. P. (2007). Mathematical Finance: Theory, Modeling, Implementation. Wiley (https://doi.org/10.1002/9780470179789).
  • Fries, C. P. (2017a). Automatic backward differentiation for American Monte Carlo algorithms (conditional expectation). Working Paper, June 27, Social Science Research Network (https://doi.org/10.2139/ssrn.3000822).
  • Fries, C. P. (2017b). Stochastic automatic differentiation: automatic differentiation for Monte Carlo simulations. Working Paper, June 27, Social Science Research Network (https://doi.org/10.2139/ssrn.2995695).
  • Fries, C. P. (2018). Back to the future: comparing forward and backward differentiation for forward sensitivities in Monte Carlo simulations. Working Paper, January 16, Social Science Research Network (https://doi.org/10.2139/ssrn.3106068).
  • Fries, C. P., Kohl-Landgraf, P., and Viehmann, M. (2018). Melting sensitivities: exact and approximate margin valuation adjustments. Working Paper, January 15, Social Science Research Network (https://doi.org/10.2139/ssrn.3095619).
  • Giles, M., and Glasserman, P. (2006). Smoking adjoints: fast Monte Carlo Greeks. Risk 19(1), 88–92.
  • Glasserman, P. (2003). Monte Carlo Methods in Financial Engineering. Stochastic Modelling and Applied Probability. Springer (https://doi.org/10.1007/978-0-387-21617-1).
  • Homescu, C. (2011). Adjoints and automatic (algorithmic) differentiation in computational finance. Preprint (arXiv:1107.1831v1).
  • Piterbarg, V. (2004). Computing deltas of callable Libor exotics in forward Libor models. The Journal of Computational Finance 7, 107–144 (https://doi.org/10.21314/JCF.2004.109).
