Journal of Computational Finance
A review of tree-based approaches to solving forward–backward stochastic differential equations
Need to know
- A regression-tree-based approach is proposed to numerically solve high-dimensional backward stochastic differential equations (BSDEs).
- The approach is reviewed from different perspectives, and error analysis is performed.
- Several numerical experiments including high-dimensional problems are provided to demonstrate accuracy and performance.
- To show the applicability of BSDEs to financial problems, the Heston stochastic volatility model and a high-dimensional nonlinear pricing problem are solved via BSDEs with the presented approach.
Abstract
In this work, we study ways of solving (decoupled) forward–backward stochastic differential equations numerically using regression trees. Based on the general theta-discretization for the time integrands, we show how to efficiently use regression-tree-based methods to approximate the resulting conditional expectations. Several numerical experiments, including high-dimensional problems, are provided to demonstrate accuracy and performance. To show the applicability of forward–backward stochastic differential equations to financial problems, we apply our tree-based approach to the Heston stochastic volatility model and to a high-dimensional nonlinear pricing problem.
1 Introduction
It is well known that many problems (eg, pricing and hedging) in the field of financial mathematics can be represented in terms of (forward–)backward stochastic differential equations ((F)BSDEs). Such representations can make the problems easier to handle, but the resulting equations usually have no analytical solution (see, for example, Karoui et al 1997a). The general form of (decoupled) FBSDEs reads
$$\begin{aligned} X_t &= x_0 + \int_0^t a(s, X_s)\,\mathrm{d}s + \int_0^t b(s, X_s)\,\mathrm{d}W_s,\\ Y_t &= \xi + \int_t^T f(s, X_s, Y_s, Z_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s, \end{aligned} \qquad (1.1)$$
where $a\colon [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ is the drift, $b\colon [0,T]\times\mathbb{R}^d \to \mathbb{R}^{d\times m}$ is a $d\times m$ matrix-valued diffusion, $f\colon [0,T]\times\mathbb{R}^d\times\mathbb{R}\times\mathbb{R}^{1\times m} \to \mathbb{R}$ is the driver function and $\xi$ is the square-integrable terminal condition.
The existence and uniqueness of solutions to such equations, under Lipschitz conditions on the coefficients and the driver, were proved by Pardoux and Peng (1990). Since then, many works have tried to relax these conditions: for example, the existence result was extended to more general assumptions on the generator in Lepeltier and Martin (1997), but only in the one-dimensional case.
In recent years, many numerical methods have been proposed for coupled and decoupled FBSDEs. For numerical algorithms based on (least-squares) Monte Carlo approaches we refer the reader to Bender and Steiner (2012), Bouchard and Touzi (2004), Gobet et al (2005), Lemor et al (2006) and Zhao et al (2006); the multilevel Monte Carlo method based on Picard approximation can be found in E et al (2019). There exists a connection between BSDEs and partial differential equations (PDEs) (see Karoui et al 1997b; Peng 1991); some numerical schemes exploiting this connection can be found in, for example, Ma et al (1994). For deep-learning-based numerical methods we refer to E et al (2017). An approach based on the Fourier method for BSDEs is developed in Ruijter and Oosterlee (2015). See also Zhao et al (2010) and Teng et al (2020) for multi-step schemes.
In Crisan and Manolarakis (2010) an algorithm with the cubature method is proposed to solve FBSDEs. The cubature method is an approximation method for the law of the solution of an SDE that generates a tree on which expectations and conditional expectations are evaluated. By introducing a minimal variance sampling method for the support of the cubature measure, Crisan and Manolarakis address the issue that the tree methods suffer from an exponential increase in the size of the support of the approximating measure. A method relying on an approximation of Brownian motion by a simple random walk is proposed in Ma et al (2002) for BSDEs: that is, the conditional expectations are computed using a tree structure, in which the final condition is allowed to be quite general. In this paper, we show how to efficiently use regression-tree-based approaches to find accurate approximations of FBSDEs (1.1). We apply the general theta-discretization method for the time integrands, and we approximate the resulting conditional expectations using the regression-tree-based approach with sample-splitting techniques. More precisely, we deal with the projection of samples via trees to overcome the curse of dimensionality. Several numerical experiments of different types including 100-dimensional problems are performed to demonstrate our findings.
In the next section we give some notation and definitions, and in Section 3 we introduce the generalized theta method. Section 4 is devoted to how to use the regression-tree-based approaches to approximate conditional expectations. In Section 5 several numerical experiments on different types of FBSDEs, including financial applications, are provided to show the accuracy and applicability for high-dimensional nonlinear problems. Section 6 presents our conclusions.
2 Preliminaries
Throughout the paper, we assume that $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0\le t\le T}, \mathbb{P})$ is a complete, filtered probability space. On this space, a standard $m$-dimensional Brownian motion $W_t$ with a finite terminal time $T$ is defined, which generates the filtration $\{\mathcal{F}_t\}_{0\le t\le T}$, that is, $\mathcal{F}_t = \sigma\{W_s,\ 0\le s\le t\}$, for the FBSDEs. The usual hypotheses are assumed to be satisfied. We denote the set of all $\mathcal{F}_t$-adapted and square-integrable processes by $L^2 = L^2_{\mathcal{F}}(0, T)$. A pair of processes $(Y_t, Z_t)$ is a solution of the FBSDE in (1.1) if it is $\mathcal{F}_t$-adapted and square integrable and satisfies (1.1) as
$$Y_t = \xi + \int_t^T f(s, X_s, Y_s, Z_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s, \qquad 0 \le t \le T, \qquad (2.1)$$
where is -adapted and .
Suppose that the terminal value is of the form $\xi = g(X_T^{t,x})$, where $X_s^{t,x}$ denotes the solution of the forward SDE in (1.1) starting from $x$ at time $t$. Then the solution $(Y_s^{t,x}, Z_s^{t,x})$ of the FBSDE in (1.1) can be represented (Karoui et al 1997b; Ma and Zhang 2005; Peng 1991) as
$$Y_t^{t,x} = u(t, x), \qquad Z_t^{t,x} = \nabla_x u(t, x)\, b(t, x), \qquad (2.2)$$
where $u(t, x)$ is the solution to the semilinear parabolic PDE of the form
$$\frac{\partial u}{\partial t} + \sum_{i=1}^{d} a_i(t, x)\,\frac{\partial u}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} \bigl(b b^{\top}\bigr)_{ij}(t, x)\,\frac{\partial^2 u}{\partial x_i\,\partial x_j} + f\bigl(t, x, u, \nabla_x u\, b\bigr) = 0 \qquad (2.3)$$
with the terminal condition $u(T, x) = g(x)$. In turn, if $(Y_t, Z_t)$ is the solution to the FBSDE, then $u(t, x) = Y_t^{t,x}$ is a viscosity solution to the PDE (2.3).
3 Discretization of the forward–backward stochastic differential equation using the theta method
We introduce the following time partition for the time interval $[0, T]$:
$$0 = t_0 < t_1 < \cdots < t_N = T. \qquad (3.1)$$
Let $\Delta t_n = t_{n+1} - t_n$ be the time step, and denote the maximum time step by $\Delta t = \max_{0\le n\le N-1} \Delta t_n$. For FBSDEs, we additionally need to discretize the forward SDE in (1.1):
$$\mathrm{d}X_t = a(t, X_t)\,\mathrm{d}t + b(t, X_t)\,\mathrm{d}W_t, \qquad X_0 = x_0. \qquad (3.2)$$
Suppose that the forward SDE in (3.2) can already be discretized by a process $X^{\Delta t}_{t_n}$ such that
$$\max_{0\le n\le N}\,\mathbb{E}\bigl[\,\lvert X_{t_n} - X^{\Delta t}_{t_n}\rvert^2\,\bigr] = O\bigl((\Delta t)^{2\gamma}\bigr), \qquad (3.3)$$
which means strong mean square convergence of order $\gamma$ (for the Euler scheme, $\gamma = \tfrac12$). For the backward process (2.1), the well-known generalized $\theta$-discretization reads
(3.4) |
For , we have
(3.5)
(3.6)
where and denote the discretization errors (see, for example, Li et al 2017; Zhao et al 2012, 2014). Obviously, by choosing the different values for , , we can obtain different schemes. Note that cannot be set to zero.
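As a complement to (3.2)–(3.3), the following minimal sketch shows how the forward component might be simulated in practice with the Euler–Maruyama scheme (which has strong order 1/2); the function names, array shapes and the use of numpy are illustrative assumptions rather than the implementation used in this paper.

```python
import numpy as np

def euler_paths(a, b, x0, T, N, M, d, m, rng=None):
    """Euler-Maruyama paths of dX_t = a(t, X_t) dt + b(t, X_t) dW_t.

    a : callable (t, X) -> (M, d) drift values
    b : callable (t, X) -> (M, d, m) diffusion values
    Returns X of shape (N + 1, M, d) and the increments dW of shape (N, M, m).
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    X = np.empty((N + 1, M, d))
    X[0] = x0
    dW = np.sqrt(dt) * rng.standard_normal((N, M, m))
    for n in range(N):
        t = n * dt
        # X_{n+1} = X_n + a(t_n, X_n) dt + b(t_n, X_n) dW_n
        X[n + 1] = X[n] + a(t, X[n]) * dt + np.einsum("ijk,ik->ij", b(t, X[n]), dW[n])
    return X, dW
```

For a geometric Brownian motion of the type used in the pricing examples of Section 5.3, one would take, for instance, a(t, x) = mu*x and b(t, x) a diagonal matrix with entries sigma*x.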
4 Computation of conditional expectations with the tree-based approach
In this section we show how the tree-based approach (Martinez and Martinez 2007) can be used to compute the conditional expectations appearing in (3.5) and (3.6). More precisely, the conditional expectation , , is represented by a tree, which is constructed with certain samples of and . The tree is then reused to generate samples of very efficiently, by directly taking a group of arbitrary samples of (generated from ) as one input for further iterations. That is to say, we deal with the projection of samples via trees using (3.5) and (3.6); the curse of dimensionality can thus be overcome by, for example, considering the group of generated from entirely as one input for the tree.
4.1 Nonparametric regression
Suppose that the model in nonparametric regression reads
(4.1) |
where has a zero expectation and a constant variance, and it is independent of . It holds that
(4.2) |
The goal in regression is to find the estimator of this function, . In nonparametric regression we do not assume a particular form for . Instead, is represented by a regression tree. Given a set of samples , , for , where denotes a predictor variable and represents the corresponding response variable, we construct a regression tree that can then be used to compute for an arbitrary , whose value is not necessarily equal to one of the samples .
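As a minimal sketch of this regression step, scikit-learn's DecisionTreeRegressor can stand in for the CART regression tree used here; the array names and the leaf-size parameter are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# x_samples: (M, d) array of predictor samples (eg, samples of the forward process),
# y_samples: (M,) array of the corresponding responses.
def fit_regression_tree(x_samples, y_samples, min_samples_leaf=50):
    """Fit a tree t so that t.predict(x) approximates E[response | predictor = x]."""
    tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf)
    tree.fit(x_samples, y_samples)
    return tree

# The fitted tree can be evaluated at arbitrary query points, which need not
# coincide with any of the training samples:
# tree = fit_regression_tree(x_samples, y_samples)
# y_hat = tree.predict(x_query)   # x_query: (K, d)
```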
As an example, we specify the procedure for (3.5). Assuming that is Markovian, (3.5) can be rewritten as
(4.3) |
There also exist deterministic functions such that
(4.4) |
Starting from time , we construct the regression tree for the conditional expectation in (4.3) using the following samples:
Thereby, the function
(4.5) |
is estimated and represented by a regression tree. By applying (4.5) to the samples we can directly obtain the samples of the random variable for . Recursively, backward in time, these samples will be used to generate samples of the random variables at time . At the initial time , we fix an initial value for , and no samples are needed. Using the regression trees constructed at time we obtain the solution .
4.2 Binary regression tree
We review the procedure in Breiman et al (1984) and Martinez and Martinez (2007) for constructing the best regression tree based on the given samples. Basically, we need to grow, prune and finally select the tree. We use the following notation.
- denote samples, namely observed data.
- is a node in the tree , and and are the left and right child nodes.
- is the set of terminal nodes in the tree with the number .
- represents the number of samples in node .
- is the average of samples falling into node , namely the predicted response.
4.2.1 Growing a tree
We define the predicted response as the average value of the samples that are contained in a node , ie,
(4.6) |
Obviously, the squared error in the node reads
(4.7) |
The mean squared error for the tree is defined as
(4.8) |
Basically, the tree is constructed by partitioning the space for the samples using a sequence of binary splits. For a split and node , the change in the mean squared error can thus be calculated as
(4.9) |
The regression tree is thus obtained by iteratively splitting the nodes with , which yields the largest ; thereby, the decrease in is maximized. Obviously, the optimal stopping criterion is that all the responses in a terminal node are the same, but this is not realistic.
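A direct, non-optimized sketch of this splitting rule: for each coordinate and each candidate threshold, split the samples and keep the split with the largest decrease in the summed squared error. The function and variable names are illustrative.

```python
import numpy as np

def node_sse(y):
    """Squared error of a node around its mean prediction, cf. (4.6)-(4.7)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(x, y):
    """Exhaustive search over coordinates and thresholds for the binary split
    with the largest decrease in the summed squared error, cf. (4.9)."""
    best_j, best_s, best_dec = None, None, 0.0
    parent = node_sse(y)
    for j in range(x.shape[1]):
        for s in np.unique(x[:, j])[:-1]:     # exclude the largest value so both children are nonempty
            left = x[:, j] <= s
            dec = parent - node_sse(y[left]) - node_sse(y[~left])
            if dec > best_dec:
                best_j, best_s, best_dec = j, s, dec
    return best_j, best_s, best_dec
```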
4.2.2 Pruning a tree
When using the optimal stopping criterion, each terminal node contains only one response, so the error , and thus , will be zero. However, first of all, this is unrealistic. Second, the samples are thereby overfitted and the regression tree will thus not generalize well to new observed samples. Breiman et al (1984) suggest growing an overly large regression tree and then finding a nested sequence of subtrees by successively pruning the branches of the tree. This procedure is called pruning a tree. We define an error-complexity measure as
(4.10) |
where represents the complexity cost per terminal node. The error-complexity measure should be minimized over subtrees. Let be the overly large tree, in which each terminal node contains only one response. Thus, we have , which indicates a high complexity cost, while the error is small. To minimize the cost we delete the branches at the weakest link in the tree , which is defined as
(4.11) |
where is the branch of corresponding to the internal node of subtree . We prune the branch defined by the node ,
(4.12) |
and thus we obtain a finite sequence of subtrees with fewer terminal nodes and decreasing complexity until the root node, which is
(4.13) |
On the other hand, we set
(4.14) |
and we thus obtain an increasing sequence of values for the complexity parameter :
(4.15) |
By observing both sequences (4.13) and (4.15), it is not difficult to see that for the tree is the one that has the minimal cost complexity for .
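In scikit-learn, a nested sequence of pruned subtrees and the associated complexity parameters can be recovered with cost_complexity_pruning_path, as in the following sketch (with illustrative names); this is one possible stand-in for the pruning procedure of Breiman et al (1984).

```python
from sklearn.tree import DecisionTreeRegressor

def pruned_subtree_sequence(x_train, y_train):
    """Grow an overly large tree and return a nested sequence of pruned subtrees."""
    t_max = DecisionTreeRegressor(min_samples_leaf=1).fit(x_train, y_train)
    # Increasing sequence of complexity parameters, in the spirit of (4.14)-(4.15).
    alphas = t_max.cost_complexity_pruning_path(x_train, y_train).ccp_alphas
    # Refitting with ccp_alpha = alpha_k reproduces the minimal-cost subtree for alpha_k.
    return [DecisionTreeRegressor(ccp_alpha=a).fit(x_train, y_train) for a in alphas]
```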
4.2.3 Selecting a tree
We have to make a trade-off between both the error criteria and the complexity, ie, we need to choose the best tree from the sequence of pruned subtrees. To do this, we can use independent test samples and cross-validation. As an example, we illustrate the independent test sample method; for cross-validation we refer the reader to Breiman et al (1984) and Martinez and Martinez (2007). Clearly, we need honest estimates of the true error to select the right size of tree. To obtain these estimates, we should use samples that were not used to construct the tree to estimate the error. Suppose we have a set of samples , which can be randomly divided into two subsets and . We use the set to grow a large tree and to obtain the sequence of pruned subtrees. Thus, the samples in are used to evaluate the performance of each subtree by calculating the error between the real response and predicted response. We denote by the predicted response using samples for the tree . Then, the estimated error is
(4.16) |
where is the number of samples in . This estimated error will be calculated for all subtrees. We need to pick the subtree that has the smallest number of nodes but still keeps the accuracy of the tree with the smallest error, eg, with an error . To do this, we define the standard error for this estimate as (Breiman et al 1984)
(4.17) |
and then we choose the smallest tree such that
(4.18) |
is a tree with minimal complexity cost but has equivalent accuracy to the tree with the minimum error.
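The selection step on an independent test sample, including the one-standard-error rule, might then look as follows; subtrees is the pruned sequence from the previous sketch, and the way the standard error is estimated here is a common choice rather than the exact formula of (4.17).

```python
import numpy as np

def select_subtree(subtrees, x_test, y_test):
    """Pick the smallest subtree whose test error is within one standard error
    of the best test error (the rule described around (4.16)-(4.18))."""
    n = len(y_test)
    errors, std_errs = [], []
    for t in subtrees:
        sq = (y_test - t.predict(x_test)) ** 2
        mse = float(sq.mean())                          # estimated error, cf. (4.16)
        errors.append(mse)
        std_errs.append(float(np.sqrt(sq.var() / n)))   # its standard error, cf. (4.17)
    k_best = int(np.argmin(errors))
    threshold = errors[k_best] + std_errs[k_best]       # selection rule, cf. (4.18)
    eligible = [t for t, e in zip(subtrees, errors) if e <= threshold]
    return min(eligible, key=lambda t: t.get_n_leaves())
```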
4.3 Practical applications
Due to the linearity of conditional expectation, we construct the trees for all possible combinations of the conditional expectations. We denote the tree’s regression error by and the error of the used iterative method by , and we reformulate the scheme (3.4)–(3.6) by combining conditional expectations as
(4.19) |
For , , we have
(4.20)
(4.21)
where denotes the calculated conditional expectation using the constructed regression tree with the samples of . For example, using samples of the predictor variable (ie, ) and samples of the response variable
(ie, ), we construct a regression tree. Then,
means the computed value of
using that constructed tree. This is to say that and are computed based on the tree at the th step, which depends on and . Let
Thus, we define
and
with
while and have zero mean and constant variance and , respectively. and can thus be reformulated as
(4.22) |
and
(4.23) |
where and are defined via (4.16). and denote the trees with the smallest error at the th step. Note that (4.17) and (4.18) are used just to find the tree with the smallest error in (4.16), which has the smallest number of nodes.
Theoretically, the regression error can be neglected by choosing a sufficiently large . However, the tree-based approach is not computationally efficient for a reasonably high value of . Since we are only interested in the solutions of and conditional on a known present value of , we propose to split a large sample into a few groups of smaller samples at and to project the samples for in the different groups. The main reason for doing this is that multiple tree-based computations for small samples are still more efficient than one computation for a large number of samples. The computation can be done rapidly in the case of a constant predictor, and the quality of the approximations for and relies directly on the samples of and . Therefore, for the step , we combine the samples of and from all groups and use them as our samples of response variables. We also use or as the predictor variable. Our numerical results show that the splitting error of the samples' projection from can be neglected. As an example, we propose the following fully discrete scheme by choosing , and (a code sketch follows the listing below).
- (1) Generate samples and split them into different groups, where the sample number in each group is .
- (2) For each group, namely , compute, for , ,
- (3) Collect all the samples of for and use all these samples to compute
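A skeleton of this backward recursion with sample-splitting is sketched below. It uses scikit-learn regression trees for the conditional expectations and, since the θ-values of the scheme above are not reproduced here, it implements one simple member of the θ-family (implicit Euler in the Y-equation with Z obtained from E_n[Y_{n+1} dW_n]/Δt); all names, shapes and defaults are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def solve_bsde_tree(X, dW, f, g, dt, n_groups=10, picard=20, min_leaf=50):
    """Backward induction with regression trees and sample-splitting.

    X : forward samples of shape (N+1, M, d); dW : increments of shape (N, M, m).
    f(t, x, y, z) and g(x) are assumed to be vectorized over samples.
    Illustrative implicit-Euler member of the theta family, not necessarily
    the exact scheme used in the text.
    """
    N, M, m = dW.shape
    groups = np.array_split(np.arange(M), n_groups)
    Y = g(X[N])                                   # terminal values, shape (M,)
    for n in range(N - 1, 0, -1):                 # steps t_{N-1}, ..., t_1: group-wise trees
        t_n = n * dt
        for idx in groups:
            x, y, dw = X[n][idx], Y[idx], dW[n][idx]
            # trees for E_n[Y_{n+1}] and E_n[Y_{n+1} dW_n] as functions of X_n
            ey = DecisionTreeRegressor(min_samples_leaf=min_leaf).fit(x, y).predict(x)
            z = np.column_stack(
                [DecisionTreeRegressor(min_samples_leaf=min_leaf)
                 .fit(x, y * dw[:, k]).predict(x) for k in range(m)]) / dt
            y_new = ey
            for _ in range(picard):               # Picard iterations for the implicit driver
                y_new = ey + dt * f(t_n, x, y_new, z)
            Y[idx] = y_new
    # At t_0 the value X_0 is deterministic: merge all groups, so the
    # conditional expectations reduce to plain sample averages.
    z0 = (Y[:, None] * dW[0]).mean(axis=0) / dt
    ey0 = Y.mean()
    y0 = ey0
    for _ in range(picard):
        y0 = ey0 + dt * float(f(0.0, X[0][:1], np.array([y0]), z0[None, :])[0])
    return y0, z0
```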
4.4 Error estimates
We perform the error analysis for the scheme , and ; a similar analysis can be done for other choices of . Supposing that the errors of the iterative method can be neglected by choosing a sufficiently high number of Picard iterations, we consider the initial discretization and regression errors. The errors due to the time-discretization are given by
Analogously to the deterministic functions given in (4.4), we define the deterministic functions as
These functions are approximated by the regression trees, resulting in the approximations with
Thus, we denote the global errors by
(4.24)
(4.25)
(4.26)
Assumption 4.1.
Suppose that is -measurable with , and that and are -measurable in , as well as linear-growth-bounded and uniformly Lipschitz continuous, ie, there exist positive constants and such that
with .
Let be the set of continuously differentiable functions with uniformly bounded partial derivatives for and for ; and can be defined analogously, and is the set of functions with uniformly bounded partial derivatives for .
We give some remarks concerning related results on the one-step scheme.
- Under Assumption 4.1, if , for some , and are bounded and , then the absolute values of the local errors and in (3.6) and (3.5) can be bounded by when and and by when , and , where is a constant that can depend on , and the bounds of , , and in (1.1) (see, for example, Li et al 2017; Zhao et al 2009, 2012, 2013, 2014).
- For convenience of notation we omit the dependency of the local and global errors on the state of the FBSDEs and the discretization errors of , ie, we assume .
- For the implicit schemes we will apply Picard iterations, which converge for any initial guess, owing to the Lipschitz assumption on the driver, when the time step is small enough (a short contraction estimate is sketched below). In the following analysis, we consider the equidistant time discretization .
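For completeness, here is a short sketch of the contraction argument behind this remark, with $L$ denoting the Lipschitz constant of the driver $f$ and $\theta_1$ the weight of the implicit term (notation assumed for this sketch). One Picard step maps $y^{(p)}$ to
$$y^{(p+1)} = \mathbb{E}_n[\,\cdot\,] + \Delta t\,\theta_1\, f(t_n, x, y^{(p)}, z),$$
so that
$$\lvert y^{(p+2)} - y^{(p+1)}\rvert \le \Delta t\,\theta_1\, L\,\lvert y^{(p+1)} - y^{(p)}\rvert,$$
and the iteration is a contraction, converging for any initial guess, as soon as $\Delta t\,\theta_1\, L < 1$.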
For the -component we have (see (4.20))
(4.27) |
where can be bounded using Lipschitz continuity of by
(4.28) |
with Lipschitz constant . And it holds that
(4.29) |
Consequently, we calculate
(4.30) |
where Hölder’s inequality is used.
For the -component in the implicit scheme we have
(4.31) |
This error can be bounded by
(4.32) |
By using the inequality we calculate
(4.33) |
Theorem 4.2.
Under Assumption 4.1, if , for some , and are bounded, , then given
it holds that
(4.34) |
where is a constant that depends only on , and the bounds of , and , in (1.1); is a constant depending on , and ; and are the bounded constants; and and are the number of samples and splitting groups, respectively.
Proof.
By combining both (4.30) and (4.33), it is straightforward to obtain
which implies
We choose such that
(ie, ). Thus, the above inequality can be rewritten as
which implies
By induction, we then obtain
The regression errors and with samples are given by (4.22) and (4.23), from which we can deduce, for example,
Similarly, for and we obtain
respectively. Finally, with the known conditions and bounds of the local errors mentioned above, the proof is complete. ∎
5 Numerical experiments
In this section we use some numerical examples to show the accuracy of our methods for solving the FBSDEs. and are the total number of discrete time steps and the sample size, respectively. For all the examples, we consider an equidistant time partition and perform 20 Picard iterations. We run the algorithms 10 times independently and take the average value of the absolute error, where two different seeds are used every five simulations. Note that in the following numerical experiments we present the errors of and corresponding to Theorem 4.2. The errors at other time steps and for different fixed sample points behave similarly in our tests. The numerical experiments are performed using an Intel Core i5-8500 3.00 GHz processor with 15 GB RAM.
5.1 Example of linear BSDE
The first BSDE we consider is
(5.1) |
with the analytical solution
(5.2) |
The generator contains the components and . We set , where the analytical solution of is .
First, in order to see the computational acceleration achieved by the sample-splitting introduced above, we compare in Figure 1 the scheme with sample-splitting (merging the samples at different steps) against the scheme without sample-splitting. Letting and denote the result of the th run of the algorithm, , the approximations read as
In our tests we consider the average of the absolute errors: that is,
We see that there are no considerable differences in approximating . Also, the approximations of with the sample-splitting converge in a very stable fashion against . The sample-splitting allows a very efficient computation, especially when merging at the penultimate step. In the remainder of this paper we run all the schemes with the splitting , unless otherwise specified.
Next we study the influence of on the error. We fix the number of steps to and test all possible values of . We find that the explicit schemes for , and and the implicit schemes for , and show good convergence in terms of . As an example we report the absolute errors and the empirical standard deviations
for some chosen schemes in Table 1. We observe that the second-order scheme gives a very small error for quite a large , although . That is to say, the linear example is too simple for the scheme with the tree-based regression and can thus be misleading in terms of efficiency. Therefore, in order to obtain reliable numerical convergence rates for our proposed tree-based approach, we will consider for the following analysis and all the numerical examples, which provides the smallest errors for both and in all the schemes except for the scheme . To take this a step further, we now fix and plot in Figure 2 the absolute error against the number of steps. We see that the scheme converges meaningfully.
(a) | |||||||
---|---|---|---|---|---|---|---|
2 000 | 5 000 | 10 000 | 50 000 | 100 000 | 200 000 | 300 000 | |
() | 0.0254 | 0.0220 | 0.0251 | 0.0248 | 0.0247 | 0.0244 | 0.0246 |
Standard deviation | 0.0193 | 0.0147 | 0.0099 | 0.0023 | 0.0021 | 0.0016 | 8.3067e-04
() | 0.0177 | 0.0128 | 0.0125 | 0.0123 | 0.0121 | 0.0118 | 0.0120 |
Standard deviation | 0.0196 | 0.0151 | 0.0102 | 0.0023 | 0.0021 | 0.0017 | 9.1699e-04
() | 0.0169 | 0.0129 | 0.0073 | 0.0020 | 0.0019 | 0.0017 | 7.2826e-04
Standard deviation | 0.0197 | 0.0162 | 0.0113 | 0.0025 | 0.0022 | 0.0019 | 0.0011 |
() | 0.0171 | 0.0124 | 0.0070 | 0.0022 | 0.0020 | 0.0019 | 0.0017 |
Standard deviation | 0.0197 | 0.0159 | 0.0110 | 0.0024 | 0.0021 | 0.0019 | 0.0010 |
(b) | |||||||
2 000 | 5 000 | 10 000 | 50 000 | 100 000 | 200 000 | 300 000 | |
() | 0.1221 | 0.1210 | 0.1190 | 0.1166 | 0.1187 | 0.1197 | 0.1201 |
Standard deviation | 0.0303 | 0.0199 | 0.0147 | 0.0079 | 0.0037 | 0.0032 | 0.0025 |
() | 0.0579 | 0.0578 | 0.0562 | 0.0537 | 0.0561 | 0.0575 | 0.0578 |
Standard deviation | 0.0319 | 0.0235 | 0.0165 | 0.0081 | 0.0042 | 0.0036 | 0.0027 |
() | 0.0991 | 0.0453 | 0.0295 | 0.0158 | 0.0144 | 0.0074 | 0.0058 |
Standard deviation | 0.1111 | 0.0550 | 0.0312 | 0.0171 | 0.0173 | 0.0077 | 0.0060 |
() | 0.1114 | 0.1111 | 0.1095 | 0.1072 | 0.1095 | 0.1107 | 0.1112 |
Standard deviation | 0.0300 | 0.0230 | 0.0151 | 0.0079 | 0.0042 | 0.0035 | 0.0028 |
(a) | ||||||
1 000 | 2 000 | 20 000 | 100 000 | 300 000 | CR | |
2 | 4 | 8 | 16 | 32 | ||
() | 0.0329 | 0.0194 | 0.0080 | 0.0045 | 0.0012 | 1.17 |
Standard deviation | 0.0276 | 0.0224 | 0.0051 | 0.0020 | 9.8927e-04 |
() | 0.0262 | 0.0174 | 0.0056 | 0.0027 | 7.9174e-04 | 1.28
Standard deviation | 0.0279 | 0.0226 | 0.0052 | 0.0020 | 9.9436e-04 |
(b) | ||||||
1 000 | 2 000 | 20 000 | 100 000 | 300 000 | CR | |
2 | 4 | 8 | 16 | 32 | ||
() | 0.1142 | 0.0752 | 0.0235 | 0.0157 | 0.0092 | 0.95 |
Standard deviation | 0.0273 | 0.0271 | 0.0193 | 0.0078 | 0.0055 | |
() | 0.0516 | 0.0448 | 0.0149 | 0.0091 | 0.0066 | 0.82 |
Standard deviation | 0.0285 | 0.0265 | 0.0180 | 0.0086 | 0.0056 | |
(c) Average run time (seconds) | ||||||
1 000 | 2 000 | 20 000 | 100 000 | 300 000 | CR | |
2 | 4 | 8 | 16 | 32 | ||
() | 0.1 | 0.5 | 2.3 | 23.9 | 147.0 | |
() | 0.1 | 0.2 | 2.3 | 25.5 | 144.4 |
5.2 Example of two-dimensional FBSDE
For the two-dimensional FBSDE we consider the Heston stochastic volatility model (Heston 1993), which reads
(5.3) |
where is the spot price of the underlying asset and is the volatility. It is well known that the Heston model (5.3) can be reformulated as
(5.4) |
where and are independent Brownian motions. To find the FBSDE form for the Heston model we consider the following self-financing strategy:
(5.5)
(5.6)
where is the value of another option for hedging volatility; is used for the risk-free asset; and , and are numbers of the option, the underlying asset and the risk-free asset, respectively. We assume that
(5.7) |
which can be substituted into (5.6) to obtain
(5.8) |
with
(5.9) |
In the Heston (1993) model, the market price of the volatility risk is assumed to be . With the notation used in (5.4), the Heston pricing PDE including reads
(5.10) |
The solution of the FBSDE (5.8) is exactly the solution of the Heston PDE (5.10) by choosing . Equations (5.8) and (5.9) can thus be reformulated as
(5.11) |
with defined in (5.9). To the best of our knowledge, this is the first work in the literature to derive the Heston BSDE. Note that the generator in this example could be non-Lipschitz continuous. The European-style option can be replicated by hedging this portfolio. We consider a call option, for example, whose value at time is identical to the portfolio value , and . Hence, is the Heston option value and presents the hedging strategies, where
The parameter values used for this numerical test are
which give the exact solution . The forward processes and are simulated using the Euler method, and for the final values at the maturity we take
(5.12) |
where . The corresponding relative errors are reported in Table 3. We obtain a fairly accurate approximation of the Heston option price by solving the two-dimensional FBSDE, although the generator is not Lipschitz continuous. It is well known that splitting schemes of the alternating direction implicit (ADI) type have been widely analyzed and applied to efficiently compute the numerical solution of two-dimensional parabolic PDEs. We thus compare our tree-based approach in Table 3 with the Craig–Sneyd Crank–Nicolson ADI finite difference scheme (Craig and Sneyd 1988) for solving the Heston model. We denote by and the number of points for the stock price and the volatility grid, respectively. The ADI scheme is performed in the domain for and for , with a uniform grid ; the time steps are given in Table 3. We can observe that the tree-based approach gives a better (or at least comparable) result.
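For concreteness, the Euler simulation of the forward Heston processes mentioned above might be implemented as in the following sketch; the parameter names follow the usual Heston notation, and the truncation of negative variance values at zero is an assumed (common) choice rather than necessarily the treatment behind (5.12).

```python
import numpy as np

def heston_euler(s0, v0, mu, kappa, theta, sigma_v, rho, T, N, M, rng=None):
    """Euler paths of dS = mu*S dt + sqrt(v)*S dW1,
    dv = kappa*(theta - v) dt + sigma_v*sqrt(v) dW2, with corr(dW1, dW2) = rho.
    Negative variance values are truncated at zero (an assumed, common choice)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    s = np.full(M, float(s0))
    v = np.full(M, float(v0))
    for _ in range(N):
        z1 = rng.standard_normal(M)
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(M)
        vp = np.maximum(v, 0.0)                  # truncated variance
        s = s + mu * s * dt + np.sqrt(vp * dt) * s * z1
        v = v + kappa * (theta - vp) * dt + sigma_v * np.sqrt(vp * dt) * z2
    return s, np.maximum(v, 0.0)
```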
(a) Tree-based approach () | |||||
---|---|---|---|---|---|
5 000 | 10 000 | 40 000 | 100 000 | 300 000 | |
2 | 4 | 8 | 16 | 32 | |
0.0207 | 0.0115 | 0.0043 | 0.0028 | 0.0015 | |
Standard deviation | 0.0840 | 0.0307 | 0.0173 | 0.0109 | 0.0051 |
Average run time (s) | 0.2 | 0.9 | 7.4 | 38.8 | 241.3 |
(b) Craig–Sneyd Crank–Nicolson ADI scheme | |||||
2 | 4 | 8 | 16 | 32 | |
0.0900 | 0.0103 | 0.0068 | 0.0062 | 0.0061 | |
Average run time (s) | 0.2 | 0.7 | 1.2 | 2.7 | 6.2 |
(c) Tree-based approach () | |||||
100 | 500 | 1 000 | 5 000 | 10 000 | |
8 | 8 | 8 | 8 | 8 | |
0.1181 | 0.0612 | 0.0278 | 0.0162 | 0.0051 | |
Standard deviation | 0.4637 | 0.2137 | 0.1171 | 0.0561 | 0.0183 |
Average run time (s) | 0.1 | 0.2 | 0.2 | 1.0 | 1.9 |
5.3 Examples of nonlinear FBSDE
In this section we test our scheme on nonlinear high-dimensional problems. We find that nonlinear training data may lead to overfitting when the procedure introduced above is used directly. Therefore, to avoid overfitting for the nonlinear problems, we propose to control the error while growing the tree. To do this, we estimate the cross-validation mean squared errors of trees constructed with different numbers of observations in each branch node. Clearly, the best tree (ie, the one with the optimal number of observations in each branch node) can be determined by comparing the errors. Theoretically, for the best results, this error control needs to be performed at each time step. However, that would be too computationally expensive. Fortunately, in our tests we find that the optimal numbers of observations at the different time steps are very close to each other. To save substantial computation time, we therefore determine this number only once, eg, for the first iteration, and then fix it for all other iterations. We note that the pruning procedure cannot bring considerable improvement when the tree has been grown with the optimal number of observations, and it is thus unnecessary in this case.
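A minimal sketch of this error control, using scikit-learn's min_samples_leaf as a stand-in for the number of observations in each branch node; the candidate grid and the number of folds are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def best_min_leaf(x, y, candidates=(10, 25, 50, 100, 200), folds=5):
    """Choose the leaf size with the smallest cross-validated MSE; this is done
    once (eg, at the first backward step) and then kept fixed for all steps."""
    cv_mse = [
        -cross_val_score(DecisionTreeRegressor(min_samples_leaf=k), x, y,
                         scoring="neg_mean_squared_error", cv=folds).mean()
        for k in candidates
    ]
    return candidates[int(np.argmin(cv_mse))]
```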
(a) | ||||||
10 000 | 20 000 | 50 000 | 100 000 | 200 000 | 300 000 | |
0.0221 | 0.0024 | 0.0018 | 0.0011 | 0.0011 | 0.0009 | |
Standard deviation | 0.0027 | 0.0023 | 0.0023 | 0.0013 | 0.0013 | 0.0007 |
0.0247 | 0.0167 | 0.0100 | 0.0068 | 0.0050 | 0.0050 | |
Standard deviation | 0.0117 | 0.0068 | 0.0041 | 0.0017 | 0.0024 | 0.0021 |
Average run time (s) | 0.3 | 0.9 | 1.8 | 4.4 | 8.8 | 13.2 |
(b) | ||||||
10 000 | 20 000 | 50 000 | 100 000 | 200 000 | 300 000 | |
0.2729 | 0.1140 | 0.0405 | 0.0180 | 0.0081 | 0.0050 | |
Standard deviation | 0.0129 | 0.0033 | 0.0022 | 0.0008 | 0.0004 | 0.0002 |
0.0281 | 0.0175 | 0.0101 | 0.0072 | 0.0047 | 0.0039 | |
Standard deviation | 0.0019 | 0.0012 | 0.0009 | 0.0007 | 0.0005 | 0.0004 |
Average run time (s) | 35.5 | 80.4 | 215.1 | 450.4 | 1075.1 | 2015.7 |
We first consider a modified version of a BSDE with quadratically growing derivatives (derived in Gobet and Turkedjiev (2015)), whose explicit solution is known. This BSDE in the 100-dimensional case is analyzed numerically in E et al (2017) and is given by
(5.13) |
with the analytical solution
(5.14) |
where . Letting , , we test our scheme for and in order to compare our approach with the methods in Gobet and Turkedjiev (2015) and E et al (2017). We plot the absolute errors of and against the number of steps in Figures 4 and 5 for the 3-dimensional and 100-dimensional cases, respectively, where the sample sizes are roughly adjusted according to the time partitions. We obtain very good numerical results and reach an error of order . To show the errors against the sample size , we fix and report the errors in Table 4, which shows convergence in a stable fashion. Further, a better approximation (smaller standard deviations) can always be achieved with larger . Note that the sample-splittings and are used for the 3-dimensional and 100-dimensional problems, respectively, when .
(a) Tree-based approach | |||||||
---|---|---|---|---|---|---|---|
2 000 | 4 000 | 10 000 | 20 000 | 50 000 | 100 000 | 200 000 | |
0.0239 | 0.0152 | 0.0107 | 0.0073 | 0.0043 | 0.0035 | 0.0013 | |
Standard deviation | 0.2089 | 0.1100 | 0.0953 | 0.0686 | 0.0364 | 0.0352 | 0.0130 |
Average run time (s) | 0.1 | 0.1 | 0.2 | 0.3 | 0.9 | 1.8 | 3.7 |
(b) Multilevel Monte Carlo method | |||||||
No. of Picard iterations | |||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | |
0.8285 | 0.4417 | 0.1777 | 0.1047 | 0.0170 | 0.0086 | 0.0019 | |
Standard deviation | 7.7805 | 4.0799 | 1.6120 | 0.8106 | 0.1512 | 0.0714 | 0.0157 |
Average run time (s) | 0.0 | 0.0 | 0.0 | 0.3 | 3.1 | 38.7 | 1915.1 |
(a) Tree-based approach | ||||||
10 000 | 20 000 | 50 000 | 100 000 | 200 000 | 300 000 | |
0.0212 | 0.0111 | 0.0027 | 0.0026 | 0.0025 | 0.0022 | |
Standard deviation | 0.0652 | 0.0622 | 0.0283 | 0.0234 | 0.0179 | 0.0135 |
Average run time (s) | 20.1 | 43.2 | 112.6 | 224.9 | 450.8 | 677.3 |
(b) Multilevel Monte Carlo method | ||||||
No. of Picard iterations | ||||||
1 | 2 | 3 | 4 | 5 | 6 | |
0.4415 | 0.4573 | 0.1798 | 0.1042 | 0.0509 | 0.0474 | |
Standard deviation | 8.7977 | 11.3167 | 4.4920 | 2.9533 | 1.4486 | 1.3757 |
Average run time (s) | 0.0 | 0.0 | 0.0 | 0.4 | 4.2 | 52.9 |
As another example, we consider the problem of pricing a European option in a financial market in which the interest rates for borrowing and for lending, used to hedge the option, are different. We suppose that the stocks, which for simplicity are assumed to be independent and identically distributed, are driven by
(5.15) |
where and . The terminal condition and generator for the option pricing with different interest rate are given by
(5.16) |
and
(5.17) |
respectively, where are different interest rates and , are strikes. Obviously, (5.16) and (5.17) are both nonlinear.
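For reference, the standard driver for pricing with a lending rate R_l and a higher borrowing rate R_b (the form commonly used in the literature cited above) can be written as a small vectorized function; the symbols and signature are illustrative and should be matched to (5.17).

```python
import numpy as np

def driver_two_rates(t, x, y, z, mu, sigma, r_lend, r_borrow):
    """Generator f(t, x, y, z) = -R_l*y - ((mu - R_l)/sigma)*sum(z)
       + (R_b - R_l)*max(sum(z)/sigma - y, 0), vectorized over samples."""
    zsum = np.sum(z, axis=-1)
    return (-r_lend * y
            - (mu - r_lend) / sigma * zsum
            + (r_borrow - r_lend) * np.maximum(zsum / sigma - y, 0.0))
```

In the Black–Scholes-type setting, the quantity sum(z)/sigma corresponds to the amount of wealth invested in the stocks, so the last term charges the spread R_b − R_l whenever the hedge requires borrowing.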
We first consider a one-dimensional case, in which we use instead of (5.16) to agree with the setting in Gobet et al (2005) and E et al (2017). The parameter values are set as , , , and . We use , computed using the finite-difference method, as the reference price. Note that this reference price is confirmed in Gobet et al (2005) as well. As in the previous example, we plot in Figure 6 the relative error against the number of steps, which gives very good numerical results. In Table 5 we compare our results with the results in E et al (2019, Table 5), and we show that the tree-based approach with can reach the accuracy level of the multilevel Monte Carlo method with seven Picard iterations in significantly less computational time. Finally, we test our scheme for the 100-dimensional nonlinear pricing problem. Figure 7 compares the errors against the number of steps . In Table 6 we compare our results with those in E et al (2019, Table 6). The reference price is computed using the multilevel Monte Carlo method with seven Picard iterations, whereas and the values of the other parameters are the same as those for the one-dimensional case. For a comparison of computational cost in terms of the number of splitting groups, we set for , ie, there are twice as many splitting groups as in the previous 100-dimensional example. We see that our result with is already better than the approximation of the multilevel Monte Carlo method with six iterations, obtained in less computational time. Further, compared with the results in Table 4, the computational expenses are substantially reduced, since the samples are split into more groups. Note that the same reference price is used in E et al (2017, Table 3) to compare the deep-learning-based numerical methods for high-dimensional BSDEs, which achieved a relative error of 0.0039 in a run time of 566 seconds.
6 Conclusion
In this paper we have studied ways of solving forward–backward stochastic differential equations numerically using regression-tree-based methods. We showed how to use regression trees to approximate the conditional expectations that arise when the time integrands are discretized with the general theta-discretization method. We performed several numerical experiments for different types of (F)BSDEs, including their application to 100-dimensional nonlinear pricing problems. Our numerical results are quite promising and indicate that the tree-based approach is very attractive for solving high-dimensional nonlinear (F)BSDEs.
Declaration of interest
The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper.
Acknowledgements
The author thanks the referees for their valuable suggestions, which greatly improved the paper.
References
- Bender, C., and Steiner, J. (2012). Least-squares Monte Carlo for backward SDEs. Numerical Methods in Finance 12, 257–289 (https://doi.org/10.1007/978-3-642-25746-9_8).
- Bouchard, B., and Touzi, N. (2004). Discrete-time approximation and Monte Carlo simulation of backward stochastic differential equations. Stochastic Processes and Their Applications 111, 175–206 (https://doi.org/10.1016/j.spa.2004.01.001).
- Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Taylor & Francis.
- Craig, I. J. D., and Sneyd, A. D. (1988). An alternating-direction implicit scheme for parabolic equations with mixed derivatives. Computers and Mathematics with Applications 16, 341–350 (https://doi.org/10.1016/0898-1221(88)90150-2).
- Crisan, D., and Manolarakis, K. (2010). Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. SIAM Journal on Financial Mathematics 3(1), 534–571 (https://doi.org/10.1137/090765766).
- E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5(4), 349–380 (https://doi.org/10.1007/s40304-017-0117-6).
- E, W., Hutzenthaler, M., Jentzen, A., and Kruse, T. (2019). On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. Journal of Scientific Computing 79, 1534–1571 (https://doi.org/10.1007/s10915-018-00903-0).
- Gobet, E., and Turkedjiev, P. (2015). Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Mathematics of Computation 85(299), 1359–1391 (https://doi.org/10.1090/mcom/3013).
- Gobet, E., Lemor, J. P., and Warin, X. (2005). A regression-based Monte Carlo method to solve backward stochastic differential equations. Annals of Applied Probability 15(3), 2172–2202 (https://doi.org/10.1214/105051605000000412).
- Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6(2), 327–343 (https://doi.org/10.1093/rfs/6.2.327).
- Karoui, N. E., Kapoudjian, C., Pardoux, E., Peng, S., and Quenez, M. C. (1997a). Reflected solutions of backward stochastic differential equations and related obstacle problems for PDEs. Annals of Probability 25, 702–737.
- Karoui, N. E., Peng, S., and Quenez, M. C. (1997b). Backward stochastic differential equations in finance. Mathematical Finance 7(1), 1–71 (https://doi.org/10.1111/1467-9965.00022).
- Lemor, J., Gobet, E., and Warin, X. (2006). Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli 12, 889–916 (https://doi.org/10.3150/bj/1161614951).
- Lepeltier, J. P., and Martin, J. S. (1997). Backward stochastic differential equations with continuous generator. Statistics and Probability Letters 32, 425–430 (https://doi.org/10.1016/S0167-7152(96)00103-4).
- Li, Y., Yang, J., and Zhao, W. (2017). Convergence error estimates of the Crank–Nicolson scheme for solving decoupled FBSDEs. Science China Mathematics 60, 923–948 (https://doi.org/10.1007/s11425-016-0178-8).
- Ma, J., and Zhang, J. (2005). Representations and regularities for solutions to BSDEs with reflections. Stochastic Processes and Their Applications 115, 539–569 (https://doi.org/10.1016/j.spa.2004.05.010).
- Ma, J., Protter, P., and Yong, J. (1994). Solving forward–backward stochastic differential equations explicitly: a four step scheme. Probability Theory and Related Fields 98(3), 339–359 (https://doi.org/10.1007/BF01192258).
- Ma, J., Protter, P., Martín, J. S., and Torres, S. (2002). Numerical method for backward stochastic differential equations. Annals of Applied Probability 12, 302–316 (https://doi.org/10.1214/aoap/1015961165).
- Martinez, W. L., and Martinez, A. R. (2007). Computational Statistics Handbook with MATLAB, 2nd edn. CRC Press, Boca Raton, FL (https://doi.org/10.1201/b13622).
- Pardoux, E., and Peng, S. (1990). Adapted solution of a backward stochastic differential equation. Systems & Control Letters 14, 55–61 (https://doi.org/10.1016/0167-6911(90)90082-6).
- Peng, S. (1991). Probabilistic interpretation for systems of quasilinear parabolic partial differential equations. Stochastics and Stochastic Reports 37(1–2), 61–74 (https://doi.org/10.1080/17442509108833727).
- Ruijter, M. J., and Oosterlee, C. W. (2015). A Fourier cosine method for an efficient computation of solutions to BSDEs. SIAM Journal on Scientific Computing 37(2), A859–A889 (https://doi.org/10.1137/130913183).
- Teng, L., Lapitckii, A., and Günther, M. (2020). A multi-step scheme based on cubic spline for solving backward stochastic differential equations. Applied Numerical Mathematics 150, 117–138 (https://doi.org/10.1016/j.apnum.2019.09.016).
- Zhao, W., Chen, L., and Peng, S. (2006). A new kind of accurate numerical method for backward stochastic differential equations. SIAM Journal on Scientific Computing 28(4), 1563–1581 (https://doi.org/10.1137/05063341X).
- Zhao, W., Wang, J., and Peng, S. (2009). Error estimates of the θ-scheme for backward stochastic differential equations. Discrete and Continuous Dynamical Systems B 12, 905–924 (https://doi.org/10.3934/dcdsb.2009.12.905).
- Zhao, W., Zhang, G., and Ju, L. (2010). A stable multistep scheme for solving backward stochastic differential equations. SIAM Journal on Numerical Analysis 48, 1369–1394 (https://doi.org/10.1137/09076979X).
- Zhao, W., Li, Y., and Zhang, G. (2012). A generalized θ-scheme for solving backward stochastic differential equations. Discrete and Continuous Dynamical Systems B 17(5), 1585–1603 (https://doi.org/10.3934/dcdsb.2012.17.1585).
- Zhao, W., Li, Y., and Ju, L. (2013). Error estimates of the Crank–Nicolson scheme for solving backward stochastic differential equations. International Journal of Numerical Analysis and Modeling 10(4), 876–898.
- Zhao, W., Zhang, W., and Ju, L. (2014). A numerical method and its error estimates for the decoupled forward–backward stochastic differential equations. Communications in Computational Physics 15, 618–646 (https://doi.org/10.4208/cicp.280113.190813a).