Journal of Risk

Risk.net

Estimating future value-at-risk from value samples, and applications to future initial margin

Narayan Ganesan and Bernhard Hientzsch

  • This work compares several methods currently used for estimating dynamic initial margin and future value-at-risk, including nested Monte Carlo, Delta-Gamma, Johnson moment matching, Johnson percentile matching, Gaussian least squares and quantile regression, with some of the methods compared under various settings.
  • The goal of the paper is to present implementation details and an objective comparison of methods for estimating future value-at-risk in terms of performance metrics, requirements, accuracy and ease of implementation.
  • We study the results obtained by these methods for different financial instruments with and without MPoR cashflows.
  • We present difficulties encountered in classical methods, such as violation of moment constraints in moment regression, and means to rectify them. We also study how additional inner samples refine the value-at-risk estimates for the various methods.
  • Finally, we propose the use of pseudo inner samples in lieu of actual inner samples, which benefits methods such as nested Monte Carlo and Johnson percentile matching, yielding comparable accuracy at a fraction of the computational cost. The use of pseudo samples can be extended to other methods as well.

Predicting future value-at-risk is an important problem in finance. It arises in the modeling of future initial margin requirements for counterparty credit risk and future market value-at-risk. We are also interested in derived quantities such as dynamic initial margin and margin value adjustment in the counterparty credit risk context, and risk-weighted assets and capital value adjustment in the market risk context. This paper describes several methods that can be used to predict future value-at-risk. We begin with the nested Monte Carlo empirical quantile method as a benchmark, but it is too computationally intensive for routine use. We review several known methods and discuss their novel applications to the problem at hand. The techniques considered include computing percentiles from distributions (normal and Johnson) matched to estimates of moments and percentiles, quantile regression methods and others with more specific assumptions or requirements. We also consider how limited inner simulations can be used to improve the performance of these techniques. The paper also provides illustrations, results and visualizations of intermediate and final results for the various approaches and methods.

1 Introduction

1.1 Portfolios, cashflows and values

Consider a generic financial portfolio described by a complete set of cashflows 𝐂𝐅=(tᵢ, CFᵢ)_{i∈ℐ}, where ℐ is an appropriate index set. Each cashflow CFᵢ is determined from values of “underliers” 𝑿s for s ≤ tᵢ, where the underliers 𝑿t represent a certain stochastic process. Formally, CFᵢ is measurable as of time tᵢ with respect to a certain filtration ℱtᵢ, which is generated by 𝑿s for s ≤ tᵢ. Examples of the financial portfolio could range from a single European option on a single underlier to a large portfolio representing a complete netting set of many financial instruments depending on many underliers.

In addition, we suppose that the value of the outstanding cashflows of the portfolio at time t (in an appropriate sense, such as replication by the bank or a computation by some accepted central calculation agent) is given as a stochastic process Vt or as an exact or approximate function V(t,𝑿t) (or V(t,𝑿t,𝑼t), where 𝑼t is an extra Markovian state necessary to compute the value, eg, a barrier breach indicator for barrier options). For approximate computation, we can view it as V(t,𝑹t), where the 𝑹t are an appropriate set of features measurable as of t. We also assume that the Vt for different t are discounted to a common time or otherwise made comparable so that they can be added to or subtracted from each other and any CFi meaningfully.

We can interpret Vt as the value to the bank (“us”): positive CFi would be interpreted as payments to us, while negative CFi are interpreted as payments by us to some counterparty.

Assuming some realization of 𝑿t under some measure ℙ, indicated by an argument ω (as in 𝑿t(ω)), there is one corresponding realization Vt(ω) and there is one realization CF(ω). When comparing different measures, if V is given by a function, it will be the same function but its arguments will follow their dynamics under the measure ℙ, while the processes Vt will be different. In our examples here, the 𝑿t will be given as realizations of a system of stochastic differential equations (SDEs).

1.2 Change of value over a time period t to t+δ

Consider the change of value over the time period t to t+δ defined by

  $\Delta V_t^{\delta,f,\mathbb{P}}(\omega) := V_{t+\delta}(\omega) + \sum_{i:\,t_i\in[t,t+\delta)} f(CF_i(\omega), t_i, t) - V_t(\omega).$   (1.1)

If the measure ℙ or the function f is clear from the context, we omit the corresponding superscripts.

Some simple examples for the function f(x,ti,t) are x, 0, x+ or x-. The interpretation would be that cashflows from the portfolio during that period are assumed to be not made or only partially made. In the counterparty credit risk applications, we could assume that the period from t to t+δ, the margin period of risk (MPoR), describes the period from the first uncertainty about the default of the counterparty to its resolution by default and liquidation and/or replacement of the portfolio. Often we assume that the bank will make payments until default is almost certain, while the counterparty will avoid payments, for example, by engaging in disputes (see Andersen et al (2017) for a more careful modeling of the MPoR). For market risk purposes, all payments are typically assumed to be made and received.
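To make the convention concrete, the following minimal Python sketch evaluates (1.1) for a single scenario under one possible reading of f (here f(x) = min(x, 0), ie, only the cashflows we pay are assumed to be made); the function and variable names are ours and the cashflow convention is an illustrative assumption, not the paper's prescribed choice.

```python
import numpy as np

# Minimal sketch of the value change (1.1) for one scenario. The inclusion
# rule f is a hypothetical choice here: f(x) = min(x, 0) keeps only the
# cashflows we pay, one reading of "we pay, the counterparty withholds".
def delta_v(v_t, v_t_plus_delta, mpor_cashflows, f=lambda x: np.minimum(x, 0.0)):
    """v_t, v_t_plus_delta: (discounted) portfolio values at t and t + delta.
    mpor_cashflows: cashflows CF_i with t_i in [t, t + delta)."""
    return v_t_plus_delta + np.sum(f(np.asarray(mpor_cashflows))) - v_t

# Example: a coupon of 0.3 we would have received is withheld, while a
# payment of 0.2 we owe is still made.
print(delta_v(6.5, 6.1, [0.3, -0.2]))   # -> -0.6
```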

ΔVtδ,f,ℙ(ω) is now a random variable. In one common setting, the pertinent parts of the filtration ℱt are 𝑿t and 𝑼t. Each possible ω corresponds to (different) realizations 𝑿s and 𝑼s for s ∈ [t, t+δ], which in turn lead to realizations of CF and Vs in that interval.

In market risk (for market risk value-at-risk (VaR)) and counterparty credit risk (for initial margin requirements), we are interested in quantiles for that random variable, denoted by Qα(ΔVtδ,f,ℙ ∣ ℱt), and computed or approximated as functions qα(t,𝑿t,𝑼t) or qα(t,𝑹t). In the first qα expression, we find the function given the complete Markovian state. In the second, we find it as a function of a set of regressors 𝑹t. Often, we try to use as few regressors as possible to minimize computational requirements, for example, by using only Vt itself or Vt together with a few main risk factors.

In the initial margin requirement case, the percentile α is typically 1% (or 99% if seen from the other side). For market risk VaR, α is often 3%, 1% or 0.3% (or 97%, 99% or 99.7% if seen from the other side).

Often, initial margin and market risk VaR are computed with t representing today. However, to understand the impact of future initial margin (FIM) requirements or future VaR (fVaR), we consider t representing times beyond the valuation date. Then, we compute expected future margin requirements (or VaR) or quantiles of the future margin requirements (or VaR). The expected future margin requirement is called dynamic initial margin (DIM) and can be written as

  $\mathrm{DIM}_{t_0,t,\delta} = E^{\mathbb{P}}[\,Q_\alpha(\Delta V_t \mid \mathcal{F}_t) \mid \mathcal{F}_{t_0}] = E^{\mathbb{P}}[\mathrm{IM}_{t,\delta} \mid \mathcal{F}_{t_0}].$   (1.2)

ℙ would be a pricing measure if the future expected margin requirements are computed for pricing purposes, or it could be a historical measure if computed for risk management purposes.

However, once computed, the initial margin requirements or VaR can be used in follow-up computations. For initial margin requirements, we can compute the expected cost to fund FIM requirements (margin value adjustment). VaR (and now also expected shortfall (ES)) is used in the computation of required regulatory or economic capital, and further, the cost to fund it (capital value adjustment). While we do not cover these applications here, the methods described here can and have been applied to them, and they will be described elsewhere.

In this paper, we assume that the stochastic process for Vt (respectively, the functions for V) are given (respectively, approximated) independently before we apply the methods given in the paper. In a follow-up paper, we describe how the Vt can be computed through backward deep BSDE (deep backward stochastic differential equation) methods and how to implement methods to compute the quantiles in that particular case.

The rest of the paper is organized as follows. Section 2 briefly outlines the assumptions and requirements for the methods studied. Section 3 briefly compares the VaR and initial margin computations as studied here with regulatory requirements and other methods that have been implemented in the industry. Section 4 describes the nested Monte Carlo procedure to estimate the FIM, which will be used as a benchmark to compare other methods against. Section 5 describes methods using moment-matching techniques implemented on a cross section of the outer simulation paths, such as Johnson least-square Monte Carlo (JLSMC) and Gaussian least-square Monte Carlo (GLSMC) in order to fit an appropriate distribution for the computation of the initial margin. Section 6 describes the quantile regression method and its implementation. Section 7 describes the Delta-Gamma methods that utilize the analytic sensitivities of the instruments to estimate conditional quantile and FIM. Section 8 analyzes the effect of adding a limited number of inner samples for the estimation of FIM and DIM. While the methods described in the earlier sections (with the exception of nested Monte Carlo) all work on the cross section of outer paths, the estimation of the properties of the distribution could be refined further by the addition of a limited number of inner paths, which is the subject of this section. Section 9 presents a comparison of running times and root mean square errors (RMSEs) across many methods for the call and call combination examples, and discusses their performance. We end with our conclusions and discussion in Section 10.

2 Approaches and their assumptions and requirements

Estimating FIM, as well as updating the estimates during frequent changes in market conditions, changes in counterparty risk profiles and changes in the portfolio itself, is an arduous task, especially for large and complex portfolios with many instruments and underliers. With sufficient computational resources, we can perform simulations of risk factors and cashflows under several portfolio scenarios. As discussed in Section 4, to use Monte Carlo to estimate the VaR and the quantiles of the portfolio value change at a future time for each of the several portfolio scenarios, it is necessary to perform nested simulations at each particular time for each scenario, which further increases the computational requirements.

Therefore, to avoid nested simulations on a large number of portfolio scenarios, and to estimate FIM requirements, there are several approximation methods that compute percentiles and other distribution properties of the portfolio value change. These methods are given pathwise risk factor values and portfolio values along a simple or nested set of paths either as pathwise values or as functions of risk factor values.

We briefly review a few of these methods to estimate FIM under suitable assumptions, such as whether portfolio values (V(t)), portfolio sensitivities (Delta and Gamma) and complete or limited nested paths in the simulation, etc, are available.

We present the procedures, implementations and comparisons of the following: nested Monte Carlo, Delta-Gamma and Delta-Gamma-normal, GLSMC, JLSMC, Johnson percentile matching (JPP) and quantile regression, where each is applicable under certain assumptions and each has certain requirements.

If portfolio values are not given or otherwise computed (pathwise or as a function), and thus the methods discussed in this paper cannot be used directly, contemporary approaches such as deep BSDE and differential machine learning (DML) can be extremely useful to compute required portfolio values (and also first-order portfolio sensitivities). We will discuss backward deep BSDE approaches (where portfolio values will be determined from given sample paths for risk factors and cashflows), and their use in computing FIM requirements, in a future paper. The various approaches and their requirements are summarized in Table 1.

Table 1: Matrix of the available methods to compute VaR and the corresponding requirements of the methods (indicated by check marks). [The backward deep BSDE approach is discussed in a future paper. Backward deep BSDE methods also need simulations of all instrument cashflows to learn portfolio values and their first-order sensitivities.]

| Method | Risk factor simulation and MPoR cashflows | Nested MC paths | Portfolio values | Sensitivities: Delta and Gamma | Limited nested paths |
|---|---|---|---|---|---|
| Nested Monte Carlo | ✓ | ✓ | ✓ | | |
| Delta-Gamma and Delta-Gamma-normal | ✓ | | | ✓ | |
| JLSMC and GLSMC | ✓ | | ✓ | | |
| Johnson percentile matching | ✓ | | ✓ | | ✓ |
| Quantile regression | ✓ | | ✓ | | |
| Backward deep BSDE | ✓ | | | | |

As noted in Table 1, different methods are applicable under different assumptions and have different requirements. This paper aims to introduce these different methods, provide a comparison of them and also highlight their differences and their results obtained for various instruments.

3 Relation to other methods to compute value-at-risk and initial margin

For VaR, the Fundamental Review of the Trading Book (FRTB) international capital rules (Basel Committee on Banking Supervision 2019) require, among other things, that banks compute one-day VaR at both 97.5% and 99% of each of their trading desk’s positions with the model calibrated to the most recent 12 months’ market data. While banks use different models and methods to compute VaR, historical simulation of risk-factor dynamics and various Monte Carlo and similar stochastic methods are commonly used. There are also multiple existing methodologies to compute portfolio margin requirements. The standard portfolio analysis (SPAN) framework, released by the Chicago Mercantile Exchange in 1988 (CME Group 1988), and SPAN2, released in 2020, use limited sets of given scenarios to estimate initial margin. Another standard methodology for computing margin requirement is the standard initial margin model (SIMM) methodology specified by ISDA (International Swaps and Derivatives Association, Inc. 2018, 2021). SIMM uses sensitivities of the portfolio instrument values to various risk factors, including Delta, Vega, curvature and base correlation, for a set of broad risk classes. The risk classes that drive the analysis are interest rate (IR), credit, equity, commodity and foreign exchange (FX). The overall initial margin depends on net sensitivities of appropriate groups of instruments with respect to the above risk classes.

Hence, calculation of VaR at different levels and over different time horizons is fundamental to modeling risk and estimating capital and margin requirements with respect to both current and future dates and states. In this paper, we explore various methods to compute fVaR using Monte Carlo simulations that are based on either some model or some given way to generate simulations. They can serve as a baseline or comparison (or direct model) for other VaR modeling approaches. We could introduce scaling or other transformations to better approximate specific approaches to compute VaR and fVaR.

The Basel Committee on Banking Supervision and the International Organization of Securities Commissions, representing international regulators, have stated that any initial margin computation methodology should be compared against something that is computed through the 99th percentile of an appropriately defined potential future exposure over a 10-day close-out period as a baseline, based on or calibrated to historical data that includes a period of significant financial stress (Basel Committee on Banking Supervision–Board of the International Organization of Securities Commissions, 2020, Clause 3.1). The US regulation implementing this international approach to regulating margins also states the same in several places (OCC–BGFRS–FDIC–FCA–FHFA 2015). Thus, the methods that we present here can be understood to implement that baseline for comparison. If we need results that are closer to the results of other methodologies such as SIMM, we could multiply the results obtained in this paper with time-dependent scaling factors, or we could use similar simple transformations to achieve a better agreement. With such scaling factors or transformations, the presented methodologies could be used to check or estimate results from those other methodologies. See Caspers et al (2017) for a discussion about approximating SIMM results from regression-based VaR-type approaches.

In this work, we compute VaR and initial margin mainly at the 99th percentile, since that threshold is most commonly used in regulations.

4 Empirical nested Monte Carlo with many inner samples

One particular “brute force” approach is first to simulate “outside” 𝑿t (and 𝑼t) along a number NO of outer paths, or to generate those 𝑿t (and 𝑼t) distributed according to some “good” distributions, for a number of times tᵢ, obtaining 𝑿tᵢʲ (and 𝑼tᵢʲ) for each outer path j. For each outer path j and time tᵢ, we simulate NI inner paths, for NI relatively large, starting now at 𝑿tᵢʲ and 𝑼tᵢʲ, and we compute the corresponding ΔVtᵢδ,f,ℙ(ω). Thus, we have NI samples of this random variate, and we can compute the empirical quantile, obtaining one quantile for each simulated 𝑿tᵢʲ (and 𝑼tᵢʲ); follow-up computations based on these sampled quantiles can proceed.

Similarly, we can compute moments from these many inner samples, regress these empirical moments against appropriate regressors and perform further follow-up computations based on these moments (such as moment-matching methods).

In some very special circumstances it might be possible to directly characterize the distributions and quantiles analytically or to approximate them in closed form, but in the general case, nested Monte Carlo is a natural benchmark. Given the enormous computational requirements (including the requirement to generate nested paths and to generate prices on all of them), this is not a method that can be applied in production settings in general.

A nested Monte Carlo approach (or, more generally, an approach that looks at nested distributions) is illustrated in Figure 1, where the term “extreme quantile” is used to refer to the bottom 1% of the change in portfolio value.

The method was implemented for a simple Black–Scholes call combination: a portfolio consisting of a long call at strike K1=120.0 and two short calls at strike K2=150.0, with maturity T=5.0 years and model parameters r=0.03 and σ=0.1. The initial spot price of the underlier was S0=85.0 and the MPoR was fixed at 0.05 years.
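The following Python sketch illustrates the nested Monte Carlo quantile estimator for this call combination, assuming risk-neutral geometric Brownian motion dynamics, no MPoR cashflows and no discounting adjustment over the short MPoR; path counts and function names are illustrative choices of ours rather than the settings used for the figures.

```python
import numpy as np
from scipy.stats import norm

def bs_call(s, k, tau, r, sigma):
    """Black-Scholes price of a European call with time to maturity tau."""
    tau = np.maximum(tau, 1e-12)
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - k * np.exp(-r * tau) * norm.cdf(d2)

def portfolio_value(s, t, T=5.0, r=0.03, sigma=0.1, k1=120.0, k2=150.0):
    # Long one K1-call, short two K2-calls.
    return bs_call(s, k1, T - t, r, sigma) - 2.0 * bs_call(s, k2, T - t, r, sigma)

rng = np.random.default_rng(0)
S0, r, sigma, delta = 85.0, 0.03, 0.1, 0.05
t, n_outer, n_inner, alpha = 0.64, 1_000, 2_000, 0.01

# Outer simulation: risk factor at time t under GBM dynamics.
S_t = S0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * rng.standard_normal(n_outer))
V_t = portfolio_value(S_t, t)

# Inner simulation: for each outer state, evolve over the MPoR and take the
# empirical alpha-quantile of the value change (no MPoR cashflows here).
z = rng.standard_normal((n_outer, n_inner))
S_td = S_t[:, None] * np.exp((r - 0.5 * sigma**2) * delta + sigma * np.sqrt(delta) * z)
dV = portfolio_value(S_td, t + delta) - V_t[:, None]
q_alpha = np.quantile(dV, alpha, axis=1)      # conditional quantile per outer path
print(q_alpha.mean())                         # expectation over outer paths, cf (1.2)
```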

Figure 1: Change in portfolio value as a distribution (or a nested sample) for a single sample path over the MPoR.

Figure 2(a) shows the samples for ΔVtδ,f,(ω) for NI=2000 inner samples, for a representative outer sample path at time t=0.64 years. The portfolio value at that time for that outer path is Vt(ω)=6.47. Figure 2(b) shows the corresponding 99% quantiles of ΔV over the entire cross section of the outer paths at the same time instant, t=0.64 years, with the conditional quantiles Qα(ΔVV).

Figure 2: Nested Monte Carlo approach to estimate the DIM for the call combination. (a) Histogram of ΔV for the call combination at time t=0.64 years. (a) Sample path ID 500; Vt is 6.47. (b) Monte Carlo-based ΔV. Conditional 99% quantiles of ΔV given V are represented by blue dots and the corresponding polynomial regressed values are shown as a red curve.

5 Nonnested Monte Carlo: cross-sectional moment-matching methods

Due to the high computational requirements for simulation and valuation, it is rarely possible to spawn nested simulations for inner portfolio scenarios as required for the nested Monte Carlo-type estimators, in particular for complex and/or large portfolios. Often, it is only possible to obtain several independent runs of the portfolio scenarios. Thus, methods such as GLSMC and JLSMC (McWalter et al 2018; Caspers et al 2017) are used to extract information about the distribution of the portfolio value change only from such outer simulations.

The data usually available from outer simulations is illustrated in Figure 3.

5.1 Moment-matching methods: GLSMC and JLSMC

Moment-matching least square Monte Carlo methods can be outlined as follows; a minimal sketch of the GLSMC variant is given after the list.

  • Generate M portfolio scenarios (outer simulations) corresponding to the k underlying assets, given the various risk factors and correlations from time 0 to maturity T.

    Figure 3: Change in portfolio values from outer simulations used to estimate properties of the distribution ΔV.
  • Define the appropriate change in portfolio value corresponding to sample path m as

      ΔVm(t,δ)=Vm(t+δ)+DCFm(t,t+δ)-Vm(t),   (5.1)

    where Vm is the pathwise portfolio value in scenario m, and DCFm is the value of the fully or partially included net cashflow between the counterparties from time t to t+δ.

  • Based on the distribution(s) to be fitted, compute the first four raw sample moments ΔVm, (ΔVm)2, (ΔVm)3 and (ΔVm)4 (for Johnson) or the first two raw sample moments ΔVm and (ΔVm)2 (for Gaussian) that will be used to estimate the actual raw moments.

  • Accordingly, estimate the first four raw moments r1, r2, r3, r4, or first two raw moments r1, r2, as a function of the underlying states 𝑿t or the portfolio value Vt (or other regressors 𝑹t), by a suitable parametric function such as polynomial regression:

      $r_i = f(\boldsymbol{\beta}, \boldsymbol{X}) \approx E[(\Delta V)^i \mid \boldsymbol{X}_t = \boldsymbol{X}]$
    or
      $r_i = f(\boldsymbol{\beta}, V) \approx E[(\Delta V)^i \mid V_t = V],$

    where 𝜷 is a vector of polynomial regression coefficients and i=1,2,3,4.

  • For GLSMC (Caspers et al 2017), the estimated raw moments r1 and r2 can be used to approximate the mean and variance of the distribution of ΔVt (the portfolio value change of interest during the MPoR) under an assumed normal distribution for ΔVt. The normal distribution with the estimated mean and variance is then used to estimate the conditional quantile.

  • For JLSMC (McWalter et al 2018), the estimated first four raw moments, r1, r2, r3, r4, can be used to fit an appropriate Johnson distribution via the moment-matching method (JMM) (Hill et al 1976) with (approximately) these first four moments. The Johnson’s distribution so obtained is then used to estimate the conditional quantile.
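As a concrete illustration, the following sketch implements the GLSMC variant of the procedure above on the outer-simulation cross section at a fixed time, regressing the first two raw moments of ΔV on Vt with a simple polynomial; the regression degree and the variance floor are illustrative choices of ours, not the paper's settings.

```python
import numpy as np
from scipy.stats import norm

# Minimal GLSMC sketch (cross-sectional, one pathwise dV per outer path):
# regress the first two raw moments of dV on V_t with ordinary polynomial
# least squares and read the alpha-quantile off the fitted normal.
def glsmc_quantile(V_t, dV, alpha=0.01, degree=5):
    r1 = np.polynomial.polynomial.polyfit(V_t, dV, degree)      # ~ E[dV | V]
    r2 = np.polynomial.polynomial.polyfit(V_t, dV**2, degree)   # ~ E[dV^2 | V]
    m1 = np.polynomial.polynomial.polyval(V_t, r1)
    m2 = np.polynomial.polynomial.polyval(V_t, r2)
    var = np.maximum(m2 - m1**2, 1e-12)                         # conditional variance
    return m1 + norm.ppf(alpha) * np.sqrt(var)                  # Q_alpha(dV | V_t)

# Usage with the outer-simulation cross section (V_t, dV) at a fixed time t:
# q = glsmc_quantile(V_t, dV, alpha=0.01); DIM at t is then roughly q.mean().
```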

An example of raw moments from the portfolio value change and the corresponding regressed moments with portfolio value Vt via polynomial regression for the first four moments for the call combination instrument is shown in Figure 4.

Figure 4: Polynomial regression of the first four raw moments of the portfolio value change, plotted over Vt.

A detailed procedure and the settings used for JLSMC are presented in McWalter et al (2018) and are not repeated here for brevity. However, we note that the moment-matching algorithm (Hill et al 1976), originally designed for smaller sets of observations, is quite involved. The total number of sets of moments that must be fitted via the JMM algorithm is equal to the number of outer simulation paths, NO=100 000, at each time step where VaR is required. This poses a heavy computational burden in the absence of additional redesign and parallelization of the moment-matching algorithm. Hence, in order to reduce the computational load and obtain results with comparable accuracy, only moments at select values of the outer simulation values, either certain values of the underlying risk factors 𝑿t or certain quantiles of the portfolio values Vt at time t, are used for moment matching. Let vj = Q_{j/N}(Vt), j=0,…,N, be the j/Nth quantile of the samples Vt at time t and let

  $r_j^i = f_i(\boldsymbol{\beta}, v_j), \quad i = 1,2,3,4,$
or
  $r_j^i = f_i(\boldsymbol{\beta}, x_j), \quad i = 1,2,3,4,$

be the corresponding raw moments obtained via parametric regression at the specific values of Vt or 𝑿t. The parameters of the corresponding Johnson distribution obtained by the moment-matching procedure are given by

  $(\mathtt{xi(j)}, \mathtt{lambda(j)}, \mathtt{delta(j)}, \mathtt{gamma(j)}, \mathtt{type(j)}) = \mathtt{MomentMatching}(\mathtt{r1(j)}, \mathtt{r2(j)}, \mathtt{r3(j)}, \mathtt{r4(j)}),$

where xi(j), lambda(j), delta(j), gamma(j) = ξj, λj, δj, γj are the parameters of the Johnson distribution, and type(j) represents the type or family of the distribution. The probability density function (pdf) of the Johnson distributions belonging to the different families is given by Johnson (1949) and McWalter et al (2018) as follows:

  • SL family,

      $p(y) = \dfrac{\delta}{\sqrt{2\pi}}\,\dfrac{1}{y}\exp\bigl\{-\tfrac{1}{2}[\gamma+\delta\log(y)]^2\bigr\}, \quad \xi < X < \infty;$   (5.2)

  • SB family,

      $p(y) = \dfrac{\delta}{\sqrt{2\pi}}\,\dfrac{1}{y(1-y)}\exp\bigl\{-\tfrac{1}{2}[\gamma+\delta\log(y/(1-y))]^2\bigr\}, \quad \xi < X < \xi+\lambda;$   (5.3)

    and

  • SU family,

      $p(y) = \dfrac{\delta}{\sqrt{2\pi}}\,\dfrac{1}{\sqrt{y^2+1}}\exp\bigl\{-\tfrac{1}{2}[\gamma+\delta\log(y+\sqrt{y^2+1})]^2\bigr\}, \quad -\infty < X < \infty;$   (5.4)

for y=(X-ξ)/λ.
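Once the moment matching has produced Johnson parameters (ξj, λj, δj, γj) and a family for a given point, the conditional quantile of ΔV can be read off the fitted distribution. A sketch using SciPy's Johnson distributions is given below; the mapping of (γ, δ, ξ, λ) to SciPy's (a, b, loc, scale) follows (5.2)–(5.4), while the family dispatch and the SL quantile formula are our own reading rather than the paper's code.

```python
import numpy as np
from scipy.stats import johnsonsu, johnsonsb, norm

def johnson_quantile(alpha, gamma, delta, xi, lam, family):
    """alpha-quantile of a fitted Johnson distribution with parameters
    (gamma, delta, xi, lam) and family in {"SU", "SB", "SL"}."""
    if family == "SU":
        return johnsonsu.ppf(alpha, a=gamma, b=delta, loc=xi, scale=lam)
    if family == "SB":
        return johnsonsb.ppf(alpha, a=gamma, b=delta, loc=xi, scale=lam)
    if family == "SL":
        # From (5.2) with y = (X - xi)/lam: z = gamma + delta*log(y).
        return xi + lam * np.exp((norm.ppf(alpha) - gamma) / delta)
    raise ValueError(f"unknown Johnson family: {family}")
```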

5.2 IR swap example

Figure 5: Sample scenario of IR swap with the portfolio value, accumulated fixed and floating cashflows and cashflows within MPoR, plotted over time (in years).
Figure 6: Comparison of conditional quantiles at α=0.01 obtained via GLSMC and JLSMC methods at time t=8.38 for an IR swap, plotted over Vt.

The above methods were compared for a realistic IR swap example, with one party making annual fixed rate payments and the counterparty making quarterly floating rate payments over a 15-year period. Here, party A pays a floating rate with a constant spread of 0.9% every three months over the current floating IR (London Interbank Offered Rate/prime, etc). Party B pays a fixed annual rate of 4.5% every year based on the IR agreed at the start of the contract. The outstanding notional value varies from USD36.2 million in the first year to USD63.2 million at the end of year 15. The MPoR is set to 10 business days or 0.04 years, with a total of 375 intervals until maturity. The IR and risk factors were simulated assuming G1++ short rate processes, and a sample path for cashflows, portfolio values and the risk factors is shown for a sample scenario in Figure 5.

The conditional 1% quantiles obtained via GLSMC and JLSMC at time t=8.38 years are compared in Figure 6 and are found to be in good agreement.

Figure 7: Time evolution of DIMs for the IR swap with moments regressed via Laguerre polynomials of degree 7. (a) Comparison of time evolution of DIM obtained via GLSMC and JLSMC with MPoR cashflows. Inner samples are 1, regressed with Vt. (b) Comparison of time evolution of DIM obtained via GLSMC and JLSMC without MPoR cashflows. Inner samples are 1, regressed with Vt.

Finally, the time evolution of DIM for the same instrument over the entire 15-year period, as defined in (1.2), is compared for GLSMC and JLSMC. The results are shown in Figure 7, where the first four moments are regressed with Laguerre polynomials of degree 7. In part (a), cashflows during the MPoR are included in the change of value; in part (b), they are not. It can be seen that not assuming that all cashflows are paid can make the margin requirements much more spiky. The results for the different methods are seen to be in good agreement with each other.

5.3 Further analysis: JLSMC moment regression

We present another instrument, namely an FX call option, in order to highlight the differences in the results obtained via the GLSMC and JLSMC methods. The Black–Scholes FX call is simulated and priced with a constant domestic risk-free rate rd=0.08 and a constant foreign rate of rf=0.02, along with an FX rate volatility of σ=0.3. The current spot FX rate is S0=100.0. The option has a maturity of T=1.0 year and the strike K=105.0. Similar to the other examples, the MPoR is fixed at 0.04 years, with 25 MPoR intervals until maturity.

An algorithm to fit an appropriate Johnson’s distribution given the first four raw or central moments is presented in Hill et al (1976). Raw moments are estimated by independent polynomial regression on the first four powers of the change in portfolio values, as shown in Figure 4.

Johnson’s analysis of the types of distributions in Johnson (1949) also outlines the constraints for various moments, under which a valid distribution can be found. The relationship between skew (β1) and kurtosis (β2) requires that β2β1+1. However, since the moments are regressed independently without imposing any joint constraints, there is no guarantee that relationship between skew and kurtosis will be satisfied everywhere when the regressed form of the moments is used. In fact, it can be seen that in some examples under some regression models, the skew–kurtosis relationship fails to hold at some points, and therefore we are unable to find a distribution that matches the moments, as shown in Figure 8. This is elaborated further in the following subsection.

5.3.1 Polynomial moment regression

The skew versus kurtosis obtained from the regressed raw moments at a single time is shown in Figure 8. Each data point corresponds to a unique path with a portfolio value Vm(t) and its corresponding portfolio value change ΔVm. For simplicity, only equally spaced quantiles from 0.01 to 0.99 of V(t) are shown. The darker colors indicate lower quantiles closer to 0.01, and lighter colors indicate higher quantiles closer to 0.99; the line β2=β1+1 is also shown. The data points where the condition is satisfied appear below that line and those that do not satisfy the constraint appear above the β2=β1+1 line (the β2 axis is reversed).

In the standard implementation of the moment-matching algorithm in Hill et al (1976) and Jones (2014), the data points not satisfying the relationship are discarded or simply replaced by a normal distribution with the given mean and variance (first and second moments only), thus essentially reverting to the procedure for GLSMC in that case.

The results of the time evolution of DIM for the FX option are shown in Figure 9. It can be seen that, although various methods agree well for the IR swap example (Figure 7), they deviate from each other for the FX option.

  • This strongly indicates that different instruments have their own characteristic distributions for ΔV(t) given ℱt and hence cannot be approximated by a normal distribution alone with the given mean and variance as in the GLSMC method. This also shows that a more accurate characterization of the distributions is needed, as in the JLSMC method.

  • As mentioned earlier, when the skew–kurtosis relationship is not satisfied for the regressed moments, either the point is discarded or a normal distribution matching only the first two moments is used. For the FX option, with the given polynomial regression, more data points violate the skew–kurtosis relationship at later times t. As those points are modeled by normal distributions rather than more expressive distributions, the estimated trends for DIM by JLSMC converge to those of GLSMC for such later times t, as shown in Figure 9, which is clearly an artifact of the standard implementation.

Figure 8: Skew–kurtosis obtained from independently regressed moments, shown at selected quantiles (1–99) of Vt (outer simulation) at time t=0.16 years. (a) The skew–kurtosis relationship (regression order: 7). Data points that lie above β2=β1+1 (the dot-dashed line) violate the skew–kurtosis relationship and hence do not correspond to a valid distribution. Parts (b) and (c) show the corresponding raw β1 and β2 against Vt for the same time. (b) The gray data points denote those violating the relationship without moment correction. (c) The moment-corrected data points.
Figure 9: Time evolution of DIM for FX options with and without moment correction. (a) Comparison of time evolution of DIM obtained via GLSMC and JLSMC with polynomial regression of order 7 (with and without moment correction) for the FX option. (b) Comparison of time evolution of DIM obtained via GLSMC and JLSMC for the FX option with moment correction and polynomial order 32. Data in both parts is regressed with Vt.
Polynomial regression correction.

To avoid the data points being discarded or only represented by normal distributions, it is possible to map the invalid skew and kurtosis value pairs to the closest skew and kurtosis value pairs that do not violate the relationship, and to use the corresponding Johnson distribution. This is illustrated in Figure 8, where the data points above the β2=β1+1 line are projected onto the line in an (orthogonal) least-square sense, as indicated by the red points. This ad hoc method provides a better approximation of the distribution via a valid Johnson distribution instead of via a normal distribution. In addition, a better model with higher-order polynomial regression, or better regression techniques such as ensemble learning via random forest (RF), might mitigate the issue, with fewer data points violating the moment constraints. The time evolution of DIM with the proposed moment corrections is shown in Figure 9.
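A minimal sketch of this correction is given below: skew–kurtosis pairs that violate β2 ≥ β1+1 are projected orthogonally onto the boundary line, while valid pairs are left untouched; the tolerance is an illustrative choice of ours.

```python
import numpy as np

def correct_skew_kurtosis(beta1, beta2, tol=1e-9):
    """Project (beta1, beta2) pairs violating beta2 >= beta1 + 1 orthogonally
    onto the line beta2 = beta1 + 1; valid pairs are returned unchanged."""
    beta1 = np.asarray(beta1, dtype=float).copy()
    beta2 = np.asarray(beta2, dtype=float).copy()
    bad = beta2 < beta1 + 1.0 - tol          # infeasible for any distribution
    s = beta1[bad] + beta2[bad]              # orthogonal projection onto the line
    beta1[bad] = (s - 1.0) / 2.0
    beta2[bad] = (s + 1.0) / 2.0
    return beta1, beta2

# Example: (2.0, 2.5) violates the constraint and is moved to (1.75, 2.75),
# which lies exactly on beta2 = beta1 + 1.
print(correct_skew_kurtosis([2.0], [2.5]))
```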

6 Nonnested Monte Carlo: (conditional) quantile regression

Conditional quantile regression as in Koenker and Hallock (2001) can be used to estimate population quantiles based on observations of a data set (𝑪i, yi), where 𝑪i for each i is a vector of covariates or features and yi is the quantity of interest (for instance, low birth weight or CEO compensation, depending on maternal or firm characteristics, respectively, in two examples in Koenker and Hallock (2001)). Based on the vector 𝑹i of covariates regressed against, we estimate qα(Y ∣ 𝑹). An extended quantile regression procedure (with additional loss function terms for ES) is used to extract conditional VaR and ES for XVA applications in Albanese et al (2020).

We will use the conditional quantile regression from Koenker and Hallock (2001) as follows: assuming that we have given (r-dimensional) samples 𝑹i of regressors (chosen from a larger set of possible features) and samples Yi from the (one-dimensional) conditional distribution to be analyzed, the function fα(𝑹) is characterized as the function from an appropriately parameterized set of functions that minimizes the loss function

  $L_\alpha = \sum_i \rho_\alpha(Y_i - f_\alpha(\boldsymbol{R}_i)),$   (6.1)

with

  $\rho_\alpha(x) = \begin{cases} \alpha x & \text{if } x \geq 0, \\ (\alpha-1)x & \text{if } x < 0. \end{cases}$   (6.2)

The function fα defined by arg min Lα then estimates the conditional quantile qα(Y ∣ 𝑹). Koenker and Hallock (2001) consider parametric functions μ(𝑹,β), and in particular functions that depend linearly on the parameters β. In their case, the optimal parameters can then be computed by linear programming. Here, we will apply conditional quantile regression both in the linear and in the nonlinear approximation context.

In our application, the conditional target distribution is that of the portfolio value change Ym=ΔVm from time t to time t+δ given some regressors or the complete Markovian state. The covariates that can be used have to be measurable as of time t. Assuming that the portfolio value can be exactly or approximately considered Markovian in terms of the current risk factor values 𝑿t and extra state 𝑼t, 𝑿t and 𝑼t should contain all necessary information. Computing quantiles conditional on the full state might be too expensive, and thus a smaller set of covariates might be used, such as the portfolio value Vt and some of the “most important risk factors”.

In our machine and deep learning setting, we thus numerically minimize the loss function given above within an appropriate set of functions depending on the chosen regressors.

It is important to be parsimonious in the selection of the model and the number of free parameters, since with a sufficient number of free parameters (and if there is only one value of ΔVm for each 𝑹m) it is possible to learn each ΔVm for each 𝑹m. A simple polynomial regression or a sufficiently simple neural network can strike a balance between a reasonably smooth approximation of the quantiles and achieving a local minimum for the loss function.
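The following sketch fits a polynomial conditional quantile by directly minimizing the loss (6.1)–(6.2) with a generic optimizer; the polynomial degree, the optimizer and the least-squares starting point are illustrative choices of ours (the experiments reported below use RF–LightGBM and DNN models instead).

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(y, y_hat, alpha):
    """Average check loss rho_alpha of (6.2) over the sample."""
    u = y - y_hat
    return np.mean(np.where(u >= 0, alpha * u, (alpha - 1.0) * u))

def fit_quantile_poly(x, y, alpha=0.01, degree=3):
    """Polynomial coefficients (low to high) minimizing the pinball loss."""
    def objective(coeffs):
        return pinball_loss(y, np.polynomial.polynomial.polyval(x, coeffs), alpha)
    # Ordinary least-squares fit as a starting point for the non-smooth loss.
    c0 = np.polynomial.polynomial.polyfit(x, y, degree)
    res = minimize(objective, c0, method="Nelder-Mead",
                   options={"maxiter": 20_000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x

# q_alpha(V) is then np.polynomial.polynomial.polyval(V, coeffs), and DIM at
# time t can be approximated by its average over the outer paths.
```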

Figure 10: Quantile regression performed on the clean change in portfolio values for the given outer portfolio scenarios for FX option instrument. Quantile regression via RF–LightGBM on clean portfolio changes at various times: (a) t=0.12, (b) t=0.36, (c) t=0.60 and (d) t=0.88. Quantile regression via DNN on clean portfolio changes at various times: (e) t=0.12, (f) t=0.36, (g) t=0.60 and (h) t=0.88.

Figure 10 shows the results of quantile regression for ΔV given V, with RF and LightGBM in parts (a)–(d) and deep neural networks (DNNs) in parts (e)–(h). In the body of the Vt distribution, the results seem to mostly agree (with DNN giving smoother results than the stepwise curve coming from RF–LightGBM). For larger Vt, the behavior seems to be somewhat (but not completely) different.

Similar to least-square regression, assuming that we try to determine a quantile function as a linear combination of given functions of given features, there are faster deterministic numerical linear algebra methods available rather than resorting to generic optimization methods: linear least squares by singular value decomposition (in least-square regression) and linear programming (for quantile regression). Once we no longer operate in the space of linear combinations of given functions, we need to apply more generic numerical optimization techniques such as (stochastic) gradient descent or similar variants, for both least-square regression and quantile regression.

Figure 11: Comparison of time evolution of DIM for (a) the FX option and (b) the IR swap. Data is regressed with Vt.

Parts (a) and (b) of Figure 11 compare GLSMC, JLSMC and quantile regression methods for the FX option and IR swap, respectively. For the FX option, JLSMC and quantile regression seem to mostly agree, differing somewhat in smoothness (and looking similar to the earlier nested Monte Carlo results), while GLSMC gives materially different (and larger) values. For the IR swap, all give broadly similar results.

7 Nonnested Monte Carlo: given sensitivities

Another technique to estimate the change in portfolio values over the MPoR is to use portfolio sensitivities, if available. The distribution of the returns of the underlying risk factors can be combined with the portfolio sensitivities, specifically the first and second derivatives (portfolio Delta and Gamma), to obtain a good approximation of the distribution of the change in portfolio values over the MPoR. Let 𝐒(t)=(S1(t),…,Sk(t)) be the values of the k underliers that constitute the given portfolio, and let 𝑹(t)=(ΔS1/S1,…,ΔSk/Sk) be the corresponding vector of asset returns, where ΔSi(t)=Si(t+Δt)−Si(t). The second-order approximation of the change in the value of the portfolio can be written in terms of the sensitivities as (see Shreve (2004); Alexander (2008))

  $\Delta V \approx \boldsymbol{\delta}^{\$}\cdot\boldsymbol{R} + \tfrac{1}{2}\boldsymbol{R}^{\mathrm{T}}\boldsymbol{\Gamma}^{\$}\boldsymbol{R},$   (7.1)

where

  $\boldsymbol{\delta}^{\$} = (\delta_1^{\$}, \delta_2^{\$}, \ldots, \delta_k^{\$})^{\mathrm{T}},$   (7.2)
  $\delta_i^{\$} = \dfrac{\partial V}{\partial S_i} \times S_i,$   (7.3)
  $\boldsymbol{\Gamma}^{\$} = (\gamma_{ij}^{\$}),$   (7.4)
  $\gamma_{ij}^{\$} = \dfrac{\partial^2 V}{\partial S_i\,\partial S_j} \times S_i \times S_j,$   (7.5)

and V is the value of the portfolio corresponding to the value of the underliers. For a single underlier, the above equation can be written as

  $\Delta V \approx \delta^{\$} R + \tfrac{1}{2}\gamma^{\$} R^2,$   (7.6)

for the corresponding single asset return R.

7.1 Delta-Gamma-normal

As a first approximation, the change in portfolio values can be considered to be normally distributed, with mean and variance obtained from (7.6). Here, the discounted asset return R is assumed to be normally distributed with mean zero and variance Ω=σ², ie, R ∼ 𝒩(0,Ω):

  $\Omega_h = \sigma^2\,\Delta t,$
  $E(\Delta V) = \tfrac{1}{2}\operatorname{tr}(\Gamma^{\$}\Omega_h),$
  $\operatorname{Var}(\Delta V) = \tfrac{1}{2}\operatorname{tr}[(\Gamma^{\$}\Omega_h)^2] + \delta^{\$\mathrm{T}}\Omega_h\delta^{\$}.$

Further, with the approximation that ΔV follows a normal distribution, the corresponding αth quantile of the distribution can be written as

  $Q_\alpha(\Delta V \mid V) = z_\alpha\sqrt{\operatorname{Var}(\Delta V)} + E(\Delta V),$   (7.7)

where zα is the αth quantile of the standard normal distribution.
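A minimal multi-asset sketch of the Delta-Gamma-normal quantile follows, assembling mean and variance from the dollar sensitivities and the covariance of asset returns over the MPoR as above and applying (7.7); the example numbers at the end are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def delta_gamma_normal_quantile(delta_usd, gamma_usd, cov_mpor, alpha=0.01):
    """delta_usd: (k,) dollar Deltas; gamma_usd: (k, k) dollar Gammas;
    cov_mpor: (k, k) covariance of asset returns over the MPoR."""
    gw = gamma_usd @ cov_mpor
    mean = 0.5 * np.trace(gw)
    var = 0.5 * np.trace(gw @ gw) + delta_usd @ cov_mpor @ delta_usd
    return mean + norm.ppf(alpha) * np.sqrt(var)          # Q_alpha(dV | V), (7.7)

# Single-asset example with sigma = 10% annualized and a 0.05-year MPoR.
sigma, dt = 0.10, 0.05
q = delta_gamma_normal_quantile(np.array([50.0]),
                                np.array([[200.0]]),
                                np.array([[sigma**2 * dt]]))
print(q)
```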

Figure 12: Approximation of ΔV distribution by Delta-Gamma-normal approach, which could lead to deviation from the actual ΔV distribution.

However, this approximation, which could deviate significantly from the actual ΔV ∣ V distribution, could lead to overestimation or underestimation of the margin requirements. This deviation is apparent from the actual sample distribution shown in Figure 2(a), which is skewed and could be heavy-tailed depending on the value of the portfolio and the risk factors (Figure 12). A comparison of DIM obtained via this approximation against other methods for the call combination discussed earlier is shown in Figure 13. Hence, although the normal approximation of the ΔV ∣ V distribution allows for rapid estimation of initial margin requirements, it may not be suitable for complex portfolios or portfolios with more complex instruments.

7.2 Delta-Gamma with Cornish–Fisher cumulant expansion

A further refinement of the previous method is to relate the quantiles of an empirical distribution to the quantiles of the standard normal distribution via the higher-order moments of the empirical distribution and the so-called Cornish–Fisher expansion. Given the raw moments, compute the corresponding central moments cm1, cm2, cm3, cm4, cm5, and from these the skew (κ3), the kurtosis (κ4) and κ5 = cm5/cm2^{5/2}. Hence, the Delta-Gamma method approximates the quantiles of the portfolio value changes ΔV ∣ V in terms of its raw moments, and it uses the Cornish–Fisher expansion to compute the αth percentile of ΔV ∣ V:

  $Q_\alpha(\Delta V_t) = z_\alpha + \dfrac{\kappa_3(z_\alpha^2-1)}{6} + \dfrac{\kappa_4(z_\alpha^3-3z_\alpha)}{24} - \dfrac{\kappa_3^2(2z_\alpha^3-5z_\alpha)}{36} + \dfrac{\kappa_5(z_\alpha^4-6z_\alpha^2+3)}{120} - \dfrac{\kappa_3\kappa_4(z_\alpha^4-5z_\alpha^2+2)}{24} + \dfrac{\kappa_3^3(12z_\alpha^4-53z_\alpha^2+17)}{324},$   (7.8)

where zα is the αth quantile of a standard normal distribution and κ3, κ4 and κ5 are the skew, the kurtosis and the fifth standardized moment of the distribution, respectively. The Delta-Gamma approach approximates the above quantities from the corresponding portfolio sensitivities and asset returns over the MPoR.

Similar to the Delta-Gamma-normal method, the discounted asset return is assumed to be normally distributed with mean zero and variance Ω, ie, R ∼ 𝒩(0,Ω). The powers of ΔV, whose expectations give the raw moments, written in terms of δ$ (= δ), γ$ (= γ), R and Ω, are

  ΔV =δ$R+12γR2,   (7.9)
  ΔV2 =δ2R2+14γ2R4+δγR3,   (7.10)
  ΔV3 =δ3R3+14δγ2R5+δ2γR4+18γ3R6+12δγ2R5+12δ2γR4  
    =δ3R3+34δγ2R5+32δ2γR4+18γ3R6,   (7.11)
  ΔV4 =δ4R4+34δ2γ2R6+32δ3γR5+18δγ3R7+34δ2γ2R6  
      +12δ3γR5+38δγ3R7+116γ4R8  
    =δ4R4+32δ2γ2R6+2δ3γR5+12δγ3R7+116γ4R8,   (7.12)
  ΔV5 =δ5R5+32δ3γ2R7+2δ4γR6+12δ2γ3R8+116δγ4R9  
      +δ3γ2R7+12δ4γR6+34δ2γ3R8+14δγ4R9+132γ5R10  
    =δ5R5+52δ3γ2R7+52δ4γR6+54δ2γ3R8+516δγ4R9+132γ5R10.   (7.13)

Using the first 10 moments of the normal distribution, the expectations of the above powers of ΔV can be written as

  $E(\Delta V) = \tfrac{1}{2}\gamma\Omega,$
  $E(\Delta V^2) = \delta^2\Omega + \tfrac{3}{4}\gamma^2\Omega^2,$
  $E(\Delta V^3) = \tfrac{3}{2}\delta^2\gamma(3\Omega^2) + \tfrac{1}{8}\gamma^3(15\Omega^3),$
  $E(\Delta V^4) = \delta^4(3\Omega^2) + \tfrac{3}{2}\delta^2\gamma^2(15\Omega^3) + \tfrac{1}{16}\gamma^4(105\Omega^4),$
  $E(\Delta V^5) = \tfrac{5}{2}\delta^4\gamma(15\Omega^3) + \tfrac{5}{4}\delta^2\gamma^3(105\Omega^4) + \tfrac{1}{32}\gamma^5(945\Omega^5).$
Figure 13: Comparison of Delta-Gamma approaches to estimate conditional quantiles and time evolution of DIM for the call combination. (a) Comparison of conditional quantiles at 1% for the call combination at a sample time t. Maturity is T=5.00, t=0.10 and MPoR=0.04. (b) Time evolution of DIM.

The central moments cm1, cm2, cm3, cm4, cm5 are computed from the above raw moments; the skew (κ3), the kurtosis (κ4) and κ5 = cm5/cm2^{5/2} derived from them are then used to determine the corresponding αth quantile from (7.8). The comparison of DIM and VaR estimates obtained via this method for the call combination discussed earlier is shown in Figure 13, and is seen to be in much closer agreement with the benchmark nested Monte Carlo method, while requiring far lower computational resources. Hence, the above method is suitable for instruments and portfolios with given analytical or approximate first- and second-order sensitivities.
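The single-underlier sketch below assembles the raw moments from (δ, γ, Ω) as above, converts them to central moments and standardized quantities, and applies (7.8); rescaling the expansion by the mean and standard deviation of ΔV, and using the excess-kurtosis convention for κ4, are assumptions on our part.

```python
import numpy as np
from scipy.stats import norm

def delta_gamma_cf_quantile(delta, gamma, omega, alpha=0.01):
    # Raw moments E[dV^i], i = 1..5, using the normal moments of R ~ N(0, omega).
    m = [0.5 * gamma * omega,
         delta**2 * omega + 0.75 * gamma**2 * omega**2,
         1.5 * delta**2 * gamma * (3 * omega**2) + 0.125 * gamma**3 * (15 * omega**3),
         delta**4 * (3 * omega**2) + 1.5 * delta**2 * gamma**2 * (15 * omega**3)
             + (1 / 16) * gamma**4 * (105 * omega**4),
         2.5 * delta**4 * gamma * (15 * omega**3) + 1.25 * delta**2 * gamma**3 * (105 * omega**4)
             + (1 / 32) * gamma**5 * (945 * omega**5)]
    mu = m[0]
    # Central moments cm2..cm5 from the raw moments.
    cm2 = m[1] - mu**2
    cm3 = m[2] - 3 * mu * m[1] + 2 * mu**3
    cm4 = m[3] - 4 * mu * m[2] + 6 * mu**2 * m[1] - 3 * mu**4
    cm5 = m[4] - 5 * mu * m[3] + 10 * mu**2 * m[2] - 10 * mu**3 * m[1] + 4 * mu**5
    k3 = cm3 / cm2**1.5
    k4 = cm4 / cm2**2 - 3.0          # excess kurtosis (assumed convention)
    k5 = cm5 / cm2**2.5
    z = norm.ppf(alpha)
    w = (z + k3 * (z**2 - 1) / 6 + k4 * (z**3 - 3 * z) / 24
         - k3**2 * (2 * z**3 - 5 * z) / 36 + k5 * (z**4 - 6 * z**2 + 3) / 120
         - k3 * k4 * (z**4 - 5 * z**2 + 2) / 24
         + k3**3 * (12 * z**4 - 53 * z**2 + 17) / 324)
    return mu + np.sqrt(cm2) * w      # rescale the standardized quantile

print(delta_gamma_cf_quantile(delta=50.0, gamma=200.0, omega=0.1**2 * 0.05))
```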

8 Nested Monte Carlo with limited inner samples

In the previous sections, the primary focus was on methods that only use outer simulations to estimate properties of the distribution of ΔV, such as conditional quantile regression, quantile methods based on moment-matched distributions and regressed moments, or methods that use sensitivities and assumed distributions of underlying risk factor returns. However, these methods might benefit from refined estimates made possible by limited inner simulations. The methods presented in this section use the additional information from small and limited numbers of inner simulations to further refine their estimates for moment matching or quantile regression. Hence, the methods in this section could be thought of as a possible middle ground between a full-fledged nested Monte Carlo simulation with sufficiently many inner simulations requiring large computational resources on the one hand and efficient computational approaches such as moment-matching and quantile regression on the other.

8.1 Moment-matching methods

Based on the GLSMC and JLSMC methods presented in earlier sections, the moment-matching algorithms are applied to moment estimates of ΔV ∣ V that use an additional limited number of inner samples. The raw moments (ΔVm)^i for i=1,2,3,4 from a single outer path are replaced by (1/NI) Σ_{j=1}^{NI} (ΔV_j^m)^i for i=1,2,3,4, where NI is the number of inner simulations for the outer portfolio scenario m. Here, the number of inner simulations is chosen to be rather small (NI=50), in order to limit the needed computational resources. The comparison of raw moments in Figure 14 at a specific time (t=0.16 years) across all the outer paths shows that additional inner samples decrease the variance of the raw moments. The time evolution of DIM obtained via GLSMC and JLSMC methods using these raw moments is shown in Figure 15. Although the time evolution obtained via the GLSMC method has not changed much, as it depends only on the first two moments, the JLSMC method seems to exhibit a smoother time evolution as it depends on and is sensitive to higher moments.
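A small sketch of this refinement, assuming the inner-sample value changes are available as an array of shape (number of outer paths, NI); the function name is ours.

```python
import numpy as np

def refined_raw_moment_targets(dV_inner, orders=(1, 2, 3, 4)):
    """Per-outer-path averages of (dV)^i over the limited inner samples,
    used in place of the single pathwise powers before moment regression."""
    return {i: np.mean(dV_inner**i, axis=1) for i in orders}

# These per-path targets are then regressed on V_t (or X_t) exactly as in the
# single-sample GLSMC/JLSMC procedure of Section 5.
```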

Figure 14: Refinement of first 4 raw moments at time 0.16 years with NI=50 inner samples.
Figure 15: Comparison of the time evolution of DIM between GLSMC and JLSMC with different number of inner samples.

8.2 Johnson percentile-matching method

The Johnson percentile-matching method fits an Sl-, Su- or Sb-type Johnson distribution (“Sl” denotes logarithmic, “Sb” denotes bounded, “Su” denotes unbounded and “Sn” denotes normal) given a limited number of inner samples, based on a discriminant computed from particular percentiles of the inner samples data. The procedure is outlined in Slifker and Shapiro (1980) and George and Ramachandran (2011). In short, a fixed value z (0<z<1) is chosen and the percentiles P3z, P−3z, Pz and P−z of the standard normal distribution corresponding to the four points ±3z and ±z are computed. The data values corresponding to these percentiles are then determined from the sample distribution as x3z, xz, x−z and x−3z, ie, xi = QPi(X) for i = 3z, z, −z, −3z.

The value of the discriminant d=mn/p2 is then used to select the type of Johnson distribution, where p=xz-x-z, m=x3z-xz and n=x-z-x-3z. The Su Johnson distribution is fitted to the data if d>1.001, the Sb Johnson distribution is chosen if d<0.999 and the Sl Johnson distribution is chosen if 0.999d1.001. This method provides a closed-form solution to the Johnson distribution parameters ξ, γ, δ and λ in terms of m, n and the discriminant d, hence requiring far fewer computational resources than the moment-matching algorithm.
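The family-selection step can be sketched as follows; the default z is an illustrative choice, and the closed-form parameter formulas for the selected family, which follow Slifker and Shapiro (1980), are omitted here.

```python
import numpy as np
from scipy.stats import norm

def jpp_family(samples, z=0.5):
    """Select the Johnson family from inner samples via the discriminant
    d = mn/p^2 computed from the sample values at the normal percentiles
    of -3z, -z, z and 3z."""
    probs = norm.cdf(np.array([-3 * z, -z, z, 3 * z]))
    x_m3z, x_mz, x_z, x_3z = np.quantile(samples, probs)
    p = x_z - x_mz
    m = x_3z - x_z
    n = x_mz - x_m3z
    d = m * n / p**2
    if d > 1.001:
        return "SU", (m, n, p)
    if d < 0.999:
        return "SB", (m, n, p)
    return "SL", (m, n, p)
```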

The time evolution of DIM obtained via this method with a much reduced number of inner simulations (NI=200) is compared with the nested Monte Carlo and Delta-Gamma approaches in Figure 13 and is seen to be in good agreement with the nested Monte Carlo approach.

8.2.1 JMM and JPP

Here, we compare the JMM and JPP methods with many (200K) and with fewer (300) inner samples. Figure 16 compares the empirical distribution from 200K inner samples and Johnson distributions fitted through the JPP and JMM methods with 300 (out of those 200K) inner samples in histogram/pdf and Q–Q plots (QQ), for three representative examples. While not perfectly matching in pdf and in QQ, JPP and JMM seem to fit the empirical distribution to a reasonable extent.

Figure 16: Call combination. Comparison of raw histogram pdfs from 200K inner samples versus fitted Johnson distribution via percentile- and moment-matching methods computed from 300 inner samples: (a) Vt=8.26, Xt=120.93, t=2.40; (b) Vt=0.44, Xt=87.51, t=2.40; (c) Vt=0.03, Xt=74.65, t=2.40. Corresponding Q–Q plots comparing raw histogram quantiles vs quantiles from fitted Johnson distributions via percentile- and moment-matching methods: (d) Vt=8.26, Xt=120.93,t=2.40; (e) Vt=0.44, Xt=87.51, t=2.40; (f) Vt=0.03, Xt=74.65, t=2.40.
Table 2: Call combination: quantile estimates obtained via JPP, JMM and empirical methods, along with corresponding ranges implied by standard error, at Vt=0.44, Xt=87.51 and t=2.40.
| q | Emp200K | Emp20K-Low | Emp20K-Hi |
|---|---|---|---|
| 0.01 | -0.2284 | -0.2288 | -0.2277 |
| 0.02 | -0.2091 | -0.2099 | -0.2087 |
| 0.03 | -0.1967 | -0.1974 | -0.1964 |
| 0.10 | -0.1480 | -0.1487 | -0.1480 |
| 0.90 | 0.1676 | 0.1665 | 0.1681 |
| 0.97 | 0.2729 | 0.2693 | 0.2725 |
| 0.98 | 0.3057 | 0.3016 | 0.3063 |
| 0.99 | 0.3602 | 0.3531 | 0.3607 |

| q | JPP200K | JPP300-Low | JPP300-Hi |
|---|---|---|---|
| 0.01 | -0.2285 | -0.2388 | -0.2205 |
| 0.02 | -0.2094 | -0.2166 | -0.2020 |
| 0.03 | -0.1966 | -0.2026 | -0.1892 |
| 0.10 | -0.1475 | -0.1518 | -0.1390 |
| 0.90 | 0.1678 | 0.1566 | 0.1658 |
| 0.97 | 0.2727 | 0.2462 | 0.2635 |
| 0.98 | 0.3056 | 0.2721 | 0.2945 |
| 0.99 | 0.3599 | 0.3127 | 0.3460 |

| q | JMM200K | JMM300-Low | JMM300-Hi |
|---|---|---|---|
| 0.01 | -0.2289 | -0.2386 | -0.2238 |
| 0.02 | -0.2097 | -0.2167 | -0.2043 |
| 0.03 | -0.1968 | -0.2029 | -0.1909 |
| 0.10 | -0.1476 | -0.1524 | -0.1403 |
| 0.90 | 0.1678 | 0.1589 | 0.1669 |
| 0.97 | 0.2733 | 0.2576 | 0.2667 |
| 0.98 | 0.3065 | 0.2873 | 0.2985 |
| 0.99 | 0.3614 | 0.3342 | 0.3522 |

Table 2 compares the quantile estimates obtained via JPP and JMM with many and with fewer inner samples, as explained above. These quantiles are compared with empirical quantiles obtained from a large number of inner samples (20K, 200K). The column headings correspond to the different methods used to estimate the quantiles (empirical samples (Emp), Johnson percentile matching (JPP), Johnson moment matching (JMM)), followed by the number of inner samples, with 20K and 300 corresponding to a reduced number and 200K corresponding to all inner samples. For the 20K and 300 inner samples, 10 independent estimations are performed, and the standard error and mean are used to compute the interval (“Low” denotes mean minus (2 × standard error) and “Hi” denotes mean plus (2 × standard error)). Several different outer paths with different underlier and portfolio values at different time instants were tested for the call and the call combination, and Table 2 shows a representative sample of results.

It can be seen that 200K inner samples can resolve the quantiles up to about three decimal places. The quantile range, estimated by 20K inner samples, covers the quantiles from 200K inner samples and gives an indication of the accuracy of the estimates by 20K and 200K inner samples. Additionally, the quantiles estimated by the JPP and JMM methods from 200K samples agree well with the empirical ones, showing that these methods do not seem to have large persistent bias.

Table 2 also shows that the range of JPP and JMM quantiles estimated from 300 inner samples covers the quantiles estimated from 200K inner samples.

It is thus possible to estimate quantiles with reasonable accuracy with reduced numbers of inner samples (such as 300) via the JPP and JMM methods. Since DIM is an expectation of the quantiles over all outer paths, the variance in the DIM estimate is expected to be even smaller than the variance in individual quantile estimates.

The result from JPP based on 200 inner samples, as shown in Figure 13, supports this conclusion.

8.3 Conditional quantile regression

Figure 17: Comparison of the time evolution of DIM via quantile regression with different number of inner samples.

The conditional quantile regression approach can also take advantage of additional inner samples. Similar to the other methods, samples from limited inner simulations were added to the training samples for quantile regression. A comparison of the time evolution of DIM obtained via quantile regression with different numbers of inner samples is shown in Figure 17. Although the inner samples help refine the distribution, the time evolution thus obtained is still comparable in terms of accuracy and scale to that obtained with a single inner sample. This indicates that, with a sufficient number of outer samples (here, NO=100K), the quantile regression procedure can be robust with respect to variations in inner distributions across the underlier at any given time or at different times. Comparing the results of a single inner sample versus the others, it seems that adding at least a few inner samples results in a somewhat smoother evolution of the DIM.

8.4 Pseudo inner samples

As the process of running nested inner simulations for a large number of outer simulations is computationally very expensive, an alternative method is to approximate inner simulations with available nearby outer simulation values around a neighborhood of Xt or Vt. Hence, under some circumstances the values from outer simulations can be considered as proxies for samples from the actual distribution of portfolio values or pseudo inner samples.
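A minimal sketch of this idea for a single target state follows; the neighborhood width, the minimum neighbor count and the direct empirical-quantile readout are illustrative choices of ours.

```python
import numpy as np

def pseudo_inner_quantile(X_t, dV, x_target, eps, alpha=0.01):
    """Empirical alpha-quantile of (dV | X_t ~ x_target) using the dV
    observations of outer paths whose X_t lies within eps of the target."""
    mask = np.abs(X_t - x_target) <= eps
    if mask.sum() < 30:                      # too few neighbours: widen eps or skip
        return np.nan
    return np.quantile(dV[mask], alpha)      # raw pseudo-sample quantile

# The same pseudo samples can instead be fed to the JPP fit of Section 8.2 to
# obtain a smoothed Johnson-based quantile.
```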

Figure 18: Comparison of the time evolution of DIM with pseudo inner samples. (a) Call combination with pseudo inner samples. (b) FX call with pseudo inner samples.

The conditional distribution (ΔV ∣ x_t = X_t) is approximated by observations from those outer scenarios m, with ΔV^(m) given by (5.1), for which x_t^(m) ∈ ε(X_t) or V_t^(m) ∈ ε₁(V_t), ie, for which the underlying asset or portfolio value is observed to lie within a neighborhood ε of X_t or ε₁ of V_t. Figure 18 presents the time evolution of initial margin estimates obtained via a limited number of pseudo inner samples taken from the neighborhood of X_t for the JPP method, alongside the same method implemented with regular inner samples. Both JPP with inner samples and JPP with pseudo inner samples agree well with the DIM obtained via nested Monte Carlo for the call combination and FX call instruments. Additionally, the DIM obtained by estimating quantiles directly from the raw pseudo samples is also presented, and it too is in good agreement with the nested Monte Carlo results.
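
A minimal sketch of the pseudo-inner-sample construction described above, assuming per-path arrays x_t (underlier) and dV (value change over the MPoR) from the outer simulation at time t (names are illustrative), is given below; the resulting pseudo samples can either be used directly for an empirical quantile or passed to the JPP fit.

```python
# Minimal sketch: pseudo inner samples from outer paths near a target underlier value.
import numpy as np

def pseudo_inner_quantile(x_t, dV, x_target, eps, q=0.99):
    """Empirical q-quantile of dV over outer paths whose underlier lies within
    eps of x_target (the pseudo inner samples for that point)."""
    pseudo = dV[np.abs(x_t - x_target) <= eps]
    if pseudo.size < 50:  # guard: the neighborhood may be too sparse
        raise ValueError("too few pseudo samples; widen eps or bin on V_t instead")
    return np.quantile(pseudo, q)
```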

9 Comparing running times and root mean square error

We compare the methods presented above in terms of accuracy and speed. The nested Monte Carlo method is taken as the benchmark for estimating the initial margin, and deviations from it are measured as root mean square error (RMSE). The raw running time of each method is also recorded in order to compare multiple aspects of the methods. We run the tests on a computer with 32 GB of memory and an Intel Core i7 processor with eight cores. We did not spend much time optimizing the running time of the different methods or otherwise trying to ensure that the running times are directly comparable, so the listed running times should be taken as indicative only.

Table 3 presents results for the FX call option, with the DIM computed at 25 time points for the various methods.

In terms of running time for this example, the nested Monte Carlo method, the Johnson moment-matching Monte Carlo method using LGBM regression and the Johnson percentile method take the longest. DNN quantile regression is faster than those but slower than the LGBM quantile regressions, which take a medium amount of time; the remaining methods take less than 100 seconds, with some taking less than 10 seconds.

In terms of RMSE, the two quantile regressions using LightGBM are the most accurate, while all other methods except the GLSMC and Delta-Gamma-normal methods have RMSEs of the same order of magnitude.

The JPP method is also implemented with pseudo samples, where a suitable Johnson distribution is fitted to a collection of samples from adjacent outer simulations taken in lieu of actual inner simulation samples. Since the JPP procedure has a convenient closed-form solution, it can be vectorized over multiple outer simulation values corresponding to different Vt or Xt values, making for an efficient implementation of VaR or IM estimation. The DIM results obtained via this procedure, without the need for inner samples, are compared against the previously studied methods: the running times are significantly faster than those of the traditional methods, and the results are more accurate.

In addition, the same procedure can be repeated to estimate quantiles at each Vt or Xt from raw (pseudo inner) samples without the need to fit a Johnson distribution. The results from this approach are also seen to outperform the traditional approaches both in terms of running time and accuracy.
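
As an example, a strided, raw-pseudo-sample version of this procedure might look as follows. This is a sketch under the same assumptions as the earlier snippet; the stride and eps values are illustrative, echoing the "pseudo inner=200, stride=10" settings reported in Tables 3 and 4 rather than reproducing them.

```python
# Minimal sketch: DIM at time t from raw pseudo samples, evaluated at every
# stride-th outer underlier value and averaged across those evaluation points.
import numpy as np

def raw_pseudo_dim(x_t, dV, eps, q=0.99, stride=10):
    order = np.argsort(x_t)
    xs, dvs = np.asarray(x_t)[order], np.asarray(dV)[order]
    targets = xs[::stride]
    # Each target is itself an outer value, so its neighborhood is never empty.
    quantiles = np.array(
        [np.quantile(dvs[np.abs(xs - x0) <= eps], q) for x0 in targets]
    )
    return quantiles.mean()
```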

Table 3: Comparison of performance metrics of various methods for the FX call instrument with 25 DIM instances.

Method | RMSE | Time | Specification
Nested MC | n/a (benchmark) | 438 s ± 6 s | Outer=100K, Inner=2000
Raw pseudo samples | 6.69e-2 | 3.28 s ± 0.2 s | Outer=100K, pseudo inner=200
Johnson percentile | 2.74e-2 | 438 s ± 9 s | Outer=10K, Inner=200
Johnson percentile with pseudo samples | 3.97e-2 | 5.97 s ± 0.2 s | Outer=10K, pseudo inner=200, stride=10
Quantile regression – LGBM(Vt) | 4.42e-3 | 53 s ± 0.9 s | Outer=100K, LightGBM, Reg with Vt
Quantile regression – LGBM(Xt) | 4.26e-3 | 52.6 s ± 0.8 s | Outer=100K, LightGBM, Reg with Xt
Quantile regression – DNN | 4.7e-2 | 281.6 s ± 15 s | Outer=100K, Layers=4, Units=3, Batch=1024, Minimization steps=20K
Delta-Gamma | 3.38e-2 | 10.6 s ± 0.8 s | Outer=100K
Delta-Gamma-normal | 1.72 | 9.5 s ± 0.8 s | Outer=100K
JLSMC-Lag(Vt) | 1.11e-2 | 12.82 s ± 1.5 s | Computed at 200 points in body and tails of Vt, Outer=100K, Laguerre, order 32, Reg with Vt
JLSMC-Lag(Xt) | 3.82e-2 | 13.4 s ± 1.3 s | Computed at 200 points in body and tails of Xt, Outer=100K, Laguerre, order 32, Reg with Xt
JLSMC-LGBM(Xt) | 4.13e-2 | 487 s ± 7 s | Computed at 200 points in body and tails of Xt, Outer=100K, LightGBM, Reg with Xt
GLSMC-Lag(Vt) | 2.54 | 7.4 s ± 1.2 s | Computed at 200 points in body and tails of Vt, Outer=100K, Laguerre, order 32, Reg with Vt
GLSMC-Lag(Xt) | 2.67 | 8.1 s ± 0.9 s | Computed at 200 points in body and tails of Xt, Outer=100K, Laguerre, order 32, Reg with Xt
GLSMC-LGBM(Xt) | 2.57 | 245 s ± 4 s | Computed at 200 points in body and tails of Xt, Outer=100K, LightGBM, Reg with Xt

Table 4: Comparison of performance metrics of various methods for the call combination instrument with 125 DIM instances.

Method | RMSE | Time | Specification
Nested MC | n/a (benchmark) | 2375 s ± 43 s | Outer=100K, Inner=2000
Raw pseudo samples | 3.5e-5 | 16.2 s ± 0.1 s | Outer=100K, pseudo inner=200
Johnson percentile | 1.25e-4 | 2568 s ± 18 s | Outer=10K, Inner=200
Johnson percentile with pseudo samples | 8.3e-5 | 27.1 s ± 0.3 s | Outer=10K, pseudo inner=200, stride=10
Quantile regression – LGBM(Vt) | 3.93e-4 | 269 s ± 12 s | Outer=100K, LightGBM, Reg with Vt
Quantile regression – LGBM(Xt) | 6.96e-4 | 273 s ± 13 s | Outer=100K, LightGBM, Reg with Xt
Delta-Gamma | 7.33e-4 | 42.4 s ± 0.9 s | Outer=100K
Delta-Gamma-normal | 7.7e-3 | 35.4 s ± 0.7 s | Outer=100K
JLSMC-LGBM(Xt) | 4.86e-4 | 2412 s ± 45 s | Computed at 200 points in body and tails of Xt, Outer=100K, LightGBM, Reg with Xt
JLSMC-Lag(Xt) | 6.8e-3 | 63.4 s ± 2.3 s | Computed at 200 points in body and tails of Xt, Outer=100K, Laguerre, order 32, Reg with Xt
JLSMC-Lag(Vt) | 1.47e-2 | 58.3 s ± 1.1 s | Computed at 200 points in body and tails of Vt, Outer=100K, Laguerre, order 32, Reg with Vt
GLSMC-LGBM(Xt) | 8.94e-2 | 1220 s ± 13 s | Computed at 200 points in body and tails of Xt, Outer=100K, LightGBM, Reg with Xt
GLSMC-Lag(Xt) | 9.7e-2 | 33.7 s ± 1.3 s | Computed at 200 points in body and tails of Xt, Outer=100K, Laguerre, order 32, Reg with Xt
GLSMC-Lag(Vt) | 8.9e-2 | 30.1 s ± 0.9 s | Computed at 200 points in body and tails of Vt, Outer=100K, Laguerre, order 32, Reg with Vt

Table 4 presents results for the call combination, with the DIM computed at 125 time points for the various methods.

In terms of running time, we see similar results: nested Monte Carlo, JMM Monte Carlo using LGBM regression and the Johnson percentile method take the longest, LGBM quantile regressions take a medium amount of time and the other methods take less than 100 seconds, with some taking less than 10 seconds.

In terms of RMSE, using raw pseudo samples gives the best accuracy, followed by the Johnson percentile method with pseudo samples, then the Johnson percentile method, the quantile regressions using LGBM and JLSMC using LGBM. At the other end of the spectrum, the GLSMC methods, JLSMC regressing against Vt (due to Vt not being in one-to-one correspondence with Xt for this instrument) and the Delta-Gamma-normal method give the least accurate results as measured by RMSE.

We see that even across these two relatively simple examples for which we can perform benchmarking with nested Monte Carlo with a large enough number of samples, a rather large variety of methods performs reasonably well in terms of RMSE accuracy, and none performs uniformly best.

9.1 Discussion and guidelines

It is instructive to reflect on the requirements and constraints of each of these methods in light of the above results. The pseudo-sample methods outperform the traditional JLSMC and quantile regression methods in both accuracy and speed for these particular financial instruments with one-dimensional risk factors (Xt) and portfolio values (Vt). However, with multiple risk factors in higher dimensions, the pseudo samples that can be borrowed from within a given higher-dimensional bin become much sparser, leading to noisy quantile estimates. Second, the number of higher-dimensional bins that need to be considered grows rapidly with the dimension, which forces us to resort to Vt-based pseudo samples for quantile estimation. Further, even for one-dimensional problems, a sufficient number of samples within a neighborhood of Vt or Xt is necessary to capture the properties of the distribution accurately. The number of available samples within any such neighborhood becomes small for higher volatilities and longer time horizons, again leading to noisy estimates. Hence, the pseudo-sample-based methods are ideally suited to problems with rich outer-simulation data.

Moment regression methods (JLSMC) give medium accuracy: they require good estimates of the moments, and methods and distributions that can fit higher moments typically perform better (JLSMC outperforms GLSMC).

Any inconsistency in the estimated moments leads to undefined distributions and necessitates moment correction. Hence, moment regression methods are particularly sensitive to the type of moment regression and the corresponding moment-matching procedure, as well as to the exact way they are set up (thresholds and so on).
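
The relevant feasibility condition here is the skew–kurtosis bound that any distribution must satisfy,

```latex
\beta_2 \;\ge\; \beta_1 + 1, \qquad \text{ie,}\quad \kappa \;\ge\; s^2 + 1,
```

where β₁ = s² is the squared skewness and β₂ = κ the kurtosis. Regressed moment estimates that violate this bound correspond to no distribution at all and must be corrected before a Johnson curve can be fitted.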

The quantile regression procedures (LGBM quantile regression and deep quantile regression) perform best with a sufficient number of outer samples, as they directly minimize the quantile regression loss function. However, their computational requirements grow with the number of available outer paths, and they require sophisticated optimization frameworks. LGBM quantile regression produces results comparable to the pseudo-sample methods, as it employs a series of tree-based optimizations and empirical quantile estimation to minimize the quantile loss, at the cost of a longer running time. The deep quantile regression procedure, on the other hand, minimizes the quantile loss via a neural network over several thousand optimization steps, which can become impractical for a large number of DIM time points unless the training of and optimization over the DNN can be sped up substantially.
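
For reference, the loss minimized in both cases is the standard check (pinball) loss: at quantile level τ (eg, τ = 0.99), each method chooses a function q from its model class 𝒬 (gradient-boosted trees or a DNN) to minimize

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}_{\{u < 0\}}\bigr),
\qquad
\hat q_\tau = \operatorname*{arg\,min}_{q \in \mathcal{Q}}
\sum_{m} \rho_\tau\!\bigl(\Delta V^{(m)} - q\bigl(X_t^{(m)}\bigr)\bigr),
```

whose minimizer is an estimate of the conditional τ-quantile of ΔV given X_t.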

The objective of this study is to present the details and procedures of various methods used for estimation of fVaR and initial margin, along with an objective evaluation of their accuracies and performance. The choice of method for a particular problem or financial instrument depends on a variety of constraints and requirements as outlined in Table 1, and there is probably not a single method that can be applied for all financial instruments under all scenarios.

10 Conclusion

This paper discussed several methods to estimate fVaR or margin requirements and their expected time evolution (also called DIM) for instruments ranging from simple options to more complex IR swaps. The methods analyzed in this paper are suited to instruments whose portfolio values are known or can be approximated well pathwise. In order to establish a baseline, the nested Monte Carlo method, traditionally considered the benchmark, was used to validate the other classical methods for estimating DIM.

Under the stated assumptions, the accuracy of these methods was studied taking into account the computational resources they require (nested Monte Carlo, JMM, quantile regression and JPP) and the availability of first- and second-order sensitivities (Delta-Gamma methods). Further, the differences between the estimated DIM curves were highlighted for the simpler GLSMC and Delta-Gamma-normal methods and their more complex counterparts, the JLSMC and Delta-Gamma methods. It was shown that standard moment regression could lead to violations of the skew–kurtosis relationship and that a simple correction procedure could restore valid Johnson distributions and improve performance. Conditional quantile regression, with or without deep learning, was a suitable technique given an appropriate optimization framework, but it can demand substantial computational resources for a large number of DIM time points.

The above methods were also compared for a limited number of inner simulation runs to see whether such inner samples would further improve or refine the conditional quantile estimates. While all the methods benefitted from available inner samples, it was seen that quantile regression tended to be more robust with a sufficiently large number of outer simulation runs, even without additional inner samples.

Finally, for the techniques where inner samples were necessary, such as nested Monte Carlo and JPP, it was possible to replace them with pseudo inner samples under specific conditions, with comparable performance and much faster running times. These results were demonstrated for FX call and call combination instruments.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper. The views expressed in the paper are those of the authors and do not represent the views of Wells Fargo.

Acknowledgements

The authors thank Vijayan Nair for valuable discussions and comments regarding methods (in particular his suggestions to explore pseudo sample approaches), presentation and results, and Agus Sudjianto for supporting this research.

