Journal of Risk Model Validation

Risk.net

Nonparametric tests for jump detection via false discovery rate control: a Monte Carlo study

Kaiqiao Li, Kan He, Lizhou Nie, Wei Zhu and Pei Fen Kuan

  • Combining multiple nonparametric tests via a p-value pooling approach with dependence adjustment can improve the performance of jump detection in high-frequency financial data.
  • The reproducibility of the proposed framework was assessed via correspondence curves (local dependence) and the irreproducible discovery rate (reproducibility across replicates).
  • False discovery rate adjustment is recommended to account for multiple hypothesis testing in jump detection.

Nonparametric tests are popular and efficient methods of detecting jumps in high-frequency financial data. Each method has its own advantages and disadvantages, and their performance may be affected by underlying noise and dynamic structures. To address this, we propose a robust p-value pooling method that aims to combine the advantages of each method. We focus on model validation within a Monte Carlo framework to assess reproducibility and the false discovery rate (FDR). Reproducibility analyses via correspondence curves and the irreproducible discovery rate were carried out with replicates to study local dependency and robustness across replicates. Extensive simulation studies of high-frequency trading data at the minute level were carried out, and the operating characteristics of these methods were compared within the FDR control framework. Our proposed method was robust across all scenarios under reproducibility and FDR analysis. Finally, we applied this method to minute-level data from the limit order book system: the efficient reconstruction system (LOBSTER). An R package, JumpTest, implementing these methods has been made available on the Comprehensive R Archive Network.

1 Introduction

Significant discontinuities, known as jumps, matter in numerous settings in financial markets, including derivatives valuation, risk management and asset allocation (Merton 1976; Jarrow and Rosenfeld 1984; Duffie and Pan 2001). As jumps occur irregularly, they cannot be explained by a regular financial model. Failure to account for jumps appropriately yields inaccurate model estimates, which, in turn, result in poor forecasts in high-frequency trading. Jump processes can also lead to incomplete markets and hedging errors (Naik and Lee 1990).

A standard procedure in jump detection is to calculate the test statistic for a specific period, followed by evaluating its statistical significance via the computed p-value. The null hypothesis for the test at each period $i$, where $i = 1, \dots, M$, is defined as

  $$H_{0i}: \text{there are no jumps in period } i.$$

Several nonparametric statistical methods have been proposed to achieve this goal, including the Barndorff-Nielsen and Shephard (2006) (BNS), Andersen et al (2007) and Lee and Mykland (2007) (ALM), Aït-Sahalia and Jacod (2009) (AJ), Jiang and Oomen (2008) (JO) and Andersen et al (2012) (Amin and Amed) tests for high-frequency data.

To date, there has been limited work on evaluating and validating these nonparametric jump-detection tests simultaneously in a systematic manner. One example of such research, however, is provided by Dumitru and Urga (2012), who compare the different jump-detection tests within the classical type I error control framework. Since each jump-detection test has its own advantages and disadvantages, one way to circumvent the need for choosing a particular test is to combine them into a single approach. This can be done by pooling the p-values from these tests, for example, via Fisher’s log-transformed p-value pooling using a chi-square distribution (FI) (Fisher 1925), via Stouffer’s inverse normal transformed p-value pooling (SI) (Stouffer et al 1950), via the minimum p-value method (MI) (Tippett 1931) or via the maximum p-value method (MA) (Wilkinson 1951). Comparisons of these methods were carried out in Won et al (2009) and Chang et al (2013). The resulting combined test is expected to be more robust than any of its components individually.

The abovementioned p-value combination approaches are valid if the p-values are independent of each other. However, in jump-detection testing scenarios, the p-value for each test is computed within the same period, resulting in dependent p-values. In this paper, we also consider p-value pooling approaches that account for the dependence structure, including modified Fisher’s and Stouffer’s combined p-values under dependence by Kost and McDermott (2002) and Hartung (1998). A comparison of the dependent p-value pooling methods with their independent counterparts has previously been provided in Alves and Yu (2014).

A typical way to evaluate the performance of the different nonparametric tests is to compare the power of these tests within a classical type I error control framework. However, the analysis of high-frequency data such as stock prices involves consecutive time periods; it thus falls within the multiple-testing framework. In this paper, we consider the notion of a false discovery rate (FDR) to adjust for multiple testing. Several FDR adjustment procedures are available: for example, the seminal work of Benjamini and Hochberg (1995) (BH) offers an FDR that achieves optimality under an uncorrelated tests scenario, whereas Storey (2003) introduces the positive false discovery rate (pFDR), which has a Bayesian interpretation, and the q-value (QV), which can be interpreted as a pFDR version of the p-value. In a related work by Yen (2013), an FDR is applied to detect jumps in financial data. The author uses the BH method to control for FDRs on the p-values computed from the BNS test.

The main goal of this paper is to perform a comprehensive nonparametric jump-detection model comparison and validation. To this end, we design an extensive Monte Carlo study to compare and validate these tests. We also investigate different ways of combining the statistical significance of the tests while accounting for the dependence among p-values, and we carry out a comparison of the different FDR control procedures. Finally, we assess the consistency and reproducibility of the tests via the correspondence curve and the irreproducible discovery rate (IDR) of Li et al (2011), which was developed from empirical survival copulas. We first describe the nonparametric tests, p-value pooling methods, reproducibility framework and FDR procedures in Section 2. This is followed by extensive simulation in Section 3 and a case study on stock trading data in Section 4. We conclude with a discussion in Section 5.

2 Methods

We start our exposition by describing the key steps in detecting jumps in stock prices. We first introduce the stochastic volatility (SV) model, a widely used model for financial data. Let $s_t$ and $S_t$ denote the logarithmic and original asset price at time $t$, respectively. The logarithmic price process often takes the following form:

  $$ds_t = \mu_t\,dt + \sigma_t\,dW_t + dJ_t,$$

where $\mu_t$ is the drift, $\sigma_t$ is the diffusion parameter, $W_t$ is a Brownian motion and $J_t$ is the jump process. The return in finance is defined as $R_t = (S_t/S_{t-1}) - 1$ and the log return is defined as $r_t = s_t - s_{t-1}$.

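The realized variance (RV) of the price with $M$ data points at time $t$ is

  $$RV_t = \sum_{i=1}^{M} r_i^2,$$

whereas the realized bipower variation (BV) at time $t$ is

  $$BV_t = \frac{\pi}{2}\,\frac{M}{M-1}\sum_{i=1}^{M-1} |r_i|\,|r_{i+1}| \xrightarrow{P} \int_{t-1}^{t}\sigma_s^2\,ds,$$

which converges in probability (P) to the integrated variance $\int_{t-1}^{t}\sigma_s^2\,ds$.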
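
As an illustration, the two estimators can be computed directly from a vector of intraday log returns. The following Python sketch is not taken from the JumpTest package; the function names and the return values are hypothetical and serve only to show that a single large return inflates RV much more than BV.

```python
import math

def realized_variance(r):
    """RV_t: sum of squared intraday log returns."""
    return sum(x * x for x in r)

def bipower_variation(r):
    """BV_t = (pi/2) * M/(M-1) * sum_{i=1}^{M-1} |r_i| |r_{i+1}|."""
    m = len(r)
    cross = sum(abs(a) * abs(b) for a, b in zip(r[:-1], r[1:]))
    return (math.pi / 2) * (m / (m - 1)) * cross

# Hypothetical log returns for one period; the fourth value mimics a jump.
returns = [0.001, -0.002, 0.0015, 0.03, -0.001, 0.0005]
rv = realized_variance(returns)
bv = bipower_variation(returns)
print(rv, bv)
```

Because the jump enters RV squared but enters BV only through products with its small neighbors, RV exceeds BV in this example.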

Next, we describe the nonparametric tests considered in this paper.

2.1 Nonparametric tests for jump detection

2.1.1 Barndorff-Nielsen and Shephard (BNS) test

Barndorff-Nielsen and Shephard (2006) developed a jump test based on the realized variance and the realized bipower variation (Barndorff-Nielsen and Shephard 2004). They showed that the realized bipower variation $BV_t$ is a consistent estimator of the integrated variance $\int_{t-1}^{t}\sigma_s^2\,ds$ (the accumulated squared diffusion parameter of the log-price process), even in the presence of jumps, whereas $RV_t$ also picks up the squared jumps. The difference between $RV_t$ and $BV_t$ therefore allows us to infer whether any jumps occur in a time interval: a jump is indicated by $RV_t$ being significantly larger than $BV_t$. Since the standard deviation of $(RV_t - BV_t)/RV_t$ is unknown, it is estimated via the integrated quarticity $\int_{t-1}^{t}\sigma_s^4\,ds$, which is in turn estimated using the jump-robust realized tripower quarticity proposed by Andersen et al (2003):

  $$TP_t = \mu_{4/3}^{-3}\left(\frac{M^2}{M-2}\right)\sum_{i=3}^{M} |r_{i-2}\,r_{i-1}\,r_i|^{4/3},$$

where $\mu_x = E|Z|^x$, $Z \sim N(0,1)$.

Based on these ideas, Huang and Tauchen (2005) developed a ratio test that converges in distribution (L) to a standard normal given by

  $$z = \frac{(RV_t - BV_t)/RV_t}{\sqrt{\left((\pi/2)^2 + \pi - 5\right)(1/M)\max(1,\, TP_t/BV_t^2)}} \xrightarrow{L} N(0,1).$$
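
A minimal Python sketch of the ratio statistic follows. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the p-value is taken as one-sided (large positive $z$ indicates a jump), and the test data are simulated Gaussian returns with standard deviation 0.001 to which a single jump of size 0.03 is added.

```python
import math
import random
from statistics import NormalDist

def abs_moment(x):
    """mu_x = E|Z|^x for Z ~ N(0,1), via the gamma function."""
    return 2 ** (x / 2) * math.gamma((x + 1) / 2) / math.sqrt(math.pi)

def bns_ratio_test(r):
    """Return the ratio statistic z and a one-sided p-value."""
    m = len(r)
    rv = sum(x * x for x in r)
    bv = (math.pi / 2) * (m / (m - 1)) * sum(
        abs(a) * abs(b) for a, b in zip(r[:-1], r[1:]))
    # Realized tripower quarticity: products of three adjacent returns.
    tp = abs_moment(4 / 3) ** -3 * (m * m / (m - 2)) * sum(
        abs(r[i - 2] * r[i - 1] * r[i]) ** (4 / 3) for i in range(2, m))
    theta = (math.pi / 2) ** 2 + math.pi - 5
    z = ((rv - bv) / rv) / math.sqrt(theta / m * max(1.0, tp / bv ** 2))
    return z, 1.0 - NormalDist().cdf(z)

# Hypothetical data: 390 one-minute returns, then the same series with a jump.
rng = random.Random(1)
base = [rng.gauss(0.0, 0.001) for _ in range(390)]
jumpy = list(base)
jumpy[200] += 0.03
z_base, p_base = bns_ratio_test(base)
z_jump, p_jump = bns_ratio_test(jumpy)
print(z_base, p_base, z_jump, p_jump)
```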

2.1.2 Andersen, Lee and Mykland (ALM) test

Andersen et al (2007) and Lee and Mykland (2007) introduced a robust nonparametric test for detecting significant discontinuities or jumps. The proposed test statistic $L(i)$ is used to determine whether any jumps exist during the time period $(t_{i-1}, t_i)$:

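  $$L(i) \equiv \frac{\ln S(t_i) - \ln S(t_{i-1})}{\hat{\sigma}(t_i)},$$

where $S(t_i)$ is the asset price at time $t_i$, and $\hat{\sigma}(t_i)$ is the instantaneous volatility estimated via bipower variation under the null hypothesis; this is defined as

  $$\hat{\sigma}(t_i)^2 \equiv \frac{1}{K-2}\sum_{j=i-K+2}^{i-1}\left|\ln\frac{S(t_j)}{S(t_{j-1})}\right|\left|\ln\frac{S(t_{j-1})}{S(t_{j-2})}\right|,$$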

where $K$ is the size of the window. The presence of jumps within the time period $(t_{i-1}, t_i)$ can result in a biased estimation of $\hat{\sigma}(t_i)$. However, one can select $K$ appropriately to reduce the potential bias. An empirically tuned value is $K = \sqrt{252 \times n_{\text{obs}}}$, where $n_{\text{obs}}$ is the number of observations per day. By setting $K$ in this manner, the effect of jumps on the instantaneous bipower variation can be eliminated without introducing an extra computational burden. As a result, the presence of earlier jumps does not influence jump detection in the targeted period, which makes the ALM test consistent and robust.

When there is no jump in the time period $(t_{i-1}, t_i)$, the test statistic $L(i)$ is asymptotically normally distributed. However, when there are jumps in the period, the test statistic $L(i)$ approaches infinity if the interval between neighboring observations approaches zero. Thus, the rejection region for the test can be derived based on a limiting distribution, as follows:

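  $$\frac{\max_{i \in \bar{A}_M} |L(i)| - C_M}{T_M} \xrightarrow{L} \xi,$$

where $\xi$ satisfies $P(\xi \le x) = \exp(-e^{-x})$,

  $$C_M = \frac{(2\ln M)^{1/2}}{c} - \frac{\ln \pi + \ln(\ln M)}{2c\,(2\ln M)^{1/2}}$$

and

  $$T_M = \frac{1}{c\,(2\ln M)^{1/2}};$$

$\bar{A}_M$ is the set of $i \in \{1, 2, \dots, M\}$, $M$ is the number of observations and $c = \sqrt{2/\pi}$.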
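
The statistic and its rejection threshold can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names, the window size $K = 30$, the 5% level and the simulated price path are assumptions. A period is flagged when $|L(i)|$ exceeds $C_M + T_M\,\xi^*$, where $\xi^*$ is the $(1-\alpha)$ quantile of the limiting distribution, ie, $\xi^* = -\ln(-\ln(1-\alpha))$.

```python
import math
import random

def lm_statistics(log_prices, K):
    """L(i) for every i that has a full window of K observations."""
    # r[i] = s(t_i) - s(t_{i-1}) for i >= 1; r[0] is an unused placeholder.
    r = [0.0] + [b - a for a, b in zip(log_prices[:-1], log_prices[1:])]
    stats = {}
    for i in range(K, len(r)):
        # Bipower sum over j = i-K+2, ..., i-1 (K-2 terms, excludes r[i]).
        bp = sum(abs(r[j]) * abs(r[j - 1]) for j in range(i - K + 2, i))
        stats[i] = r[i] / math.sqrt(bp / (K - 2))
    return stats

def lm_threshold(M, alpha=0.05):
    """Critical value C_M + T_M * xi*, with xi* the (1 - alpha) quantile."""
    c = math.sqrt(2 / math.pi)
    root = math.sqrt(2 * math.log(M))
    C = root / c - (math.log(math.pi) + math.log(math.log(M))) / (2 * c * root)
    T = 1 / (c * root)
    xi = -math.log(-math.log(1 - alpha))
    return C + T * xi

# Hypothetical path: 200 returns with a jump of 0.02 in period 100.
rng = random.Random(7)
rets = [rng.gauss(0.0, 0.001) for _ in range(200)]
rets[99] += 0.02                      # this is r_100 in the notation above
log_prices = [0.0]
for x in rets:
    log_prices.append(log_prices[-1] + x)

stats = lm_statistics(log_prices, K=30)
threshold = lm_threshold(M=len(rets))
flagged = [i for i, L in stats.items() if abs(L) > threshold]
print(threshold, flagged)
```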

2.1.3 Aït-Sahalia and Jacod (AJ) test

Another popular test for detecting jumps in asset returns is the Aït-Sahalia and Jacod (AJ) test (Aït-Sahalia and Jacod 2009). The AJ test is applicable to all Itô semimartingale processes, and the variation of jump magnitude has no influence on the performance of this test.

The AJ test statistic takes the form

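  $$\hat{G}(p,k,\Delta_n)_t = \frac{\hat{B}(p,k\Delta_n)_t}{\hat{B}(p,\Delta_n)_t},$$

where

  $$\hat{B}(p,\Delta_n)_t = \sum_{i=1}^{[M/\Delta_n]} |s_{i\Delta_n} - s_{(i-1)\Delta_n}|^p$$

and $s_{i\Delta_n}$ represents the log price at time $i\Delta_n$. $\hat{V}_{n,M}$ is the variance of the test statistic, based on a bipower variation estimator derived by Barndorff-Nielsen and Shephard (2004) and Barndorff-Nielsen et al (2006b). Here, $\hat{V}_{n,M}$ can be estimated using the following formula:

  $$\hat{V}_{n,M} = \frac{\Delta_n\,M(p,k)\,\hat{A}\big(p/([p]+1),\,2[p]+2,\,\Delta_n\big)_t}{\hat{A}\big(p/([p]+1),\,[p]+1,\,\Delta_n\big)_t^2},$$
  $$\hat{A}(r,q,\Delta_n)_t = \frac{\Delta_n^{1-(qr/2)}}{m_r^q}\sum_{i=1}^{[M/\Delta_n]-q+1}\ \prod_{j=1}^{q} |\Delta^n_{i+j-1}X|^r,$$
  $$M(p,k) = \frac{1}{m_p^2}\left[k^{p-2}(1+k)\,m_{2p} + k^{p-2}(k-1)\,m_p^2 - 2k^{(p/2)-1}\,m_{k,p}\right],$$
  $$m_r = E|U|^r,$$
  $$m_{k,p} = E\left|U\left(U + \sqrt{k-1}\,V\right)\right|^p,$$
  $$U, V \overset{\text{iid}}{\sim} N(0,1).$$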

Aït-Sahalia and Jacod (2009) suggested using $p > 3$ and $k \ge 2$, and the resulting test statistic is

  $$z_t = \hat{V}_{n,M}^{-1/2}\left(\hat{G}(p,k,\Delta_n)_t - k^{(p/2)-1}\right) \xrightarrow{L} N(0,1).$$

2.1.4 Jiang and Oomen (JO) test

Based on the bipower variation result of Barndorff-Nielsen and Shephard (2004), Jiang and Oomen (2008) developed a test called the “swap variance” test. The swap variance is equal to the summation of differences between the geometric returns and the arithmetic returns, ie,

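  $$SwV_t = 2\sum_{i=2}^{M}(R_i - r_i),$$

where $R_i$ and $r_i$ are the daily return and log return, respectively, and

  $$(SwV_t - RV_t) \xrightarrow{P} \begin{cases} 0 & \text{if no jumps in } [0,t], \\ 2\displaystyle\int_0^t \left(\exp(J_t) - \tfrac{1}{2}J_t^2 - J_t - 1\right) dq_t & \text{if jumps in } [0,t]. \end{cases}$$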

The difference between SwV and RV can be used for detecting jumps using the bipower variation. The Taylor series expansion justifies how this test exploits the impact of jumps on the higher-order moments, ie,

  $$SwV_t - RV_t = \frac{1}{3}\sum_{i=2}^{M} r_i^3 + \frac{1}{12}\sum_{i=2}^{M} r_i^4 + \cdots.$$

A relatively large difference between SwV and RV (either positive or negative) indicates the presence of jumps (Barndorff-Nielsen et al 2006a,b). Thus, the test statistic under the null hypothesis of no jumps can be derived as

  $$z_t = \frac{M\,BV_t}{\sqrt{\Omega_{SwV}}}\left(1 - \frac{RV_t}{SwV_t}\right) \xrightarrow{L} N(0,1),$$
  $$\Omega_{SwV} = \frac{\mu_6}{9}\,\frac{M^3\,\mu_{3/2}^{-4}}{M-3}\sum_{i=5}^{M} |r_i\,r_{i-1}\,r_{i-2}\,r_{i-3}|^{3/2}.$$

2.1.5 Andersen minimum (Amin) and median (Amed) tests

Andersen et al (2012) derived robust tests based on minimum (minRV) and median (medRV) realized variance, respectively:

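  $$\text{minRV}_t = \frac{\pi M}{(\pi-2)(M-1)}\sum_{i=2}^{M} \min(|r_i|, |r_{i-1}|)^2,$$
  $$\text{medRV}_t = \frac{\pi M}{(6-4\sqrt{3}+\pi)(M-2)}\sum_{i=3}^{M} \text{med}(|r_i|, |r_{i-1}|, |r_{i-2}|)^2.$$

The minimum (minRQ) and median (medRQ) realized quarticities follow

  $$\text{minRQ}_t = \frac{\pi M^2}{(3\pi-8)(M-1)}\sum_{i=2}^{M} \min(|r_i|, |r_{i-1}|)^4,$$
  $$\text{medRQ}_t = \frac{3\pi M^2}{(9\pi+72-52\sqrt{3})(M-2)}\sum_{i=3}^{M} \text{med}(|r_i|, |r_{i-1}|, |r_{i-2}|)^4.$$

The authors showed that

  $$\frac{1 - (\text{minRV}_t/RV_t)}{\sqrt{1.81\,\delta\,\max(1,\, \text{minRQ}_t/\text{minRV}_t^2)}} \xrightarrow{L} N(0,1),$$
  $$\frac{1 - (\text{medRV}_t/RV_t)}{\sqrt{0.96\,\delta\,\max(1,\, \text{medRQ}_t/\text{medRV}_t^2)}} \xrightarrow{L} N(0,1).$$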
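
The following Python sketch evaluates both statistics for one period. It is only an illustration: the function name and the simulated data are hypothetical, and $\delta$ is assumed here to be the sampling interval $1/M$, which the text above does not state explicitly.

```python
import math
import random
from statistics import median

def amin_amed_tests(r):
    """Return the (Amin, Amed) z-statistics for one period of returns."""
    m = len(r)
    a = [abs(x) for x in r]
    rv = sum(x * x for x in r)
    mins = [min(a[i], a[i - 1]) for i in range(1, m)]
    meds = [median(a[i - 2:i + 1]) for i in range(2, m)]
    sqrt3 = math.sqrt(3)
    min_rv = math.pi * m / ((math.pi - 2) * (m - 1)) * sum(v ** 2 for v in mins)
    med_rv = (math.pi * m / ((6 - 4 * sqrt3 + math.pi) * (m - 2))
              * sum(v ** 2 for v in meds))
    min_rq = (math.pi * m ** 2 / ((3 * math.pi - 8) * (m - 1))
              * sum(v ** 4 for v in mins))
    med_rq = (3 * math.pi * m ** 2
              / ((9 * math.pi + 72 - 52 * sqrt3) * (m - 2))
              * sum(v ** 4 for v in meds))
    delta = 1.0 / m  # assumption: delta is the sampling interval
    z_min = (1 - min_rv / rv) / math.sqrt(
        1.81 * delta * max(1.0, min_rq / min_rv ** 2))
    z_med = (1 - med_rv / rv) / math.sqrt(
        0.96 * delta * max(1.0, med_rq / med_rv ** 2))
    return z_min, z_med

# Hypothetical data: 390 Gaussian returns with one added jump of 0.03.
rng = random.Random(3)
r = [rng.gauss(0.0, 0.001) for _ in range(390)]
r[150] += 0.03
z_min, z_med = amin_amed_tests(r)
print(z_min, z_med)
```

Taking the minimum or median over adjacent absolute returns removes an isolated jump from the volatility estimate, so both ratios fall well below one when a jump is present.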

2.2 Combined tests via p-value pooling

Since each proposed nonparametric test has its own advantages in different settings, a more robust procedure that avoids choosing a single test is to aggregate the results from these tests. In this paper, we consider the p-value pooling approach (Kuan and Huang 2013) to combine the statistical significance from the nonparametric tests. Without loss of generality, we assume that we have $k$ candidate tests for a null hypothesis and $k$ corresponding p-values $p_1, \dots, p_k$. In the following subsections, we introduce several popular p-value pooling methods that are widely used in meta-analysis.

2.2.1 Maximum p-value (MA) method

The MA method works by selecting the maximum p-value from the list of p-values. Under the null hypothesis, the maximum p-value follows a beta distribution with parameters $\alpha = k$, $\beta = 1$ if the p-values are independent. Within the context of jump detection, suppose we have p-values $p_{1t}, \dots, p_{kt}$ at time $t$; if they are independent, then

  $$\max_i p_{it} \sim \text{beta}(k, 1).$$

2.2.2 Minimum p-value (MI) method

In contrast, the MI method works by selecting the minimum p-value from the list of p-values. Under the null hypothesis, the minimum p-value follows a beta distribution with parameters α=1, β=k if the p-values are independent.
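
Both pooled p-values have closed forms, because the beta$(k,1)$ distribution function is $x^k$ and the beta$(1,k)$ distribution function is $1-(1-x)^k$. A minimal Python sketch, with hypothetical function names and input values:

```python
def pool_max(pvals):
    """MA: P(max <= observed max) under independence, i.e. max(p)^k."""
    k = len(pvals)
    return max(pvals) ** k

def pool_min(pvals):
    """MI: P(min <= observed min) under independence, i.e. 1 - (1 - min(p))^k."""
    k = len(pvals)
    return 1.0 - (1.0 - min(pvals)) ** k

# Hypothetical p-values from three tests in the same period.
pvals = [0.01, 0.04, 0.03]
p_ma = pool_max(pvals)
p_mi = pool_min(pvals)
print(p_ma, p_mi)
```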

2.2.3 Fisher’s independent (FI) method

Fisher’s method is a very effective tool for combining p-values from individual tests. Under the null hypothesis, and if the p-values from the $k$ individual tests are independent, the sum of the $-2\log$-transformed p-values follows a $\chi^2$ distribution with $2k$ degrees of freedom:

  $$\tau_t = -2\sum_{i=1}^{k} \log p_{it} \sim \chi^2_{2k}.$$

Based on the definition of Fisher’s method, smaller p-values will make a larger contribution to Fisher’s statistic.
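
For an even number of degrees of freedom, the $\chi^2$ survival function has the closed form $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$, so the pooled p-value can be sketched in Python without a statistics library. The function name and the input values are hypothetical.

```python
import math

def fisher_pool(pvals):
    """Return Fisher's statistic tau and its p-value under independence."""
    k = len(pvals)
    tau = -2.0 * sum(math.log(p) for p in pvals)
    half = tau / 2.0
    # Closed-form survival function of chi-square with 2k degrees of freedom.
    p_pooled = math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(k))
    return tau, p_pooled

# Hypothetical p-values from three tests in the same period.
tau, p_fi = fisher_pool([0.01, 0.04, 0.03])
print(tau, p_fi)
```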

2.2.4 Stouffer’s independent (SI) method

Stouffer’s method is also known as the transformed Z-score test. The transformed z-statistic is given by

  $$z_{it} = \Phi^{-1}(1 - p_{it}),$$

where $p_{it}$ is the $i$th p-value at time period $t$ and $\Phi$ is the standard normal distribution function. The Z-score,

  $$Z = \frac{\sum_{i=1}^{k} z_{it}}{\sqrt{k}},$$

follows a standard normal distribution under the null hypothesis and p-value independence assumption. To avoid directional conflict, Stouffer’s method is usually applied to one-sided p-values. The method can also be applied to two-sided p-values with some modifications (Kuan and Huang 2013).
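
A short Python sketch of Stouffer’s method for one-sided p-values, using the standard library’s `statistics.NormalDist`; the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def stouffer_pool(pvals):
    """Return the pooled Z-score and its one-sided p-value (independence)."""
    nd = NormalDist()
    z = [nd.inv_cdf(1.0 - p) for p in pvals]
    Z = sum(z) / math.sqrt(len(z))
    return Z, 1.0 - nd.cdf(Z)

# Hypothetical p-values from three tests in the same period.
Z, p_si = stouffer_pool([0.01, 0.04, 0.03])
print(Z, p_si)
```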

The abovementioned p-value pooling methods work well if tests are independent of each other. However, in jump detection, each of the nonparametric tests shares similar goals (ie, identifying jumps within each time period), which gives rise to dependent p-values. Hence, these methods may not be sufficient to combine the p-values. In the following subsections, we consider modified versions of Fisher’s and Stouffer’s methods to account for the dependence among p-values.

2.2.5 Fisher’s dependent (FD) method

Fisher’s method can be extended to account for p-value dependence by modifying the distribution of $\tau_t$ under the null hypothesis (Kost and McDermott 2002), as follows:

  $$\tau_t \sim c\,\chi^2_f,$$

where

  $$c = \frac{2k + \sum_{i<j}\text{cov}(-2\ln p_{it},\, -2\ln p_{jt})}{2k}$$

and

  $$f = \frac{4k^2}{2k + \sum_{i<j}\text{cov}(-2\ln p_{it},\, -2\ln p_{jt})}.$$

Since the covariance matrix of the log transformation of p-values is unknown, one can estimate this quantity using an empirical approach following Kost and McDermott (2002).

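Specifically, let $\rho_{ij}$ be the correlation between $p_{it}$ and $p_{jt}$, which can be estimated using the sample correlation. The covariance is then approximated by

  $$\text{cov}(-2\ln p_{it},\, -2\ln p_{jt}) \approx 3.263\rho_{ij} + 0.710\rho_{ij}^2 + 0.027\rho_{ij}^3,$$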

following Alves and Yu (2014).
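
The scale $c$ and degrees of freedom $f$ follow directly from the pairwise correlations. The Python sketch below uses hypothetical function names and takes a $k \times k$ correlation matrix as a list of lists. The pooled p-value is then $P(\chi^2_f > \tau_t/c)$; since $f$ is generally not an integer, this step needs a general $\chi^2$ survival function (for example `scipy.stats.chi2.sf`, which is an assumption about tooling and is not used here).

```python
def kost_mcdermott_cov(rho):
    """Polynomial approximation to cov(-2 ln p_i, -2 ln p_j)."""
    return 3.263 * rho + 0.710 * rho ** 2 + 0.027 * rho ** 3

def fisher_dependent_params(corr):
    """Return (c, f) such that tau is approximately c * chi-square_f."""
    k = len(corr)
    cov_sum = sum(kost_mcdermott_cov(corr[i][j])
                  for i in range(k) for j in range(i + 1, k))
    c = (2 * k + cov_sum) / (2 * k)
    f = 4 * k * k / (2 * k + cov_sum)
    return c, f

# Two limiting cases for k = 3 tests.
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
identical = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
c0, f0 = fisher_dependent_params(independent)
c1, f1 = fisher_dependent_params(identical)
print(c0, f0, c1, f1)
```

With zero correlation the method reduces to Fisher’s independent case ($c = 1$, $f = 2k$). With perfectly correlated tests it gives $c = 3$, $f = 2$ for $k = 3$, which is the exact distribution of three copies of the same $-2\ln p$. In both cases $c f = 2k$, so the mean of $\tau_t$ is preserved.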

2.2.6 Stouffer’s dependent (SD) method

Hartung (1998) introduced a method to combine the z-statistics derived from Stouffer’s method for dependent p-values. The modified Stouffer test statistic is

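  $$\tilde{z}_t = \frac{\sum_{i=1}^{k} z_{it}}{\sqrt{k\,[1 + (k-1)E_\rho]}},$$
  $$E_\rho = \frac{2\sum_{i<j}\rho_{ij}}{k(k-1)},$$

where $\rho_{ij}$ is the correlation between $z_{it}$ and $z_{jt}$, which can be estimated using the sample correlation. Under the null hypothesis, we have

  $$\tilde{z}_t \sim N(0,1).$$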
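
A minimal Python sketch of the SD statistic follows; the function name and the inputs are hypothetical, and the correlation matrix is assumed to have been estimated beforehand.

```python
import math
from statistics import NormalDist

def stouffer_dependent(pvals, corr):
    """Return the correlation-adjusted Stouffer statistic and its p-value."""
    nd = NormalDist()
    k = len(pvals)
    z = [nd.inv_cdf(1.0 - p) for p in pvals]
    # Average pairwise correlation E_rho.
    e_rho = 2.0 * sum(corr[i][j] for i in range(k)
                      for j in range(i + 1, k)) / (k * (k - 1))
    z_tilde = sum(z) / math.sqrt(k * (1.0 + (k - 1) * e_rho))
    return z_tilde, 1.0 - nd.cdf(z_tilde)

# Three identical p-values under perfect and under zero correlation.
pvals = [0.02, 0.02, 0.02]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
eye = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
_, p_dep = stouffer_dependent(pvals, ones)
_, p_indep = stouffer_dependent(pvals, eye)
print(p_dep, p_indep)
```

When the three tests carry the same information (correlation one), the pooled p-value equals the common individual p-value, so no spurious evidence is accumulated. Treating the same inputs as independent yields a much smaller pooled p-value.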

2.3 Model validation strategy I: multiple-testing correction via an FDR procedure

Our jump-testing approach falls within a multiple-testing framework as it involves testing $M$ hypotheses or time periods (windows) simultaneously. Inferences based on the p-values computed from the methods described in previous sections only control the type 1 error of each individual test. Without appropriate adjustment, testing many periods simultaneously therefore yields too many false positives. One of the most popular approaches to account for multiple-testing issues is FDR control. Table 1 summarizes the possible outcomes of testing $M$ hypotheses simultaneously.

Table 1: Classification of the M hypothesis tests.
                        Not significant   Significant   Total
True null hypothesis    U                 V             M0
False null hypothesis   T                 S             M-M0
Total                   M-R               R             M

Based on Table 1, the FDR is defined as

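  $$\text{FDR} = \begin{cases} 0, & R = 0, \\ V/R, & R \ne 0, \end{cases}$$

whereas the false nondiscovery rate (FNR) is defined as

  $$\text{FNR} = \begin{cases} 0, & M = R, \\ T/(M-R), & M \ne R. \end{cases}$$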

In a multiple-testing framework, we aim to minimize the FNR while controlling for the FDR.
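
In a simulation, where the true jump periods are known, both quantities can be evaluated per run from the counts in Table 1. A small Python sketch with a hypothetical function name and hypothetical data:

```python
def fdr_fnr(is_jump, rejected):
    """Empirical FDR and FNR for one run, following Table 1."""
    M = len(rejected)
    R = sum(1 for r in rejected if r)                              # rejections
    V = sum(1 for t, r in zip(is_jump, rejected) if r and not t)   # false positives
    T = sum(1 for t, r in zip(is_jump, rejected) if t and not r)   # missed jumps
    fdr = V / R if R != 0 else 0.0
    fnr = T / (M - R) if M != R else 0.0
    return fdr, fnr

# Five periods: two true jumps, two rejections, one of which is correct.
is_jump = [True, True, False, False, False]
rejected = [True, False, True, False, False]
fdr, fnr = fdr_fnr(is_jump, rejected)
print(fdr, fnr)
```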

2.3.1 Benjamini and Hochberg (BH) method

The seminal work on FDRs was conducted by Benjamini and Hochberg (BH) and introduced in Benjamini and Hochberg (1995). Suppose $p_1, \dots, p_M$ are the p-values for tests $H_1, \dots, H_M$ and that $p_{(1)}, \dots, p_{(M)}$ are the ordered p-values. $H_{0(i)}$ is the null hypothesis corresponding to $p_{(i)}$. The BH method chooses the largest $k$ that satisfies

  $$p_{(k)} \le \frac{k}{M}\,q^*,$$

where $q^*$ is the FDR threshold, often set at 0.05. We reject all $H_{0(i)}$ corresponding to $i \le k$.
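
The step-up rule can be sketched in a few lines of Python; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / M * q:
            k = rank                    # keep the largest rank that passes
    reject = [False] * M
    for idx in order[:k]:
        reject[idx] = True
    return reject

# Hypothetical p-values for ten periods.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
reject = benjamini_hochberg(pvals, q=0.05)
print(reject)
```

In this example five p-values fall below 0.05, but only the two smallest satisfy $p_{(k)} \le (k/M)q^*$, so only two periods are declared to contain jumps.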

2.3.2 Storey q-value (QV) method

An alternative approach to controlling the FDR is based on the q-values introduced by Storey (2002), which eliminate the need to preset an error rate. The q-values can be interpreted in terms of the positive false discovery rate (pFDR) of the p-values, where

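  $$\text{pFDR} = E\left(\frac{V}{R}\,\middle|\, R > 0\right).$$

The q-value is then defined as

  $$\hat{q}(p_{(i)}) = \begin{cases} \widehat{\text{pFDR}}(p_{(i)}), & i = M, \\ \min\{\widehat{\text{pFDR}}(p_{(i)}),\ \hat{q}(p_{(i+1)})\}, & i < M. \end{cases}$$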
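
The recursion can be sketched in Python as below. The pFDR estimator is not spelled out above, so the sketch assumes the common plug-in form $\widehat{\text{pFDR}}(p_{(i)}) = \hat{\pi}_0 M p_{(i)}/i$, with the null proportion $\hat{\pi}_0$ passed in as a parameter (hypothetical name `pi0`; a value of 1 is the most conservative choice). The function name and inputs are also hypothetical.

```python
def q_values(pvals, pi0=1.0):
    """q-values via the backward recursion over the ordered p-values."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    q = [0.0] * M
    running = 1.0
    for rank in range(M, 0, -1):            # from largest p-value to smallest
        idx = order[rank - 1]
        pfdr = pi0 * M * pvals[idx] / rank  # assumed plug-in pFDR estimate
        running = min(running, pfdr)
        q[idx] = running
    return q

# Hypothetical p-values for four periods.
q = q_values([0.01, 0.02, 0.03, 0.5])
print(q)
```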

2.4 Model validation strategy II: reproducibility

Another important aspect of comparing the competing nonparametric methods for jump detection is assessing the consistency of these tests. We adapted the correspondence curves and the IDR derived by Li et al (2011) to compare the reproducibility of the different tests in replicated experiments. A brief description of the correspondence curves and IDR is provided below.

2.4.1 Correspondence curves

The correspondence curve uses the survival copula (Nelsen 1999) and was developed to visualize the local dependency of the signals on replicated pairs. When comparing jump-detection tests, correspondence curves can be adapted to demonstrate the dependencies among different tests under the null hypothesis and the alternative hypothesis separately, which can potentially be more informative than correlations. The correspondence curve consists of two parts: the empirical survival copula $\Psi$ and its derivative $\Psi'$. When the two replicates are perfectly correlated, the $\Psi$ curve falls on the diagonal line while the $\Psi'$ curve falls on the horizontal line. Meanwhile, if the two replicates are independent, $\Psi$ falls on the parabola $t^2$ while $\Psi'$ falls on the line $2t$. Using these curves, we can assess the dependency at different quantiles. We used correspondence curves to compare the reproducibility across different jump-detection methods.

2.4.2 IDR

The IDR is a quantity that is used to measure the reproducibility of signals based on a Gaussian–Bernoulli mixture copula model, and it represents a posterior probability that the replicates are not reproducible. The IDR is similar to the FDR in that it views these replicates as generated from a mixture of nonreproducible (or null hypothesis, in the context of the FDR) and reproducible (or alternative hypothesis, in the context of the FDR) signals. The estimated IDR was used to compare the reproducibility of each jump-detection method (individual tests and p-value pooling approaches) across replicated runs, where a robust method should be reproducible across replicates.

3 Model evaluation via simulation studies

3.1 Simulation setting

SV models are often used to model high-frequency data. In our simulation studies for evaluating the performance of competing jump-detection methods, we consider two models: the one-factor SV (SV1F) and the two-factor SV (SV2F). Motivated by our case study in Section 4 that consists of five-year stock data (1200 days), we designed our simulation studies to mimic the real data, ie, we modeled the log prices for 1200 days with 100 replications generated for each model. All of the simulations were performed in R.

Specifically, our simulation models follow the SV1F and SV2F models of Chernov et al (2003) for the log-price process. The log-price equation for the SV1F model is

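  $$ds_t = 0.03\,dt + \exp(0.125 v_t)\,dW_{pt} + dJ(t),$$
  $$dv_t = \alpha_v v_t\,dt + dW_{vt},$$
  $$\text{corr}(dW_p, dW_v) = -0.62,$$
  $$J(t) = \sum_{j=1}^{N(t)} D(t,j), \quad D(t,j) \overset{\text{iid}}{\sim} N(0,1),$$
  $$N(t) \overset{\text{iid}}{\sim} \text{Poisson}(\lambda\,dt),$$

where $s_t$ is the log-price process, the $W$ are standard Brownian motions, $v_t$ is the volatility factor and $\alpha_v$ is the drift of the volatility process. The parameter $\lambda$ represents the frequency of jumps. The log-price equation for the SV2F model takes the form

  $$ds_t = 0.03\,dt - \exp(-1.20 + 0.04 v_{1t} + 1.50 v_{2t})\,dW_{pt},$$
  $$dv_{1t} = -0.00137\,v_{1t}\,dt + dW_{v_1 t},$$
  $$dv_{2t} = -1.386\,v_{2t}\,dt + (1 + 0.25 v_{2t})\,dW_{v_2 t}.$$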

The SV2F model itself has relatively high volatility; thus, jump tests applied to it can declare false-positive jumps even though the model contains no jump component. We set the parameters in the SV1F and SV2F models following the settings of Huang and Tauchen (2005) and Dumitru and Urga (2012).
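
To make the SV1F dynamics concrete, the Python sketch below applies a simple Euler discretization. It is an illustration only and not the simulation code used in this study (which was written in R): the function name, the step size, the value $\alpha_v = -0.1$ and the Bernoulli approximation to the Poisson jump count are all assumptions.

```python
import math
import random

def simulate_sv1f(n_steps, dt, alpha_v=-0.1, lam=0.0, seed=0):
    """Euler scheme for the SV1F log price; returns (path, number of jumps)."""
    rng = random.Random(seed)
    rho = -0.62                      # corr(dW_p, dW_v)
    s, v = 0.0, 0.0
    path = [s]
    n_jumps = 0
    for _ in range(n_steps):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        dWp = math.sqrt(dt) * e1
        dWv = math.sqrt(dt) * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        jump = 0.0
        # Assumption: at most one jump per step, with probability lam * dt.
        if rng.random() < lam * dt:
            jump = rng.gauss(0.0, 1.0)
            n_jumps += 1
        s += 0.03 * dt + math.exp(0.125 * v) * dWp + jump
        v += alpha_v * v * dt + dWv
        path.append(s)
    return path, n_jumps

# One hypothetical day of 390 one-minute steps without jumps (null hypothesis).
path, n_jumps = simulate_sv1f(n_steps=390, dt=1.0 / 390, lam=0.0)
print(len(path), n_jumps)
```

Setting `lam` to a positive value switches the jump component on, which corresponds to simulating under the alternative hypothesis.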

3.2 Assessing empirical type 1 error in the SV model

We simulated the finance data for log price using the SV1F and SV2F models (Chernov et al 2003) under the null hypothesis (no jump). We considered window sizes of one, five, ten and fifteen minutes. The results are presented in Table 2.

Table 2: Empirical type 1 error of jump tests on the SV1F and SV2F models.
Model  Procedure  1 min  5 min  10 min  15 min
SV1F   BNS        0.053  0.055  0.056   0.057
       ALM        0.061  0.062  0.058   0.055
       AJ         0.046  0.050  0.059   0.088
       JO         0.063  0.085  0.106   0.123
       Amin       0.047  0.043  0.040   0.039
       Amed       0.050  0.052  0.055   0.057
SV2F   BNS        0.075  0.098  0.108   0.113
       ALM        0.699  0.476  0.385   0.336
       AJ         0.071  0.121  0.152   0.209
       JO         0.095  0.158  0.202   0.236
       Amin       0.067  0.085  0.090   0.092
       Amed       0.074  0.104  0.118   0.127

Based on the results for the SV1F model, the JO test showed an inflated type 1 error that grew with the window size. Under the SV2F model, which can exhibit relatively high volatility because of the second factor, the ALM test did not perform well, with empirical type 1 errors between 0.336 and 0.699. Therefore, we dropped the JO and ALM tests from subsequent comparisons.

3.3 Assessing empirical power in the SV model

We performed a power analysis based on the SV1F model with 20% of days including jumps. Our results are presented in Figure 1.

Figure 1: Empirical power of jump tests on SV1F model with 20% of days including jumps.

The power curves in this figure show that the BNS, Amin and Amed tests have similar power, while the AJ test’s power decreased rapidly as the sampling frequency decreased, ie, with increasing window sizes. Therefore, we also dropped the AJ test from subsequent comparisons.

3.4 Reproducibility analysis

We simulated 1200 days with ten replications and set the window size to five minutes. We simulated 50% of the windows to contain jumps, as in Li et al (2011). The correspondence curves (see Figure 2) showed that the BNS, Amin and Amed tests are highly correlated under the alternative hypothesis. These tests are also highly correlated under the null hypothesis, especially the BNS and Amin tests. This suggests that one should pool the p-values of these three tests with a correlation correction, namely, via the FD and SD methods for correlated p-values.

Figure 2: Correspondence curves between three selected tests. (a) BNS versus Amin. (b) BNS versus Amin. (c) BNS versus Amed. (d) BNS versus Amed. (e) Amin versus Amed. (f) Amin versus Amed.
Figure 3: Irreproducible discovery rate (IDR) curves for different methods.

Next, to analyze the performance of the different p-value pooling methods, we simulated the p-values directly for 1200 days for two correlated replicates so that we could study the empirical IDR within the same test. We repeated the simulations 100 times. Based on the mean empirical IDR curves (see Figure 3), the SI, SD and FD methods have the lowest IDRs; thus, they are most reproducible across replications.

3.5 Assessing p-value pooling strategies with multiple-testing correction

In this subsection, we examined the p-value pooling approaches under the different FDR control procedures, namely the BH (Benjamini and Hochberg 1995) and QV (Storey 2002, 2003) methods. We pooled the p-values from the BNS, Amin and Amed tests via Fisher’s (FI/FD), Stouffer’s (SI/SD) and the minimum and maximum p-value (MI/MA) methods. Our results are presented in Figure 4, which shows that the MA, FI and SI methods exhibit inflated empirical FDRs. However, the MI, FD and SD methods control the FDR and are robust to the sampling frequency. The BNS, Amin and Amed tests appear to be relatively sensitive to sampling frequency; thus, one might consider pooling them together to increase the robustness. Based on our earlier simulation studies, we suggest using Stouffer’s method with the correlation adjustment (SD). The correlation term in the SD method should be estimated under the null hypothesis; thus, we omitted the smallest 20% of p-values, which are the most likely to arise from the alternative hypothesis, when estimating the correlation between the three tests.

Figure 4: Empirical false discovery rates (FDRs) and false nondiscovery rates (FNRs) of different methods. (a) BNS. (b) Amin. (c) Amed. (d) MI. (e) MA. (f) FI. (g) FD. (h) SI. (i) SD. Horizontal gray lines at 0.05 represent the nominal FDR value.

4 Model validation via case study

To further assess our proposed method, we applied it to real data consisting of five stocks – General Electric (GE), Disney, IBM, JP Morgan (JPM) and Procter & Gamble (PG) – for 2009–13, similar to Dumitru and Urga (2012). The data was downloaded from the limit order book system: the efficient reconstruction system (LOBSTER) (Huang and Polak 2011). The performance of our proposed method is presented in Table 3.

The large proportion of significant tests indicates that inferences based on the original p-values will lead to a large number of false rejections. The QV adjustment method that is based on estimating the alternative hypothesis often exhibits a large number of rejections when the proportion of alternative hypotheses is relatively high. Therefore, in our case study, especially for one-minute-level data, the QV method identified a large number of jumps. The results based on Stouffer’s method with correlation adjustment (SD) under BH FDR control are consistent and stable.

Table 3: Proportion of jumps declared by the SD method without adjustment (Original) and with the BH and QV adjustments.
Company  Method    1 min  5 min  10 min  15 min
GE       Original  0.371  0.245  0.215   0.215
         BH        0.222  0.082  0.064   0.041
         QV        0.402  0.188  0.131   0.089
Disney   Original  0.473  0.303  0.251   0.242
         BH        0.407  0.167  0.120   0.104
         QV        0.899  0.292  0.205   0.201
IBM      Original  0.312  0.259  0.242   0.233
         BH        0.180  0.132  0.092   0.081
         QV        0.317  0.264  0.168   0.128
JPM      Original  0.280  0.223  0.224   0.228
         BH        0.126  0.053  0.067   0.067
         QV        0.300  0.113  0.165   0.133
PG       Original  0.493  0.334  0.313   0.287
         BH        0.387  0.236  0.201   0.176
         QV        0.970  0.402  0.294   0.257

A large number of jumps were declared using one-minute data; the proportion decreased gradually with five-, ten- and fifteen-minute data. One potential problem with one-minute data is the presence of microstructure noise in high-frequency data (Dumitru and Urga 2012). Therefore, we suggest using SD p-value pooling together with a BH FDR adjustment for five-minute data when jump testing.

5 Conclusion

This paper offers several contributions to financial jump detection, including evaluating and validating the existing nonparametric jump-detection procedures. Based on results from a Monte Carlo study, we proposed a p-value pooling approach to improve detection accuracy. Various methods, including parametric (eg, approaches based on continuous wavelet transformation (Chabert et al 1996) or adjusted for microstructure noise and SV (Bos 2008)) and nonparametric tests (Barndorff-Nielsen and Shephard 2006; Andersen et al 2007; Lee and Mykland 2007; Aït-Sahalia and Jacod 2009; Jiang and Oomen 2008; Andersen et al 2012), have been developed for jump detection. Parametric methods involve model-based procedures, which can be sensitive to underlying model assumptions. Nonparametric methods tend to be more robust, especially for high-frequency time series data, which are highly volatile. Therefore, we selected six nonparametric tests that are widely used in analyzing minute-level data.

The first step in our simulation study was to identify reliable methods to be included in the p-value pooling. In the SV1F simulation, we found that the JO test showed an inflated error rate (ie, it detected false jumps) that could be attributed to its sensitivity to low volatility. For the AJ test, we used the default tuning parameters suggested by the authors and achieved a relatively good performance at the one-minute level. However, the AJ test exhibited a dramatic decrease in power with decreasing sampling frequency. The ALM test has the best performance under the SV1F model, but it has a significantly inflated type 1 error in the SV2F simulations because of the maximum local volatility estimates in each window, which can be sensitive to the microstructure noise. Therefore, we only included the BNS, Amin and Amed methods in our p-value pooling.

Unlike regular p-value pooling for meta-analysis, the different methods were applied to the same data; thus, the p-values computed from these methods were expected to exhibit high correlations. We included numerous p-value pooling methods under either independent or dependent structures in our simulation studies and showed that methods which account for the dependency yield better operating characteristics.

We also assessed the reproducibility of each method by adapting the correspondence curves and the IDR (Li et al 2011) to quantify the local dependency of the jump-detection methods. We found that these methods are highly correlated under the null hypothesis (no jumps) and exhibit significant correlation under the alternative hypothesis (including jumps). We also compared the irreproducible discovery rate (IDR) to quantitatively evaluate the robustness of the jump-detection methods and the p-value pooling approaches. We found that the SI, SD and FD methods are more reproducible across replicates than the individual jump-detection tests. The FDR control framework was also incorporated into this study to account for multiple hypothesis testing. A traditional family-wise error rate framework such as the Bonferroni procedure is often too conservative in practice for jump testing within multiple time intervals simultaneously. Our simulation results showed that the MI, SD and FD methods exhibited the best performance in terms of both FDR and FNR. In our case study, we found that most methods declared a high proportion of jumps in one-minute-level data, which could be attributed to the existence of microstructure noise (Dumitru and Urga 2012). However, the SD method coupled with a BH adjustment showed the most robust and consistent results in jump detection. Therefore, we suggest performing jump detection using the SD p-value pooling of the BNS, Amed and Amin methods as well as controlling for the FDR using a BH adjustment on five-minute-level data.

This work can be extended by incorporating into the model the microstructure noise caused by the high-frequency nature of the data. One possible solution is to apply smoothing algorithms (eg, kernel or wavelet methods) to the data, with the caveat that the jump component may be smoothed away if the bandwidth is not tuned appropriately. Finally, the estimation of the correlation structure in the p-value pooling approaches under dependency can potentially be improved via Bayesian methods by imposing proper priors.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

References

  • Aït-Sahalia, Y., and Jacod, J. (2009). Testing for jumps in a discretely observed process. Annals of Statistics 37(1), 184–222 (https://doi.org/10.1214/07-aos568).
  • Alves, G., and Yu, Y.-K. (2014). Accuracy evaluation of the unified p-value from combining correlated p-values. PLoS One 9(3), e91225 (https://doi.org/10.1371/journal.pone.0091225).
  • Andersen, T. G., Bollerslev, T., and Diebold, F. X. (2003). Some like it smooth, and some like it rough: untangling continuous and jump components in measuring, modeling, and forecasting asset return volatility. Preprint, Social Science Research Network (https://doi.org/10.2139/ssrn.473204).
  • Andersen, T. G., Bollerslev, T., and Dobrev, D. (2007). No-arbitrage semi-martingale restrictions for continuous-time volatility models subject to leverage effects, jumps and iid noise: theory and testable distributional implications. Journal of Econometrics 138(1), 125–180 (https://doi.org/10.1016/j.jeconom.2006.05.018).
  • Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics 169(1), 75–93 (https://doi.org/10.1016/j.jeconom.2012.01.011).
  • Barndorff-Nielsen, O. E., and Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2(1), 1–37 (https://doi.org/10.1093/jjfinec/nbh001).
  • Barndorff-Nielsen, O. E., and Shephard, N. (2006). Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4(1), 1–30 (https://doi.org/10.1093/jjfinec/nbi022).
  • Barndorff-Nielsen, O. E., Graversen, S. E., Jacod, J., Podolskij, M., and Shephard, N. (2006a). A central limit theorem for realised power and bipower variations of continuous semimartingales. In From Stochastic Calculus to Mathematical Finance, pp. 33–68. Springer (https://doi.org/10.1007/978-3-540-30788-4_3).
  • Barndorff-Nielsen, O. E., Shephard, N., and Winkel, M. (2006b). Limit theorems for multipower variation in the presence of jumps. Stochastic Processes and Their Applications 116(5), 796–806 (https://doi.org/10.1016/j.spa.2006.01.007).
  • Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B 57, 289–300 (https://doi.org/10.2307/2346101).
  • Bos, C. S. (2008). Model-based estimation of high frequency jump diffusions with microstructure noise and stochastic volatility. Technical Report, Tinbergen Institute (https://doi.org/10.2139/ssrn.967303).
  • Chabert, M., Tourneret, J.-Y., and Castanie, F. (1996). Performance of an optimal multiplicative jump detector based on the continuous wavelet transform. In 8th European Signal Processing Conference: EUSIPCO 1996, pp. 1–4. IEEE.
  • Chang, L.-C., Lin, H.-M., Sibille, E., and Tseng, G. C. (2013). Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinformatics 14(1), 368 (https://doi.org/10.1186/1471-2105-14-368).
  • Chernov, M., Gallant, A. R., Ghysels, E., and Tauchen, G. (2003). Alternative models for stock price dynamics. Journal of Econometrics 116(1), 225–257 (https://doi.org/10.1016/s0304-4076(03)00108-8).
  • Duffie, D., and Pan, J. (2001). Analytical value-at-risk with jumps and credit risk. Finance and Stochastics 5(2), 155–180 (https://doi.org/10.1007/pl00013531).
  • Dumitru, A.-M., and Urga, G. (2012). Identifying jumps in financial assets: a comparison between nonparametric jump tests. Journal of Business and Economic Statistics 30(2), 242–255 (https://doi.org/10.1080/07350015.2012.663250).
  • Fisher, R. A. (1925). Statistical Methods for Research Workers. Genesis Publishing (https://doi.org/10.1038/155132a0).
  • Hartung, J. (1998). A note on combining dependent tests of significance. Technical Report 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund (https://doi.org/10.1002/(sici)1521-4036(199911)41:7).
  • Huang, R., and Polak, T. (2011). LOBSTER: limit order book reconstruction system. Preprint, Social Science Research Network (https://doi.org/10.2139/ssrn.1977207).
  • Huang, X., and Tauchen, G. (2005). The relative contribution of jumps to total price variance. Journal of Financial Econometrics 3(4), 456–499 (https://doi.org/10.1093/jjfinec/nbi025).
  • Jarrow, R. A., and Rosenfeld, E. R. (1984). Jump risks and the intertemporal capital asset pricing model. Journal of Business 57(3), 337–351 (https://doi.org/10.1086/296267).
  • Jiang, G. J., and Oomen, R. C. (2008). Testing for jumps when asset prices are observed with noise – a “swap variance” approach. Journal of Econometrics 144(2), 352–370 (https://doi.org/10.1016/j.jeconom.2008.04.009).
  • Kost, J. T., and McDermott, M. P. (2002). Combining dependent p-values. Statistics and Probability Letters 60(2), 183–190 (https://doi.org/10.1016/s0167-7152(02)00310-3).
  • Kuan, P. F., and Huang, B. (2013). A simple and robust method for partially matched samples using the p-values pooling approach. Statistics in Medicine 32(19), 3247–3259 (https://doi.org/10.1002/sim.5758).
  • Lee, S. S., and Mykland, P. A. (2007). Jumps in financial markets: a new nonparametric test and jump dynamics. Review of Financial Studies 21(6), 2535–2563 (https://doi.org/10.1093/rfs/hhm056).
  • Li, Q., Brown, J. B., Huang, H., and Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics 5(3), 1752–1779 (https://doi.org/10.1214/11-aoas466).
  • Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3(1–2), 125–144 (https://doi.org/10.1016/0304-405x(76)90022-2).
  • Naik, V., and Lee, M. (1990). General equilibrium pricing of options on the market portfolio with discontinuous returns. Review of Financial Studies 3(4), 493–521 (https://doi.org/10.1093/rfs/3.4.493).
  • Nelsen, R. B. (1999). Introduction. In An Introduction to Copulas, pp. 1–4. Springer (https://doi.org/10.1007/978-1-4757-3076-0_1).
  • Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society B 64(3), 479–498 (https://doi.org/10.1111/1467-9868.00346).
  • Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics 31(6), 2013–2035 (https://doi.org/10.1214/aos/1074290335).
  • Stouffer, S. A., Lumsdaine, A. A., Harper, M., Lumsdaine, R. M., Smith, M. B., Janis, I. L., Star, S. A., and Cottrell, L. S., Jr. (1950). The American Soldier: Combat and Its Aftermath. Princeton University Press (https://doi.org/10.2307/2086686).
  • Tippett, L. H. C. (1931). The Methods of Statistics. Williams & Norgate, London.
  • Wilkinson, B. (1951). A statistical consideration in psychological research. Psychological Bulletin 48(2), 156–158 (https://doi.org/10.1037/h0059111).
  • Won, S., Morris, N., Lu, Q., and Elston, R. C. (2009). Choosing an optimal method to combine p-values. Statistics in Medicine 28(11), 1537–1553 (https://doi.org/10.1002/sim.3569).
  • Yen, Y.-M. (2013). Testing jumps via false discovery rate control. PLoS One 8(4), e58365 (https://doi.org/10.1371/journal.pone.0058365).
