Hedging of financial derivative contracts via Monte Carlo tree search

Oleg Szehr

Need to know

We interpret derivatives hedging as a “game with the world”, where one player (the Investor) bets on what will happen and the other player (the Market) decides what will happen.
Inspired by the success of Monte Carlo tree search in specific games and in general game play, we introduce this algorithm as a method for hedging in the presence of risk and market friction.
We show that Monte Carlo tree search is capable of maximizing the utility of the investor’s terminal wealth in a setting where no external pricing information is available and rewards are granted only as a result of contractual cash flows.
In this setting, we observe superior performance compared with the deep Q-network algorithm and performance comparable to “deep-hedging” methods.

Abstract

The construction of replication strategies for the pricing and hedging of derivative contracts in incomplete markets is a key problem in financial engineering. We interpret this problem as a “game with the world”, where one player (the investor) bets on what will happen and the other player (the market) decides what will happen. Inspired by the success of Monte Carlo tree search (MCTS) in a variety of games and stochastic multiperiod planning problems, we introduce this algorithm as a method for replication in the presence of risk and market friction. Unlike model-free reinforcement learning methods (such as Q-learning), MCTS makes explicit use of an environment model. The role of this model is taken by a market simulator. Such simulators are frequently used even in the training of model-free methods, but here the simulator also allows MCTS to plan for the consequences of decisions before actions are executed. We conduct experiments with the AlphaZero variant of MCTS on toy examples of simple market models and derivatives with simple payoff structures. We show that MCTS is capable of maximizing the utility of the investor’s terminal wealth in a setting where no external pricing information is available and rewards are granted only as a result of contractual cash flows. In this setting, we observe that MCTS has superior performance compared with the deep Q-network algorithm and comparable performance to “deep-hedging” methods.
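To make the planning idea concrete, the following is a minimal sketch of how a tree search can use a market simulator to choose a hedge. It is not the AlphaZero variant used in the experiments: it is plain UCT-style MCTS with random rollouts and no neural network. The binomial market, the short call position, the holding grid, the proportional cost, the exponential utility and every parameter value are illustrative assumptions and are not taken from the article. As in the setting described above, the only reward is the utility of terminal wealth once the contract has settled.

```python
import math
import random

# All values below are hypothetical toy choices, not taken from the article.
ACTIONS = (0.0, 0.5, 1.0)          # admissible stock holdings
UP, DOWN = 1.1, 0.9                # binomial price factors
STRIKE, STEPS = 100.0, 3           # investor is short one call with this maturity
COST, RISK_AVERSION = 0.01, 0.05   # proportional friction, utility parameter


def step(state, action, rng):
    """Market simulator: rebalance to `action`, then the market moves."""
    t, price, holding, cash = state
    trade = action - holding
    cash -= trade * price + COST * abs(trade) * price
    price *= UP if rng.random() < 0.5 else DOWN
    return (t + 1, price, action, cash)


def terminal_utility(state):
    """Exponential utility of wealth after the call payoff is settled.

    The final stock position is valued at market without liquidation cost.
    """
    _, price, holding, cash = state
    wealth = cash + holding * price - max(price - STRIKE, 0.0)
    return -math.exp(-RISK_AVERSION * wealth)


def mcts(root, n_sims=2000, c=1.0, seed=0):
    """Return the most visited first action after n_sims simulated episodes."""
    rng = random.Random(seed)
    visits, value = {}, {}  # statistics per (state, action) pair

    def ucb(state, action, total):
        n = visits[(state, action)]
        return value[(state, action)] / n + c * math.sqrt(math.log(total) / n)

    for _ in range(n_sims):
        state, path = root, []
        # Selection: descend with UCB1 until an untried action is found.
        while state[0] < STEPS:
            untried = [a for a in ACTIONS if (state, a) not in visits]
            if untried:
                action = rng.choice(untried)  # expansion
            else:
                total = sum(visits[(state, a)] for a in ACTIONS)
                action = max(ACTIONS, key=lambda a: ucb(state, a, total))
            path.append((state, action))
            state = step(state, action, rng)
            if untried:
                break
        # Rollout: random hedging until maturity.
        while state[0] < STEPS:
            state = step(state, rng.choice(ACTIONS), rng)
        # Backpropagation: the only reward is the terminal utility.
        reward = terminal_utility(state)
        for key in path:
            visits[key] = visits.get(key, 0) + 1
            value[key] = value.get(key, 0.0) + reward

    return max(ACTIONS, key=lambda a: visits.get((root, a), 0))


root = (0, 100.0, 0.0, 0.0)  # (time step, price, holding, cash)
print("first hedge chosen by the search:", mcts(root))
```

The market's move is sampled inside `step`, so the investor's decision nodes and the market's responses alternate along each simulated path, mirroring the “game with the world” picture. In practice the utility values would usually be rescaled before being combined with the exploration term, and the random rollout would be replaced by a learned value estimate.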
