We describe here how GPU technologies can improve efficiency and throughput in the analysis of financial instruments. We present a number of benchmarks specific to financial analysis in order to demonstrate the substantial advantage of porting trading algorithms to the GPU platform.

The following benchmarks show that algorithms written for the GPU:

  • Improve application performance 10- to 700-fold.
  • Allow massively parallel applications to be built and delivered quickly.
  • Leverage existing grid infrastructure by simply adding low-cost graphics hardware to the units.
  • Have virtually no limits in their scalability.

Speedup for Option Pricing

Black-Scholes Option Pricing



Speedup as a Function of the Number of Options



The horizontal axis gives the total number of pricing problems to be solved. The vertical axis gives the speedup factor, that is, the ratio of CPU computation time to GPU computation time.
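Each pricing problem counted on the horizontal axis is one independent closed-form evaluation, which is what makes the workload easy to spread across many GPU threads. As a point of reference, here is a minimal C++ sketch of the textbook Black-Scholes formula for a European call and put on a non-dividend-paying asset. It is not the benchmarked code: the names `norm_cdf`, `OptionPrices` and `black_scholes` are illustrative assumptions, and the source does not state which option types or parameters were benchmarked.

```cpp
#include <cmath>

// Standard normal cumulative distribution function,
// expressed through the complementary error function.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Hypothetical result type for this sketch.
struct OptionPrices {
    double call;
    double put;
};

// Textbook Black-Scholes prices for a European call and put.
// spot: current asset price, strike: exercise price,
// rate: continuously compounded risk-free rate,
// vol: annualized volatility, maturity: time to expiry in years.
OptionPrices black_scholes(double spot, double strike, double rate,
                           double vol, double maturity) {
    const double sqrt_t = std::sqrt(maturity);
    const double d1 = (std::log(spot / strike)
                       + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t);
    const double d2 = d1 - vol * sqrt_t;
    const double discounted_strike = strike * std::exp(-rate * maturity);

    OptionPrices prices;
    prices.call = spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2);
    prices.put = discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1);
    return prices;
}
```

Because no option depends on the result of any other, a GPU version can in principle assign one option to each thread; the sketch above corresponds to the work a single thread would do.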

The GPU significantly outperforms the CPU when enough options are priced to allow high resource utilization and to amortize the constant costs of GPU buffer management.

The slight decrease in GPU performance beyond roughly 10 million options occurs because the whole data set is loaded onto the GPU at once, reaching its memory bandwidth limits. The production version of this software takes those limits into account and streamlines the memory transfers to keep performance at the optimal level.

Options per Second as a Function of the Number of Options



The horizontal axis gives the total number of pricing problems to be solved. The vertical axis represents the number of options computed per second. The curve has a shape similar to that of the preceding graph.

Binomial Option Pricing

Another family of option pricing models is lattice models. These models use a dynamic programming approach to derive the value of an option at time 0 (now) by starting at time T (the expiration date) and iteratively stepping "backward" toward time 0 in a discrete number of time steps (N). This approach is versatile and simple to implement, but it can be computationally expensive due to its iterative nature. The binomial lattice model presented here can be used to price both European and American options.
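The backward stepping described above can be sketched in C++ as follows. This is a minimal illustration that assumes the widely used Cox-Ross-Rubinstein parameterization of the lattice; the source does not say which parameterization the benchmarked code uses, and the function name `binomial_put` and its signature are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Binomial lattice price of a put option, assuming the
// Cox-Ross-Rubinstein up/down factors. Set american = true to allow
// early exercise at every node, false for a European option.
double binomial_put(double spot, double strike, double rate, double vol,
                    double maturity, int steps, bool american) {
    const double dt = maturity / steps;
    const double up = std::exp(vol * std::sqrt(dt));
    const double down = 1.0 / up;
    const double growth = std::exp(rate * dt);
    const double p_up = (growth - down) / (up - down);  // risk-neutral probability
    const double discount = 1.0 / growth;

    // Payoffs at expiration (time T): node j has j up-moves out of `steps`.
    std::vector<double> value(steps + 1);
    for (int j = 0; j <= steps; ++j) {
        const double s = spot * std::pow(up, j) * std::pow(down, steps - j);
        value[j] = std::max(strike - s, 0.0);
    }

    // Step backward from time T toward time 0, one time step per pass.
    for (int i = steps - 1; i >= 0; --i) {
        for (int j = 0; j <= i; ++j) {
            const double continuation =
                discount * (p_up * value[j + 1] + (1.0 - p_up) * value[j]);
            if (american) {
                // Early exercise: compare with the immediate payoff at this node.
                const double s = spot * std::pow(up, j) * std::pow(down, i - j);
                value[j] = std::max(continuation, strike - s);
            } else {
                value[j] = continuation;
            }
        }
    }
    return value[0];
}
```

The nested loops perform on the order of N squared node updates per option, and each time step depends on the one after it, which is why the cost grows quickly with the number of time steps.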

Speedup as a Function of the Number of Time Steps



The horizontal axis gives the number of time steps computed between "now" and the option expiry date. The vertical axis gives the speedup of the GPU code over the CPU code. This analysis measures the real impact of GPU acceleration when the code is written to handle recursive, iterative computations.

Experimental Setup

The same pricing modules were coded in C++ for the CPU version and in CUDA (NVIDIA's "Compute Unified Device Architecture") for the GPU version.

The C++ version uses double-precision floating-point numbers; the CUDA version uses single-precision floating-point numbers. The L1 norm of the difference between the CPU and GPU results is monitored for every option calculation in order to control the precision.
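A precision check of this kind might look like the following sketch, which sums the absolute differences between a double-precision reference vector and a single-precision candidate vector. The function name `l1_difference` and the choice to compare whole result vectors are assumptions; the source does not describe how the monitoring is implemented or what threshold is applied.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// L1 norm of the difference between two result vectors:
// the sum over i of |reference[i] - candidate[i]|.
// `reference` stands for double-precision (CPU) results and
// `candidate` for single-precision (GPU) results.
double l1_difference(const std::vector<double>& reference,
                     const std::vector<float>& candidate) {
    if (reference.size() != candidate.size()) {
        throw std::invalid_argument("result vectors must have the same length");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // The float is promoted to double before subtracting.
        sum += std::fabs(reference[i] - static_cast<double>(candidate[i]));
    }
    return sum;
}
```

Dividing the result by the number of options gives a mean absolute error per option, which is easier to compare across batch sizes.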

Hardware

CPU - AMD Athlon 64 X2 Dual Core Processor 4200+, 2.21 GHz, 2 GB of RAM

GPU - NVIDIA GeForce 8800 GT, 1.5 GHz, 1 GB of RAM

Software

Microsoft Windows XP (NT 5.1.2600) Service Pack 3

Microsoft Visual Studio 2005

NVIDIA CUDA 2.0 Beta 2 Driver v177.35

NVIDIA CUDA 2.0 Beta 2 SDK