Market data in the cloud

Get quick insights with Google Cloud, powered by CME Group real-time data


Risk management requires insight

Market uncertainty poses risk to any trading organization's assets and operations. Organizations that analyze, quantify and mitigate their risks find themselves better prepared for the future. Real-time market data feeds are a key component of risk management strategies, but collecting price data for an array of different instruments can be complex and costly to operate.

The CME Smart Stream real-time market data feed, delivered through Google Cloud, enables organizations to consume this data efficiently and economically. In this demo, we show how it can help risk managers, data scientists, and infrastructure engineers to operate more effectively.


[Live price table for E-mini S&P 500 (ES), E-mini Nasdaq 100 (NQ), and Bitcoin (BTC) futures, showing contract, bid, ask, and last update, with mps and latency readouts]


The market data series on the Google Cloud blog goes in depth on real-time market data visualization and on building a serverless market data pipeline.

About the data

In contrast to traditional market data distribution architectures, CME Smart Stream lets you be much more selective about the subset of market information your application processes. Applications often discard large numbers of messages just to extract the prices of the instruments they do follow. CME Smart Stream instead broadcasts each change over Google Cloud's Pub/Sub messaging service and assigns each product, such as corn or silver futures, to its own topic. Consumers can subscribe only to the symbols of interest, driving down transport, ingest, processing, compute, and storage costs.
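
To illustrate why per-product topics cut downstream work, here is a minimal sketch that models topic-based delivery in memory. `ProductTopics` is a hypothetical stand-in, not the Pub/Sub client API; the product codes and prices are illustrative, and real Smart Stream topic names differ.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ProductTopics:
    """Hypothetical in-memory model of per-product topics (not the Pub/Sub API)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, product: str, callback: Callable[[dict], None]) -> None:
        # Register interest in one product's topic only.
        self._subs[product].append(callback)

    def publish(self, product: str, quote: dict) -> int:
        # Deliver only to this product's subscribers; everyone else never sees it.
        callbacks = self._subs.get(product, [])
        for callback in callbacks:
            callback(quote)
        return len(callbacks)


received: List[dict] = []
topics = ProductTopics()
for product in ("ES", "NQ"):  # the only products this consumer follows
    topics.subscribe(product, received.append)

# A mixed stream of price changes across several products (illustrative values).
stream = [("ES", 4500.25), ("CL", 80.1), ("NQ", 15500.5), ("GC", 1900.0), ("ES", 4500.5)]
for product, price in stream:
    topics.publish(product, {"product": product, "price": price})

print(len(received), "of", len(stream), "messages delivered")
```

With a single firehose feed, this consumer would have to receive and discard the CL and GC messages itself; with per-product topics those messages never reach it.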

Architecture

The real-time price chart's architecture is simple: a Google Kubernetes Engine (GKE) deployment consumes Pub/Sub messages and broadcasts them over a WebSocket stream, which feeds a UI built with Google Charts. The design minimizes time-to-eyeball by visualizing the streamed data directly rather than a persisted copy. GKE's self-healing and automatic scaling features reduce operational toil.
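
The core of such a bridge is fan-out: each Pub/Sub message is pushed to every connected viewer without being written anywhere first. The sketch below models that with one `asyncio.Queue` per client in place of a real WebSocket connection; `Broadcaster`, the `ESZ3` symbol, and the message fields are hypothetical.

```python
import asyncio
from typing import List, Set


class Broadcaster:
    """Hypothetical Pub/Sub-to-WebSocket bridge core: fan each message out to all clients."""

    def __init__(self) -> None:
        self.clients: Set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        # In a real bridge, one queue would back one WebSocket connection.
        queue: asyncio.Queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self.clients.discard(queue)

    def on_message(self, message: dict) -> None:
        # Push the message straight to every client; nothing is persisted first.
        for queue in self.clients:
            queue.put_nowait(message)


async def demo() -> List[List[dict]]:
    bridge = Broadcaster()
    first, second = bridge.connect(), bridge.connect()
    for bid in (4500.25, 4500.5):
        bridge.on_message({"symbol": "ESZ3", "bid": bid})
    # Drain what each client received.
    return [[q.get_nowait() for _ in range(q.qsize())] for q in (first, second)]


results = asyncio.run(demo())
print(results[0])
```

The Python Pub/Sub client invokes subscriber callbacks on a worker thread pool, so a real bridge would hand each message to the event loop with `loop.call_soon_threadsafe` rather than calling `on_message` directly.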

Hope is not a strategy

It can be challenging to assess a firm's overall risk exposure in real time. Pricing data delivered at low frequency, or limited access to historical training datasets, can paint an inaccurate picture of risk and hamper downstream decision-making. Historically, real-time visibility into risk exposure has required a data operation too costly or complex for many market participants.

CME Group's real-time market data streamed on Google Cloud can help change that. Consuming CME Smart Stream quotes via Pub/Sub and joining them with diverse data sets is both time- and cost-efficient. Positions can be marked to market on every tick from the exchange, which simplifies visualizing exposures in real time.
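
Marking to market on each tick is a small calculation: revalue the position at the latest price. The sketch assumes a simple, hypothetical `Position` record and uses 50 as the ES multiplier, the dollar value of a one-point move in the E-mini S&P 500 contract; the entry and tick prices are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    symbol: str
    quantity: int       # contracts: positive for long, negative for short
    entry_price: float
    multiplier: float   # value of a one-point price move per contract


def mark_to_market(position: Position, last_price: float) -> float:
    """Unrealized P&L of the position revalued at the latest tick."""
    return (last_price - position.entry_price) * position.quantity * position.multiplier


long_es = Position("ES", quantity=2, entry_price=4500.0, multiplier=50.0)
ticks: List[float] = [4500.25, 4501.0, 4499.5]
pnl_per_tick = [mark_to_market(long_es, price) for price in ticks]
print(pnl_per_tick)  # [25.0, 100.0, -50.0]
```

A negative `quantity` flips the sign automatically, so the same function marks short positions.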


[Interactive risk dashboard with five tabs: Live prices (contract, bid, ask, updated) and Open positions (trader, trade, price, market, exposure, P&L) for E-mini S&P 500 (ES), E-mini Nasdaq 100 (NQ), and Bitcoin (BTC); Forward curve for Crude Oil (CL), Eurodollar (GE), and Bitcoin (BTC) futures; Realized P&L (trader, exposure, P&L, trade, settled) and Candlesticks for ES, NQ, and BTC]
Charts

Real-time price data provides a snapshot of current exposures. This helps in understanding market behavior and provides insight into individual trader performance.

Several technical components produce these charts: Pub/Sub for topic-based messaging, GKE for hosting the WebSocket streams that power the visualizations, Dataflow for data transformation, and BigQuery for data warehousing, calculations, and performance metrics.
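
Candlestick charts aggregate raw ticks into open/high/low/close bars. Here is a minimal sketch of that aggregation in plain Python; the one-minute bucket width and the `(epoch_seconds, price)` tick shape are assumptions, and in this architecture the same grouping could equally be expressed in BigQuery SQL.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Candle:
    start: int      # bucket start, epoch seconds
    open: float
    high: float
    low: float
    close: float


def to_candles(ticks: Iterable[Tuple[int, float]], width_s: int = 60) -> List[Candle]:
    """Aggregate time-ordered (epoch_seconds, price) ticks into fixed-width OHLC candles."""
    candles: Dict[int, Candle] = {}
    for ts, price in ticks:
        start = ts - ts % width_s          # align the tick to its bucket
        candle = candles.get(start)
        if candle is None:
            candles[start] = Candle(start, price, price, price, price)
        else:
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price           # relies on ticks arriving in time order
    return [candles[start] for start in sorted(candles)]


bars = to_candles([(0, 100.0), (10, 101.5), (59, 99.0), (60, 99.5), (90, 100.25)])
for bar in bars:
    print(bar)
```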

Architecture

Because this demo has multiple data use cases, it relies on multiple data solutions. Bigtable, a high-throughput, low-latency NoSQL database, is used for rapid processing of the most recently persisted time-series events. BigQuery, a serverless, petabyte-scale data warehouse, enables fast SQL analysis of longer-horizon historical events. The Pub/Sub-to-WebSocket bridge that feeds data to the UI runs on Google Kubernetes Engine, with one cluster per Smart Stream topic (each topic corresponds to a single product, such as wheat futures).
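
Bigtable stores rows sorted by row key, so serving the most recent events quickly is largely a row-key design question. One common pattern, sketched here under assumptions (the key layout and symbol are hypothetical, not the demo's actual schema), prefixes the key with the symbol and appends a reversed, zero-padded timestamp.

```python
# Upper bound for microsecond epoch timestamps (16 digits, good until about the year 2286).
MAX_MICROS = 10**16 - 1


def row_key(symbol: str, ts_micros: int) -> bytes:
    """Hypothetical Bigtable row key: '<symbol>#<reversed timestamp>'.

    Subtracting from a fixed maximum and zero-padding makes byte-wise key order
    run newest-first within each symbol, so a prefix scan on '<symbol>#'
    returns the most recent events first.
    """
    if not 0 <= ts_micros <= MAX_MICROS:
        raise ValueError("timestamp out of range")
    return f"{symbol}#{MAX_MICROS - ts_micros:016d}".encode("utf-8")


older = row_key("ESZ3", 1_700_000_000_000_000)
newer = row_key("ESZ3", 1_700_000_000_500_000)
print(sorted([older, newer]))
```

Reading "the last N events" then becomes a bounded prefix scan rather than a full-range scan followed by a sort.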

BigQuery currently holds about 100 GB of data: four months of top-of-book tick data for the three instruments. CME Smart Stream enables à la carte selection of symbol feeds. Instead of sifting through data you don't need to reach the symbols you do, developers subscribe to a single Pub/Sub topic per product and receive pricing data for all of that product's delivery months.
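
Because one product topic carries every delivery month, a consumer typically splits the stream by contract. The sketch below assumes each quote carries a CME-style symbol made of a product code, a month code, and a year digit (for example `ESZ3`); the `(symbol, price)` quote shape is hypothetical, while the month-code letters are the standard futures convention.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Standard futures month codes (January through December).
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}
# Product code, then one month-code letter, then a one- or two-digit year.
SYMBOL_RE = re.compile(r"^([A-Z0-9]+?)([FGHJKMNQUVXZ])(\d{1,2})$")


def parse_symbol(symbol: str) -> Tuple[str, int, int]:
    """Split a symbol such as 'ESZ3' into (product, month, year digits)."""
    match = SYMBOL_RE.match(symbol)
    if match is None:
        raise ValueError(f"unrecognized futures symbol: {symbol!r}")
    product, month_code, year = match.groups()
    return product, MONTH_CODES[month_code], int(year)


def group_by_month(quotes: List[Tuple[str, float]]) -> Dict[Tuple[int, int], List[float]]:
    """Group (symbol, price) quotes from one product topic by (year digits, month)."""
    by_month: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for symbol, price in quotes:
        _, month, year = parse_symbol(symbol)
        by_month[(year, month)].append(price)
    return dict(by_month)


by_month = group_by_month([("ESZ3", 4500.25), ("ESH4", 4550.0), ("ESZ3", 4500.5)])
print(by_month)
```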

Real-time market data architecture
Dataflow

The following directed acyclic graph (DAG), as shown in Dataflow, represents the templated pipeline that ingests CME Smart Stream price changes into Bigtable in real time.

DAG representation
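
The pipeline's stages can be mirrored with plain Python generators to show the flow: decode the Pub/Sub payload, parse it, drop incomplete records, and shape each quote into a row key plus cells for Bigtable. The JSON field names and the simple `symbol#ts` row key are assumptions; in Apache Beam each function would become a `beam.Map`, `beam.FlatMap`, or `beam.Filter` step followed by a Bigtable write.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple

REQUIRED_FIELDS = ("symbol", "ts", "bid", "ask")  # hypothetical message schema


def decode(messages: Iterable[bytes]) -> Iterator[str]:
    for raw in messages:
        yield raw.decode("utf-8", errors="replace")


def parse(payloads: Iterable[str]) -> Iterator[Any]:
    for payload in payloads:
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # a production pipeline would route these to a dead-letter output


def complete(records: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    for record in records:
        if isinstance(record, dict) and all(f in record for f in REQUIRED_FIELDS):
            yield record


def to_row(quotes: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Dict[str, float]]]:
    for q in quotes:
        yield f"{q['symbol']}#{q['ts']}", {"bid": q["bid"], "ask": q["ask"]}


def run(messages: Iterable[bytes]) -> List[Tuple[str, Dict[str, float]]]:
    return list(to_row(complete(parse(decode(messages)))))


rows = run([
    b'{"symbol": "ESZ3", "ts": 1, "bid": 4500.0, "ask": 4500.25}',
    b"not json",
    b'{"symbol": "ESZ3", "ts": 2}',
])
print(rows)
```
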
Cloud Functions

Cloud Functions provide the RESTful endpoints that perform tasks such as settling trades or querying BigQuery for data to visualize.

Cloud Functions RESTful endpoints

Using Cloud Functions allowed the team to structure the application as a series of loosely coupled, isolated units of code that are easy to test, debug, and run.
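
As an illustration of one such isolated unit, here is a framework-agnostic sketch of a trade-settlement handler: it takes a JSON request body and returns a status code and a JSON response. The request fields and the realized-P&L rule are assumptions, not the demo's actual endpoint contract; a deployed function would read the body from the HTTP framework's request object.

```python
import json
from typing import Tuple


def settle_trade(request_body: str) -> Tuple[int, str]:
    """Hypothetical handler: close a position at a settlement price.

    Returns (HTTP status code, JSON response body).
    """
    try:
        req = json.loads(request_body)
        quantity = int(req["quantity"])            # signed: negative for short
        entry = float(req["entry_price"])
        settlement = float(req["settlement_price"])
        multiplier = float(req.get("multiplier", 1.0))
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400, json.dumps({"error": "invalid request"})
    realized = (settlement - entry) * quantity * multiplier
    return 200, json.dumps({"realized_pnl": realized})


# Short one NQ contract (multiplier 20) at 15500.0, settled at 15490.0.
status, body = settle_trade(
    '{"quantity": -1, "entry_price": 15500.0, "settlement_price": 15490.0, "multiplier": 20}'
)
print(status, body)
```

Keeping the calculation free of framework types is what makes a unit like this easy to test in isolation.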

Firestore served as the transactional database for UI views that show only current, non-historical state, such as Open Positions. While Firestore stores only the open positions, BigQuery persists the entire position history for rapid analysis and exposes query result sets to the web application via Cloud Functions.
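
The split between a current-state store and an append-only history can be sketched in memory. `PositionStore` and the trader names are hypothetical; the dictionary stands in for the open-positions role described for Firestore, and the list stands in for the history kept in BigQuery.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PositionStore:
    """Hypothetical split: current state (Firestore's role) vs. full history (BigQuery's role)."""
    open_positions: Dict[str, int] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)

    def record_trade(self, trader: str, symbol: str, quantity: int, price: float) -> None:
        # Every trade is appended to the history, which is never overwritten.
        self.history.append({"trader": trader, "symbol": symbol,
                             "quantity": quantity, "price": price})
        key = f"{trader}/{symbol}"
        net = self.open_positions.get(key, 0) + quantity
        if net == 0:
            self.open_positions.pop(key, None)  # flat again: drop from the open-positions view
        else:
            self.open_positions[key] = net


store = PositionStore()
store.record_trade("random-bot", "ESZ3", 2, 4500.0)
store.record_trade("random-bot", "ESZ3", -2, 4501.0)
store.record_trade("model-bot", "NQH4", 1, 15500.0)
print(store.open_positions, len(store.history))
```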

For more information about the tools we used to build the Risk Manager section of this demo, see:


Automation shreds technical risk

To manage complexity sprawl, infrastructure engineers develop repeatable, reliable processes for routine management tasks, but proprietary, legacy, or heavily customized operational components can interfere with this goal.

Google Cloud's approach to infrastructure management embraces the infrastructure-as-code paradigm, which provides a solid foundation for maximizing automation and minimizing toil. You can instantiate infrastructure components predictably and declaratively, with code that is reviewed, approved, and tested before deployment.

Architecture

When all the lights are green, infrastructure engineers risk becoming a team’s least appreciated members. Without an unwelcome incident, it's easy to forget the toil involved in maintaining operational availability and performance.

Wherever possible, we used serverless components such as Cloud Functions, BigQuery, and Dataflow, so that this demo scales with end-user demand and market activity. We also used fully managed services, such as Google Kubernetes Engine, to host our WebSocket deployment. To automate production builds and provision cloud resources repeatably, we wrote Terraform scripts that run in Cloud Build.
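
Because Terraform also accepts configuration written as JSON (`.tf.json` files), declarative definitions can be generated programmatically. The sketch below is an assumption rather than the demo's actual Terraform: it emits one `google_pubsub_subscription` resource per product, and the resource names, subscription names, and topic paths are placeholders.

```python
import json
from typing import Dict


def subscriptions_tf_json(topics: Dict[str, str], ack_deadline_s: int = 20) -> str:
    """Render Terraform JSON syntax declaring one Pub/Sub subscription per product.

    `topics` maps a product code to the full path of its Smart Stream topic.
    """
    resources = {
        f"{product.lower()}_quotes": {
            "name": f"{product.lower()}-quotes-sub",
            "topic": topic_path,
            "ack_deadline_seconds": ack_deadline_s,
        }
        for product, topic_path in sorted(topics.items())
    }
    return json.dumps({"resource": {"google_pubsub_subscription": resources}}, indent=2)


# Placeholder topic paths; substitute the real Smart Stream topics you are entitled to.
tf_json = subscriptions_tf_json({
    "ES": "projects/EXAMPLE-PROJECT/topics/EXAMPLE-ES-TOPIC",
    "NQ": "projects/EXAMPLE-PROJECT/topics/EXAMPLE-NQ-TOPIC",
})
print(tf_json)
```

Writing the output to a file such as `subscriptions.tf.json` lets a Cloud Build step run `terraform plan` and `terraform apply` against it, so the subscriptions are reviewed as code like any other change.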

Deploying with these safeguards helps, but systems are rarely immune to defects or inefficiencies introduced by developers. The infrastructure engineer's goal is to automate and monitor the platform so that such problems become less likely and, when they do occur, less severe. Infrastructure-as-code (IaC) component definitions and CI/CD pipelines are practices that support this approach.

Infrastructure architecture

From a monitoring perspective, we used Cloud Monitoring to verify that the bots behave as intended. Prices for the Random and Model bots are checked at a fixed interval, and prices for the Momentum bot are checked more frequently, as the monitoring graphs show. In the three graphs whose line sits at a value of 1, each bot has opened a position within its last expected interval; if a bot had not, its line would drop to 0 and trigger an alert.
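
The 1-or-0 line described above is a simple indicator metric. Here is a minimal sketch of how it could be computed per bot before being written as a custom metric; the expected intervals are hypothetical values chosen only so that the Momentum bot is checked more often, and the bot keys are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical expected intervals between position openings, in seconds.
EXPECTED_INTERVAL_S: Dict[str, float] = {"random": 300.0, "model": 300.0, "momentum": 60.0}


def opened_position_indicator(last_open_ts: Optional[float], now_ts: float, interval_s: float) -> int:
    """1 if a position was opened within the last expected interval, else 0."""
    if last_open_ts is None:
        return 0
    return 1 if now_ts - last_open_ts <= interval_s else 0


def bots_to_alert(last_opens: Dict[str, Optional[float]], now_ts: float) -> List[str]:
    """Bots whose indicator is 0, i.e. the ones an alerting policy would fire on."""
    return sorted(
        bot for bot, interval in EXPECTED_INTERVAL_S.items()
        if opened_position_indicator(last_opens.get(bot), now_ts, interval) == 0
    )


alerts = bots_to_alert({"random": 900.0, "model": 500.0, "momentum": 930.0}, now_ts=1000.0)
print(alerts)
```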

Infrastructure architecture

We also monitor the availability of the endpoints and the Istio ingress gateway, using uptime checks on the following endpoints. An alert fires if any of these endpoints fails its uptime check.

Infrastructure architecture
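
The pass/fail rule behind such an alert can be sketched as follows. This is an illustration under assumptions, not how Cloud Monitoring implements uptime checks: the endpoint URLs are hypothetical, any non-2xx status or probe error counts as a failure, and the probe is injected so the logic can run without a network.

```python
from typing import Callable, List


def failing_endpoints(endpoints: List[str], probe: Callable[[str], int]) -> List[str]:
    """Return the endpoints whose probe raised or returned a non-2xx HTTP status."""
    failed: List[str] = []
    for url in endpoints:
        try:
            status = probe(url)
        except Exception:  # timeouts, DNS errors, refused connections, ...
            failed.append(url)
            continue
        if not 200 <= status < 300:
            failed.append(url)
    return failed


# Hypothetical endpoints and canned responses standing in for real HTTP probes.
canned = {"https://example.com/api/positions": 200, "https://example.com/ws/health": 503}


def fake_probe(url: str) -> int:
    if url not in canned:
        raise ConnectionError(url)
    return canned[url]


failed = failing_endpoints(
    ["https://example.com/api/positions", "https://example.com/ws/health", "https://example.com/gateway"],
    fake_probe,
)
if failed:
    print("ALERT: failing endpoints:", failed)
```

For live checks, `probe` could wrap `urllib.request.urlopen(url, timeout=5)` and return the response's `status`; `urlopen` raises `HTTPError` for 4xx and 5xx responses, which this function already counts as failures.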

For more information about the tools we used to build this part of the demo, see:


Real-time predictive analytics is real-time risk management

Data transformation and protracted training durations present a challenge for data scientists. Developing, socializing and refining timely and explainable insights is key to the success of the data science organization.

When real-time data is distributed and consumed in Google Cloud, it integrates directly with Google Cloud's analytics and modeling tools, which help you train transparent, explainable models and reduce your time to insight.


[Interactive dashboard with Model performance and Model explainability tabs for E-mini S&P 500 (ES), E-mini Nasdaq 100 (NQ), and Bitcoin (BTC)]
Prediction pipeline

The demo's goal in training predictive models was not to time the market, but to illustrate how easily predictive models can be built, deployed, and served on Google Cloud. To create the trading algorithms, we imported the price data ingested into BigQuery into AutoML Tables, which performed its own feature selection. The trained model was deployed to production and exposed via a RESTful API that returns price predictions over a one-minute horizon.
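
AutoML Tables trains on a flat table of examples, so the ingested ticks first have to be shaped into rows with a target column. Here is a minimal sketch of one way to build such rows, with the last few prices as features and the maximum price over the next 60 seconds as the label; the feature choice and lookback are assumptions, since AutoML performed its own feature selection in the demo.

```python
from typing import List, Tuple


def make_examples(
    ticks: List[Tuple[int, float]], lookback: int = 3, horizon_s: int = 60
) -> List[Tuple[List[float], float]]:
    """Build (features, label) rows from time-ordered (epoch_seconds, price) ticks.

    features: the `lookback` most recent prices up to and including tick i.
    label: the maximum price observed in the `horizon_s` seconds after tick i.
    """
    rows: List[Tuple[List[float], float]] = []
    for i in range(lookback - 1, len(ticks)):
        t_i = ticks[i][0]
        future = [price for t, price in ticks[i + 1:] if t - t_i <= horizon_s]
        if not future:
            continue  # nothing observed inside the horizon, so no label for this row
        features = [price for _, price in ticks[i - lookback + 1 : i + 1]]
        rows.append((features, max(future)))
    return rows


examples = make_examples(
    [(0, 10.0), (20, 11.0), (40, 12.0), (70, 9.0), (130, 13.0)], lookback=2
)
for features, label in examples:
    print(features, "->", label)
```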

A Dataflow job retrieves the real-time price data from Bigtable, which offers low-latency retrieval of time-series data. The model predicts the maximum price each instrument will reach in the minute following the API call. The trading bot then decides, based on the prediction and the market's prevailing price, whether to go long, go short, or not trade at all.
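
The bot's final step compares the prediction with the prevailing price. The exact rule is not described here, so the sketch below assumes a simple threshold; the threshold value and the `decide` function are hypothetical.

```python
def decide(predicted_max: float, current_price: float, threshold: float) -> str:
    """Hypothetical trading rule based on a predicted one-minute maximum price.

    long:  the predicted maximum exceeds the current price by more than `threshold`.
    short: even the predicted maximum sits more than `threshold` below the current price.
    flat:  otherwise; the predicted edge is too small to trade.
    """
    edge = predicted_max - current_price
    if edge > threshold:
        return "long"
    if edge < -threshold:
        return "short"
    return "flat"


for predicted in (4502.0, 4500.5, 4498.0):
    print(predicted, decide(predicted, current_price=4500.0, threshold=1.0))
```

Because the model predicts a maximum rather than a point price, a prediction below the prevailing price is a comparatively strong bearish signal, which is why this sketch treats it as a short.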

Data scientist architecture

For more information about the tools we used to build the Data Scientist section of this demo, see: