Virtu Financial enables its customers to apply advanced analytics and machine learning on trade and market data by provisioning Amazon SageMaker

This is a guest post by Erin Stanton, who currently runs the Global Client Support organization for Virtu Analytics.

Virtu Financial is a leading provider of financial services and products that uses cutting-edge technology to deliver liquidity to the global markets and innovative, transparent trading solutions to its clients. Virtu uses its global market-making expertise and infrastructure to provide a robust product suite including offerings in execution, liquidity sourcing, analytics, and broker-neutral, multi-dealer platforms in workflow technology. Virtu’s product offerings allow clients to trade on hundreds of venues across over 50 countries and in multiple asset classes, including global equities, ETFs, foreign exchange, futures, fixed income, and myriad other products. In addition, Virtu’s integrated, multi-asset analytics platform provides a range of pre- and post-trade services, data products, and compliance tools that clients rely on to invest, trade, and manage risk across global markets.

In this post, we discuss how Virtu enables its customers to apply advanced analytics and machine learning (ML) on trade and market data by provisioning Amazon SageMaker.

An overview of asset manager workflow

Asset managers run funds that invest in a wide range of securities, from equities to commodities to FX. Let’s start by describing a typical workflow inside such an asset manager. The lifecycle of a trade begins with a portfolio manager deciding to buy or sell a security for the fund they manage. The portfolio manager enters this decision into their order management system (OMS). The order undergoes various risk controls and then passes to a trading desk inside the firm.

The trading desk then has to decide how to execute the order. Factors the desk needs to take into account include time sensitivity, the size of the order, the liquidity of the security, whether the security is listed on a venue or trades over the counter, whether the order can be crossed internally, and which brokers support the security or venue. Typically, this process results in the order being entered into an execution management system (EMS). From the EMS, the order can be sent directly to a venue (direct market access), to a high-touch broker, or to a given broker’s algorithm.

Alternatively, the buy-side trader may submit the order to an Algo Wheel. An Algo Wheel is an automated approach to allocating trades across a client-defined selection of broker algorithms within the customer’s EMS, based on the customer’s pre-set weightings. The selection of brokers and algorithms is driven by performance data. The recent surge in the popularity of Algo Wheels has been prompted by the best execution requirements of MiFID II and by increased scrutiny within many buy-side firms of which brokers are used and why.
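To make the allocation step concrete, here is a minimal Python sketch of a wheel that routes each order to one broker algorithm with probability proportional to the customer’s pre-set weightings. The broker names and weights are hypothetical, and the sketch leaves out the performance-driven updates to those weights that a production wheel would add.

```python
import random
from collections import Counter

# Hypothetical broker algorithms and the customer's pre-set weightings (illustrative values only).
WHEEL_WEIGHTS = {
    "Broker A / VWAP": 0.40,
    "Broker B / IS": 0.35,
    "Broker C / POV": 0.25,
}


def route_order(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one broker algorithm for an order, with probability proportional to its weight."""
    algos = list(weights)
    return rng.choices(algos, weights=[weights[a] for a in algos], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulated allocation is reproducible
    n_orders = 10_000
    allocations = Counter(route_order(WHEEL_WEIGHTS, rng) for _ in range(n_orders))
    for algo, count in allocations.most_common():
        print(f"{algo}: {count / n_orders:.1%}")
```

Over many orders, each algorithm’s share of flow converges to its weight, which is what makes the resulting execution data comparable across brokers.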

In all cases, the aim of the process is for the broker algo to obtain the best price for the customer, subject to the selected strategy or other order constraints. Getting the best price matters to customers, and a rough calculation shows by how much. If an asset manager with $1 trillion under management and 20% annual portfolio turnover saves 5 bps in execution costs, the savings amount to $100 million a year. These savings are then passed on to the asset manager’s own customers.
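The arithmetic behind that estimate can be written as a small Python helper: traded notional is assets under management times annual turnover, and 1 bp is one ten-thousandth of that notional. The function name is illustrative.

```python
def execution_savings(aum_usd: float, annual_turnover: float, savings_bps: float) -> float:
    """Annual savings from lower execution costs.

    traded notional = AUM x annual turnover
    savings         = traded notional x savings_bps / 10,000
    """
    traded_notional = aum_usd * annual_turnover
    return traded_notional * savings_bps / 10_000


if __name__ == "__main__":
    # $1 trillion AUM, 20% turnover -> $200 billion traded; 5 bps of that is $100 million.
    savings = execution_savings(aum_usd=1e12, annual_turnover=0.20, savings_bps=5)
    print(f"${savings:,.0f}")
```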

After the order has been filled at the best possible price, this information needs to be fed back to the asset manager in real time via the EMS. Such real-time information on the execution process is required to manage risk, to price funds, and for other operational reasons. In addition to real-time data, the asset manager needs historical data about the execution process to help decide how to route future orders, to help model the cost of trading, and to meet regulatory requirements.

In summary, a persistent and increased focus on best execution and trading analytics due to regulatory and competitive pressures, as well as a general push toward workflow automation, has resulted in increased usage of Algo Wheels and in the subsequent analysis of that data.

The Virtu Analytics Client Coverage team

At Virtu, the broker-neutral Virtu Analytics Client Coverage team receives execution data directly from its customers’ OMSs and EMSs. The data may include data generated by Virtu’s broker-dealer subsidiaries in its role as the customer’s broker and data generated by the customer’s other brokers.

Getting historical execution data back to the customer is important for Virtu’s Analytics team to be able to quantitatively demonstrate the value of Virtu’s own Algo Wheel offering. The problem statement for Virtu was how best to package the historical data in a way that the customer could derive value from it.

Virtu began by providing its Algo Wheel customers access to online visualizations via its Portal platform, which facilitated the comparative analysis of broker performance. While the interactive front end meets most needs, some clients wanted to apply additional customization to both the analytics framework and the broker reporting.

In response, Virtu developed a shared ML environment where customers can access their execution data integrated with additional market data metrics. The data is fed by APIs and accessed through an interface that supports Jupyter notebooks. In this environment, customers can apply custom metrics and analysis specific to their unique investment and trading objectives, such as performance and reporting metrics. These metrics have evolved over the years from simple reporting of benchmark performance to more complex context around order difficulty and market conditions. The most recent iterations include performance distributions, sample normalization, and outlier controls. Having API access to features generated from Virtu’s Algo Wheel execution data enables users to customize the data and integrate it into other trading platforms and decision-making applications. With screen-share technology, Virtu’s Analytics team can guide customers as they explore their data and learn how to query it.

Solution overview

In response to customer feedback, Virtu’s Analytics Client Coverage team launched an Open Python platform supported by SageMaker. This enables Virtu customers to log on and explore their execution data flexibly, with or without a screen share from Virtu. Customers also wanted to mine their data and extract meaningful insights, so Virtu developed a Python API that is exposed through the environment. Considerations for the implementation included security, scalability, resilience, and usability. In terms of security, Virtu needed to make sure that any solution was designed so that proprietary data was accessible only to the customer, and could even be restricted to particular users within a customer’s internal group.

Let’s briefly review the core architecture of the solution. SageMaker instances are deployed in the private subnet of a VPC that spans three Availability Zones. Egress traffic is routed through a NAT gateway, which enables Virtu to limit API calls to predesignated IP addresses. Because the SageMaker instances are in a private subnet, AWS PrivateLink is used: it connects the VPC directly to the SageMaker API or SageMaker Runtime through an interface endpoint in the VPC, instead of over the internet. The interface endpoint requires no internet gateway, NAT device, VPN connection, or AWS Direct Connect connection, and the instances in the VPC don’t need public IP addresses to communicate with the SageMaker API or Runtime. The following is a schematic of the architectural solution.

Virtu selected SageMaker for four key reasons:

Ease of setup – Virtu wanted to be self-sufficient in terms of setup and deployment. Virtu is able to create a new client environment in less than an hour. Because setup is quick, there is less concern about customer churn.
Cost – Virtu required a low-cost solution with easy control of compute resources.
Shared code base – The ability to provide customers with sample code to get them started.
Customer support – Because many customers were new to programming, there was a need to help them troubleshoot syntax issues or work with Virtu APIs. SageMaker enables the Analytics Client Coverage team to log directly into a customer’s environment, make a few adjustments, and message them when done so that they can continue with their analysis.

In summary, SageMaker has met customers’ requirements for a programming environment and allowed the Virtu Analytics Client Coverage team to expand into customer-based ML solutions.
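The following is a minimal sketch, assuming the AWS SDK for Python (boto3), of how the private-subnet setup described earlier in this section might be expressed: a notebook instance with direct internet access disabled, so its outbound traffic follows the VPC’s routing through the NAT gateway, plus interface endpoints (AWS PrivateLink) for the SageMaker API and Runtime. All IDs, the instance type, and the helper names are hypothetical, and this is not Virtu’s deployment code. The VPC, private subnets, NAT gateway, security group, and IAM role are assumed to exist already.

```python
def private_sagemaker_requests(
    region: str,
    vpc_id: str,
    private_subnet_ids: list[str],
    security_group_id: str,
    role_arn: str,
    notebook_name: str,
) -> dict:
    """Build request parameters for a notebook instance in a private subnet and for
    interface endpoints to the SageMaker API and Runtime (hypothetical helper)."""
    notebook = {
        "NotebookInstanceName": notebook_name,
        "InstanceType": "ml.t3.medium",  # illustrative instance size
        "RoleArn": role_arn,
        "SubnetId": private_subnet_ids[0],  # private subnet; outbound traffic uses the NAT gateway
        "SecurityGroupIds": [security_group_id],
        "DirectInternetAccess": "Disabled",  # no SageMaker-provided internet access
    }
    endpoints = [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.sagemaker.{service}",
            "SubnetIds": private_subnet_ids,  # e.g. one private subnet per Availability Zone
            "SecurityGroupIds": [security_group_id],
            "PrivateDnsEnabled": True,  # SageMaker hostnames resolve to the endpoint's private IPs
        }
        for service in ("api", "runtime")
    ]
    return {"notebook": notebook, "endpoints": endpoints}


def provision(requests: dict, region: str) -> None:
    """Send the requests with boto3 (requires AWS credentials and permissions)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    for params in requests["endpoints"]:
        ec2.create_vpc_endpoint(**params)
    sagemaker = boto3.client("sagemaker", region_name=region)
    sagemaker.create_notebook_instance(**requests["notebook"])
```

Keeping the request-building separate from the boto3 calls makes it easy to review or template a new client environment before anything is created.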

Customer use case example

In this section, we run through a use case with an example customer. As a customer of Virtu, Schroders was one of the first adopters of the new platform.

“As we continue to optimize our trading with rigorous analysis of our trade executions, AWS has provided us with a platform and the tools that enhance our speed and ability to handle and manipulate large amounts of our Virtu trade data,” says Will Lishman, Head of Trading, Americas, at Schroders Investment Management. “Python is proving to be a powerful tool for us to assimilate this data, investigate hypotheses, and ultimately further systematize our trading, creating more efficient trading desks and allowing traders to spend more time focusing on the more nuanced asymmetrical risk trades where they can add greater value.”

In the following screenshot, we show the SageMaker environment provisioned by Virtu to its customers. The first step of analyzing Virtu’s Algo Wheel data involves pulling data from Virtu’s Open Technology API platform. The Virtu Analytics Client Coverage team typically helps set up the first API payload, which includes date range filtering for the trade date and other relevant filters such as region and asset class. Key metrics are then pulled from the API, including the ticker of the stock, the algo strategy that was used, the executing broker of the algo, and resulting performance. In this example, performance is measured using an implementation shortfall (IS) benchmark, which compares the price at the time the broker receives the order vs. the average execution price that they achieve while executing the order in the stock market. Data is returned as JSON and converted to a Pandas DataFrame for easy manipulation.
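A minimal sketch of this step in Python with pandas follows. Because the Open Technology API’s actual schema and endpoint are not described in this post, the payload fields, the JSON record layout, the sample values, and the helper name are all hypothetical. IS is computed in basis points with a side-dependent sign so that, matching the convention used later in this section, a positive value means the average fill beat the arrival price.

```python
import json

import pandas as pd

# Hypothetical request filters (date range, region, asset class); the real API schema is not shown here.
payload = {
    "trade_date_from": "2021-01-01",
    "trade_date_to": "2021-03-31",
    "region": "Americas",
    "asset_class": "Equity",
}

# Hypothetical JSON response containing the key metrics described above.
response_text = json.dumps([
    {"ticker": "AAA", "strategy": "VWAP", "broker": "Broker X", "side": "buy",
     "arrival_price": 50.00, "avg_exec_price": 49.95, "shares": 10_000},
    {"ticker": "BBB", "strategy": "IS", "broker": "Broker Y", "side": "sell",
     "arrival_price": 20.00, "avg_exec_price": 19.98, "shares": 5_000},
])


def to_frame(text: str) -> pd.DataFrame:
    """Parse JSON records into a DataFrame and add implementation shortfall (IS) in bps."""
    df = pd.DataFrame(json.loads(text))
    # +1 for buys, -1 for sells: a buy filled below arrival, or a sell filled above it, is positive.
    sign = df["side"].map({"buy": 1, "sell": -1})
    df["is_bps"] = sign * (df["arrival_price"] - df["avg_exec_price"]) / df["arrival_price"] * 10_000
    return df


if __name__ == "__main__":
    print(to_frame(response_text)[["ticker", "broker", "is_bps"]])
```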

Because providing direct feedback to brokers is important, the client wants to anonymize the broker names and send high-level performance statistics to all brokers. A unique, anonymous, custom name is created for each broker, and all subsequent reporting uses the anonymized names. When the client sends the results to each broker, it highlights that broker’s own anonymized name, so the broker can see where it performs well and where it needs to improve. Brokers are incentivized to keep improving, because flow is routed on the basis of performance.
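One way to build such a mapping is sketched below with pandas, assuming a hypothetical `broker` column. Broker names are shuffled with a fixed seed before numbering, so the labels stay the same between reporting runs (as long as the set of brokers is unchanged) but do not reveal alphabetical order.

```python
import random

import pandas as pd


def anonymize_brokers(
    df: pd.DataFrame, column: str = "broker", seed: int = 0
) -> tuple[pd.DataFrame, dict[str, str]]:
    """Replace broker names with labels like 'Anonymized Broker 1'.

    Returns the anonymized DataFrame and the name-to-label mapping, which the client
    keeps privately so it can tell each broker which label is its own.
    """
    names = sorted(df[column].unique())
    random.Random(seed).shuffle(names)  # fixed seed: same brokers -> same labels on every run
    mapping = {name: f"Anonymized Broker {i}" for i, name in enumerate(names, start=1)}
    anonymized = df.assign(broker_anon=df[column].map(mapping)).drop(columns=[column])
    return anonymized, mapping
```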

Liquidity is an important factor in algo performance, so the next common step is to add a custom tag that groups orders by their size as a percentage of market liquidity, such as a percentage of daily volume. Some brokers are better than others at executing larger orders, so the trading desk can use this information to route orders based on liquidity.
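A sketch of such a tag with pandas, assuming hypothetical `shares` and `day_volume` columns. The bucket edges are illustrative; the 0.5% and 7% cut-offs simply match the thresholds discussed in the next paragraph.

```python
import pandas as pd

# Illustrative bucket edges, in percent of the day's traded volume.
EDGES = [0, 0.5, 1, 3, 7, float("inf")]
LABELS = ["<0.5%", "0.5-1%", "1-3%", "3-7%", "7%+"]


def tag_liquidity(df: pd.DataFrame) -> pd.DataFrame:
    """Add order size as % of daily volume and a categorical liquidity bucket."""
    pct = df["shares"] / df["day_volume"] * 100
    # right=False gives half-open bins [0, 0.5), [0.5, 1), ..., [7, inf)
    bucket = pd.cut(pct, bins=EDGES, labels=LABELS, right=False)
    return df.assign(pct_of_volume=pct, liquidity_bucket=bucket)
```

Because `pd.cut` returns an ordered categorical, later groupings and charts keep the buckets in size order rather than alphabetical order.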

After the custom tag is added, the charting library Altair is used to quickly visualize performance. A positive value against the IS benchmark indicates good performance, whereas a negative value is a loss against the benchmark. The following chart shows that Anonymized Broker 3 is very good at executing orders that are greater than 7% of daily volume, but not as good at very small orders, below 0.5% of daily volume. Anonymized Broker 3 should therefore receive more flow when orders are large and less when they are small.
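A sketch of the aggregation behind such a chart, plus an Altair chart built from it. The column names `broker_anon`, `liquidity_bucket`, and `is_bps` are hypothetical, and Altair is imported inside the chart function so the aggregation can be used without it installed.

```python
import pandas as pd


def bucket_performance(df: pd.DataFrame) -> pd.DataFrame:
    """Mean IS (bps) and order count for each anonymized broker and liquidity bucket."""
    return (
        df.groupby(["broker_anon", "liquidity_bucket"], observed=True)["is_bps"]
        .agg(mean_is_bps="mean", orders="count")
        .reset_index()
    )


def performance_chart(df: pd.DataFrame):
    """Bar chart of mean IS per broker, with one panel per liquidity bucket."""
    import altair as alt  # imported here so bucket_performance() works without Altair

    data = bucket_performance(df)
    bucket_dtype = data["liquidity_bucket"].dtype
    # Keep buckets in their natural order if they are categorical; otherwise sort ascending.
    order = list(bucket_dtype.categories) if isinstance(bucket_dtype, pd.CategoricalDtype) else "ascending"
    return (
        alt.Chart(data)
        .mark_bar()
        .encode(
            x=alt.X("broker_anon:N", title=None),
            y=alt.Y("mean_is_bps:Q", title="IS vs. arrival (bps)"),
            color="broker_anon:N",
            column=alt.Column("liquidity_bucket:N", sort=order, title="Order size (% of daily volume)"),
        )
    )
```

Keeping the order count next to each mean makes it easy to spot buckets where a broker has too few orders for its average to be meaningful.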

The next step in the analysis involves reviewing outlier trades. Virtu Analytics typically uses a weighted average of trading cost performance to compare brokers (with better brokers receiving more trading flow), but weighted averages can hide wider distributions. In the case of Virtu’s Algo Wheel routing, a more certain, tighter distribution for one broker is often preferred over slightly better average performance where high-cost outlier trades may occur.
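To see how an average alone can mislead, the following pandas sketch reports a notional-weighted mean IS alongside the standard deviation and the worst single trade for each broker. Weighting by notional is an assumption (the post does not say which weights Virtu uses), and the column names are hypothetical.

```python
import pandas as pd


def broker_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Notional-weighted mean IS plus dispersion measures for each broker."""
    weighted = (
        df.assign(weighted_is=df["is_bps"] * df["notional"])
        .groupby("broker_anon")[["weighted_is", "notional"]]
        .sum()
    )
    summary = df.groupby("broker_anon")["is_bps"].agg(std_bps="std", worst_bps="min", orders="count")
    # Weighted mean = sum(IS x notional) / sum(notional), aligned on broker.
    summary.insert(0, "weighted_is_bps", weighted["weighted_is"] / weighted["notional"])
    return summary
```

A broker with a slightly higher weighted mean but a much larger standard deviation or a deeply negative worst trade is the kind of profile the paragraph above warns about.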

Although initial Virtu Algo Wheel analysis included all trades, outlier filtering can provide a more uniform comparison across brokers. Multiple approaches are used, for example a z-score approach. Through the SageMaker environment, the buy-side can experiment with the impact of different outlier exclusion methodologies, selecting the one that fits their workflow and data the best.

In the following example, the buy-side client opted to exclude any trade with a z-score beyond ±3. Comparing brokers on the initial IS (bps) metric, Anonymized Broker 2 had the most favorable performance (the most positive number), but once outliers are excluded, Anonymized Broker 3 had the best performance. Anonymized Broker 2 had a few very positive outliers that skewed its overall performance upward. Excluding a small number of outlier trades changes which broker should receive more order flow.
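A minimal z-score filter in pandas, with a helper that ranks brokers by mean IS before and after filtering. Here the z-score is computed over the pooled sample of all trades; computing it within each broker or liquidity bucket is an equally plausible choice, and the post does not specify which the client used. Column names are hypothetical.

```python
import pandas as pd


def exclude_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Drop trades whose IS z-score exceeds `threshold` in absolute value."""
    z = (df["is_bps"] - df["is_bps"].mean()) / df["is_bps"].std()
    return df[z.abs() <= threshold]


def rank_brokers(df: pd.DataFrame) -> pd.Series:
    """Mean IS per broker, best (most positive) first."""
    return df.groupby("broker_anon")["is_bps"].mean().sort_values(ascending=False)
```

Comparing `rank_brokers(df)` with `rank_brokers(exclude_outliers(df))` shows whether the top-ranked broker changes once the tails are removed. With a sample standard deviation, ten or fewer trades can never produce a z-score above 3, so this filter only has an effect on reasonably large samples.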

The combination of the SageMaker environment and Virtu’s Open Technology APIs allows the buy-side to experiment with different bucketing and order exclusions. After a methodology is decided on, results can be quickly updated on a monthly or quarterly basis.

Virtu’s customers are extremely keen to incorporate ML into their working practices. However, they face two key problems. First, many customers lack in-house ML expertise. Second, the Virtu Analytics team doesn’t have insight into a client’s proprietary investment process and as such can’t assist the customer.

SageMaker has solved both of these problems. Using the collaborative coding environment in SageMaker, Virtu assigns an analyst to code an initial model and then, through discussions with the customer, fine-tunes it until an optimal solution is reached. This also gives both Virtu and the customer full transparency into all the data normalization, feature selection, and model tuning. Typical questions include: What data was left out? What are the known biases? Are the assumptions correct? The collaborative SageMaker environment gives customers and Virtu analysts shared access to the same data and models.

In the following screenshot, the customer uses k-means to identify stocks with similar characteristics from a trading perspective. In the past, a trader might have considered a new small-cap ticker on their blotter to be similar to other small-cap securities they had traded. Using the k-means algorithm lets the machine find similarities between stocks, which can then be analyzed together when optimizing trading strategy.
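A minimal sketch of this kind of clustering with scikit-learn follows. The features (log market cap, log average daily dollar volume, volatility, and spread) and the column names are assumptions for illustration; the post does not list the features the customer used. Features are standardized first because k-means relies on Euclidean distance, so unscaled features with large ranges would dominate the clustering.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-stock features describing how a name trades.
FEATURES = ["log_market_cap", "log_adv_usd", "volatility", "spread_bps"]


def add_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Log-transform the size measures, which span several orders of magnitude."""
    return raw.assign(
        log_market_cap=np.log10(raw["market_cap"]),
        log_adv_usd=np.log10(raw["adv_usd"]),
    )


def cluster_stocks(stocks: pd.DataFrame, n_clusters: int = 4, seed: int = 0) -> pd.Series:
    """Label each ticker (the DataFrame index) with a k-means cluster id."""
    scaled = StandardScaler().fit_transform(stocks[FEATURES])  # zero mean, unit variance per feature
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return pd.Series(model.fit_predict(scaled), index=stocks.index, name="cluster")
```

The number of clusters is a modeling choice; a diagnostic such as the silhouette score can help compare candidate values before the clusters are used to group tickers.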

Conclusion

SageMaker has more than met Virtu’s client programming and ML needs in a scalable, cost-efficient manner. Virtu is excited about the continued feature rollout and plans to make more direct use of the natural language processing resources later this year. To learn more about Amazon SageMaker, visit the webpage.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

About the Authors

Erin Stanton has 15+ years of experience in big data and data science, and currently runs the Global Client Support organization for Virtu Analytics. Erin is known for the big energy she brings to everything she does, and more recently has been championing machine learning and AI techniques to answer client questions. Erin received her undergraduate degree in Computer Science and Economics from Lehigh University and more recently completed her master’s in Information and Data Science from UC Berkeley.

Hugh Christensen is an AWS Principal Analytics Specialist working in Global Financial Services. Prior to joining AWS, Hugh worked in both academia and industry: in academia at the Engineering Department of the University of Cambridge, and in industry in various roles in algorithmic trading and data analytics solutions for financial exchanges and trading houses.

Muhammad Mansoor is an AWS Solutions Architect who has designed, built, and deployed scalable and resilient cloud architectures. He advises customers on their AWS adoption and provides strategic recommendations. Muhammad has a deep passion for AI/ML, cloud security, containers, DevOps, DevSecOps, and SRE.