Building Freetrade

Data Infrastructure at Freetrade

Benen Cahill

January 12, 2021

Principal Data Engineer Benen Cahill shares our process for data here at Freetrade.

Introduction


This post is intended as an introduction to how our Data function operates here at Freetrade.

Today, we’ll start with a whistle-stop tour of our infrastructure and attempt to shed some light on what goes into telling the story of Freetrade for our management, investors and customers alike. 


At the heart of everything we do is our Data Warehouse. We use it to track our growth, measure our KPIs, produce insights for our product and marketing teams and to assist the rest of the engineering team in their development activities.

And as we now have over 300,000 customers, that’s a lot of data.

In the following sections you’ll see how we leverage serverless infrastructure and managed solutions like BigQuery, Cloud Functions, Terraform and Looker to develop a Data Warehouse that can scale to match the phenomenal growth of Freetrade. 


BigQuery


The core of our Data Warehouse is BigQuery, a managed Data Warehouse solution provided by Google. Like the rest of our tech stack, it’s a serverless platform. It allows you to run SQL queries, either through the BigQuery UI or through one of its many supported APIs.


We’re not going into much detail on BigQuery other than to say it’s awesome.

You can churn through terabytes of data in minutes. Queries that in previous professional lives would have required spinning up a dedicated Hadoop cluster or cloning a relational database instance can be answered a few minutes after drafting a query in your web browser. Since its launch it has been a real game changer for Data Warehousing teams across the world.


As such, BigQuery is our primary repository for data. Almost all the data that we produce as a company gets ingested into it and currently, in Data, all of our data products are driven off the back of queries to the BigQuery API. That may change in the future, but so far BigQuery has addressed most of the requirements we have had as a rapidly growing company. 


The difficulty with any Data Warehouse solution, of course, lies not in the Data Warehouse itself but in all the ancillary work that goes into ingesting data and then surfacing the fruits of your data modelling and analysis to stakeholders.

Ingestion


Our backend infrastructure, built largely on top of Firebase and Google Cloud Functions, has two primary data stores in use, Cloud Firestore and Firebase Realtime Database.

Both of these data stores are document-oriented databases and so are not well suited to large-scale aggregation and analytics. So the first step in being able to measure our success here at Freetrade is ingesting the data produced by our applications into a datastore better suited to analysis.

Cloud Functions


You might have seen the previous posts discussing our serverless infrastructure for our Apps Backend and our Invest Platform. For the Data Warehouse, we leverage the same approach when it comes to the ingestion of data.  

To do this, we take advantage of the Cloud Firestore Triggers and Firebase Realtime Database Triggers APIs. For every document type in our databases we create an instance of a Cloud Function whose purpose is to replicate the current state of that document into our data warehouse whenever it changes. This replication is performed using the BigQuery REST API.

And that’s it. There’s been a new document created? Write it to BigQuery. There’s been a change to the document? Write it again. It’s a simple approach that gives us flexibility. Depending on our analysis needs, we can either refer to the latest version of the document or rebuild its entire history. The only downside is the additional cost of storage.
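As a minimal sketch of this append-only pattern (in Python for brevity, although the functions described here are not written in Python), every change becomes a new row, and both the latest state and the full history can be recovered afterwards. The field names `document_id`, `changed_at` and `data`, and the choice to store the payload as a JSON string, are assumptions made for illustration rather than the actual schema.

```python
import json
from datetime import datetime, timezone


def to_row(document_id, data, changed_at=None):
    """Turn one document change into an append-only warehouse row."""
    changed_at = changed_at or datetime.now(timezone.utc)
    return {
        "document_id": document_id,
        # Fixed-width UTC timestamps sort correctly as plain strings.
        "changed_at": changed_at.isoformat(timespec="microseconds"),
        # Assumption: the payload is kept as a JSON string.
        "data": json.dumps(data, sort_keys=True),
    }


def latest_versions(rows):
    """Keep only the most recent row for each document."""
    latest = {}
    for row in rows:
        current = latest.get(row["document_id"])
        if current is None or row["changed_at"] > current["changed_at"]:
            latest[row["document_id"]] = row
    return latest


def history(rows, document_id):
    """Return every recorded version of one document, oldest first."""
    matching = [r for r in rows if r["document_id"] == document_id]
    return sorted(matching, key=lambda r: r["changed_at"])
```

In a warehouse the same two views would typically be expressed as SQL over the replicated table; the Python above only illustrates the logic.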

These functions get deployed and maintained in the same manner as the rest of our cloud functions, making them easy for anyone across the engineering function to maintain or enhance.

On occasion, we’ve also had to ingest data from a third party via a REST API or similar.

When this happens, we largely take the same approach of writing a dedicated Cloud Function or two. There are two differences:
 

  • Rather than writing that function in TypeScript, we’ll author it in Python to take advantage of the DataFrames BigQuery API for streaming data into BigQuery.
  • We’ll generally trigger those functions on a schedule, so we implement them as HTTP functions and invoke them with Cloud Scheduler.
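A scheduled ingestion of this kind might look roughly like the sketch below. It is an illustration, not production code: `fetch_records`, `insert_rows`, the record fields and the table name are all hypothetical, and both callables are injected so the example runs without network access. The `insert_rows` signature mirrors `Client.insert_rows_json` from the `google-cloud-bigquery` package, which returns a list of per-row errors that is empty on success. Plain dictionaries are used here instead of DataFrames to keep the sketch dependency-free.

```python
from datetime import datetime, timezone


def flatten(record, ingested_at):
    """Map one third-party record onto a flat warehouse row."""
    return {
        "id": str(record["id"]),
        "status": record.get("status"),
        "ingested_at": ingested_at,
    }


def run_ingestion(fetch_records, insert_rows, table="example_dataset.third_party"):
    """Fetch records, flatten them and write them in a single batch.

    Returns a (body, status_code) pair, the shape an HTTP handler could return.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    rows = [flatten(record, ingested_at) for record in fetch_records()]
    if not rows:
        return "no rows to ingest", 200
    errors = insert_rows(table, rows)  # expected to return a list of row errors
    if errors:
        # A non-2xx response lets the scheduler record the run as failed.
        return f"{len(errors)} rows failed", 500
    return f"ingested {len(rows)} rows", 200
```

In a deployed function, the HTTP entry point would call `run_ingestion` with a real API client and a BigQuery client, and Cloud Scheduler would hit that endpoint on a cron schedule.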

Natively Supported BigQuery Services


Of course, we rely on sources of data other than our backend databases, and for that we make heavy use of BigQuery’s Data Transfer Service to ingest data from Cloud Storage or Amazon’s S3. This makes the ingestion of flat data like CSVs or Avro files simply a matter of configuration.


We also make use of Scheduled Queries. These allow us to ingest data from external databases (like Postgres) using BigQuery’s Federated Queries functionality, or from other Google data sources like Sheets, which we ingest to facilitate KPI reporting on our business processes.


The advantage of using BigQuery’s native ingestion is that it can be very quickly and easily configured through the BigQuery UI. However, this obviously isn’t a scalable approach, which is why most of our ingestion activities are configured through Terraform, which we’ll discuss in more detail later in the post. 

Modelling

Here at Freetrade we follow the ELT pattern for integrating data in our Warehouse. We apply most of our data transformations and enrichment directly in BigQuery. Another post will detail our approach to how we model and transform our data; for now let’s focus on the infrastructure and tooling, which means DBT.

DBT

DBT (Data Build Tool) was originally a tool to manage the transformation of data in the Data Warehouse. What it essentially provides is a way to write code (using a mix of Jinja and SQL) against your data models and then a mechanism to compile and execute that code against your database.


We’re great fans of DBT here at Freetrade. Most of our data models are specified in DBT; in particular, we make heavy use of its support for BigQuery partitioning and incremental models. We leverage Jinja to paginate queries when using federated queries so we don’t overwhelm the external data source. We leverage DBT Cloud to schedule and execute our transformation jobs.

And we leverage DBT Cloud again for our alerting through its Slack integration. 
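To illustrate the pagination of federated queries mentioned above, here is a minimal sketch. In DBT this would be a Jinja loop; Python stands in for it here. `EXTERNAL_QUERY` is BigQuery’s federated query function, while the connection ID, table name, ordering column and the LIMIT/OFFSET strategy are assumptions for illustration. The inputs are assumed to be trusted identifiers, since they are interpolated directly into SQL.

```python
def paginated_external_queries(connection_id, table, order_by, total_rows, page_size):
    """Yield one BigQuery federated query per page of the external table."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        # A stable ORDER BY keeps the pages from overlapping.
        inner = (
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
        yield f"SELECT * FROM EXTERNAL_QUERY('{connection_id}', '''{inner}''')"
```

Each generated statement asks the external database for a bounded slice, so no single query has to scan and return the whole table at once.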

And when we're not on Slack, we're checking out the data in person

Visualisation

Often the quickest and easiest way to surface the answer to specific questions of our Data Warehouse is through BigQuery’s integration with Google Sheets, and in general it’s something we make extensive use of here at Freetrade, especially when it comes to supporting our engineers and operations teams to resolve specific issues. 


But in general, when it comes to surfacing data to our stakeholders, we take one of two approaches: Data Studio or Looker.

Data Studio

Data Studio is quick and easy to get up and running with. Specify a table in BigQuery as a data source, then quickly start putting together a report with filters and charts, all through its WYSIWYG editor.

We tend to leverage Data Studio to put together our basic operational dashboards: searching for particular clients, finding a certain invoice, that type of thing. Generally speaking, when a new feature launches at Freetrade, we can have a new dashboard up and running in as little as a few minutes, enabling our operations team to support it.

Data Studio is not without its limitations, however. Its ease of use can often be its own undoing: when the underlying models change, it can be hard to understand the impact on our dashboards, and large aggregations spanning multiple datasets can be time-consuming to implement. That’s when a more programmatic approach becomes beneficial, and for that we use Looker.

Looker

Looker is used for much higher-level data aggregations. What’s the average value of our customers’ portfolios? How many signed up to Plus in the past month? Looker is fantastic at answering those kinds of questions, and indeed most of our Business Intelligence activities are facilitated by Looker.


That’s largely implemented through the power of LookML; we make extensive use of LookML both to abstract our data models in BigQuery and to author our business-critical dashboards in code. Its validation tool is extremely useful in helping us to understand the impact of changes to our data models on our stakeholders in the context of how they consume the data.

Tying it all Together

Terraform 

Terraform is used to manage all our infrastructure in Data currently. It’s used to deploy and schedule the invocation of our cloud functions, specify our scheduled queries, manage our secrets, define our data transfer jobs from Cloud Storage and S3, and even to define certain schemas that are used for ingestion into the data warehouse. 


Adopting Terraform has been a game changer for us at Freetrade. In the early days of the Warehouse, our focus on speed and agility meant that we would configure many ingestions and transformations directly through the BigQuery UI. But over time, those manual configurations became difficult to maintain and search for.


Now, all of our infrastructure is specified in one place and rolling out a new environment for testing purposes is a matter of executing a handful of commands at the CLI. In fact, once you become familiar with Terraform, and have codified your common patterns in modules, configuring a new ingestion or transformation job becomes much quicker than fiddling with all those dropdowns and text inputs in the UI. You’ll never look back.  

Final Thoughts

This has been a whistle-stop tour of Data’s infrastructure and tooling at Freetrade as it stands today. We can’t say it’s been an entirely smooth journey to arrive at this point; there were some hard lessons learned along the way, but the flexibility and scalability afforded by our technology choices have made the journey much easier than it might otherwise have been.


Future posts will go into more detail on the approaches we take to modelling that data and how we organise the data function at Freetrade. But hopefully this has given you an indication of the exciting work we do here in the Data function at Freetrade. 


P.S. did we mention we're hiring?

