Building Freetrade

Lessons learnt with Cloud Firestore

Sam Poullain

November 20, 2020

Freetrade Head of Engineering, Invest, Tim Drew, shares how we scale our platform using Cloud Firestore

Cloud Firestore is Google’s premier NoSQL document-oriented database. It sports:


“automatic multi-region data replication, strong consistency guarantees, atomic batch operations, and real transaction support. We've designed Cloud Firestore to handle the toughest database workloads from the world's biggest apps” src


At Freetrade we get an enormous amount of value from this robust, autoscaling database. 


Particular points of note:


  • It has enough transactional support to ensure data consistency, whilst also reaping the benefits of an auto-scaling NoSQL store
  • Data changes can trigger cloud functions, allowing us to build event-based flows
  • It has mobile SDKs for Android and iOS, allowing our apps to subscribe directly to changes in the database and respond immediately


For all our love of Firestore, it does come with its challenges, many of which relate to the import/export tool. Below is a collection of lessons learnt on the road to productionising our use of Firestore.

Lesson 1: Make sure your collection IDs are unique, even sub-collections

Firestore paths are made up of an alternating sequence of collection and document IDs:


/<COLLECTION_ID>/<DOCUMENT_ID>/<COLLECTION_ID>…


It is very easy to assume that the lower-level collection IDs are “fully qualified” and isolated from others of the same name, but that’s not how it works.


Take, for example, the following data structures:


/snapshots/<ID>/history

/prices/<ID>/history


Here we have two history collections, with very different purposes and containing very different data. 


At first glance this structure seems reasonable, and the basics work fine: if you query /snapshots/<ID>/history you only get back snapshot documents, and if you query /prices/<ID>/history you only get price documents.


Makes sense, the two are totally separate, right? Wrong!


Under the hood in Firestore, these two history collections are actually one and the same. This can lead to a number of unexpected issues, for example:


  • Index rules and exemptions are set per collection ID, so if you have a specific index-exemption optimisation you want to apply, or a potentially expensive compound index you want to create, you can’t create it for just one usage of the history collection ID. It will be applied to every collection with that ID, which is not necessarily what you want


  • You can’t use the Firestore import/export tool on a fully qualified path, only on specific collection IDs. So it is impossible to back up or restore the /snapshots/<ID>/history or /prices/<ID>/history documents in isolation; you have to mix what might be very different types of data


  • The Firestore import/export tool assumes that the data for a given collection ID is homogeneous. This is an undocumented assumption. So in our example, if the /snapshots/<ID>/history collection has a small number of large documents and the /prices/<ID>/history collection has a very large number of small documents, that means bad times for whatever internal partitioning the import/export tool is using. We have personally experienced import/export times jumping from minutes to several hours, due to storing non-homogeneous data. This is a very easy mistake to make when seemingly separate collections are actually linked


The solution for all of this is simple, albeit ugly and against my preference for DRY. Your collection IDs should avoid generic names and instead should be unique per use case, like this:


/snapshots/<ID>/snapshots_history

/prices/<ID>/prices_history
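
One way to catch this early is to scan the paths your code writes to and flag any collection ID that appears under more than one parent. The following is a minimal sketch in plain Python that only parses path strings and does not talk to Firestore. The function names are hypothetical, and the example IDs (abc, xyz) stand in for the <ID> placeholders above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def collection_ids(path: str) -> List[str]:
    """Return the collection IDs in a Firestore-style path.

    Segments alternate collection, document, collection, ... so the
    collection IDs are the segments at even positions (0, 2, 4, ...).
    """
    segments = [s for s in path.strip("/").split("/") if s]
    return segments[0::2]


def reused_collection_ids(paths: List[str]) -> Dict[str, Set[str]]:
    """Map each collection ID to the parent collection IDs it appears under,
    keeping only the IDs that appear under more than one parent."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        ids = collection_ids(path)
        for depth, cid in enumerate(ids):
            parent = ids[depth - 1] if depth > 0 else "<root>"
            parents[cid].add(parent)
    return {cid: p for cid, p in parents.items() if len(p) > 1}


if __name__ == "__main__":
    generic = ["/snapshots/abc/history", "/prices/xyz/history"]
    unique = ["/snapshots/abc/snapshots_history", "/prices/xyz/prices_history"]
    print(reused_collection_ids(generic))  # flags "history"
    print(reused_collection_ids(unique))   # nothing flagged
```

A check like this could run in a unit test against the list of paths your application is known to use.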

Lesson 2: Delete or archive old Firestore documents

Firestore does not provide any kind of automated backup service. The closest thing they have is the import/export tool, which you have to orchestrate yourself. 


This tool allows you to take a backup of the database relatively easily. However, it is important to realise that it is not a delta: it is a full backup of every document in the database, and you are billed for a read on every one of those documents!


At first glance, Firestore looks pretty affordable (and it is), with reads costing a tiny $0.036 per 100k documents. As your database grows, though, these costs really start to stack up.


Costs per backup (assuming 1 backup per day):


10M doc backup -> $3.60/day -> $109.20/month

100M doc backup -> $36.00/day -> $1,092/month

500M doc backup -> $180.00/day -> $5,460/month
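
The arithmetic behind these figures can be sketched as below. This is a rough estimator, not a billing tool: it uses the read price quoted in this post (check current pricing for your region), and it assumes a month is 52 weeks divided by 12, which is the month length that appears to reproduce the monthly figures above. The function names are hypothetical.

```python
from decimal import Decimal

# Read price quoted in this post; an assumption, not a current price list.
PRICE_PER_100K_READS = Decimal("0.036")


def export_cost_per_day(doc_count: int, exports_per_day: int = 1) -> Decimal:
    """Each full export is billed as one read per document."""
    reads = Decimal(doc_count) * exports_per_day
    return reads / Decimal(100_000) * PRICE_PER_100K_READS


def export_cost_per_month(doc_count: int, exports_per_day: int = 1) -> Decimal:
    """Assumes a month is 52 weeks / 12 (about 30.33 days)."""
    return export_cost_per_day(doc_count, exports_per_day) * 7 * 52 / 12


if __name__ == "__main__":
    for docs in (10_000_000, 100_000_000, 500_000_000):
        daily = export_cost_per_day(docs)
        monthly = export_cost_per_month(docs)
        print(f"{docs:>11,} docs: ${daily:.2f}/day, ${monthly:.2f}/month")
```

Passing a higher exports_per_day shows how quickly multiple daily backups multiply the bill.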


If, like us, your risk tolerance means you need to back up your data multiple times per day and regulations mean you need to retain data for years, then the multiples can quickly make the cost of keeping everything in Firestore unsustainable, especially if your data design is biased towards lots of small documents.


As a result of the above I would strongly recommend you consider what data you need to hold onto and what you can delete outright. For historical data that needs to be retained, consider developing an archival process where you move documents to a cheaper long-term storage medium.


It has taken non-trivial effort, but by keeping our Firestore dataset to a lean record of current data, we have kept the costs very reasonable. In future we may look at rolling our own streaming backup utility, but I really hope that Google comes up with a managed service before we resort to that.


Lesson 3: Don’t use dynamic collection IDs

Some of our initial data structures involved dynamically created collection IDs. On the face of it this seemed a perfectly reasonable design decision, for example partitioning data by date:


/prices/<ID>/2020-01-01/...

/prices/<ID>/2020-01-02/…


In this example 2020-01-01 and 2020-01-02 are distinct collection IDs, with collections of documents that sit underneath them. 


This approach works seamlessly and Firestore allows you to implicitly declare collections just by creating the documents underneath them, something Google themselves highlight.


The devil, however, is in the detail. While this kind of structure might seem perfectly functional when you first start using it, there are problems:


  • Custom indexes and exemptions have to be created per collection ID, and there are quite low limits on how many of those you can create


  • The Firestore import/export tool can’t handle databases with high hundreds or thousands of collection IDs. We saw our export/import times jump from minutes to several hours because we crossed some threshold internal to the workings of the tool


We recommend you avoid any kind of dynamic collection naming and keep the number of unique collection IDs to the tens or low hundreds. In some cases this has meant we’ve needed to introduce superfluous intermediate collections and documents to avoid the dynamic collection IDs. For example, the following structure shifts the dynamic date ID to be on a document ID, rather than a collection ID.


/prices/<ID>/prices_by_date/2020-01-01/price_items/…

/prices/<ID>/prices_by_date/2020-01-02/price_items/…


Ugly but effective. 


You absolutely wouldn’t do this intuitively unless you knew about the issues related to dynamic collection IDs.
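
To make the difference concrete, here is a small sketch in plain Python (no Firestore calls) that builds paths in the restructured layout and counts the distinct collection IDs each layout produces. The function names and the instrument_id parameter are hypothetical; the collection names come from the example above.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def dated_prices_path(instrument_id: str, day: date) -> str:
    """Path to one day's price items, with the date as a document ID."""
    return f"/prices/{instrument_id}/prices_by_date/{day.isoformat()}/price_items"


def distinct_collection_ids(paths: Iterable[str]) -> Set[str]:
    """Collection IDs sit at the even positions of each path."""
    ids: Set[str] = set()
    for path in paths:
        segments = path.strip("/").split("/")
        ids.update(segments[0::2])
    return ids


if __name__ == "__main__":
    days = [date(2020, 1, 1) + timedelta(days=n) for n in range(365)]
    dynamic = [f"/prices/abc/{d.isoformat()}" for d in days]
    fixed = [dated_prices_path("abc", d) for d in days]
    # The dynamic layout adds one collection ID per day; the fixed layout does not.
    print(len(distinct_collection_ids(dynamic)))
    print(len(distinct_collection_ids(fixed)))
```

With a year of data, the dynamic layout yields one collection ID per day plus prices, while the restructured layout stays at three however many days are stored.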

Lesson 4: Manage your own “missing documents”

Firestore paths are made up of an alternating sequence of collection and document IDs:


/<COLLECTION_ID>/<DOCUMENT_ID>/<COLLECTION_ID>…


But it is entirely possible and valid that your path might have “missing” documents at the intermediate levels. For example, our event-sourcing data structure looks something like this:


/clients/<CLIENT_ID>/events/<EVENT_ID>


In this example /clients/<CLIENT_ID> is a document that doesn’t actually exist; it is just an intermediate part of the path used to partition the events by client.


This structure works fine when you’re directly addressing an individual client, but is problematic when you want to enumerate all clients. 


Regular collection queries do not include “missing” documents - the only way the SDK gives you to list them is a dedicated listDocuments method.


A key detail about listDocuments is that it does not give you any hooks for pagination; it simply returns all documents. This clearly won’t scale forever, and indeed we saw this method starting to fail once our collections got into the tens of thousands of documents.


This is more dangerous than it might seem at first. If your document IDs aren’t predictable (UUIDs, for example), then you could very easily be left with data in your database that you can’t find.


As a result of the above, we recommend either designing a data structure that doesn’t involve “missing” documents, or maintaining a separate collection of concrete documents that you can paginate through.
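
The second option relies on cursor-style pagination: fetch a bounded page of concrete documents ordered by ID, then ask for the page after the last ID seen. The sketch below illustrates that loop in plain Python. It is not Firestore code: in_memory_fetcher is a hypothetical stand-in for an ordered, limited query against your own index collection, and all names are illustrative.

```python
from bisect import bisect_right
from typing import Callable, Iterator, List, Optional

# (cursor, page_size) -> next page of IDs in ascending order
FetchPage = Callable[[Optional[str], int], List[str]]


def iter_ids(fetch_page: FetchPage, page_size: int = 500) -> Iterator[str]:
    """Yield every ID by repeatedly requesting the page after the last ID seen."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield from page
        cursor = page[-1]


def in_memory_fetcher(sorted_ids: List[str]) -> FetchPage:
    """Stand-in for a query such as 'order by ID, start after cursor, limit N'."""
    def fetch(after: Optional[str], limit: int) -> List[str]:
        start = 0 if after is None else bisect_right(sorted_ids, after)
        return sorted_ids[start:start + limit]
    return fetch


if __name__ == "__main__":
    client_ids = sorted(f"client-{n:05d}" for n in range(25_000))
    seen = list(iter_ids(in_memory_fetcher(client_ids), page_size=1_000))
    print(len(seen))  # every client ID is visited, 1,000 at a time
```

Because each request is bounded by page_size, the loop keeps working as the collection grows, which is the property the single unpaginated listing call lacks.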

Conclusion

As mentioned at the outset, we at Freetrade get an enormous amount of value from Firestore. It provides so much power out of the box.


As we’ve listed however, there are a bunch of gotchas that are not readily apparent when you first start out with the database. Hopefully you can save yourself some pain by learning from our mistakes.
