The evolution of the Codat tech stack

Engineering Wed 27 Oct 2021

Under the hood of Codat with Jason Dryhurst-Smith, Head of Engineering.

At Codat, our philosophy of keeping things simple and relying on ‘boring technology’ has meant we haven’t had to make many revisions to our tech stack since setting up shop. That’s not to say we haven’t made some changes along the way to make the system more sophisticated. We’ve outlined some of our key learnings below:

2016 – The Start 🎬

Codat was born on Azure, using the Web Apps and WebJobs services along with SQL and Azure Storage. The bulk of the machinery sat in a monolithic service called The Data API.

This service had a lot to do: it governed the mechanics of the system and housed the query engine for the standardized data. Each third party that we integrated with got its own set of services for authorization, fetching from the third-party API, and mapping that data to the Codat model.

The pattern set on day one for integrations is still the pattern we use now. It means that any team adding features to an integration can work on it behind a standardized interface and scale that system independently, in line with usage.
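As a rough illustration of what a standardized interface for integrations can look like, here is a minimal Python sketch. Every name in it (`Integration`, `StandardInvoice`, `ExampleLedgerIntegration`, `sync`) is hypothetical and invented for this example; it is not Codat’s SDK, which is written in .NET.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class StandardInvoice:
    """Hypothetical stand-in for a record in the standardized data model."""
    id: str
    total: float
    currency: str


class Integration(ABC):
    """Hypothetical contract that every third-party integration implements."""

    @abstractmethod
    def authorize(self, credentials: dict) -> str:
        """Exchange credentials for a token the third party accepts."""

    @abstractmethod
    def fetch(self, token: str) -> list[dict]:
        """Pull raw records from the third-party API."""

    @abstractmethod
    def map(self, raw: dict) -> StandardInvoice:
        """Translate one raw record into the standardized model."""


class ExampleLedgerIntegration(Integration):
    """Made-up integration; a real one would call the third party over HTTP."""

    def authorize(self, credentials: dict) -> str:
        return f"token-for-{credentials['user']}"

    def fetch(self, token: str) -> list[dict]:
        # Canned data in a made-up third-party shape.
        return [{"InvoiceNo": "INV-1", "Amount": "120.50", "Ccy": "GBP"}]

    def map(self, raw: dict) -> StandardInvoice:
        return StandardInvoice(
            id=raw["InvoiceNo"],
            total=float(raw["Amount"]),
            currency=raw["Ccy"],
        )


def sync(integration: Integration, credentials: dict) -> list[StandardInvoice]:
    """The caller only sees the shared interface, never the third party."""
    token = integration.authorize(credentials)
    return [integration.map(raw) for raw in integration.fetch(token)]
```

Because `sync` depends only on the abstract contract, a team could add or change an integration without touching the calling code, which is the property the paragraph above describes.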

2017 – Mitosis ➗

As we gained more clients, we started to see various bottlenecks in the performance of the system and in the team’s ability to work on one large codebase.

This meant splitting The Data API into a few pieces that we could work on, and scale the infrastructure for, independently. The biggest split was between the parts of the service that dealt with data Codat owned and needed to operate the system (we call this metadata) and the parts of the service that dealt with the standardized data (we call this contributed data). This led to the introduction of the clients API, along with a front-end API that handled user management and authorization. We now had a distributed monolith.

At this point, the SQL database that housed all the contributed data also gained a set of serverless functions that rebuilt indexes nightly. If you read ahead, you will see that this is foreshadowing.

2018 – Push and Refactor 📈

The team started to grow rapidly in 2018, enabling us to push the product out on more fronts.

Over the course of the year, we made a huge number of improvements to the internal libraries and SDKs that our engineers use every day.

We also started to refine our own high-performance .NET RPC over HTTP stack called ServiceClient. The SDK that integrations use to map large volumes of data to the Codat standard was upgraded to reduce its memory footprint. When working mostly with JSON in .NET, there can be a lot of string allocations if you’re not careful.

In the same year, we started building our prototype Sync product. Because it was built from the tools we already had available to us, it involved more whiteboards and head scratching than new tech.

Throughout the early part of that year, it became increasingly obvious that our SQL schema for the contributed data was no longer fit for purpose. We had made all the improvements we could using the patchwork of Entity Framework, SqlBulkCopy, and hand-rolled hashing for getting data into the database. A longer-term solution was required.

2019 – A Reckoning 🕑

After a lot of thought and review, we decided that maintaining the schema for contributed data twice, in both C# classes and SQL tables, was a waste of engineering time. Rebuilding and maintaining all the indexes necessary to get good query performance out of a dynamic query engine was also really time-consuming and expensive. So, we chose Cosmos DB as the new persistence technology for the contributed data stack. Our plan was to build the new system while running it in parallel with the existing one. How long could it take, right?

It took the best part of the year, and we encountered a number of hurdles along the way, with the technology never quite behaving as we had expected.

The final straw came after running a set of monitoring tests while paging through large volumes of data. It seemed that the paging engine was loading all the data and sliding a window over it to return the right page! This meant that asking for page 5 of a query fetched pages 1, 2, 3, and 4 before returning the data you asked for. This was slow and, since Cosmos DB pricing is driven by request volume, very expensive. It was obvious, despite the sunk cost, that this was not the technology for us.
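To make the cost of that paging behaviour concrete, here is a small Python model. It is only an illustration of the pattern described above, not a reproduction of Cosmos DB internals. The keyset variant is a comparison technique that the post does not mention, and the function names are invented for this sketch.

```python
from bisect import bisect_right


def offset_page(rows, page, page_size):
    """Sliding-window paging: reads every row up to the end of the requested page."""
    rows_read = 0
    window = []
    for row in rows:
        rows_read += 1
        window.append(row)
        if rows_read == page * page_size:
            break
    start = (page - 1) * page_size
    return window[start:], rows_read


def keyset_page(sorted_keys, after_key, page_size):
    """Keyset paging: seeks past the last key seen, then reads one page only."""
    start = 0 if after_key is None else bisect_right(sorted_keys, after_key)
    page = sorted_keys[start:start + page_size]
    return page, len(page)


rows = list(range(1, 101))  # 100 rows with keys 1..100

page5, cost_offset = offset_page(rows, page=5, page_size=10)
same_page, cost_keyset = keyset_page(rows, after_key=40, page_size=10)

print(page5 == same_page)        # both approaches return rows 41..50
print(cost_offset, cost_keyset)  # rows read: 50 versus 10
```

In this model, the sliding window reads 50 rows to return page 5, while the keyset approach reads only the 10 rows on the page. When every read is billed, that gap grows with the page number.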

What followed was a serious review of the bottlenecks in our data processing pipeline and its schema. This resulted in a few changes, such as turning updates to the data cache into stored procedures that use MERGE statements, and storing the row hashes rather than calculating them in .NET.
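The idea of merging on a stored row hash can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the stored procedures described above: the `store` dictionary stands in for a SQL table, and `row_hash` and `merge_rows` are hypothetical helper names.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def merge_rows(store: dict, incoming: list) -> dict:
    """Insert new rows, update changed rows, skip rows whose stored hash matches."""
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in incoming:
        new_hash = row_hash(row)
        existing = store.get(row["id"])
        if existing is None:
            counts["inserted"] += 1
        elif existing["hash"] != new_hash:
            counts["updated"] += 1
        else:
            # Stored hash matches, so there is nothing to write.
            counts["unchanged"] += 1
            continue
        # Keep the hash next to the row so later merges never rehash old data.
        store[row["id"]] = {"row": row, "hash": new_hash}
    return counts


store = {}
merge_rows(store, [{"id": 1, "total": 10}, {"id": 2, "total": 20}])
counts = merge_rows(store, [{"total": 10, "id": 1}, {"id": 2, "total": 25}, {"id": 3, "total": 5}])
print(counts)
```

Keeping the hash alongside each row means a merge only has to compare two short strings to decide whether a write is needed, rather than recomputing hashes for data that is already stored.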

What we ended up with was a much more mature application of the technology we were used to operating, and a much better understanding of the pricing. Never has a project highlighted more clearly to Codat the benefits of choosing ‘boring technology’.

2020 – The Conscious Decoupling 🚀

To keep changing the product rapidly, and to grow it faster by adding more engineers, we had to start thinking seriously about every point where two or more domains had to be changed in order to release a new feature. This coupling incurred a cost that was pure waste. While our culture has never been one of efficiency above all else, a lot of changes required stop-the-world activities to release new versions of shared contracts, which is error-prone and incurs downtime for our users.

The biggest area of coupling was the interaction between controlling systems (such as workflow orchestrators) and the integration services. We decided to enforce a maxim that all services should publish events about their capabilities to any consumer, and the consumer would use these to control its interaction with that service. For integrations, this was simply an event that told the world exactly what the integration was capable of. This information is also really useful for developers who are building against our API. Now any team can build an integration using our integrations SDK and release it, and it will tell the rest of the system that it exists and what features it supports.
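A minimal Python sketch of this capability-event pattern follows. The event shape, the in-memory `EventBus`, the `Orchestrator` class, and the feature names are all assumptions made for illustration; the post does not describe the real event contract or the messaging technology behind it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityAnnounced:
    """Hypothetical event: an integration states what it supports."""
    integration: str
    features: frozenset


class EventBus:
    """Tiny in-memory stand-in for a real message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class Orchestrator:
    """Consumer that learns about integrations only from published events."""

    def __init__(self, bus):
        self.capabilities = {}
        bus.subscribe(CapabilityAnnounced, self._on_capability)

    def _on_capability(self, event):
        self.capabilities[event.integration] = event.features

    def can(self, integration, feature):
        return feature in self.capabilities.get(integration, frozenset())


bus = EventBus()
orchestrator = Orchestrator(bus)

# A newly released integration announces itself; nothing else is redeployed.
bus.publish(CapabilityAnnounced("example-ledger", frozenset({"invoices.read", "invoices.write"})))

print(orchestrator.can("example-ledger", "invoices.read"))   # True
print(orchestrator.can("example-ledger", "payments.write"))  # False
```

The orchestrator holds no hard-coded list of integrations. It reacts to whatever is announced, so releasing a new integration does not require a coordinated change to a shared contract.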

Summary 📖

I hope that this has been a useful glimpse into the evolution of the technology stack and engineering culture at Codat. This is by no means a complete picture, not because there is anything we do that we wouldn’t share, but because I have tried to pick representative anecdotes of the many small and large problems and successes that we have encountered along the way. There are more recent moves, such as moving all UI to a micro-frontend architecture, that I haven’t detailed yet either, because we can’t evaluate our decisions yet.

I have also not really gone into any detail about the organization and management of engineering teams and their work, and how that has changed over time. How you manage people and get ideas into specs, then into code, and then into operable systems running over the internet is arguably more important than any database technology, but we’ll save that for another time.

Jason Dryhurst-Smith, Head of Engineering

You can start building with Codat for free today. Sign up here for a free account or visit our docs to find out more about our data model.
