Feb 8th 2021

How to build a data integration pipeline without the tech debt

Hassan Syyid

If you’re a B2B developer building a new product, one of the earliest and most fundamental decisions in the product development phase is:

How the heck will I get customer data into the product?


Whether you’re building:

  • accounting software that needs to pull invoices from NetSuite, Intuit QuickBooks, or Sage Intacct
  • sales software that needs to pull CRM data from Salesforce and HubSpot and billing data from Stripe or Chargebee
  • marketing software that needs to pull analytics data from Google Analytics or file uploads
  • or any other type of SaaS software

The trend is the same: data is only getting more spread out, and onboarding customers is really hard if your data integration pipeline is subpar. Nowadays customers want to try your product with a self-serve experience, and nothing says the opposite like a month of “data onboarding.”

I was in the same boat only a year ago: I had just left a startup whose internal data integration pipeline carried a mountain of tech debt. Needless to say, I was determined to find a better solution. After all, with so many teams dealing with this problem, there had to be tools that made it easier for developers.

I began my search, hopeful.

Almost immediately, I was confused. Data integration, ETL, data onboarding, data pipelines — which term applies to developers building products?

I found two groups of tools:

  • “enterprise ETL” tools like Talend and Informatica PowerCenter: a quick look at their sites showed they were far outside my startup’s price range and weren’t designed for my use case.
  • analytics-focused “no code” tools like Fivetran and Panoply: these are designed for business users who want to consolidate internal company data to generate useful metrics, not for B2B developers.

Dissatisfied, I decided to look at what the cloud providers like AWS, Azure, and GCP offered.

I quickly found AWS AppFlow and Azure Data Factory. They fit the price range, but they are focused on helping organizations integrate internal data and run pre-built transformation scripts. I was looking to integrate external data and write my own scripts (preferably using data science libraries like Spark or Pandas, which I was already familiar with). Another dead end.

I searched for “developer data integration” hoping to find a tool focused on developers dealing with the data integration problem.

I found data extraction tools like Stitch and data transformation tools like Prophecy. These were definitely more developer-focused and wouldn’t break the bank, but they still left a lot to be desired: my team would still have to connect these tools, build a frontend over them, and manage all of that in perpetuity. We’d have to build custom workflows with something like Apache Airflow to get all the pieces to work together.
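
To make that glue work concrete, here is a minimal sketch of the kind of code a team ends up owning when it wires separate extraction and transformation steps together itself. Everything in it is an assumption made for illustration: the function names (`extract_crm`, `extract_billing`, `transform`, `run_pipeline`), the field names, and the sample rows are hypothetical and do not come from Stitch, Prophecy, Airflow, or any other real tool.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Extractor = Callable[[str], List[Record]]


def extract_crm(customer_id: str) -> List[Record]:
    # Hypothetical stand-in for a call to an extraction tool; made-up sample row.
    return [{"Name": "Acme Corp", "AnnualRevenue": "120000"}]


def extract_billing(customer_id: str) -> List[Record]:
    # Hypothetical stand-in for a second source with a different schema.
    return [{"customer_name": "Acme Corp", "amount_cents": 4999}]


def transform(source: str, row: Record) -> Record:
    # Normalize each source's field names into one schema the product understands.
    if source == "crm":
        return {"source": source, "account": row["Name"],
                "amount": float(row["AnnualRevenue"])}
    if source == "billing":
        return {"source": source, "account": row["customer_name"],
                "amount": row["amount_cents"] / 100}
    raise ValueError(f"no transform for source {source!r}")


def run_pipeline(customer_id: str,
                 extractors: Dict[str, Extractor],
                 warehouse: Dict[str, List[Record]]) -> int:
    # Extract from every registered source, transform, then "load" in memory.
    rows: List[Record] = []
    for source, extract in extractors.items():
        rows.extend(transform(source, raw) for raw in extract(customer_id))
    warehouse.setdefault(customer_id, []).extend(rows)
    return len(rows)


warehouse: Dict[str, List[Record]] = {}
count = run_pipeline(
    "customer-1",
    {"crm": extract_crm, "billing": extract_billing},
    warehouse,
)
print(count, warehouse["customer-1"])
```

Even in this toy version, every new source means another extractor and another branch in `transform`, and none of it covers scheduling, the customer-facing frontend, or ongoing maintenance.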

To me, that still sounded like a lot of tech debt.

This fruitless search eventually became the inspiration for a new category of “embedded data integration” tools. These tools are focused purely on developers looking to integrate customer data, and are built to plug directly into your product and abstract the data integration problem away.

Currently, there are two main embedded data integration tools:

Fusebit

fusebit.io


Fusebit offers APIs that make building application integrations for your product simple. Although the company is early (it just raised seed funding), it is a huge step in the right direction.

At the time of writing they offer no way to try the APIs and have no public demo, but they are doing personalized demos by request for developers!

hotglue

hotglue.xyz


hotglue offers a panel and APIs to create and manage integrations, along with a widget that drops into your webapp to give your customers a simple onboarding experience — think Stripe Checkout but for data integration.

hotglue does have a public demo and allows developers to try their platform for free directly from their site.

Conclusion

Although there are a ton of tools that market themselves as the solution to data integration, there are surprisingly few that focus on developers building products.

That is finally changing with the advent of embeddable data integration tools, which aim to remove the tech debt most SaaS product development teams face as they scale their data integration pipelines.

If you are building a product, I recommend you check out these tools before deciding to build a custom solution.

Thanks for reading!