Day 11 - Sales Attribution

On February we allowed visitors on the website to sign up for a 14-day trial. At that time we decided to get rid of Hubspot because:

  1. Hubspot.js payload. Every lighthouse audit is complaining about the sheer size of that bundle (which includes attribution, chat, forms...)
  2. and Hubspot don't work together. If you want all your tracking calls and identify traits to show up in Hubspot (preferably as custom fields) you need to upgrade to a Hubspot enterprise subscription. $$$
  3. Attribution has now been fixed. We enjoyed hubspot reports on where a visitor originally came from and which sources helped in the final conversion.

We needed something simpler that we could extend so decided to use airtable for a backend to power the forms on the website (demo, contact form, trial form).

Airtable as backend

All forms on the website are now submitted to an /api/form endpoint. The most common form on the website is the request a trial flow:

Website Form Submission Flow
Website Form Submission Flow

The good part about creating the license first (Prezly Admin API) is that we receive a customer_id and a user_id which can later be used to link sales data back to the original form submission. In a way, you could say that Airtable has the mapping information to link visitor sessions (anonymousId, userId) to Prezly database information.

Turns out Airtable is super friendly to anyone that wants to organise those form submissions in a way that is useful for them. Take this view for example:

Example view in airtable
Example view in airtable

Which is an overview of expired customers that have healthy activity.

Extra data pushed into Airtable

Without going into details on how we did it (Lambda functions, Airtable automation and webhooks) here is the information we additionally store into the form submissions base:

  • userId, licenseId: see above
  • Channel Attribution Data:
    • First Attribution Channel
    • Last Attribution Channel
    • Nr Attributions: How many channels contributed
  • Qualification: Manual qualification low, medium, high.
  • Health Score: Score calculated by Vitally around activity (nr stories, nr session, ...)
  • Customer Status: Cancelled, Trial, Customer
  • MRR: Monthly Recurring Revenue

It would be nice to get some of that information back into (and mixpanel) so I create additional reports (already did some in day 7) answering questions:

  • Which channels bring in high-quality prospects?
  • How much MRR did Twitter bring in in the last 3 months?
  • How many channels on average contribute to a sale?

The Code

The first thing we do is add the handlers

Notice how I add a GET and POST endpoint? They are both doing the same thing but the GET endpoint is easier to use to debug/test and backfill the sales data for the last X months of trials.

The code to fetch data from airtable + fire identify calls is wrapped in a method trackUserSales.

I also added some variables to the .env files so make sure to have a look (or check out main) to make sure you have all config variables. Using a quick 'sls deploy' should publish that function.

Feed historic

To overwrite/fill in the history of our website data I have copied the list of all userIds from airtable. Other ways to find userIds:

  • Segment Audience builder > Download CSV (only userId)
  • Create mixpanel segment > Download > Remove columns
  • Use the data warehouse to find the userId for all users that did a specific event

That gave me a list of a few thousand userIds (I went with 3 months) which I stored in 'trial_userids.csv'. Using some bash I called the GET endpoint to backfill that data.

The code to trigger the calls (one call for every line in CSV):

> cat trials_alltime.csv| while read line; do https --print=b --ignore-stdin [LAMBDA_ENDPOIN]/prod/segment/identify/user/$line/sales; done

Few 1000 identifies being triggered

After I ran that command I could see that sales attribution data in mixpanel:

Trial Requests by Qualification (subset of data)
Trial Requests by Qualification (subset of data)

Real-time sales attribution

With the data now being backfilled, I need a way for this to automatically be triggered. I was planning to use Personas > Journey for that with the following criteria:

  • After requesting a trials
    • T+24hours
    • T+3 days (qualification will be done)
    • T+7 days (trial usage data, health score)
    • T+14 days (trial over, final health score and usage)
    • T+21 days (trials that activate within a week after expiry)
  • Based on lastSyncedSales + 30 days ago

That combination will make sure that we get pretty accurate reporting on new trials while the continuous sync (based on lastSyncedSales) will ensure that even older trials get refreshed every 30 days.

So I set this up using a Journey that looks like this

Personas Journey with 5 branches. Same webhook
Personas Journey with 5 branches. Same webhook

I am recycling the webhook from Day 10 which I modified to support the new steps (by key). The code now looks like this:

I set up a similar journey based on the lastSyncedSales property to ensure old trials are kept up to date at least once every month.

Tomorrow I will have a look at the different questions in this post, and if they can now be answered in Mixpanel just using this new data.