Day 1 - The Masterplan

What do we need to track?

We’re already sending analytics calls that would allow me to tie everything together:

  • page() (spec): fired every time a visitor opens a page. The payload has information about the browser referrer (if there is one)
  • track() (spec): custom events that allow you to track stuff you’re interested in: Demo Request, Newsletter Subscribe, Start Chat…
  • identify() (spec): assigns the visitor a unique identifier (customer id, email address, CRM id, …)

So how do analytics solutions tie together the full history of page views and/or events from before someone is identified? Every browser gets a unique anonymousId, which is added to every analytics call.

How an identify() call associates previous page() calls using anonymousId

Once an identify() call is triggered (one that includes that same anonymousId), all the previous activity can be associated with the new userId. The first time I learned about this trick I found it really clever.
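To make the mechanics concrete, here is a minimal sketch of that stitching in Python. It assumes events are plain dicts carrying the type, anonymousId and userId fields; the stitch helper and the sample data are made up for illustration and are not part of any analytics library.

```python
from collections import defaultdict


def stitch(events):
    """Group events per user, resolving anonymous activity once it is identified."""
    # First pass: learn which anonymousId belongs to which userId.
    owner = {}
    for event in events:
        if event["type"] == "identify" and event.get("userId"):
            owner[event["anonymousId"]] = event["userId"]

    # Second pass: file every event under its resolved userId.
    # Events from browsers that never identified end up under the key None.
    by_user = defaultdict(list)
    for event in events:
        key = event.get("userId") or owner.get(event["anonymousId"])
        by_user[key].append(event)
    return dict(by_user)


events = [
    {"type": "page", "anonymousId": "anon-1", "url": "/pricing"},
    {"type": "page", "anonymousId": "anon-1", "url": "/demo"},
    {"type": "identify", "anonymousId": "anon-1", "userId": "user-42"},
    {"type": "page", "anonymousId": "anon-2", "url": "/blog"},
]
history = stitch(events)
```

Both page views from anon-1 land in the history of user-42 even though they happened before the identify() call, while the anon-2 page view stays anonymous.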

Access to segment data

For this experiment I need access to the segment data. More specifically the details that go to every page() and identify() call. Let’s discover the easiest way to get access to the raw segment data.

Super high level data flow. From website to segment to the (enabled) integrations

Option 1: Data Warehouse 🤯

To get access to raw segment data you could set up a warehouse that syncs all your events to your own data warehouse. After that, it’s just a question of tying together the different records and tables to find what you’re looking for.

The problem with that approach:

  • segment maintains your database schema, and there are no indices set up
  • segment real-time integration is $$$ (ours syncs 2 times a day)
  • working with this data is hard. You need special SQL powers to link together the different tables. Look at these examples

Option 2: Query an enabled destination 😐

Second idea: Get the data from one of our upstream partners (linked to the segment account).

The ones that might qualify are Hubspot, Mixpanel and Hull. Let’s take a look:

  • Super awesome (tech/product first) company that I am sure has a great tech stack/API. I could not find information in their API docs to get access to raw segment data.
  • Hubspot: Their API is ok, but to get events into Hubspot we’d have to upgrade (see the paragraph on reporting below)
  • Mixpanel: Access to raw segment data is available through the export API only. We can’t query it, so we would need somewhere to store it.
  • Hull (we’re currently trialing): Might work. They store the data in an Elasticsearch cluster, so searching for events pretty much comes down to writing Elasticsearch queries.

This whole experiment started with me telling my co-founder:

This whole stuff shouldn’t be that expensive. Actually I don’t think it’s that hard at all. Let me try something

So this disqualifies expensive software (Hubspot subscription) or linking up an extra source (Hull) to do just that. Back to square 1.

Option 3: Webhook 👍

Segment allows you to set up webhooks that are fired in real time. They can be used to run your own logic on certain events, to enable an integration that is not currently available in the catalog, or just as a backup that stores your raw segment data (in real time).

For this experiment I would set up a quick webhook that stores the data somewhere so I can process it.
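A rough sketch of what such a webhook could look like, using only the Python standard library. The table layout, the open_store / store_event / serve names and the choice of SQLite are my assumptions for illustration; the only things taken from the analytics payload are the common messageId, type, anonymousId, userId and timestamp fields.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_store(path=":memory:"):
    """Open (or create) the SQLite file that keeps the raw events."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "message_id TEXT PRIMARY KEY, type TEXT, anonymous_id TEXT, "
        "user_id TEXT, timestamp TEXT, raw TEXT)"
    )
    return conn


def store_event(conn, payload):
    """Persist one raw message; a retry with the same messageId is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            payload.get("messageId"),
            payload.get("type"),
            payload.get("anonymousId"),
            payload.get("userId"),
            payload.get("timestamp"),
            json.dumps(payload),  # keep the full payload for later processing
        ),
    )
    conn.commit()


class WebhookHandler(BaseHTTPRequestHandler):
    store = None  # set by serve() before the server starts

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        store_event(self.store, payload)
        self.send_response(200)
        self.end_headers()


def serve(port=8000, path="events.db"):
    """Blocking call: listen for webhook POSTs and store every message."""
    WebhookHandler.store = open_store(path)
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. Keeping the untouched JSON in the raw column means the processing logic can change later without losing anything that was sent.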

Storing the data would allow me to generate a full trail of all the events for a user (and its linked anonymousIds). That event stream would then be used to run the attribution analysis and answer questions like:

  • Where did the visitor first come from?
  • What other sources contributed to the conversion?
  • What’s the last source right before the conversion happened?
  • How many devices did the visitor use?
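As a sketch of how those questions could be answered from a stored trail: the snippet below assumes one user's events are already sorted by time and that each page view carries a source label (a hypothetical field that would come from the source identification step). Counting distinct anonymousIds only approximates the number of devices, since it really counts browsers.

```python
def summarize(trail, conversion="Demo Request"):
    """Answer the attribution questions for one user's time-ordered trail."""
    # Only touches that happened before the conversion count towards it.
    cut = next(
        (i for i, e in enumerate(trail) if e.get("event") == conversion),
        len(trail),
    )
    sources = [e["source"] for e in trail[:cut] if e.get("source")]
    return {
        "first_source": sources[0] if sources else None,
        "last_source": sources[-1] if sources else None,
        # Distinct sources in order of first appearance.
        "all_sources": list(dict.fromkeys(sources)),
        "device_count": len({e["anonymousId"] for e in trail}),
    }


trail = [
    {"type": "page", "anonymousId": "a1", "source": "google"},
    {"type": "page", "anonymousId": "a1", "source": "twitter"},
    {"type": "page", "anonymousId": "a2", "source": "newsletter"},
    {"type": "track", "anonymousId": "a2", "event": "Demo Request"},
    {"type": "page", "anonymousId": "a2", "source": "direct"},
]
summary = summarize(trail)
```

The "direct" visit after the Demo Request is ignored for the first/last source answers because it happened after the conversion.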

Identifying Visitor Sources

A big part of the Marketing Attribution challenge is knowing where visitors come from. Almost all integrations we have enabled in our segment stack, and even segment itself, have a way to do some kind of source identification.

Some examples:

The thing is that the identification models are different for every integration so there is no way to get uniform reporting.

To solve this I was thinking of using existing software. Here is what I found:

There is no magic there. It’s about passing the HTTP referrer and the page URL to some matchers and grouping the results. The challenge here is to keep up with social network URLs, new search engines, etc…
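To show how little magic is involved, here is a toy version of such a matcher. It is not one of the libraries mentioned; the channel names and the (deliberately tiny) host lists are my own assumptions, and a real library would ship far more complete lists.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical, incomplete matcher lists for illustration only.
SEARCH_ENGINES = {"google", "bing", "duckduckgo", "yahoo"}
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "t.co", "linkedin.com"}


def _host(url):
    """Hostname of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify(page_url, referrer=None):
    """Bucket one page view into a channel using the page URL and the referrer."""
    query = parse_qs(urlparse(page_url).query)
    if "utm_source" in query:  # explicit campaign tagging wins
        return {"channel": "campaign", "source": query["utm_source"][0]}
    ref_host = _host(referrer) if referrer else ""
    if not ref_host:
        return {"channel": "direct", "source": None}
    if ref_host == _host(page_url):
        return {"channel": "internal", "source": ref_host}
    if any(ref_host == h or ref_host.endswith("." + h) for h in SOCIAL_HOSTS):
        return {"channel": "social", "source": ref_host}
    if SEARCH_ENGINES & set(ref_host.split(".")):
        return {"channel": "search", "source": ref_host}
    return {"channel": "referral", "source": ref_host}
```

For example, classify("https://example.com/pricing", "https://www.google.co.uk/search?q=x") lands in the "search" channel, and a page URL with ?utm_source=newsletter is a "campaign" no matter what the referrer says. The maintenance burden sits entirely in those two lists.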

Considering I want to write as little code as possible I will be picking one of those libraries later.

Feeding back data 💡

Knowing that we want to pass all the learnings (attribution, first/last touch) back to all the software we use (Mixpanel, …) this one is obvious to me → Feed the data back to segment.

The analytics events can be fired server side. The Visitor Source Identified event should be fired every time we detect a change in the visitor source, while the traits per user (through identify()) are maintained using our stored pages.
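A minimal sketch of the change detection, assuming the stored pages of one visitor are sorted by time and already carry a source label. The property names (source, previous_source) are hypothetical; the function only builds track-style payloads and leaves the actual sending to whatever server-side client is used.

```python
def source_changes(pages):
    """Build one 'Visitor Source Identified' event per change of source."""
    events, previous = [], None
    for page in pages:
        source = page.get("source")
        if source is None or source == previous:
            continue  # nothing detected, or same source as the last touch
        events.append({
            "type": "track",
            "event": "Visitor Source Identified",
            "anonymousId": page["anonymousId"],
            "properties": {"source": source, "previous_source": previous},
        })
        previous = source
    return events


pages = [
    {"anonymousId": "a1", "source": "google"},
    {"anonymousId": "a1", "source": "google"},
    {"anonymousId": "a1", "source": None},
    {"anonymousId": "a1", "source": "twitter"},
    {"anonymousId": "a1", "source": "google"},
]
changes = source_changes(pages)
```

Five page views result in three events here (google, twitter, google again), so downstream tools only see the moments where the source actually changed.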

The advantage here is that the different models (First touch, Last touch, decay, linear) can be added by adding additional traits for each user.
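As an illustration of how several models can live side by side as traits, the sketch below computes first touch, last touch, linear and time-decay credit from one ordered list of sources. The trait names are made up, and the decay model is simplified: the half-life is measured in number of touches rather than in days.

```python
def attribution_traits(sources, half_life=2.0):
    """Turn a time-ordered list of sources into per-model traits."""
    if not sources:
        return {}
    n = len(sources)
    # Time decay: the most recent touch weighs 1, and the weight halves
    # every `half_life` touches further back in time.
    raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(raw)
    linear, decay = {}, {}
    for source, weight in zip(sources, raw):
        linear[source] = linear.get(source, 0.0) + 1 / n
        decay[source] = decay.get(source, 0.0) + weight / total
    return {
        "first_touch": sources[0],
        "last_touch": sources[-1],
        "linear_credit": linear,
        "decay_credit": decay,
    }


traits = attribution_traits(["google", "twitter", "google"], half_life=1)
```

Each model is just another key in the traits dict, so adding a new one later does not touch the existing ones. Depending on the destination, the nested credit dicts may need flattening into separate traits.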

Funnel / Pipeline Reporting 📊

Your CRM can probably do a lot of reporting around which sources are generating the most deals, the current open pipeline etc…

We’re using Hubspot ourselves and while we have access to great reports (such as this one) the matching logic they use to bucket visitor sources is unknown. Plus you need to load their 288kb JS bundle (visitor.js) 😱

Funnel Reporting in Hubspot

Sidenote: if you really want to feed all your existing track() events into Hubspot CRM you need to upgrade to their premium marketing professional plans. Last time I checked you’re looking at more than 20k/year in total to see tracking events in Hubspot.

Now for reporting I personally am a big fan of Mixpanel. It allows me to build reports on top of my existing tracking plan and can be enabled with 3 clicks from the integration tab in your segment account. Mixpanel allows you to use both the user properties (coming from identify()) as well as the event data and its properties (coming from track()).

To learn more about Mixpanel features and reports check out their knowledge base. But for this experiment I would like to see funnel reports like the one below, split up by first touch, last touch and maybe assisted conversions.

The only way this could work is if all the important attribution data were available in Mixpanel. Getting this right would solve a lot of the wishlist described in my previous post (scroll to the ultimate solution. WTF is it with Medium and page anchors?)

  • Seeing the different sources a visitor comes from → Any report grouped by user attribute First Touch or Last touch
  • Understanding which touchpoints contributed to a conversion → Something like Mixpanel flow where there are events for source detection.
  • Look at more than just the last touchpoint that contributed to a conversion → Mixpanel reports can apply any model as long as the user traits have the data on the models you want to run
  • Link different sessions / multiple devices to a single user identifier → Solved using segment cross device tracking
  • Be able to rewrite history. PQLs that ultimately end up buying are worth more than other PQLs → See previous paragraph on where does the funnel stop.

Additionally all the segment interactions are stored in our Postgres data warehouse (we’re running this on AWS Aurora) but I want to avoid interacting with it (unknown schema + no indices). For now I am considering this a data backup.

Where does the funnel stop?

If you want to fully understand marketing spend you need to be able to link real revenue (won/lost deals) to marketing activity (visits, conversions).

That means that tracking demo requests or newsletter subscribes is not enough; you also need to trigger events and update user or group properties when deals change status.

The green stuff is important. I won’t go into detail on how to do that but in essence, it’s as simple as firing off track() or identify() calls as soon as something interesting changes. So every time a deal is created or gets updated in your CRM you have to make sure the following events are triggered.
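A hedged sketch of what "firing off track() or identify() calls" on a deal change could look like. Everything specific here is hypothetical: the "Deal Stage Changed" event name, the property and trait names, and the shape of the deal dict are placeholders, not taken from any CRM or from our tracking plan. The function only builds the payloads; sending them is left to the analytics client.

```python
def deal_events(user_id, deal, previous_stage):
    """Build the track() and identify() payloads for a deal whose stage changed."""
    if deal["stage"] == previous_stage:
        return []  # nothing interesting changed
    return [
        {
            "type": "track",
            "userId": user_id,
            "event": "Deal Stage Changed",  # hypothetical event name
            "properties": {
                "deal_id": deal["id"],
                "from": previous_stage,
                "to": deal["stage"],
                "amount": deal["amount"],
            },
        },
        {
            "type": "identify",
            "userId": user_id,
            "traits": {"deal_stage": deal["stage"], "deal_amount": deal["amount"]},
        },
    ]


deal = {"id": "d1", "stage": "closedwon", "amount": 1200}
payloads = deal_events("user-42", deal, previous_stage="proposal")
```

The track payload records the change as an event in the user's trail, while the identify payload keeps the latest deal state available as user traits for reporting.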


The frustrating part is that there are not a lot of affordable CRMs doing this well. We had to teach Hubspot and Segment to play together (using something like


All right. No more questions on how/what/where. Tomorrow I will start writing some code to try and get a prototype up.

  • a minimal service that listens for webhooks
  • stores events in some kind of data store
  • runs a library to analyse url/referrer
  • stores the results of that library somewhere
  • triggers track() calls when a visitor source is noteworthy

Check in tomorrow, when I will post some code and a working prototype.