Day 8 - Feeding in sales data

  • Did the visitor join a demo?
  • What was the qualification of the visitor (PQL, MQL,...)?
  • Did they end up getting a proposal? How much?
  • Which of those visitors ended up as a customer? (deal won)

At Prezly we're using Hubspot as a sales CRM together with Segment. Now Segment and Hubspot don't really get along too well. Hubspot has its own way of doing things (visitor identification, attribution modelling,...) and the data isn't easily exchangeable.

So in our case:

All integrations are using the data we track through Analytics.js


Hubspot is doing their own tracking through their own JS snippet.

In the post below I will look up the sales data from Hubspot CRM and feed it back to Segment as user attributes.

Ideal Payload

Let's use the same approach as we did with the attribution where we leverage Customer.IO to trigger a lambda call and fill in the missing data.

Here is what I want the identify() payload to look like:

Note I stripped out some other common identify properties (context, ip, writeKey...).

The important properties are:

  • is_sales_qualified: Leads are manually classified as SQLs in Hubspot. If you don't know what that means read about it here.
  • is_marketing_qualified: Leads are manually classified as MQLs in Hubspot. If you don't know what that means read about it here.
  • lead_status: This is a status that highlights whether we could connect with the prospect. New, Connected,...
  • meeting_status: Did we have a meeting/demo yet? Scheduled, Not Scheduled, Done, Noshow
  • demo_by: Who did the demo?
  • priority: Manual prioritisation of the lead (low, medium, high)
  • last_deal_amount: Deal size of the last deal
  • last_deal_currency: Currency of the last deal
  • last_deal_stage: Status of the deal, which can be open, closed or won
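
As a rough sketch, an identify() payload carrying the properties listed above could look like the object below. Every value is invented for illustration, and the userId is only a placeholder for the hashed-email id used on the marketing website.

```javascript
// Illustrative identify() payload. All values are made up; only the trait
// names come from the list above.
const identifyPayload = {
  type: 'identify',
  userId: 'hashed-email-id', // placeholder for the hashed email used as user id
  traits: {
    is_sales_qualified: false,
    is_marketing_qualified: true,
    lead_status: 'Connected',
    meeting_status: 'Done',
    demo_by: 'Some Salesperson',
    priority: 'high',
    last_deal_amount: 1200,
    last_deal_currency: 'EUR',
    last_deal_stage: 'open',
  },
};
```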

By feeding that data back to Mixpanel we can do more reporting around the challenges we set out in the original problem statement.

Mapping User Ids

On our marketing website we hash the email address and recycle that as a unique user identifier. I am not sure why we made that choice exactly, but it comes with some pros and cons:

  • Pro: Easier to do multi device attribution (hashing email can be done from everywhere)
  • Con: user ids cannot be carried through to the app (where we use the user_id)
  • Con: hashing a string (specifically with the hash we use) will ultimately lead to conflicts, because the hashing function might produce the same id for two different emails.
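
The post does not show the hashing code or name the hash function in use. A minimal sketch of the idea, assuming SHA-256 from Node's built-in crypto module (hashEmail is a hypothetical helper, not the function running on the website):

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a stable user id from an email address.
// SHA-256 is an assumption; the post does not say which hash is used.
function hashEmail(email) {
  // Normalise first so " Jane@Example.com " and "jane@example.com" give the same id.
  const normalised = email.trim().toLowerCase();
  return crypto.createHash('sha256').update(normalised).digest('hex');
}
```

Because the id depends only on the email, any device or backend can recompute it, which is the "pro" above. A wide cryptographic hash also makes accidental collisions extremely unlikely, which would take most of the sting out of the last "con".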

All the forms we have on the website submit their data to the Hubspot form APIs while the Hubspot JS snippet is loaded. That combination allows Hubspot to identify the user who submits the form and link their pre-existing page tracking to them. Very similar to how Segment does reassociation, but in Hubspot's own way.

In this case the id we assign our visitors does not flow into Hubspot so for the data association we can only use the email address which luckily is known and stored by

Show me the code

Let's create a new endpoint first where we'll accept an email address:

The (endpoint) naming is terrible, low inspiration mode today. I am out of coffee too.

Now let's install the Hubspot Node SDK and add our Hubspot API key to .env

Create the handler code at src/handlers/segment/trackHubspot.js

Yeah, I know this code is messy. Will clean it up later but it does the job.

Hit a quick sls deploy to deploy the new endpoint. Here is what the code does:

  • Search contact by email (passed in URL)
  • Get all the deals associated with that email
  • Fire an identify() call with the original id (hashed) and the new properties extracted from the deals/hubspot data
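
The three steps above can be sketched roughly as follows. This is not the original handler: the hubspot and analytics objects are injected stand-ins for the real clients, and findContactByEmail, getDealsForContact, toUserId, the contact property names and the deal fields are all hypothetical names chosen for the sketch. Only the shape of the identify() call ({ userId, traits }) follows Segment's Node library.

```javascript
// Map a Hubspot contact and its deals to the traits we want in Segment.
// The property names on `contact` and `deals` are assumptions for this sketch.
function buildSalesTraits(contact, deals) {
  const p = contact.properties || {};
  // Treat the most recently created deal as the "last" deal.
  const lastDeal = deals.length
    ? deals.reduce((latest, d) => (d.createdAt > latest.createdAt ? d : latest))
    : null;
  return {
    is_sales_qualified: p.is_sales_qualified === 'true',
    is_marketing_qualified: p.is_marketing_qualified === 'true',
    lead_status: p.lead_status || null,
    meeting_status: p.meeting_status || null,
    demo_by: p.demo_by || null,
    priority: p.priority || null,
    last_deal_amount: lastDeal ? Number(lastDeal.amount) : null,
    last_deal_currency: lastDeal ? lastDeal.currency : null,
    last_deal_stage: lastDeal ? lastDeal.stage : null,
  };
}

// Look up the contact by email, collect the deals, then fire identify().
// `hubspot`, `analytics` and `toUserId` are injected (hypothetical) dependencies.
async function syncSalesData(email, { hubspot, analytics, toUserId }) {
  const contact = await hubspot.findContactByEmail(email);
  if (!contact) return null; // unknown in Hubspot: nothing to sync
  const deals = await hubspot.getDealsForContact(contact.id);
  const traits = buildSalesTraits(contact, deals);
  // Use the same hashed id the website assigned so the traits land on the right user.
  analytics.identify({ userId: toUserId(email), traits });
  return traits;
}
```

Injecting the clients keeps the mapping logic testable without hitting the Hubspot API, at the cost of a little wiring in the real handler.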

Triggering it (using

I have created some segments to check whether certain properties are available:

For example the SMA - Is Marketing Qualified is a segment of customers where is_marketing_qualified is marked as true.

Creating this segment before triggering the new endpoint will create an empty segment. To fix this we're creating a campaign that triggers every time a new user is identified that lacks those traits:

Criteria for the campaign

Workflow steps for that campaign:

Create the workflow

Make sure that the webhook url has the right endpoint and passes in the user email as this is the identifier we'll use to extract Hubspot data.

On the last step you need to specify whether you want to match current users only or future additions too.

In our case I am triggering it for all people in the campaign and future additions only which will make sure that data is synced nightly for people that don't have the sales data yet.

After you have enabled this, the campaign will trigger a ton of webhook calls. The lambda will fail on Hubspot throttle limits (too many requests at the same time) and will be retried automatically? (not sure about this)
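
Since it is unclear whether failed calls get retried, one option is to make the lambda itself tolerant of throttling. A minimal sketch, assuming the Hubspot client rejects with an error carrying statusCode 429 when throttled (that field name, withRetry and its options are all assumptions for illustration):

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async call while it fails with HTTP 429 (too many requests).
// `err.statusCode` is an assumption about how the client reports throttling.
async function withRetry(fn, { retries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      const throttled = Boolean(err) && err.statusCode === 429;
      // Give up on non-throttling errors or once the retry budget is spent.
      if (!throttled || attempt >= retries) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Each Hubspot call in the handler could then be wrapped as withRetry(() => someHubspotCall()), keeping the total wait well below the lambda timeout.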

Triggered webhook

Reporting (in mixpanel)

Once that is done you can use Mixpanel to create some reports. Let's start by creating some new User Cohorts with that new data:

Using these Cohorts you can now see what the typical behavior of a Marketing Qualified Lead is:

Show only Source identified events

Or let's try to find out the different sources (see Day 7) that ultimately flow to a Marketing Qualified Lead:

Now we're getting somewhere. In Insights > Users we can now aggregate the deal size and group it by the first channel the user came from:

Pipeline generated by source_first_referrer_type

The same data but looking at the last source:

Pipeline generated grouped by source_last_referrer_type

It is still too early to say that this exercise is finished, but let's go back to the problem statement in my original post:

Here is a list of things that would help understand and improve our marketing campaigns when it comes to attribution:
  • ✅ Seeing the different sources a visitor comes from
  • Understanding which touch points contributed to a conversion.
  • ✅ Look at more than just the last touchpoint that contributed to a conversion
  • ✅ Link different sessions / multiple devices to a single user
  • ✅ Be able to rewrite history / look back in time. PQLs that ultimately end up buying are worth more than other PQLs.

I will now ask the Marketing team what they think of this and how useful this extra information is. When comparing this to our sales data (reports in Hubspot) I found some issues:

  • User ids change. Conflict between website and app cookies with as domain. Wrote about this here.
  • Not all users are known by Segment. The leads reported in Segment are 12% lower than what is in our CRM. This is related to ad/tracking cookies being blocked. Will be looking at how to solve this tomorrow.
  • Funnels are unusable until we trigger events for Deal/Lead changes. Events such as Proposal Created or Demo Took Place are important to get a good understanding of the speed of the funnel.
  • Hubspot changes are not automatically propagated. Will be looking into using webhooks to trigger updates to users.