When I compared the Prezly sales pipeline data in Hubspot against what I was seeing in Mixpanel, there was a difference of about 10%. We could live with a 10% gap for most marketing decisions, but it meant we could not rely on Mixpanel alone for all marketing and sales reporting.
For one given period we found about 556 demos in Hubspot and only 485 in Segment/Mixpanel.
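As a quick sanity check on those two counts, here is a minimal Python sketch of the arithmetic. The variable names are mine; only the two numbers come from the exports.

```python
hubspot_demos = 556  # demos found in Hubspot for the period
segment_demos = 485  # demos found in Segment/Mixpanel for the same period

# How many demos never made it to Segment, and what share of the Hubspot total that is.
missing = hubspot_demos - segment_demos
miss_rate = missing / hubspot_demos

print(f"{missing} demos missing, {miss_rate:.1%} of the Hubspot total")
# prints "71 demos missing, 12.8% of the Hubspot total"
```

So for this particular period the gap is a bit above the rough 10% I started with, but in the same ballpark.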
The website is set up so that our demo form (www.prezly.com/demo) and every other form on the site submit their data directly to Hubspot Forms API URLs.
This is some of the code that submits the data from the browser to Hubspot:
To investigate, I exported all the Hubspot demos and all the identify() calls from Day 2 (stored in DynamoDB) for a given period into a Google Sheet.
The first thing I did was remove the duplicates from the Hubspot list using a Sheets add-on:
Since Hubspot had the most data, I added two columns to figure out whether the same data was in Segment. I ended up using VLOOKUP against the second table to see if the same email was there.
I then added a helper column that turns the lookup result into a 0/1 flag.
Ended up with something like this:
Alright. Something is up here. Comparing from both sides, I found that all Segment identify() data is in Hubspot, but not the other way around.
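For anyone who would rather script this than click through a spreadsheet, the dedupe, lookup and two-sided comparison could look roughly like the sketch below. The email addresses are made-up sample rows, and the `normalize` step (trim and lowercase) is my own assumption, not something the Sheets add-on necessarily did.

```python
# Hypothetical sample rows; the real lists came from the Hubspot and identify() exports.
hubspot_emails = ["ann@example.com", "Bob@example.com", "bob@example.com ", "cat@example.com"]
segment_emails = ["ann@example.com", "cat@example.com"]


def normalize(email: str) -> str:
    """Trim whitespace and lowercase so the same address always compares equal (assumption)."""
    return email.strip().lower()


def dedupe(emails):
    """Drop duplicate emails, keeping the first occurrence (the 'remove duplicates' step)."""
    seen = set()
    unique = []
    for email in map(normalize, emails):
        if email not in seen:
            seen.add(email)
            unique.append(email)
    return unique


hubspot = dedupe(hubspot_emails)
segment = set(dedupe(segment_emails))

# The VLOOKUP plus helper column: 1 if the Hubspot email also shows up in Segment, else 0.
in_segment = {email: int(email in segment) for email in hubspot}

# Compare from both sides.
only_in_hubspot = [email for email in hubspot if email not in segment]
only_in_segment = sorted(segment - set(hubspot))

print(in_segment)
print("Only in Hubspot:", only_in_hubspot)
print("Only in Segment:", only_in_segment)
```

An empty "only in Segment" list together with a non-empty "only in Hubspot" list is the pattern I saw in the sheet.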
The next question was whether this was a temporary thing. Had an integration been broken at some point, or were there JS errors? So I made a quick chart to see the distribution of misses over time:
This was consistent with the numbers higher up. Around 10-12% of demos end up in Hubspot but not in Segment.
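The same "misses over time" view can be computed without a chart. The sketch below groups the 0/1 flag by ISO week; the dates and flags are invented sample data, and `weekly_miss_rate` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (date of the demo request in Hubspot, 1 if also found in Segment else 0).
rows = [
    (date(2020, 1, 6), 1),
    (date(2020, 1, 7), 0),
    (date(2020, 1, 8), 1),
    (date(2020, 1, 14), 1),
    (date(2020, 1, 15), 1),
    (date(2020, 1, 16), 0),
]


def weekly_miss_rate(rows):
    """Return {(iso_year, iso_week): share of that week's demos missing from Segment}."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for day, in_segment in rows:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        misses[week] += 1 - in_segment  # a 0 flag counts as one miss
    return {week: misses[week] / totals[week] for week in sorted(totals)}


rates = weekly_miss_rate(rows)
for (year, week), rate in rates.items():
    print(f"{year}-W{week:02d}: {rate:.0%} missing")
```

A roughly flat rate week after week suggests something structural, while a spike in a single week would point more towards a temporarily broken integration.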
Now I was pretty confident that some visitors block calls to segment.com or other integrations. The 10% rate is consistent with an article I found.
I already knew that browsers are getting smarter about blocking third-party scripts and tracking snippets. I'm a big Brave user myself and fall back to Firefox when that doesn't work, but I find myself disabling Brave Shields a lot because some sites just don't work with it enabled.
Reading up on the subject I discovered that a lot has happened in this area and blockers are getting better and stronger. Good. Here is a list of stuff that can potentially break my integration:
- Blacklists: most hosted analytics solutions are on some kind of blacklist used by blocking extensions (Ghostery, uBlock Origin) or by the browsers themselves (Firefox, Brave, ...), which block either the snippet (CDN host) or the tracking endpoint (API)
- Third-party cookies: browsers are getting stricter about third-party cookies
What can we do?
I'm not looking to track adblocking users or violate anyone's trust. What I am trying to get at is event tracking for attribution purposes (marketing efforts), and whether that is possible with an all-in-one Segment approach.
After reading through the Segment forums, some docs and best practices, I think we should change our Segment setup:
- ✅ Use server-side tracking where we can. That means not loading the Customer.io, Hubspot, Mixpanel or other snippets if we can offload that work to Segment's server-side (cloud mode) integrations
- ✅ Serve the analytics bundle ourselves. So not cdn.segment.com but www.prezly.com/analytics.js or something
- ✅ Don't use any third-party cookies, and make sure the cookie domain of all our tracking calls is secure and locked to www.prezly.com (and not .prezly.com). I wrote about this here.
- Proxy tracking calls through our own host. So instead of the Segment snippet sending its payload to api.segment.io, it could go to www.prezly.com/analytics/. Good thing Segment supports just that, which they call Custom Domain Proxy.
I did most of the things above. Still working with Segment and our website infrastructure to also proxy the tracking calls.
Tomorrow I will try to find out whether I can backfill the missing data in any way.