Day 2: Capture segment events

Today I just want to deploy some simple code to capture and store Segment track() and identify() calls.

To keep it as simple as possible I will be using an AWS Lambda function and DynamoDB. I can probably get started in less than an hour, saving these raw events in a data store for analysis a little later.

Setting up the project

This project uses JavaScript and Lambda, so I assume you know the basics of npm and git. Make sure you have Node 10.19.0 (I use nvm for that); the project might work on other versions, but stick to 10.19.0 to be sure.

mkdir solving-marketing-attribution && cd solving-marketing-attribution
git init
npm init
nvm use 10.19.0
npm install -g serverless
npm install body-parser aws-sdk serverless-http express --save
npm install serverless-offline --save-dev

To simplify deployment and speed up the overall development we’ll be using serverless (which is awesome) and Amazon Web Services. Check out the getting started guide.

Log into your AWS account and find (or create) your access keys (here's another guide for that). Then tell Serverless to use them:

sls config credentials --provider aws --key YOURKEY --secret YOURSECRET

All done. Let’s get started.
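A side note: we installed serverless-offline as a dev dependency above. Once your serverless.yml exists (next section), you can optionally emulate the API locally instead of deploying on every change. As a minimal sketch, register the plugin in serverless.yml:

```yaml
# serverless.yml (added alongside the existing sections)
plugins:
  - serverless-offline
```

Then `sls offline` serves your functions on localhost so you can iterate without an AWS round trip.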

Testing Serverless

Add this basic boilerplate to serverless.yml

service: solving-marketing-attribution
provider:
  name: aws
  runtime: nodejs10.x
  stage: dev
  region: eu-west-1
functions:
  hello:
    handler: index.handler
    events:
      - http:
          path: hello
          method: post

And this very basic code to index.js

const serverless = require('serverless-http');
const express = require('express');
const app = express();

app.post('/hello', function (req, res) {
    res.status(200).send(`Hi. Awesome guide. Keep reading @ https://medium.com/solving-marketing-attribution`);
});

module.exports.handler = serverless(app);

Now let’s try it

sls deploy

If all is good you will get some output with your endpoint:

Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Service files not changed. Skipping deployment...
Service Information
service: solving-marketing-attribution
stage: dev
region: eu-west-1
stack: solving-marketing-attribution-dev
resources: 11
api keys:
  None
endpoints:
  POST - https://[YOURURL]/dev/hello
functions:
  hello: solving-marketing-attribution-dev-hello
layers:
  None

After which you can try to hit that endpoint using a POST request

curl -X POST https://[FILLINURL].com/dev/hello

Voila! We have a working boilerplate for the project. I use HTTPie myself rather than curl, so for me it would be:

> http POST  https://[URL]/dev/hello

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 82
Content-Type: text/html; charset=utf-8
Date: Wed, 22 Apr 2020 19:29:44 GMT
Via: 1.1 d1f9689a3caeb0a19dffbc049d2b2141.cloudfront.net (CloudFront)
X-Amz-Cf-Id: WNTlEGv7ihx7UStbvc1-c2xAsyjFIzjDlRgw7enBP1IWGR27-Enliw==
X-Amz-Cf-Pop: LHR61-C2
X-Amzn-Trace-Id: Root=1-5ea09b28-b10adbf4536a00323c1bec80;Sampled=0
X-Cache: Miss from cloudfront
etag: W/"52-WjenCnXcPLNfe0kZO7NA0GScpyc"
x-amz-apigw-id: LZ0uYGMGjoEF-7A=
x-amzn-Remapped-content-length: 82
x-amzn-RequestId: 2760ac24-8a9d-4666-8e84-47b0f6f906f9
x-powered-by: Express

Hi. Awesome guide. Keep reading @ https://medium.com/solving-marketing-attribution

Let’s keep going!

The database — Dynamo

Honestly, I started with Postgres (using Prisma). It worked, but it was a lot harder to explain: you need to model the database schema, set up an RDS instance and manage migrations.

For the sake of learning something new (I have not done anything with DynamoDB yet) and keeping this guide simple, I am going with AWS DynamoDB.

For day 2 we need two tables to store track() and identify() calls. Update your serverless.yml file to this:

service: solving-marketing-attribution

custom:
  # sma = Solving Marketing Attribution
  tableIdentify: 'sma-identify-${self:provider.stage}'
  tableTrack: 'sma-event-track-${self:provider.stage}'

provider:
  name: aws
  runtime: nodejs10.x
  stage: dev
  region: eu-west-1
  iamRoleStatements:
    - Effect: Allow
      Action:
        - dynamodb:Query
        - dynamodb:Scan
        - dynamodb:GetItem
        - dynamodb:PutItem
        - dynamodb:UpdateItem
        - dynamodb:DeleteItem
        - dynamodb:ListStreams
      Resource:
        - { "Fn::GetAtt": ["SegmentIdentifyDynamoDBTable", "Arn" ] }
        - { "Fn::GetAtt": ["SegmentTrackDynamoDBTable", "Arn" ] }
  environment:
    IDENTIFY_TABLE: ${self:custom.tableIdentify}
    TRACK_TABLE: ${self:custom.tableTrack}

functions:
  hello:
    handler: index.handler
    events:
      - http: 'POST /events'

resources:
  Resources:
    SegmentIdentifyDynamoDBTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        AttributeDefinitions:
          - AttributeName: messageId
            AttributeType: S
        KeySchema:
          - AttributeName: messageId
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1
        TableName: ${self:custom.tableIdentify}
    SegmentTrackDynamoDBTable:
      Type: 'AWS::DynamoDB::Table'
      Properties:
        AttributeDefinitions:
          - AttributeName: messageId
            AttributeType: S
        KeySchema:
          - AttributeName: messageId
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1
        TableName: ${self:custom.tableTrack}

Nothing special here: create two tables, add some permissions and set some environment variables to use later in the storing code.

Hit sls deploy and watch Serverless create the resources for you. Under the hood it uses CloudFormation, so you document the resources you need in serverless.yml and Serverless provisions them. It's pretty awesome.

If you log in to your AWS account and go to DynamoDB you should now see the two new tables.

Storing Events (using lambda)

Now the only thing we need to do is create a controller that accepts the body of a Segment webhook and stores it in the newly created tables.

const serverless = require('serverless-http');
const bodyParser = require('body-parser');
const express = require('express');
const app = express();
const AWS = require('aws-sdk');

const IDENTIFY_TABLE = process.env.IDENTIFY_TABLE;
const TRACK_TABLE = process.env.TRACK_TABLE;
const dynamoDb = new AWS.DynamoDB.DocumentClient( {
    // ensures empty values (userId = null) are converted
    // more @ https://stackoverflow.com/questions/37479586/nodejs-with-dynamodb-throws-error-attributevalue-may-not-contain-an-empty-strin
    convertEmptyValues: true
});

app.use(bodyParser.json({ strict: false }));

// Create User endpoint
app.post('/events', function (req, res) {
    const request_body = req.body;
    const type = request_body.type;

    if (type === 'page' || type === 'track' || type === 'identify') {
        const { messageId } = req.body;

        const params = {
            TableName: ( type === 'identify' ? IDENTIFY_TABLE : TRACK_TABLE),
            Item: {
                messageId: messageId,
                ...request_body,
            },
        };

        dynamoDb.put(params, (error) => {
            if (error) {
                console.log(error);
                // return here so we don't send a second response below
                return res.status(400).json({ error: 'Could not store event' });
            }
            res.json({ messageId, request_body });
        });
    } else {
        res.status(200).send(`Not a page / track / identify event. Skipping`);
    }
});

module.exports.handler = serverless(app);
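The routing in that handler boils down to picking a table by event type. As a standalone sketch of the intended logic (the helper name tableForEvent is mine, not part of the project):

```javascript
// Hypothetical helper mirroring the handler's routing: identify events go to
// the identify table, page/track events to the track table, everything else
// (group, alias, screen, ...) is skipped.
function tableForEvent(event, identifyTable, trackTable) {
    if (event.type === 'identify') return identifyTable;
    if (event.type === 'page' || event.type === 'track') return trackTable;
    return null; // skipped event types
}

console.log(tableForEvent({ type: 'identify' }, 'sma-identify-dev', 'sma-event-track-dev'));
console.log(tableForEvent({ type: 'page' }, 'sma-identify-dev', 'sma-event-track-dev'));
```

Keeping this decision in one place makes it easy to add more event types later without touching the storage code.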

Hit deploy again. You can make this faster by deploying only the function, which skips the CloudFormation setup:

sls deploy --function=hello

Mucho faster!

Deploying and testing

To test that everything is working, create some fake JSON files with identify() and track() calls. I grouped them in a folder /events.

The easiest way to get those is to log into your Segment account, click a source and go to the debugger.

Click `RAW` and copy the event into a json file

If you're too lazy, you can borrow some of mine (I took out the IP address and other sensitive info).

{
  "anonymousId": "f1ef0f8d-68d2-4fbc-a7e6-cade9d42f3aa",
  "context": {
    "ip": "8.8.8.8",
    "library": {
      "name": "analytics.js",
      "version": "3.11.4"
    },
    "locale": "pt-BR",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36"
  },
  "integrations": {},
  "messageId": "ajs-d19c0a0b9109264efd4328fe62e95de9",
  "originalTimestamp": "2020-04-22T19:50:21.915Z",
  "receivedAt": "2020-04-22T19:50:24.010Z",
  "sentAt": "2020-04-22T19:50:21.918Z",
  "timestamp": "2020-04-22T19:50:24.007Z",
  "traits": {
    "created": "2017-08-16T07:23:26.000Z",
    "email": "john@gmail.com"
  },
  "type": "identify",
  "userId": "john@gmail.com"
}
events/identify.json
{
  "anonymousId": "a5b62615-cf5e-4e1c-bd01-a9852914baaa",
  "context": {
    "campaign": {
      "medium": "ppc",
      "name": "High Intent Keywords",
      "source": "adwords",
      "term": ""
    },
    "ip": "8.8.8.8",
    "library": {
      "name": "analytics.js",
      "version": "3.11.4"
    },
    "locale": "en-US",
    "page": {
      "path": "/press-release-examples",
      "referrer": "https://www.google.com/",
      "search": "?utm_term=&utm_campaign=High+intent+Keywords&utm_source=adwords&utm_medium=ppc&hsa_grp=5&hsa_cam=160858s191&hsa_src=g&hsa_tgt=dsa-877asda743510184&hsa_net=adwords&hsa_ad=4197063512adas53&hsa_kw=&hsa_ver=3&gclid=CjwKCAjw1v_0BRasdasdAkEiwALFkj5darpcyuadczuYZRGOVBxudIwY3Xl7PBqJjxCyAUaEElPbyCqToEq0QwXBhoCvl8QAvD_BwE",
      "title": "How to craft a killer Press Release (with 142 examples)",
      "url": "https://www.prezly.com/press-release-examples?utm_term=&utm_campaign=Re+-+High+intent+keywords&utm_source=adwords&utm_medium=ppc&hsa_grp=99459701487&hsa_mt=b&hsa_acc=8390923425&hsa_cam=160858191&hsa_src=g&hsa_tgt=dsa-877743510184&hsa_net=adwords&hsa_ad=419706351253&hsa_kw=&hsa_ver=3&gclid=CjwKCAjw1v_0BRAkEiwALFkj5rpcyuczuYZRGOVBxudIWY3Xl7PBqJjCyAUaEElPbyCqToEq0QwXBhoCvl8QAvD_BwE"
    },
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
  },
  "integrations": {
    "All": true
  },
  "messageId": "ajs-88055880c705d64737bd77afe09fe03d",
  "originalTimestamp": "2020-04-22T19:49:24.086Z",
  "properties": {
    "path": "/press-release-examples",
    "referrer": "https://www.google.com/",
    "search": "?utm_term=&utm_campaign=Re+-+High+intent+keywords&utm_source=adwords&utm_medium=ppc&hsa_grp=99459701487&hsa_mt=b&hsa_acc=8390923425&hsa_cam=160858191&hsa_src=g&hsa_tgt=dsa-877743510184&hsa_net=adwords&hsa_ad=419706351253&hsa_kw=&hsa_ver=3&gclid=CjwKCAjw1v_0BRAkEiwALFkj5rpcyuczuYZRGOVBxudIWY3Xl7PBqJjCyAUaEElPbyCqToEq0QwXBhoCvl8QAvD_BwE",
    "title": "How to craft a killer Press Release (with 142 examples)",
    "url": "https://www.prezly.com/press-release-examples?utm_term=&utm_campaign=Re+-+High+intent&utm_source=adwords&utm_medium=ppc&hsa_grp=99459701487&hsa_mt=b&hsa_acc=8390923425&hsa_cam=160858191&hsa_src=g&hsa_tgt=dsa-877743510184&hsa_net=adwords&hsa_ad=419706351253&hsa_kw=&hsa_ver=3&gclid=CjwKCAjw1v_0BRAkEiwALFkj5rpcyuczuYZRGOVBxudIWY3Xl7PBqJjCyAUaEElPbyCqToEq0QwXBhoCvl8QAvD_BwE"
  },
  "receivedAt": "2020-04-22T19:49:24.424Z",
  "sentAt": "2020-04-22T19:49:24.090Z",
  "timestamp": "2020-04-22T19:49:24.420Z",
  "type": "page",
  "userId": null
}
events/page.json

Now to test this using HTTPie use this:

http POST https://YOURURL/dev/events < events/identify.json

It should play the event back to you in the response.


A 200 status code means it's now stored in DynamoDB. Here are some more tricks.

I use a variable to store the endpoint host:

export BASE_DOMAIN=https://YOURURL/dev
echo $BASE_DOMAIN

http POST $BASE_DOMAIN/events < events/identify.json
http POST $BASE_DOMAIN/events < events/page.json

From now on I will use this shorter syntax.

If you want to add console.log statements to your code and see their output, use the following command:

sls logs --function=hello -t | tr '\r' '\n'

The part after the pipe fixes an issue with line endings in Node 10's Lambda logs. It's a known bug.
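To see what that tr does, a quick sketch: carriage returns in the log stream get turned into real newlines, so each log entry displays on its own line.

```shell
# tr replaces every carriage return (\r) with a newline (\n),
# the same transformation applied to the sls logs output above
printf 'first line\rsecond line\r' | tr '\r' '\n'
```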

See you tomorrow!