SSL Termination: Nginx
Every application request is first greeted by Nginx. Its job here is twofold:
- redirecting non-secure traffic to HTTPS (301 redirects)
- terminating SSL
For *.prezly.com we use a number of dedicated/wildcard SSL certificates. Client newsrooms on custom domains are served using Let's Encrypt certificates.
Nginx hands requests off to Varnish or HAProxy based on some hard-coded rules. E.g. requests on rock.prezly.com are passed straight to the load balancer because there is no use in caching them. 90% of requests are passed on to Varnish though.
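That split can be sketched in a stripped-down Nginx config. This is an illustration only: the hostnames come from the post, but the ports, certificate paths and upstream addresses are assumptions.

```nginx
# Plain HTTP: 301-redirect everything to SSL
server {
    listen 80;
    server_name .prezly.com;
    return 301 https://$host$request_uri;
}

# SSL termination for the wildcard domain, then off to Varnish
server {
    listen 443 ssl;
    server_name .prezly.com;
    ssl_certificate     /etc/nginx/ssl/wildcard.prezly.com.crt;  # assumed path
    ssl_certificate_key /etc/nginx/ssl/wildcard.prezly.com.key;  # assumed path

    location / {
        proxy_pass http://127.0.0.1:6081;  # Varnish (assumed port)
        proxy_set_header Host $host;
    }
}

# rock.prezly.com skips the cache and goes straight to the load balancer
server {
    listen 443 ssl;
    server_name rock.prezly.com;
    ssl_certificate     /etc/nginx/ssl/wildcard.prezly.com.crt;  # assumed path
    ssl_certificate_key /etc/nginx/ssl/wildcard.prezly.com.key;  # assumed path

    location / {
        proxy_pass http://127.0.0.1:8080;  # HAProxy (assumed port)
        proxy_set_header Host $host;
    }
}
```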
Caching: Varnish
Frigging awesome caching layer. Prezly serves newsrooms for a number of large enterprise brands, so we encounter traffic peaks from time to time. Did anyone lose their aircraft?
We pretty much consider Varnish's bottleneck to be the uplink from our infrastructure to the outside world. During large peaks I tend to log in to see how our load balancers are holding up. Most of the time the log shipping takes more resources than handling the traffic itself. Again, Varnish is friggin awesome!
Did I mention it makes us lazy application engineers too?
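A minimal VCL in that spirit might look like this. Purely a sketch: the backend address, port and TTL are assumptions, not our production policy.

```vcl
vcl 4.0;

# Varnish sits between Nginx and HAProxy (address/port assumed)
backend haproxy {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Even a short TTL on newsroom pages absorbs most of a traffic peak
    if (beresp.ttl <= 0s) {
        set beresp.ttl = 60s;
    }
}
```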
Load Balancer: HAProxy
Distributing traffic to the different application servers is done by HAProxy. The number of active webservers changes constantly, but haproxy.cfg is modified automatically by Chef scripts upon server initialisation.
Requests are split into a few categories/applications (api, backend, frontend, website), and each of those applications has its own health checks. When a health check fails (server failure, load problem, bad code, …) the request is passed on to the next available server. Cowabunga!
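In haproxy.cfg terms, the per-application split with health checks looks roughly like this. The ACL, server addresses and the /healthcheck path are assumptions for illustration.

```haproxy
frontend www
    bind 127.0.0.1:8080
    acl is_api hdr(host) -i api.prezly.com
    use_backend api if is_api
    default_backend frontend_pool

backend api
    option httpchk GET /healthcheck
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check

backend frontend_pool
    # A failing check pulls the server from rotation;
    # traffic moves to the next available server automatically.
    option httpchk GET /healthcheck
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```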
Webservers: Apache & PHP
To keep this post at a reasonable length I won't go into application framework internals; I'll stick to what fits the context of this post.
- Apache 2.4.7
- PHP 7.0.7
The pool of active webservers consists of 4 machines. To minimise the monthly AWS invoice, everything is scaled down to a single webserver during weekends/downtime. Sessions are shared using a memcached instance.
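With the php-memcached extension, that session sharing comes down to two ini settings — a sketch, and the hostname is an assumption:

```ini
; php.ini — store sessions in a shared memcached instance
session.save_handler = memcached
session.save_path = "sessions.internal:11211"
```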
Recently, upgrading from PHP 5.x to PHP 7.x had a HUGE impact on performance. Application response times for pages with some basic rendering logic/data fetching went from a 74ms average to less than 50ms. Memory usage is a lot lower, and overall the application feels more stable and robust. Where large traffic peaks used to spin up as many as 4 webservers, since the upgrade I haven't seen a load-based spin-up go beyond 2.
Workers: SQS
We use a number of SQS queues to process background jobs. Those jobs are split across different queues (p1 -> p4) which are handled by long-polling PHP scripts. PHP is daemonised using supervisord, and we spin up around 10 daemons per worker instance.
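A supervisord program entry along these lines spins up the 10 daemons per instance. The worker entry point and its flag are hypothetical; the numprocs mechanics are standard supervisord.

```ini
[program:worker-p1]
; hypothetical worker script and queue flag
command=php /var/www/current/worker.php --queue=p1
numprocs=10
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
```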
The good thing about this approach is that SQS supports long polling, which has a great impact on the performance of the PHP daemons. Before SQS we had a homegrown queuing engine that polled in a tight wait() loop, which caused CPU load to go crazy at high concurrency.
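The difference is easy to demonstrate outside of SQS/PHP. Here is a minimal Python sketch (not our actual workers): a blocking receive with a timeout, which is how long polling behaves — the worker sleeps in the kernel while waiting instead of burning CPU in a loop.

```python
import queue
import threading

def long_poll_worker(jobs, results, wait_seconds=1.0):
    """Drain jobs with a blocking get(), analogous to SQS long polling."""
    while True:
        try:
            # Blocks for up to wait_seconds; no CPU burned while waiting.
            job = jobs.get(timeout=wait_seconds)
        except queue.Empty:
            return  # nothing arrived inside the long-poll window
        results.append(job * 2)  # stand-in for real job handling

jobs = queue.Queue()
for i in range(5):
    jobs.put(i)

results = []
worker = threading.Thread(target=long_poll_worker, args=(jobs, results))
worker.start()
worker.join()
print(sorted(results))  # [0, 2, 4, 6, 8]
```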
Worker instances are spawned when the thresholds of the different queues are reached, right up to the point where we start noticing slowdowns in database performance. Those rules are defined using OpsWorks auto-scaling rules.
Static Assets: CloudFront
All images, attachments and videos uploaded by our customers are stored on AWS S3. We make use of CloudFront to serve those assets globally.
Database: PostgreSQL
In the past we have used MySQL, Mongo, CouchDB, MariaDB and probably a few others. Some of them ran in production concurrently, with parts of our data spread across different storage engines. Sounds logical to me.
Today we use PostgreSQL only. Gotta admit, we've had our pain with background jobs, VACUUM/ANALYZE operations and I/O tuning, but we can say that database operations are under control now.
By making use of Amazon RDS we outsource the management and maintenance of our datastore.
Logging: rsyslog & Kibana
For central logging we use rsyslog to ship logs to Elasticsearch. Logs are consumed using Kibana. I plan on writing a post that elaborates on this central logging setup.
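The shipping side of that is essentially a one-line rsyslog forwarding rule. The central hostname and port here are assumptions; in our setup that endpoint feeds Elasticsearch.

```
# /etc/rsyslog.d/central.conf
# @@host = forward over TCP; a single @ would use UDP
*.* @@logs.internal.prezly.com:10514
```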
Monitoring: New Relic & CloudWatch
We use New Relic for both application monitoring and server monitoring:
- Application monitoring gives us good insight into end-user performance and application performance
- Server monitoring checks disk, CPU and memory usage and reports on that in Slack
In addition, we have set up detailed CloudWatch monitoring that is used to autoscale our web and worker servers.