What would happen if you unexpectedly had to build your entire production infrastructure from scratch? Would you be able to perform a full recovery of all services and dependencies to an acceptable level? How long would it take? Hours? Days? Would the engineering team know what to do? What problems would you encounter? What about full data recovery and databases? Are backups available? How would you manage operations and set expectations across the business and with clients? This is the kind of nightmare scenario that keeps any SRE awake at night, especially if you're running a SaaS platform. There is a common perception that these events come straight out of the Black Swan Theory: they have a profound impact when they (rarely) happen, and they always arrive as a surprise. But they are less rare than we think. In the last couple of years, I've seen major security incidents…
fmarques.org Posts
Update (12/4/20): I highly recommend using the latest Amazon ECS-optimized Amazon Linux 2 AMI. It uses Docker's OverlayFS (overlay2) storage driver, and the same partition holds the OS, Docker images and metadata, which makes filesystem usage easier to monitor with the Prometheus Node exporter. If you still have to use the older ECS AMI (v2015.09.d or later), this article might be useful for you. There are plenty of tools out there for monitoring Docker with Prometheus. We can use the Node exporter to gather useful information about Docker hosts at the OS/kernel level (memory, CPU, network, filesystem), and at the container level there is cAdvisor, which reports resource usage and performance data. Unfortunately, I couldn't find any way of monitoring Docker thin pool usage with Prometheus, so I wrote a quick Python script to generate usage metrics that are exposed through the Node exporter's textfile collector. So first, what does "Thin Pool"…
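As a rough illustration of that approach (a minimal sketch, not the script from the post), the Python below assumes the thin pool is an LVM thin pool named `docker/docker-pool` and that the `lvs` command is available on the host. The metric names, the pool name and the output path are hypothetical; point `OUTPUT` at whatever directory you pass to the Node exporter's `--collector.textfile.directory` flag. It reads data and metadata usage with `lvs`, renders both values as gauges in the Prometheus text exposition format, and writes a `.prom` file.

```python
"""Minimal sketch (hypothetical names): export Docker thin pool usage
for the Node exporter textfile collector."""
from __future__ import annotations

import os
import shutil
import subprocess
import tempfile

# Assumptions: adjust the volume group/pool and collector directory for your host.
THINPOOL = "docker/docker-pool"
OUTPUT = "/var/lib/node_exporter/textfile_collector/docker_thinpool.prom"


def read_lvs(thinpool: str = THINPOOL) -> str:
    """Ask LVM for the pool's data and metadata usage as comma-separated values."""
    result = subprocess.run(
        ["lvs", "--noheadings", "--nosuffix", "--separator", ",",
         "-o", "lv_name,data_percent,metadata_percent", thinpool],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_lvs(output: str) -> tuple[str, float, float]:
    """Parse a line such as '  docker-pool,12.34,5.67' into (name, data%, metadata%)."""
    line = output.strip().splitlines()[0]
    name, data, meta = (field.strip() for field in line.split(","))
    return name, float(data), float(meta)


def render_metrics(name: str, data_pct: float, meta_pct: float) -> str:
    """Render both percentages as gauges in the Prometheus text exposition format."""
    lines = []
    for metric, value, help_text in (
        ("docker_thinpool_data_used_percent", data_pct, "Thin pool data space used (%)."),
        ("docker_thinpool_metadata_used_percent", meta_pct, "Thin pool metadata space used (%)."),
    ):
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f'{metric}{{pool="{name}"}} {value}')
    return "\n".join(lines) + "\n"


def write_atomically(path: str, content: str) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600; let the exporter's user read it
    os.replace(tmp, path)


def main() -> None:
    write_atomically(OUTPUT, render_metrics(*parse_lvs(read_lvs())))


# Only run on hosts that have LVM tools and the collector directory.
if __name__ == "__main__" and shutil.which("lvs") and os.path.isdir(os.path.dirname(OUTPUT)):
    main()
```

You would run something like this periodically, from cron or a systemd timer for example. Writing to a temporary file and renaming it keeps the Node exporter from reading a half-written file, and the `.tmp` suffix matters because the textfile collector only picks up files ending in `.prom`.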
I had the opportunity to write an article for the AWS Startups Blog, explaining how we use EC2 Spot Instances with ECS at Signal: "Every day, Signal ingests millions of documents from a growing number of publishers, including online media, print newspapers, broadcast, regulation and legislation. Our text analytics pipeline processes these documents in real time, applying our own AI algorithms and machine learning, preparing them to be searched from our application and distributed via our alerts system. The entire Signal platform is built on a large number of microservices running in Docker containers deployed to Amazon ECS. In fact, we run almost all of our workloads on top of several ECS clusters, including ingestion, processing and consumption. With the hypergrowth of our platform, we started to face several challenges, primarily around efficiency and capacity planning. We had a lot more questions than answers. What is the best way to…
Man on the walk. #bordeaux A post shared by Frederico Marques (@freddygoestolondon) on Apr 16, 2018 at 12:31pm PDT
Good morning. #chinatown #london A post shared by Frederico Marques (@freddygoestolondon) on Feb 11, 2018 at 4:07am PST
Bill Bryson (A Short History of Nearly Everything) knows how to write rich and engaging history books (he's a remarkable storyteller). One Summer is a spectacular book for understanding America in the mad 1920s (and all the craziness around Charles Lindbergh, including early aviation history). A must-read.
Halford Engine collection A post shared by Frederico Marques (@freddygoestolondon) on Oct 29, 2017 at 11:03am PDT
Look at the robot #tatemodern #london A post shared by Frederico Marques (@freddygoestolondon) on Apr 30, 2017 at 9:06am PDT
This is a common trend. You've been using Ansible to provision your infrastructure for some time, and all of a sudden you have a couple of secrets to manage: usually SSL/SSH private keys, API credentials, passwords, etc. Because you don't want these secrets stored "in the clear" in your git repository, you declare them as variables inside YAML files and then use Ansible Vault to encrypt them with an AES symmetric key. You can then run ansible-playbook with --ask-vault-pass, so the YAML var files get decrypted on the fly when the playbook runs. Sometimes I use Ansible together with other tools in the same repository. For example, I prefer to provision AWS infrastructure with Terraform, then call Ansible as a provisioner to customize an EC2 instance, and Cloudflare to update the DNS record. Or I use Packer to bake an AMI, with Ansible as a local provisioner. In…
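A related safeguard for keeping secrets out of git in the clear, offered as an assumption rather than as part of the workflow above: a minimal Python sketch of a pre-commit check that rejects a vars file unless it begins with the Ansible Vault header. Files encrypted with `ansible-vault encrypt` start with a line such as `$ANSIBLE_VAULT;1.1;AES256`. The function names are hypothetical, and the sketch assumes it is wired up through a hook manager such as the pre-commit framework, which passes the matching filenames as arguments.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit check: fail if a vault vars file is not encrypted.

Assumes secrets live in whole files encrypted with `ansible-vault encrypt`,
whose first line is a header such as `$ANSIBLE_VAULT;1.1;AES256`.
"""
from __future__ import annotations

import sys
from pathlib import Path

VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_vault_encrypted(text: str) -> bool:
    """True if the content starts with the Ansible Vault header."""
    return text.startswith(VAULT_HEADER)


def find_plaintext(paths: list[str]) -> list[str]:
    """Return the paths whose content is not Vault-encrypted."""
    return [
        path for path in paths
        if not is_vault_encrypted(Path(path).read_text(encoding="utf-8", errors="replace"))
    ]


def main(argv: list[str]) -> int:
    plaintext = find_plaintext(argv)
    for path in plaintext:
        print(f"{path}: not encrypted with ansible-vault, refusing to commit", file=sys.stderr)
    return 1 if plaintext else 0


if __name__ == "__main__":
    # A non-zero exit makes the hook fail; on success the script simply ends (exit 0).
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

This header check only fits whole-file encryption: values encrypted inline with `ansible-vault encrypt_string` sit inside otherwise plain YAML, so files using that style would be flagged.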
Mr Smith #cats A photo posted by Frederico Marques (@freddygoestolondon) on Nov 25, 2016 at 12:44pm PST