AWS ECS Fargate Container Log Shipping using Vector

Vaishnavi Abirami
7 min read · Mar 3, 2022


In this article, we will see how to use Vector to ship AWS ECS Fargate application logs to Elasticsearch and a few other destinations. We run Vector as a sidecar container in the AWS Fargate task to accomplish this.

Today's microservices-based applications generate millions of logs a day, and we are interested in ingesting those logs into analytics engines like Elasticsearch or Splunk as well as monitoring tools like Datadog. If we had to ship logs to two destinations (say Elasticsearch and Datadog) directly from an application, we would need a separate implementation for each destination's API, and every additional destination would mean yet another implementation. To avoid this and keep things simple, we can use Vector, a high-performance system that ships our app logs with just a 20-30 line configuration file. Now let's go into further detail.

What is Vector?

Vector is a high-performance observability data pipeline that lets you collect data from any number of sources, transform it into the desired form, and route all of your logs and metrics to any number of destinations.

Vector can automatically handle rate changes for HTTP-based event destinations (like Elasticsearch) via an adaptive concurrency mechanism, and can perform on-disk buffering if needed. It supports collecting log events from many sources, for example as an HTTP or file listener, and publishing them to Amazon S3, Elasticsearch, Kafka, among others. Its event-processing transforms can help manipulate log events to conform to schemas (e.g. the Elastic Common Schema (ECS)). Vector can collect host metrics, and it exposes its own internal metrics that can be scraped by Prometheus via a corresponding sink. Vector also integrates closely with Kubernetes; you can read more about the Kubernetes integration with Vector here.

Why Vector?

We chose Vector as the log shipper for our application because it is cost-effective and requires very little code: a few lines of configuration are enough to ship logs from various sources to different destinations, as we have done in our use case (shipping logs from an ECS Fargate container to multiple destinations, Elasticsearch and AWS CloudWatch).

Another reason to choose Vector over Logstash or Fluentd is the Vector Remap Language (VRL), the heart of data processing in Vector.

VRL is an expression-oriented language designed to work with observability data in a highly performant manner. Read here for more information on this topic.
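As an illustrative sketch (the component and field names here are hypothetical), a few lines of VRL inside a remap transform are enough to parse, reshape, and drop fields:

```toml
[transforms.cleanup]
type = "remap"
inputs = ["my_source"]  # hypothetical source name
source = '''
.parsed = parse_json!(string!(.message))  # parse an embedded JSON string
.severity = downcase!(.severity)          # normalize casing
del(.tag)                                 # drop an unwanted field
'''
```

The `!` suffix on functions like `parse_json!` makes failures abort processing of that event instead of passing malformed data downstream.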

To learn more about Vector, this would be a great place to start.

Pre-requisites

  • AWS Fargate
  • Docker
  • Vector

Installing Vector

Vector can be installed easily by using the following command.

curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | bash

On macOS, Vector installation is made easy with Homebrew:

brew install vector

Installation methods for other operating systems and platforms are given here.

Once Vector is installed, you can check the installed version with:

vector --version

To run Vector locally, create a configuration file with a .toml extension containing the necessary configuration, which we will describe in the later steps.

Vector can be run with the following command

vector --config ./vector.toml
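To try this out before building the real pipeline, a minimal vector.toml that simply echoes stdin to the console might look like this (a sketch, not our production config):

```toml
# Minimal pipeline: read lines from stdin and print them back out
[sources.in]
type = "stdin"

[sinks.out]
type = "console"
inputs = ["in"]
encoding.codec = "text"
```

Running `echo "hello world" | vector --config ./vector.toml` should print the event back to the console, confirming the installation works.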

Now let's go into more detail on how we have utilised Vector in our application.

Architecture Diagram

Fig 1: Architecture Diagram to Ship Logs from ECS Fargate Container to Elasticsearch and Cloudwatch

1. Vector can be used in an AWS Fargate task definition as a sidecar that acts as the log shipper for the main (service) container's logs.

Fargate supports the splunk log driver for log shipping, and Vector provides a Splunk HTTP Event Collector (splunk_hec) source that can be configured to ingest logs from the main container and ship them to a remote sink. To achieve this:

  • Create a container image with Vector, any utilities, and the config files needed. The source specified in the Vector config should be splunk_hec, with a port and token. A detailed walkthrough of the Vector configuration is given in the later sections.
  • In Fargate task specification, choose splunk type for the main (service) container’s log driver and set the following fields:
splunk-url: http://0.0.0.0:8088 # port in vector config
splunk-token: abc # token value in vector config
splunk-verify-connection: false # REQUIRED for Fargate 1.4; works without this field in Fargate 1.3
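In the task definition JSON, the main container's log configuration would then look roughly like this (the URL and token are the placeholder values from above):

```json
"logConfiguration": {
  "logDriver": "splunk",
  "options": {
    "splunk-url": "http://0.0.0.0:8088",
    "splunk-token": "abc",
    "splunk-verify-connection": "false"
  }
}
```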

2. The Vector container runs on the same host as our application container. Vector pulls the logs from the local Splunk HEC endpoint, which is the source.

3. Vector transforms the logs received from the Splunk HEC into our application log format.

Format received from Splunk HEC:

{
  "host": "host-1234",
  "message": "{\"appname\":\"benefritz\",\"timestamp\":\"2022-02-24T07:36:44.456Z\",\"facility\":\"authpriv\",\"host\":\"some.de\",\"message\":\"We're gonna need a bigger boat\",\"msgid\":\"ID191\",\"procid\":\"9473\",\"severity\":\"crit\"}",
  "source": "stdin",
  "source_type": "splunk_hec",
  "tag": "123"
}

Transformed Format:

{
  "appname": "benefritz",
  "facility": "authpriv",
  "hostname": "some.de",
  "message": "We're gonna need a bigger boat",
  "msgid": "ID191",
  "procid": 9473,
  "severity": "crit",
  "timestamp": "2022-02-24T07:36:44.456Z"
}
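A remap transform that could produce this output might look like the sketch below (the component names are illustrative):

```toml
[transforms.app_format]
type = "remap"
inputs = ["splunk_in"]  # the splunk_hec source
source = '''
# The application log arrives as a JSON string in .message
. = parse_json!(string!(.message))
.hostname = del(.host)      # rename host -> hostname
.procid = to_int!(.procid)  # procid as a number, not a string
'''
```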

4.a. Vector then sends the transformed message to the Elasticsearch sink. We have defined some properties for the data being sent to Elasticsearch; the important ones are as follows:

Buffer

We have set the buffer type to disk, which persists queued events to the machine's disk storage. Through this type of storage we can prevent data loss when the container crashes for some reason.

Concurrency

The type of concurrency used is adaptive. We chose adaptive concurrency because Vector itself manages the concurrency in this case, adjusting the request rate based on the response codes and round-trip times (RTT) from the Elasticsearch server.

If the RTT is decreasing/constant and/or response codes are consistently successful (200–299), Vector sees 🟢 and increases the throughput linearly.

If the RTT is increasing and/or response codes consistently indicate failure (codes like 429 Too Many Requests and 503 Service Unavailable), Vector sees 🟡 and exponentially decreases concurrency.

4.b. Vector also sends the transformed message to the AWS CloudWatch Logs sink. Some of the important properties for the aws_cloudwatch_logs sink are as follows:

  • group_name: the CloudWatch log group name
  • stream_name: the CloudWatch log stream name. We use a value of the form <cloudwatch_log_group_name>/<hostname extracted from the incoming message>; the second part could be any field based on your choice.
  • concurrency: set to "adaptive", as for the Elasticsearch sink

Configuration

Following is the configuration for all the flows we saw above.

Create a file named config.toml in the project's directory.

A Vector pipeline is built from three types of components:

  • Source: Vector wouldn't be very useful if it couldn't ingest data. A source defines where Vector should pull data from, or how it should receive data pushed to it. A topology can have any number of sources, and as they ingest data they normalize it into events. This sets the stage for easy and consistent processing of your data. Examples of sources include file, syslog, statsd, and stdin.
  • Transform: a transform is responsible for mutating events as they are transported by Vector. This might involve parsing, filtering, sampling, or aggregating. You can have any number of transforms in your pipeline, and how they are composed is up to you.
  • Sink: a sink is a destination for events. Each sink's design and transmission method is dictated by the downstream service it interacts with. The socket sink, for example, streams individual events, while the aws_s3 sink buffers and flushes data.
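The configuration below is a sketch of config.toml reconstructed from the flow described above; the endpoints, tokens, component names, and region are placeholders you would replace with your own values:

```toml
# Source: receive logs from the container's splunk log driver
[sources.splunk_in]
type = "splunk_hec"
address = "0.0.0.0:8088"
token = "abc"

# Transform: unwrap the JSON application log embedded in .message
[transforms.app_format]
type = "remap"
inputs = ["splunk_in"]
source = '''
. = parse_json!(string!(.message))
.hostname = del(.host)
.procid = to_int!(.procid)
'''

# Sink 1: Elasticsearch, with a disk buffer and adaptive concurrency
[sinks.es_out]
type = "elasticsearch"
inputs = ["app_format"]
endpoint = "https://my-es-cluster.example.com:9200"

[sinks.es_out.buffer]
type = "disk"
max_size = 268435488  # bytes

[sinks.es_out.request]
concurrency = "adaptive"

# Sink 2: AWS CloudWatch Logs, one stream per hostname
[sinks.cw_out]
type = "aws_cloudwatch_logs"
inputs = ["app_format"]
group_name = "my-app-logs"
stream_name = "my-app-logs/{{ hostname }}"
region = "us-east-1"
encoding.codec = "json"

[sinks.cw_out.request]
concurrency = "adaptive"
```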

The source section of the configuration defines the Splunk HEC source.

The transform section converts the incoming message into the desired format.

The Elasticsearch sink section sends the transformed message to Elasticsearch.

The CloudWatch sink section sends the transformed message to AWS CloudWatch Logs.

The explanation of the above code is already covered in the architecture flow section.

Deployment

We can build this as a Docker image and push it to Docker Hub. We can then pull the image and run it as a container alongside our main service container in an AWS ECS service.

Following are the steps involved in deployment.

Step 1: Create a Dockerfile with the following contents

FROM docker.io/timberio/vector:0.18.1-alpine
COPY config.toml /etc/vector/vector.toml

Step 2: Build the docker image

docker build -t <app_name> .

Step 3: Login to a docker hub you wish to push the image to

docker login --username=<username>

Step 4: Push the image to the docker repository or docker HUB

docker push <app_name>

Now the Docker image is available on Docker Hub or your chosen registry.

We need to add an additional container for vector in the ECS Service Task Definition and the docker image used here would be the one created in the previous step. We need to update the AWS ECS service with the latest task definition which contains both the main service container and the vector container.
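For illustration, the relevant part of the updated task definition might look like this (the image names are placeholders, and fields such as CPU and memory are omitted):

```json
"containerDefinitions": [
  {
    "name": "app",
    "image": "<your-app-image>",
    "essential": true,
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "http://0.0.0.0:8088",
        "splunk-token": "abc",
        "splunk-verify-connection": "false"
      }
    }
  },
  {
    "name": "vector",
    "image": "<app_name>",
    "essential": true
  }
]
```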

Once the ECS Service is successfully running, we can see the logs getting forwarded to both ElasticSearch and the AWS CloudWatch Logs through vector.

We can also write unit tests for the Vector configuration file to verify that the source, transform, and sinks work as expected. Here is the reference for the same.
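For example, a unit test block can be appended to config.toml and run with `vector test ./config.toml`. The sketch below assumes a remap transform named app_format that unwraps the JSON embedded in .message:

```toml
[[tests]]
name = "parses the embedded application log"

[[tests.inputs]]
insert_at = "app_format"
type = "log"

[tests.inputs.log_fields]
message = "{\"appname\":\"benefritz\",\"host\":\"some.de\",\"procid\":\"9473\",\"severity\":\"crit\"}"

[[tests.outputs]]
extract_from = "app_format"

[[tests.outputs.conditions]]
type = "vrl"
source = '.appname == "benefritz" && .hostname == "some.de" && .procid == 9473'
```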

Conclusion:

Vector enables dramatic cost reduction, novel data enrichment, and data security where you need it, not where it is most convenient for your vendors. Additionally, it is open source and, according to its own benchmarks, up to 10x faster than alternatives in the space.

Some of the use cases of vector are as follows:

  • Reduce total observability costs.
  • Transition vendors without disrupting workflows.
  • Enhance data quality and improve insights.
  • Consolidate agents and eliminate agent fatigue.
  • Improve overall observability performance and reliability.

Hope this helps! Thanks for Reading :)
