This blog post walks through how you can reduce eye-watering AWS network costs using VPC flow logs, AWS Athena and VPC endpoints. We’ll introduce these AWS services and components, explain the different types of AWS network costs, and discuss some methods for identifying and implementing cost savings. Doing all of this can potentially result in a significant reduction in cloud network costs, with savings ranging from hundreds of thousands of dollars to even millions per year.
Some basic knowledge of AWS services like S3, VPC and their associated components is assumed. First, let’s briefly run through the AWS services and features used.
VPC flow logs are a feature of the AWS VPC service that allows us to capture IP packet metadata for traffic going to and from elastic network interfaces (ENIs) in our VPC. However, they do not capture the packet contents themselves; for that we’d need a tool such as Wireshark or tcpdump.
Flow logs are made up of rows (records) with specific fields such as srcaddr, dstport, protocol and action. The protocol field holds the IANA protocol number (6 is TCP), and start and end are Unix timestamps in seconds. An example is shown below:
| Version | Account ID | Interface ID | Src Addr | Dst Addr | Src Port | Dst Port | Protocol | Packets | Bytes | Start | End | Action | Log Status |
|---------|-------------- |-------------------------|-----------|-----------|----------|----------|----------|---------|-------|-------------|-------------|---------|------------|
| 2 | 123456789012 | eni-0a1b2c3d4e5f67891 | 10.1.1.1 | 10.1.2.1 | 49538 | 80 | 6 | 25 | 1948 | 1725480562 | 1725480592 | ACCEPT | OK |
| 2 | 123456789012 | eni-0a1b2c3d4e5f67895 | 10.1.3.1 | 10.1.1.1 | 10004 | 54296 | 6 | 6 | 414 | 1725480562 | 1725480592 | ACCEPT | OK |
| 2 | 123456789012 | eni-0a1b2c3d4e5f67883 | 10.1.1.1 | 10.1.4.1 | 8090 | 40870 | 6 | 6 | 414 | 1725480562 | 1725480592 | ACCEPT | OK |
| 2 | 123456789012 | eni-0a1b2c3d4e5f67913 | 10.1.1.1 | 10.1.5.1 | 56138 | 3306 | 6 | 6 | 363 | 1725480562 | 1725480592 | ACCEPT | OK |
| 2 | 123456789012 | eni-0a1b2c3d4e5f67890 | 10.1.6.1 | 10.1.1.1 | 10004 | 51414 | 6 | 6 | 414 | 1725480562 | 1725480592 | ACCEPT | OK |
This gives us crucial visibility into traffic flows within our VPCs and can help us identify traffic bottlenecks, validate security group and NACL configurations, and detect any unusual or potentially malicious activity. It can also help identify sub-optimal traffic patterns, which often result in increased cloud network costs.
It’s worth noting that flow logs do not capture all VPC traffic; notable exceptions include DHCP and Amazon DNS server traffic.
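To make the record layout concrete, here is a minimal Python sketch that parses records in the default format shown in the table above and totals the bytes per source/destination pair. It assumes the space-separated text format with the 14 default fields in that order; `FlowRecord`, `parse_record` and `bytes_by_pair` are illustrative names, not part of any AWS SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """The subset of default flow log fields used in this sketch."""
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    num_bytes: int
    action: str


def parse_record(line: str) -> FlowRecord:
    """Parse one space-separated record in the default (version 2) format."""
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    (_version, _account, _eni, srcaddr, dstaddr, srcport, dstport,
     protocol, packets, num_bytes, _start, _end, action, _status) = parts
    return FlowRecord(srcaddr, dstaddr, int(srcport), int(dstport),
                      int(protocol), int(packets), int(num_bytes), action)


def bytes_by_pair(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Sum bytes per (srcaddr, dstaddr), skipping records whose status is not OK."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for line in lines:
        # NODATA and SKIPDATA records carry "-" in the numeric fields.
        if not line.strip() or line.split()[-1] != "OK":
            continue
        rec = parse_record(line)
        totals[(rec.srcaddr, rec.dstaddr)] += rec.num_bytes
    return dict(totals)


# Two rows copied from the example table above.
SAMPLE = [
    "2 123456789012 eni-0a1b2c3d4e5f67891 10.1.1.1 10.1.2.1 49538 80 6 25 1948 1725480562 1725480592 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d4e5f67895 10.1.3.1 10.1.1.1 10004 54296 6 6 414 1725480562 1725480592 ACCEPT OK",
]

if __name__ == "__main__":
    for pair, total in bytes_by_pair(SAMPLE).items():
        print(pair, total)
```

Sorting the resulting totals in descending order is a simple way to spot the "top talkers" in a small sample before moving to Athena for the full data set.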
Athena is a powerful serverless query service that allows us to analyse data such as logs stored in Amazon S3 using simple SQL queries. It’s particularly useful for VPC flow log analysis, due to the gigantic volumes of data often created from network traffic flows. You simply define your schema, point to your S3 bucket and execute SQL queries, whilst only paying for the data you scan.
S3 buckets serve as the fundamental storage layer, where raw VPC flow logs are stored with a time-based partition structure. These logs can be in different formats such as plain text or Parquet.
Athena tables define the schema that maps the raw data to a structured format we can then query with SQL. In a nutshell, we’re telling Athena how to interpret each field in our network logs. It’s worth mentioning that tables don’t actually contain any data; rather, a table is just a layer on top that describes the data structure and points to the S3 files where the actual data lives.
Databases in Athena allow us to group and categorise multiple tables. Again, they don’t store data. For example, we might have a database called network_logs with multiple tables containing data from different VPCs.
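As a rough illustration of how a table is only a schema plus a pointer to S3, the Python sketch below assembles a `CREATE EXTERNAL TABLE` statement for the default flow log fields. The column types follow the pattern AWS documents for querying flow logs with Athena, but treat the details as assumptions to check against your own log format: the table name `vpc_prod`, the bucket path and the single `date` partition column are all hypothetical.

```python
# Columns for the default flow log fields. `end` is backtick-quoted because it
# is a reserved word in Athena DDL.
COLUMNS = [
    ("version", "int"), ("account_id", "string"), ("interface_id", "string"),
    ("srcaddr", "string"), ("dstaddr", "string"),
    ("srcport", "int"), ("dstport", "int"),
    ("protocol", "bigint"), ("packets", "bigint"), ("bytes", "bigint"),
    ("start", "bigint"), ("`end`", "bigint"),
    ("action", "string"), ("log_status", "string"),
]


def build_create_table(database: str, table: str, s3_location: str) -> str:
    """Return DDL for an external table over space-delimited text flow logs."""
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "PARTITIONED BY (`date` date)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '\n"
        f"LOCATION '{s3_location}'\n"
        # Text flow log files delivered to S3 start with a header line.
        'TBLPROPERTIES ("skip.header.line.count"="1")'
    )


if __name__ == "__main__":
    # Hypothetical table name and bucket path, for illustration only.
    print(build_create_table(
        "network_logs",
        "vpc_prod",
        "s3://example-flow-logs-bucket/AWSLogs/123456789012/vpcflowlogs/eu-west-1/",
    ))
```

Nothing is copied or loaded when a statement like this runs; dropping the table later leaves the objects in S3 untouched.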
VPC flow log data can run into the terabytes, so scanning and querying all of it can be very time-consuming and, because Athena bills by the data scanned, expensive. This is where partitions help: they optimise query performance by reducing the data scanned. For example, we may only be interested in the network flows over a particular hour, day or month, so we can avoid scanning the whole data set.
If none of this makes sense right now, bear with me and we’ll dive deeper shortly.
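The effect of partitioning can be sketched without touching AWS at all. The Python example below assumes the date-based key layout that flow logs delivered to S3 commonly use (`AWSLogs/<account>/vpcflowlogs/<region>/<year>/<month>/<day>/`) and shows how restricting a query to one day's prefix shrinks the bytes that would need scanning. The object keys and sizes are made up for illustration.

```python
from datetime import date
from typing import Dict, Tuple


def day_prefix(account_id: str, region: str, day: date, base: str = "AWSLogs") -> str:
    """Key prefix holding one day's flow logs under the assumed layout."""
    return f"{base}/{account_id}/vpcflowlogs/{region}/{day:%Y/%m/%d}/"


def prune(objects: Dict[str, int], prefix: str) -> Tuple[Dict[str, int], float]:
    """Keep objects under `prefix`; also return the fraction of total bytes kept."""
    kept = {key: size for key, size in objects.items() if key.startswith(prefix)}
    total = sum(objects.values())
    fraction = sum(kept.values()) / total if total else 0.0
    return kept, fraction


# Hypothetical object keys mapped to their sizes in bytes.
OBJECTS = {
    "AWSLogs/123456789012/vpcflowlogs/eu-west-1/2024/09/03/a.log.gz": 400,
    "AWSLogs/123456789012/vpcflowlogs/eu-west-1/2024/09/04/b.log.gz": 100,
    "AWSLogs/123456789012/vpcflowlogs/eu-west-1/2024/09/05/c.log.gz": 500,
}

if __name__ == "__main__":
    prefix = day_prefix("123456789012", "eu-west-1", date(2024, 9, 4))
    kept, fraction = prune(OBJECTS, prefix)
    print(sorted(kept), f"{fraction:.0%} of bytes scanned")
```

Athena does the equivalent pruning itself when a query filters on a partition column, which is why a well-chosen partition scheme matters more as the data set grows.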
AWS VPC endpoints allow us to form private connections between our VPCs and AWS services, or even third-party services. When we create a VPC endpoint, we essentially open up a private and dedicated passageway for our data, which stays within the AWS backbone network.
VPC endpoints can also improve performance by reducing latency and allowing higher-bandwidth connections to AWS services such as S3. Furthermore, they can reduce the price we pay for data transfer, cutting our cloud bill significantly. This can be a real game changer if you’re pumping some serious traffic out of your VPC.
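A back-of-the-envelope calculation shows why volume matters here. The Python sketch below compares the per-GB cost of the path traffic takes today with the per-GB cost through an endpoint. Both rates and the monthly volume are placeholder assumptions rather than quoted AWS prices, so substitute the figures from the AWS pricing pages for your region. Fixed hourly charges are ignored.

```python
def monthly_saving(gb_per_month: float,
                   current_rate_per_gb: float,
                   endpoint_rate_per_gb: float) -> float:
    """Estimated monthly saving from moving traffic onto a VPC endpoint.

    Only per-GB charges are compared; any fixed hourly charge is left out.
    """
    before = gb_per_month * current_rate_per_gb
    after = gb_per_month * endpoint_rate_per_gb
    return before - after


if __name__ == "__main__":
    # Placeholder inputs: 50 TB a month, with illustrative per-GB rates.
    saving = monthly_saving(50_000, current_rate_per_gb=0.045, endpoint_rate_per_gb=0.01)
    print(f"Estimated saving: ${saving:,.2f} per month")
```

Feeding in the byte totals per destination from the flow logs, rather than a guessed volume, is what turns this from a rough estimate into a prioritised list of candidate endpoints.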