Do you want to know the main components of the AWS virtual private cloud, and fundamentals of VPC Security? Well, by the end of this article, you definitely will.
Let’s begin by defining the AWS Virtual Private Cloud, usually referred to as VPC. A VPC is a private virtual network inside Amazon’s network. Since it’s a private network, it is logically separated from the other VPCs residing on the AWS network, as well as from Amazon’s own internal network. This means that other customers can’t see your data. It also means there’s no problem with IP addressing: you can use a certain range of IP addresses in your VPC, and a separate VPC on the AWS cloud could use the same addresses, because each range is constrained inside its own logical VPC.
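You can see this "same addresses, no conflict" idea with Python's standard `ipaddress` module. This is just an illustrative sketch (not AWS-specific code): two VPCs may use identical ranges because each is its own isolated routing domain, while subnets inside one VPC must not overlap.

```python
import ipaddress

# Two VPCs can each use the same private range, because each VPC is
# its own isolated routing domain.
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.0.0.0/16")

# Within a single VPC, however, subnets must not overlap.
subnet_1 = ipaddress.ip_network("10.0.1.0/24")
subnet_2 = ipaddress.ip_network("10.0.2.0/24")

print(vpc_a.overlaps(vpc_b))        # identical ranges overlap -> True
print(subnet_1.overlaps(subnet_2))  # disjoint subnets -> False
print(subnet_1.subnet_of(vpc_a))    # subnet carved from the VPC range -> True
```

The overlap is harmless across VPCs precisely because no route ever crosses between them unless you deliberately create one.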
What are the key components of a VPC? There are several, and we’ll cover the primary ones: the VPC routing table, the internet gateway, NAT gateways, elastic IP addresses, VPC endpoints, and finally security groups and network access control lists.
The first thing we’ll talk about when it pertains to the VPC is routing and routing tables. The VPC comes with its own highly available virtual router. For those of you who are not familiar with routing and switching, a router is essentially a computer with multiple interfaces and logic that determines how to get traffic from point A to point B. Any time traffic leaves its local subnet, a router has to make forwarding decisions for it, whether that traffic is headed to the internet, to your data center, or to another part of the AWS network. That is why each VPC comes with a virtual router.
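The forwarding decision itself follows a simple rule: the most specific matching route wins. Here is a toy sketch of a route-table lookup in Python; the target names (`vgw-1`, `igw-1`) are made-up identifiers, not real AWS resource IDs:

```python
import ipaddress

# A minimal sketch of a VPC route table: destination CIDR -> target.
# The longest (most specific) matching prefix wins.
route_table = {
    "10.0.0.0/16":    "local",   # traffic staying inside the VPC
    "192.168.0.0/16": "vgw-1",   # hypothetical route to the data center
    "0.0.0.0/0":      "igw-1",   # default route, e.g. an internet gateway
}

def route_lookup(dest_ip):
    dest = ipaddress.ip_address(dest_ip)
    matches = [ipaddress.ip_network(cidr) for cidr in route_table
               if dest in ipaddress.ip_network(cidr)]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return route_table[str(best)]

print(route_lookup("10.0.3.7"))      # -> local
print(route_lookup("192.168.9.1"))   # -> vgw-1
print(route_lookup("93.184.216.34")) # -> igw-1
```

Notice that every address matches the default route, but the more specific `/16` entries take precedence when they apply.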
The next component is something called an internet gateway. If you have systems on the AWS network and you want to connect them to the internet (and I mean truly connect them), you’re going to have to use an internet gateway. What I mean by truly connecting your systems to the internet is that you want them to be fully reachable. If a system needs to be directly on the internet, its traffic must go through an internet gateway, and the system is going to need a public IP address.
You set up an internet gateway in the following manner. First, you attach the internet gateway to your VPC. From there, you create a default route pointing to the internet gateway, and you assign public IP addresses to the systems that need them. Then you configure the security, because once your systems are on the public internet, they can be attacked quite easily.
The setup described above is for true internet connectivity, meaning data flowing both into and out of the servers. In many cases, you have systems that need to access the internet but don’t need to be reachable from the internet. That brings us to the next component of a VPC: the NAT gateway. NAT (network address translation) takes one network address and translates it into another. NAT can also be useful during migrations, such as when an organization purchases another organization that uses the same IP addresses; something needs to be done, because systems with the same IP address can’t communicate.
The reason you use a NAT gateway is that you want systems to reach the internet without being exposed to it. With a NAT gateway, all your systems can keep private IP addresses, which means they’re not reachable from the internet, but they can still reach out to the internet to pull down patches, software updates, and the like. When your instances reach the internet only through a NAT gateway (the gateway itself sits in a public subnet that routes through an internet gateway), they can initiate outbound connections and receive the replies, but the internet cannot initiate connections to them.
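The mechanism behind that one-way behavior is a translation table. Here is a toy model of the idea, not the real AWS service: outbound flows get an entry in the table, replies are matched against it, and unsolicited inbound traffic finds no entry and is dropped. The public address is from the documentation range and is a stand-in.

```python
# Toy NAT sketch: outbound connections from private addresses are
# rewritten to one public address; the translation table lets replies
# find their way back in.
PUBLIC_IP = "203.0.113.10"   # stand-in public address (documentation range)

nat_table = {}               # public_port -> (private_ip, private_port)
next_port = [20000]          # simple allocator for translated ports

def translate_outbound(private_ip, private_port):
    public_port = next_port[0]
    next_port[0] += 1
    nat_table[public_port] = (private_ip, private_port)
    return PUBLIC_IP, public_port

def translate_inbound(public_port):
    # Only replies to existing outbound flows have a table entry;
    # unsolicited inbound traffic is dropped (returns None).
    return nat_table.get(public_port)

pub_ip, pub_port = translate_outbound("10.0.1.25", 44321)
assert translate_inbound(pub_port) == ("10.0.1.25", 44321)  # reply allowed
assert translate_inbound(9999) is None                      # unsolicited: dropped
```

That missing-entry case is exactly why a server behind NAT can download its patches but cannot be scanned from the internet.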
Elastic IP Addresses
Moving on to the next component of the VPC: elastic IP addresses. Any system that needs to be reachable from the internet in some way, shape, or form must have a public IP address. It doesn’t have to be on the web server per se; it could be on an elastic load balancer. But it must be understood that if a system is going to be reachable from the internet, it must have a public IP address. Public IPv4 addresses are in short supply, so Amazon maintains a pool of them.
This pool is called elastic IP addresses. When a customer needs a public address, they take one from the pool of elastic IPs and use it for as long as they need. As soon as the organization is done with the elastic IP address (maybe the server is out of commission, or it was only needed temporarily), the address is returned to the pool, and Amazon can hand it to another customer. This system works well: customers keep private IP addresses inside their network and get public IP addresses as needed, with no registration hurdles, because the addresses come directly from Amazon.
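The allocate-and-return lifecycle can be sketched as a tiny pool class. This is only an illustration of the idea, not how Amazon implements it; the addresses are from a documentation range:

```python
# Toy model of an elastic IP pool: addresses are borrowed from a shared
# pool and go back into it when released, ready for the next customer.
class ElasticIPPool:
    def __init__(self, addresses):
        self.available = set(addresses)
        self.allocated = {}             # address -> customer holding it

    def allocate(self, customer):
        address = self.available.pop()  # take any free address
        self.allocated[address] = customer
        return address

    def release(self, address):
        del self.allocated[address]
        self.available.add(address)     # returned for reuse

pool = ElasticIPPool(["198.51.100.1", "198.51.100.2"])
eip = pool.allocate("customer-a")
pool.release(eip)                       # server decommissioned
eip2 = pool.allocate("customer-b")      # same scarce address, new customer
```

The point of the model: the scarce resource is shared over time, so nobody holds a public address longer than they actually need it.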
The next concept is the endpoint. An endpoint is the ability to use the Amazon network to connect to another Amazon service or to another Amazon customer’s VPC. Understand that endpoints are used to traverse the AWS network instead of the internet. Without an endpoint, for example, connecting your VPC to S3 would mean going out to the internet and then coming back into the AWS network. This scenario is problematic for a few reasons. One is that you’re paying for internet access, so you definitely don’t want to do that if you don’t have to. More importantly, the internet is going to be much slower than the AWS network.
There are many reasons a private network like the AWS backbone performs better than the internet. The main one is that Amazon manages the AWS network, which means it can enforce strict quality control and quality-of-service guarantees. When traffic goes out to the internet, you may have a guaranteed speed to your internet provider, but the connection will in most cases have to cross multiple internet service providers to reach its destination, and no one can guarantee the speed across other providers. So the endpoint is a way to get reliable performance as well as high security by traversing the AWS network to reach partners or other AWS services.
Now that I’ve talked a little more about endpoints, especially since a lot of you are probably working on AWS Certified Solutions Architect certifications, let’s break endpoints down into two types. There’s the interface endpoint, which is an elastic network interface that uses a private IP address from the VPC’s range. An organization uses this endpoint as an entry point from their VPC to a supported service. It uses the AWS PrivateLink service, and supported services include most services on the AWS cloud as well as other VPCs.
This is different from a gateway endpoint, which is a private endpoint that provides high-security access to an AWS service. A gateway endpoint works by placing a route in the VPC’s routing table for traffic destined to that service. Gateway endpoints are supported for S3 and DynamoDB.
Network Access Control Lists and Security Groups
The last two components of the VPC are network access control lists (network ACLs) and security groups. A network ACL is very similar to an access list on a router: it is stateless. Because they are stateless, network ACL rules have to be written in both inbound and outbound terms, and network ACLs are attached to subnets. I’m going to repeat that: network ACLs are attached to subnets. They’re about keeping traffic out of a subnet.
The next part of the security component of the VPC is a security group. A security group is a host-based firewall. Security groups are stateful and security groups are attached to a server or a service. So, while the network ACL keeps traffic out of this subnet, the security group keeps traffic that you don’t desire outside of a system like an EC2 instance.
VPC Security Best Practices
In the second part of today’s article, we will focus on securing your VPC from the network perspective.
Let’s start with the methods that we’ll use to secure your VPC. We’re going to begin by discussing routing, and then we’ll discuss firewalls. After firewalls, we’ll talk about network ACLs. After network ACLs, we’ll talk about security groups. Then we’re going to talk about host-based firewalls, and lastly, we’ll talk about DDoS protection as well as intrusion detection and prevention systems, and how they can be used to secure your VPC.
Routing and VLANs
I want to begin with routing, because if someone can’t reach a system, it’s going to be nearly impossible for them to attack it. If we start with routing and limit who actually has access to the systems, it becomes much harder for others to attack them. Any time I’m designing systems, I look at my organization and determine who needs access to what.
For example, you’ve got the data center of your organization in a hybrid cloud environment, which is connected to AWS via Direct Connect. And you’ve got your AWS VPC. There are certain things in your VPC that only certain users are going to need to access. You shouldn’t provide access to the inner workings of your VPC to those that don’t need it. Take only those that need access to it, give them access to it, and keep everybody else away.
How would you do this? The first thing I suggest is to figure out who needs access to what. In a simple scenario, let’s say there is a network operations department, a dev-test group, and another group. You would place each group in its own VLAN. A VLAN is a way to logically separate different parts of a switch. There’s much more content on VLANs available; the Cisco Press CCNA books give a great explanation, and we recommend understanding VLANs if you’re going to be working with cloud computing.
Back to the point: create a VLAN for each group and place its members in it. We do this because we’ll grant access per VLAN. We’ll create an IP interface in each VLAN and advertise those routes to AWS so AWS knows how to reach back to the VLANs, and we’ll give the VLANs the routing information to reach AWS. But we don’t need to give that information to anybody else. Through some thoughtful route filtering, we can make sure that only the departments that need the cloud can even reach it from a network perspective, long before IAM would ever come into play.
If we’re going to constrain the people by putting them in specific VLANs, which means specific subnets, and we’re going to exchange routing information, we’ve done a lot already just by that simple step to secure the VPC. Well, we want more than that.
Let’s say an organization has a 200 user network operation center. It’s a pretty big NOC, but I’ve worked in organizations that have that, and much, much more. So, you’ve got 200 people in the network operations subnet who do need to access the AWS VPC. What happens if their systems get compromised? Now that an intruder has reachability to AWS, they could now attack your VPC. In order to keep systems from attacking each other with malware or things that can just accidentally happen, let’s further segregate the system.
What we’ll do is turn those VLANs into something called private VLANs. With a private VLAN, as you can see in the diagram above, all the users sit in the same VLAN, but they can’t talk to each other, because the private VLAN functionality prevents it. So now you’ve got 200 users, and if one gets infected with some type of malware, that user can’t spread it to everybody else, because that user can’t reach them. Private VLANs are another great tool for security.
Now, people don’t talk about this much these days (it was discussed more in the past), but there’s something called 802.1X, an authentication framework for ethernet. I mentioned that first you logically separate users into their own subnets or VLANs, and then you keep them from talking to each other with private VLANs. But what happens if somebody who doesn’t belong in that department just plugs into the VLAN? For example, they attach a computer or a Raspberry Pi and use it to launch an attack through your network into the VPC. You can’t have that.
I recommend that the users who need access to the VPC, specifically to its inner workings, use 802.1X authentication. What happens is that when you plug your ethernet cable into the network, the switch consults a database: if the MAC address (the physical address associated with the ethernet port) is allowed on the network, the port opens. If an unauthorized user tries to plug into the network, the port simply shuts down.
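The port decision the author describes is essentially a lookup against an allowlist. Here is a toy sketch of that check. Note the hedge: real 802.1X authenticates the user with EAP credentials against a RADIUS server; a plain MAC-address lookup like this is closer to what vendors call MAC Authentication Bypass.

```python
# Toy sketch of port authorization by MAC address (illustrative only;
# real 802.1X uses EAP credentials, not just the MAC).
authorized_macs = {"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}

def port_decision(mac_address):
    if mac_address.lower() in authorized_macs:
        return "authorized"   # known device: open the port
    return "shutdown"         # unknown device: the port shuts down

assert port_decision("AA:BB:CC:DD:EE:01") == "authorized"
assert port_decision("de:ad:be:ef:00:00") == "shutdown"   # rogue Raspberry Pi
```

Either way, the effect is the same: an unapproved device plugged into the wall never gets a live port, so it never gets a path toward the VPC.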
Realistically speaking, by just using the network, we’ve already talked about a lot of ways to make it much more secure.
Now, firewalls. A firewall builds a strong, secure perimeter around the edge of a network. I like to think of it in terms of a castle. The castle has a giant wall around it to protect the people inside, then a moat filled with water and dangerous animals, then the impenetrable forest. The firewall is that hard exterior shell of the castle; it keeps outsiders out. Like I said before, you’re going to have to do a lot to protect the inside of your network as well, since many breaches originate internally rather than externally, and the internal ones are often accidental. The point here is that you must secure the outside before you can even think about securing the inside, and you’re going to do that with a firewall.
There are some phenomenal commercial firewalls that you can use with your VPC. Cisco makes them, Palo Alto makes them, Fortinet makes them, and there are many other great vendors out there. These are really strong firewalls that not only block traffic out, but they’re adaptive. They look at the traffic that doesn’t make sense, and they can generate rules on-demand to block things that they find to be dangerous. Commercial firewalls are great options.
With firewalls, you’re going to create a policy that allows in only what needs to come in. Typically that’s TCP port 179 for BGP routing, plus whatever ports your systems require, whether HTTPS, SSH, or anything else. The policy allows what needs to come in and blocks everything else. Firewalls are stateful: they block everything inbound except what you explicitly permit, but if a user behind the firewall goes out to the internet, the firewall tracks that connection. If my computer connects to an AWS webpage, the firewall recognizes the AWS traffic coming back as the return traffic for my connection, so it lets it through. This is what is meant by stateful.
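Stateful inspection boils down to a connection table. This minimal sketch (illustrative only, with made-up addresses) shows why return traffic gets in while unsolicited traffic does not:

```python
# Minimal sketch of stateful inspection: inbound packets are allowed
# only when they match a connection an inside host already initiated.
connections = set()   # (inside_ip, inside_port, outside_ip, outside_port)

def outbound(inside_ip, inside_port, outside_ip, outside_port):
    # Outbound traffic creates a tracked connection.
    connections.add((inside_ip, inside_port, outside_ip, outside_port))
    return "allowed"

def inbound(outside_ip, outside_port, inside_ip, inside_port):
    # Return traffic for a tracked connection is allowed automatically.
    if (inside_ip, inside_port, outside_ip, outside_port) in connections:
        return "allowed"
    return "blocked"   # unsolicited inbound traffic

outbound("10.0.1.5", 51000, "198.51.100.7", 443)    # user browses out
assert inbound("198.51.100.7", 443, "10.0.1.5", 51000) == "allowed"
assert inbound("203.0.113.9", 443, "10.0.1.5", 51000) == "blocked"
```

Contrast this with the network ACLs discussed later, which keep no such table and therefore need explicit rules in both directions.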
Now let’s talk about the AWS firewall solution. AWS has something called WAF, the web application firewall. WAF sits at the edge of your network: it’s typically placed on a CloudFront distribution, an Amazon API Gateway REST API, or an Application Load Balancer. The point is that it puts a firewall on these devices before traffic can get onto your network. That’s good, because placing it at the edge blocks traffic long before it comes in. WAF enables you to control access based on a policy, just like any other firewall.
WAF gives fairly granular control to protect your resources. You control access with web ACLs, rules, and rule groups. A web ACL allows or denies traffic based on what you want to permit, like any access list. You set up rules, which allow or deny access, and then you can create rule groups, which are collections of individual rules that can be reused in other places.
If you’re building a high-security network, you’ll be placing controls like these everywhere, at every entry point. Any time you can create rules and reuse them somewhere else, you not only save time but also eliminate mistakes that could creep in along the way. Because WAF is an AWS product, it integrates automatically with CloudWatch, giving you the ability to monitor your traffic metrics.
How does this work? It’s simple. You enable WAF on the application or device. You create your policy. And then WAF will look at your traffic based upon what’s going on. It’s going to permit the traffic based on your permit or deny policy. Now, if an attack occurs, you can create new rules to mitigate the attack. Because WAF is going to integrate with CloudWatch, you can set up alerts. You can create a rule and notify systems administrators so you can do something about it.
So, in this diagram above, you can see what we’ve done is we placed WAF on the edge devices, and we’re using that to secure the rest of our remaining network.
Access Control Lists
We talked about firewalls blocking traffic entering the network; now we’re going to use network ACLs to block traffic into and out of the subnets. With the firewall, we’re blocking at the exterior, but if something gets through the firewall, we can still block it with a network ACL. When it comes to security layers, it’s like dressing for the cold. Intruders may get through one layer, but if you’ve got 40 layers of defense and you get notified along the way as things happen, you have time to stop the intrusion. The more layers, the better. In this case, we’re using a network ACL: we allow only the necessary traffic into the subnet and block everything else. What we need is allowed in; what we don’t need is not.
By blocking what you don’t need, you’re saving yourself a tremendous amount of exposure. If you block almost everything, the only thing attackers can target is what you’re actually allowing in.
Network ACLs are not stateful like a firewall. Remember, I said that when I connect to the internet through the firewall, it watches my connection (that’s what stateful means) and then allows the return traffic. Network ACLs are not stateful, which means you have to write rules for outbound traffic and inbound traffic separately; the ACL has no way to recognize return traffic because it isn’t tracking connections. Network ACLs are simple packet-inspection rules: allow, deny, and that’s it.
Every network ACL ends with an implicit deny, and a custom network ACL denies all traffic until you add rules, so you will create your own policies. When you build a rule, you specify the source and destination address (wildcards are allowed), the protocol, and the port number. I want to say this again: network ACLs are stateless. You must write them for both inbound and outbound traffic. The order of your rules is absolutely critical, because network ACLs process rules in order.
I’m going to show you what not to do, and then I’ll tell you what to do. Typically, I do it the other way around, but I’ve just seen so many problems with network ACLs and ACLs in general throughout my career and networking.
Let’s say you have rule 100, and rules are processed in order of their sequence numbers. If rule 100 says deny all traffic, and rule 110 says allow port 80 (WWW) traffic, both inbound and outbound, I’ve got a problem. The reason is that as soon as traffic hits the deny-all rule, it is thrown away and blocked. Because rules are processed in order, rule 100 runs before rule 110, rule 100 denies all traffic, and traffic never reaches my permit rule. So the things we want to permit must come before the things we want to deny.
Now let’s talk about the proper way. Simply create a rule 100 that permits TCP port 80 inbound from any source to the destination we want, and an outbound rule that permits the return traffic back out to any destination. That’s it; it will work great. But order matters. Make sure your permits come before a broad deny, unless you need to get very specific. For example, you could deny one host on a subnet and then allow the rest of the subnet; in that case the order matters too, and the host deny must come before the subnet allow.
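The ordering behavior is easy to demonstrate. This sketch mimics how a network ACL evaluates traffic: rules are checked in ascending rule number, the first match wins, and anything that matches nothing hits the implicit deny at the end.

```python
# Sketch of NACL evaluation: rules checked in ascending number,
# first match wins, implicit deny if nothing matches.
def evaluate(rules, packet_port):
    for number, action, port in sorted(rules):
        if port == "any" or port == packet_port:
            return action
    return "deny"   # implicit deny at the end of every NACL

bad_order  = [(100, "deny", "any"), (110, "allow", 80)]
good_order = [(100, "allow", 80), (110, "deny", "any")]

# With the deny-all first, web traffic never reaches the allow rule.
assert evaluate(bad_order, 80) == "deny"
# With the allow first, web traffic gets in and everything else is denied.
assert evaluate(good_order, 80) == "allow"
assert evaluate(good_order, 22) == "deny"
```

Swapping two rule numbers is the entire difference between a working web subnet and one that silently drops every request.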
So, be smart about your ordering, and remember that rules are evaluated one at a time: if an earlier rule says throw the traffic away, processing never reaches the next rule. In the diagram below, you can see a network ACL blocking access to the subnet.
Now let’s talk about security groups. We find many new students get security groups and network ACLs confused. Network ACLs keep traffic out of the subnet. Security groups keep traffic out of an instance. When you apply a security group, you apply it to an EC2 instance or a service. I just want you to understand the difference, so I’m going to say it again: network ACLs keep traffic out of a subnet, security groups keep traffic out of an instance. With security groups, you write allow rules, because that’s all that’s supported: you allow whatever you need in, and the rest of the traffic is denied.
The good news with these security groups is that they’re stateful. You determine what you want to allow in, and it knows to allow the return traffic back out. This is another layer of protection because if an intruder gets past your routing, your firewall, and your network ACL, you’ve got the security group protecting your services. It’s part of those layers that you’re going to want to use. As you can see in the diagram below, you can see where the security groups are applied and we’re protecting some instances.
Like I keep saying, layers, layers, and more layers. I typically recommend whenever I’m dealing with high-security architectures, that the organizations place a host-based firewall on their servers. I even recommend it for the workstations. Typically, it’s a lighter firewall, and it’s placed on the devices, and these firewalls will then act as a last line of defense on that system.
If all your other defenses are breached, these firewalls can protect the system, at least for a period, or maybe thwart the attack completely. Most host-based firewalls are on the lighter side, but some of the commercial vendors used in enterprise environments produce very good ones, and some host-based firewalls can be integrated into IDS/IPS systems. We’ll talk more about IDS and IPS systems in a bit.
Now let’s talk about preventing a DDoS attack. For people not familiar with distributed denial-of-service attacks, here’s what typically happens. In this scenario, I have a web server that listens on port 80. If I’ve done all the other things I’ve described, the only traffic the web server sees is port 80 traffic. But suppose my server can process 5,000 requests a second. If one attacker sends me 5,000 requests a second, that consumes everything my server can do, and it can’t serve anybody else, because it’s busy handling the attacker’s 5,000 requests per second.
This is the simplest description of a DoS attack: it comes from one user on one system, who sends whatever it takes to overwhelm the service I’m exposing to the internet and make my system unusable. Once it’s unusable, attackers may then attempt to gain privileged access to these systems and truly compromise them.
In a distributed denial-of-service attack, the attacker first compromises multiple systems on the internet and then uses all of them to attack the server providing the service. There are a lot of ways to prevent distributed denial-of-service attacks, or at least to be better prepared to deal with them, but here we’re going to cover the AWS way, because we deal with a lot of people focusing on certifications. Realistically speaking, blocking DDoS attacks takes a full security posture.
Now, before proceeding, I’ll address auto-scaling for a moment, because it can provide some incredible DDoS protection, even though it takes us out of the networking weeds. I previously described a web server that can handle 5,000 requests per second, with an attacker sending 5,000 web requests per second. My server is going to serve 5,000 requests whether they come from legitimate users on the internet or from one attacker.
In that situation, I’m in trouble. But what if I could scale my web server out 10 times over in the short term? Instead of supporting 5,000 requests, I can support 50,000. It’s going to be very hard to take my system down if I’ve got a good auto-scaling policy enabled. You’re not going to run at that scale forever; if all your systems are busy dealing only with attackers, it gets very expensive. But if you’ve got a critical production system, this is a great way to stay functioning. So use auto-scaling as part of your overall security.
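The scaling arithmetic is back-of-the-envelope, and the per-instance number below is just the assumed figure from the example above:

```python
# How many instances are needed to absorb a given request rate,
# assuming each instance handles 5,000 requests/second (example figure)?
PER_INSTANCE_CAPACITY = 5_000

def instances_needed(total_requests_per_second):
    # Ceiling division: a partial instance's worth of load
    # still requires a whole extra instance.
    return -(-total_requests_per_second // PER_INSTANCE_CAPACITY)

assert instances_needed(5_000) == 1     # normal load: one server copes
assert instances_needed(50_000) == 10   # scaled out 10x during the attack
assert instances_needed(5_001) == 2     # even slight overload needs one more
```

An auto-scaling policy is effectively running this calculation continuously against a load metric, which is why a flood that would flatten one server just triggers more capacity instead.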
Amazon has its own DDoS protection called Shield, which comes in two versions. There’s the standard version, which is free and applied automatically for AWS customers, and there’s Shield Advanced, which protects services such as EC2 instances, load balancers, CloudFront distributions, Route 53, and AWS Global Accelerator. You can get fairly substantial DDoS protection with the AWS Shield versions.
As I mentioned, AWS Shield Standard is free, and it protects against common attacks; AWS claims it can block 96% of the most common ones. These are your typical attacks, like SYN floods, reflection attacks, and HTTP slow reads. But Shield Standard’s protection is static: it applies standing mitigations rather than adapting to your specific traffic.
Looking at Shield Advanced, we get much better protection. Shield Advanced is available at an additional cost, and it can get pretty expensive. You apply Shield Advanced to load balancers, EC2 instances, CloudFront distributions, and Route 53, and it offers more intelligent attack mitigation. I’ve talked about firewalls and other devices that are adaptive: they look at traffic patterns, decide something doesn’t look right, and create a rule to block it. AWS Shield Advanced can help with that; it can deploy web ACLs on demand based on traffic patterns to protect you. Shield Advanced is a great service for adding another layer of security.
And it gives you good visibility into the attacks and gives you the notifications for layer three attacks (network layer), layer four attacks (transport layer), and layer seven attacks (application layer). And customers that are AWS Shield customers have access to a 24/7 DDoS response team, assuming they have business or enterprise support. Not only does the service keep you safe, but you’ve got people that can help you deal with this, especially if you’re not a security guru that’s used to critical event mediation. Let’s face it, most organizations are not used to being hacked daily, so, this is a great service because these people focus on it all day.
Intrusion Detection and Intrusion Prevention Systems
Now, intrusion detection and intrusion prevention systems are remarkable, because they can look at behaviors, adapt, and stop them. One great way to do this is to install a host-based IDS/IPS agent on each system. There is a manager, which is responsible for the local systems, or agents. You install the software on the server, the server sends traffic-pattern information to the manager, and the manager watches what’s going on. If everything’s going well, the systems run smoothly. But if a server suddenly starts behaving strangely, and then another one does too, the IDS/IPS will detect the intrusion and stop it, creating rules on demand. It’s very useful to have intrusion detection and prevention in your systems.
I mentioned a moment ago that you’ve got the host agents and the manager, the smart device that controls them, and together they can do several things. They can heal themselves; they can drop packets they determine don’t look right. They can block a source address they determine is sending questionable traffic. They can reset the TCP connection between the attacker’s device and the system it’s trying to attack. They can also send alerts, and if you’ve got good alerting capabilities, you’ll know what to do.
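The "server starts behaving funny" detection can be sketched as a simple statistical outlier check on the metrics the agents report. This is a toy model only; real IDS/IPS products use far richer signatures and behavioral baselines, and the threshold here is made up.

```python
import statistics

# Toy IDS manager: agents report a traffic metric per host, and the
# manager flags hosts that deviate sharply from the rest so it can
# respond (e.g., block the source, reset connections, send an alert).
def detect_anomalies(requests_by_host, sigma=1.5):
    values = list(requests_by_host.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0   # avoid division by zero
    return [host for host, v in requests_by_host.items()
            if (v - mean) / stdev > sigma]     # flag high outliers only

reports = {"web-1": 100, "web-2": 110, "web-3": 95, "web-4": 5_000}
suspicious = detect_anomalies(reports)
assert suspicious == ["web-4"]   # the host "behaving funny" gets flagged
```

The manager's value is exactly this fleet-wide view: no single agent can tell that its own load is abnormal, but comparing all the agents' reports makes the outlier obvious.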
I know we covered a lot in this article. A solid understanding of VPCs is an essential part of your cloud architect tool kit. I presented a brief introduction to VPC in general. This introduction was meant as a high-level overview as opposed to an in-depth discussion. We covered the main components including routing tables, internet gateways, NAT gateways, elastic IP addresses, VPC endpoints, network ACLs and security groups. After the introduction, we dove straight into some best practices for VPC security. Securing your organization’s VPC is essential, and you now have some familiarity with some of the best practices and tools available to do so.
In the next part, we will continue our review of VPCs, to cover VPC Endpoints and VPC Peering in more depth. We look forward to sharing that with you next week.