Comment: Why Amazon Is My #1 Attacker

"The need to establish reputation information from cloud instances can't be ignored in the world of the incident responder", says Constantine

A few years ago I built my first incident response workflow automation system. One of its key features was the automatic retrieval of all publicly available information about external hosts showing up in correlated logs. By populating a database with the GeoIP information for every host observed in alerts, I produced some interesting statistics on attackers from real data rather than mere perception. The most surprising discovery, at first, was that the organization from which most of our attacks originated was Amazon.com.
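The tally described above amounts to mapping each alert's source address to the organization that owns it and counting. The sketch below is a minimal illustration, not the system the author built: the prefix-to-owner table is hypothetical (in practice it would come from a GeoIP/ASN database), and the prefixes are reserved documentation ranges with made-up organization names.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-organization table. In a real system this would be
# populated from a GeoIP/ASN database; these are reserved documentation ranges.
PREFIX_OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "ExampleCloud Inc.",
    ipaddress.ip_network("198.51.100.0/24"): "ExampleCloud Inc.",
    ipaddress.ip_network("203.0.113.0/24"): "Example Telecom Ltd.",
}


def owner_of(ip: str) -> str:
    """Return the organization owning the most specific matching prefix."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_OWNERS if addr in net]
    if not matches:
        return "unknown"
    best = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_OWNERS[best]


def attacks_by_organization(alert_source_ips):
    """Count alerts per owning organization of each source address."""
    return Counter(owner_of(ip) for ip in alert_source_ips)


if __name__ == "__main__":
    alerts = ["192.0.2.10", "198.51.100.7", "203.0.113.5", "192.0.2.99", "10.0.0.1"]
    for org, count in attacks_by_organization(alerts).most_common():
        print(f"{org}: {count}")
```

Note that two unrelated prefixes collapse into a single "ExampleCloud Inc." bucket: the organization-level view is exactly what hides individual actors behind a large provider.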

It didn't take more than a second to realize that the source was not Amazon the bookseller but Amazon the cloud hosting provider, whose platform makes it easy to provision temporary hosts, run an operation from them and then rapidly delete the instances. Attackers have, perhaps, more interest in agility than defenders do; they certainly have more options for achieving it.

To me, this was the final nail in the coffin for a technique that had been dying on the vine for some years – network-based blacklisting. We have adapted, of course: more real-time feeds of individual IP addresses and uniform resource identifiers (URIs) are available, and the systems that keep these feeds timely are improving. However, they don't address something of vital importance to the incident responder: attribution.

Let me tell you a tale from the front lines…

I was hunting down an active attack – endpoints had been compromised, and the attackers had moved on to using stolen credentials to access the network directly, no longer relying on the remote access trojan on the compromised systems. As I watched our attackers access the network from multiple locations, I began to build a profile of them: the hours they were active, their time zones, the number of people acting together. Finally, I realized something from the connection authentication information: the connections from multiple remote locations were actually coming from a single host – a cloud-provisioned host.

Each time the host was brought online to stage attacks, it was relocated to a lower-load hardware cluster with entirely different upstream connectivity. The host had used many different IP addresses and physical connectivity sets – spanning three different countries – during its operational lifecycle, but it was the same virtual machine instance the entire time.

I was well prepared to identify remote hosts in VPS-style co-location arrangements, but the global mobility of hosts on cloud providers had temporarily thrown us for a loop. I realized that the game had, once again, changed right before my eyes.

In this case, I had the obvious advantage of pre-existing data points that we could correlate to uncover what was happening behind the curtain. But how much better would it be if we could easily identify this beforehand? I got to thinking: the network identification tools we have today are all built for a pre-cloud internet – a world where IP addresses are tied to physical hosts in physical locations, owned by identifiable registered organizations. Anyone who used the internet before 1999 will remember the venerable Ident protocol (rapidly made obsolete because it represented a security risk on the open, untrusted internet).

Yet couldn't a tokenized, anonymous version of this provide some measure of utility in a cloud-served public internet? Wouldn't it be useful to query Amazon's web services and learn that the three EC2 instances currently attacking me are all operated by Tokenized Amazon Customer F8E993C? I wondered how different the statistics from my database would have been had I been able to break the large share that EC2 represented down into individual actors.
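One way such a tokenized query could work is sketched below. This is purely hypothetical; no such provider API is implied. It assumes the provider holds a secret key and derives an opaque token from the customer account with HMAC-SHA256, so the token stays stable even as an instance hops between IP addresses, while only the provider can map it back to an account. The `Instance` record, `PROVIDER_SECRET` and `query_owner` are illustrative names, and keying the token on the requester as well is one assumed design choice for limiting its reconnaissance value.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical provider-side secret; only the provider can link tokens to accounts.
PROVIDER_SECRET = b"example-secret-rotated-by-provider"


@dataclass(frozen=True)
class Instance:
    instance_id: str
    account_id: str  # the real customer account, never disclosed to requesters
    public_ip: str   # may change as the instance migrates between clusters


def customer_token(account_id: str, requester: str) -> str:
    """Derive an opaque, stable token for a customer account.

    Including the requester in the HMAC input means two different querying
    organizations receive unrelated tokens for the same customer.
    """
    msg = f"{requester}|{account_id}".encode()
    digest = hmac.new(PROVIDER_SECRET, msg, hashlib.sha256).hexdigest()
    return digest[:16].upper()


def query_owner(instances, ip: str, requester: str):
    """Hypothetical ident-style lookup: which tokenized customer runs this IP?"""
    for inst in instances:
        if inst.public_ip == ip:
            return customer_token(inst.account_id, requester)
    return None  # address not currently assigned to any instance
```

With this scheme, two instances under the same account return identical tokens to the same responder regardless of where they currently sit on the network, which is the correlation the scenario above needed.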

Until an initiative to create something of this sort arises and reaches implementation and acceptance, I'll have to stick with more abstruse methods of remotely fingerprinting cloud instances after the fact. Because cloud computing offers ever larger amounts of utility computing – OS instances that can be launched, operated and deleted in a matter of minutes – we on the defensive end of things need some way to keep up with the increasing complexity of attribution.

Personally, I would like to see this done in a manner that neither provides useful reconnaissance to attackers, nor presents an undue privacy violation. I see issues with both aspects, even with my suggestion of a tokenized identity query.

The increasing need for attribution techniques in incident response is not just some by-product of security analysts wanting to play the role of counter-intelligence agents. It is vital for correlating and prioritizing the tidal wave of data we need to pore through to make informed response decisions. Being able to correlate two seemingly unrelated minor attack attempts on different parts of the infrastructure, launched from two random hosts on the same multinational cloud computing provider, can make all the difference between directed remediation and ‘ignoring the diversionary attack that actually isn't’.
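If alerts carried some per-actor identifier, the correlation described above would reduce to grouping. A minimal sketch, assuming each alert has already been enriched with a hypothetical `owner_token` (for instance from a tokenized lookup of the kind proposed earlier): alerts are grouped by token, and any token seen hitting at least `min_segments` distinct parts of the infrastructure is flagged, even when the source IPs differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    source_ip: str
    target_segment: str  # part of the infrastructure hit, e.g. "dmz", "vpn"
    owner_token: str     # hypothetical tokenized customer ID for source_ip


def correlated_campaigns(alerts, min_segments=2):
    """Group alerts by owner token and keep tokens that hit several segments.

    Returns a dict mapping token -> set of targeted segments.
    """
    segments = defaultdict(set)
    for alert in alerts:
        segments[alert.owner_token].add(alert.target_segment)
    return {tok: segs for tok, segs in segments.items() if len(segs) >= min_segments}


if __name__ == "__main__":
    alerts = [
        Alert("192.0.2.10", "dmz", "F8E993C"),
        Alert("198.51.100.7", "vpn", "F8E993C"),  # different IP, same actor
        Alert("203.0.113.5", "mail", "0A1B2C3"),
    ]
    print(correlated_campaigns(alerts))
```

Grouping on the actor token rather than the source address is the design choice that matters: two "random" hosts on the same provider land in one bucket, so a minor probe that looks diversionary in isolation is surfaced alongside its companion.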

There is much work in progress on establishing reputation between cloud service providers and their customers. One thing is for sure: the need to establish reputation information from cloud instances can't be ignored in the world of the incident responder.


Conrad Constantine is a research engineer with AlienVault. With an early background in searching for forbidden knowledge, pushing computing hardware to its limits and a nose for the truth, Constantine was born for a career in incident response. Over the last decade and a half, he has been on the front lines of defense work in telecom, medical and media corporations, not least at ground zero for the 2011 RSA breach. Constantine is a firm believer that incident response must become an accessible and effective discipline, available to all. He's striving to bring the mysteries of open-source intelligence generation, and defensive agility, to those willing to take the leap from fear to action – mostly via the medium of code.
