The Rise and Rise of Bad Bots – Part 2: Beyond Web-Scraping

Anyone who listened to Aleks Krotoski’s five short programs on BBC Radio 4 entitled Codes that Changed the World will have been reminded that applications written in COBOL, despite dating from the late 1950s, remain in widespread use.

Although organizations remain reliant on these applications, they are often impossible to change: the original developers are long gone and the documentation is poor. With the advent of Windows, and later web browsers, there was a need to re-present the output of old COBOL applications. This led to the birth of screen-scraping: reading output intended for dumb terminals and repurposing it for alternative user interfaces.

The concepts of screen-scraping have been reborn in the 21st century as web-scraping. Web scrapers are bots that scan websites for information, manipulating I/O where necessary to get what they need. This is not necessarily a bad activity; price comparison sites rely on the technique, and an airline or hotel may well want its pricing information shared in the hope that its services will appear on as many sites as possible.
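The mechanics are simple enough to sketch in a few lines. The example below is illustrative only: it assumes a hypothetical page that marks prices up with a `price` CSS class, and uses Python's standard-library HTML parser to pull them out the way a price comparison bot might:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collects the text of any element carrying class="price"."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())
            self.in_price = False

# In a real bot this HTML would come from an HTTP request to the target site.
sample_page = '<ul><li class="price">£49</li><li class="price">£72</li></ul>'
scraper = PriceScraper()
scraper.feed(sample_page)
print(scraper.prices)  # ['£49', '£72']
```

The same few lines, pointed at a competitor's site instead of a partner's, are what turns a useful comparison bot into unwanted competitive intelligence.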

However, there are also less desirable applications of web-scraping, such as competitive intelligence. So, how do you tell good bots from bad?

This was the original business of Distil Networks. It developed technology, deployed as an on-premises appliance or invoked as a cloud service, that identifies bots and lets policies be defined about what they can and cannot do. So, if you sell airline tickets, it can recognize bots from approved price comparison sites while blocking those from competitors, or those that are simply unknown.

Distil does this by developing signatures that allow good bots to be whitelisted (i.e. allowed). It recognizes bots in the first place by checking for the absence of a real web browser (and therefore of a real user) and challenging suspects with CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart).
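In outline, that combination of whitelisting and challenging reduces to a simple decision rule. The sketch below is a stand-in only: the bot names and browser markers are hypothetical, and Distil's real fingerprinting is proprietary and far more sophisticated than a User-Agent check:

```python
# Illustrative stand-ins only; Distil's actual signatures are proprietary.
GOOD_BOTS = {"ApprovedComparisonBot", "Googlebot"}
BROWSER_MARKERS = {"Mozilla", "Chrome", "Safari"}

def classify(user_agent: str) -> str:
    """Return 'allow' for whitelisted bots and apparent browsers,
    'challenge' for suspects showing no sign of a real browser."""
    if any(bot in user_agent for bot in GOOD_BOTS):
        return "allow"      # whitelisted good bot
    if any(m in user_agent for m in BROWSER_MARKERS):
        return "allow"      # looks like a real browser, so probably a real user
    return "challenge"      # no browser present: a suspect, so serve a CAPTCHA

print(classify("ApprovedComparisonBot/2.1"))      # allow
print(classify("Mozilla/5.0 (Windows NT 10.0)"))  # allow
print(classify("python-requests/2.31"))           # challenge
```

A CAPTCHA on the "challenge" path is what separates a human who happens to have an odd browser from an automated client that cannot answer it.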

It has plans to extend this to the APIs (application programming interfaces) embedded in the native apps that are increasingly used to access online resources from mobile devices.

With the ability to recognize and block bots, Distil Networks has realized it can also block other kinds of unwanted attention directed at its customers. For example:

  • Brute-force logins are perpetrated using bots; these can be identified, blocked and, if necessary, challenged with a CAPTCHA
  • Man-in-the-middle (MITM) attacks, in which a user’s communication with a resource is interfered with, often involve bots; these can be detected and blocked
  • Online ad fraud/click fraud relies on bots that click ads repeatedly, mimicking user interest and potentially costing advertisers dearly; such activity can be identified and blocked
  • Bot-based vulnerability scanners can be limited to authorized products and services, blocking those used by hackers to find weaknesses in target systems, and giving resource owners back the initiative in the race to patch before attackers exploit
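The brute-force case in particular reduces to counting: a client that fails logins faster than any human could type is a bot. A minimal sliding-window counter (a sketch, not Distil's implementation; the window and threshold values are arbitrary) might look like:

```python
import time
from collections import defaultdict, deque

WINDOW = 60     # seconds of history to keep per client
THRESHOLD = 5   # failed logins allowed per window before challenging

failures = defaultdict(deque)  # client IP -> timestamps of failed logins

def record_failure(ip: str, now: float = None) -> str:
    """Log a failed login; return 'ok' or 'challenge' (serve a CAPTCHA)."""
    now = time.time() if now is None else now
    q = failures[ip]
    q.append(now)
    while q and now - q[0] > WINDOW:  # drop entries outside the window
        q.popleft()
    return "challenge" if len(q) > THRESHOLD else "ok"

# Six rapid failures from one IP trip the threshold on the sixth attempt.
results = [record_failure("203.0.113.7", now=t) for t in range(6)]
print(results[-1])  # challenge
```

The same counting idea, applied to ad clicks rather than login failures, underpins click-fraud detection as well.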

Distil charges by the volume of page requests, so if you were worried about ad fraud and a botnet generated millions of clicks, costs could spiral out of control. The answer is to use DDoS controls that can detect volume attacks (as discussed in Part 1 of this blog post) in conjunction with Distil’s bot detection and blocking capability.

Distil seems to be onto something. It has received $13m in VC funding so far and has an impressive and growing list of customers. Unlike many security vendors, it seems happy to name its customers; perhaps just knowing such protection is in place will encourage the bad guys to move on. In the UK these include EasyJet.

Distil is set to make life harder for bad bots – but as ever there will surely be a fight back from the dark side.
