Saturday, May 10, 2014

Sometimes Nmap Takes Down a Datacenter

"We don't perform denial of service testing," I say to my client.
Those words exit my mouth with an alkaline sting. Lies, all lies! We detect DoS by looking at version strings in server banners. There's no deep kung fu in this; Nessus has plugins for it. And if I see that my client's server could be brought down with killapache.pl, I will let them know. The truth is that our tests can occasionally identify DoS conditions safely, without actually denying any service.
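A sketch of the idea, not the actual Nessus plugin, with a placeholder host and the Apache byte-range bug that killapache.pl exploits (CVE-2011-3192, patched in 2.2.20) standing in as the example:
# Minimal sketch of a banner-based DoS check. The host is a placeholder and the
# 2.2.20 cutoff is the only "exploit knowledge" involved; no exploit is ever sent.
import re
import socket

def grab_server_banner(host, port=80, timeout=5):
    # A bare HEAD request; all we care about is the Server: header in the reply.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        reply = sock.recv(4096).decode(errors="replace")
    for line in reply.splitlines():
        if line.lower().startswith("server:"):
            return line.split(":", 1)[1].strip()
    return None

def flag_killapache(banner):
    # Apache 2.2.x earlier than 2.2.20 advertised itself as a killapache.pl candidate.
    match = re.search(r"Apache/2\.2\.(\d+)", banner or "")
    return bool(match) and int(match.group(1)) < 20

banner = grab_server_banner("www.example.com")
print(banner, "-> flag it" if flag_killapache(banner) else "-> looks patched")
No service denied; just a banner read and a version compared.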
But even this is only a half-truth. The whole truth is that even safe, industry accepted pen testing practices can wreak havoc on a network.
I was an information security officer for a respectable organization during a short-lived run on the IPv4 address space market. Fortunately, mere weeks before network managers started jumping off the roofs of their data centers, one of my network guys snagged a few Class-Cs and pointed them at our DR site for safekeeping. Fat, dumb, and happy, we were the doomsday preppers of the 32-bit address space apocalypse.
Time went by and we started gearing up for our annual external penetration test. There had been lots of changes in our network that year, and I thought it would be best to scope the assessment by looking up our public IP assignments in ARIN and cross-referencing them against a scripted mass dig(1) of all the externally accessible DNS names in our asset management system. Everything checked out, a few externally hosted marketing sites notwithstanding.
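For the curious, the cross-check amounted to something like this (an illustrative sketch; the hostnames and netblocks are placeholders, not our real ones):
# Resolve every externally facing DNS name from asset management and confirm each
# answer falls inside the netblocks ARIN says are ours. All values are placeholders.
import ipaddress
import subprocess

arin_blocks = [ipaddress.ip_network(n) for n in ("198.51.100.0/24", "203.0.113.0/24")]
hostnames = ["www.example.org", "mail.example.org", "vpn.example.org"]

for name in hostnames:
    # dig +short prints just the answer records, one per line
    answers = subprocess.run(["dig", "+short", name],
                             capture_output=True, text=True).stdout.split()
    for answer in answers:
        try:
            ip = ipaddress.ip_address(answer)
        except ValueError:
            continue  # skip CNAME targets and other non-address answers
        in_scope = any(ip in block for block in arin_blocks)
        print(f"{name} -> {ip} {'in scope' if in_scope else 'OUT OF SCOPE'}")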
The first night of testing went off without a hitch.
At eleven o'clock on the second night, I got a call from the on-call system administrator, a bit miffed because the DR site had been taken down. The entire data center had gone dark! I quickly phoned my tester and had him pull the plug. Before you could say Ctrl-C, I had hung up and put the call back in to my system admin. Everything was back up.
I didn't expect that our pen tester had pulled down our site. I only had him pull the plug for political reasons. If something went wrong, the networking team would blame the pen testers. Even when no tests were scheduled, they'd call me up to ask if there were any unscheduled scans going on. It was a defense mechanism they'd learned from years of being the infrastructure whipping boy. For years, every time a tablespace filled up, a process fell into an infinite loop, or a memory board smoked, people ran to the networking team.
At that place and time though, penetration testing was the new black magic. No one understood it; everyone feared it. If the lights flickered or the cafeteria ran out of chicken tikka masala, pen testing was probably the reason. So I had to put a stop to the testing before the real incident response could begin. I didn't expect that my lone tester could bring down a whole data center by himself though. The site coming back online mere seconds after he stopped scanning was too much of a coincidence. I called him back up and explained the synchronicity.
"I don't understand. All I was doing was an Nmap."
"Send me the command you ran." This breaks protocol a little. As a paying customer it is my responsibility to wag my finger across the phone at the tester and demand an assurance that this will never happen again. He would respond by categorically denying any wrong doing. Then he'd spend the rest of the engagement running a single thread discovery scan. The end result would be a blank report and a mysteriously fragile network.
Instead, I like to treat these nightmare-inducing outages as a chance to learn something about the world. His Nmap scan seemed legit. No crazy plugins, no elaborate timing. It was a simple TCP connect scan. There were no payloads. I had him run it again while I sniffed the network at our border.
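For reference, this is roughly all a connect scan does per port, minus Nmap's bookkeeping (a sketch; the target address and port list are placeholders):
# Each probe in a TCP connect scan is a full three-way handshake, no data sent,
# then an orderly close. Target and ports are placeholders.
import socket

target = "203.0.113.5"
for port in (22, 25, 80, 443):
    try:
        with socket.create_connection((target, port), timeout=2):
            print(f"{port}/tcp open")   # handshake completed; nothing was ever sent
    except OSError:
        print(f"{port}/tcp closed or filtered")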

In front of our firewall was a border router whose only job was to be the demarc between us and our ISP. It was a 1U implementation of a very simple bit of logic:
for each packet
  if packet.destination.ip_address ∈ $respectable_org.ip_addresses
  then
    send packet to $respectable_org
  else
    send packet to $isp
  end if
end for
Thing is, we had not yet added our new Class-Cs to $respectable_org.ip_addresses. But our ISP's router implemented similar logic, and they had included the new subnets in their list of our IP addresses. And of course I had included them in our tester's scope.
So what happened? Our tester started scanning our DR site on the second night (the first night he had scanned production). Our ISP sent each of his probes to our border router. Not recognizing the destination address as one of ours, the router sent the packet back out the port it came in on, to our ISP. Our ISP recognized the destination address as belonging to us and sent the probe right back out the port it came in on, back to us. This two-node route loop lasted until the time-to-live on each packet counted down to zero, about thirty hops per packet. And since the loop sat only one hop away, it hit far harder than simply running thirty times as many threads would have.
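If you want to see the arithmetic, here's a toy simulation of the two routers playing hot potato (a sketch; the addresses are placeholders, and the starting TTL just matches the roughly thirty bounces described above):
# Toy model of the two-node route loop: our border router doesn't claim the new
# DR subnet, the ISP's router insists it's ours, and the packet bounces between
# them until its TTL runs out. Addresses and starting TTL are illustrative.
import ipaddress

dr_subnet  = ipaddress.ip_network("203.0.113.0/24")                  # the new, un-routed Class-C
org_routes = [ipaddress.ip_network("198.51.100.0/24")]               # what our border router knew about
isp_routes = [ipaddress.ip_network("198.51.100.0/24"), dr_subnet]    # what the ISP knew about

def border_router(dst):
    return "org" if any(dst in net for net in org_routes) else "isp"

def isp_router(dst):
    return "org" if any(dst in net for net in isp_routes) else "internet"

probe_dst = ipaddress.ip_address("203.0.113.5")
ttl, hops, location = 30, 0, "isp"
while ttl > 0:
    location = isp_router(probe_dst) if location == "isp" else border_router(probe_dst)
    ttl -= 1
    hops += 1
print(f"one probe crossed the border link {hops} times before its TTL expired")
Thirty trips across the same border link for every SYN the scanner sent. From the routers' point of view, one polite connect scan looked like a flood.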
The penetration tester was exonerated, the appropriate routes were put in, and testing resumed. Ironically, it was the networking team's fault this time, though I'd hardly blame them.
The truth is, non-destructive means can reveal weaknesses that destructive attackers can exploit, non-destructive means can come to destructive ends through the most innocuous configuration problems, and there's always the risk of something bad happening.
And to tell the whole truth, my client doesn't want to hear all of that. They want the assurance that their business won't suffer for having hired me. If I can't assure them of that with fewer than a dozen words, I can't assure them of that. So I'm stuck dragging out that old chestnut,
"We don't perform denial of service testing."