Talk: Applying machine learning methods to network mapping hunting
In recent years, network scanning has received increasing attention from states, companies and individuals. As network capabilities grow,
tools have evolved to scan much wider ranges (ZMap, masscan) and to retrieve ever more information. Today it is feasible to scan a whole
country, or even the entire IPv4 address space (0.0.0.0/0).
As a result, analyzing scan results is becoming substantially harder: manually digging into them is increasingly repetitive, time-consuming
and subject to analyst bias. Fortunately, methods and algorithms now exist to let computers do this work for us. Rather than “Big Data”, we
prefer to speak of machine learning (ML) when it comes to analyzing large quantities of data.
To the best of our knowledge, at the time of writing, there is no related work on the specific subject of applying ML to network scans.
However, these algorithms already give interesting results in the closely related field of IP stack recognition: Nmap, for instance,
identifies IPv6 stacks using ML, with satisfying results despite a much smaller pool of samples than for IPv4.
We will start by showing what information can quickly be retrieved from a scan, and how digging deeper into it can introduce bias.
We will then introduce machine learning concepts without requiring a strong mathematical background. Our explanation is aimed at
computer scientists and infosec researchers: it is the one we would have liked to get when we started.
The talk will then focus on the results we obtained on a real-world example, and on the dead ends we encountered.
These results include, among others, extracting more discriminating information about our hosts, classifying them by similarity, and
highlighting anomalies. In other words: automatically grouping web servers, printers and so on, and finding isolated hosts, the “goats”, as
we call them (think of the one in Jurassic Park), which may turn out to be low-hanging fruit during a pentest.
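To make the idea of grouping hosts and spotting “goats” concrete, here is a minimal sketch. It is not the authors' method nor IVRE's code: it assumes each host is described only by its set of open ports, measures similarity with the Jaccard index, and uses an arbitrary threshold of 0.5. The helper names (jaccard, group_hosts, find_goats), the IP addresses and the port sets are all hypothetical, made up for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of open ports (assumed feature)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_hosts(hosts, threshold=0.5):
    """Single-linkage grouping with union-find: hosts whose port sets have
    Jaccard similarity >= threshold end up in the same group."""
    parent = {h: h for h in hosts}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    for a, b in combinations(hosts, 2):
        if jaccard(hosts[a], hosts[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    return sorted(sorted(g) for g in groups.values())

def find_goats(hosts, threshold=0.5):
    """Hosts whose most similar neighbour is still below the threshold."""
    goats = []
    for h, ports in hosts.items():
        best = max((jaccard(ports, p) for o, p in hosts.items() if o != h),
                   default=0.0)
        if best < threshold:
            goats.append(h)
    return sorted(goats)

# Hypothetical toy scan result: host -> set of open TCP ports.
hosts = {
    "10.0.0.1": {80, 443},          # web server
    "10.0.0.2": {22, 80, 443},      # web server with SSH
    "10.0.0.3": {515, 631, 9100},   # printer
    "10.0.0.4": {631, 9100},        # printer
    "10.0.0.5": {23, 5900, 8080},   # odd one out
}

print(group_hosts(hosts))
print(find_goats(hosts))
```

On this toy input the two web servers and the two printers form groups, and 10.0.0.5 is reported as the only goat. With single linkage, a goat is simply a host left alone in its own group; real scan data would call for richer features (banners, service versions, TTLs) and a proper clustering or outlier-detection library.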
All of this work, including code and samples, will be publicly available as part of IVRE, the scan-digging framework, so that everyone can
reproduce the results for fun and profit.
Mougey Camille is an infosec engineer at CEA/DAM, mainly working on reverse
engineering and network mapping topics.
His previous talks include a presentation on execution traces for deobfuscation
at SSTIC 2014 and another on DRM analysis at REcon 2014.
Martin Xavier is an MSc student at Ensimag, France, and a security enthusiast. He
also holds a degree in mathematics, which gives him a strong background in these topics.
The work presented here is the result of his second-year internship at the CEA on
network mapping.