AutoEncoded Domains with Mean Activation for
DGA Botnet Detection
Binay Dahal
Department of Computer Science
University of Nevada, Las Vegas
Las Vegas, Nevada
Email: binay.dahal@unlv.edu
Yoohwan Kim
Department of Computer Science
University of Nevada, Las Vegas
Las Vegas, Nevada
Email: yoohwan.kim@unlv.edu
Abstract—Botnets are a powerful and effective means of
performing malicious activities over the Internet. Over the years,
they have evolved into many forms. Earlier bots used a static IP
address to communicate with their command and control (C&C)
server. This method stopped working as soon as that specific IP
address was identified and blocked. These days, domain-fluxing
botnets are the most common in practice. The idea is to use a
Domain Generation Algorithm (DGA) to generate domains and use
them to connect to the C&C server. Numerous studies have
addressed the detection of DGA botnets. These include deriving
features from the alphanumeric distribution of DGA domains and
performing classification on them. Other studies include network
log analysis, time series analysis, etc. Most of these domain
classification approaches rely on hand-developed features and may
not work well if the botmaster decides to generate domains with
completely new features. We are concerned with developing an
algorithm that is resilient to feature changes and that also works
well for domains generated by a completely new, previously unseen
algorithm. We generated a 16-bit representation of domains using
an autoencoder and classified each one as benign or DGA-generated
using supervised learning (with a neural network and an SVM). To
make the method work with previously unseen algorithms, we
tweaked it with the mean activation of the 16-bit domain
representation. This improved classification accuracy on a
completely new set of domain generation algorithms by up to
16%.
Keywords—DGA Botnets, AutoEncoder, Malicious Domain Detection.
I. INTRODUCTION
The most prevalent malware these days is the botnet.
A botnet is a network of infected computers, called “bots”,
which communicate with a single Command & Control server
(C&C server) to perform malicious tasks such as DDoS
attacks, email spamming, click fraud, etc. For the botnet to
be effective, it must connect to a C&C server, which
provides the instructions to perform a specific nefarious activity.
Using a static IP address for the C&C server is a poor
choice, as that specific IP address can be discovered and
blacklisted. Hence, botnets these days use a new way of
communicating with the C&C server. This new method employs
“domain fluxing”, which is changing the domain name of the
server to avoid getting blacklisted.
The bots use some form of Domain Generation Algorithm
(DGA) to generate a large number of domains and try to
connect to each of them. As the botmaster already knows the
set of domains that the bots generate, they can register
some of those domains and point them to the C&C server.
Once a request sent from a bot to any domain is resolved,
the bot connects to the server pointed to by that domain name
and starts receiving instructions. The domain name of the server
is changed periodically to avoid detection. These DGA-based
botnets are resilient to IP blacklisting or sinkholing and hence
pose a serious challenge to network administrators. But, as
mentioned above, each of these bots sends hundreds to
thousands of requests to the domains it has generated. Only
a few (one or two) domains are actually registered, so there are
many unresolved DNS responses with NXDOMAIN. This
trend encouraged network security researchers to look into
network data to find out whether a host is infected.
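To make the mechanism above concrete, the following is a minimal sketch of a toy DGA. It is not the algorithm of any real botnet and is not taken from this paper; the function name `toy_dga`, the seed string, the SHA-256 based construction, and the `.com` suffix are all assumptions chosen for illustration. It shows why the bot and the botmaster arrive at the same candidate list, and why most of a bot's queries go unresolved when only a couple of candidates are registered.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 10, length: int = 12,
            tld: str = ".com") -> List[str]:
    """Return `count` pseudo-random domains derived from a seed and a date.

    Hypothetical construction for illustration only.
    """
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter 'a'-'p' to form the label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


# The bot and the botmaster share the seed and the date, so both
# derive the same candidate list without communicating.
today = date(2024, 1, 1)
bot_candidates = toy_dga("shared-secret", today, count=1000)
botmaster_candidates = toy_dga("shared-secret", today, count=1000)

# Assumption: the botmaster registers only the first two candidates.
registered = set(botmaster_candidates[:2])

# Every other query from the bot would come back as NXDOMAIN.
resolved = [d for d in bot_candidates if d in registered]
nxdomain_count = len(bot_candidates) - len(resolved)
```

Because the list depends on the date, the candidates change every day, which is what makes blacklisting individual domains ineffective in this simplified setting.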
Various attempts have been made to detect DGA-based
botnets from network activity. These involve finding
some sort of anomaly in network behavior to conclude that a
host is infected. For example, the DNS records of an infected
host contain a lot of unresolved queries. This can be a symptom
that it is part of a botnet. Research has also been done on
analyzing the time sequence of a host's activity. If there exists
a specific pattern, such as sending DNS queries at fixed time
intervals, this can also signal that the host is part of a botnet.
Although these methods of identifying a botnet have yielded
good results, they either involve analyzing the complete DNS
records, which may not always be available, or involve manually
designing features.
If the botmaster decides to alter the botnet in some way, a
previously designed method won't work well. Hence, we
use a new way of devising features that is resilient to changes
in the form of botnets. We propose to use deep learning to
detect whether an individual domain is DGA-generated or not.
Deep neural networks are renowned for automatically engineering
features from a large number of training examples. The
resulting deep network will be able to detect whether a domain
is malicious for new classes of DGAs as well.
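As an illustration of the network-based symptom described above (an infected host producing many unresolved queries), the following minimal sketch flags hosts by their share of NXDOMAIN responses. It is not the method proposed in this paper. The log format of `(host, domain, rcode)` tuples, the function name `suspicious_hosts`, and the threshold values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suspicious_hosts(log: Iterable[Tuple[str, str, str]],
                     threshold: float = 0.8,
                     min_queries: int = 5) -> List[str]:
    """Return hosts whose fraction of NXDOMAIN responses is at least `threshold`.

    `threshold` and `min_queries` are arbitrary illustrative values.
    """
    total = Counter()
    failed = Counter()
    for host, _domain, rcode in log:
        total[host] += 1
        if rcode == "NXDOMAIN":
            failed[host] += 1
    return sorted(
        host for host in total
        if total[host] >= min_queries
        and failed[host] / total[host] >= threshold
    )


# Hypothetical DNS log: one host with mostly failed lookups, one normal host.
dns_log = (
    [("10.0.0.5", f"random{i}.com", "NXDOMAIN") for i in range(9)]
    + [("10.0.0.5", "registered.com", "NOERROR")]
    + [("10.0.0.7", f"site{i}.org", "NOERROR") for i in range(5)]
    + [("10.0.0.7", "typo-domain.org", "NXDOMAIN")]
)

flagged = suspicious_hosts(dns_log)
```

A heuristic of this kind needs the full DNS log of each host, which is the limitation noted above and the reason for classifying individual domain names instead.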
II. RELATED WORKS
From the perspective of the intelligence of the algorithm
used, botnet detection in general can be studied under two
broad categories: approaches that do not use deep learning and
approaches that employ certain deep learning architectures.
Here, we review some recent trends in botnets and
various works in both categories that have been
proposed to tackle such botnets.
Since static-IP-based bots can become ineffective once
the network administrator identifies the IP they are trying to