CEH Module 2 - Footprinting and Reconnaissance

Footprinting

Footprinting in cybersecurity refers to the process of gathering information about a target system or network to create a blueprint or map of its infrastructure and the organization’s digital footprint. It’s an essential phase in the pre-attack reconnaissance process used by ethical hackers, security professionals, or attackers to understand and analyze the potential vulnerabilities and weaknesses of a target.

Objectives of Footprinting

  1. Network Discovery: Identify active hosts, domains, IP ranges, network topology, and infrastructure details.

  2. System Identification: Gather information about operating systems, services, applications, and software versions running on the target systems.

  3. Organization Profiling: Collect data about the company’s employees, contact details, organizational hierarchy, and public-facing information available online.

Techniques Used in Footprinting

  1. Passive Footprinting: Gathering information about the target without direct interaction. This is also referred to as open-source information gathering, since no data is requested from the victim directly.

    • Public Information Gathering: Collect information from public sources like company websites, social media, search engines, job postings, forums, and news articles.
    • DNS Enumeration: Discover domain names, subdomains, and associated IP addresses using DNS (Domain Name System) queries.
  2. Active Footprinting: Gathering information from the target through direct interaction.

    • Network Scanning: Use tools like Nmap to scan networks for live hosts, open ports, and services running on those ports.
    • Social Engineering: Gather information by directly interacting with employees or individuals associated with the organization, often through deceptive means.

Information Collected During Footprinting

  • Domain Names and Subdomains
  • IP Address Ranges and Network Blocks
  • Server Details and Operating Systems
  • Web Application Information
  • Email Addresses and Contacts
  • Employee Information and Job Postings
  • Physical Locations and Infrastructure Details

Footprinting helps in understanding the attack surface, identifying potential entry points, and assessing the security posture of an organization. Ethical hackers use this information to perform vulnerability assessments and penetration tests to strengthen security measures and mitigate potential risks.

Passive footprinting normally goes unnoticed because the target does not know it is happening. The information collected can be organizational (employee details, telephone numbers, location details), network-related (networks, domains, IP addresses, DNS records), or system-related (operating systems, usernames, and passwords).

Footprint through search engines

Google Hacking Database

Google hacking, also known as Google dorking, is a hacker technique that uses Google Search and other Google applications to find security holes in the configuration and code of the websites it indexes. Google dorking is a form of hacking that uncovers hidden information through Google.

Google Dorking is used to find hidden information that is otherwise inaccessible through a normal Google search. Google dorks can reveal sensitive or private information about websites and the companies, organizations, and individuals that own and operate them.

Google hacking involves using operators in the Google search engine to locate specific sections of text on websites that are evidence of vulnerabilities, for example specific versions of vulnerable web applications. A search query such as intitle:admbook intitle:Fversion filetype:php would locate PHP web pages with those strings in the title, indicating that the PHP-based guestbook Admbook is in use, an application with a known code-injection vulnerability.

The Google Hacking Database is hosted on Exploit-DB, which contains a large collection of dorks with useful information. Search for the word "php" on that site and you'll find some PHP-related Google dorks; you can then run them in Google Search.

More info here: Google Dorking for Pentesters - freeCodeCamp

Top 35 google dorks: Top 35 Google Dorks List in 2023 - Box Piper

Some famous operators for Google Dorking:

  • Filetype is an operator that retrieves specific file types. For instance, filetype:log will show all log files. Multiple types can be searched at once by separating the extensions with "|".

  • Intext is an operator that retrieves a specific text on a page.

  • Ext is an operator that retrieves files with a specific extension, similar to a filetype operator.

  • Inurl is an operator that retrieves URLs containing a specific sequence of characters.

  • Intitle is an operator that retrieves pages using a specific text in the page title.

  • Site is an operator that retrieves results from a specific site only. For instance, site:example.com will give results only from example.com and no other website.

  • If you use -site, results will exclude that site. Ex: intitle:"techwithchay" -site:techwithchay.com shows sites with the word techwithchay in the title but excludes techwithchay.com from the results.

  • ‘*’ can be used in place of a missing word in the search query, to complete it.

  • cache: This operator retrieves the cached (older) version of a web page. [cache:www.techwithchay.org] returns the cached version of the website www.techwithchay.org.

  • allinurl: This operator restricts results to pages whose URL contains all the query terms specified. [allinurl: techwithchay career] returns only pages containing both "techwithchay" and "career" in the URL. Ex: allinurl: network camera lists pages whose URLs contain the words network and camera. To match the exact phrase network camera, put it in double quotes: allinurl:"network camera"

  • inurl: This operator restricts the results to pages containing the word specified in the URL [inurl: copy site:www.techwithchay.org]—Query returns only pages in techwithchay site in which the URL has the word “copy”

  • allintitle: This operator restricts results to pages containing all the query terms specified in the title. [allintitle: detect malware]—Query returns only pages containing the words “detect” and “malware” in the title

  • inanchor: This operator restricts results to pages containing the query terms specified in the anchor text on links to the page. [Anti-virus inanchor:Norton]—Query returns only pages with anchor text on links to the pages containing the word “Norton” and the page containing the word “Anti-virus”

  • allinanchor: This operator restricts results to pages containing all query terms specified in the anchor text on links to the page. [allinanchor: best cloud service provider]—Query returns only pages in which the anchor text on links to the pages contain the words “best,” “cloud,” “service,” and “provider”

  • link: This operator searches websites or pages that contain links to the specified website or page. [link:www.techwithchay.org]—Finds pages that point to techwithchay’s home page

  • related: This operator displays websites that are similar or related to the URL specified. [related:www.techwithchay.org]—Query provides the Google search engine results page with websites similar to techwithchay.org

  • info: This operator finds information for the specified web page. [info:techwithchay.org]—Query provides information about the www.techwithchay.org home page

  • location: This operator finds information for a specific location. [location: techwithchay]—Query gives you results based around the term techwithchay

cache:, link:, related:, info:, site:, allintitle:, intitle:, allinurl:, inurl:, and location: are some of the popular Google advanced search operators.

The beauty of Google Dorks is that operators can be combined to get even more accurate results. For instance, by using the operators “filetype” and “site” you can retrieve specific filetypes on a specific website.
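As a sketch of that composition, a small helper (hypothetical, not a real tool) can assemble operator:value pairs into a single query string, quoting multi-word values as the allinurl example above requires:

```python
# Sketch: composing Google dork queries from operator/value pairs.
# The operator names (site, filetype, allinurl, ...) are the real Google
# operators described above; the build_dork helper itself is made up.

def build_dork(*pairs, text=""):
    """Join operator:value pairs into one query, quoting multi-word values."""
    parts = []
    for op, value in pairs:
        if " " in value:
            value = f'"{value}"'          # exact-phrase match
        parts.append(f"{op}:{value}")
    if text:
        parts.append(text)
    return " ".join(parts)

# filetype combined with site, as described above
print(build_dork(("site", "example.com"), ("filetype", "pdf")))
# site:example.com filetype:pdf

# exact phrase in the URL, excluding one site
print(build_dork(("allinurl", "network camera"), ("-site", "example.com")))
# allinurl:"network camera" -site:example.com
```

The resulting strings are pasted into ordinary Google searches; the helper only saves typing and quoting mistakes.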

You can use other video search engines such as Google Videos (https://www.google.com/videohp), Yahoo Videos (https://in.video.search.yahoo.com), etc.; video analysis tools such as EZGif (https://ezgif.com), VideoReverser.com (https://www.videoreverser.com), etc.; and reverse image search tools such as TinEye Reverse Image Search (https://tineye.com), Yahoo Image Search (https://images.search.yahoo.com), etc. to gather crucial information about the target organization.

You can also use FTP search engines such as FreewareWeb FTP File Search to gather crucial FTP information about the target organization or use NAPALM FTP indexer portal.

Finding domains and subdomains

Search engines like Google and Bing provide external URL details. Using sites like Netcraft.com we can find subdomains for a domain or URL, and also get a full site report about websites using the site report option. You can also use tools such as Sublist3r (https://github.com), Pentest-Tools Find Subdomains (https://pentest-tools.com), etc. to identify the domains and subdomains of any target website.

People search in social media sites and people search services

  • Social media sites like Facebook and LinkedIn provide some information about the target. You can also use people search services such as PeekYou (https://www.peekyou.com), Spokeo (https://www.spokeo.com), pipl (https://pipl.com), Intelius (https://www.intelius.com), BeenVerified (https://www.beenverified.com), etc. to gather personal information of key employees in the target organization.
  • theHarvester tool: We can get email lists using theHarvester and "email spider" tools. Ex: theHarvester -d microsoft.com -l 100 -b linkedin searches LinkedIn for up to 100 results (people/emails/hosts) related to microsoft.com.
  • Sherlock is another tool for gathering the links where an individual has an account: python3 sherlock satyanadella searches for all URLs where the username satyanadella exists.

You can also use tools such as Social Searcher (https://www.social-searcher.com), UserRecon (https://github.com), etc. to gather additional information related to the target company and its employees from social networking sites.


People search from job posting sites like LinkedIn

From a job description we can learn which environment a specific company is using. Ex: if a company is asking for a NetScaler admin with experience in NetScaler versions 10, 11, and 13.1, it means the company runs those NetScaler versions. With this information, attackers know which versions/servers a company uses and can find their vulnerabilities and attack them.

Deep and Dark web footprinting

The deep web refers to all parts of the internet that are not indexed by traditional search engines, such as private databases, banking pages, password-protected websites, and other unindexed content. The dark web is a small portion of the deep web that is intentionally hidden and often associated with illegal activities.

You can use a normal browser to access deep web sites, but you need a special browser such as Tor to access dark web sites.

Shodan & Censys

Shodan indexes everything that is connected to the internet. Shodan is a search engine that lets users search for various types of servers (webcams, routers, servers, etc.) connected to the internet using a variety of filters. Some have also described it as a search engine of service banners, which are metadata that the server sends back to the client. This can be information about the server software, what options the service supports, a welcome message, or anything else that the client can find out before interacting with the server.

Shodan collects data mostly on web servers (HTTP/HTTPS – ports 80, 8080, 443, 8443), as well as FTP (port 21), SSH (port 22), Telnet (port 23), SNMP (port 161), IMAP (ports 143, or (encrypted) 993), SMTP (port 25), SIP (port 5060), and Real Time Streaming Protocol (RTSP, port 554). The latter can be used to access webcams and their video streams.

Censys is a search engine (search.censys.io) that scans the Internet for devices and returns aggregate reports on how resources (i.e., devices, websites, and certificates) are configured and deployed.

Website Footprinting

Photon is a very good tool for gathering a website's internal links, external links, and file links.

  • python3 photon.py -u <websiteURL>. Ex: python3 photon.py -u www.certifiedhacker.com

  • python3 photon.py -u www.certifiedhacker.com -l 3 -t 200 --wayback

    Note: -u: specifies the target website (here, www.certifiedhacker.com)

    -l: specifies level to crawl (here, 3)

    -t: specifies number of threads (here, 200)

    --wayback: specifies using URLs from archive.org as seeds

You can further explore the Photon tool and perform various other functionalities such as the cloning of the target website, extracting secret keys and cookies, obtaining strings by specifying regex pattern, etc. Using this information, the attackers can perform various attacks on the target website such as brute-force attacks, denial-of-service attacks, injection attacks, phishing attacks and social engineering attacks.
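At its core, a crawler like Photon fetches pages and sorts their links into internal and external sets. Below is a minimal sketch of that step using only the Python standard library, with a made-up page and domain; real crawlers add fetching, recursion to the chosen depth, and thread pools:

```python
# Sketch: classify a page's links as internal or external, the basic
# bookkeeping a crawler like Photon performs. The HTML and the domain
# www.example.com are invented for illustration.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.internal, self.external = set(), set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base, href)                      # resolve relative links
        if urlparse(url).netloc == urlparse(self.base).netloc:
            self.internal.add(url)
        else:
            self.external.add(url)

html = '<a href="/about.html">About</a> <a href="https://other.example.net/page">Partner</a>'
parser = LinkExtractor("http://www.example.com/")
parser.feed(html)
print(parser.internal)  # {'http://www.example.com/about.html'}
print(parser.external)  # {'https://other.example.net/page'}
```

A real run would download each internal URL and feed it back through the parser until the crawl depth (Photon's -l option) is exhausted.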

Footprinting and recon gather basic information about the target. Website footprinting mainly consists of identifying the software and its version, the OS and its version, subdirectories and paths, technologies used, etc. The HTML source code may also reveal information about vulnerabilities the website has, and examining cookies can provide further information about the site.

Attackers use tools like Photon to retrieve archived URLs of a target website from archive.org.

Website Footprinting Using Ping Tool

You can find the maximum frame size on the network using ping. Pinging a URL returns the IP address, time to live (TTL), packets sent/received, etc.

Run the ping www.certifiedhacker.com -f -l 1500 command. Here, -f sets the "don't fragment" flag in the packet and -l specifies the send buffer (payload) size. The output says "Packet needs to be fragmented but DF set", which means the frame size is too large, but since we specified -f the packet won't be fragmented. Reduce the value until you get a ping reply. At 1472, ping www.certifiedhacker.com -f -l 1472 gets a reply, which means 1472 is the maximum payload size.
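The arithmetic behind the 1472 figure can be checked directly: ping's -l value counts only the ICMP payload, and the IP and ICMP headers fill the rest of the standard 1500-byte Ethernet MTU. A quick sketch (the constants are standard header sizes, not values taken from the lab output):

```python
# Why 1472 is the largest unfragmented ping payload on Ethernet:
# the 1500-byte MTU must also hold the IP and ICMP headers.
ETHERNET_MTU = 1500   # default Ethernet MTU in bytes
IP_HEADER = 20        # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo request header

max_ping_payload = ETHERNET_MTU - IP_HEADER - ICMP_HEADER
print(max_ping_payload)  # 1472
```

If a path uses a smaller MTU (e.g. over a VPN), the same subtraction explains whatever lower value the ping experiment converges on.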

The maximum value you can set for TTL is 255.

Gathering Website Information Using Central Ops

Go to the centralops.net URL and enter a website name. It will list DNS, IP address, domain Whois, network Whois, etc. details. You can also use tools such as Website Informer (https://website.informer.com), Burp Suite (https://portswigger.net), Zaproxy (https://www.zaproxy.org), etc. to perform website footprinting on a target website.

Website Footprinting Using Web Spiders

Web data extraction is the process of extracting data from web pages available on the company’s website. A company’s data such as contact details (email, phone, and fax), URLs, meta tags (title, description, keyword) for website promotion, directories, web research, etc. are important sources of information for an ethical hacker. Web spiders (also known as a web crawler or web robot) such as Web Data Extractor perform automated searches on the target website and extract specified information from the target website.

Tools like Web Data Extractor and ParseHub perform automated searches on the target website and collect information such as employee names and IDs, which attackers then use to start an attack. Search engines like Google, Yahoo, and Bing do the same thing, called crawling.

User-directed spidering: attackers use standard web browsers to walk through the target website's functionality while the site's incoming and outgoing traffic is monitored. Attackers use tools like Burp Suite and WebScarab to perform user-directed spidering.

Mirror Entire Website

When a website is mirrored, it allows an attacker to browse its directory structure and other information offline. You can use mirroring tools such as HTTrack Website Copier or Cyotek WebCopy to mirror a target website.

Gather Website Information Using GRecon

GRecon is a Python tool that runs Google search queries to perform reconnaissance on a target, finding subdomains, sub-subdomains, login pages, directory listings, exposed documents, WordPress entries, and paste sites, and displays the results.

python3 grecon.py - running this will prompt you to enter a website.

Note: It will take approximately 5 minutes to complete the search.

Gather a Wordlist from the Target Website using CeWL

CeWL is a Ruby app that spiders a given target URL to a specified depth, optionally following external links, and returns a list of unique words that can be used for cracking passwords.

CeWL, which stands for "Custom Word List generator," is a command-line tool used for creating custom wordlists or dictionaries for password cracking and other security testing purposes. It is typically used by cybersecurity professionals, penetration testers, and ethical hackers to gather potential passwords or target-specific words based on a given website or text source. The tool is written in Ruby and can be run on various operating systems, including Linux and Windows.

Here’s how CEWL works and what it can do:

  1. Web scraping: CEWL primarily operates by scraping a target website or a specified URL. It extracts text from web pages, including page titles, headers, body text, and metadata.
  2. Customization: Users can customize the tool to specify how many words or characters they want to extract from the target source. You can also set depth levels to scrape links on the webpage and follow them to gather additional content.
  3. Filtering: CEWL allows users to filter the extracted words based on specific criteria. For example, you can exclude common words (stop words), filter out numbers, and apply regular expressions to refine the wordlist.
  4. Output formats: CEWL provides options to save the extracted words in different formats, such as plain text, CSV, or custom formats. The generated wordlist can then be used with various password cracking tools or for other security testing purposes.
  5. Usage scenarios: CEWL is often used during penetration testing or security assessments to perform dictionary attacks or brute-force attacks on password-protected systems. By creating custom wordlists tailored to the target’s characteristics, security professionals can increase the efficiency of these attacks.

Ex: run cewl -d 2 -m 5 https://www.certifiedhacker.com and press Enter. Note: -d represents the depth to spider the website (here, 2) and -m represents the minimum word length (here, 5).
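CeWL's core idea can be sketched in a few lines: strip the HTML tags from a page, split the remaining text into words, and keep unique words at or above a minimum length (the -m option above). A minimal sketch, with a sample page invented for illustration:

```python
# Sketch of CeWL's core step: turn one page's HTML into a deduplicated
# wordlist. Real CeWL also spiders links to the chosen depth, handles
# metadata, and offers filtering options.
import re

def make_wordlist(html, min_length=5):
    text = re.sub(r"<[^>]+>", " ", html)      # drop HTML tags
    words = re.findall(r"[A-Za-z]+", text)    # keep alphabetic runs only
    return sorted({w for w in words if len(w) >= min_length})

page = "<h1>Acme Payroll Portal</h1><p>Employee login for payroll staff</p>"
print(make_wordlist(page))
# ['Employee', 'Payroll', 'Portal', 'login', 'payroll', 'staff']
```

Wordlists built from a target's own pages tend to contain product names, jargon, and employee names, which is exactly why they outperform generic dictionaries in password attacks.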

Extracting information from Archive.org

Archive.org, or the Internet Archive, is a nonprofit digital library offering access to historical collections of websites, books, videos, music, software, and more. It’s famous for the Wayback Machine, letting users explore past versions of web pages.

Tracking email communications

You can get the sender's IP address from the email header, which may contain the victim's internal and external addresses. eMailTrackerPro, Infoga, Mailtrack, and PoliteMail are some of the tools that allow an attacker to track email and extract information.
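The Received headers these tools parse can also be inspected with Python's standard email module. A small sketch with a fabricated message (real headers vary by mail server, and the topmost Received line may show a relay rather than the true origin):

```python
# Sketch: pull IP addresses out of an email's Received headers.
# The raw message below is fabricated for illustration.
import email
import re

raw = """Received: from mail.sender.example ([203.0.113.45])
\tby mx.recipient.example with ESMTP; Mon, 1 Jan 2024 10:00:00 +0000
From: alice@sender.example
Subject: hello

body"""

msg = email.message_from_string(raw)
ips = []
for header in msg.get_all("Received", []):
    # Received headers usually bracket the connecting host's IP
    ips += re.findall(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", header)
print(ips)  # ['203.0.113.45']
```

Reading the Received chain bottom-up walks from the originating server toward the recipient's mail exchanger.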

Whois Lookup

Whois databases are maintained by regional internet registries and contain personal information of domain registrants unless privacy protection is enabled. Use https://whois.domaintools.com/ or install the whois tool in your Parrot OS. You can also use other Whois lookup tools such as SmartWhois (https://www.tamos.com), Batch IP Converter (http://www.sabsoft.com), etc. to extract additional target Whois information.

Extracting DNS information

DNS records provide info about the location and type of servers. Based on this info, attackers can perform social engineering attacks. A, MX, NS, CNAME, SOA, SRV, PTR, RP, HINFO, and TXT are some of the popular DNS record types. Create an account on SecurityTrails.com and look up DNS records for websites. You can also use DNSChecker (https://dnschecker.org), DNSdumpster (https://dnsdumpster.com), etc. to perform DNS footprinting on a target website.

When you run nslookup directly, it might query a non-authoritative DNS server. If, when you ping or open a site in a browser, the response comes from the DNS server configured on your local machine rather than the server that legitimately hosts the website's zone, it is considered a non-authoritative answer.

In the command prompt, enter nslookup. In the nslookup terminal, type set type=a. This will provide the IP address for the domain you enter.

To get the authoritative DNS server, type nslookup. In the nslookup shell, type the set type=cname command. This ensures that nslookup gets the info from the authoritative DNS server directly. Copy the DNS server hostname.

Again, in the same nslookup terminal, enter set type=a and then enter the DNS server FQDN. If you do not set type=a, it might not return an IP address in the results when you enter a DNS server FQDN.

set type=a - provides the IP for the FQDN you enter.

set type=cname - provides the authoritative DNS server details.

You can also run nslookup online from this site: http://www.kloth.net/services/nslookup.php. You can also use DNS lookup tools such as DNSdumpster (https://dnsdumpster.com), DNS Records (https://network-tools.com), etc. to extract additional target DNS information.

In nslookup, the set type= option accepts the values below. These are only a few options; there are a lot more. The default resource record type is A.

A: Specifies a computer’s IP address.

ANY: Specifies all types of data (returns every available resource record for the name).

CNAME: Specifies a canonical name for an alias. When you use set type=cname, the lookup is done directly against that domain name's authoritative name server. A CNAME points one DNS name, like www.example.com, to another DNS name, like apps.example.net; an alias creates another name for an existing DNS entry.

MX: Specifies the mail exchanger.

NS: Specifies a DNS name server for the named zone.

PTR: Specifies a computer name if the query is an IP address; otherwise, specifies the pointer to other information.

SOA: Specifies the start-of-authority for a DNS zone.

TXT: Specifies the text information.

UID: Specifies the user identifier.
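The same A record lookup that set type=a performs can be scripted with Python's standard resolver. A minimal sketch, resolving localhost so it works without reaching an external DNS server:

```python
# Sketch: an A record lookup via the system resolver, the scripted
# equivalent of nslookup's `set type=a`. Resolving any public name
# would need network access to a DNS server; localhost resolves locally.
import socket

ip = socket.gethostbyname("localhost")
print(ip)  # typically 127.0.0.1
```

socket.gethostbyname only returns IPv4 addresses; socket.getaddrinfo covers IPv6 and other record details.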

Reverse DNS

DNS lookup is used for finding the IP addresses for a given domain name, and the reverse DNS operation is performed to obtain the domain name of a given IP address. Ex: python3 dnsrecon.py -r 162.241.216.0-162.241.216.255 (specify an IP range)
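Under the hood, a reverse lookup queries a PTR record whose name is the IP's octets reversed under in-addr.arpa. A small sketch of how that query name is built (the ptr_name helper is hypothetical, not part of dnsrecon; an actual PTR query, e.g. via socket.gethostbyaddr, needs a reachable resolver):

```python
# Sketch: build the in-addr.arpa query name used for a reverse
# (PTR) DNS lookup of an IPv4 address.
def ptr_name(ip):
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(ptr_name("162.241.216.11"))  # 11.216.241.162.in-addr.arpa
```

Tools like dnsrecon simply issue one such PTR query per address in the range you specify.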

Network Range

If you obtain the IP address of a web server, try scanning its network range; a few other app/web servers may be running on other IPs. Use the https://search.arin.net website to find details about an IP address.

Traceroute

Traceroute finds the IP addresses of intermediate devices, such as routers and firewalls, present between a source and destination. Traceroute works with ICMP and uses the TTL field in the IP header to discover routers on the path to a target host. Graphical traceroute tools are also available. You can use traceroute tools such as VisualRoute (http://www.visualroute.com), Traceroute NG (https://www.solarwinds.com), etc. to extract additional network information about the target organization.

Social Engineering

Social engineering is the art of exploiting human behavior to extract confidential information. Eavesdropping (unauthorized listening to conversations), shoulder surfing (secretly looking over someone's shoulder as they type a password), dumpster diving (searching trash, printer bins, and user desks for sticky notes and other information), and impersonation (pretending to be a legitimate or authorized person) are examples of social engineering.

Footprinting tools: Maltego and Recon-ng

Maltego is used to determine relationships and real-world links between people, groups of people, organizations, websites, internet infrastructure, documents, etc. It is a very powerful tool that maps these relationships visually, much like Visio, but in real time.

Recon-ng is a web reconnaissance framework in which reconnaissance can be conducted. We can use Recon-ng to perform network reconnaissance, gather personnel information, and gather target information from social networking sites. Run the recon-ng command, create a workspace, add a domain, load the module you need (for example, contacts associated with a domain, or HTML report generation), and use the run command to perform the operation.

Footprinting tools: FOCA and OSRFramework

FOCA (Fingerprinting Organizations with Collected Archives) is a tool that finds metadata and hidden information in the documents it scans. These documents are located using three search engines: Google, Bing, and DuckDuckGo, which together surface a large number of files. FOCA examines a wide variety of document types, the most common being Microsoft Office, OpenOffice, and PDF documents, and it can also work with Adobe InDesign or SVG files. These documents may be found on web pages, then downloaded and analyzed with FOCA.

OSRFramework:


OSRFramework is a set of libraries that are used to perform Open Source Intelligence tasks. They include references to many different applications related to username checking, DNS lookups, information leaks research, deep web search, regular expressions extraction, and many others. It also provides a way of making these queries graphically as well as several interfaces to interact with such as OSRFConsole or a Web interface.

  • Domainfy – gathers different domain names and their IP addresses based on the word specified. Ex: domainfy -n techwithchay -t all
  • Searchfy – searches for usernames. Ex: searchfy -q "Tim Cook"
  • A few other OSRFramework tools include
    • Usufy - Gathers registered accounts with given usernames.
    • Mailfy – Gathers information about email accounts
    • Phonefy – Checks for the existence of a given series of phones
    • Entify – Extracts entities using regular expressions from provided URLs

BillCipher

BillCipher is an information gathering tool for a Website or IP address. Using this tool, you can gather information such as DNS Lookup, Whois lookup, GeoIP Lookup, Subnet Lookup, Port Scanner, Page Links, Zone Transfer, HTTP Header, etc. Here, we will use the BillCipher tool to footprint a target website URL.

OSINT framework website

OSINT Framework is an open source intelligence gathering framework that helps security professionals perform automated footprinting and reconnaissance, OSINT research, and intelligence gathering. It is focused on gathering information from free tools and resources. The framework includes a simple web interface that lists various OSINT tools arranged by category, shown as an OSINT tree structure on the web interface.
