Information gathering
Sources for these notes
- Hack The Box: Penetration Testing Learning Path
- INE eWPT2 Preparation course
- OWASP Web Security Testing Guide 4.2 > 1. Information Gathering
- My own notes coming from experience pentesting.
Methodology
Information gathering is typically broken down into two types:
- Passive information gathering - Involves gathering as much information as possible without actively engaging with the target.
- Active information gathering/Enumeration - Involves gathering as much information as possible by actively engaging with the target system. (You will require authorization in order to perform active information gathering).
Passive Information Gathering | Active Information Gathering/Enumeration |
---|---|
Identifying domain names and domain ownership information. | Identify website content structure. |
Discovering hidden/disallowed files and directories. | Downloading & analyzing website/web app source code. |
Identifying web server IP addresses & DNS records. | Port scanning & service discovery. |
Identifying web technologies being used on target sites. | Web server fingerprinting. |
WAF detection. | Web application scanning. |
Identifying subdomains. | DNS Zone Transfers. |
Identify website content structure. | Subdomain enumeration via Brute-Force. |
1. Passive information gathering
1.1. Fingerprint Web Server
Or Passive server enumeration.
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.2. Fingerprint Web Server
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.2 | WSTG-INFO-02 | Fingerprint Web Server | - Determine the version and type of a running web server to enable further discovery of any known vulnerabilities. |
host command
DNS lookup utility.
whois command
WHOIS is a query and response protocol that is used to query databases that store the registered users or organizations of an internet resource like a domain name or an IP address block.
WHOIS lookups can be performed through the command line interface via the whois client or through some third party web-based tools to lookup the domain ownership details from different databases.
netcraft
Netcraft can offer us information about the servers without even interacting with them, which is valuable from a passive information gathering point of view. We can use the service by visiting https://sitereport.netcraft.com and entering the target domain. We need to pay special attention to the latest IPs used. Sometimes we can spot the actual IP address of the web server from before it was placed behind a load balancer, web application firewall, or IDS, allowing us to connect directly to it if the configuration allows it.
Other details reported by Netcraft: CMS, server-side programming, ...
censys
Shodan
Wayback machine
We can use the Wayback Machine to access several archived versions of a website and find old versions that may have interesting comments in the source code or files that should not be there.
We can also use the tool waybackurls to inspect URLs saved by Wayback Machine and look for specific keywords. Installation:
Basic usage:
1.2. Passive DNS enumeration
A valuable resource for this information is the Domain Name System (DNS). We can query DNS to identify the DNS records associated with a particular domain or IP address.
Some of these tools can also be used for active DNS enumeration.
Worth trying: DNSRecon and https://domain.glass/
Tool + Cheat sheet | What it does |
---|---|
Google dorks | Google hacking, also named Google dorking, is a hacker technique that uses Google Search and other Google applications to find security holes in the configuration and computer code that websites are using. |
crt.sh | It collects information about SSL certificates. If you visit a domain and it contains a certificate, you can extract other subdomains by using the View Certificate functionality. |
dnscan | Python wordlist-based DNS subdomain scanner. |
DNSRecon | Preinstalled on some Linux distributions (e.g. Kali): dnsrecon is a simple Python script that gathers DNS-oriented information on a given target. |
dnsdumpster.com | DNSdumpster.com is a FREE domain research tool that can discover hosts related to a domain. Finding visible hosts from the attacker's perspective is an important part of the security assessment process. |
https://domain.glass/ | |
viewdns.info | |
domaintools | |
1.3. Reviewing server metafiles
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.5. Review Webpage content for Information Leakage
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.5 | WSTG-INFO-05 | Review Webpage Content for Information Leakage | - Review webpage comments, metadata, and redirect bodies to find any information leakage. - Gather JavaScript files and review the JS code to better understand the application and to find any information leakage. - Identify if source map files or other front-end debug files exist. |
Some of these files:
- robots.txt
- sitemap.xml
- security.txt (proposed standard which allows websites to define security policies and contact details.)
- humans.txt (initiative for knowing the people behind a website.)
1.4. Conduct Search Engine Discovery
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.1. Conduct search engine discovery reconnaissance for information leakage
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.1 | WSTG-INFO-01 | Conduct Search Engine Discovery Reconnaissance for Information Leakage | - Identify what sensitive design and configuration information of the application, system, or organization is exposed directly (on the organization's website) or indirectly (via third-party services). |
1.5. Cloud resources
Buckets, blob, ...: https://buckets.grayhatwarfare.com/
GrayhatWarfare
Monitoring breaches in buckets and storage accounts in the cloud.
Trufflehog
A tool for continuously monitoring Git, Jira, Slack, Confluence, Microsoft Teams, SharePoint, and more for leaked secrets.
1.6. Fingerprint web application technology and frameworks
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.8. Fingerprint Web Application Framework
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.8 | WSTG-INFO-08 | Fingerprint Web Application Framework | - Fingerprint the components being used by the web applications. - Find the type of web application framework/CMS from HTTP headers, Cookies, Source code, Specific files and folders, Error message. |
If we discover the webserver behind the target application, it can give us a good idea of what operating system is running on the back-end server.
For instance:
- IIS 6.0: Windows Server 2003
- IIS 7.0-8.5: Windows Server 2008 / Windows Server 2008 R2 / Windows Server 2012 / Windows Server 2012 R2
- IIS 10.0 (v1607-v1709): Windows Server 2016
- IIS 10.0 (v1809-): Windows Server 2019
Although this is usually correct when dealing with Windows, we cannot be sure in the case of Linux or BSD-based distributions, as they can run different web server versions.
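The mapping above can be sketched as a small lookup on the Server response header. This is a minimal sketch, not a definitive fingerprint: the function name is hypothetical, and the per-release breakdown of IIS 7.0 to 8.5 (7.0: 2008, 7.5: 2008 R2, 8.0: 2012, 8.5: 2012 R2) is my own addition to the list. Banners can be removed or spoofed, so treat the result as a hint only.

```python
import re

# IIS version -> candidate Windows Server releases.
# The 7.0-8.5 per-release split is an assumption added for illustration.
IIS_TO_WINDOWS = {
    "6.0": ["Windows Server 2003"],
    "7.0": ["Windows Server 2008"],
    "7.5": ["Windows Server 2008 R2"],
    "8.0": ["Windows Server 2012"],
    "8.5": ["Windows Server 2012 R2"],
    "10.0": ["Windows Server 2016", "Windows Server 2019"],
}


def guess_windows_from_server_header(server_header):
    """Return candidate Windows Server releases for a 'Server' header value."""
    match = re.search(r"Microsoft-IIS/(\d+\.\d+)", server_header, re.IGNORECASE)
    if not match:
        return []  # not IIS, or the banner was removed/changed
    return IIS_TO_WINDOWS.get(match.group(1), [])
```

For example, a response carrying Server: Microsoft-IIS/8.5 would point at Windows Server 2012 R2, while an nginx or Apache banner returns an empty list.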
How to spot a web server?
HTTP headers
X-Powered-By and cookies:
- .NET: ASPSESSIONID<RANDOM>=<COOKIE_VALUE> (classic ASP; ASP.NET typically uses ASP.NET_SessionId)
- PHP: PHPSESSID=<COOKIE_VALUE>
- JAVA: JSESSIONID=<COOKIE_VALUE>
More manual techniques on OWASP 4.2: WSTG-INFO-08
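As a rough illustration of the cookie technique, the sketch below maps session cookie names to a likely technology. The function name and the lookup table are hypothetical; the table uses the full cookie names JSESSIONID and ASP.NET_SessionId, which are my assumptions beyond the list above. Cookie names can be renamed by the application, so a match is only a hint.

```python
# Cookie-name prefixes (upper-cased) -> likely technology. Illustrative only.
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java",
    "ASPSESSIONID": "Classic ASP",
    "ASP.NET_SESSIONID": "ASP.NET",
}


def guess_tech_from_cookies(set_cookie_values):
    """Take raw Set-Cookie header values and return a sorted list of technology hints."""
    hints = set()
    for raw in set_cookie_values:
        # The cookie name is everything before the first '='.
        name = raw.split("=", 1)[0].strip().upper()
        for prefix, tech in COOKIE_HINTS.items():
            if name.startswith(prefix):
                hints.add(tech)
    return sorted(hints)
```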
Banner Grabbing / Web Server Headers
whatweb
Wappalyzer
wafw00f
Aquatone
BuiltWith
Addons BuiltWith: BuiltWith® covers 93,551+ internet technologies which include analytics, advertising, hosting, CMS and many more.
Curl
nmap
1.7. WAF detection
wafw00f
nmap
1.8. Code analysis: HTTrack and EyeWitness
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.7. Map Execution Paths through applications
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.7 | WSTG-INFO-07 | Map Execution Paths Through Application | - Map the target application and understand the principal workflows. - Use HTTP(s) Proxy Spider/Crawler feature aligned with application walkthrough |
HTTrack
Create a folder in which to mirror your target.
Interactive mode:
EyeWitness
First, create a file with the target domains, for instance listOfdomains.txt.
Then, run:
After that you will get a report.html file with the request and a screenshot of those domains.
1.9. Crawlers
Crawling is the process of navigating around the web application, following links, submitting forms and logging in (where possible) with the objective of mapping out and cataloging the web application and the navigational paths within it.
Crawling is typically passive, as engagement with the target is limited to what is publicly accessible. We can utilize Burp Suite’s passive crawler to help us map out the web application and better understand how it is set up and how it works.
- Burp Suite Community edition only has the Crawler feature available. For spidering, you need the Pro edition.
- OWASP ZAP (Zed Attack Proxy): ZAP is a free, open-source web application security scanner. It can be used in automated and manual modes and includes a spider component to crawl web applications and identify potential vulnerabilities. ZAP has both Spider and Crawler features available.
- Scrapy (Python Framework): Scrapy is a versatile and scalable Python framework for building custom web crawlers. It provides rich features for extracting structured data from websites, handling complex crawling scenarios, and automating data processing. Its flexibility makes it ideal for tailored reconnaissance tasks.
- Apache Nutch (Scalable Crawler): Nutch is a highly extensible and scalable open-source web crawler written in Java. It's designed to handle massive crawls across the entire web or focus on specific domains. While it requires more technical expertise to set up and configure, its power and flexibility make it a valuable asset for large-scale reconnaissance projects.
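To make the idea of cataloging a page concrete, here is a minimal sketch using only the Python standard library. It is not how Burp, ZAP, Scrapy or Nutch work internally; the class and function names are hypothetical. It parses one HTML document that was already retrieved and records its links, forms and comments.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageCatalog(HTMLParser):
    """Collect links, forms and HTML comments from a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()
        self.forms = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.add(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            method = (attrs.get("method") or "get").upper()
            action = urljoin(self.base_url, attrs.get("action") or "")
            self.forms.append((method, action))

    def handle_comment(self, data):
        self.comments.append(data.strip())


def catalog_page(base_url, html):
    """Parse `html` (already downloaded) and return the populated catalog."""
    parser = PageCatalog(base_url)
    parser.feed(html)
    return parser
```

Comments are worth keeping in the catalog because they sometimes leak paths or credentials that never appear in the rendered page.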
Scrapy
ReconSpider
After running ReconSpider.py, the data will be saved in a JSON file, results.json. This file can be explored using any text editor.
2. Active information gathering
2.1. Enumerate applications and services on Webserver
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.4. Enumerate Applications on Webserver
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.4 | WSTG-INFO-04 | Enumerate Applications on Webserver | - Enumerate the applications within the scope that exist on a web server. - Find applications hosted in the webserver (Virtual hosts/Subdomain), non-standard ports, DNS zone transfers |
Hostname discovery
Scanning the IP looking for services:
2.2. Web Server Fingerprinting
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.2. Fingerprint Web Server
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.2 | WSTG-INFO-02 | Fingerprint Web Server | - Determine the version and type of a running web server to enable further discovery of any known vulnerabilities. |
HTTP headers and source code
HTTP headers and HTML source code (with Burp Suite and curl), or CTRL+U in the browser to view the source code.
- Note the response headers Server, X-Powered-By, or X-Generator as well.
- Identify framework-specific cookies. For instance, the cookie CAKEPHP for PHP.
- Review the source code and identify <meta> tags or attributes with typical patterns from some servers (and/or frameworks).
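The checks above can be sketched as one helper that takes the response headers and the HTML body and returns whatever fingerprinting hints it finds. This is a minimal sketch with hypothetical names; the only meta tag it looks for is name="generator", which is an assumption about what "typical patterns" means here.

```python
from html.parser import HTMLParser

FINGERPRINT_HEADERS = ("server", "x-powered-by", "x-generator")


class _GeneratorParser(HTMLParser):
    """Collect the content of <meta name="generator"> tags."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            if attrs.get("content"):
                self.generators.append(attrs["content"])


def fingerprint_response(headers, html):
    """Return a dict of fingerprinting hints from headers and page source."""
    found = {}
    for name, value in headers.items():
        if name.lower() in FINGERPRINT_HEADERS:
            found[name.lower()] = value
    parser = _GeneratorParser()
    parser.feed(html)
    if parser.generators:
        found["meta-generator"] = parser.generators
    return found
```

The headers argument only needs an items() method, so a plain dict works, as should the headers object of a response returned by urllib.request.urlopen.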
nmap
Conduct a scan:
If a server version found is potentially vulnerable, use searchsploit:
metasploit
Additionally you can use metasploit:
whatweb
Nikto
2.3. Well-Known URIs
OWASP
OWASP Web Security Testing Guide 4.2 > 1. Information Gathering > 1.3. Review Webserver Metafiles for Information Leakage
ID | Link to Hackinglife | Link to OWASP | Objectives |
---|---|---|---|
1.3 | WSTG-INFO-03 | Review Webserver Metafiles for Information Leakage | - Identify hidden or obfuscated paths and functionality through the analysis of metadata files (robots.txt, <META> tag, sitemap.xml) - Extract and map other information that could lead to a better understanding of the systems at hand. |
The .well-known standard, defined in RFC 8615, serves as a standardized directory within a website's root domain. This designated location, typically accessible via the /.well-known/ path on a web server, centralizes a website's critical metadata, including configuration files and information related to its services, protocols, and security mechanisms.
URI Suffix | Description | Status | Reference |
---|---|---|---|
security.txt | Contains contact information for security researchers to report vulnerabilities. | Permanent | RFC 9116 |
/.well-known/change-password | Provides a standard URL for directing users to a password change page. | Provisional | https://w3c.github.io/webappsec-change-password-url/#the-change-password-well-known-uri |
openid-configuration | Defines configuration details for OpenID Connect, an identity layer on top of the OAuth 2.0 protocol. | Permanent | http://openid.net/specs/openid-connect-discovery-1_0.html |
assetlinks.json | Used for verifying ownership of digital assets (e.g., apps) associated with a domain. | Permanent | https://github.com/google/digitalassetlinks/blob/master/well-known/specification.md |
mta-sts.txt | Specifies the policy for SMTP MTA Strict Transport Security (MTA-STS) to enhance email security. | Permanent | RFC 8461 |
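A quick way to turn the table into something checkable is to build the candidate URLs for a target. The sketch below only builds the list; it does not send any request. The function name is hypothetical and example.com is a placeholder.

```python
from urllib.parse import urljoin

# Suffixes taken from the table above.
WELL_KNOWN_SUFFIXES = [
    "security.txt",
    "change-password",
    "openid-configuration",
    "assetlinks.json",
    "mta-sts.txt",
]


def well_known_urls(base_url):
    """Return the /.well-known/ URLs to check for the site behind `base_url`."""
    # The leading slash anchors the path at the web root, whatever path base_url has.
    root = urljoin(base_url, "/.well-known/")
    return [root + suffix for suffix in WELL_KNOWN_SUFFIXES]
```

Note that per RFC 8461 the MTA-STS policy is normally served from the mta-sts.<domain> host, so that entry may need a different hostname than the main site.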
2.4. Directory/File enumeration
nmap
dirb
gobuster
Ffuf
Wfuzz
feroxbuster
amass
Some flags:
Spidering with OWASP ZAP
Spidering is an active technique. It's the process of automatically discovering new resources (URLs) on a web application/site. It typically begins with a list of target URLs called seeds. The spider visits these URLs, identifies the hyperlinks in each page, adds them to the list of URLs to visit, and repeats the process recursively.
Spidering can be quite loud and, as a result, it is typically considered to be an active information gathering technique.
We can use OWASP ZAP’s Spider to automate the spidering of a web application, map it out, and learn more about how the site is laid out and how it works.
Burp Suite Community edition only has the Crawler feature available. For spidering, you need the Pro edition.
OWASP ZAP has both Spider and Crawler features available.
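The seed-and-repeat loop described above can be sketched as a breadth-first traversal. This is a simplified illustration, not ZAP's implementation: the function name is hypothetical, links are extracted with a naive regular expression, and the fetch callable is an assumption that lets the spider run against any source of HTML (a real HTTP client, or a dictionary in a test).

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Naive href extraction; a real spider would use a proper HTML parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def spider(seeds, fetch, max_pages=50):
    """Breadth-first spider. `fetch(url)` returns the page HTML as a string, or None."""
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}  # stay in scope
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if not html:
            continue
        for href in HREF_RE.findall(html):
            link = urljoin(url, href).split("#", 1)[0]  # drop fragments
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Restricting the queue to the hosts of the seeds keeps the spider inside the authorized scope, and max_pages caps how loud it gets.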
2.5. Active DNS enumeration
Domain Name System (DNS) is a protocol that is used to resolve domain names/hostnames to IP addresses. During the early days of the internet, users would have to remember the IP addresses of the sites that they wanted to visit. DNS resolves this issue by mapping domain names (easier to recall) to their respective IP addresses.
A DNS server (nameserver) is like a telephone directory that contains domain names and their corresponding IP addresses. A plethora of public DNS servers have been set up by companies like Cloudflare (1.1.1.1) and Google (8.8.8.8). These public resolvers can look up the records of almost any domain on the internet.
DNS interrogation is the process of enumerating DNS records for a specific domain. The objective of DNS interrogation is to probe a DNS server to provide us with DNS records for a specific domain. This process can provide us with important information like the IP address of a domain, subdomains, mail server addresses etc.
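To show what such a query looks like on the wire, here is a minimal sketch that builds a raw DNS question following the RFC 1035 message layout and, optionally, sends it over UDP. The function names are hypothetical and the response is returned as undecoded bytes; tools like dig and nslookup do the decoding for us.

```python
import socket
import struct

# A few record types and their numeric codes.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15, "TXT": 16, "AAAA": 28}


def build_dns_query(domain, qtype="A", txid=0x1234):
    """Build a DNS query packet asking for `qtype` records of `domain`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other sections.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", QTYPES[qtype], 1)  # class 1 = IN
    return header + question


def send_dns_query(packet, server="8.8.8.8", timeout=3.0):
    """Send the packet to a DNS server over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data
```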
More about DNS enumeration.
Tool + Cheat sheet | What it does |
---|---|
dnsenum | multithreaded perl script to enumerate DNS information of a domain and to discover non-contiguous ip blocks. |
dig | DNS lookup utility for querying name servers, including zone transfer attempts (axfr). |
fierce | DNS scanner that helps locate non-contiguous IP space and hostnames. |
dnscan | Python wordlist-based DNS subdomain scanner. |
gobuster | For brute force enumerations. |
nslookup | Command-line tool for querying DNS name servers. |
amass | In depth DNS Enumeration and network mapping. |
dnsenum
dnsenum: multithreaded Perl script to enumerate DNS information of a domain and to discover non-contiguous IP blocks. Used for active fingerprinting:
One cool thing about dnsenum is that it can perform DNS zone transfers, like dig. dnsenum performs DNS brute forcing with /usr/share/dnsenum/dns.txt.
dig
Additionally, see dig axfr.
dig (More complete cheat sheet: dig)
Fierce
Fierce (More complete cheat sheet: fierce)
DNScan
DNScan (More complete cheat sheet: DNScan): Python wordlist-based DNS subdomain scanner. The script will first try to perform a zone transfer using each of the target domain's nameservers.
gobuster
gobuster (More complete cheat sheet: gobuster)
nslookup
nslookup (More complete cheat sheet: nslookup)
2.6. Subdomain enumeration
Subdomain enumeration is the process of systematically identifying and listing a domain's subdomains. From a DNS perspective, subdomains are typically represented by A (or AAAA for IPv6) records, which map the subdomain name to its corresponding IP address. Additionally, CNAME records might be used to create aliases for subdomains, pointing them to other domains or subdomains.
Using a SecLists wordlist:
Sublist3r
Sublist3r enumerates subdomains using many search engines such as Google, Yahoo, Bing, Baidu and Ask. Sublist3r also enumerates subdomains using Netcraft, Virustotal, ThreatCrowd, DNSdumpster and ReverseDNS. Easily blocked by Google.
fierce
gobuster
wfuzz
dnsenum
Using dnsenum.
Bash script with dig and SecLists
Bash script, using a SecLists wordlist:
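The same wordlist idea can be sketched in Python instead of Bash and dig, which is a swap on my part. The function name is hypothetical, and the resolver is passed in as a parameter (defaulting to the system resolver) so the loop can be tried without touching the network.

```python
import socket


def brute_force_subdomains(domain, words, resolve=socket.gethostbyname):
    """Try `<word>.<domain>` for each word; return {fqdn: ip} for names that resolve."""
    found = {}
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in the wordlist
        fqdn = f"{word}.{domain}"
        try:
            found[fqdn] = resolve(fqdn)
        except OSError:  # socket.gaierror (name not found) is a subclass of OSError
            continue
    return found
```

Since an open file iterates line by line, a SecLists wordlist file can be passed directly as words. If the domain uses wildcard DNS, every candidate will resolve, so compare the results against a random name first.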
2.7. VHOST enumeration
A virtual host (vHost) is a feature that allows several websites to be hosted on a single server. At the core of virtual hosting is the ability of web servers to distinguish between multiple websites or applications sharing the same IP address. This is achieved by leveraging the HTTP Host header, a piece of information included in every HTTP request sent by a web browser. The key difference between VHosts and subdomains is their relationship to the Domain Name System (DNS) and the web server's configuration.
There are 3 ways to configure virtual hosts:
- Name-Based Virtual Hosting: This method relies solely on the HTTP Host header to distinguish between websites. It is the most common and flexible method, as it doesn't require multiple IP addresses. It requires the web server to support name-based virtual hosting and can have limitations with certain protocols like SSL/TLS.
- IP-Based Virtual Hosting: This type of hosting assigns a unique IP address to each website hosted on the server. The server determines which website to serve based on the IP address to which the request was sent. It doesn't rely on the Host header, can be used with any protocol, and offers better isolation between websites. Still, it requires multiple IP addresses, which can be expensive and less scalable.
- Port-Based Virtual Hosting: Different websites are associated with different ports on the same IP address. For example, one website might be accessible on port 80, while another is on port 8080. Port-based virtual hosting can be used when IP addresses are limited, but it’s not as common or user-friendly as name-based virtual hosting and might require users to specify the port number in the URL.
In essence, the Host header functions as a switch, enabling the web server to dynamically determine which website to serve based on the domain name requested by the browser.
vHost Fuzzing
vHost Fuzzing with ffuf
vHost Fuzzing with gobuster
There are a couple of things you need to prepare to brute force Host headers:
- Target Identification: First, identify the target web server's IP address. This can be done through DNS lookups or other reconnaissance techniques.
- Wordlist Preparation: Prepare a wordlist containing potential virtual host names. You can use a pre-compiled wordlist, such as SecLists, or create a custom one based on your target's industry, naming conventions, or other relevant information.
feroxbuster can also be used for vHost fuzzing.
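What these fuzzers do can be sketched in a few lines: send the same request to the server's IP address while changing only the Host header, and flag the answers that differ from a baseline. This is a simplified illustration with hypothetical function names, using plain HTTP on port 80 and comparing only status code and body length; real tools offer more filters.

```python
import http.client


def probe_vhost(ip, host_header, port=80, timeout=5.0):
    """Send GET / to `ip` with a custom Host header; return (status, body length)."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": host_header})
        response = conn.getresponse()
        return response.status, len(response.read())
    finally:
        conn.close()


def fuzz_vhosts(ip, domain, words, probe=probe_vhost):
    """Return (vhost, result) pairs whose response differs from a bogus-host baseline."""
    # A name that almost certainly does not exist shows the default site's response.
    baseline = probe(ip, f"nonexistent-baseline.{domain}")
    hits = []
    for word in words:
        candidate = f"{word.strip()}.{domain}"
        result = probe(ip, candidate)
        if result != baseline:
            hits.append((candidate, result))
    return hits
```

The baseline request plays the role of the size filter used with ffuf or gobuster: anything that looks like the default response is discarded.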
2.8. Certificate enumeration
SSL/TLS certificates are another potentially valuable source of information if HTTPS is in use (for instance, in gathering information to prepare a phishing attack).
sslyze and ssllabs
For this we can use:
- sslyze
- ssllabs by Qualys
- https://ciphersuite.info
nmap
Also, you can use a script for nmap:
virustotal
crt.sh with curl
crt.sh: it enables the verification of issued digital certificates for encrypted Internet connections. This is intended to enable the detection of false or maliciously issued certificates for a domain.
censys.io
https://censys.io: We can navigate to https://search.censys.io/certificates or https://crt.sh and enter the domain name of our target organization to start discovering new subdomains.
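crt.sh can also return its results as JSON (https://crt.sh/?q=<domain>&output=json), where each entry carries a name_value field with one or more names separated by newlines. The sketch below, with a hypothetical function name, takes that JSON text (however it was downloaded) and extracts the unique subdomains.

```python
import json


def subdomains_from_crtsh(json_text, domain):
    """Extract unique names under `domain` from crt.sh JSON output."""
    names = set()
    for entry in json.loads(json_text):
        # name_value may hold several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # wildcard certificate: keep the parent name
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```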
The Harvester
The Harvester: a simple-to-use yet powerful and effective tool for early-stage penetration testing and red team engagements. We can use it to gather information to help identify a company's attack surface. The tool collects emails, names, subdomains, IP addresses, and URLs from various public data sources for passive information gathering. It works through modules, one per data source.
Automate the modules we want to launch:
1. Create a list of sources, one per line, sources.txt.
2. Execute:
3. When the process finishes, extract all the subdomains found and sort them:
4. Merge all the passive reconnaissance files:
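Steps 3 and 4 boil down to extracting hostnames, removing duplicates and sorting. A minimal Python sketch of that merge is shown here; the function name is hypothetical, and it assumes each input line is either a hostname or a host:ip pair, which is why everything after the first colon is dropped.

```python
def merge_subdomain_lists(sources, domain):
    """Merge several iterables of lines into one sorted list of unique subdomains."""
    merged = set()
    for lines in sources:
        for line in lines:
            # Keep only the hostname part of "host:ip" entries.
            name = line.strip().lower().split(":", 1)[0]
            if name == domain or name.endswith("." + domain):
                merged.add(name)
    return sorted(merged)
```

Each source can be an open file, since files iterate line by line, so the outputs of the different passive tools can be passed in together.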
Shodan
Shodan: Once we see which hosts can be investigated further, we can generate a list of IP addresses with a minor adjustment to the cut command and run them through Shodan.
With this we'll get an IP list, that we can use to search for DNS records.
Last update: 2024-11-17 Created: May 22, 2023 16:32:18