Public Data Sources

Leveraging public search engines and databases for passive reconnaissance.

Shodan

The search engine for Internet-connected devices.

Installation

# Install CLI
pip install shodan

# Set API key
shodan init YOUR_API_KEY

Basic Searches

# Search for domain
shodan search hostname:target.com

# Search for organization (single quotes keep the inner double quotes intact for Shodan)
shodan search 'org:"Target Company"'

# Search by IP
shodan host 8.8.8.8

# Search for specific port
shodan search port:23

# Search for product
shodan search product:"Apache"

# Count results without downloading
shodan count apache

Advanced Queries

# Multiple filters
shodan search "apache city:London country:GB"

# Exclude results (quote the query so the shell and CLI do not read -nginx as an option)
shodan search "apache -nginx"

# Specific HTTP title
shodan search http.title:"Dashboard"

# SSL certificate information
shodan search ssl:"target.com"

# Favicon hash (find similar sites)
shodan search http.favicon.hash:12345678

# Server headers
shodan search http.server:"nginx/1.10.0"

# Specific HTML content
shodan search http.html:"admin login"

# Specific product version (to match against known vulnerabilities)
shodan search 'product:"Apache Struts" version:"2.3.5"'

# Search by ASN
shodan search asn:AS15169

# Before/after date (dd/mm/yyyy)
shodan search apache before:01/01/2020

Common Shodan Queries

# Default credentials
shodan search "default password"
shodan search "admin:admin"

# Webcams
shodan search "webcamxp"
shodan search "Server: SQ-WEBCAM"

# Industrial control systems
shodan search "SCADA"
shodan search "port:502"  # Modbus
shodan search "port:47808"  # BACnet

# Databases
shodan search "MongoDB Server Information"
shodan search "product:MySQL"
shodan search "port:5432"  # PostgreSQL
shodan search "product:Redis"

# Network infrastructure
shodan search "cisco"
shodan search "mikrotik"
shodan search "juniper"

# Printers
shodan search "HP LaserJet"
shodan search "port:9100"

# VNC (often no password)
shodan search '"authentication disabled" port:5900'

# Remote Desktop
shodan search "port:3389 country:US"

# Open Docker APIs
shodan search "Docker port:2375"

# Elasticsearch
shodan search "port:9200 elastic"

# Jenkins
shodan search "X-Jenkins"

# Kubernetes (kubelet API)
shodan search "port:10250"

Shodan Filters

Filter        Description        Example
city          City name          city:"Los Angeles"
country       Country code       country:US
geo           Coordinates        geo:"34.0522,-118.2437"
hostname      Hostname           hostname:target.com
net           CIDR               net:192.168.1.0/24
os            Operating system   os:"Windows 10"
port          Port number        port:22
before/after  Date filter        before:01/01/2020
asn           ASN number         asn:AS15169
org           Organization       org:"Google"
isp           ISP name           isp:"Comcast"
product       Product name       product:"Apache"
version       Product version    version:"2.4.1"
ssl           SSL certificate    ssl:"target.com"
http.title    Page title         http.title:"Login"
http.status   Status code        http.status:200
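
The filters above can be combined programmatically before being passed to the CLI or the API. The following is a minimal sketch, not part of the Shodan library: `build_shodan_query` is a hypothetical helper, and mapping underscores in keyword names to dots (so `http_title` becomes `http.title`) is an assumption made for convenience that would not suit filters whose real names contain underscores.

```python
def build_shodan_query(*terms, **filters):
    """Combine free-text terms and filter:value pairs into one query string.

    Hypothetical helper for illustration; it only builds a string and does
    not contact Shodan.
    """
    parts = list(terms)
    for name, value in filters.items():
        # Keyword arguments cannot contain dots, so http_title -> http.title
        key = name.replace("_", ".")
        text = str(value)
        # Values containing whitespace must be wrapped in double quotes
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{key}:{text}")
    return " ".join(parts)


query = build_shodan_query("apache", city="Los Angeles", country="US", port=443)
print(query)  # apache city:"Los Angeles" country:US port:443
```

The resulting string can be handed to `api.search(query)` or to `shodan search` as a single quoted argument.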

Shodan API Usage

#!/usr/bin/env python3
import shodan

api = shodan.Shodan('YOUR_API_KEY')

# Search
results = api.search('apache')
print(f"Results: {results['total']}")
for result in results['matches']:
    print(f"IP: {result['ip_str']}")
    print(f"Port: {result['port']}")
    print(f"Banner: {result['data']}\n")

# Host information
host = api.host('8.8.8.8')
print(f"IP: {host['ip_str']}")
print(f"Org: {host.get('org', 'N/A')}")
print(f"OS: {host.get('os', 'N/A')}")
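
Raw banners become easier to triage once they are aggregated. The sketch below tallies ports and organizations across the `matches` list returned by a search. It assumes each match carries `port` and, optionally, `org` keys; `summarize_matches` is a hypothetical helper and the sample data is made up for illustration.

```python
from collections import Counter


def summarize_matches(results):
    """Count ports and organizations across a search result's matches."""
    ports = Counter()
    orgs = Counter()
    for match in results.get("matches", []):
        ports[match["port"]] += 1
        # Assumption: "org" may be missing or None for some banners
        orgs[match.get("org") or "N/A"] += 1
    return ports, orgs


# Illustrative data only; a real call would be: results = api.search('apache')
sample_results = {
    "total": 3,
    "matches": [
        {"ip_str": "192.0.2.10", "port": 80, "org": "Example Org", "data": ""},
        {"ip_str": "192.0.2.11", "port": 443, "org": "Example Org", "data": ""},
        {"ip_str": "192.0.2.12", "port": 80, "org": None, "data": ""},
    ],
}

ports, orgs = summarize_matches(sample_results)
for port, count in ports.most_common():
    print(f"port {port}: {count} host(s)")
for org, count in orgs.most_common():
    print(f"{org}: {count} host(s)")
```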

Censys

An alternative to Shodan with a focus on certificate data.

Installation

# Install CLI
pip install censys

# Configure API key
censys config

# Or set environment variables
export CENSYS_API_ID="your-api-id"
export CENSYS_API_SECRET="your-api-secret"

Basic Searches

# Search for hosts (index names vary by CLI version; confirm with: censys search --help)
censys search "target.com" --index-type hosts

# Certificate search
censys search "parsed.subject_dn:target.com" --index-type certs

# Specific port
censys search "services.port:22" --index-type hosts

# Specific service
censys search "services.service_name:HTTP" --index-type hosts

Advanced Queries

# Find specific software (index names vary by CLI version; confirm with: censys search --help)
censys search "services.software.vendor:Apache" --index-type hosts

# SSL certificates issued for domain
censys search "names:target.com" --index-type certs

# Certificates that expired before 2020-01-01
censys search "parsed.validity.end:[* TO 2020-01-01]" --index-type certs

# Self-signed certificates
censys search "tags:self-signed" --index-type certs

# Specific cipher suites
censys search "services.tls.cipher_suite.name:TLS_RSA_WITH_RC4_128_SHA" --index-type hosts

Web Interface Queries

Visit: https://search.censys.io/

# Search syntax (in web interface)
services.port:80
services.http.response.headers.server:"nginx/1.10.0"
autonomous_system.asn:15169
location.country_code:"US"
labels:"scada"

ZoomEye

A China-based alternative with good coverage of the APAC region.

Web Interface

Visit: https://www.zoomeye.org/

# Search examples
app:"Apache httpd"
country:"CN"
port:22
ip:"192.168.1.1"
hostname:"target.com"
service:"ssh"
os:"Windows"
cidr:"192.168.1.0/24"

Fofa

Another China-based search engine with an extensive database.

Web Interface

Visit: https://fofa.info/

# Search syntax
title="Login"
header="Server: nginx"
body="admin panel"
port="3389"
domain="target.com"
ip="192.168.1.1"

Certificate Transparency Logs

Find subdomains via SSL certificate logs.

crt.sh

Web interface: https://crt.sh

# Search via web: %.target.com

# API query (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Extract subdomains
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > subdomains.txt
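
The same cleanup can be done in Python once the JSON has been saved. This is a minimal sketch that assumes each record exposes a `name_value` field, as the `jq` filter above does, and that one field may hold several newline-separated names. `extract_subdomains` is a hypothetical helper and the sample records are made up.

```python
def extract_subdomains(records, domain):
    """Return sorted, de-duplicated hostnames under `domain` from crt.sh records."""
    names = set()
    for record in records:
        # One name_value may hold several names separated by newlines
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # Keep only the domain itself and true subdomains
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Illustrative records; real input would be json.load() of the saved crt.sh output
sample_records = [
    {"name_value": "www.target.com\n*.dev.target.com"},
    {"name_value": "TARGET.com"},
    {"name_value": "mail.target.com"},
    {"name_value": "nottarget.com"},
]

for host in extract_subdomains(sample_records, "target.com"):
    print(host)
```

Matching on `"." + domain` rather than on the bare domain keeps look-alike names such as `nottarget.com` out of the results.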

certspotter

# Install
go install github.com/SSLMate/certspotter/cmd/certspotter@latest

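# Search for domain (flags differ between certspotter releases; confirm with: certspotter -help)
certspotter -domain target.com

# Export to JSON
certspotter -domain target.com -json > certs.json

# Amass intel module: discover related root domains (-whois adds reverse WHOIS lookups)
amass intel -d target.com -whois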

Google Dorking

Advanced Google searches for reconnaissance.

Basic Syntax

site:target.com              # Restrict to domain
intitle:"index of"           # Text in page title
inurl:admin                  # URLs containing text
filetype:pdf                 # Specific file types
intext:"password"            # Text in page body
link:target.com              # Pages linking to site (operator retired by Google)
cache:target.com             # Cached version (operator retired by Google)
related:target.com           # Similar sites

Common Dorks

# Find subdomains
site:*.target.com

# Login pages
site:target.com inurl:login
site:target.com intitle:"login" | intitle:"signin"

# Admin panels
site:target.com inurl:admin
site:target.com intitle:"admin panel"

# Configuration files
site:target.com filetype:env
site:target.com filetype:config
site:target.com ext:xml | ext:conf | ext:cnf | ext:reg | ext:inf

# Database files
site:target.com ext:sql | ext:dbf | ext:mdb

# Backup files
site:target.com ext:bkf | ext:bkp | ext:bak | ext:old | ext:backup

# Log files
site:target.com ext:log

# Documents with sensitive info
site:target.com filetype:pdf | filetype:doc | filetype:xls "confidential"

# Directory listings
site:target.com intitle:"index of"

# Exposed git repos
site:target.com inurl:".git"

# Exposed env files
site:target.com inurl:.env

# PHP info pages
site:target.com ext:php intitle:phpinfo "published by the PHP Group"

# SQL errors
site:target.com "SQL syntax" | "mysql_fetch" | "mysqli"

# Server information
site:target.com ext:php | ext:asp intitle:"Error" | intitle:"Warning"

# Vulnerable software
site:target.com inurl:wp-content | inurl:wp-includes "WordPress"
site:target.com inurl:joomla

# Interesting subdomains
site:dev.target.com
site:stage.target.com
site:test.target.com
site:uat.target.com
site:staging.target.com
site:internal.target.com
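
Dorks like these follow a few repeatable patterns, so they can be generated per target. The sketch below is one possible approach: `build_dorks` and `google_search_url` are hypothetical helpers, and the extension and keyword lists are examples taken from the dorks above.

```python
from urllib.parse import quote_plus


def build_dorks(domain, extensions=(), url_keywords=()):
    """Build site-restricted dorks from file extensions and URL keywords."""
    dorks = []
    if extensions:
        # Join alternatives with the | (OR) operator used in the dorks above
        ext_clause = " | ".join(f"ext:{ext}" for ext in extensions)
        dorks.append(f"site:{domain} {ext_clause}")
    for keyword in url_keywords:
        dorks.append(f"site:{domain} inurl:{keyword}")
    return dorks


def google_search_url(dork):
    """URL-encode a dork so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


for dork in build_dorks("target.com", ["sql", "dbf", "mdb"], ["admin", "login"]):
    print(dork)
    print("  " + google_search_url(dork))
```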

Google Dork Tools

# GHDB - Google Hacking Database
# Browse: https://www.exploit-db.com/google-hacking-database

# pagodo - Automated Google dorking
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
python3 pagodo.py -d target.com -g dorks.txt

GitHub Reconnaissance

Search GitHub for sensitive data and information.

# Organization repositories
org:target-company

# User repositories
user:username

# Code search
"target.com" password
"target.com" api_key
"target.com" secret
"target.com" token
"target.com" AWS_ACCESS_KEY_ID

# File types
target.com filename:.env
target.com filename:id_rsa
target.com filename:.npmrc
target.com filename:.dockercfg
target.com filename:credentials

# Extensions
target.com extension:pem
target.com extension:key
target.com extension:ppk

# Specific keywords
target.com "BEGIN RSA PRIVATE KEY"
target.com "-----BEGIN PGP PRIVATE KEY BLOCK-----"
target.com ("api_key" OR "apikey" OR "api-key")
target.com ("client_secret" OR "client secret")
target.com "mysql" password
target.com "postgres" password

GitHub Dorking Tools

# GitRob - Find sensitive files
git clone https://github.com/michenriksen/gitrob.git

# TruffleHog - Find secrets in git history
git clone https://github.com/trufflesecurity/truffleHog.git
trufflehog git https://github.com/target/repo

# Gitleaks
gitleaks detect --source /path/to/repo

# git-secrets - Prevent committing secrets
git clone https://github.com/awslabs/git-secrets.git

Wayback Machine

Historical website data.

Web Interface

Visit: https://web.archive.org/

API Access

# Get all URLs for domain
curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/*&output=text&fl=original&collapse=urlkey" | sort -u

# Filter by status code (.[1:] skips the header row that output=json returns first)
curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/*&output=json&fl=original,statuscode&filter=statuscode:200" | jq -r '.[1:][] | .[0]'

# Find specific file types
curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/*.php&output=text&fl=original" | sort -u
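
With `output=json`, the CDX API returns a list of rows whose first row holds the field names. The following sketch turns a saved response into a list of dictionaries. `parse_cdx_json` is a hypothetical helper and the sample text is illustrative, not real archive data.

```python
import json


def parse_cdx_json(text):
    """Convert CDX output=json text into a list of dicts keyed by field name."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    # The first row is the header (the fields requested with fl=...)
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Illustrative response for fl=original,statuscode
sample_text = """[
  ["original", "statuscode"],
  ["http://target.com/", "200"],
  ["http://target.com/login.php", "200"]
]"""

for entry in parse_cdx_json(sample_text):
    print(entry["statuscode"], entry["original"])
```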

waybackurls

# Install
go install github.com/tomnomnom/waybackurls@latest

# Usage
echo "target.com" | waybackurls

# Filter for interesting files
echo "target.com" | waybackurls | grep -E "\.js$|\.php$|\.asp$"

# Find parameter names (every parameter in each URL, not only the first)
echo "target.com" | waybackurls | grep -oE '[?&][^=&#]+=' | tr -d '?&=' | sort -u
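
Parameter extraction is more robust with a real URL parser than with `cut`. The sketch below uses only the standard library and collects every parameter name from a list of URLs, such as the saved output of `waybackurls`. `parameter_names` is a hypothetical helper and the sample URLs are made up.

```python
from urllib.parse import parse_qsl, urlsplit


def parameter_names(urls):
    """Return the sorted set of query parameter names seen across `urls`."""
    names = set()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True keeps parameters such as "redirect=" and "debug"
        for name, _value in parse_qsl(query, keep_blank_values=True):
            names.add(name)
    return sorted(names)


# Illustrative URLs; real input could be the lines of a waybackurls output file
sample_urls = [
    "http://target.com/search?q=test&page=2",
    "http://target.com/item?id=5",
    "http://target.com/about",
    "http://target.com/login?redirect=&debug",
]

print(parameter_names(sample_urls))  # ['debug', 'id', 'page', 'q', 'redirect']
```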

DNS Databases

DNSDumpster

Web interface: https://dnsdumpster.com/

Provides:

  • Subdomain enumeration
  • DNS records
  • Network mapping
  • Related hosts

SecurityTrails

Web interface: https://securitytrails.com/

# API usage
curl -H "APIKEY: your-api-key" "https://api.securitytrails.com/v1/domain/target.com/subdomains"

# Historical DNS data
curl -H "APIKEY: your-api-key" "https://api.securitytrails.com/v1/history/target.com/dns/a"
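
The subdomains endpoint is assumed here to return bare labels (for example `www`) in a `subdomains` list rather than full hostnames. Treat that response shape as an assumption and check it against a real response. Under that assumption, a small hypothetical helper can expand the labels into fully qualified names:

```python
def expand_subdomains(payload, domain):
    """Join each returned label with the parent domain.

    Assumes the response looks like {"subdomains": ["www", "mail", ...]}.
    """
    labels = payload.get("subdomains", [])
    return sorted({f"{label}.{domain}".lower() for label in labels if label})


# Illustrative payload; real input would be the parsed JSON body of the curl call
sample_payload = {"subdomains": ["www", "mail", "VPN", "www"]}

for host in expand_subdomains(sample_payload, "target.com"):
    print(host)
```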

VirusTotal

# Subdomains via API
curl "https://www.virustotal.com/api/v3/domains/target.com/subdomains" \
  -H "x-apikey: YOUR_API_KEY"

# Passive DNS
curl "https://www.virustotal.com/api/v3/domains/target.com/resolutions" \
  -H "x-apikey: YOUR_API_KEY"

Passive DNS

RiskIQ / PassiveTotal

Web interface: https://community.riskiq.com/

AlienVault OTX

# Web interface: https://otx.alienvault.com/

# API access
curl "https://otx.alienvault.com/api/v1/indicators/domain/target.com/passive_dns"

WHOIS Data

# Basic WHOIS
whois target.com

# Historical WHOIS
# Use services like DomainTools, WhoisXMLAPI

# Reverse WHOIS (find domains by registrant)
# Use amass intel module
amass intel -whois -d target.com

Quick Reference

# Shodan - find exposed services
shodan search 'org:"Target Company"'
shodan search hostname:target.com port:80,443,8080

# Censys - certificate focus
censys search "target.com" --index-type certs

# Certificate transparency
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Google dorking
site:target.com filetype:pdf | filetype:doc "confidential"
site:target.com inurl:admin

# GitHub secrets
"target.com" filename:.env
"target.com" ("api_key" OR "apikey")

# Wayback Machine
echo "target.com" | waybackurls