How Categorify Works: Domain Classification Engine

Machine Learning-Powered Domain Categorization

Categorify is the engine behind CleanBrowsing's filtering decisions. It uses supervised machine learning to classify domains into 21+ content categories, ensuring that new and emerging websites are categorized quickly and accurately.


Step 1: What is Categorify?

Categorify is CleanBrowsing's domain categorization engine -- the technology that determines what category a website belongs to. Whether a domain hosts adult content, gambling, malware, social media, news, or educational material, Categorify assigns one or more categories that CleanBrowsing's filtering system uses to make allow or block decisions.

Categorify powers all of CleanBrowsing's filtering decisions, from the free community filters to enterprise paid plans. When you configure CleanBrowsing to block "Adult Content" or "Gambling," it is Categorify's categorization database that determines which domains fall into those categories.
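To make that relationship concrete, here is a minimal Python sketch of how a filter profile could turn category labels into an allow or block decision. The CATEGORY_DB dictionary, the example domains, and the decide() function are hypothetical stand-ins for illustration, not CleanBrowsing's actual data or API. The sketch also checks a custom allowlist first, mirroring the allowlist override available to paid users.

```python
# Hypothetical sketch: CATEGORY_DB and decide() are illustrative
# stand-ins, not CleanBrowsing's actual data or API.

# Toy categorization database: domain -> assigned categories.
CATEGORY_DB: dict[str, set[str]] = {
    "casino.example": {"Gambling"},
    "news.example": {"News"},
    "mixed.example": {"Adult Content", "Social Media"},
}


def decide(
    domain: str,
    blocked_categories: set[str],
    allowlist: frozenset[str] = frozenset(),
) -> str:
    """Return "allow" or "block" for one domain under one filter profile."""
    # Normalize case and any trailing root dot ("Example.COM." -> "example.com").
    name = domain.lower().rstrip(".")
    # A custom allowlist entry wins over any category match.
    if name in allowlist:
        return "allow"
    # Uncategorized domains get an empty set, so this sketch allows them.
    categories = CATEGORY_DB.get(name, set())
    # Block when any assigned category is one the profile blocks.
    return "block" if categories & blocked_categories else "allow"


if __name__ == "__main__":
    profile = {"Adult Content", "Gambling"}
    for d in ["casino.example", "news.example", "mixed.example", "unknown.example"]:
        print(d, decide(d, profile))
```

Because a domain can carry several categories, a single match against the profile is enough to block it. Allowing uncategorized domains is a choice made for this sketch only; a real deployment might handle unknown domains differently.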

The engine uses supervised machine learning to analyze domains and assign categories automatically. This is fundamentally different from purely manual blocklists, which can only grow as fast as human reviewers can process domains. With machine learning, Categorify can evaluate thousands of new domains per hour, ensuring that newly registered websites are categorized quickly -- often within minutes of first being seen in DNS traffic.


Why Automated Categorization Matters

The internet is constantly growing. Thousands of new domains are registered every day, and many of them are used for malicious purposes like phishing, malware distribution, or hosting explicit content. A categorization system that relies solely on manual review cannot keep pace with this volume. Categorify solves this by using machine learning to handle the scale, with human review as a quality assurance layer for edge cases and recategorization requests.

Step 2: How Categorization Works

When a new or uncategorized domain is encountered, Categorify analyzes multiple indicators to determine the most appropriate category. This multi-signal approach ensures high accuracy even for domains that have little history or content.


Signals Categorify Analyzes

  • Domain freshness: How recently the domain was registered. Newly registered domains are statistically more likely to be associated with malicious activity (phishing, spam, malware). Domain age is a strong early indicator used before content analysis is complete.
  • Word density and content patterns: Categorify analyzes the text content of websites, looking for patterns that indicate specific categories. Adult content, gambling, and other categories have distinct linguistic patterns that machine learning models can identify reliably.
  • Similarity to known categorized domains: Domains that share hosting infrastructure, registration patterns, or content structure with already-categorized domains are likely in the same category. This technique is particularly effective at catching mirror sites and domain rotation schemes.
  • Threat intelligence feeds: Commercial and open-source feeds that track malware, phishing, botnet infrastructure, and other threats are integrated in real time. These feeds provide high-confidence categorization for known malicious domains.
  • Web crawling results: Categorify operates as a web crawler that uses newly submitted domains as starting points, expanding its coverage as it discovers linked domains and related content. This ensures that associated resources and subdomains are also categorized.
  • Historical traffic patterns: How domains are queried over time provides additional signal. Domains that suddenly receive large volumes of traffic from known malware-infected networks, for example, are flagged for review.
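The following Python sketch shows, in a deliberately simplified way, how several of these signals could be computed and blended into per-label scores. The keyword list, the weights, the 30-day freshness cutoff, and every function name are assumptions made for illustration. Categorify itself relies on supervised machine learning models rather than a hand-weighted formula like this one.

```python
from datetime import date

# Illustrative assumptions, not Categorify's real parameters.
GAMBLING_TERMS = {"casino", "poker", "jackpot", "slots", "betting"}
NEW_DOMAIN_DAYS = 30  # "newly registered" cutoff


def domain_age_days(registered: date, today: date) -> int:
    """Domain freshness: whole days since registration."""
    return (today - registered).days


def keyword_density(text: str, terms: set[str]) -> float:
    """Word density: share of words on the page that are category terms."""
    words = [w.strip(".,:;!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in terms for w in words) / len(words)


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity: overlap of infrastructure features such as IPs or nameservers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_domain(
    domain: str,
    page_text: str,
    registered: date,
    today: date,
    infra: set[str],
    known_gambling_infra: set[str],
    threat_feed: set[str],
) -> dict[str, float]:
    """Blend the signals into per-label scores between 0 and 1."""
    scores: dict[str, float] = {}
    # Threat intelligence: a feed match is taken as a high-confidence label.
    if domain in threat_feed:
        scores["Malware"] = 1.0
    # Content plus similarity, with arbitrary weights (a trained model would learn these).
    density = min(1.0, 5 * keyword_density(page_text, GAMBLING_TERMS))
    scores["Gambling"] = 0.6 * density + 0.4 * jaccard(infra, known_gambling_infra)
    # Freshness: in this sketch a young domain is flagged for review, not blocked.
    young = domain_age_days(registered, today) < NEW_DOMAIN_DAYS
    scores["needs_review"] = 1.0 if young else 0.0
    return scores
```

For instance, a page full of casino and poker terms hosted on nameservers shared with known gambling sites scores close to 1.0 for Gambling, while a domain listed in a threat feed picks up a Malware label regardless of its content. A trained classifier would learn how much each signal matters from labeled examples instead of hard-coding the weights.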


Accuracy improves continuously as more users interact with the system. Each DNS query provides a signal about real-world domain usage, and recategorization submissions from users help correct edge cases. The system learns from every interaction, making it progressively better at classifying new domains accurately on the first pass.

Step 3: Recategorization and Community Input

No automated system is perfect. There will always be edge cases where a domain is miscategorized -- either incorrectly blocked (false positive) or incorrectly allowed (false negative). CleanBrowsing addresses this through a community-driven recategorization process.

Any user can flag a domain they believe is miscategorized. This can be done through the CleanBrowsing dashboard (for paid users) or through the free Categorify tool at categorify.org. All submissions are reviewed by the CleanBrowsing team before updates are applied. This human review layer ensures that recategorization requests are legitimate and that the corrections are accurate.
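A minimal Python model of that review flow, assuming a simple in-memory category store, shows the key property: a submission changes nothing until a reviewer approves it. The Submission and RecategorizationQueue names are hypothetical and do not describe CleanBrowsing's internal tooling.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One user request to change a domain's categories."""
    domain: str
    proposed: set[str]
    status: str = "pending"  # becomes "approved" or "rejected"


class RecategorizationQueue:
    """Hypothetical review flow: the category store changes only on approval."""

    def __init__(self, categories: dict[str, set[str]]) -> None:
        self.categories = categories  # shared domain -> categories mapping
        self.submissions: list[Submission] = []

    def submit(self, domain: str, proposed: set[str]) -> Submission:
        # Submitting only records the request; nothing is applied yet.
        sub = Submission(domain.lower(), set(proposed))
        self.submissions.append(sub)
        return sub

    def review(self, sub: Submission, approve: bool) -> None:
        if sub.status != "pending":
            raise ValueError(f"{sub.domain}: submission already {sub.status}")
        sub.status = "approved" if approve else "rejected"
        # Only an approved submission updates the live category store.
        if approve:
            self.categories[sub.domain] = set(sub.proposed)
```

Keeping submissions separate from the live category data is what lets a human reviewer reject bogus requests without any effect on filtering.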


Recategorization Timeline

Changes typically take 24-48 hours to propagate through the filtering system. This delay is primarily due to DNS caching -- even after a domain's categorization is updated in our database, DNS resolvers and client devices may cache the previous response until the TTL (Time to Live) expires.
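To see why a corrected category can take time to show up, consider a minimal Python sketch of a caching resolver. It assumes a simple in-memory cache keyed by domain name and an injectable clock so the example runs instantly. The class and values are illustrative, not how any specific resolver is implemented.

```python
import time
from typing import Callable


class TTLCache:
    """Minimal resolver-style cache: reuse an answer until its TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        # name -> (answer, absolute expiry time)
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, name: str, upstream: Callable[[str], str], ttl: float) -> str:
        now = self.clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # served from cache, even if upstream has changed
        answer = upstream(name)  # cache miss or expired: ask upstream again
        self._entries[name] = (answer, now + ttl)
        return answer


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(clock=lambda: fake_now[0])
    # Hypothetical upstream answers before and after a recategorization.
    upstream = {"shop.example": "block-page-ip"}
    print(cache.get("shop.example", upstream.__getitem__, ttl=3600))  # block-page-ip
    upstream["shop.example"] = "real-ip"  # category fixed upstream
    fake_now[0] = 1800.0
    print(cache.get("shop.example", upstream.__getitem__, ttl=3600))  # block-page-ip
    fake_now[0] = 3601.0
    print(cache.get("shop.example", upstream.__getitem__, ttl=3600))  # real-ip
```

Until the cached entry expires, the resolver keeps returning the old answer even though the upstream decision has already changed. Every cache between a device and our servers behaves this way, so the delays stack up.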

For urgent recategorization needs (e.g., a critical business domain being incorrectly blocked), contact our support team at support@cleanbrowsing.org for expedited review. In the meantime, paid users can immediately add the domain to their custom allowlist to restore access while the recategorization is processed.


Checking Domain Categorization

You can check how any domain is currently categorized using the free Categorify tool at categorify.org. Simply enter a domain name and the tool will show its current categories, when it was last analyzed, and provide an option to submit a recategorization request if you believe the classification is incorrect.

Step 4: Verifying Domain Blocking

To verify whether a domain is being blocked by CleanBrowsing, you can use the dig command to query CleanBrowsing's DNS directly. This is useful for troubleshooting false positives, confirming that a newly categorized domain is being filtered correctly, or verifying that your configuration is working.


Using dig to Test

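Open a terminal or command prompt and run the following, which queries CleanBrowsing's Family Filter resolver (185.228.168.168) directly. On Windows, where dig is not installed by default, nslookup example.com 185.228.168.168 performs an equivalent lookup: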

dig @185.228.168.168 example.com


If the domain is blocked, CleanBrowsing's RPZ (Response Policy Zone) policy rewrites the answer, so the ANSWER section points to CleanBrowsing's block page IP instead of the domain's actual hosting IP.

If the domain is allowed, you will see the normal DNS response with the domain's actual IP address.
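If you need to check several domains at once, a small Python wrapper around dig can collect the answers for you. This sketch assumes dig is installed and on your PATH, and it uses dig's +short option so only the answer records are printed. BLOCK_PAGE_IPS is a hypothetical placeholder: this guide does not list CleanBrowsing's block page addresses, so fill it in from a dig result for a domain you know is blocked.

```python
import ipaddress
import subprocess

# Hypothetical placeholder: this guide does not list the block page
# address(es), so copy them from a dig result for a known-blocked domain.
BLOCK_PAGE_IPS: set[str] = set()
RESOLVER = "185.228.168.168"  # the resolver used in the dig example above


def parse_short(output: str) -> list[str]:
    """Keep only IP addresses from `dig +short` output."""
    addrs = []
    for line in output.splitlines():
        line = line.strip()
        try:
            ipaddress.ip_address(line)
        except ValueError:
            continue  # skip CNAME targets, ";;" error lines, blank lines
        addrs.append(line)
    return addrs


def lookup(domain: str, resolver: str = RESOLVER) -> list[str]:
    """Query one resolver directly (requires dig on PATH)."""
    result = subprocess.run(
        ["dig", "+short", f"@{resolver}", domain],
        capture_output=True,
        text=True,
        timeout=15,
    )
    return parse_short(result.stdout)


def status(addrs: list[str], block_ips: set[str]) -> str:
    """Classify an answer as blocked, allowed, or empty."""
    if not addrs:
        return "no answer"
    return "blocked" if any(a in block_ips for a in addrs) else "allowed"
```

For example, status(lookup("example.com"), BLOCK_PAGE_IPS) returns "blocked", "allowed", or "no answer" (the last also covers timeouts, since dig's error lines are filtered out). Until BLOCK_PAGE_IPS is filled in, every answered query reports "allowed".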


Troubleshooting Common Issues

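  • Domain should be blocked but isn't: Check that your device is actually using CleanBrowsing DNS (not your ISP's resolver or a browser's built-in DoH). Run dig @185.228.168.168 example.com to test directly against our servers, bypassing any local caching. If dig returns the block page IP but your browser still loads the site, your device or browser is resolving through a different DNS server.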
  • Domain is blocked but shouldn't be: Check the domain's categorization at categorify.org and submit a recategorization request if it is incorrect. Paid users can also immediately add the domain to their custom allowlist.
  • Categorization was updated but the domain is still blocked/allowed: DNS caching may be serving stale results. Clear your local DNS cache (on Windows, ipconfig /flushdns; on macOS, sudo dscacheutil -flushcache followed by sudo killall -HUP mDNSResponder) and try again. Full propagation can take up to 48 hours due to caching at various levels.
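The cache-flush step varies by operating system. Here is a small Python sketch that picks the command for the current platform and, by default, only prints it (a dry run). The Windows and macOS entries follow the commands above; on macOS, Apple pairs dscacheutil with killall -HUP mDNSResponder, so both are included. The Linux entry is an assumption that your system uses systemd-resolved; other Linux setups cache DNS differently or not at all.

```python
import subprocess
import sys

# Windows/macOS entries follow the troubleshooting list; the Linux entry
# ASSUMES systemd-resolved is the local DNS cache.
FLUSH_COMMANDS: dict[str, list[list[str]]] = {
    "win32": [["ipconfig", "/flushdns"]],
    "darwin": [
        ["sudo", "dscacheutil", "-flushcache"],
        ["sudo", "killall", "-HUP", "mDNSResponder"],
    ],
    "linux": [["resolvectl", "flush-caches"]],
}


def flush_commands(platform: str = sys.platform) -> list[list[str]]:
    """Return the cache-flush command(s) for a platform, or raise if unknown."""
    for prefix, cmds in FLUSH_COMMANDS.items():
        if platform.startswith(prefix):
            return cmds
    raise ValueError(f"no known DNS cache flush command for {platform!r}")


def flush_dns(dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in flush_commands():
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Defaulting to a dry run keeps the script from invoking sudo unexpectedly; call flush_dns(dry_run=False) once the printed commands look right for your machine.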


For more advanced troubleshooting, see our DNS troubleshooting guide, which covers additional diagnostic commands and common configuration issues.

Powered by machine learning. Trusted by millions.
