Categorify: A CleanBrowsing Categorization Engine

CleanBrowsing is a DNS-based content filtering platform, and one of the most effective at filtering pornographic and obscene content. But how does CleanBrowsing filtering work?

Hello Categorify - A Domain Categorization Engine

At the core of our filtering technology, you will find Categorify.

 

Categorify is a free tool that anyone can use to check a domain's category. We built it as a way to be transparent with users about how we categorize a specific domain.

Here is an example of an output: [image: Categorify result for a domain]

How Categorify Works

Categorify is powered by a supervised machine learning platform that performs a real-time scan of a domain, categorizing it on the fly.

 

We won't get into the details of what the engine is looking for, but you can assume it's a combination of things like domain freshness, word density, and a slew of other indicators. In addition to this analysis, Categorify functions as a crawler and uses every submitted domain as a new starting point for its crawl.

In other words, the more it's used, the more accurate and the bigger it gets.

Lastly, Categorify functions as a data repository that feeds our filtering platform (both free and paid filters).

Verifying Categories on Filters

Because Categorify works in real time, there may be instances where a domain shows one category on Categorify and another in our filter.

This is to be expected. An update takes about 24 to 48 hours to take effect in our filtering platform.

If you're using our filters and want to know whether a porn site is actively being blocked, use dig. It would look something like this:

dig badexample.com @185.228.168.168
; <<>> DiG 9.16.1-Ubuntu <<>> badexample.com @185.228.168.168
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 47229
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;badexample.com. IN A
;; AUTHORITY SECTION:
badexample.com. 3600 IN SOA cleanbrowsing.rpz.noc.org. accesspolicy.rpz.noc.org. 1 7200 900 1209600 86400
;; Query time: 29 msec
;; SERVER: 185.228.168.168#53(185.228.168.168)
;; WHEN: Wed Nov 24 11:23:32 CST 2021
;; MSG SIZE rcvd: 117

 

You are looking for this response:

badexample.com. 3600 IN SOA cleanbrowsing.rpz.noc.org. accesspolicy.rpz.noc.org. 1 7200 900 1209600 86400

 

If it responds with an RPZ value, that means it is actively being blocked. If it shows something other than the RPZ value, the filter has not updated yet, and it's prudent to give it 24 to 48 hours to update.
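
The manual check above can be wrapped in a small helper. This is only a minimal sketch, not official tooling: the function names is_rpz_block and check_blocked are made up for illustration, and it assumes a block always shows up as an SOA record under rpz.noc.org in the authority section, as in the output above.

```shell
#!/bin/bash

# Hypothetical helper: reads dig output on stdin and succeeds (exit 0)
# when it contains an SOA record that points at rpz.noc.org.
is_rpz_block() {
        grep -Eq '[[:space:]]SOA[[:space:]].*rpz\.noc\.org\.'
}

# Hypothetical wrapper: queries the resolver from the example above
# (or the one passed as the second argument).
# "+noall +authority" makes dig print only the authority section.
check_blocked() {
        local domain="$1"
        local resolver="${2:-185.228.168.168}"
        if dig +noall +authority "$domain" "@$resolver" | is_rpz_block; then
                echo "$domain is blocked"
        else
                echo "$domain is not blocked (or not updated yet)"
        fi
}
```

After loading these functions, calling check_blocked badexample.com runs the same query as the dig command above and prints a one-line verdict instead of the full output.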

How to Improve Categories

While our focus has predominantly been on pornographic and obscene content, use of our filtering has grown rapidly over the past few years, and with that growth we've expanded to several categories.

One of the secrets to our filtering effectiveness is that we use the same approach we have always used with open-source technologies: crowdsourcing.

Anyone who uses Categorify can be part of the solution, and their contributions affect millions of free users around the world. Those contributions come in the form of recategorization submissions.

Every submission is reviewed by our team and engine, and follows the same update timing as the automated system.

P.S. - Want to check a bunch of domains?

We sometimes get massive lists from users. If that's you, here is a quick bash script that will help you parse domains.

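In this example, I parse out all the domains that are not categorized as Porn. Assuming the list contains only porn sites, these are the ones that are not categorized correctly: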

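#!/bin/bash
# Print every domain in the list that Categorify does not report as "Porn".

# Read the file one line (one domain) at a time.
while IFS= read -r domain; do
        # Skip empty lines.
        [ -z "$domain" ] && continue
        # Quote the URL so the shell does not treat the "?" as a wildcard.
        if ! curl -s "https://categorify.org/api?website=$domain" | grep -q "Porn"; then
                echo "$domain is not a porn site"
        fi
done < [name_of_your_file]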

All I'm doing is looping a file through the curl request. List your domains one per line and let it do its work. You can customize this script to search for any category as well.
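
To search for a category other than Porn, the category name can be passed in as an argument. This is a rough sketch under the same assumption as the script above, namely that the API response contains the category name as plain text. The function names has_category and list_missing are illustrative only.

```shell
#!/bin/bash

# Hypothetical helper: reads an API response on stdin and succeeds when it
# mentions the given category. -F matches the name literally, not as a regex.
has_category() {
        grep -qF -- "$1"
}

# Hypothetical wrapper: prints each domain in file $2 (one per line) whose
# response does not mention category $1.
list_missing() {
        local category="$1" file="$2" domain
        while IFS= read -r domain; do
                # Skip empty lines.
                [ -z "$domain" ] && continue
                if ! curl -s "https://categorify.org/api?website=$domain" | has_category "$category"; then
                        echo "$domain is not categorized as $category"
                fi
        done < "$file"
}
```

For example, list_missing "Porn" [name_of_your_file] gives the same kind of list as the script above, with the category spelled out in each line.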

 

Send us the parsed list and we'll work to get them submitted quickly. If you have any questions, send us an email at support@cleanbrowsing.org.