Filtering Policy

Introduction

General Principles

First it must be noted that this policy document is an overview of E2BN policy on filtering internet access and the Protex system. The systems put in place by E2BN and, in particular, the architecture of the filtering solution, allow member Authorities and schools with their own local Protex system to vary the policy that is decided regionally. The policy outlined here does not negate the need for LAs and schools to think through their own policies in this area.

Filtering

Some would argue that there should be no filtering of internet content and that exposure to, and education about, the dangers of the internet must be the way forward. The main argument put forward is that, as pupils can access the whole gamut of material on their home computers, filtering internet access at school provides a false sense of security. We reject this argument on two grounds: (1) the case for stocking pornographic or explicitly violent magazines in a school library has never, to our knowledge, been seriously proposed, and we view internet access in the same way; (2) there are good legal and educational reasons for offering a safe environment in which pupils can explore the vast body of information on the internet, and for preventing access (whether on purpose or accidentally) to unsuitable material.

URL Filtering

E2BN subscribe to a number of commercial lists of URLs. These are regularly updated and collated into the lists used within the Protex system which are then distributed to the LA and local Protex systems overnight.

Clearly even the best URL system has its limitations: unsuitable sites are springing up all the time, so it will never be 100% perfect. In fact it is even more difficult to develop a "perfect" list as not everyone will agree on what a bad site is. One person's acceptable site is another's porn, and in education the definition of "inappropriate" will change depending upon the age of the user.

The URL list is divided into categories which E2BN uses to refine the filter profiles. So, for example, while porn is blocked in all profiles (including Staff), adult is blocked in the student profiles but allowed for staff.

The Protex system offers a way to adjust these commercial URL lists. Each Protex server maintains a set of local lists which are added to the standard distributed ones. At its simplest this allows an LA or school with its own Protex server to add URLs to, or remove URLs from, the lists provided by E2BN.

The typical terms used in URL-based web filtering are "Whitelists" and "Blacklists". Because DansGuardian has (a) the concept of the "greylist", which can be confusing, and (b) the ability to block downloaded files on the basis of their file extension, we have changed the terms we use when talking about filtering to the more self-explanatory Trusted, Blocked, and ContentChecked.

Trusted sites ("whitelisting")

Making a site Trusted has two effects.

Firstly: if the URL would normally have been blocked then this will be overridden and the page will be returned to the client browser. Any subsequent update to the lists will have no effect on a trusted site (for example, a trusted site which at some later date is added to the main blacklists will still not be blocked - the trusted listing overrides the blacklist).

Making a site trusted is not to be done lightly as you must really trust the site never to contain or distribute inappropriate material. Parts of sites may be trusted and others not, as long as they can be distinguished by URL - for example, while mydomain.com may not be trusted you may feel that mydomain.com/education can be.

The second effect is that by making a domain or site trusted you are explicitly trusting any downloadable files it contains as no extension blocking takes place.

E2BN has made the decision to trust certain categories of site (.gov.uk and .sch.uk, for example) and some specific sites (e2bn.org is one such, others being various education sites, manufacturer sites, etc.). These can easily be added to at a regional or LA level.

Blocked sites ("blacklisting")

Adding a domain or site to the blocked list bans the site from being viewed. This is simple and straightforward. The only thing to remember is that this, like the trusted list, will override the main URL lists. So if, for example, a domain or site is removed from the main blacklists, for whatever reason, then it will still remain blocked locally.

ContentCheck sites ("greylists")

This is the interesting one - and where some confusion sets in! Actually the idea is quite simple. If a site would normally be blocked by the distributed blacklist but you want your clients to be able to access some or all of the site, you have two options: (1) add it to the trusted list or (2) add it to the ContentCheck lists. So, what is the difference? I have been at pains in the section on trusted sites to emphasise the level of trust you are giving if you do this. You may feel that this is too much trust to give to a particular site but do not want to ban it completely.

Enter ContentCheck. Domains or sites added here override the URL lists (there would be no point in doing this if the site was not URL blocked!) but the returned pages' content is scanned and either blocked or allowed depending on the user's filter profile (see below).

Also, files downloaded from ContentCheck sites will be subject to the extension blocking rules in the profile (for example, student users will not be able to download .zip files from ContentCheck sites but staff users can, assuming they are using the Staff profile).
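The way the three local lists interact with the distributed blacklist can be sketched in a few lines of Python. This is only an illustration of the policy described above, not the Protex or DansGuardian implementation: the function names, the matching rules, the order in which the local lists are consulted and the "unlisted" result are all assumptions, and example.org is a made-up site.

```python
from urllib.parse import urlsplit


def matches(url: str, entry: str) -> bool:
    """Return True if url falls under a list entry.

    Entries look like 'mydomain.com', '.sch.uk' or 'mydomain.com/education'.
    (The matching rules here are an assumption made for this sketch.)
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    entry_host, _, entry_path = entry.partition("/")
    entry_host = entry_host.lower().lstrip(".")
    host_ok = host == entry_host or host.endswith("." + entry_host)
    entry_path = entry_path.strip("/")
    url_path = parts.path.strip("/")
    path_ok = (not entry_path or url_path == entry_path
               or url_path.startswith(entry_path + "/"))
    return host_ok and path_ok


def classify(url, trusted, blocked, content_check, main_blacklist):
    """Local lists are consulted first, so they override the distributed list."""
    if any(matches(url, e) for e in trusted):
        return "trusted"        # no content scan, no extension blocking
    if any(matches(url, e) for e in blocked):
        return "blocked"        # stays blocked whatever the main list says
    if any(matches(url, e) for e in content_check):
        return "content-check"  # URL block lifted, page content still scanned
    if any(matches(url, e) for e in main_blacklist):
        return "blocked"
    return "unlisted"           # hypothetical label: no list mentions the URL


main = {"mydomain.com", "example.org"}
local = dict(trusted={"mydomain.com/education"}, blocked=set(),
             content_check={"example.org"})
print(classify("http://mydomain.com/education/maths", main_blacklist=main, **local))
print(classify("http://mydomain.com/games", main_blacklist=main, **local))
```

Here the trusted entry covers only the /education part of mydomain.com, so the rest of the domain is still caught by the distributed blacklist.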

Content Filtering

How is the returned page's content checked? This is where the Phraselists come in: the HTML of the page is scanned for various phrases and patterns. There are two types of phrase list: those whose entries are either banned or exception, and those whose entries are weighted. If a word or phrase in the web page matches any item in a banned list then the page is blocked. The items in the weighted lists (which are also categorized) all have a numerical value: the values of the items found on the page are totalled to give the page a score, which is used to rate the page. Each profile (see Profiles) has a variable called the naughtinesslimit (not our name!) against which this score is compared and which can be changed to reflect the age group.
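A minimal sketch of this scoring in Python follows. It makes several assumptions that the text above does not confirm: each weighted phrase is counted once per page, matching is a simple case-insensitive substring test, a page is blocked when its total exceeds the limit, and exception phrases are left out altogether. The phrases, weights and limits are invented for the example.

```python
def rate_page(text, banned, weighted, limit):
    """Return (blocked, score) for a page's text.

    banned   -- phrases that block the page outright
    weighted -- mapping of phrase -> numerical weight
    limit    -- the profile's naughtinesslimit (example values only)
    """
    lowered = text.lower()
    if any(phrase in lowered for phrase in banned):
        return True, None  # a banned phrase blocks regardless of score
    # Assumption: each phrase found contributes its weight once.
    score = sum(w for phrase, w in weighted.items() if phrase in lowered)
    return score > limit, score


weighted = {"casino": 40, "poker": 30}
page = "Play poker at our casino tonight"
print(rate_page(page, banned=set(), weighted=weighted, limit=50))
print(rate_page(page, banned=set(), weighted=weighted, limit=160))
```

With these made-up numbers the same page is blocked under the lower limit and allowed under the higher one, which is how a single set of phrase lists can serve profiles for different age groups.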

Google

Searches in Google utilise various other features of Protex which are beyond the scope of this document but include the ability to scan the submitted search and block unsuitable search terms.

Google & other image searches

Google image searching is a very powerful tool but some schools have had to ban it because of the nature of some of the thumbnail images displayed to the unwary. E2BN have addressed this problem in two ways. Firstly, the term &safe=vss is added to all image searches. This forces the search into very safe search mode which is, from our experience, quite an effective block to inappropriate images. If a user sees this term in the browser's address bar and deletes it then it is replaced by Protex before the query is sent to Google for processing.
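The rewriting step can be pictured with a short Python sketch using only the standard library. It is not how Protex does it internally; the function name and the example search URL are assumptions, and only the safe=vss term comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_safe_search(url: str) -> str:
    """Return url with any existing 'safe' parameter replaced by safe=vss."""
    parts = urlsplit(url)
    # Drop whatever 'safe' value the user supplied (or removed), keep the rest.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "safe"]
    query.append(("safe", "vss"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical image-search URLs, for illustration only.
print(force_safe_search("http://images.google.com/images?q=cats"))
print(force_safe_search("http://images.google.com/images?q=cats&safe=off"))
```

Because the parameter is rebuilt on every request, deleting or altering it in the address bar makes no difference to the query that is actually sent.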

However, even Google's very safe search is not foolproof. So, in addition our system tests the URL of the originating site of each image returned. If Protex finds that it is a site which would be blocked to this user then the returned image is replaced with a blank one. Clearly, if the user clicks on the blank to go to the site it is blocked by the URL filter. In trials this has proved one of the most popular aspects of the system: teachers of all age groups can now allow image searching safe in the knowledge that the pupils will not access inappropriate material either by accident or design.

This technique can only be applied to Google's image search as they include the originating URL in the results page: others (yahoo, for example) do not. In these cases all thumbnails are replaced by a blank image.
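The two cases (originating URL known, or not) can be sketched as follows. This is an illustration only: the result format, the blank image name and the is_blocked check are assumptions standing in for whatever Protex actually uses.

```python
from typing import Callable, Iterable, List, Optional, Tuple

BLANK_IMAGE = "blank.gif"  # hypothetical placeholder image


def filter_thumbnails(
    results: Iterable[Tuple[str, Optional[str]]],
    is_blocked: Callable[[str], bool],
) -> List[str]:
    """Replace thumbnails whose originating site is blocked or unknown.

    results    -- (thumbnail_url, originating_url or None) pairs
    is_blocked -- the URL filter decision for the current user's profile
    """
    shown = []
    for thumbnail, origin in results:
        if origin is None or is_blocked(origin):
            shown.append(BLANK_IMAGE)  # no origin to test, or origin blocked
        else:
            shown.append(thumbnail)
    return shown


results = [
    ("thumb1.jpg", "http://allowed.example/page"),
    ("thumb2.jpg", "http://blocked.example/page"),
    ("thumb3.jpg", None),  # search engine gave no originating URL
]
print(filter_thumbnails(results, lambda url: "blocked.example" in url))
```

A search engine that never supplies the originating URL therefore ends up with every thumbnail blanked, which matches the behaviour described for non-Google image searches.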

Webmail

Webmail falls into two categories: (1) a web based front end to an e-mail system controlled by the school, the LEA/LA, or E2BN and (2) a publicly available webmail system the best known of which are hotmail and yahoomail but there are others.

Our policy on these is actually very simple although the effects of it may be contentious to some users.

(1) Access to these types of systems is trusted. Currently E2BN offer two web based mail systems which have controlled access (i.e. we know who all the users are and can access and track their email usage), one from Digitabrain and one from Netmedia. These are both trusted: the E2BN Protex filtering system will not check the pages generated by these systems in any way and the majority of extensions for attachments can be downloaded. E2BN can offer ALL staff in the region free accounts on either of these systems.

Assuming an LEA's webmail system is using a .gov.uk or .sch.uk domain name then it is also trusted by virtue of this domain. If another domain is being used it can also be added to the trusted lists.

(2) Public webmail systems (hotmail, yahoo) are treated in exactly the same way as other websites.

[For much more detail...]

Profiles

Regional Profiles

Version 1 of the E2BN Protex system has four profiles for use in schools: PRIMARY (8081); MIDDLE (8082); SECONDARY (8083) and STAFF (8084). Version 1 school based systems come with two profiles installed: the relevant school profile on port 8080 and the STAFF profile on port 8084. Two further profiles are used in Libraries across the region.
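As a small illustration, the version 1 regional profiles can be written down as a lookup table. The sketch assumes that the number after each profile is the proxy port a client is pointed at to receive that profile; the host name is invented.

```python
# Regional version 1 profiles and their ports, as listed above.
V1_PROFILE_PORTS = {
    "PRIMARY": 8081,
    "MIDDLE": 8082,
    "SECONDARY": 8083,
    "STAFF": 8084,
}


def proxy_address(profile: str, host: str = "protex.example.sch.uk") -> str:
    """Build the proxy setting for a profile (host name is hypothetical)."""
    return f"{host}:{V1_PROFILE_PORTS[profile]}"


print(proxy_address("PRIMARY"))
print(proxy_address("STAFF"))
```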

The version of Protex now being installed (version 2) has 13 standard profiles, including one for Sixth Form, and Walled Garden and Games versions of the three student profiles. Each version 2 system can be configured to use selected profiles on different ports, locations, and different NTLM user groups (if using NTLM authentication).

The most obvious difference between the profiles is the naughtinesslimit - this is set very low for the PRIMARY profile giving the most restrictive setting and increases on each of the other profiles as the age of the audience increases.

The second important difference is that the student profiles are much more restrictive on the types of file that can be downloaded. This is described in more detail under File Extensions (.zip, .doc, .xls, etc.).

Finally the categories (see Protex Categories) of URLs are slightly different between profiles. For example, while the category porn is blocked for all profiles, URLs in the category adult are allowed for staff but blocked to all students.
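The category differences can be pictured as a set of blocked categories per profile. In this sketch only the porn and adult categories and the four version 1 profile names come from this document; the data structure and function name are assumptions, and a real profile blocks many more categories.

```python
# Blocked URL categories per profile (only the two categories named above).
BLOCKED_CATEGORIES = {
    "PRIMARY": {"porn", "adult"},
    "MIDDLE": {"porn", "adult"},
    "SECONDARY": {"porn", "adult"},
    "STAFF": {"porn"},
}


def is_category_blocked(profile: str, category: str) -> bool:
    """True if URLs in this category are blocked for the given profile."""
    return category in BLOCKED_CATEGORIES[profile]


for profile in ("PRIMARY", "STAFF"):
    print(profile, is_category_blocked(profile, "porn"),
          is_category_blocked(profile, "adult"))
```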

File Extensions (.zip, .doc, .xls, etc.)

Protex enables the downloading of files from websites to be controlled via their file extension or MIME type. The default files are very restrictive: the E2BN Protex implementation has relaxed these rules considerably. In particular we do not restrict any of the well known media types (mp3; mp4; mpeg; avi; rm; etc) on any profile. On the student profiles we do block the download of several file & MIME types from sites which are not trusted: in particular .doc & .zip files are blocked. A full list of types taken from a student profile can be found here: banned extension list & banned mime types. Note that lines in those lists beginning with the # character are not blocked. In the staff profile very few extensions are blocked.

E2BN had originally blocked the .exe extension but this is now allowed as some sites require .exe files to be downloaded to provide full functionality. We regard this as a security hole and would prefer these files to be blocked, so LAs and schools must make sure they have other systems in place to prevent viruses and other malware being downloaded and installed by students.

Of the items in these lists please remember that blocking will only apply to sites which are not trusted: if it is important that students are able to download files with these extensions then they will be able to do so if the site is added to the trusted list. If a site is not trustworthy enough (see #Trusted sites ("whitelisting")) to be added to the trusted list then as a matter of both network security and child protection pupils should not be given free rein to download files from it.
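The rule that extension blocking applies only to sites which are not trusted can be sketched like this. It is an illustration, not the Protex code: the function name and profile keys are assumptions, the student set holds only the two extensions named in this document, and the staff set is left empty as a placeholder because the few extensions blocked for staff are not listed here.

```python
import posixpath
from urllib.parse import urlsplit

BANNED_EXTENSIONS = {
    "STUDENT": {".doc", ".zip"},  # the two examples given in this document
    "STAFF": set(),               # placeholder: actual staff list not shown
}


def download_allowed(url: str, profile: str, site_is_trusted: bool) -> bool:
    """Decide whether a file download passes the extension rules."""
    if site_is_trusted:
        return True  # trusted sites skip extension blocking entirely
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension not in BANNED_EXTENSIONS[profile]


url = "http://example.org/files/notes.zip"  # hypothetical download
print(download_allowed(url, "STUDENT", site_is_trusted=False))
print(download_allowed(url, "STUDENT", site_is_trusted=True))
print(download_allowed(url, "STAFF", site_is_trusted=False))
```

The same .zip file is refused to a student on an untrusted site, allowed once the site is trusted, and allowed to anyone on the staff profile.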

E2BN has, as has been stated above, already added a variety of sites to the trusted list - both specific sites (bbc.co.uk; sophos.com; e2bn.net; etc.) and generically (.sch.uk; .gov.uk) - which permit all file downloads.

It is worth repeating here that zip files can be downloaded from any site via the STAFF profile and it is therefore very important that school systems managers make provision for staff (both teaching and technical) to have access to the STAFF profile.

Online games, music downloads and other interesting stuff

Another of the benefits of Protex is the ability to ban online games. E2BN had discussions with schools about this decision and it was universally agreed within the test schools that online games should generally be barred to pupils as they are not considered to be an appropriate activity. Some of the games, as well as being great timewasters, are not suitable for younger students and can also clog up the limited bandwidth. For schools that purchase their own Protex server, v2 provides each student profile in both the standard form and a "with games" option. To find out more about this please see the Online Documentation.

We have unblocked the audio and video types (which are generally blocked in a default DansGuardian installation as both timewasting and bandwidth hungry) as we believe the great educational potential here outweighs any possible downside. Clearly these file types can only be downloaded from sites which are not blocked for some other reason.

Blogging, Flickr, Site-Builders (e.g. Geocities)

These sites (and other similar online tools) are, in general, not supported by E2BN for school use. We recognise that these are very valuable tools and will do what we can to make sites available on a case-by-case basis, consistent with child safety and with both our and our member LAs' responsibility to provide a safe environment for students.

Why are they banned? Simply because they are commercial, uncontrolled, globally available web-spaces which do contain material unsuitable for school viewing. In particular they often contain pictures and images that cannot be filtered just by the textual content of the page.

Can I use them at all? Yes. While these sites are generally blocked, particular subdomains (for example a site of the form www.geocities.com/mysite/) can be added to the Trusted or ContentCheck lists. Just send the comment form on the block page and we will consider the request and make the site available if appropriate. On all these sites this is only possible if a unique URL is associated with your account.

Flickr is a special case - we are looking into this but because of the structure of the site it is proving more difficult to make sub-domains available than with the other web-site (as opposed to image hosting) creation sites.

We are also looking into the possibility of providing access to similar tools ourselves: this will give us an audit trail in case of abuse and enable us to include the whole site within Trusted sites. This is still at an early stage, however.

FAQs

Filtering FAQs have moved to here