Filtering Policy
From E2BNWiki
Introduction
E2BN has recently completed an 18-month project on Content Delivery Management (CDM). There were two outcomes from this project.
Firstly, it resulted in an offer to the region's LEAs to supply a choice of commercial content, complete with hardware, installation, support, service and upgrades, over a three-year period. If you have not heard about this offer, please contact either your LEA or E2BN.
Secondly, there is a publicly available report that outlines the results of the pilot scheme and, in particular, our thinking on the role of CDM systems in the educational context. It also became clear through the pilot that URL filtering was high on the agenda in many of the region's schools, many of which wanted to be able to tailor the filtering system at school level. In the light of this review and the lessons learnt in the original pilot, E2BN began a project to investigate the best way of providing a regional service for caching and filtering.
This project is now reaching fruition. We have rolled out cache and filtering servers at Local Authority (LA) level to most of the authorities in E2BN. The core and LA infrastructure is built around the Appliansys CACHEBOX 300 series for caching and IBM/Opteron servers for filtering. Very soon it will be possible for schools and libraries to purchase their own Protex cache/filtering server. This is ONLY required if a site wants the ability to modify the four E2BN profiles locally. E2BN are currently making four profiles available (Primary, Middle, Secondary and Staff), with others, particularly Libraries, planned. The Primary filter has been tuned to meet the Becta accreditation level of filtering, which it recently achieved.
The filtering software we are using is an open-source product called DansGuardian, which we have tested with some schools in the region. In addition to traditional URL filtering, DansGuardian also examines the content of the requested pages (see its website for more details). The version of DansGuardian that E2BN are using includes the "Google patch", which enforces an undocumented Google feature called "very safe search" and, for images, does some clever processing that blocks the display of thumbnails of images from sites that DansGuardian would block.
Please see this presentation for more background on the caching and filtering project and this one for more up to date information.
General Principles
First, it must be noted that this policy document is an overview of E2BN policy on filtering internet access and the Protex system. The systems put in place by E2BN, and in particular the architecture of the filtering solution, allow member authorities and, in due course, schools to vary the policy that is decided regionally. The policy outlined here does not negate the need for LAs and schools to think through their own policies in this area: we hope that this document can provide a framework for further discussion and debate at a local level.
Filtering
Some would argue that there should be no filtering of internet content and that exposure to, and education about, the dangers of the internet must be the way forward. The main argument put forward is that, as pupils can access the whole gamut of material on their home computers, schools that filter all the "bad stuff" provide a false sense of security. We reject this argument on two grounds. Firstly, the case for stocking pornographic or explicitly violent magazines in a school library has never, to our knowledge, been seriously proposed, and we view internet access in the same way. Secondly, there are good legal reasons for offering a safe environment in which pupils can explore the vast body of information on the internet, and for preventing access (whether deliberate or accidental) to unsuitable material. It is in the exact definition of "unsuitable" that the devil lies.
URL Filtering
E2BN are subscribing on a regional basis to a commercial blacklist of URLs from URLBlacklist.com, which is updated centrally at the E2BN core and distributed to the LEA and local systems overnight.
Clearly even the best URL list has its limitations: bad sites are springing up all the time in new and unusual places, so it will never be 100% perfect. In fact it is worse than that: not everyone will agree on what a bad site is. One person's acceptable site is another's porn, and in education the definition of "inappropriate" will change depending upon the age of the user.
The URL list is divided into categories, which E2BN uses to refine the four current profiles. So, for example, while porn is blocked in all profiles (including Staff!), adult is blocked in the student profiles but allowed for staff.
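As a minimal sketch of how URL-list categories might map onto profiles, the Python below blocks a URL when any of its categories is in the profile's blocked set. Only the porn and adult categories come from this policy; the dictionary and function names are hypothetical, and the real profiles block many more categories.

```python
# HYPOTHETICAL mapping: only "porn" and "adult" are taken from the policy text.
BLOCKED_CATEGORIES = {
    "PRIMARY": {"porn", "adult"},
    "MIDDLE": {"porn", "adult"},
    "SECONDARY": {"porn", "adult"},
    "STAFF": {"porn"},  # adult is allowed for staff
}


def is_blocked_by_category(profile: str, url_categories: set[str]) -> bool:
    """Return True if any category the URL is listed under is blocked for this profile."""
    return bool(BLOCKED_CATEGORIES[profile] & url_categories)
```

For example, a URL listed only under adult is blocked for SECONDARY but not for STAFF, while one listed under porn is blocked for every profile.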
The Protex system offers a way to adjust these URL lists. The E2BN core servers and LA servers hold lists which are added to the standard URLBlacklist.com lists. At its simplest, this allows E2BN (or an LA) to add URLs to the blacklist, override URLs in the blacklist, or "greylist" them.
Content Filtering
This is a stage on from URL filtering. Although the first two of the following sections (Trusted sites and Blocked sites) are actually implemented via URLs, I think they fit better here, as the greylisting section is a kind of "half-way house" between the binary URL checks (block or don't block are the only two options) and the more heuristic, flexible content checking.
The typical terms used in web filtering and URL lists are "whitelists" and "blacklists". Because DansGuardian has (a) the concept of the "greylist", which can be confusing, and (b) the ability to block downloaded files on the basis of their file extension, we have changed the terms we use when talking about filtering to the more self-explanatory Trusted, Blocked and ContentCheck.
Trusted sites ("whitelisting")
Making a site Trusted has two effects.
Firstly, if the site is in the URL blacklist, trusting it overrides the blacklist and lets the page be displayed to the client. Subsequent updates to the URL blacklist will also have no effect on a trusted site (for example, a trusted site that is later added to the URL blacklist will still not be blocked: the trusted list overrides the blacklist).
So, making a site trusted is not to be done lightly. You really must trust the site never to contain or distribute inappropriate material. Parts of sites may be trusted and others not, as long as they can be distinguished by URL: for example, while mydomain.com [subsequently referred to as a domain] may not be trusted, you may feel that mydomain.com/education [subsequently referred to as a site] can be.
The second effect is that by making a domain or site trusted you are explicitly trusting any downloadable files it contains as no extension blocking takes place. This effect is global to all the current profiles: a trusted domain or site's files can be downloaded by any client connecting to any profile.
E2BN has made the decision to trust certain categories of site (.gov.uk and .sch.uk, for example) and some specific sites (e2bn.net is one; others include various education sites and manufacturer sites), and these lists can easily be added to at regional or LA level.
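The distinction between a domain and a site can be illustrated with a small URL-matching function. This is a hypothetical sketch, not DansGuardian's matching code: it assumes a domain entry matches the host and every subdomain (so an entry such as .sch.uk covers every school site), while a site entry additionally requires the path to start with the given prefix.

```python
from urllib.parse import urlsplit


def matches(entry: str, url: str) -> bool:
    """Return True if url is covered by a list entry (HYPOTHETICAL matching rules).

    "mydomain.com" or ".sch.uk" is a domain entry: it matches that host and
    every subdomain. "mydomain.com/education" is a site entry: the host must
    match as above and the path must start with /education.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""  # urlsplit already lower-cases the host name
    path = parts.path or "/"

    domain, _, prefix = entry.partition("/")
    domain = domain.lstrip(".").lower()
    host_ok = host == domain or host.endswith("." + domain)
    if not prefix.strip("/"):
        return host_ok  # domain entry: any path on the host

    prefix = "/" + prefix.strip("/")
    # Match "/education" and "/education/...", but not "/educationally".
    return host_ok and (path == prefix or path.startswith(prefix + "/"))
```

With these rules, trusting mydomain.com/education covers http://www.mydomain.com/education/maths.html but not http://mydomain.com/games/, which is the kind of split between trusted and untrusted parts of a domain described above.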
Blocked sites ("blacklisting")
Adding a domain or site to the blocked list bans the site from being viewed. This is simple and straightforward. The only thing to remember is that this, like the trusted list, will override the URLBlacklist.com lists. So if, for example, a domain or site is removed from the main blacklists, for whatever reason, it will remain blocked locally.
ContentCheck sites ("greylists")
This is the interesting one - and where some confusion sets in! Actually the idea is quite simple. If a site would normally be blocked by the urlblacklist but you want your clients to be able to access some or all of the site you have two options. (1) add it to the trusted list or (2) add it to the ContentCheck lists. So, what is the difference? I have been at pains in the section on trusted sites to emphasise the level of trust you are giving if you do this. You may feel that this is too much trust to give to a particular site but do not want to ban it completely.
Enter ContentCheck. Domains or sites added here override the URL lists (there would be no point in doing this if the site was not URL blocked!) but then the returned pages are scanned for the terms in the phraselists (see below) and either blocked or allowed depending on the rating and the profile in use.
Also, files downloaded from ContentCheck sites will be subject to the extension blocking rules in the profile. (i.e. student users will not be able to download .zip files from ContentCheck sites but staff users can - assuming they are using the Staff profile).
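The order of checks described in the Trusted, Blocked and ContentCheck sections can be summarised in a short sketch. It is hypothetical (not DansGuardian's code) and simplifies matching to exact host names; the results of the phraselist and extension checks are passed in as flags. The policy does not say what happens if a site is both Trusted and Blocked; this sketch assumes Trusted is checked first.

```python
from dataclasses import dataclass, field


@dataclass
class Lists:
    """HYPOTHETICAL container for the lists, keyed by exact host name."""
    trusted: set[str] = field(default_factory=set)
    blocked: set[str] = field(default_factory=set)
    content_check: set[str] = field(default_factory=set)
    url_blacklist: set[str] = field(default_factory=set)


def decide(host: str, lists: Lists, over_limit: bool, banned_extension: bool) -> str:
    """Return "allow" or "block" for a request.

    over_limit: the page's phraselist score exceeds the profile's limit.
    banned_extension: the requested file type is banned in the profile.
    """
    if host in lists.trusted:
        return "allow"  # no URL, content or extension checks at all
    if host in lists.blocked:
        return "block"  # local block overrides the URLBlacklist.com lists
    if host not in lists.content_check and host in lists.url_blacklist:
        return "block"  # ordinary URL-list block
    # ContentCheck sites and unlisted sites both get content and extension checks.
    if over_limit or banned_extension:
        return "block"
    return "allow"
```

Note how a ContentCheck site escapes the URL blacklist but not the phraselists or the extension rules, whereas a trusted site escapes all three.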
Phraselists
How is the content checked? This is where the Phraselists come in. Assuming the page is not blocked by the URL list (this check always comes first, as it is the least processor-intensive way of filtering) and it is not trusted (in which case, by definition, we do not need to check it), the HTML on the page is scanned for various phrases and patterns.
There are two types of phrase list: banned and exception lists on the one hand, and weighted lists on the other. If a word or phrase in the web page matches any item in a banned list, the page is blocked. The items in the weighted lists (which are also categorised) all have a numerical value: the values of the items found on the page are totalled to give the page a score, which is used to rate it. Each profile (see Profiles) has a variable called the naughtinesslimit (not our name!), which can be changed to reflect the age group.
This is an area where some tuning is still being done to give the best and most consistent results. Searches in Google utilise various other features of Protex which are beyond the scope of this document but include the ability to scan the submitted search and block unsuitable search terms.
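To make the weighting concrete, here is a minimal sketch of banned-phrase and weighted-phrase checking against a naughtinesslimit. It assumes plain substring matching and that every occurrence of a weighted phrase adds its weight; the phrases, weights and function names are invented for illustration, exception lists are not modelled, and DansGuardian's real list syntax and matching rules are more elaborate.

```python
# HYPOTHETICAL weighted phrases and values, for illustration only.
WEIGHTED = {"badword": 30, "worseword": 60}


def page_score(text: str, weighted: dict[str, int]) -> int:
    """Total weight of all weighted-phrase occurrences in the page text."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase.lower()) for phrase, weight in weighted.items())


def is_blocked(text: str, banned: set[str], weighted: dict[str, int], naughtiness_limit: int) -> bool:
    """Block on any banned phrase; otherwise block if the score exceeds the limit."""
    lowered = text.lower()
    if any(phrase.lower() in lowered for phrase in banned):
        return True
    return page_score(text, weighted) > naughtiness_limit
```

With a limit of 50, a page containing "badword" once scores 30 and is allowed, while a page containing it twice scores 60 and is blocked. This is why lowering the limit makes a profile stricter.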
Google & other image searches
Google image searching is a very powerful tool, but some schools have had to ban it because of the nature of some of the thumbnail images displayed to the unwary. E2BN have addressed this problem in two ways. Firstly, the term &safe=vss is added to all image searches. This forces the search into very safe search mode, which is, in our experience, a very effective block on inappropriate images. If a user sees this and deletes it from the search term in the browser, Protex adds it back before the query is sent to Google for processing.
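The effect of forcing very safe search can be sketched as a rewrite of the search URL: whatever safe= value the user supplied (or deleted) is replaced by safe=vss. The function name is hypothetical and this is not the Google patch's code; a real filter would also check that the request is actually going to Google.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_very_safe(url: str) -> str:
    """Return the search URL with safe=vss, replacing any safe= value already present."""
    parts = urlsplit(url)
    # Drop any existing safe= parameter, keeping every other query parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "safe"]
    query.append(("safe", "vss"))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))
```

Because the parameter is re-added on every request, a user who edits it out of the address bar simply gets it back.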
However, even Google's very safe search is not foolproof. So, in addition our system tests the URL of the originating site of each image returned. If it appears in the URL lists the returned image is replaced with a blank one. If the user clicks on the blank to go to the site it is blocked by the URL filter. In trials this has proved one of the most popular aspects of the system: teachers of all age groups can now allow image searching safe in the knowledge that the pupils will not access inappropriate material either by accident or design.
This technique can only be applied to Google's image search as they include the originating URL in the results page: others (yahoo, for example) do not. In these cases all thumbnails are replaced by a blank image.
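The thumbnail check can be sketched in the same way. The sketch assumes each result is a (thumbnail, originating URL) pair, with None where the search engine does not report the originating site; BLANK_IMAGE, filter_thumbnails and host_blocked are hypothetical names, and host_blocked stands in for the real URL-list lookup.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlsplit

BLANK_IMAGE = "blank.gif"  # HYPOTHETICAL placeholder served instead of a thumbnail

Result = Tuple[str, Optional[str]]  # (thumbnail URL, originating page URL if known)


def filter_thumbnails(results: List[Result], is_blocked: Callable[[str], bool]) -> List[Result]:
    """Replace thumbnails whose originating site is URL-blocked.

    If the engine does not report the originating URL (origin is None), the
    thumbnail cannot be checked, so it is always replaced.
    """
    filtered = []
    for thumb, origin in results:
        if origin is None or is_blocked(origin):
            filtered.append((BLANK_IMAGE, origin))
        else:
            filtered.append((thumb, origin))
    return filtered


# Stand-in for the real URL-list lookup: block by host name only.
BLACKLIST = {"bad.example"}


def host_blocked(url: str) -> bool:
    return urlsplit(url).hostname in BLACKLIST
```

Here a result from bad.example and a result with no reported origin would both come back as the blank image, mirroring the Google and non-Google cases described above.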
Where can I find out more about DansGuardian?
There is plenty of material on the DansGuardian website: http://dansguardian.org
Webmail
Webmail falls into two categories: (1) a web-based front end to an e-mail system controlled by the school, the LEA/LA or E2BN, and (2) a publicly available webmail system, the best known of which are hotmail and yahoomail, although there are others.
Our policy on these is actually very simple although the effects of it may be contentious to some users.
(1) Access to these types of systems is trusted. Currently E2BN offer two web-based mail systems which have controlled access (i.e. we know who all the users are and can access and track their email usage): one from Digitabrain and one from Netmedia. These are both trusted: the E2BN Protex filtering system will not check the pages generated by these systems in any way, and attachments with the majority of extensions can be downloaded. E2BN can offer ALL staff in the region free accounts on either of these systems.
Assuming an LEA's webmail system is using a .gov.uk or .sch.uk domain name, it is also trusted by virtue of this domain. If another domain is being used, it can also be added to the trusted lists.
(2) Public webmail systems (hotmail, yahoo) are treated in exactly the same way as other websites.
OK, some explanation and elucidation may be necessary here, because email is still such an important tool. What follows assumes you have read the section on Content Filtering above.
So, what does "treated in exactly the same way as other websites" actually mean to a user? For the sake of clarity, let us assume you have just opened a new hotmail account. When you access this account you are presented with a web page with icons linking to an Inbox, Sent Messages, Drafts, etc. on the left-hand side, and your messages appear in the main body of the page as a list of sender addresses and subject headings. All well and good so far. Let us imagine for a moment a utopian world where there is no spam and no idiots sending mail with swearwords in the subject line. In such a world hotmail would work fine: it would not be blocked by the filtering system because the web pages generated are innocuous.
Even emails containing swearwords would be visible at this stage, as only the subject lines are presented on the page. Now we see the power of content filtering (as opposed to URL blocking). Let us suppose one of the mails with an innocent subject is actually an email sent to a student by a bully using foul language. When the student tries to open this mail, it is blocked by the content filter. Why? Because the web page containing the mail has been checked by DansGuardian and the phrases used in it have pushed the page over the "naughtiness" limit assigned to that profile. As you can see, the profile of the student affects whether the page is blocked or not: a secondary student will see some mails that would have been blocked for a primary student.
If people could restrain themselves from using inappropriate language in the subject lines this would all be OK. Good mail gets through, bad mail does not. However, as we all know life is not like that.
First, let's get the simple case out of the way: student A receives a mail with a very inappropriate subject line, which the phraselists weight at over the user's limit. What happens now? A logs into their account and now cannot access their Inbox, as the page listing the subject lines in their Inbox has been blocked by the phraselist weighting. What can they do? The best thing would be to go to a member of staff and ask them to log in to A's account (using port 8084) so that they can access the Inbox. The member of staff notes the address of the sender and then deletes the mail so that the student can access their own account again. The alternative is for the student to wait until the evening and do the same thing at home.
Suppose the sender of this mail (B) is another student at the school (and if hotmail is generally used by staff and students as their main email system this may very well be the case). In this case the teacher above would have their mail address and can talk to them about the mail and how it breached the school's AUP (Acceptable Use Policy). They may even impose sanctions on the user depending upon the content of the mail. Also B is blocked from his/her Sent Messages box while at school for the very same reason A is blocked from their Inbox.
If the sender is not a pupil then clearly it is harder (if not impossible!) to remonstrate with them - but do you really want outsiders sending inappropriate mail to your pupils? And, more to the point, your pupils receiving and reading them? This is exactly the point of filtering the content of the mail - an email communication is being treated in exactly the same way as any other web content.
Now we come to the issue of spam. If the likes of hotmail and yahoo mail did really effective spam filtering, moving the mail to the Junk folder and deleting it after a certain time, everything would be OK. However, this is not the case: much of the spam gets through to the Inbox where, by the very nature of much spam, it pushes the page over the content filter's threshold and the user is blocked from their Inbox. (And, obviously, they can also never access the Junk mail folder at school.)
There is no easy answer to this. E2BN policy is that, given all that has been said above about internet safety and the nature of the spam causing the block, any further loosening of the filtering policy for hotmail, yahoo mail, etc. is a matter for individual LEAs to decide and not a matter for regional policymaking.
What could be done? There are only two ways to address this, and the choice is yours (or your LEA's): (1) make the "naughtiness" limit higher for each user profile. This will mitigate the webmail problem but will at the same time make access to other unsuitable sites more likely: you are making the whole web-filtering profile looser, with all that implies. (2) Depending on the exact mail system and how it is structured, you may be able to add certain URLs to the trusted lists to, for example, give unfettered access to the Inbox and all the e-mail it contains, but this would also allow any attachment to be downloaded. Hotmail seems to work in such a way that you would need to trust hotmail.msn.com in its entirety. This is not something we would want to do at a regional level.
A word about attachments (see also the section File Extensions (.zip, .doc, .xls, etc.) below). The same extensions are available for download from webmail as from any other website and will depend upon the profile being used. In particular, this means that staff can download most attachments (using port 8084) but students can only download certain acceptable extensions. This includes, for example, all the video and audio extensions but excludes .exe and .zip. But remember that users of trusted webmail systems (Digitabrain, Netmedia, LEA-approved systems) can upload and download all file types. So, for example, a student moving files between home and school using zip may have a problem if using hotmail but would be OK with an approved webmail system.
Profiles
Current Regional Profiles
The current version of the E2BN system has four profiles: PRIMARY (8081), MIDDLE (8082), SECONDARY (8083) and STAFF (8084). School-based systems (when available) will come with two profiles installed: the relevant school profile on port 8080 and the STAFF profile on port 8084.
The most obvious difference between the profiles is the naughtinesslimit - this is set very low for the PRIMARY profile giving the most restrictive setting and increases on each of the other profiles as the age of the audience increases.
The second important difference is that the student profiles are much more restrictive on the types of file that can be downloaded. This is described in more detail in the section File Extensions (.zip, .doc, .xls, etc.) below.
Finally, the categories (see Protex Categories) of URLs are slightly different between profiles. For example, while the category porn is blocked for all profiles, URLs in the category adult are allowed for staff but blocked for all students.
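The profile differences can be pictured as a small table keyed by port. The port numbers come from this policy, but the naughtiness limits below are HYPOTHETICAL placeholders (the policy only says they rise from PRIMARY to STAFF), as are the class and function names. The last function shows why the same page can be blocked for one age group and not another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    port: int
    naughtiness_limit: int  # HYPOTHETICAL values: the policy gives no figures
    student: bool           # student profiles have the stricter extension rules


PROFILES = [
    Profile("PRIMARY", 8081, 50, True),
    Profile("MIDDLE", 8082, 80, True),
    Profile("SECONDARY", 8083, 120, True),
    Profile("STAFF", 8084, 160, False),
]
BY_PORT = {p.port: p for p in PROFILES}


def profile_for_port(port: int) -> Profile:
    """Look up which profile a client gets from the proxy port it connects to."""
    return BY_PORT[port]


def profiles_blocking(score: int) -> list[str]:
    """Names of the profiles on which a page with this phraselist score is blocked."""
    return [p.name for p in PROFILES if score > p.naughtiness_limit]
```

Under these placeholder limits, a webmail page scoring 100 would be blocked on PRIMARY and MIDDLE but shown on SECONDARY and STAFF, which is the behaviour described in the webmail section.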
Future Regional Profiles
There are two areas where we expect new profiles to be developed soon.
The first is Libraries. We already have a Library service but are now developing with them a profile for children to use in the Library. This will be similar to the MIDDLE profile but less restrictive.
The second is the Sixth form students. We feel that the SECONDARY profile is a little too restrictive for the sixth form and so intend to create another profile for them in due course.
File Extensions (.zip, .doc, .xls, etc.)
Protex enables the downloading of files from websites to be controlled via their file extension or MIME type. The default lists are very restrictive: the E2BN Protex implementation has relaxed these rules considerably. In particular, we do not restrict any of the well-known media types (mp3, mp4, mpeg, avi, rm, etc.) on any profile. On the student profiles we do, however, block the download of several file extensions and MIME types from sites that are not in the trusted list: in particular, .doc and .zip files are blocked. A full list of types taken from a student profile can be found here: banned extension list and banned MIME types. Note that lines beginning with the # character are not blocked. In the staff profile, a very few extensions are blocked while the rest are not.
E2BN had originally blocked the .exe extension, but this is now allowed as some sites require .exe files to be downloaded to provide full functionality. We regard this as a security hole and would prefer these files to be blocked, so LAs and schools must make sure they have other systems in place to prevent viruses and other malware being downloaded and installed by students.
Of the items in these lists, please remember that blocking will only apply to sites which are not trusted: if it is important that students are able to download files with these extensions, they will be able to do so if the site is added to the trusted list. If a site is not trustworthy enough to be added to the trusted list (see Trusted sites ("whitelisting") above), then, as a matter of both network security and child protection, pupils should not be given free rein to download files from it.
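Extension blocking, including the rule that lines beginning with # are not blocked and that trusted sites bypass the check entirely, might be sketched as follows. The list contents and function names are hypothetical, matching is by file-name suffix only, and MIME-type checks are left out.

```python
def load_banned(lines: list[str]) -> set[str]:
    """Parse a banned-extension list: one entry per line; lines starting with # are ignored."""
    banned = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            banned.add(entry.lower())
    return banned


def download_allowed(filename: str, banned_ext: set[str], site_trusted: bool) -> bool:
    """Trusted sites bypass extension blocking; otherwise reject banned suffixes."""
    if site_trusted:
        return True
    name = filename.lower()
    return not any(name.endswith(ext) for ext in banned_ext)


# HYPOTHETICAL extract from a student profile's list.
STUDENT_LIST = [".zip", ".doc", "#.exe", "", "#.mp3"]
```

With this list, a student can fetch song.mp3 or setup.exe from any site, but notes.zip only from a trusted one.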
E2BN has, as has been stated above, already added a variety of sites to the trusted list - both specific sites (bbc.co.uk; sophos.com; e2bn.net; etc.) and generically (.sch.uk; .gov.uk) - which permit all file downloads.
It is worth repeating here that zip files can be downloaded from any site via the STAFF profile and it is therefore very important that school systems managers make provision for staff (both teaching and technical) to have access to the STAFF profile.
Onlinegames, music downloads and other interesting stuff.
Another of the perceived benefits of Protex in testing has been the ability to ban online games. While this does cause much aggravation to pupils initially (new system installs inevitably bring a rush of forms asking for various online game sites to be unblocked), it soon abates. E2BN discussed this decision with schools, and it was universally agreed within the test schools that online games, even at lunchtime, should remain barred to pupils, as they are not considered an appropriate activity. As well as being great time-wasters, some of the games are not suitable for younger students and can also clog up the limited bandwidth.
Where the system has been in place for some time we have very, very few requests for sites to be unblocked by either staff or pupils.
On the other hand, we have unblocked the audio and video types (which are generally blocked in a default DansGuardian installation as both time-wasting and bandwidth-hungry), as we believe their great educational potential outweighs the downside. However, some LAs or schools may wish to block these types from untrusted sites.
Blogging, Flickr, Site-Builders (e.g. Geocities)
These sites (and other similar online tools) are, in general, not supported by E2BN for school use. We recognise that these are very valuable tools and will do all we can to make sites available on a case-by-case basis, consistent with child safety and with both our own and our member LAs' responsibility to provide a safe environment for students.
Why are they banned? Simply because they are commercial, uncontrolled, globally available web spaces which do contain material unsuitable for viewing in school. In particular, they often contain pictures and images that cannot be filtered just by the textual content of the page: if it were not for the images, we could add these sites to the ContentCheck category and rely on the content filter.
Can I use them at all? Yes. While these sites are generally banned, particular parts of them (for example, a site of the form www.geocities.com/mysite/) can be added to the Trusted or ContentCheck lists. Just send the comment form on the block page and we will look at the site and make it available if appropriate. On all these sites, a site can only be made available if, as a user, you have a unique URL associated with your account.
Flickr is a special case: we are looking into this, but because of the structure of the site it is proving more difficult to make sub-domains available than with the other website-creation (as opposed to image-hosting) sites.
We are also looking into the possibility of providing access to similar tools ourselves: this will give us an audit trail in case of abuse and enable us to include the whole site within Trusted sites. This is still at an early stage, however.
Some technical stuff (skip if you are not interested!)
The E2BN Core Caching
There are six load-balanced caches (Appliansys CACHEBOX 330) in the core. These servers run as pure caches (i.e. they run only the Data Reactor caching software, not the filtering software), which LA systems can use as their parent proxies. There are four IBM servers providing backup filtering for the region. In addition, there are three IBM servers providing filtering for all the region's libraries. Finally, there are two management servers to handle the automatic dissemination of list changes and updates to all the implemented systems, at both LA and school/library level. We intend to install a test system soon so that we can test major changes to the system (for example, there is a new version of DansGuardian which we will need to test before upgrading).
Your LA's caching/filtering system
When fully deployed, depending upon the particular LA and the bandwidth that needs to be supported, an LA will have at least two Protex servers. If more than two are required, they will be load-balanced with a pair of dedicated load-balancing appliances.
Local systems
E2BN have a special build of the CacheBox050 and CacheBox200 from Appliansys which includes the DansGuardian software configured to use our system. The 050 is appropriate for smaller schools, while we would recommend the 200 series for larger ones with a 2Mbps or greater connection. Now that all the LA servers are in place, schools will shortly be able to purchase this cache/filter server to provide added local flexibility. Costs and specifications are yet to be finalised: once complete, they will be posted here and on the E2BN website together with details of the hardware.
How they all work together
Below is a much simplified description of how the elements fit together. What is not shown is the management system, which runs from the E2BN core and connects with each of the Protex systems to propagate list changes and updates. Currently the LEA systems are being installed. (Note: a Protex system includes both caching and filtering (DansGuardian), which may or may not be on the same physical server.)
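As a rough model of the parent-proxy arrangement in this section, each caching level can be treated as answering from its own store or asking its parent, with only the top level fetching from the internet. This is a hypothetical sketch (class and level names are invented, and filtering, load balancing and cache expiry are left out), not how the CACHEBOX or Protex software is implemented.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheLevel:
    """One level of a parent-proxy hierarchy (HYPOTHETICAL model)."""

    def __init__(self, name: str, parent: Optional["CacheLevel"] = None,
                 fetch: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self.parent = parent  # next cache up the chain, if any
        self.fetch = fetch    # only the top level fetches from the internet
        self.store: Dict[str, str] = {}

    def get(self, url: str) -> Tuple[str, str]:
        """Return (page, where it came from), caching the page at this level."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            page, source = self.parent.get(url)
        elif self.fetch is not None:
            page, source = self.fetch(url), "internet"
        else:
            raise LookupError(f"{self.name} has no parent and no way to fetch {url}")
        self.store[url] = page
        return page, source


# Example chain: two LA systems sharing the same core cache as their parent.
fetched = []


def fetch_from_internet(url: str) -> str:
    fetched.append(url)
    return "page:" + url


core = CacheLevel("core", fetch=fetch_from_internet)
la_one = CacheLevel("LA one", parent=core)
la_two = CacheLevel("LA two", parent=core)
```

In this model, the first request from LA one goes out to the internet, a repeat is answered by LA one itself, and the same page requested through LA two comes from the shared core cache, so the internet is contacted only once.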
FAQs
The filtering FAQs have moved here.



