The BBC's website is among the 50 biggest websites in the world. The public service broadcaster struggles to handle all the data and traffic while keeping costs low.
BBC Online has a unique position. The broadcaster manages one of the largest websites in the world but does not generate money from it. That makes it hard to pay the bills for its storage and web servers. The organisation has found ways to cope with this problem.
Since the BBC is a public service broadcaster, it is not permitted to carry any advertising or sponsorship to boost its revenue. This keeps the corporation independent of commercial interests and makes sure the general public interest is served. Instead, the BBC derives its income from a mandatory license fee. Every household in the UK that wants to watch or record live television pays £145.50 per year. A small part of this amount (36p per month per user) goes to BBC Online.
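As a rough check on those figures, the share of the license fee that reaches BBC Online can be worked out from the two amounts quoted above. The sketch below assumes that the 36p per month comes out of a single annual fee, which the article does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the article.
ANNUAL_FEE_GBP = 145.50        # license fee per household per year
ONLINE_PENCE_PER_MONTH = 36    # amount said to go to BBC Online

# Assumption: the 36p per month is taken out of one annual fee.
online_per_year_gbp = ONLINE_PENCE_PER_MONTH * 12 / 100
online_share = online_per_year_gbp / ANNUAL_FEE_GBP

print(f"BBC Online receives £{online_per_year_gbp:.2f} per year per fee")
print(f"That is {online_share:.1%} of the license fee")
```

Under that assumption, BBC Online receives £4.32 a year from each fee, or roughly 3% of the total.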
According to web analytics firm Alexa, bbc.co.uk is number 44 on the list of biggest websites in the world. The BBC's Chief Technical Architect (CTA), Dirk-Willem van Gulik, points out that the amount of data the broadcaster handles keeps growing, as does the number of visitors. This leaves the BBC with a dilemma. It has to handle this growing amount of data and always be available online, but it cannot generate more income through advertising. It can only depend on the money generated from license fees, which is a fixed amount.
The move in 2007 to run advertising on the international news site (there is no advertising if you visit the site from inside the UK) did not fill the gap. In January this year the BBC also announced that it would cut the online budget by 25%, reducing the income for the online services by £34 million.
The BBC's online department has to be creative in handling the ever-growing data problem and the corresponding costs for data and traffic. "For the BBC broadcasting on the web is just like broadcasting on antennas, we have to do it to get our stuff out there but it is not our core business," Van Gulik explained in a phone interview earlier this month. "So the BBC has found ways to be incredibly efficient with their infrastructure. Where a company like Yahoo would use many tens of thousands of servers to serve just the UK, the BBC would do that literally with a handful of them."
And that is where it gets "really fun and interesting", said Van Gulik, who is one of the inventors of the Apache web server, which is now used by 66 percent of the world's biggest websites. He points out that because of the license fee construction the BBC cannot scale in the same way as the Googles and Yahoos of this world. "If they get twice as much traffic or ten times as much traffic they are all dumped in the streets because that means for them that they get ten times as much revenue," said Van Gulik. "Because they have advertising and all sorts of other things."
The BBC, on the other hand, cannot pay for growth through an increase in advertising revenue. If the number of BBC online users increases tenfold, the income from the license fees stays the same, Van Gulik points out. "Our income stays exactly the same, we don't get a penny more. So when we get ten times as many users we have to figure out a way to do things ten times cheaper."
To complicate matters further, buying more server power is expensive. "If you have a server doing one gigabit and you buy a server doing ten gigabit, that server is not ten times more expensive but probably a hundred times more expensive. So that is a wonderful engineering challenge," said Van Gulik.
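Van Gulik's two points can be put into numbers. The sketch below is illustrative only: the base price of 1.0 is an arbitrary unit, and the tenfold and hundredfold factors are the rough figures from his quotes, not measured prices. The function names are hypothetical.

```python
def budget_per_user(income, users):
    """Fixed income spread over a growing number of users."""
    return income / users

def price_per_gigabit(server_price, gigabits):
    """What one gigabit of capacity costs on a given server."""
    return server_price / gigabits

base_price = 1.0  # arbitrary unit for a one-gigabit server (assumption)
small = price_per_gigabit(base_price, 1)
# "probably a hundred times more expensive" for ten times the capacity
big = price_per_gigabit(base_price * 100, 10)

budget_ratio = budget_per_user(1.0, 10) / budget_per_user(1.0, 1)

print(f"Price per gigabit rises {big / small:.0f}x when buying the bigger server")
print(f"Budget per user falls to {budget_ratio:.0%} after tenfold growth")
```

With these rough factors, the bigger server costs ten times as much per gigabit at the same moment that the budget per user drops to a tenth, which is why simply buying larger machines does not solve the problem.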
To tackle this problem the BBC created an in-house software engineering practice. Instead of manually creating static pages, the BBC started to write software that generates things like structures on the fly, serving pages more efficiently and reducing the load on the servers.
"In the past 3 years we've moved to a dynamic 3-tier, largely PHP, Java and MySQL stack fronted by software load balancers called the Platform," Van Gulik explained. "This Linux-based system is hosted at two datacentres. It makes use of a lot of key-value stores, for example NoSQL products like CouchDB. Clever use of message queues; lots of efficient automated build which is mostly based on Maven, automated test (e.g. Hudson), simple fast caches (Varnish), lots of SNMP and some solid Zenoss monitoring, novel log file handling using a fully internal system called teleport and so on."
The second thing the BBC did was to develop the heavy-lifting machines in-house. "When you are in the top 100 of biggest websites in the world you can't buy your equipment off the shelf," said Van Gulik. "It isn't available commercially because it would have a market of only 100 customers. [...] So that means that the really heavy lifting special stuff we actually have to construct in house and think about how to build that." The BBC cannot simply copy the way Google handles things, because Google can pay for the extra server power out of its increased revenue.
The next challenge for the BBC is digitising the video archive, which consists of 80 miles of shelf space that the broadcaster has gathered since 1927. In total the data amounts to 10 petabytes of storage. The BBC has already completed the digitising of the one-inch and two-inch tapes and is now gradually processing the more modern formats. The data is stored on tapes because they consume less power, and thus cost less money, than a regular datacentre.
"Hard disks are nice but hard disks take power and they break," Van Gulik explains. So the BBC chose to store the video archive in tape robots. "We store it in very large tape robots the size of a small tennis field which is basically packed with tapes and lots of arms and robots moving around to move the tapes around."
If the data were stored on normal hard disks, each disk would hold between one and two hours of high-definition footage and would cost between £30 and £40 a year for electricity alone. With overhead costs for the building or cooling that amount would be even higher. If the data is stored on tapes in a tape robot, the costs for the same amount of data can be reduced to pennies.
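The disk figures above translate into a yearly electricity cost per hour of footage. The sketch below uses only the ranges quoted in the article (one to two hours per disk, £30 to £40 per disk per year) and reads them as applying to a single disk, which is an interpretation. The function name is illustrative, and no tape figure is computed because the article gives none beyond "pennies".

```python
def cost_per_hour(cost_per_disk_gbp, hours_per_disk):
    """Yearly electricity cost for one hour of footage kept on disk."""
    return cost_per_disk_gbp / hours_per_disk

# Cheapest case: the lowest yearly cost spread over the most footage.
best_case = cost_per_hour(30, 2)
# Most expensive case: the highest yearly cost for the least footage.
worst_case = cost_per_hour(40, 1)

print(f"£{best_case:.0f} to £{worst_case:.0f} per hour of footage per year")
```

On that reading, keeping one hour of footage on spinning disk costs somewhere between £15 and £40 a year in electricity before any overhead is added.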