Monday, 27 April 2009

Fear the Google, don't fear the Google

There's been a lot of paranoia around lately about Google. We've had Street View arriving in the UK and provoking posh protests blocking their camera car. Now we've got calls for investigations into the information they hold about their users.

The former seems bizarre to me in the country which has accepted more cameras looking down on us than anywhere else in the world. The latter appears worrying when it's suggested that they might disclose that information to states, but states can get far more from internet service providers (and already trawl our digital use anyway).

What's really exercised some, though, has been the Google business proposition, which is to use that data to better target ads.

Here's why I find this a bit ridiculous: Google just aren't very good at it.

I have been using Gmail for ages and send and receive heaps of email. Beside each one are text ads targeted at me using all the text in all that email plus the content of the particular email I'm looking at.

They're consistently mistargeted.


[Image: Gmail text ad showing legal firms]

[Image: Gmail text ad showing Lionel Richie concert and model trains]

I swear I don't know why they think I might want Lionel Richie tickets, or have a legal problem.

After hunting I found one which is vaguely near to the content of the email. But only vaguely.

[Image: Gmail text ad showing training providers]

The area where we should be the most concerned is the one which I've yet to see British media really pick up on. And it actually concerns Google's mission: to organize the world's information and make it universally accessible and useful.

Google Books is their project to make available digital copies of out-of-copyright books and to make the text of in-copyright books searchable.

They've signed up Oxford University, amongst other big-name partners.

Trouble is, there are several rivals to Google and they're open-source, not proprietary: services like PublicDomainReprints.org and the Internet Archive.

Recently Google changed its terms to specifically disallow any of these services from using books Google had digitised, even though these are public domain books. There's not been any legal action thus far, but why change the terms if they didn't want to challenge others, like the Internet Archive, which hosts over half a million public domain books downloaded from Google?

Google has also 'locked up' some public domain books.

Here's an example of a public domain book on Google that was once 'Full access' and is now 'Snippet only': The American Historical Review, 1920. For the time being, there is a copy on Internet Archive.

The agreements with libraries (mainly university libraries), which were only made public by legal action, mean that the libraries give Google all of their books for free, and in return they are given scans that they effectively cannot use for anything.

If they want access to the corpus, they have to subscribe just like everyone else. This means that Google is requiring them to buy back their own copyrighted books, if anyone wants to actually use them on or off the campus.

Their recent deal with publishers, which includes the setting up of a Book Rights Registry, appears to give Google different, more favourable terms than anyone else who enters into agreements with the Registry.

The Open Content Alliance (OCA) is a consortium with the Internet Archive at its centre which wants to build a virtual Alexandria Library II (a physical Bibliotheca Alexandrina already exists). The OCA includes the British Library, the Royal Botanic Gardens at Kew and a number of corporations, though neither Google nor Microsoft. Microsoft recently left it, after funding the scanning of 750,000 books, to launch their own book scanning project.

Brewster Kahle, who founded the Internet Archive and heads the Open Content Alliance, warns of "the consequences of the consolidation of information into the hands of a few private organizations".

Google is digitizing some great libraries. But their contracts (which were actually secret contracts with libraries – which is bizarre, but anyway, they were secret until they got sued out of them by some governments) are under such restrictions that they’re pretty useless... the copies that go back to the libraries. Pretty much Google is trying to set themselves up as the only place to get to these materials; the only library; the only access. The idea of having only one company control the library of human knowledge is a nightmare. I mean this is 1984 – a book about how bad the world would be if this really came about, if a few governments’ control and corporations’ control on information goes too far.

There are other issues here too with Google's relationship with libraries:

Some may have second thoughts if Google’s system isn’t set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty accessing some digital content.

For instance, Tufts worries Google’s optical reader won’t recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it will be more easily addressed because the group is allowing outside access to the material.

The OCA is trying to establish a standard, and both Google and now Microsoft have opted out. Not only is there duplication (triplication, even) of these vital efforts for human knowledge, but Google also refuses to even talk to the OCA; it sees them as a rival.

The OCA are building a "permanent, publicly accessible archive" of digitized texts. Both Google and Microsoft are doing it to make money. Not that there's anything wrong with that, but it is right to fear when such knowledge is only available via corporate, proprietorial means.

4 comments:

  1. Just to back up your post with some figures. Based on our relatively small AdWords campaign for www.bigvote.org.uk, we achieve a click-through rate of 0.95% on Search and 0.05% on Content Networks, the majority of which are Gmail pages.

    However, in terms of total click-throughs the content network is not far behind the search network, because there are so many impressions.

    I guess it is a little like spam. Show enough people the ad and some will click through.

  2. 'AdWords a bit like spam' - love it!

  3. This comment has been removed by the author.

  4. Excellent post, Paul! I have to say I've been wondering what the hoo-hah is about this project. I've always thought that the digitisation of books is a top-notch idea; however, your post details the legal issues that Google are creating more specifically than any other I have read. I can now see why so many people are against the project and I have to say I will add myself to that list. A brilliant idea very, very poorly executed. Keep books open!
