Shareware Beach

Saturday, 8 January 2005

Banned!

Filed under: Cyberspace — Jan @ 9:24

The other day I clicked on a link to Slashdot, and got a nice page with the title “BANNED!”:

Either your network or ip address has been banned from Slashdot

…due to script flooding that originated from your network or ip address — or this IP might have been used to post comments designed to break web browser rendering. Or you crawled us with a rude robot, especially one that doesn’t understand RFCs very well.

If you feel that this is unwarranted, feel free to include your IP address (203.147.0.42) in the subject of an email, and we will examine why there is a ban. If you fail to include the IP address (again, in the subject!), then your message will be deleted and ignored. I mean come on, we’re good, we’re not psychic.

[snipped a further dozen paragraphs of blah blah]

It didn’t really bother me, since I don’t care much for Slashdot anyway. Accusing me of being rude and claiming to be good themselves in one breath is the sort of thing I could expect from them.

But this morning, an attempt to search for something on Google got me a 403 Forbidden error:

We’re sorry…

… but we can’t process your request right now. A computer virus or spyware application is sending us automated requests, and it appears that your computer or network has been infected.

We’ll restore your access as quickly as possible, so try again soon. In the meantime, you might want to run a virus checker or spyware remover to make sure that your computer is free of viruses and other spurious software.

We apologize for the inconvenience, and hope we’ll see you again on Google.

Fortunately, there are plenty of other search engines.

But it does point out how vulnerable the Internet in its present incarnation is. I know for a fact that my network hasn’t been hitting anybody’s server. My ADSL router has a built-in NAT, which allows me to see each and every active connection. But even if I wasn’t that smart, the idle state of the LED on the router that indicates traffic speaks volumes.

The router’s IP address is assigned by my ISP through DHCP. In English, that means that I get a new address every time I turn the thing on, as does everybody else using that ISP. It also means that the IP address I have today was used by somebody else yesterday.

And there lies the problem. The IP was probably used previously by somebody who doesn’t care much for computer hygiene and has ended up with a zombie computer. To protect themselves, Google blocked the IP. Since Slashdot has told me I’ve been banned on several occasions, they probably blocked the whole subnet. Because IP addresses are shared, careless behavior by some spoils the party for everybody.
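To make the collateral damage concrete, here is a minimal sketch using Python's standard ipaddress module. The banned address is the one quoted in the Slashdot page above; the /24 subnet width and the neighbouring addresses are assumptions for illustration only, since neither site says how wide its ban actually is.

```python
import ipaddress

# The address quoted in the Slashdot ban page.
BANNED_ADDRESS = ipaddress.ip_address("203.147.0.42")

# Assumption: the site bans a /24 block around it. The real width is unknown.
BANNED_SUBNET = ipaddress.ip_network("203.147.0.0/24")


def blocked_by_address(client: str) -> bool:
    """True if the client holds exactly the banned address."""
    return ipaddress.ip_address(client) == BANNED_ADDRESS


def blocked_by_subnet(client: str) -> bool:
    """True if the client falls anywhere inside the banned block."""
    return ipaddress.ip_address(client) in BANNED_SUBNET


if __name__ == "__main__":
    # Hypothetical customers of the same ISP.
    for client in ("203.147.0.42", "203.147.0.77", "203.147.1.5"):
        print(client,
              "address ban:", blocked_by_address(client),
              "subnet ban:", blocked_by_subnet(client))
    print("addresses caught by the subnet ban:", BANNED_SUBNET.num_addresses)
```

Neither check knows who is behind the address. With DHCP, whoever is handed 203.147.0.42 today inherits the ban, and a subnet ban catches every neighbour in the block as well.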

The solution is of course to use static IP addresses, which can be easily blocked without collateral damage. But that won’t happen until IPv6 is rolled out on a major scale, since there simply aren’t enough IP addresses to go around for everybody (and their phone, and their iPod, and their wristwatch).
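The arithmetic behind "not enough addresses" is easy to check. This is a back-of-the-envelope sketch: the address widths (32 bits for IPv4, 128 bits for IPv6) are fixed by the protocols, but the population figure is a rough assumption, and the IPv4 total ignores the ranges reserved for private and special use, so the real supply is smaller still.

```python
IPV4_BITS = 32
IPV6_BITS = 128

# Assumption: a rough world population figure for 2005.
WORLD_POPULATION = 6_500_000_000

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS


def addresses_per_person(total: int, population: int = WORLD_POPULATION) -> float:
    """Average number of addresses available per person."""
    return total / population


if __name__ == "__main__":
    print("IPv4 addresses:", ipv4_total)
    print("IPv4 per person:", round(addresses_per_person(ipv4_total), 2))
    print("IPv6 per person: %.1e" % addresses_per_person(ipv6_total))
```

Under these assumptions IPv4 offers less than one address per person, before counting phones, iPods and wristwatches.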

Thursday, 23 December 2004

Eric Rice Finds Google Suggest too Suggestive

Filed under: Cyberspace — Jan @ 19:39

Last week I commented on Google Suggest, a new feature of the popular search engine that is in beta. Google is obviously censoring the suggestions, since a particularly popular three-letter-word yields no suggestions at all. On the other hand, searching for many popular software products, including my own, causes Google to suggest searching for pirated copies.

I’ve sent my comments to Google using the feedback link on the Google Suggest page, but received no response. I carefully avoided using the three-letter-word in question, so it should have passed their spam filters.

If your name is Eric Rice, however, the picture is even uglier. It seems that one particular Eric Rice has been the recipient of quite a number of insults on a message board that’s being indexed by Google. Google Suggest cheerfully displays the list to anybody searching for “Eric Rice”, including to children under 13 who don’t know how to spell these words, even though they can spell and even guess the three-letter-word I won’t mention today. Eric Rice the podcaster doesn’t like it at all.

I wonder what Google’s intentions are with this new feature. By filtering out some search terms, but not others, the argument that it’s all done by a machine is no longer valid. Google filters, so Google is liable.

At least I know now I’m not the only one whose feedback is being ignored. And searching for “Jan Goyvaerts” yields no suggestions other than my name. But even then Google Suggest is not helpful. There are a number of variations of my last name, so even fellow countrymen often fail to properly spell it. Typing “Jan Go” (all variations identical up to that point) does not bring up my name as a suggestion, even though my name yields 4 times as many results as “Jan Gossaert” (never heard of). Google Suggest doesn’t know any “Jan Goovaerts” at all (the name Goovaerts is more common than Goyvaerts).

What I haven’t talked about is how Google Suggest is linked to AdWords. I have no idea, but it does look like a “clever” way of increasing the number of searches for keywords that get many high bids. I’m an AdWords advertiser, but I’m not thrilled with Google’s way of trying to show ads everywhere. AdWords was originally promoted as being highly targeted, since only people actively searching for your keywords would see your ads. But that is no longer true, at least if you don’t disable the “content network” option.

Thursday, 16 December 2004

Google Suggest: Too Suggestive or Not Suggestive Enough?

Filed under: Cyberspace,Shareware Industry — Jan @ 21:37

Google Suggest is a new feature of the popular search engine that’s currently in beta. I have mixed feelings about it.

So far it does not strike me as particularly helpful in quickly finding the information I want. If I type “beethoven”, I get 10 suggestions including “beethoven opus 20” (107K results), “beethovens opus 20” (11K results) and “beethoven.com” (1 result). That’s two redundant suggestions, and one useless suggestion. But it doesn’t suggest “beethoven symphony”. If I type “beethoven sym” it suggests “beethoven symphonies” (328K results), “beethoven symphony” (1,120K results) and then 8 times “beethoven symphony” followed by a number. It might be useful for quickly comparing the popularity of certain keywords on the web (on web pages, not in search queries!), but it doesn’t help me search faster. It’s not really useful as a type-assist tool either, since it updates too slowly.

The suggestions have been filtered, though. If I type “sex”, I get no suggestions at all. Same deal with “safe sex”. “Drugs”, “murder” and “massacre” yield plenty of results. Twisted morality police at work? At least “Hitler”, “Holocaust” and “nazi” return plenty of suggestions. Even “Mein Kampf” is in the list, which isn’t even English. (Note: Holocaust-related subjects are very sensitive in many countries, even more so than sex-related subjects.) Guess which of all these keywords has the most results when actually performing the search?

But where Google gets really suggestive is when I type in the name of one of my products. E.g. “HelpScribble” shows me “helpscribble”, “helpscribble download” and “helpscribble crack”, the latter two with various version numbers. Same story with all my other products, and many others. Surely people are typing these keywords into Google, and Google is finding pages containing those keywords. But should Google suggest that people look for cracks when they are innocently typing in the name of a product? I know from my web logs that many people do arrive at my web site by typing a product name into Google rather than typing the product domain name (e.g. helpscribble.com) into the address bar. I also know that I frequently do this myself. Google usually gives the proper site as the first result, which may not always be productname.com.

Google has already shown the ability and willingness to filter the suggestions. Surely, some people would be offended if Google made suggestions about safe sex. But as far as I know, safe sex is still fully legal in California, where Google is based. But software piracy is not. Yet Google hides safe sex, but offers to search for pirated software for almost any product name (mine or other people’s) that I can think of. I know I don’t like it. More people looking for cracks is the last thing the software industry needs. I wonder what Google’s legal department has to say about this. They can’t claim to be mechanically aggregating what’s already out there, since they’re not.

Let’s see whether an email message with “sex” in it to Google’s feedback address will get past their spam filters.

Monday, 13 December 2004

Time to Uninstall SETI@Home?

Filed under: Cyberspace — Jan @ 19:40

Many people run the SETI@Home screen saver to contribute idle CPU cycles to the search for extra-terrestrial intelligence. An interesting article in Astrobiology Magazine argues that the effort is most likely to be in vain.

The SETI project uses a huge array of radio telescopes to scan signals from space for patterns (as opposed to random background noise). The article argues that signals with detectable patterns would come from aliens with limited skill in encoding signals efficiently: the kind of alien that isn’t capable of sending signals into deep space in the first place.

Computer files are a good comparison. A simple file would be a plain text file. Even if it’s written in a language you don’t understand, there are clearly recognizable patterns in the file. Certain elements (vowels) occur far more often than others, elements appear in groups (words), some elements are used to create super-groups (punctuation), etc. It’s a very inefficient way of storing data.

A better way to store data is to compress it. A .zip file still has a certain amount of structure, since a simple layout is used to group files together in the .zip archive. But the compressed files inside the .zip are statistically almost indistinguishable from random bytes. The better the compression algorithm, the more closely the data will resemble a meaningless random stream of bytes, because if there’s a pattern left, you could enhance the compression algorithm to compress the remaining pattern as well.
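The difference between plain text and compressed data can be measured. The sketch below computes the Shannon entropy of the byte values (bits per byte, where 8 means every byte value is equally likely) for a made-up text, for the same text compressed with zlib (the DEFLATE method also used in .zip files), and for pseudo-random bytes. The sample text, the function names and the seeds are all assumptions for illustration; any long natural-language file should behave similarly.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the byte values, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def make_sample_text(word_count: int = 20000, seed: int = 0) -> bytes:
    """Build a made-up 'document' by stringing words together at random.

    A stand-in for a real text file; the vocabulary is arbitrary.
    """
    vocabulary = ("the quick brown fox jumps over a lazy dog while seven "
                  "wizards quietly judge boxes of plain text files").split()
    rng = random.Random(seed)  # fixed seed so every run produces the same text
    return " ".join(rng.choice(vocabulary) for _ in range(word_count)).encode("ascii")


if __name__ == "__main__":
    text = make_sample_text()
    packed = zlib.compress(text, 9)
    # Pseudo-random bytes of the same length as the compressed data, for comparison.
    noise = random.Random(1).getrandbits(8 * len(packed)).to_bytes(len(packed), "big")

    print("plain text :", len(text), "bytes,", round(byte_entropy(text), 2), "bits/byte")
    print("compressed :", len(packed), "bytes,", round(byte_entropy(packed), 2), "bits/byte")
    print("random     :", len(noise), "bytes,", round(byte_entropy(noise), 2), "bits/byte")
```

Byte entropy is only one crude statistic, and a file can score high on it while still containing structure that a smarter test would find. It does show the trend, though: the plain text scores well below 8 bits per byte, while the compressed data lands close to the random bytes.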

When sending signals through space over such distances that even the speed of light is a seriously limiting factor, you don’t have the luxury of using too many bytes. In fact, even our terrestrial communications and storage systems are using ever more sophisticated compression algorithms. Audio CDs and VHS tapes are uncompressed. MP3 songs and DVD movies use complex, patented compression methods.

Suppose you have a billion files on your PC’s hard disk, all of them containing random, meaningless bytes, except for one that contains a document written in an unknown language, compressed with a highly effective, unknown algorithm. There’s no way you can find out which file you want, let alone decipher it. Looking for a needle in a haystack is easy. Looking for a straw in a haystack is impossible.

And we haven’t even considered encryption yet.
