Shareware Beach

Wednesday, 20 July 2005

Underscore vs Dash

Filed under: Software Development — Jan @ 9:40

Dave Collins wonders about the difference between the underscore and the dash. In particular, he wonders why search engines treat “my_page” as one word, and “my-page” as two.

Easy. Search engines are developed by programmers, and programmers generally treat the underscore as part of a word or identifier, but not the dash. In most programming languages, an identifier can start with a letter or an underscore, which can be followed by any number of letters, digits and underscores. The dash is used as the subtraction operator. In other words, “my_page” is an identifier, while “my-page” subtracts “page” from “my”.
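To make that concrete, here is a small sketch that asks Python's own parser how it reads each string. Python is only one example of the "most programming languages" above, and the describe function is a hypothetical helper written for this illustration.

```python
import ast


def describe(text):
    """Report how Python's parser reads a short piece of text."""
    # str.isidentifier() applies the language's identifier rule:
    # a letter or underscore, followed by letters, digits and underscores.
    if text.isidentifier():
        return "identifier"
    # Otherwise parse the text as an expression and inspect the top node.
    node = ast.parse(text, mode="eval").body
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub):
        return "subtraction"
    return "other"


print(describe("my_page"))  # identifier
print(describe("my-page"))  # subtraction
```

The second call never sees a single name at all: the parser has already split it into “my”, the minus operator, and “page”.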

In regular expressions, the shorthand character class \w matches letters, digits and the underscore. While the actual set of characters matched by \w is implementation-specific, it always includes the underscore. Implementations differ only in whether \w also matches non-English letters and digits. So if a search engine programmer uses \w+ to get a list of words, “my_page” is matched entirely as one word, while “my-page” yields two matches: “my” and “page”.
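Here is a minimal sketch of that word-splitting approach, using Python's re module. I don't know what any particular search engine actually runs, so treat the words function as a hypothetical stand-in for a tokenizer built on \w+.

```python
import re


def words(text, ascii_only=False):
    """Split text into words using the \\w+ pattern."""
    # re.ASCII restricts \w to [a-zA-Z0-9_]; without it, Python 3
    # also matches non-English letters and digits in str patterns.
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


print(words("my_page"))  # ['my_page']
print(words("my-page"))  # ['my', 'page']

# The implementation-specific part shows up with non-English letters:
print(words("caf\u00e9"))                   # ['café']
print(words("caf\u00e9", ascii_only=True))  # ['caf']
```

Whichever flavor of \w is in use, the underscore stays inside the match and the dash ends it.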

While I’m sure search engines differ in the way they treat the underscore, it’s a safe assumption that most of them will treat underscores as part of the word, while a dash connects two separate words.
