Underscore vs Dash
Dave Collins wonders about the difference between the underscore and the dash. In particular, he wonders why search engines treat “my_page” as one word, and “my-page” as two.
Easy. Search engines are developed by programmers, and programmers generally treat the underscore as part of a word or identifier, but not the dash. In most programming languages, an identifier can start with a letter or an underscore, which can be followed by any number of letters, digits and underscores. The dash is used as the subtraction operator. In other words, “my_page” is an identifier, while “my-page” subtracts “page” from “my”.
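To make that concrete, here is a small sketch using Python as one example language (other languages have similar rules, but this only demonstrates Python's). The helper name classify is my own invention for illustration; it asks Python's parser how it sees each string.

```python
import ast

def classify(source):
    """Parse source as a Python expression and report how the parser sees it.

    classify is a hypothetical helper written for this illustration.
    """
    node = ast.parse(source, mode="eval").body
    if isinstance(node, ast.Name):
        # A single name: the whole string is one identifier.
        return ("identifier", node.id)
    if (
        isinstance(node, ast.BinOp)
        and isinstance(node.op, ast.Sub)
        and isinstance(node.left, ast.Name)
        and isinstance(node.right, ast.Name)
    ):
        # Two names joined by the subtraction operator.
        return ("subtraction", node.left.id, node.right.id)
    return ("other",)

print(classify("my_page"))  # ('identifier', 'my_page')
print(classify("my-page"))  # ('subtraction', 'my', 'page')

# str.isidentifier() gives the same verdict without parsing.
print("my_page".isidentifier())  # True
print("my-page".isidentifier())  # False
```

The parser never evaluates anything here, so it does not matter that my and page are undefined; it only reports the structure it found.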
In regular expressions, the shorthand character class \w matches letters, digits and the underscore. While the actual set of characters matched by \w is implementation-specific, it always includes the underscore. Implementations differ only in whether they also match non-English letters and digits. So if a search engine programmer uses \w+ to get a list of words, “my_page” is matched as a single word, while “my-page” yields two matches, “my” and “page”.
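Here is a minimal sketch of that word-splitting idea in Python. It is only an illustration of what \w+ does, not how any particular search engine tokenizes text, and the function names words and ascii_words are made up for the example.

```python
import re

def words(text):
    """Return every run of word characters, using Python's default Unicode-aware \\w."""
    return re.findall(r"\w+", text)

def ascii_words(text):
    """Same idea, but with re.ASCII so \\w only matches [a-zA-Z0-9_]."""
    return re.findall(r"\w+", text, flags=re.ASCII)

print(words("my_page"))  # ['my_page']
print(words("my-page"))  # ['my', 'page']

# The underscore is matched either way; only non-English letters differ.
sample = "caf\u00e9_menu"  # "café_menu"
print(words(sample))        # ['café_menu']
print(ascii_words(sample))  # ['caf', '_menu']
```

The last two lines show the implementation-specific part: the accented letter is a word character in one mode and a separator in the other, while the underscore is a word character in both.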
While I’m sure search engines differ in the way they treat the underscore, it’s a safe assumption that most of them will treat underscores as part of the word, while a dash connects two separate words.