NOTE: I am continuing my articles in blog style over at a new site: http://PhilosophyOfReality.com See you there.
Dynamic Websites and Search Engines
The situation of database driven websites and search engines has been a hot topic for a while now, and as usual, most people aren't any closer to understanding the entire situation.
Here is my take on the entire situation as it stands now and where we're headed:
First off, we need to understand a little history. The beginning concept of the internet was to transfer static computer files from one computer to another. (Note: By files we mean html, txt, and other webpage files as well as images and even audio and video and other files like pdf. For the purposes of this article we'll stick with referring to files as webpages.) This idea was so great that things exploded.
In order to make it easier to find the desired webpages, "search engines" were created to find, index and create links to all the webpages they could find. These "search engines" build a database of webpages consisting of URL (webpage location), page content (text), list of links to and from that page, date indexed, and a whole slew of other bits of data to help them find the correct file you're looking for, or at least the closest possible matches. More webpages and more information take up computer space and make the searches more and more difficult. Then there's the problem of making sure the pages in your index still exist. It doesn't help to offer a link to a file that doesn't exist anymore.
In short time, the number of available webpages grew to an ungodly amount. This causes a huge problem for search engines: How do you economically index 10 billion (and growing) webpages and give good results to people's queries while still being profitable? After all, the whole goal of a search engine is to offer only what it considers "acceptable search results" so it can satisfy its customers and keep them from going to "The Competitor Search Engine". The search engines know that if they can get most people using their search engine, then they have a great chance at making lots of money.
The search engines needed to make "algorithms" or rules for deciding which webpages to list for the possible search terms. For example, if someone searched for "race horses", the search engine must offer different links than if someone searched for "history of race horses". Even though the searches are very similar they are looking for different sources of information and thus need to point to different webpages.
Since many pages were found to be, for all intents and purposes, worthless, the search engines started making rules as to what they would and would not index. This was a great way to save space, but the people that owned the discarded webpages weren't too happy.
Professional web builders learned quickly that ranking high on search results can be the source of incredible revenue for very little investment. For example, if someone sold expensive jewelry and ranked top in the search engines for the term "diamond ring" (which is searched for over 23,000 times per month), the opportunity to make thousands or even millions of dollars exists. If 25% of the searchers come to his site, and only 1% of those people buy a diamond ring, that's 58 sales per month with almost zero advertising costs. (These are conservative numbers. The results can be much greater.)
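The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the function name monthly_sales is made up for illustration, and the figures are the ones quoted in the paragraph above.

```python
def monthly_sales(searches_per_month, visit_percent, buy_percent):
    """Estimate sales per month from search volume and two conversion rates.

    Percentages are passed as whole numbers (25 means 25%) so the
    arithmetic stays exact for round figures like the ones used here.
    """
    return searches_per_month * visit_percent * buy_percent / 10_000


# Figures from the article: 23,000 searches, 25% visit, 1% of visitors buy.
sales = monthly_sales(23_000, 25, 1)
print(sales)
```

With the article's numbers this comes to 57.5 sales, which the paragraph rounds up to 58. Plugging in the search counts for the other terms shows how quickly the numbers add up.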
Well, if our jewelry retailer can get a good ranking for the term "Neil Diamond" (searched for 12,000 times per month. Good going Neil.) he may be able to convert some more sales. But his site has nothing to do with Neil Diamond whatsoever, so he creates a page to offer the special "Neil Diamond diamond ring collection" (a name he makes up for some jewelry). This page is specifically designed to rank well in the search engines for the term "Neil Diamond". Well why stop there? Why not do the same thing for the term "diamond multimedia" (15,000+ searches)? What about "baseball diamond" (2,500+ searches)? Plenty of money can be made this way.
Well, after this happens, some people contact the search engines to tell them how lousy their search results are. A search for "baseball diamond" shouldn't turn up a jewelry company. The search engines find out what's going on and adjust their algorithms to keep this from happening. The jewelry retailer notices the lack of sales and finds out what happened. He figures out the new algorithm, redesigns his "marketing" strategy around it, and again ranks well for inappropriate search terms. The whole cycle starts again.
This was the start of the whole Search Engine Optimization (or SEO) genre.
Now there is a constant battle between people trying to get their content at the top of the search engine results and the search engines trying to keep people from cheating. They want to offer relevant content so people will be happy with their search engine.
To make things even more fun, along came technology that allowed webpages to be made dynamically. These pages take certain criteria, compare them to information in a database and, following a few rules in a template, create the webpage "on the fly" or immediately. This allows for personalized pages giving just the information you ask for in a practically unlimited number of combinations.
Oh great. Now we have websites with the ability to offer an unlimited number of webpages. How on Earth can we possibly index this? Well, search engines can't possibly index a potentially infinite number of webpages, so they decided to not index any dynamically created pages at all. How would they know a dynamically created page? A typical dynamic page has a URL like this:
http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367
The information reads like this: The above example is at the website "www.mycousinfred.com" looking at the page "coolpage.asp". This is a template page, or a dynamic page. The information after the "?" is called URL variables. They give the template needed information to create the customized page. (Other sources can be used to create the dynamic page, but we'll concentrate on this one for the purpose of this article.)
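To make the breakdown concrete, here is a small Python sketch that splits the example URL into the same pieces using the standard urllib.parse module. The variable names are mine; the URL is the one from the article.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367"

parts = urlsplit(url)

# Everything after the "?" is the query string; parse_qsl turns it
# into (name, value) pairs, which we collect into a dictionary.
variables = dict(parse_qsl(parts.query))

print(parts.netloc)   # the website
print(parts.path)     # the template (dynamic) page
print(variables)      # the URL variables handed to the template
```

The website comes back as www.mycousinfred.com, the template page as /coolpage.asp, and the URL variables as the three name/value pairs id, custom and zip.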
This article is an example of a dynamic page. You're at the page article.cfm on my www.rubak.com site. The URL variable at the end tells the page which article to give to you. Changing the variable will give you a different article (provided that an article matches the variable).
So to make things simple the search engines just decided to not index any URL that contains a "?". (There are a few other rules, but this is the one we'll concentrate on.) So now they don't have any dynamic pages in their search engine.
OK, this brings us to:
The Present
So here come the SEO people and inventive web site developers to tackle this huge task of getting dynamically created pages into the search engines. They've come up with scripts, hacks and tricks to change a URL from
http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367
to
http://www.mycousinfred.com/coolpage.asp/id/yippe/custom/72325/zip/91367/
This new website address looks just like a static page several folders deep, which the search engines have no problem with.
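The rewrite from the first URL form to the second can be sketched in a few lines of Python. This is an illustration only, not any particular script in use: the function name to_friendly_url is hypothetical, and it covers only the outgoing link. The web server still has to be set up separately to turn the folder-style path back into variables for the template, which is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote


def to_friendly_url(url):
    """Turn ?name=value&... into /name/value/... (hypothetical helper)."""
    parts = urlsplit(url)
    # Flatten the (name, value) pairs into one list of path segments,
    # escaping each piece so it cannot introduce a stray "/" or "?".
    segments = [quote(piece, safe="")
                for pair in parse_qsl(parts.query)
                for piece in pair]
    if not segments:
        return url  # no URL variables, nothing to rewrite
    path = parts.path.rstrip("/") + "/" + "/".join(segments) + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


dynamic = "http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367"
print(to_friendly_url(dynamic))
```

Run on the example URL, this produces the folder-style address shown above. The page content behind it is exactly the same; only the address has changed.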
Web developers also make "pointer pages" or static webpages that point to a dynamic page. Sometimes there are multiple pages, each one designed to rank well on different search engines for different search terms. It's not unusual in some online industries to have 100+ pointer pages directing traffic to the same webpage.
So we send these new URLs to the search engines and hey presto, they are indexed in the search engines.
As time goes on, more and more websites are using dynamically created webpages. They are easier to manage, give a more consistent look and allow many more options than static pages. (Try making a shopping system with only static pages. It's a pain.)
So this creates a potential problem of the search engines not being able to index more and more pages.
Except when we use the tricks to make "search engine friendly" URLs.
OK. So to recap, the search engines will only index your webpages if you use static looking, web friendly URLs. With this criterion out of the way (assuming your pages are relevant and not useless), your pages are indexed. The search engines are happy and your customers can find you. Everything's great.
But wait a minute.
One of the problems the search engines claim to have is that of indexing an unlimited number of pages. But it doesn't matter what the URL is. The number of webpages is the same.
In fact, once you add in "pointer pages", the number of indexable pages increases even more.
Are the owners/workers of the search engines unaware of these tactics?
The search engines aren't being caught off guard. They are fully aware of the "friendly URL technique" and in fact often encourage it.
Here's an example of how search engines view indexing:
From http://www.directhit.com/util/spider.html
Dynamic Content: Grabber (Direct Hit's indexing spider) crawls documents regardless of whether they are generated statically or dynamically. Grabber does, however, avoid some dynamically generated (and potentially infinite) URL spaces by ignoring links to URLs that contain the following characters or strings: ?, =, @, &, cgi- and JavaScript.
In other words, they're worried about indexing "potentially infinite" pages unless you use the accepted technique, in which case, space doesn't seem to be an issue.
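A rule like the one quoted from Direct Hit amounts to a simple text check on the URL. The Python sketch below is my own guess at how such a filter might look, not Direct Hit's actual code: the function name would_be_skipped is made up, and matching without regard to upper or lower case is an assumption.

```python
# Characters and strings listed in the Direct Hit quote.
SKIPPED_TOKENS = ("?", "=", "@", "&", "cgi-", "JavaScript")


def would_be_skipped(url):
    """Return True if the URL contains any of the listed tokens.

    Case-insensitive matching is an assumption, not part of the quote.
    """
    lowered = url.lower()
    return any(token.lower() in lowered for token in SKIPPED_TOKENS)


dynamic = "http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367"
friendly = "http://www.mycousinfred.com/coolpage.asp/id/yippe/custom/72325/zip/91367/"

print(would_be_skipped(dynamic))
print(would_be_skipped(friendly))
```

The dynamic URL is caught by the filter and the friendly one is not, even though both lead to the same page built from the same template and the same database.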
There seems to be a double standard for this situation. Why?
Because the search engines are still trying to figure out how to handle the truly vast and potentially unlimited amount of dynamic information out on the net. This friendly URL technique is a middle ground the search engines are working with until they can solve the dilemma.
Also, at this moment, Google does index dynamic web URLs. And lots of them. (Which is a good thing; otherwise almost no one could find my articles.) They are proving that they can in fact index the dynamic internet and offer very relevant search results, as their growing popularity attests. To add to the amount of indexable content, they even index Adobe Acrobat PDF webpages. They are pushing the envelope as to how much a search engine can comfortably index. And they are breaking plenty of records doing it. The other search engines are now racing to get their systems to be better than Google at indexing dynamic information.
Also, goto.com has no problem with dynamic URLs since you have to buy the link from them. (On another note, the goto.com pay-per-click model is a great one for business sites, but sorely hurts personal sites. I'm not about to pay to have people find my articles, even if I offer items for sale on the side.)
The Future
So what is in store for dynamic websites like this one? What will the search engines do?
I predict that the search engines will, very shortly, start indexing dynamic webpages regardless of the URL. Links like
http://www.mycousinfred.com/coolpage.asp?id=yippe&custom=72325&zip=91367
will soon be very common in search results, regardless of the search engine.
This would be better for everyone. It allows web developers to concentrate on more important aspects than making sure the URLs are search engine friendly. It cuts out the need for "pointer pages" which just clutter the internet. (Why should there be multiple pages pointing to the same page?) It would allow people to find specific items/information that's hiding in dynamic websites. It will make things better overall.
How confident am I? None of the dynamic websites I make have any search engine friendly URLs.
However, I do use "pointer pages" and search engine friendly URLs on some important products. Like many, I can't waste money waiting for the search engines to readjust, so I have to waste money fitting their current mold in certain situations.
I look forward to the day when I don't have to worry about these things. Don't you?
I am available for seminars, debates, interviews or lectures about many philosophical, ethical, scientific or religious issues. Please contact me for scheduling and prices.