Toolswiki Blog 2007-10

__NOTOC__
==31 October 2007==
'''Nutch'''

In the wee hours of this morning I extended my research into [http://www.google.fr/search?hl=fr&q=%22I+hate+Google%22&meta= I hate Google] by searching for [http://www.google.fr/search?hl=fr&q=%22alternatives+to+Google%22&meta= alternatives to Google] (35,600 hits in Google). Bingo: I found [[Nutch]], an open-source web crawler that seems to be the type of software needed for my [[GING]] project. Moreover, [[Nutch]] is accompanied by Hadoop, a "software framework that supports distributed applications running on large clusters of commodity computers that process huge amounts of data." This is an example of what is called [http://en.wikipedia.org/wiki/Distributed_web_crawling distributed web crawling].

After a few hours of feverish sleep, I had forgotten the name of [[Nutch]], so I went on Wikipedia to look for [[Open Source Crawlers]]. Zowie, there's tons of 'em!

Added note: [[Nutch]] is developed by Doug Cutting, who works for... Yahoo.<br>
<br>

==30 October 2007==
'''New URL again'''

As I mentioned yesterday, my webserver Ouvaton only accepts site names having the form "mysite.ouvaton.org," which is what the visitor sees in the navigator even if directed to the site from a niftier domain name like "convivialtools.net."

But I should still change the site URL from "toolswiki.ouvaton.org" to "convivialtools.ouvaton.org," which now expresses the site's intent much better. And that means, as I explained yesterday, that I have to create a new space on the server, reinstall the software, and transfer all the data into the new space. Maybe I'll do that during the long weekend ahead, before posting the site's address to various wikis to boost the Google ratings. I checked again today: this site has virtually disappeared from both Google and Yahoo.<br>
<br>

'''Googling "I hate Google"'''

I amused myself for a while this evening searching for "I hate Google" on both Google and Yahoo. If you Google [http://www.google.fr/search?hl=fr&q=%22I+hate+Google%22&meta= I hate Google] you get 32,600 hits. My favorite is a page about<br>[http://robcubbon.com/more-about-seo-google-and-recriprocal-links seo google and reciprocal links] by Rob Cubbon, who seems to be just another reasonable but frustrated bloke like myself. Here is some of what he has to say:

:Am I the only person in the world who thinks that Google’s influence on the web is undemocratic and against the original spirit of the internet?

: You end up wasting valuable time trying to up your page rank. This is adding layer upon layer of unwanted crap on the internet that no one reads and no one wants. Link sites. Hate them.

:I’m trying to do reciprocal links to get my page ranking up.

:If anybody is interested in trading reciprocal links with me then get in touch. Mail me at rob@robcubbon.com. Have a look at the [http://robcubbon.com/links.html links].

I may drop him a line.<br>
<br>

==29 October 2007==
'''Dancing bird'''

My sister Miranda sent me this link to a [http://birdloversonly.blogspot.com/2007/09/may-i-have-this-dance.html dancing cockatoo].<br>
<br>

'''Old URL'''

Well, I tried what I discussed below in "New URL." I get the impression that Ouvaton only hosts URLs of the form "mysite.ouvaton.org." If I configure Gandi to redirect from "convivialtools.net" to "toolswiki.ouvaton.org," when they arrive on the site the visitors will still see "toolswiki.ouvaton.org" in their navigator. So why bother with a redirect? And how do I get a site where "convivialtools.net" stays displayed as the site's real URL?<br>
<br>

'''Pages too long'''

Today when I started editing this page Mediawiki gave me the following message:

:'''WARNING: This page is 35 kilobytes long; some browsers may have problems editing pages approaching or longer than 32kb. Please consider breaking the page into smaller sections.'''

So that explains why Google and Yahoo refuse to crawl most of my pages: they're too big. That also explains why so many sites only show a few paragraphs from each article, followed by a link saying "read more here." But there is still something strange going on here, because 32kb is a very small page, and Google contains all sorts of pages that are much bigger. We need [[GING|GING is not Google]] more than ever. (Parenthetical reflection: why did it take me so long to find this out? Good websites to help novices are clearly lacking or too hard to find.)

I [[ReFactoring|refactored]] the Main Page, as well as this current blog page, to try to get them below 32kb. I go to "File - Properties" in the menu to see the file size. The Main Page is okay at around 25kb, but this page is still too long at around 36kb. It fills about 9 screens, so a rule of thumb would be 4kb per screen, except there is also the stuff in the invisible headers and footers, so I should count maybe 5kb per screen, or a maximum of 6 screens. I HATE GOOGLE!

I cut out all but the past few days, and got this page down to 25kb. Every day when I add something into the top, I'll have to take something out of the bottom. What a pain in the old wazoo!
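The daily chore of moving the oldest entries off the bottom could be scripted. Here is a minimal Python sketch, assuming each dated entry starts with a level-2 heading such as "==31 October 2007==" (newest first) and approximating the editor's 32kb warning as UTF-8 bytes divided by 1024; MediaWiki's exact counting rule may differ. The names trim_oldest and size_kb are hypothetical.

```python
import re

# Assumption: the editor's 32kb warning is approximated as UTF-8 bytes / 1024.
LIMIT_KB = 32

# A dated entry starts with a level-2 heading such as "==31 October 2007==".
# The "[^=]" after the opening "==" keeps level-3 headings ("===...===") out.
ENTRY_HEADING = re.compile(r"^==[^=].*==[ \t]*$", re.MULTILINE)


def size_kb(text):
    """Size of the wikitext in kilobytes (UTF-8 bytes / 1024)."""
    return len(text.encode("utf-8")) / 1024


def trim_oldest(wikitext, limit_kb=LIMIT_KB):
    """Drop dated entries from the bottom of the page until it fits under limit_kb.

    Entries are assumed to be ordered newest first. Returns the trimmed page
    and the removed entries, oldest first, ready to paste into an archive page.
    The newest entry is always kept, even if it alone exceeds the limit.
    """
    starts = [m.start() for m in ENTRY_HEADING.finditer(wikitext)]
    if not starts:
        return wikitext, []
    preamble = wikitext[: starts[0]]          # e.g. "__NOTOC__"
    bounds = starts + [len(wikitext)]
    entries = [wikitext[a:b] for a, b in zip(bounds, bounds[1:])]
    removed = []
    while len(entries) > 1 and size_kb(preamble + "".join(entries)) > limit_kb:
        removed.append(entries.pop())         # pop the oldest remaining entry
    return preamble + "".join(entries), removed
```

The removed entries could then be pasted at the top of an archive page, so nothing is lost from the site.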

(Note that Google gives about 50 hits for [http://www.google.fr/search?hl=fr&q=%22pain+in+the+old+wazoo%22&btnG=Rechercher&meta= pain in the old wazoo] but more than 4000 hits for [http://www.google.fr/search?hl=fr&q=%22pain+in+the+wazoo%22&meta= pain in the wazoo].... My addiction to Google is what makes my Google problems so particularly painful.)<br>
<br>

==28 October 2007==
'''New URL'''

I got ready to register this site on various wikis, such as [[WikidWeb]] and [[AboutUs]], but then realized that I should first change the [http://en.wikipedia.org/wiki/Url URL]. Any self-respecting website uses its own domain name, registered through the [http://en.wikipedia.org/wiki/Url Domain Name System] (DNS), for instance "convivialtools.net," rather than a subdomain such as "toolswiki.ouvaton.org." I do own the domain name "convivialtools.net," having registered it with Gandi, and did attach it to my other website "conviviality.ouvaton.org," but only as a sort of redirect. I've long wondered what the correct way is to link a domain name to a site, and I think I finally understand. I have to create a new "space" on Ouvaton with the name "convivialtools.net," and then point the Gandi DNS towards it.

But this means that I have to install this website in the "space" on Ouvaton with the name "convivialtools.net." I will have to transfer the Mediawiki software into this space using FTP, create a new database, and then "install" the new website. After which I will have to import the data from this site into that one. No wonder I've waited so long to have a proper URL!<br>
<br>

==27 October 2007==
'''From "Convivial Tools User Guide" to "Convivial Tools Data Base"'''

The name of this site as displayed at the top of the Main Page has always been "Convivial Tools User Guide." I just changed the name to "Convivial Tools Data Base."

The original idea was to furnish a User Guide in terms of detailed evaluations of specific tool products, giving recommendations about the best products available. I still think that would be a useful project for a wiki. But in reality the website has evolved to be a sort of database. The new name thus better reflects the actual nature of this website.<br>
<br>

==25 October 2007==
'''Success, sort of'''

Well, it worked. The Google meta-tags that I added to my Mediawiki MonoBook.php file were found by Google, and this morning I was able to use the "Google Analytics." They showed that as usual I am the only visitor to the website. Then I checked the cache of this site's Main Page on google.com (English Google). The cache still hasn't been renewed since the last time the Main Page was crawled one month ago! Now I have done everything I possibly can to indicate to Google that my site exists and that I would like the Main Page to be crawled, and there is no result whatsoever. Yours truly, Franz Kafka.<br>
<br>

'''The next step'''

I think I've pretty well confirmed that there are only two ways for a new website to get listed on Google. One is to have lots of direct visitors, for example if you are a professor and you tell your students to look at the website. The other is to have links to your site from existing listed sites.<br>
<br>

I'll wait a few days to see if the Googlebot ever comes back to crawl my Main Page. Then I'll go to the next step, which is to put the address of the site onto other websites. I can think of a few possibilities: AboutUs, WikidWeb, even WikiIndex (although this site is not a real wiki). It might be noted however that there is already a link to this site from my Wikipedia User Page, but apparently Google has decided not to count links from Wikipedia User Pages.....<br>
<br>

'''A trip to Yahoo'''

I forgot the name of the website directory "[[WikidWeb]]" and could only remember "Wikid." I did some searching with Google (using for instance "wikid" plus "directory") but didn't find it. This led me to finally do a search with Yahoo, and there it was right on the first page.

I then finally searched Yahoo for "Convivial Tools User Guide." It found the Main Page with no problem. But when I clicked on the link to the cache of the Main Page, I got this message:

:We're sorry, but we could not process your request for the cache of http://toolswiki.ouvaton.org/.

Perhaps the Main Page is now too big, and exceeds some kind of limit they have set to cached pages? Maybe that is also the reason why the Googlebot refuses to update the cache? But I've seen many web pages that are a lot longer than this site's Main Page. Perhaps they do some calculation which weighs the size of the page against the amount of traffic, to decide whether it is worth their while to cache the page? If so, then as long as my traffic is low I should perhaps reduce the size of the Main Page, by taking out the blog posts? Many new webmasters undoubtedly get the same idea and overload their Main Page in order to try to get more hits, so the search engines counter that by setting a limit?

But now the problem is that with the Mediawiki change tracking, all of the past versions are stored with the page. If I try to cut down the size of my Main Page by deleting the blog posts, they will still stay in the page as unseen portions. I will have to make a backup version of the site, and then learn how to delete the page history and just save the current version....<br>
<br>

'''Another trip to Yahoo'''

I was worried that all the page history of a Mediawiki page is stored with the page, making the file very long. I thought of using PhpMyAdmin on my server to look at how a page is stored in this site's database, but before I did that I realized there is an easier way. I used my navigator to show the source code for the site's Main Page. I thus verified that only the current version is sent to the navigator, without all of the page history. The file is actually rather short, so it should be easy for Google and Yahoo to store the cache of the page.

So I went back to Yahoo to try to figure out just what version of the Main Page they have stored. In searching for a few word chains that would only be on that page, I was able to pinpoint the last time Yahoo crawled it: the 13th of October. That is about 12 days ago, which is pretty good when compared to Google's one month ago. Anyhow, the main conclusion is that the page length has little to do with Google's failure to update the cache of my Main Page.<br>
<br>

==24 October 2007==
'''Sitemap follies'''

This blog is getting far too technical for anyone ever to want to read, but it serves as a record of my struggle with the Google machine. Yesterday I succeeded in putting onto my Ouvaton server a sitemap.txt file, which contained only the address of this site's Main Page. When I did a Google search this morning, nothing had changed. The English Google cache still contains this site's Main Page as it was a whole month ago, while the French Google doesn't have it at all. But when I went to Ouvaton to look at the statistics, I saw that the Googlebot had crawled the site today. What the hell did it come for, if it didn't even crawl the Main Page?

Just now I prepared a new XML sitemap, instead of a TXT sitemap. It still contains only the address of the Main Page, but the XML sitemap has the advantage over the TXT sitemap of specifying how often the page will change, and the priority I want the page to have. I made the XML sitemap by copying example text from http://www.sitemaps.org instead of from the Sitemap Protocol help on Google. Sitemaps.org gives exactly the same example as Google (see my blog entry for yesterday), but the Google one is a dud. I checked that the XML file on my hard disk was readable by opening it, and confirmed that the XML file correctly displayed in my navigator (unlike the Google one). I then transferred the file by FTP to my server, and went to my Google account to submit it. Now waiting to see whether Google finds any errors in it....<br>
<br>
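Rather than copying example text by hand, the same XML sitemap could be generated. Here is a minimal Python sketch using the standard xml.etree.ElementTree module; the element names (urlset, url, loc, changefreq, priority) and the namespace follow the sitemaps.org protocol, while build_sitemap is a hypothetical helper name. A side benefit is that ElementTree escapes characters such as "&" in query strings, which the protocol requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemaps.org <urlset> document as a string.

    entries: iterable of (loc, changefreq, priority) tuples.
    ElementTree escapes special characters such as "&" in the text.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    # The XML declaration is written by hand so the file starts like the
    # sitemaps.org example.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling build_sitemap([("http://toolswiki.ouvaton.org/index.php?title=Main_Page", "daily", 1.0)]) and saving the result as sitemap.xml in UTF-8 would give a one-page sitemap ready to transfer by FTP.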

'''Another sitemap error'''

An hour later Google indicated that the XML sitemap had generated an error. It turned out to be my mistake: the URL that I gave in the XML file was http:http://toolswiki.ouvaton.org/index.php?title=Main_Page. I deleted the first "http:" from the address in the XML file, transferred the file anew with FTP, and resubmitted to Google the location of the XML file. But I doubt this will be enough to incite the robot to crawl my site. Which leads to the following philosophical reflections.<br>
<br>
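A doubled scheme like "http:http://" is easy to catch before uploading anything. Here is a small Python check using only the standard urllib.parse module; check_sitemap_url is a hypothetical name and the default host is simply this site's. The sitemaps.org protocol expects every URL in a sitemap to be on the same host as the sitemap itself, so comparing the host covers that rule too.

```python
from urllib.parse import urlsplit


def check_sitemap_url(url, host="toolswiki.ouvaton.org"):
    """Return a list of problems with a URL meant for this site's sitemap.

    An empty list means these two simple checks found nothing wrong.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    # "http:http://..." parses with an empty host, so this check catches it.
    if parts.netloc != host:
        problems.append(f"host is {parts.netloc!r}, expected {host!r}")
    return problems
```

Run on the faulty address, the host check reports an empty host; on the corrected address it returns an empty list.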

'''Philosophical reflections'''

I get the impression that Google starves new sites deliberately, avoiding crawls and updates of their main pages, in order to provoke them into paying for Google links out of desperation. I call this reflection "philosophical" because it leads immediately to another, which is that the Google system reflects the type of profound inequality that is found everywhere in modern society. Once big sites get indexed, there seems to be no limit to the number of pages that Google will crawl for them. Consider the following quotes from the Google document on Sitemap Protocol:

:Q: How big can my Sitemap be?
:Sitemaps should be no larger than 10MB (10,485,760 bytes) in length when uncompressed and can contain a maximum of 50,000 URLs. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 10MB.

:Q: My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
:You can list the updated URLs in a small number of Sitemaps that change frequently and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps.

Can we understand from this that Google will crawl as many as 50,000 pages on favoured sites, up to as many as "tens of millions of URLs"? Is that shocking when you consider that I can't even get Google to crawl my one little Main Page? A single big site will get thousands of pages crawled, and thousands of little sites won't get anything crawled at all. Likewise, a single rich person may have as much combined wealth as tens of thousands of poorer ones. Should we just say that's life buddy, or is it possible to imagine a more just society?<br>
<br>
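The procedure the quoted FAQ describes (split anything beyond 50,000 URLs into several Sitemap files, then list those files in a Sitemap index with lastmod dates) can be sketched in Python. This is a minimal illustration, not Google's tooling: chunk_urls and build_index are hypothetical names, the example.org addresses are placeholders, and the 10MB size limit is not checked here.

```python
from datetime import date
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-Sitemap limit quoted in the FAQ above


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive lists of at most `size` entries."""
    return [urls[i : i + size] for i in range(0, len(urls), size)]


def build_index(sitemaps: list[tuple[str, date]]) -> str:
    """Return a <sitemapindex> document listing each Sitemap with its lastmod.

    Crawlers can compare lastmod values to find the Sitemaps that changed.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc, lastmod in sitemaps:
        lines.append(
            f"  <sitemap><loc>{escape(loc)}</loc>"
            f"<lastmod>{lastmod.isoformat()}</lastmod></sitemap>"
        )
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

For a site the size of this one, chunk_urls returns a single chunk, which is rather the point of the reflection that follows.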

'''Collaborating reflections'''

Now back to collaborating. On the Google Webmaster pages I clicked on "Google Analytics," and signed up to use them. This sign-up was more elaborate than the first Google sign-up, and involved accepting the terms of a license. Google then proposed a code block for me to insert into every webpage I want to track, immediately before the </body> tag. (I think they mean before the <body> tag, that is, just after the <head> part of the page.) They add: "If you use a common include or template, you can enter it there." That required me to search hard on the Mediawiki site to find advice on how to do this, which I finally found at [http://www.mediawiki.org/wiki/Extension_talk:Google_Sitemap Google Authentication not described], as follows:

:To add the meta-tag, edit your MonoBook.php file at yoursite.com/skins/MonoBook.php ... Insert this code just below <head> which appears near the beginning of the file. 

I added the meta-tags to the MonoBook.php that I had in the Mediawiki directories on my hard disk, and used FTP to put the modified file in place of the existing one on the server. I'll visit "Google Analytics" to check the result tomorrow. I'm getting somewhere - I don't know where, but somewhere...<br>
<br>
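The edit quoted from the Mediawiki site amounts to inserting a snippet just below the <head> tag of a template file. Here is a minimal Python sketch of that text edit; it assumes the template contains a literal "<head>" with no attributes, insert_after_head is a hypothetical name, and the meta tag used below is a placeholder, not Google's actual tag.

```python
def insert_after_head(template, snippet):
    """Return `template` with `snippet` on its own line just after the first <head>.

    If the snippet is already present the template is returned unchanged, so
    running the edit twice does not duplicate the tag.
    """
    if snippet in template:
        return template
    marker = "<head>"
    pos = template.find(marker)
    if pos == -1:
        raise ValueError("no <head> tag found")
    end = pos + len(marker)
    return template[:end] + "\n" + snippet + template[end:]
```

Applied to a copy of MonoBook.php on the hard disk, the result would then go up by FTP as described above.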

==23 October 2007==
'''Worse and worse'''

The Googlebot situation is worse than I thought. Today I did another Google search for "Convivial Tools User Guide," and as usual Google found the [[Main Page]] of this website. I noticed that the Google listing includes a link called '''cached''', which shows the page as stored by Google. I clicked on the link, and then made a copy of the [[Google Cache on 23 October 2007]]. After a laborious search in the [http://toolswiki.ouvaton.org/index.php?title=Main_Page&action=history page history], I found that this cache exactly matches the [[Main Page on 26 September 2007]]. So although the Googlebot last visited the site on 11 October, the last time it crawled the [[Main Page]] was on 26 September, nearly a month ago!<br>
<br>

'''Toolswiki sells out'''

I've decided to collaborate with the enemy. I finally clicked on the link [http://www.google.com/intl/en/about.html About Google] on the Google page, and then explored [http://www.google.com/webmasters Webmaster Central]. Okay, I'll play the game. I'll open an account with Google and see if I can improve my listings that way. This more or less obliges me to change my tone, so I deleted the anti-Google rantings from the [[Main_Page#Recent Blog Posts|blog summary]] on this site's [[Main Page]]. As a compromise, I leave unchanged my previous posts below, and am keeping pages such as [[GoogleMafia]], but will push them discreetly into the background. You can consider it historical material...<br>
<br>

'''First Sitemap'''

I understand now why it took me so long to find [http://www.google.com/webmasters Webmaster Central]. At home I use Firefox as my navigator, and the Google main page under Firefox is missing links that show up under Internet Explorer. This for example is why I always got Google in French (google.fr) and never found how to change to Google in English (google.com). Whereas at work my navigator is Internet Explorer, and when I found the added links there, I realized that at home I would have to use Internet Explorer to find them.

So I plunged into [http://www.google.com/webmasters Webmaster Central], and created a Google account. This required me to submit my email address and to wait for an email of confirmation. I then submitted the URL of this website (http://toolswiki.ouvaton.org). When I tried to obtain information regarding it, Google required me to validate my ownership by creating an html file with a barbarous title and then using FTP to transfer the file to the directory of the website on the server. This done, I was able to see the few data Google had about the site, and confirmed that the Main Page is '''not''' listed on google.com (even though it is listed on google.fr).

Then Google invited me to submit a Sitemap. They suggested that I generate a Sitemap automatically using the Google Sitemap Generator, which I dutifully downloaded and attempted to configure, before realizing that Ouvaton lacks Python, whereas the Generator is written in Python script. So I finally used the examples given to manually create a tiny Sitemap file in .xml format (which is just .txt format with the suffix changed to .xml). Here is the entire content of my Sitemap file:

 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
   <loc>http://toolswiki.ouvaton.org/index.php?title=Main_Page/</loc>
   <changefreq>daily</changefreq>
   <priority>1.0</priority>
  </url>
 </urlset>

My biggest worry came from the first line, but I figured out that it only refers to a standard for sitemaps which is found at the given internet address (thus largely irrelevant). I "submitted" my sitemap, which again required me to first transfer it by FTP onto my website server. I got a message saying it might be a few hours before it gets taken into account. Let's wait and see.

All of which seems well designed to shake off all but professionals or the most determined of amateur "webmasters." But since I am trying to collaborate... No comment!<br>
<br>

'''Sitemap Errors'''

Half an hour later (at midnight) the result came back: an error message. The URL of the sitemap was incorrect. I had put the sitemap file into the directory "toolswiki.ouvaton.org" and given the URL as "http://toolswiki.ouvaton.org/sitemap.xml". But for that URL I should have put the sitemap file into the sub-directory "toolswiki.ouvaton.org/html". This is an incomprehensible feature of the Ouvaton server which has tricked me before. I moved the file to the right place and waited for the next error message....

Which came a few minutes later. The sitemap is in the wrong format. I copied their example to the letter, but of course that would have been too easy if it actually worked....

So I created a sitemap.txt file, which contains one line:

 http://toolswiki.ouvaton.org/index.php?title=Main_Page/

I put it in the same place, and went back to [http://www.google.com/webmasters Webmaster Central], and submitted it...
<br>
<br>
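The text sitemap format is simply one absolute URL per line, saved as UTF-8, with the same 50,000-URL ceiling as the XML format. A minimal Python sketch that produces such a file's contents; text_sitemap is a hypothetical name.

```python
MAX_URLS = 50_000  # same per-file limit as the XML format


def text_sitemap(urls):
    """Return the contents of a plain-text sitemap: one URL per line.

    Blank entries and stray whitespace are dropped; save the result as UTF-8.
    """
    lines = [u.strip() for u in urls if u.strip()]
    if len(lines) > MAX_URLS:
        raise ValueError("a text sitemap may list at most 50,000 URLs")
    return "\n".join(lines) + "\n"
```

Writing the returned string to sitemap.txt with UTF-8 encoding gives the one-line file shown above.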

==22 October 2007==
'''Where's that Googlebot?'''

Now I've got a new problem. The Main Page of this website is listed on Google, but the Google robot last crawled it on the 11th of October. That means that all of the [[ReFactoring|refactoring]] I've been doing is completely unknown to Google. Here are the statistics about the robot visits:

{| class="wikitable"
! Robot (5 different) !! Hits !! Last visit
|-
| Yahoo Slurp || 168 || 22 Oct 2007
|-
| Googlebot || 61 || 11 Oct 2007
|-
| BaiDuSpider || 4 || 15 Oct 2007
|-
| Unknown robot || 1 || 04 Oct 2007
|-
| NG 2.x (Exalead) || 1 || 20 Oct 2007
|}

The [http://help.yahoo.com/l/us/yahoo/search/webcrawler Yahoo Slurp] webcrawler just came by today. Good job! I think maybe I should switch from Google to Yahoo. The [http://www.google.com/support/webmasters/bin/topic.py?topic=8843 Googlebot] came last on 11 October, which is 11 days ago. Come on, you [[Scwewy Wabbit]]. I get visited by the [http://www.baidu.com/search/spider.html BaiDuSpider] thanks to the Chinese [[Spam Attack]]. Then there is a mysterious "Unknown robot." And finally there is a thing called [http://www.exalead.fr/search Exalead]. I wonder what that is....<br>
<br>
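The "11 days ago" arithmetic can be done for every robot at once. This Python sketch copies the figures from the Ouvaton statistics above; days_since_last_visit is a hypothetical name.

```python
from datetime import date

# Figures from the Ouvaton robot statistics above: name -> (hits, last visit).
ROBOT_STATS = {
    "Yahoo Slurp": (168, date(2007, 10, 22)),
    "Googlebot": (61, date(2007, 10, 11)),
    "BaiDuSpider": (4, date(2007, 10, 15)),
    "Unknown robot": (1, date(2007, 10, 4)),
    "NG 2.x (Exalead)": (1, date(2007, 10, 20)),
}


def days_since_last_visit(stats, today):
    """Return (robot, days) pairs, the longest-absent robot first."""
    gaps = [(name, (today - last).days) for name, (_hits, last) in stats.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)
```

Run with today = date(2007, 10, 22), it puts the unknown robot first at 18 days and the Googlebot second at 11 days, matching the complaint above.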

==18 October 2007==
'''Steep part of the [[ReFactoring|refactoring]] curve'''

We're talkin' steep part of the [[ReFactoring|refactoring]] curve, gang. This is a reference to a post on my personal blog dated 19 September 2007, entitled <br>
[http://slattery.ouvaton.org/index.php?2007/09/19/7-exponential Exponential], which begins: "We're talkin' steep part of the learning curve, gang."

Yesterday I [[ReFactoring|refactored]] this site's [[Main Page]], to put there a narrative summary of the contents of the site. In the process, I drastically expanded the site's scope. One week ago the only category that was really developed was [[:Category:Wikis]]. A few days back I added [[:Category:Convivial Tools]] and [[:Category:Open Design]], followed by [[:Category:FOSS]] ([[Free and Open Source Software]]). Then yesterday I suddenly added all of the following new categories: 

*[[:Category:Whole Earth]]
*[[:Category:Appropriate Technology]]
*[[:Category:Hacker Generation]]

But when will I ever find the time to fill in all of these categories with relevant information?<br>
<br>

'''WhoCares'''

Who really cares? This thought reminds me of a page from [[Richard Drake]]'s [[WhyClublet]], entitled [http://clublet.com/c/c/why?WhyWouldOneBother WhyWouldOneBother]. I complained again on [[#14_October_2007|October 14]] that this site only has a handful of visitors. And when I gave the address of the site to a couple of colleagues at work last week, they never even looked at it, explaining that they are "too busy."

It's already been more than a month since [[RainDog's Editorial|RainDog's First Rant]] listed reasons to write for no one. Since then this site received a brief flurry of internet attention, after it first appeared on Google. Google apparently only found it because [[CommunityWiki]] generated automatic reverse indexing after I ''referred to'' them in this blog. But most of the new attention came in the form of a [[Spam Attack]]. My rage led me to invent [[GING|GING is not Google]], but that like all the rest appears stillborn.

So nothing has changed. I'm still producing all of this content just for myself. The reasons I gave in [[RainDog's Editorial|RainDog's First Rant]] are still valid, but the hope expressed by one of them that "maybe someday something will change" is starting to fade. Yet I go on. Why? [http://clublet.com/c/c/why?WhyWouldOneBother WhyWouldOneBother]? WhoCares?
<br><br>

==17 October 2007==
'''News from [[Sunir Shah]]'''

[[Sunir Shah]] posted a message on my [http://en.wikipedia.org/wiki/User:Redeyed_Treefrog#Articles_I_have_contributed_to: Wikipedia user page], thanking me for the [http://en.wikipedia.org/wiki/Sunir_Shah article about him]. He very cordially says that he owes me a beer. I wonder if that will ever happen...


'''As well as news from EngineerScotty'''

The Wikipedia user [http://en.wikipedia.org/wiki/User:EngineerScotty EngineerScotty] wrote a paragraph about my [http://en.wikipedia.org/wiki/History_of_wikis History of Wikis] article on that [http://en.wikipedia.org/wiki/Talk:History_of_wikis article's Talk page]. It begins: "Good article." He says that it is too long and probably contains "original research." Be that as it may, this much is clear: at least one person has read it!<br>
<br>

==15 October 2007==
'''[[User:Frog]] steps in for [[User:RainDog]]'''

This evening I Googled [http://www.google.fr/search?hl=fr&q=raindog&meta= RainDog] to see if this website appears anywhere in the list. There are ''thousands'' of RainDogs, including one that has a competing series of [http://raindog.jasonarena.com RainDog's Rants]. That decided me to take my RainDog off of the stage, and to replace him with a simple Frog.

'''Cloud Therapy'''

My sister sent me this link to a [http://www.cloudappreciationsociety.org/gallery cloud appreciation website].<br>
<br>

==14 October 2007==
'''Stop looking at the statistics...'''

I just looked at the statistics concerning this site. Other than myself, there was one visitor yesterday, and two the day before. It's too depressing. I should avoid looking at the statistics and just keep concentrating on developing content. All the same...


'''English Google and French Google'''

It occurred to me recently that this site is probably only listed on French Google, not on the English Google. When we were in the States last summer, I discovered that the home page of the [http://conviviality.ouvaton.org Convivial Tools Encyclopedia] site was unlisted there, whereas it had been listed in France.

When I try to connect to http://google.com I get automatically redirected to http://google.fr. Is there some way I can prevent the redirect?


'''[[ReFactoring]] the [http://conviviality.ouvaton.org Convivial Tools Encyclopedia]'''

Today I [[ReFactoring|refactored]] the [http://conviviality.ouvaton.org Convivial Tools Encyclopedia] website. It needed simplifying. I'm also going to rewrite the Home Page to profit from the lessons I've learned about Google. The new Home Page will be a long summary of what is on the site, with links to every article. That way at least the Home Page should pop up once in a while when people are searching Google, and thus get visited from time to time.

==13 October 2007==
'''RainDog finally understands the basic principle of a blog'''

All of this worry about getting pages into the Google listings has finally led me to understand the basic principle of a blog: A BLOG PAGE ALWAYS HAS THE SAME INTERNET ADDRESS.

Once the blog page gets into the Google listings, people who click on the link will always go to the same page, and will thus see the new content added to the top of the page. Whereas if you create a new page, there is little chance that it will get listed on Google, which means no one will ever see it.

I am therefore creating a main blog page which will always have the same internet address, at: [[User:RainDog/Blog|RainDog's Blog]]. The remaining listings for October will be posted there.

==12 October 2007==
'''Our [[Main Page]] is finally on Google'''

The [[Main Page]] of the toolswiki site is finally on Google. This is probably only because I myself have been connecting to it so much. Or is it because I added at the bottom of the [[Main Page]] all those summaries from RainDog's Blog?

At any rate, mission accomplished. A few other pages had been showing up, but now at least the Main Page is on Google. '''For how long?''' See [[Google Log]].

The other big news is that the page of Wikipedia user [http://en.wikipedia.org/wiki/User:Redeyed_Treefrog Redeyed Treefrog] also shows up on Google!

'''And so is the "History of Wikis" article!'''

This evening I launched again the Google search chain that should find the [http://en.wikipedia.org/wiki/History_of_wikis History of Wikis] article and no other: "WikiWikiWeb snowballed." Bingo. This article is already listed on Google, and I only posted it yesterday!

'''But my "WhyClublet" article got booted off Wikipedia'''

But my [http://en.wikipedia.org/wiki/WhyClublet WhyClublet] article got deleted by a Wikipedia administrator called [http://en.wikipedia.org/wiki/User:Marasmusine Marasmusine], for failing to meet [http://en.wikipedia.org/wiki/WP:WEB notability guidelines for web content]. I had also created a page called [http://en.wikipedia.org/wiki/Richard_Drake Richard Drake], which I redirected to WhyClublet (the redirect was left in place). I pointed out that WhyClublet founder Richard Drake was one of the original WikiReductionists, who were historically the first group anywhere to deliberately delete articles from a wiki (from WikiWikiWeb in this case). The article about the first person to delete articles from a wiki was deleted from a wiki (from Wikipedia in this case).

==11 October 2007==
'''The [http://en.wikipedia.org/wiki/History_of_wikis History of wikis] is hot off the press!'''

Today I posted my big [http://en.wikipedia.org/wiki/History_of_wikis History of wikis] article on Wikipedia. I then spent a lot of time editing the wrinkles out.

This evening I [http://www.google.fr/search?hl=fr&q=%22history+of+wikis%22&btnG=Recherche+Google&meta= Googled "History of wikis"] just to see, knowing from experience that it will take a while before the new article is crawled. The first result, all the way up at the top of the list, was the [http://en.wikipedia.org/wiki/Wiki Wikipedia article on wikis]. The search had found a tag in the "history" section of this article, saying "Main article: History of wikis." Hey, I just added that "Main article" tag today! So the Wikipedia article on "wikis" was crawled today ''after'' I added the tag. Coincidence? Or are certain Wikipedia articles crawled every day?

Then I looked for a search chain that would be only in ''my article'', and no other. I tried the chain "WikiWikiWeb snowballed" and that worked. The result was:

::Your search - "WikiWikiWeb snowballed" - did not match any documents. 

The countdown begins to see how many days before the article appears on Google...

'''Google speaks "Elmer Fudd"'''

In writing the above paragraphs, I wanted to capture the sentence "Your search ... did not match any documents" in English, whereas my Google speaks French. So I went to the [http://www.google.fr/preferences?hl=en Google settings page] to change the language, and discovered that the list of available languages includes "Elmer Fudd." I set it to that, and got things like:

::Tip: Twy wemoving quotes fwom youw seawch to get mowe wesults.

The void search result shown above became:

::Youw seawch - "WikiWikiWeb snowballed" - did not match any documents. Oh, dat scwewy wabbit!

They're a load of laughs, they are, those Google guys.

(I then wanted to be able to refer to this story easily, so I put it on its own page: [[Scwewy Wabbit]].)

==10 October 2007==
'''My MediaWiki is too slow'''

Perhaps one reason there are so few visitors to this site is that it is too slow. It strains my own patience when I click on a link and have to wait forever before the next page comes into view.

What can I do about this? Replace MediaWiki with something faster? My sites using WakkaWiki and SPIP feel significantly faster. Move to a service provider with more powerful computers? Tweak my MediaWiki installation to give it more pep?

I tried Googling this problem, and got good results with these search words: "Mediawiki faster." This gave a [http://www.mwusers.com/forums/showthread.php?t=4338 forum page] with the following advice:

*Enable caching if it isn't already
*Install APC, eAccelerator or a similar product to pre-compile the php to bytecode
*Google "mediawiki performance" and find things like [http://dammit.lt/2007/01/26/mediawiki-performance-tuning this].

This "this" links to an article called [http://dammit.lt/2007/01/26/mediawiki-performance-tuning Five minutes of MediaWiki performance tuning], which advises installing [http://fr3.php.net/manual/en/ref.apc.php APC], as follows:

*<code>pecl install APC</code>
*add <code>extension=apc.so</code> to php.ini
*add <code>$wgMainCacheType = CACHE_ACCEL;</code> into LocalSettings.php

The word "APC" links to the page of the [http://fr3.php.net/manual/en/index.php PHP manual] called [http://fr3.php.net/manual/en/ref.apc.php III Alternative PHP Cache].

But where do I execute these commands? The second one goes in php.ini (where's that?) and the third one goes in the MediaWiki LocalSettings.php file (that one I can find), but what about the first one? And is APC even installed on my Ouvaton server? I tried to find out more on the Ouvaton site, but with no success. To be continued...

==09 October 2007==
'''[[GING]] is not Google'''<br>
A couple of days ago I came up with the idea of "[[GING]] is not Google." Modelled on Richard Stallman's original free software project GNU ("GNU's Not Unix"), [[GING]] is a project to use P2P software to pool thousands of PCs together to provide enough computing power to create an alternative to the [[GoogleMafia]]. But in order for anyone to know about [[GING]], the pages describing it will have to be listed on Google...

'''[[Google Log]]'''<br>
I added a line to the Main Page of this website which proudly indicates:
*On 9 October the Main Page of this website '''is not''' listed on Google.
I also created the page [[Google Log]] which will show the long-term record of when the Main Page is or is not listed on Google. If ever...

'''Wikipedia "Sunir Shah" article finally listed on Google'''<br>
I am glad to discover that my Wikipedia article on Sunir Shah has finally been crawled by the Googlebot. The article shows up in '''eighth place''' in the results of a search for "Sunir Shah." There is hope yet.

==07 October 2007==
'''More on toolswiki'''<br>
I just took a careful look at the statistics about this site. Over the past few weeks there have in fact been at most twenty visitors. And how many of these were the spammers? Also, the frequency of visits already seems to be dropping off. So I'm basically back to talking to myself again. I guess that's just as well - I still have a lot of work to do on this site before it will really be useful to anybody.

'''More on Wikispam'''<br>
But how did the spammers create new pages, without first making a link in an existing page?

'''More ReFactoring'''<br>
I just laboriously [[ReFactoring|refactored]] the [http://toolswiki.ouvaton.org/index.php?title=Main_Page Main Page] of this website, adding a lot of recent blog posts at the bottom of the page, in order to try to preserve the presence of the Main Page on Google. But when I had finished I Googled "toolswiki" again, and sadly observed that the Main Page has already dropped off the list. [[GoogleMafia]]!

==06 October 2007==
'''[[WikiSpam]], the price of success'''<br>
The toolswiki has been spammed. It began a week ago, when this site started to show up just a little bit on Google. I only discovered it today. The spammers were discreet, to avoid killing the goose that lays the golden eggs, as it were. (Seen any golden eggs around here lately?) And fortunately the site had been configured to require creating an account before editing.

I quickly modified the LocalSettings.php file so that only WikiSysop can create accounts. I then logged on as WikiSysop and blocked, one by one, all 8 unwanted users, each of whom seemed to have laid only one golden egg. I consulted "All Pages," and with a bit of work put together the following record:

{| class="wikitable"
! User name !! Date !! Page name
|-
| 8 r8y d9d || 5 Oct || 雅思枪手13764107295雅思代考雅思
|-
| 8 r8y d9 || 5 Oct || @13764107295代考雅思代考雅思枪手QQ：27939721
|-
| Luck668 || 4 Oct || Gold wow-power-leveling
|-
| 8P2q3K68 || 1 Oct || 代考雅思代考托福代考雅思枪手QQ：27939721
|-
| 3P2e3K68 || 30 Sept || =代考雅思代考托福代考QQ：27939721
|-
| 3j2e3m64 || 30 Sept || QQ：27939721代考雅思代考托福代考
|-
| 8P2q3K68 || 30 Sept || 雅思枪手QQ：27939721代考雅思代考托福代考
|}

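They all come from GuessWhere, except Luck668. (The Chinese page titles advertise paid stand-ins who sit the IELTS and TOEFL English exams for you, with a phone number and a QQ instant-messaging number to contact.) There was also a user Noinfo126 who seems to have created no pages. The page from Luck668 contains nuggets like this: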

*He honestly performed his promise to the [http://www.toppowerlevel.net wow power leveling] man of many cures.

As WikiSysop I could have deleted these pages, but I decided to leave them for the moment (even though that way the scumbags will get the payoff - and maybe I will too). Take a look at [[Special:Allpages]].

Now I have to figure out how to allow new user accounts without getting spammed....
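For the record, here is roughly what that lockdown looks like in LocalSettings.php. This is a minimal sketch, assuming MediaWiki's standard <code>$wgGroupPermissions</code> mechanism; it is not necessarily the exact set of lines I changed. Anonymous visitors (the <code>'*'</code> group) can neither create accounts nor edit, while administrators (the <code>sysop</code> group, which includes WikiSysop) keep the right to create accounts.

```php
// Excerpt to add near the end of LocalSettings.php (sketch, not the exact lines used here).

// Anonymous visitors may not edit: this was already in place before the spam arrived.
$wgGroupPermissions['*']['edit'] = false;

// Anonymous visitors may no longer create accounts: this is the new lockdown.
$wgGroupPermissions['*']['createaccount'] = false;

// Administrators keep the right to create accounts for people they trust.
// (MediaWiki already grants this to sysops by default; restated here for clarity.)
$wgGroupPermissions['sysop']['createaccount'] = true;
```

Setting <code>$wgGroupPermissions['*']['createaccount']</code> back to <code>true</code> is exactly what would let genuine new users in again, and the spammers with them, which is why the problem above is still open.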

'''GING is not Google'''<br>
Ruminating about Google and how I suffer from the way it works, I came up with the idea of a solution: [[GING|GING is not Google]].

==05 October 2007==
'''[[GoogleMafia]]'''<br>
After yesterday's high, a big low. I discovered that my [http://conviviality.ouvaton.org Convivial Tools Encyclopedia] website has already completely disappeared from the Google ratings. This saddening discovery prompted [[RainDog's Second Rant]].

==04 October 2007==
Wow! I found another link to this site. I Googled "Sunir Shah" to see whether my [http://en.wikipedia.org/wiki/Sunir_Shah Wikipedia article] about him had started to show up yet. I was prepared to dig deep to find it, and I was down to about the eighth or ninth page when I found a listing of - RainDog's Blog! Yippee! But what about the Wikipedia article?

'''Wikipedia article unknown to Google robot'''

I searched to the bottom of Google for my Wikipedia article on [http://en.wikipedia.org/wiki/Sunir_Shah Sunir Shah], but never found it. I finally put together a foolproof combination: "Sunir Shah" with "Early Life." That only brought up two or three pages of Google listings, but the article wasn't there.

So it hasn't even been crawled by the [http://en.wikipedia.org/wiki/Googlebot Google robot] yet! Holy cow - I put that article on Wikipedia more than two weeks ago. The insignificant information in RainDog's Blog shows up towards the top of the list, while the ''encyclopedia article'' about Mr. Shah remains totally invisible. Apparently Google waits to find an external link before it crawls a new Wikipedia article. (Maybe I should put a link to the article in my blog page.)

When I was desperately searching, I tried the combination "Sunir Shah" plus "Wikipedia." Several pages down I found the [http://www.wikirage.com wikirage website], which had put the "Sunir Shah" article on an automatic list of "what's hot on Wikipedia" (so hot that Google missed it entirely). The article was on the list of those that had received the most edits over a short time period.... So I'm a perfectionist....

'''Wikipedia "Recent changes"'''

I suppose every Wikipedia contributor has tried this at least once: last week when I finished saving an edit, I went straight to "[http://en.wikipedia.org/wiki/Special:Recentchanges Recent changes]" to see it listed there. In the second it took me to open "Recent changes," some thirty other edits had been made, showing up on the list on top of mine.

'''More on Toolswiki'''

I Googled "Toolswiki" again, and realised that there are several varieties. "ToolsWiki" in [http://en.wikipedia.org/wiki/Camel_case CamelCase] belongs to a [http://www-theory.lbl.gov physics site], although the wiki itself seems broken. The page called [http://www-theory.lbl.gov/toolswiki/index.php?title=Monte_Carlo_Generators Monte Carlo Generators] on that site is the grab-bag porn wiki page. This seems to be [[WikiSpam]]. My site is "toolswiki" all in small letters. The link from it to [[Lion Kimbro]]'s [[de.li.cious]] is already broken, but there is still a link to the main page of [http://communitywiki.org CommunityWiki]. Not because "toolswiki" is listed anywhere on the CommunityWiki page (it isn't), but because RainDog's Blog '''points to''' CommunityWiki. JUST HOW DOES THIS GOOGLE BUSINESS WORK ANYWAY?

==03 October 2007==
Things are snowballing - it is hard to keep up.

'''[[ReFactoring]]'''

I [[ReFactoring|refactored]] the [http://ctwiki.ouvaton.org ctwiki site] today, the one that is about "Convivial Projects." Just as the present "User Guide" site now has a special section devoted entirely to wikis, the "Convivial Projects" site now has a special section devoted entirely to Open Design.

I'm busy studying Wiki History, but I should stop for a moment to tell the story about my minor rise in the Google charts - and what goes along with it.

'''"Toolswiki" inches up the Google charts - but not alone!'''

The other night I thought I would do a Google search on "toolswiki," to see if someone else uses the term. To my surprise, on the second or third page of Google results, I found the present site, whereas I thought the [http://en.wikipedia.org/wiki/Googlebot Google robot] had never crawled it. Of course, nobody is ever deliberately going to search for something like "toolswiki," so there were only three or four pages. But only one link actually connected to this site.

One of the listings caught my eye, because I recognised the name of well-known wiki-wizard [[Lion Kimbro]] (notably of the [[WikiNodes]] project). I opened the link, and found what appeared to be Lion Kimbro's de.li.cious record, showing that he had looked at this toolswiki site. Why? Probably because a day or two before I had put a link in this site pointing to his WikiNodes site, and being an expert hacker he uses software which allows him to look at sites that '''point''' to his.

The next morning I tried the experiment again. "Toolswiki" still brought up four pages on Google, but my site was no longer among them! On the other hand, I found sites that associated the term "toolswiki" with various forms of pornography. I opened one of the listed sites, and found a [http://www-theory.lbl.gov/toolswiki/index.php?title=Monte_Carlo_Generators long Mediawiki page] full of links, mostly to clearly pornographic sites. Apparently "toolswiki" was listed somewhere in the middle of all that. I think the explanation is that some kind of robot software followed the same path I did, looking at links on de.li.cious, and then copied everything it found onto that grab-bag wiki page. The payoff seems to be that the grab-bag wiki allows the porn sites to "ride" on the Google references of legitimate sites, including mine, to boost their own Google ratings. HOW TWISTED CAN YOU GET?

==02 October 2007==
'''Slow sites'''<br>
This site was extremely slow last night, taking forever to open pages or save changes. I surmised that the Ouvaton server was unable to run the MediaWiki software fast enough, and decided I would have to either migrate to a more professional host, like Lycos, or move all of the wiki pages to my simpler WikkaWiki site. But the next morning things worked much faster. Still, Ouvaton might well be overloaded at peak times like 10 o'clock at night.

But this site was not the worst. When I tried to open pages on one of the Open Design websites, it took so long that I had to give up and do something else instead!

==01 October 2007==
This website henceforth includes two sub-sections, as follows:
* The Convivial Tools User Guide
* The Wiki Pages: pages about wikis, their history and main concepts

[[Category:Wikis Blog]]