From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner+willy=40w.ods.org@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S263399AbTJaQgk (ORCPT <rfc822;willy@w.ods.org>);
	Fri, 31 Oct 2003 11:36:40 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263400AbTJaQgk
	(ORCPT <rfc822;linux-kernel-outgoing>);
	Fri, 31 Oct 2003 11:36:40 -0500
Received: from kinesis.swishmail.com ([209.10.110.86]:34573 "HELO
	kinesis.swishmail.com") by vger.kernel.org with SMTP
	id S263399AbTJaQgi (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 31 Oct 2003 11:36:38 -0500
Message-ID: <3FA290FB.1050307@techsource.com>
Date: Fri, 31 Oct 2003 11:42:35 -0500
From: Timothy Miller <miller@techsource.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Scott Robert Ladd <coyote@coyotegulch.com>
CC: trelane@digitasaru.net, Alex Belits <abelits@phobos.illtel.denver.co.us>,
       Dax Kelson <dax@gurulabs.com>, Hans Reiser <reiser@namesys.com>,
       andersen@codepoet.org, linux-kernel@vger.kernel.org
Subject: Re: Things that Longhorn seems to be doing right
References: <3F9F7F66.9060008@namesys.com> <20031029224230.GA32463@codepoet.org> <3FA0475E.2070907@namesys.com> <1067466349.3077.274.camel@mentor.gurulabs.com> <20031030002005.GC3094@digitasaru.net> <Pine.LNX.4.58.0310291848590.11170@sm1420.belits.com> <20031030031223.GA15309@digitasaru.net> <3FA091BD.2020701@coyotegulch.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org


Scott Robert Ladd wrote:

> 
> Another problem with metadata is that it is largely generated by the 
> user, who is notoriously lazy. A truly powerful system would use 
> contextual analysis and other algorithms to automatically generate 
> metadata, freeing the user from an onerous task (which is what computers 
> should do). Certainly, some search engines are bordering on this 
> capability.
> 

There is a French company called Pertimm that develops a search engine 
that does this with documents.  It even does cross-language queries 
based on sophisticated linguistic analysis.  I often wish Google had 
some of those features, even if only a primitive synonym table.

The relevance here, though, is that the Pertimm index is much larger 
than the actual text being indexed.  That's not a problem for a search 
engine, really; the same is true for Google, and you need that for 
efficient searches.  But there is no place for such a thing in a file 
system.  I don't think any Linux developers would want the metadata to 
even APPROACH the size of the file data, let alone get LARGER.
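
To make the size point concrete, here is a minimal Python sketch of a 
positional word index.  It is only an illustration: the function name, 
the sample sentence, and the 4-bytes-per-offset size estimate are all 
assumptions of mine, not how Pertimm or Google actually store things.

```python
import re
from collections import defaultdict

def build_positional_index(text):
    """Map each lowercased word to the character offsets where it occurs."""
    index = defaultdict(list)
    for match in re.finditer(r"[a-z0-9']+", text.lower()):
        index[match.group()].append(match.start())
    return dict(index)

sample = "the index is larger than the text the index describes"
index = build_positional_index(sample)

# Rough size estimate (an assumption): each distinct word is stored once,
# plus a 4-byte offset for every occurrence of it.
index_bytes = sum(len(word) + 4 * len(offsets) for word, offsets in index.items())
text_bytes = len(sample.encode("ascii"))

print("index bytes:", index_bytes)
print("text bytes: ", text_bytes)
```

Even this bare-bones index comes out larger than the sentence it 
describes, and a real engine adds far more (stemming, synonyms, 
ranking data) on top of the raw positions.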

Indexing of this sort has its place, but applying it to a whole file 
system is much too broad a use.  For instance, you wouldn't want to 
index the contents of your binary programs, or even shell scripts for 
that matter.  So text, data, and code need to have different kinds of 
indexing.
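
One way to picture "different kinds of indexing" is a policy that looks 
at what a file is before deciding whether to index its contents at all. 
The Python sketch below is hypothetical: the classify() heuristics (ELF 
magic, a "#!" line, a NUL byte) and the choice to full-text index only 
plain text are my own assumptions for illustration, not a proposal for 
any particular file system.

```python
def classify(head: bytes) -> str:
    """Crude, hypothetical classifier: returns 'code', 'data', or 'text'."""
    # ELF executables and "#!" scripts are treated as code.
    if head.startswith(b"\x7fELF") or head.startswith(b"#!"):
        return "code"
    # A NUL byte is a common heuristic for non-text data.
    if b"\0" in head:
        return "data"
    return "text"

def index_terms(content: bytes) -> list:
    """Return the words to index for this content; empty for code and data."""
    if classify(content[:512]) != "text":
        return []  # contents of programs, scripts, and raw data are skipped
    words = content.decode("ascii", "replace").lower().split()
    return sorted(set(words))

print(index_terms(b"Hello hello world\n"))
print(index_terms(b"#!/bin/sh\necho hi\n"))
```

Code and data could of course get their own, cheaper kinds of index 
(say, names or types only) rather than none; the point is just that the 
decision is made per kind of file.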