From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263399AbTJaQgk (ORCPT ); Fri, 31 Oct 2003 11:36:40 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263400AbTJaQgk (ORCPT ); Fri, 31 Oct 2003 11:36:40 -0500
Received: from kinesis.swishmail.com ([209.10.110.86]:34573 "HELO kinesis.swishmail.com") by vger.kernel.org with SMTP id S263399AbTJaQgi (ORCPT ); Fri, 31 Oct 2003 11:36:38 -0500
Message-ID: <3FA290FB.1050307@techsource.com>
Date: Fri, 31 Oct 2003 11:42:35 -0500
From: Timothy Miller
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Scott Robert Ladd
CC: trelane@digitasaru.net, Alex Belits, Dax Kelson, Hans Reiser, andersen@codepoet.org, linux-kernel@vger.kernel.org
Subject: Re: Things that Longhorn seems to be doing right
References: <3F9F7F66.9060008@namesys.com> <20031029224230.GA32463@codepoet.org> <3FA0475E.2070907@namesys.com> <1067466349.3077.274.camel@mentor.gurulabs.com> <20031030002005.GC3094@digitasaru.net> <20031030031223.GA15309@digitasaru.net> <3FA091BD.2020701@coyotegulch.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Scott Robert Ladd wrote:
>
> Another problem with metadata is that it is largely generated by the
> user, who is notoriously lazy. A truly powerful system would use
> contextual analysis and other algorithms to automatically generate
> metadata, freeing the user from an onerous task (which is what computers
> should do). Certainly, some search engines are bordering on this
> capability.
>

There is a French company called Pertimm which develops a search engine
that does this with documents. It even does cross-language queries based
on sophisticated linguistic analysis.
Often, I wish Google had some of those features, even if only a primitive synonym table.

The relevance here, though, is that the Pertimm index is much larger than the actual text that is being indexed. That's not a problem, really, because the same is true for Google. You need that for efficient searches. But there is no place for such a thing in a file system. I don't think any Linux developers would want the metadata to even APPROACH the size of the file data, let alone get LARGER.

Indexing of this sort has its place, but applying it to a whole file system is much too broad a use. For instance, you wouldn't want to index the contents of your binary programs, or even shell scripts for that matter. So text, data, and code need to have different kinds of indexing.
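
To make the size argument concrete, here is a rough sketch of a toy positional index with a primitive synonym table. It is not how Pertimm or Google actually work; the synonym table, file names, and function names are all made up for illustration. Even on two short sentences, the serialized index should come out several times larger than the text it covers.

```python
import json
import re
from collections import defaultdict

# Hypothetical synonym table (an assumption for this sketch); a real engine
# would rely on a far larger linguistic resource.
SYNONYMS = {"big": "large", "huge": "large", "quick": "fast"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each normalized term to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            term = SYNONYMS.get(word, word)  # fold synonyms into one term
            index[term][doc_id].append(pos)
    return index


def search(index, word):
    """Return the sorted ids of documents containing the word or a synonym."""
    term = SYNONYMS.get(word.lower(), word.lower())
    return sorted(index.get(term, {}))


# Made-up sample documents standing in for files on disk.
docs = {
    "a.txt": "The index is big, much larger than the text.",
    "b.txt": "A huge index makes a quick search possible.",
}

index = build_index(docs)
text_bytes = sum(len(t.encode("utf-8")) for t in docs.values())
index_bytes = len(json.dumps(index).encode("utf-8"))

print(search(index, "large"))  # matches "big" and "huge" through the table
print(text_bytes, index_bytes)  # raw text size vs. serialized index size
```

JSON is a wasteful encoding and a real index would be compressed, but the per-term, per-document, per-position bookkeeping is the part that grows, which is why metadata of this kind would be hard to justify inside a file system.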