From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S262324AbTJ3KJs (ORCPT ); Thu, 30 Oct 2003 05:09:48 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S262328AbTJ3KJr (ORCPT ); Thu, 30 Oct 2003 05:09:47 -0500
Received: from smtprelay01.ispgateway.de ([62.67.200.156]:21154 "EHLO smtprelay01.ispgateway.de")
	by vger.kernel.org with ESMTP id S262324AbTJ3KJh (ORCPT );
	Thu, 30 Oct 2003 05:09:37 -0500
From: Ingo Oeser
To: trelane@digitasaru.net
Subject: Re: Things that Longhorn seems to be doing right
Date: Thu, 30 Oct 2003 10:52:15 +0100
User-Agent: KMail/1.5.4
References: <3F9F7F66.9060008@namesys.com> <20031030031223.GA15309@digitasaru.net>
In-Reply-To: <20031030031223.GA15309@digitasaru.net>
Cc: Alex Belits , Dax Kelson , Hans Reiser , andersen@codepoet.org, linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200310301052.15315.ioe-lkml@rameria.de>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thursday 30 October 2003 04:12, Joseph Pingenot wrote:
> being in userspace. The cost is the context switching performance hit and
> the fact that each process that wants to index its stuff must tell the
> filesystem and the indexing service its data (effectively, two writes and a
> completely separate API). [I'm likely preaching to the choir here, but
> it's good to outline it]

To hand over the data, you just have to create a mapping of the file and
pass that somehow, so the indexer gets it fresh from the pagecache. Since
the indexer will only read it, this mapping can even be shared, which means
it will not affect the disk performance of other applications using this
file.

Also note that no global index is needed, only per-user ones, which must be
mergeable with a global one. Rationale: user A should not know the file
contents of user B.
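
To make the mapping idea above concrete, here is a minimal sketch of how an
indexer could obtain a read-only, shared view of a file. It is only an
illustration under assumptions: the function name map_for_indexer is
hypothetical, it uses Python's mmap module on a Unix system, and it says
nothing about how the mapping would actually be passed between processes.

```python
import mmap
import os


def map_for_indexer(path):
    """Return a read-only, shared mapping of the file at `path`.

    Hypothetical helper for illustration. Raises ValueError for an
    empty file, because an empty file cannot be mapped.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # MAP_SHARED + PROT_READ: the indexer sees the file's cached pages
        # and cannot modify them. Length 0 means "map the whole file".
        return mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    finally:
        # The mapping stays valid after the descriptor is closed.
        os.close(fd)
```

The indexer can then scan the returned object like a bytes buffer, without
issuing its own read() calls on the file.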
And currently indexing is sloooow. I tried glimpse and htdig, and they ran
for several hours just to index the KDE and Qt documentation.

This led me to the conclusion that a small keyword generator (strings?) run
after fsync, which stores autogenerated keywords in the nearest index (per
file, per directory, per user, global), might be better.

Another interesting idea might be to use existing indexes by letting
applications define a search handler. This might prove useful for databases
and would allow for unified search. But that might prove to be quite hard.

Regards

Ingo Oeser
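
The keyword generator idea above (something strings-like, feeding the
nearest index) could look roughly like the sketch below. Everything in it is
an assumption made for illustration: the names extract_keywords,
update_index and merge_indexes are hypothetical, an index is just a dict
mapping a keyword to a set of paths, and the rule "runs of at least 4
printable ASCII characters" only approximates what strings(1) does.

```python
import re

# Roughly like strings(1): runs of at least 4 printable ASCII characters.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Within a run, keep alphabetic words of at least 4 letters as keywords.
_WORD = re.compile(rb"[A-Za-z]{4,}")


def extract_keywords(data):
    """Return the set of lowercased keywords found in the bytes `data`."""
    words = set()
    for run in _PRINTABLE_RUN.findall(data):
        for word in _WORD.findall(run):
            words.add(word.decode("ascii").lower())
    return words


def update_index(index, path, data):
    """Record the keywords of `data` for `path` in one index (a dict)."""
    for word in extract_keywords(data):
        index.setdefault(word, set()).add(path)


def merge_indexes(*indexes):
    """Merge several indexes (e.g. per-user ones) into a new combined one."""
    merged = {}
    for index in indexes:
        for word, paths in index.items():
            merged.setdefault(word, set()).update(paths)
    return merged
```

Keeping each index as a separate dict means a per-user index can be
searched on its own, and merging is an explicit step rather than something
every lookup does.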