linux-kernel.vger.kernel.org archive mirror
From: "Theodore Ts'o" <tytso@mit.edu>
To: Hans Reiser <reiser@namesys.com>
Cc: Erik Andersen <andersen@codepoet.org>, linux-kernel@vger.kernel.org
Subject: Re: Things that Longhorn seems to be doing right
Date: Fri, 31 Oct 2003 14:30:16 -0500
Message-ID: <20031031193016.GA1546@thunk.org>
In-Reply-To: <3FA211D3.2020008@namesys.com>

On Fri, Oct 31, 2003 at 10:40:03AM +0300, Hans Reiser wrote:
> Special cases of general theorems are not more powerful than the general 
> theorems, they are simply special cases.   You can design a language 
> that has the power of both relational algebra and boolean algebra.

Just because you can reduce everything to a Turing machine doesn't
mean that the best way to implement a filesystem is with an infinitely
long tape which can only contain zeros and ones.  There are plenty
of optimizations that let you search structured data quickly and with
minimal overhead, and those optimizations are far, far more difficult
to apply if you are doing unstructured searches.  (In fact, in
some cases, if you don't have structured data to distinguish between
the author and the subject, you have to do the equivalent of natural
language processing if you try to use an unstructured search
to find all papers written *about* a famous author, while not getting
false hits that were *written* by that same famous author.  Doing this
requires structured, not unstructured data.)

> >No, but it means that doing searches on formatted text is very
> >difficult,
> >
> When you say formatted text, do you mean fonts and stuff, or do you mean 
> object storage models.  Object storage models should generally be 
> replaced with files and directories. 

I mean fonts and stuff.  Stripping out fonts, tables, etc. for doing
generalized, unstructured text search clearly needs to be done in
userspace.  Actually, I think we both agree on this point.  The point
of disagreement is whether the searches utilizing such indexes should
be done in the kernel as an intrinsic part of the filesystem,
or in userspace.  I believe that we need to draw a very firm line
between what you call "primary keys", which uniquely identify a file,
and generalized searches.  You believe the two should be unified.

> Are you saying that auto-indexers should not parse the formatted text, 
> index the document, and allow users to find the document, with the 
> auto-indexer running in user space, but the indexes being traversed by 
> the filesystem namespace resolver?  The kernel does not need to 
> understand how to parse a document, it just needs to support queries 
> that use the indexes created by an auto-indexer that does understand it.

I believe that there is a big difference between "I want the file
named /home/tytso/src/e2fsprogs/e2fsck/e2fsck.c" and "I remember
vaguely that 5 years ago, I read a paper about the effects of high-fat
diets on akidas, where the first name of the author was Tom".  The
first is a filename lookup.  The second is a search.  I would like
better search tools for files in a filesystem, no doubt.  But I would
never, ever put a search that might return an ambiguous number of
responses (a number that might change over time as more files are added
to the filesystem) in a Makefile as a source file.

You are conflating these two concepts, pointing out that filename path
resolution happens a lot, and therefore concluding that generalized
searches should also be done in the kernel.  What I am saying is that
generalized searches, where the user needs to look at the returned set
of files and then apply human intelligence to see which of them was
the one they were looking for, are a FUNDAMENTALLY DIFFERENT
OPERATION from a filename lookup via a primary key.  The latter should
be done in the kernel, as is the case today.  The former should by no
means be in the kernel, and should be done in userspace, preferably
with a graphical interface so the user can look at the returned
files, look at the context in which the search parameters appear, and
select the one that is actually the document they were looking for.
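To make the lookup-versus-search distinction concrete, here is a minimal Python sketch.  It is not from the original thread: the Doc record, the hypothetical lookup() and search() helpers, and the sample papers are all illustrative assumptions.  lookup() treats the path as a primary key and yields exactly one document or fails; search() matches on structured fields and returns a set whose size can change as documents are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    path: str     # unique name: the "primary key"
    author: str   # structured field
    subject: str  # structured field


def lookup(index: dict[str, Doc], path: str) -> Doc:
    """Name resolution: exactly one document, or KeyError if the name is unbound."""
    return index[path]


def search(docs: list[Doc], **criteria: str) -> list[Doc]:
    """Search: every doc whose named fields contain all the given substrings (case-insensitive)."""
    return [d for d in docs
            if all(v.lower() in getattr(d, f).lower() for f, v in criteria.items())]


# Hypothetical sample data.
docs = [
    Doc("/papers/a.pdf", "Tom Smith", "high-fat diets in akidas"),
    Doc("/papers/b.pdf", "Jane Roe", "a biography of Tom Smith"),
]
index = {d.path: d for d in docs}

print(lookup(index, "/papers/a.pdf").author)  # Tom Smith
print(len(search(docs, author="tom")))        # 1: papers written *by* a Tom
print(len(search(docs, subject="tom")))       # 1: the paper written *about* Tom

# Adding a file changes the answer to the same search, but not the lookup.
docs.append(Doc("/papers/c.pdf", "Tom Jones", "akida diets revisited"))
index = {d.path: d for d in docs}
print(len(search(docs, author="tom")))        # 2
print(lookup(index, "/papers/a.pdf").author)  # still Tom Smith
```

Because author and subject are separate fields, the sketch tells "written by Tom" apart from "written about Tom" without any language processing; the same query string embedded in a Makefile would silently start matching two files after the third paper is added.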

Sure, Google has the concept of the "I'm feeling lucky" button.  But
there is a fundamental difference between a URL and saying, "Type
'Akida fat diet' into Google and hit 'I'm feeling lucky'."  The latter
is something that you would never put into a hypertext document as a
link, because it changes over time, and what works today might not
work tomorrow.  That is the difference between a name (a URL) and a
search string (what you type into Google).

> >In contrast, consider searching for someone who is male, between 30
> >and 40, is named Tom, and lived in Libertyville, Illinois sometime
> >between 1960 and 1970, and is married to someone named Mary who was
> >born in California.  This might return several people, and most people
> >would **NOT** consider the space of all queries about people to be a
> >"name space". 
> >
> Oh god, did you read the literature?

Is this the same literature that said that microkernels
were the way, the truth and the light?  Is this the same literature as
the stuff written by Professor Tanenbaum, who said he would have
given Linus a failing grade if he had submitted Linux as a project?  There
are plenty of things in the literature that I consider to be pure
stuff and nonsense, and people who claim that searches and "name
spaces" are identical fall into that category as far as I'm
concerned....

> >Searches are not names.  They do not uniquely identify
> >people or objects, which is a fundamental requirement of a name.
> > 
> >
> You mean like Theodore?  Are you saying that Theodore is not a name 
> because it does not uniquely identify you?

In the computer science usage, yes, "Theodore" is not a name.  It is a
nickname; it is a convenient handle by which I can be identified; but
it does not uniquely identify me.  (I am reminded of a story from when
I was at MIT, and someone called up a fraternity, Tau Epsilon Theta,
and asked for "Mike", and was told, "Which one?"  "Well, the one who
lives at TEP."  "There's more than one."  "Well, the one with blond
hair."  "Sorry, there are three Mikes with blond hair at TEP."  The
result was a run of frat shirts that were labelled "Blond Mike from
TEP".  The moral of the story?  "Mike" is not a useful name when
trying to contact a specific person at this specific fraternity at MIT
back in the late 80's.)

> >The bottom line is that for something that happens dozens or even
> >hundreds of times a day, that's an argument that it *shouldn't* be
> >done in the kernel.  Compare and contrast that with handling incoming
> >network packets, which can happen millions of times per hour.
> > 
> >
> Actually the relevant measure is, not how often do you use it, but how 
> often would it context switch if it was not in the kernel.  Users rarely 
> use the networking code directly.

If random generic searches that return an ambiguous number of matches,
some of which may be the one the user wants and some of which may not,
happen only a few dozen times a day (which is about how often I use
Google), then an extra context switch, which is really fast in Linux,
is completely lost in the noise.

> Naming is used by programs a lot.  Enhance naming, and the programs will
> use enhanced naming a lot.

Searching and Naming are not the same thing.  Period.

						- Ted
