linux-kernel.vger.kernel.org archive mirror
From: "Theodore Ts'o" <tytso@mit.edu>
To: Hans Reiser <reiser@namesys.com>
Cc: Erik Andersen <andersen@codepoet.org>, linux-kernel@vger.kernel.org
Subject: Re: Things that Longhorn seems to be doing right
Date: Thu, 30 Oct 2003 12:48:10 -0500
Message-ID: <20031030174809.GA10209@thunk.org>
In-Reply-To: <3FA0C631.6030905@namesys.com>

On Thu, Oct 30, 2003 at 11:05:05AM +0300, Hans Reiser wrote:
> What a performance nightmare.  Updating a user space database every time 
> a file changes --- let's move to a micro-kernel architecture for all of 
> the kernel the same day.....;-)

Nope, the user space database only needs to change when the file
metadata changes.

> Not to mention that SQL is utterly unsuited for semi-structured data 
> queries (what people store in filesystems is semi-structured data), and 
> would only be effective for those fields that you require every file to 
> have.

Your assumption here is that the only thing that people search and
index on is semi-structured data.  While this might be interesting for
text-based data, in fact, the problem space which WinFS has been
addressing isn't necessarily text files.  For example, one of the
examples given in the WinFS paper is the scenario where the user has a
large number of digital photographs, where some of the metadata might
be extracted from the EXIF headers, and some might be inserted by the
user him/herself (for example, the list of names of the people in the
picture, or the subject matter of the photograph: flowers, mountains,
etc --- the latter being very important for professional or
semi-professional stock photographers).  Such information is in fact
very structured, and is much more likely to stay constant even when
the file is modified.

In addition, even for text-based files, in the future, files will very
likely not be straight ASCII, but some kind of rich text based format
with formatting, unicode, etc.  And even general, unstructured
text-based indexing is hard enough that putting that into the kernel
is just as bad as putting an SQL optimizer into the kernel.  I would
claim that it has to be done in userspace, as part of the
overhead when OpenOffice saves the file.  (Note that some of the
Linux-based office suites store files as gzip'ed XML files, which
again argues that putting it in the kernel is insane --- why should we
compress the file, only to have the kernel uncompress it and then
re-parse the XML just so it can index it?  Much better to have
OpenOffice do the indexing while it has the uncompressed, parsed out
text tree in memory.  And if the indexes need to be updated in
userspace, then life is much, much, much simpler if the lookups are
also done in userspace --- especially when complex SQL query
optimizations may be required.)
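
As a minimal sketch of the "index at save time" idea, assuming a
hypothetical save hook named save_document() and a toy in-memory word
index (neither is OpenOffice's actual code or file format), the
application can walk its already-parsed text tree before it compresses
anything:

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def save_document(root, name, index, out):
    """Hypothetical save hook: index first, then serialize and compress."""
    # The parsed tree is already in memory, so indexing needs no
    # decompression and no second XML parse.
    for text in root.itertext():
        for word in text.lower().split():
            index[word].add(name)
    # Only now is the document turned into gzip'ed XML.
    with gzip.GzipFile(fileobj=out, mode="wb") as gz:
        gz.write(ET.tostring(root, encoding="utf-8"))


# Toy userspace index: word -> set of document names (an assumption).
index = defaultdict(set)
doc = ET.fromstring("<doc><p>Flowers in the mountains</p></doc>")
buf = io.BytesIO()
save_document(doc, "trip.xml.gz", index, buf)
print(sorted(index))
```

An indexer running anywhere else, kernel included, would have to undo
the gzip step and parse the XML again to recover the same words.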

> How about you send him a patch that removes all of that networking stuff 
> from the kernel and puts it into user space where it belongs.;-)  There 
> was this Windows user on Slashdot some time ago who claimed that it 
> wasn't just the browser that should be unbundled from the kernel, the 
> whole networking stack was unfairly bundled and locked out the companies 
> that used to provide DOS with networking stacks (the user didn't have in 
> mind patching the windows kernel and recompiling, he really thought it 
> should all be in user space).  Your kind of fellow.....

Networking has definite performance requirements on a per-packet basis
which requires that it be in the kernel.  Given that indexing happens
rarely (i.e., only when a file is saved), the same arguments simply
don't apply.  If you consider how often a user is going to ask the
question, "Give me a list of all photographs taken between June 10,
1993 and July 24, 1996 which contain Mary Schmidt as a subject and
whose resolution is at least 150 dpi", it definitely demonstrates why
this doesn't need to be in the kernel.
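
To make that query concrete, here is a rough sketch of how a userspace
metadata database could answer it.  The schema (tables "photos" and
"subjects"), the file names, and the sample rows are all invented for
illustration; nothing here is taken from WinFS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per photo, one row per (photo, subject).
conn.executescript("""
CREATE TABLE photos (id INTEGER PRIMARY KEY, path TEXT, taken TEXT, dpi INTEGER);
CREATE TABLE subjects (photo_id INTEGER REFERENCES photos(id), name TEXT);
""")
conn.executemany("INSERT INTO photos VALUES (?, ?, ?, ?)", [
    (1, "beach.jpg", "1994-07-04", 300),
    (2, "party.jpg", "1995-01-15", 72),    # resolution too low
    (3, "hike.jpg", "1997-05-01", 300),    # taken too late
    (4, "garden.jpg", "1993-06-10", 150),  # different subject
])
conn.executemany("INSERT INTO subjects VALUES (?, ?)", [
    (1, "Mary Schmidt"), (2, "Mary Schmidt"),
    (3, "Mary Schmidt"), (4, "Flowers"),
])

# ISO 8601 date strings sort chronologically, so BETWEEN works on text.
rows = conn.execute("""
SELECT p.path FROM photos p JOIN subjects s ON s.photo_id = p.id
WHERE s.name = ? AND p.taken BETWEEN ? AND ? AND p.dpi >= ?
ORDER BY p.path
""", ("Mary Schmidt", "1993-06-10", "1996-07-24", 150)).fetchall()
matches = [path for (path,) in rows]
print(matches)
```

A query like this runs once per user request, not once per packet or
per disk block, so nothing about it calls for kernel-level latency.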

If you consider the amount of data that needs to be shovelled back and
forth between the kernel's network device driver and a userspace
networking stack, and then back down into the kernel to the socket
layer, when processing a TCP connection over a 10 gigabit Ethernet
link, it's clear why it has to be in the kernel.  When you consider
how much data needs to be referenced when doing indexing, and the fact
that it may exist in uncompressed form only in the userspace
application, you'll see why it is indeed better to do it in userspace.
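
A back-of-the-envelope comparison shows the gap in event rates.  Every
number below is an assumption chosen for illustration (a 10 Gb/s link,
full-size 1500-byte frames with framing overhead ignored, and one file
save per minute), not a measurement:

```python
LINK_BITS_PER_SEC = 10 * 10**9  # assumed link speed: 10 Gb/s
FRAME_BYTES = 1500              # assumed frame size, overhead ignored
SAVES_PER_HOUR = 60             # assumed: one file save per minute

# Packets the network stack must handle each second at line rate.
packets_per_sec = LINK_BITS_PER_SEC // (FRAME_BYTES * 8)
# Index updates per second if indexing happens only on save.
index_updates_per_sec = SAVES_PER_HOUR / 3600
ratio = packets_per_sec / index_updates_per_sec

print(f"{packets_per_sec} packets/s vs {index_updates_per_sec:.4f} updates/s")
```

Under these assumptions the per-packet path runs tens of millions of
times more often than the indexing path, which is why a user/kernel
crossing is affordable for one and not for the other.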

The bottom line is that if a case can be made that some portion of the
functionality required by WinFS needs to be in the kernel, and in the
filesystem layer specifically, I'm all in favor of it.  But it has to
be justified.  To date, I haven't seen a justification for why the
database processing aspect of things needs to be in the kernel.

						- Ted

