From: Jamie Lokier <jamie@shareable.org>
To: Robert White <rwhite@casabyte.com>
Cc: "'Linus Torvalds'" <torvalds@osdl.org>,
	"'Albert Cahalan'" <albert@users.sourceforge.net>,
	"'Ulrich Drepper'" <drepper@redhat.com>,
	"'Mikael Pettersson'" <mikpe@csd.uu.se>,
	"'Kernel Mailing List'" <linux-kernel@vger.kernel.org>
Subject: Re: Here is a case that proves my previous position wrong regurading CLONE_THREAD and CLONE_FILES
Date: Sun, 12 Oct 2003 12:41:19 +0100
Message-ID: <20031012114119.GB13427@mail.shareable.org>
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA2ZSI4XW+fk25FhAf9BqjtMKAAAAQAAAAl4UIR+3nFUmBp1aNINMhFgEAAAAA@casabyte.com>

Robert White wrote:
> The class of applications that contain "safe interpreters"

That's a fine long exposition you have there, but

> If the file descriptor tables are unified (all threads share one table) then
> the "X" would have to be a non-trivial function ThisThreadsSayFD() which
> would bear the burden of traversing some sort of lookup table, and probably
> checking access lists.  At a minimum there would need to be some kind of
> thread-specific variable support (a la POSIX).

Thread-specific variables take somewhere between zero and a few clock
cycles, when implemented properly (as they are now on Linux).

Also, it is nothing compared with the thread-specific interpreter
context that you already have running...
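
A minimal sketch of what "thread-specific variable" means here, assuming
GCC or Clang on Linux.  The names current_fd, worker and tls_demo are
made up for illustration.  __thread is a compiler extension (C11 spells
it _Thread_local); on x86 Linux an access to such a variable typically
compiles to a single segment-relative load or store, which is where the
"few clock cycles" comes from.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread interpreter state: every thread has its own copy. */
static __thread int current_fd = -1;

struct worker_arg {
    int fd;    /* value this thread should store */
    int seen;  /* value this thread read back */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    assert(arg != NULL);
    current_fd = arg->fd;   /* writes this thread's copy only */
    arg->seen = current_fd; /* reads this thread's copy only */
    return NULL;
}

/* Returns 0 if each thread saw its own value and the caller's copy is intact. */
int tls_demo(void)
{
    pthread_t t[2];
    struct worker_arg args[2] = { { 10, 0 }, { 20, 0 } };
    int started = 0;

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started != 2)
        return -1;

    /* The calling thread never assigned current_fd, so it is still -1. */
    if (current_fd != -1)
        return -1;
    return (args[0].seen == 10 && args[1].seen == 20) ? 0 : -1;
}
```

Calling tls_demo() from main() and checking for 0 is enough to try it;
older toolchains may need -pthread when linking.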

> By spawning your threads without the CLONE_FILES flag, you can partition the
> normal users away from these system level accesses via the simple expedient
> of closing the file handles in the new thread.  This could largely prevent
> script based fishing expeditions (e.g. calling scripting primitives with
> likely guesses about other entity tags representing file descriptors) and is
> particularly applicable to the more complex scripting or virtual machine
> environments.
> 
> If all your threads share the same file descriptor table, then you must be
> able to "prove" your GetTheRightDiscriptor() function for each possible
> fetched descriptor.  The function has to be able to return the right thing
> without ever returning the wrong thing.  That is expensive and complex, and
> complexity leads to error.

It's at least as difficult to prove that the script can't access other
threads' memory, which is a bigger weak point.  If you need proper
security isolation, you're going to need to _not_ use CLONE_VM.

That's not to say that separate file tables with CLONE_VM aren't
useful; they are.  But in my opinion efficiency of looking up an fd
number and interpreter security isolation aren't serious reasons.
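
To make the combination concrete, here is a rough sketch using the
glibc clone() wrapper with CLONE_VM but without CLONE_FILES.  The names
child_fn, clone_demo, shared_flag and STACK_SIZE are invented for the
example, and it assumes a downward-growing stack (true on x86 and ARM).
The child closes a descriptor and writes to a global: the close stays
private to the child, the memory write does not.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)  /* arbitrary size for the child's stack */

static volatile int shared_flag; /* visible to both tasks because of CLONE_VM */

static int child_fn(void *arg)
{
    assert(arg != NULL);
    int fd = *(int *)arg;
    close(fd);        /* only the child's copy of the fd table changes */
    shared_flag = 1;  /* memory is shared, so the parent sees this write */
    return 0;
}

/* Returns 0 if the parent's fd survived and the child's memory write was seen. */
int clone_demo(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(fd);
        return -1;
    }

    shared_flag = 0;
    /* CLONE_VM without CLONE_FILES: shared address space, copied fd table.
     * The stack argument is the top of the buffer (stack grows down). */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, &fd);
    if (pid < 0 || waitpid(pid, NULL, 0) < 0) {
        close(fd);
        free(stack);
        return -1;
    }

    int fd_still_open = (fcntl(fd, F_GETFD) != -1);
    int saw_write = (shared_flag == 1);

    close(fd);
    free(stack);
    return (fd_still_open && saw_write) ? 0 : -1;
}
```

The same shared memory that lets the parent see shared_flag is what
lets one thread scribble over another's data, which is the weak point
described above.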

-- Jamie

> 
> It is easier to "prove" that your ListenFoNewClients() thread starts before
> the database and administrative channels are even open (etc) and that your
> CreateNewClientThread() routine closes the few common resources the Listen
> thread needed before it gives control to the actual script/client.
> 
> Closing files out in the new thread increases safety and actually improves
> performance.
> 
> (Think about how much nicer and safer email would be on Windows if Outlook
> did this, didn't share descriptors, and its scripting environment didn't
> include an open() call, or at least its open() *ALWAYS* asked the operator
> if the open was ok...)
> 
> ====
> 
> Linux Kernel Threads, versus POSIX Threads, Java tasks, et al.
> 
> Some of you reading this are probably asking yourself WTF I am talking
> about, and you just want to know if you can do some particular thing in your
> threaded program.  The answer is that if you are using pthread_create() in
> your program, the above discussion probably doesn't directly apply to you at
> any level that you need to care about.
> 
> Your answer lies in these three statements:
> 1) The Linux Kernel does not provide POSIX style thread support.
> 2) The Linux Kernel does provide everything necessary for the libpthread
> library to provide POSIX style thread support.
> 3) The Linux Kernel (also) provides features for decidedly non POSIX style
> threads.
> 
> If you substitute "Java" or "Ada" and the appropriate libraries or runtimes
> in the above you get the same basic truths, and it would be a mistake to
> wish otherwise.
> 
> The POSIX threading interface is, when you think about it, a detailed
> description of a set of features and facilities that work together a certain
> way.  It forms a set of promises about what you can expect the system to do,
> look like, and do for you, within a single program.  Its scope is naturally
> not extendable to an entire OS or platform.  That may not seem obvious to
> you, but consider these assertions made by the POSIX standard.
> 
> 1) There is a "main thread".
> 2) When the main thread exits all the threads are canceled.
> 3) You can create a "detached" thread that can not be pthread_joined().
> 4) [Detached threads are (surprisingly to some) subject to rule 2]
> 
> If you were to try to apply the four rules above to an entire operating
> system, there could only be one main thread in the whole system.  (Some
> might argue that init fills this role in GNU/Linux.)  That would preclude
> the individual pthread programs from having their own main thread and
> reaping the benefits of both detached threads and application termination
> semantics.
> 
> Further, and still worse, consider that when you call pthread_create() it
> does far more than just start a process or program.  It must create and set
> up the data structures on which cancellation, thread specific data, cleanup
> push/pop, and so on are based.  pthread_exit() must likewise undo all that.
> If the kernel were asked to do this work, then these structures would be
> both slow and semi-public.  Neither property would be good for your program
> nor the system as a whole.
> 
> All of the above would also be true for every mutex and condition variable
> too.
> 
> So when you see pthread_[anything] you are relying on the library to "do the
> right thing for you" in providing that consistent interface.  When you
> consider how bad native pthread support is in Windows, and then how much
> better it is in cygwin, you see just how bad it can be to try to merge the
> application-level pthread paradigm with the operating system core functions.
> 
> This is identical to how the Java Virtual Machine is in charge of doing the
> right thing for a Java program, etc.
> 
> 
> So what does the kernel provide and what is all this talk of threads?
> 
> [begin quick history lesson]
> 
> If you take a quick trample through the *NIX history you will find two
> system calls very close to its heart.  fork() and exec().  These two calls
> share between them the tasks necessary to invoke a program.  The actual
> genius is the fact that they split this work.  The horror is how expensive
> fork() could be, and that led to vfork().
> 
> In reverse order, exec() basically means "I wish to suicide in favor of this
> other program."  When you exec() your memory and stack space are wiped out
> and replaced with the image of the new program to run.  That program does
> inherit all of your other traits (process number, permissions, most or all
> of your open files, etc) but everything in the process data and code space
> is gone.  (This last bit is, incidentally, why we have "environment
> variables", so that some common data may survive.)
> 
> With only exec() you would never be able to have more than one program
> running.  Enter fork(), which takes the entire process and copies it.  Where
> there was one process there are now two identical processes.  The new
> process, the child, the copy, would then tweak a few file handles around etc
> and then call exec().
> 
> Since the first program was copied, you needed to have as much memory free as
> the program was already using, which could get very pricey.  If the fork()ing
> program was larger than available memory it could be impossible.  And all
> this was often being done just so that the new copy could be discarded a few
> instructions later.
> 
> Enter vfork().  This "virtual fork" call didn't actually copy the process
> memory image; it just acted as if it had, to span the tiny bit of time between
> the vfork() and the exec() calls.  This saved a tremendous amount of space and
> time.
> 
> And then time moved on and the hardware got better and the software
> paradigms became more expansive... 
> 
> [end quick history lesson]
> 
> Linux provides clone() "in place of" the standard fork() and vfork().  I use
> the quotes because if you look in the code you will *actually* see the
> fork.c file and entry.S file.  There are entry points for each of sys_clone,
> sys_fork, and sys_vfork and they all eventually pile back into the same code
> calling do_fork() with different arguments.  It's just easier to take at one
> gulp if you think of clone() as the new generic thing and fork() and vfork()
> special cases.  Have I lost you yet?
> 
> The real inspired part of clone() is that you get to choose what gets copied
> and what just gets shared between the old and the new process.  If you look
> in your linux source directory for include/linux/sched.h you will see there
> is a whole set of values that can be passed into clone to tell it how to
> slice/copy (e.g. clone) the new task from the old.  By artfully combining
> the flags you can do all sorts of interesting things when cloning yourself.
> 
> At one end you can get the original fork() and at the other end you can get
> the tightly intermeshed entities necessary for implementing pthreads (and
> Java tasks and such).
> 
> Now, if you run a pthread based program on a 2.4 kernel, and do a "ps -ef"
> you will see the same program repeated as a bunch of processes because of
> the way clone is called for each thread you (or the library) creates.  The
> weird thing is that because each thread is a separate process the outside
> world sees things it doesn't need to see and can do things to individual
> threads it kind of ought not to be able to do.  This is how you could
> occasionally exit or kill a pthread based program and end up with tidbits of
> it (one or two processes) left behind.
> 
> The 2.5 kernel adds the CLONE_THREAD flag to the list of available clone
> options.  The flag lets the application programmer (or in this case the
> pthreads library programmer) essentially say "no really, these tightly
> interwoven and interdependent entities can not live away from their
> siblings.  Treat them as one process."
> 
> When you run a pthreads based program on a 2.5 or later kernel AND you are
> using a version of libpthread that knows about/uses CLONE_THREAD you will
> see just one listing for the program (unless you ask ps to show you all the
> parts by using -m).  Indeed the kernel keeps the parts more intimately bound
> which makes a bunch of things better including, but not limited to, better
> management and exit strategies.
> 
> =====
> 
> The above may be reproduced or referenced for any purpose except for suing
> me or my employer.
> 
> Rob.
