linux-kernel.vger.kernel.org archive mirror
From: "Richard B. Johnson" <root@chaos.analogic.com>
To: Mark Grosberg <mark@nolab.conman.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFD] Combined fork-exec syscall.
Date: Mon, 28 Apr 2003 09:00:58 -0400 (EDT)
Message-ID: <Pine.LNX.4.53.0304280855240.16444@chaos>
In-Reply-To: <Pine.BSO.4.44.0304272207431.23296-100000@kwalitee.nolab.conman.org>

On Sun, 27 Apr 2003, Mark Grosberg wrote:

>
>
> On Sun, 27 Apr 2003, Richard B. Johnson wrote:
>
> > You don't save anything but one system call time which is inconsequential
> > compared to the time necessary to exec (load a file, etc). Also, it is
> > worthless for anything except the most basic 'system()' or popen()
>
> Actually, my original proposal will work for popen and all sorts of piping
> because of the file descriptor map. For example:
>
>    int   in[2], out[2];
>    char *null_argv[] = { NULL };
>    int   fmap[4];
>    pid_t p;
>
>    pipe(in);
>    pipe(out);
>
>    fmap[0] = in[0];                     /* STDIN  */
>    fmap[1] = out[1];                    /* STDOUT */
>    fmap[2] = open("/dev/null", O_RDWR); /* STDERR */
>    fmap[3] = -1;                        /* end    */
>
>    p = nexec("/bin/cat",
>              null_argv,
>              NULL,
>              fmap);
>
>
> In this case you save the extra closes the child would have to do and you
> save the dup's.
>
> > All it does is add kernel bloat and duplicate existing kernel code
> > (both). Learn Unix instead of trying to make it VMS with spawn().
>
> Ahem, I happen to know Unix very well, thank you very much. Please read my
> proposed API before flaming it out and assuming I know nothing of UNIX,
> kernel development, or operating systems in general!
>
> Do you honestly think that just because I picked a name spawn() that
> happens to be in VMS (and MS-DOS C compilers) that I am inexperienced to
> Unix. Nope. I just happen to be a BSD user in general and don't frequent
> LKML.... and now I remember WHY!
>
> And there _ARE_ issues this does solve as were already pointed out because
> of the linear scan that must be made on the file descriptor array for the
> close-on-exec flag (which this API could happily say it ignores since it
> builds a _WHOLE_NEW file descriptor array).
>
> L8r,
> Mark G.


The Unix API provides execve(), fexecve(), execv(), execle(),
execl(), execvp(), and execlp() for what you call 'exec'. So
there is no 'fork and exec' as you state.

The kernel provides one system call, execve(). All of the
other functional changes are done with 'C' wrappers in the
'C' runtime library. Making a generic fork-exec would require
that this code, or its functionality, be moved into the kernel.

To save some processing time, most knowledgeable software
engineers would use vfork(). This leaves the major time,
the time necessary to load the new application into the
new address space and begin its execution. This time could
be tens of milliseconds or even hundreds if the application
is on a CD, floppy, a disk that hasn't been accessed yet,
or the network. In the usual situation where processing
must be performed between the fork() and the execve(), you
can't use vfork().
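
For the case where nothing has to happen between the two calls,
the vfork() pattern looks roughly like the sketch below. It is an
illustration only: the name vfork_run() is made up here, and the
wait at the end is just a convenience so the exit status can be
returned. The child does nothing except execve() or _exit(),
which is what makes vfork() usable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH with vfork()+execve() and wait for it.
 * Returns the child's exit status, or -1 on failure or abnormal exit.
 * The child only calls execve() or _exit(); anything more is not safe
 * after vfork(), since the parent's memory is shared until the exec. */
int vfork_run(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);

    pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);              /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As soon as the child needs a dup2() or a close() before the
execve(), this pattern no longer applies and fork() is needed.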

You can measure the time for a system call by executing
getpid() or something similar. It is in the noise compared
to the time necessary to execute a program. Further, we
get to the situation where one can't even verify a supposed
speed increase because the system call overhead is in the
noise. Great, one can claim any improvement they want and
it can't be verified. What will be verified, though, is
the increase in size of the kernel.
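
The getpid() measurement can be sketched along these lines. This
is a minimal sketch under stated assumptions: the function name
avg_getpid_ns() is made up for illustration, and it assumes Linux,
where syscall(SYS_getpid) forces a real kernel entry even if the
'C' library happens to cache the pid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * getpid system call over ITERATIONS calls. Returns a negative value
 * if the clock cannot be read. */
double avg_getpid_ns(long iterations)
{
    struct timespec start, end;
    long i;

    assert(iterations > 0);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (i = 0; i < iterations; i++)
        (void)syscall(SYS_getpid);   /* bypass any libc pid caching */
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Comparing the result of something like avg_getpid_ns(1000000)
with the wall-clock time of a fork() plus execve() of even a small
program gives the comparison in question. The actual numbers
depend on the machine, so none are quoted here.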

The following is a simple popen(), about as minimal as
you can get and have it work.


/*
 *   invocation as `/bin/sh -c COMMAND`. 0 reads 1 writes.
 */
FILE *popen(const char *command, const char *type)
{
    size_t i;
    int fd2close;
    struct sigaction sa;
    char *args[NR_ARGS];
    FILE *file;
    if((command == NULL) || (type == NULL))
    {
        errno = EINVAL;
        return NULL;
    }
    if(!((*type == (char)'r') || (*type == (char)'w')))
    {
        errno = EINVAL;
        return NULL;
    }
    if((file = (FILE *) malloc(sizeof(FILE))) == NULL)
    {
        return file;
    }
    bzero(file, sizeof(FILE));
    if(pipe(file->pfd))
    {
        free(file);
        return NULL;
    }
    fd2close = 0xff;
    if(*type == (char)'r')
    {
        file->fd = file->pfd[0];
        fd2close = file->pfd[1];
    }
    else
    {
        file->fd = file->pfd[1];
        fd2close = file->pfd[0];
    }
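    /* `sh -c` takes the whole command line as one argument and does
       its own word splitting, so the string is passed through intact
       rather than tokenized (strtok() would also write into the
       caller's const string). */
    i = 0;
    args[i++] = "/bin/sh";
    args[i++] = "-c";
    args[i++] = (char *)command;
    for(; i < NR_ARGS; i++)
        args[i] = NULL;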
    sigaction(SIGCHLD, NULL, &sa);     /* Save old */
    signal(SIGCHLD, SIG_IGN);
    switch((file->pid=fork()))
    {
    case 0:
        if(*type == (char)'r')
        {
            dup2(file->pfd[1], STDOUT_FILENO);
            (void)close(file->pfd[0]);
        }
        else
        {
            dup2(file->pfd[0], STDIN_FILENO);
            (void)close(file->pfd[1]);
        }
        signal(SIGINT, SIG_IGN);
        signal(SIGQUIT, SIG_IGN);
        execve(args[0], args, __environ);
        exit(EXIT_FAILURE);
        break;
    case -1:
        (void)close(file->pfd[0]);
        (void)close(file->pfd[1]);
        free(file);
        return NULL;
    default:
        break;
    }
    file->magic = POPEN;
    sigaction(SIGCHLD, &sa, NULL);     /* Restore old */
    (void)close(fd2close);
    return file;
}

Clearly, some additional, non-generic, processing has to
occur after the fork() and before execve(). For instance,
in the parent it is mandatory that the file descriptor that
is not being accessed by the parent be closed just as it
is mandatory that the file descriptor that is not being
accessed by the child be closed. Otherwise, a read from
the file descriptor by the parent will never see end-of-file
and return control to the parent when the child closes its
end of the pipe. All these 'trivial little details' are
necessary to have individual function calls work as a
system. That's why Unix breaks these functions into little
pieces (primitives) so the writer has control over the
overall behavior of the complete system. Integration of
these components into a monolithic conglomeration has
always failed to provide increased functionality or
performance. Instead, it simply reduces the number of
lines of code necessary to be written and maintained.

Reducing the number of lines of code may be a good thing.
However, the proper place for that is in the 'C' library,
not the kernel.
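
As a rough illustration of doing this in the library, the fd-map
call proposed earlier in the thread could be sketched entirely in
user space on top of fork(), dup2() and execve(). Everything named
here is an assumption made for the sketch: nexec_user(), the
NEXEC_MAX_MAP limit, and the 4096 cap on the descriptor scan are
not from the proposal. The map semantics (child fd i becomes a
copy of fmap[i], ending at -1) follow the quoted example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define NEXEC_MAX_MAP 64     /* hypothetical limit on map entries */

/* Hypothetical user-space nexec(): fork, rebuild the child's
 * descriptor table from FMAP, then execve(). Returns the child's pid
 * in the parent, or -1 with errno set if the map is too long or
 * fork() fails. */
pid_t nexec_user(const char *path, char *const argv[], char *const envp[],
                 const int *fmap)
{
    int tmp[NEXEC_MAX_MAP];
    long maxfd = sysconf(_SC_OPEN_MAX);
    long fd;
    int n = 0, i;
    pid_t pid;

    assert(path != NULL && argv != NULL && fmap != NULL);

    while (fmap[n] != -1)
        n++;
    if (n > NEXEC_MAX_MAP) {
        errno = EINVAL;
        return -1;
    }
    if (maxfd < 0 || maxfd > 4096)
        maxfd = 4096;        /* sketch only: bound the close() scan */

    pid = fork();
    if (pid != 0)
        return pid;          /* parent, or fork() failure (-1) */

    /* Child: first copy every source above the target range, so that
     * installing fd i cannot clobber a source still needed later. */
    for (i = 0; i < n; i++)
        if ((tmp[i] = fcntl(fmap[i], F_DUPFD, n)) == -1)
            _exit(127);
    for (i = 0; i < n; i++)
        if (dup2(tmp[i], i) == -1)
            _exit(127);
    /* Close everything outside the map, temporaries included. */
    for (fd = n; fd < maxfd; fd++)
        (void)close((int)fd);

    execve(path, argv, envp);
    _exit(127);              /* only reached if execve() failed */
}
```

The parent still has to close its own copy of the pipe end it
handed to the child, exactly as in the popen() above, so the
caller keeps that control.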

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.

