linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* select() bug
@ 2000-11-02 22:11 Paul Marquis
  2000-11-02 22:27 ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Marquis @ 2000-11-02 22:11 UTC (permalink / raw)
  To: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 890 bytes --]

I've uncovered a bug in select() when checking if it's okay to write
on a pipe.  It will report "false negatives" if there is any unread
data already on the pipe, even a single byte.  As soon as the pipe
gets flushed, select() does the right thing.

Normally, I wouldn't think this such a big deal, except that Apache
uses the writability of s pipe to determine if any of its children
that are log file handlers are dead.  If select() reports it can't
write immediately, Apache terminates and restarts the child process,
creating unnecessary load on the system.

The bug exists in the 2.2.x kernels up to and including 2.2.17 that
I've tried and I don't know it extends beyond pipes.  I've attached
sample code to demonstrate the problem, which works correctly on BSD
and Solaris.

Is this a know bug?

-- 
Paul Marquis
pmarquis@iname.com

If it's tourist season, why can't we shoot them?


[-- Attachment #2: simple_pipe_test.c --]
[-- Type: application/octet-stream, Size: 2882 bytes --]

#include <sys/time.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char **argv)
{
   int fd[2];
   int i;
   char buf[2];
   fd_set write_fds;
   struct timeval tv;
   ssize_t n;
   int status;

   /* initialize */
   buf[0] = 'h';
   buf[1] = 'i';

   do
   {
      /* create pipe */
      if (0 > pipe(fd))
      {
         printf("pipe error - %s\n", strerror(errno));
         break;
      }

      /*
       * loop four times
       * select for write, then write one byte
       * after the second write, do a read
       * odd selects succeed, even fail
       */
      for (i = 0; i < 4; i++)
      {
         printf("iteration %d\n", i + 1);

         status = 0;

         /* do select */
         FD_ZERO(&write_fds);
         FD_SET(fd[1], &write_fds);
         tv.tv_sec = 0;
         tv.tv_usec = 0;
         switch (select(fd[1] + 1, NULL, &write_fds, NULL, &tv))
         {
            case -1:
               /* should probably check for EINTR and/or EWOULDBLOCK */
               printf("select error - %s\n", strerror(errno));
               break;
            case 0:
               /* no I/O */
               puts("select returned 0 -- pipe not writable");
               break;
            default:
               /* make sure our fd is writable */
               if (FD_ISSET(fd[1], &write_fds))
               {
                  puts("select says pipe is writable");
                  status = 1;
               }
               else
                  puts("select says pipe is not writable");
               break;
         }

         /* do write */
         n = write(fd[1], buf + (i % 2), 1);
         switch (n)
         {
            case -1:
               /* should probably check for EINTR and/or EWOULDBLOCK */
               printf("write error - %s\n", strerror(errno));
               break;
            case 0:
               /* no I/O */
               puts("write returned 0 -- pipe closed");
               break;
            default:
               printf("write succeeded - %d bytes written\n", n);
               if (0 == status)
                  puts("select bug");
               break;
         }

         /* do read */
         if (0 == ((i + 1) % 2))
         {
            n = read(fd[0], buf, sizeof(buf));
            switch (n)
            {
               case -1:
                  /* should probably check for EINTR and/or EWOULDBLOCK */
                  printf("write error - %s\n", strerror(errno));
                  break;
               case 0:
                  /* no I/O */
                  puts("read returned 0 -- pipe closed");
                  break;
               default:
                  printf("read succeeded - %d bytes read\n", n);
                  break;
            }
         }
      }
   }
   while (0);

   exit(0);
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:11 select() bug Paul Marquis
@ 2000-11-02 22:27 ` Alan Cox
  2000-11-02 22:42   ` Richard B. Johnson
  2000-11-02 22:53   ` Paul Marquis
  0 siblings, 2 replies; 24+ messages in thread
From: Alan Cox @ 2000-11-02 22:27 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Linux Kernel Mailing List

> that are log file handlers are dead.  If select() reports it can't
> write immediately, Apache terminates and restarts the child process,
> creating unnecessary load on the system.

Is there anything saying that select has to report ready the instant a byte
would fit. Certainly its better for performance to reduce the context switch
rate by encouraging blocking


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:27 ` Alan Cox
@ 2000-11-02 22:42   ` Richard B. Johnson
  2000-11-02 22:58     ` Paul Marquis
  2000-11-02 22:53   ` Paul Marquis
  1 sibling, 1 reply; 24+ messages in thread
From: Richard B. Johnson @ 2000-11-02 22:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: Paul Marquis, Linux Kernel Mailing List

On Thu, 2 Nov 2000, Alan Cox wrote:

> > that are log file handlers are dead.  If select() reports it can't
> > write immediately, Apache terminates and restarts the child process,
> > creating unnecessary load on the system.
> 
> Is there anything saying that select has to report ready the instant a byte
> would fit. Certainly its better for performance to reduce the context switch
> rate by encouraging blocking
> 

It looks as though, when select() is reporting that the pipe is
NOT writable, the code writes anyway -- then complains that it
could be written.

This is a code bug. The pipe can certainly become writable during
the time interval between when it was checked by select() and the
attempt to write the byte.


Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:27 ` Alan Cox
  2000-11-02 22:42   ` Richard B. Johnson
@ 2000-11-02 22:53   ` Paul Marquis
  2000-11-02 22:58     ` Alan Cox
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Marquis @ 2000-11-02 22:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

I guess in theory, you're right, though if a write() could succeed,
shouldn't select() say that it would?

And this assumes you're calling select() with a timeout.  In Apache,
the caretaker process wakes up periodically and polls the pipe with a
timeout of zero.  If it gets back the pipe is not writable, it kills
the process.  With this false negative situation, this is a bad thing.

Alan Cox wrote:
> 
> > that are log file handlers are dead.  If select() reports it can't
> > write immediately, Apache terminates and restarts the child process,
> > creating unnecessary load on the system.
> 
> Is there anything saying that select has to report ready the instant a byte
> would fit. Certainly its better for performance to reduce the context switch
> rate by encouraging blocking

-- 
Paul Marquis
pmarquis@iname.com

If it's tourist season, why can't we shoot them?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:53   ` Paul Marquis
@ 2000-11-02 22:58     ` Alan Cox
  2000-11-02 23:08       ` Paul Marquis
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2000-11-02 22:58 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Alan Cox, Linux Kernel Mailing List

> I guess in theory, you're right, though if a write() could succeed,
> shouldn't select() say that it would?

Thats certainly not normal for a lot of devices.  Most fast devices wake up
when buffers are half empty for example.

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:42   ` Richard B. Johnson
@ 2000-11-02 22:58     ` Paul Marquis
  2000-11-03  0:53       ` Richard B. Johnson
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Marquis @ 2000-11-02 22:58 UTC (permalink / raw)
  To: root; +Cc: Alan Cox, Linux Kernel Mailing List

In the code sample, there is a loop that is run four times.  During
each iteration, a call to select() and write() is done, while every
other iteration does a read().  Between the 1st and 2nd calls to
write(), as well as the 3rd and 4th, select() fails, but all write()'s
and read()'s succeed.  There is no code bug.

Richard B. Johnson wrote:
> On Thu, 2 Nov 2000, Alan Cox wrote:
> 
> > > that are log file handlers are dead.  If select() reports it can't
> > > write immediately, Apache terminates and restarts the child process,
> > > creating unnecessary load on the system.
> >
> > Is there anything saying that select has to report ready the instant a byte
> > would fit. Certainly its better for performance to reduce the context switch
> > rate by encouraging blocking
> >
> 
> It looks as though, when select() is reporting that the pipe is
> NOT writable, the code writes anyway -- then complains that it
> could be written.
> 
> This is a code bug. The pipe can certainly become writable during
> the time interval between when it was checked by select() and the
> attempt to write the byte.
> 
> Cheers,
> Dick Johnson
> 
> Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).
> 
> "Memory is like gasoline. You use it up when you are running. Of
> course you get it all back when you reboot..."; Actual explanation
> obtained from the Micro$oft help desk.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> Please read the FAQ at http://www.tux.org/lkml/

-- 
Paul Marquis
pmarquis@iname.com

If it's tourist season, why can't we shoot them?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:58     ` Alan Cox
@ 2000-11-02 23:08       ` Paul Marquis
  2000-11-02 23:20         ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Marquis @ 2000-11-02 23:08 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

I'm not exactly sure what you mean by this statement.  Would you mind
explaining further?

Alan Cox wrote:
> Most fast devices wake up when buffers are half empty for example.

-- 
Paul Marquis
pmarquis@iname.com

If it's tourist season, why can't we shoot them?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:08       ` Paul Marquis
@ 2000-11-02 23:20         ` Alan Cox
  2000-11-02 23:44           ` Paul Marquis
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2000-11-02 23:20 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Alan Cox, Linux Kernel Mailing List

> I'm not exactly sure what you mean by this statement.  Would you mind
> explaining further?

Well take a socket with 64K of buffering. You don't want to wake processes
waiting in select or in write every time you can scribble another 1460 bytes
to the buffer. Instead you wait until there is 32K of room then wake the
user. That means that there is one wakeup/trip through userspace every 32K
rather than potentially every time a byte is read the other end

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:20         ` Alan Cox
@ 2000-11-02 23:44           ` Paul Marquis
  2000-11-02 23:46             ` David S. Miller
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Paul Marquis @ 2000-11-02 23:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Okay, I see your point, thanks.  A couple of comments/questions:

- Does this make sense with devices with small kernel buffers?  From
my experimentation, pipes on Linux have a 4K buffer and tend to be
read and written very quickly.

- If I'm correct that pipes have a 4K kernel buffer, then writing 1
byte shouldn't cause this situation, as the buffer is well more than
half empty.  Is this still a bug?

Semantic issues aside, since Apache does the test I mentionned earlier
to determine child status and since it could be misled, should this
feature be turned off?

Thanks for your input.

Alan Cox wrote:
> > I'm not exactly sure what you mean by this statement.  Would you mind
> > explaining further?
> 
> Well take a socket with 64K of buffering. You don't want to wake processes
> waiting in select or in write every time you can scribble another 1460 bytes
> to the buffer. Instead you wait until there is 32K of room then wake the
> user. That means that there is one wakeup/trip through userspace every 32K
> rather than potentially every time a byte is read the other end

-- 
Paul Marquis
pmarquis@iname.com

If it's tourist season, why can't we shoot them?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:44           ` Paul Marquis
@ 2000-11-02 23:46             ` David S. Miller
  2000-11-02 23:52               ` David S. Miller
  2000-11-03  0:04               ` H. Peter Anvin
  2000-11-02 23:53             ` H. Peter Anvin
  2000-11-02 23:55             ` Alan Cox
  2 siblings, 2 replies; 24+ messages in thread
From: David S. Miller @ 2000-11-02 23:46 UTC (permalink / raw)
  To: hpa; +Cc: linux-kernel

   From: "H. Peter Anvin" <hpa@zytor.com>
   Date: 	2 Nov 2000 15:53:29 -0800

   Has anyone considered the possibility of expanding the buffer of
   high-traffic pipes?

The kiobuf pipe patches are a more effective performance improvement
for this type of usage.  It has the benefit of not requiring a
temporary kernel buffer of any size :-)

Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:46             ` David S. Miller
@ 2000-11-02 23:52               ` David S. Miller
  2000-11-03  0:05                 ` David S. Miller
  2000-11-03  0:13                 ` H. Peter Anvin
  2000-11-03  0:04               ` H. Peter Anvin
  1 sibling, 2 replies; 24+ messages in thread
From: David S. Miller @ 2000-11-02 23:52 UTC (permalink / raw)
  To: hpa; +Cc: hpa, linux-kernel

   Date: Thu, 02 Nov 2000 16:04:05 -0800
   From: "H. Peter Anvin" <hpa@transmeta.com>

   That's (very) nice, but it does assume there is currently a reader
   listening.

No, it has no such assumption.

Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:44           ` Paul Marquis
  2000-11-02 23:46             ` David S. Miller
@ 2000-11-02 23:53             ` H. Peter Anvin
  2000-11-03  0:01               ` Alan Cox
  2000-11-02 23:55             ` Alan Cox
  2 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2000-11-02 23:53 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <3A01FC44.8A43FE8B@iname.com>
By author:    Paul Marquis <pmarquis@iname.com>
In newsgroup: linux.dev.kernel
>
> Okay, I see your point, thanks.  A couple of comments/questions:
> 
> - Does this make sense with devices with small kernel buffers?  From
> my experimentation, pipes on Linux have a 4K buffer and tend to be
> read and written very quickly.
> 
> - If I'm correct that pipes have a 4K kernel buffer, then writing 1
> byte shouldn't cause this situation, as the buffer is well more than
> half empty.  Is this still a bug?
> 
> Semantic issues aside, since Apache does the test I mentionned earlier
> to determine child status and since it could be misled, should this
> feature be turned off?
> 
> Thanks for your input.
> 

Has anyone considered the possibility of expanding the buffer of
high-traffic pipes?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:44           ` Paul Marquis
  2000-11-02 23:46             ` David S. Miller
  2000-11-02 23:53             ` H. Peter Anvin
@ 2000-11-02 23:55             ` Alan Cox
  2000-11-03  5:52               ` dean gaudet
  2000-11-03  7:05               ` Marc Lehmann
  2 siblings, 2 replies; 24+ messages in thread
From: Alan Cox @ 2000-11-02 23:55 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Alan Cox, Linux Kernel Mailing List

> - Does this make sense with devices with small kernel buffers?  From
> my experimentation, pipes on Linux have a 4K buffer and tend to be
> read and written very quickly. 

It makes sense for all things I suspect

> - If I'm correct that pipes have a 4K kernel buffer, then writing 1
> byte shouldn't cause this situation, as the buffer is well more than
> half empty.  Is this still a bug?

The pipe code uses totally full/empty. Im not sure why that was chosen

> Semantic issues aside, since Apache does the test I mentionned earlier
> to determine child status and since it could be misled, should this
> feature be turned off?

Or made smarter yes

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:53             ` H. Peter Anvin
@ 2000-11-03  0:01               ` Alan Cox
  0 siblings, 0 replies; 24+ messages in thread
From: Alan Cox @ 2000-11-03  0:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

> Has anyone considered the possibility of expanding the buffer of
> high-traffic pipes?

Do that too much and the data falls out of L1 cache before we context switch. 
Its a rather complex juggling game. DaveM's kiovec pipe patches avoid some of
this by cheating and going user->user

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:46             ` David S. Miller
  2000-11-02 23:52               ` David S. Miller
@ 2000-11-03  0:04               ` H. Peter Anvin
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2000-11-03  0:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: hpa, linux-kernel

"David S. Miller" wrote:
> 
>    From: "H. Peter Anvin" <hpa@zytor.com>
>    Date:        2 Nov 2000 15:53:29 -0800
> 
>    Has anyone considered the possibility of expanding the buffer of
>    high-traffic pipes?
> 
> The kiobuf pipe patches are a more effective performance improvement
> for this type of usage.  It has the benefit of not requiring a
> temporary kernel buffer of any size :-)
> 

That's (very) nice, but it does assume there is currently a reader
listening.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:52               ` David S. Miller
@ 2000-11-03  0:05                 ` David S. Miller
  2000-11-03  0:38                   ` H. Peter Anvin
  2000-11-03  0:13                 ` H. Peter Anvin
  1 sibling, 1 reply; 24+ messages in thread
From: David S. Miller @ 2000-11-03  0:05 UTC (permalink / raw)
  To: hpa; +Cc: hpa, linux-kernel

   Date: Thu, 02 Nov 2000 16:13:13 -0800
   From: "H. Peter Anvin" <hpa@transmeta.com>

   Oh.  What do you do if there isn't... link up the contents of the
   write() in a kiovec and hold the writer?

Right, the writer blocks.

I've already posted the patches here within the past week, I'll send
them to you under seperate cover so you can see for yourself how it
works.

Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:52               ` David S. Miller
  2000-11-03  0:05                 ` David S. Miller
@ 2000-11-03  0:13                 ` H. Peter Anvin
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2000-11-03  0:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: hpa, linux-kernel

"David S. Miller" wrote:
> 
>    Date: Thu, 02 Nov 2000 16:04:05 -0800
>    From: "H. Peter Anvin" <hpa@transmeta.com>
> 
>    That's (very) nice, but it does assume there is currently a reader
>    listening.
> 
> No, it has no such assumption.
> 

Oh.  What do you do if there isn't... link up the contents of the write()
in a kiovec and hold the writer?

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-03  0:05                 ` David S. Miller
@ 2000-11-03  0:38                   ` H. Peter Anvin
  0 siblings, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2000-11-03  0:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: hpa, linux-kernel

"David S. Miller" wrote:
> 
>    Date: Thu, 02 Nov 2000 16:13:13 -0800
>    From: "H. Peter Anvin" <hpa@transmeta.com>
> 
>    Oh.  What do you do if there isn't... link up the contents of the
>    write() in a kiovec and hold the writer?
> 
> Right, the writer blocks.
> 
> I've already posted the patches here within the past week, I'll send
> them to you under seperate cover so you can see for yourself how it
> works.
> 

Sure, but my point was that it would be nice for high-traffic pipes to
allow a larger volume of data with the two processes still running
concurrently.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 22:58     ` Paul Marquis
@ 2000-11-03  0:53       ` Richard B. Johnson
  2000-11-03  5:00         ` Paul Marquis
  0 siblings, 1 reply; 24+ messages in thread
From: Richard B. Johnson @ 2000-11-03  0:53 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Alan Cox, Linux Kernel Mailing List

On Thu, 2 Nov 2000, Paul Marquis wrote:

> In the code sample, there is a loop that is run four times.  During
> each iteration, a call to select() and write() is done, while every
> other iteration does a read().  Between the 1st and 2nd calls to
> write(), as well as the 3rd and 4th, select() fails, but all write()'s
> and read()'s succeed.  There is no code bug.

Not as I see it. This is clearly a code bug. Look:
iteration 1
select says pipe is writable
write succeeded - 1 bytes written
iteration 2
select returned 0 -- pipe not writable

      for (i = 0; i < 4; i++)
      {
         printf("iteration %d\n", i + 1);

         status = 0;

         /* do select */
         FD_ZERO(&write_fds);
         FD_SET(fd[1], &write_fds);
         tv.tv_sec = 0;
         tv.tv_usec = 0;
         switch (select(fd[1] + 1, NULL, &write_fds, NULL, &tv))
         {
            case -1:
               /* should probably check for EINTR and/or EWOULDBLOCK */
               printf("select error - %s\n", strerror(errno));
               break;
            case 0:
               /* no I/O */
               puts("select returned 0 -- pipe not writable");
               break;

            The BREAK breaks out of the switch-statement ---
                                                           |
|-----------------------------------------------------------
|
|            default:
|               /* make sure our fd is writable */
|               if (FD_ISSET(fd[1], &write_fds))
|               {
|                  puts("select says pipe is writable");
|                  status = 1;
|               }
|               else
|                  puts("select says pipe is not writable");
|               break;
|         }
|
|         /* do write */
|------->  n = write(fd[1], buf + (i % 2), 1);

Then you write it anyway!

write succeeded - 1 bytes written
select bug




Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-03  0:53       ` Richard B. Johnson
@ 2000-11-03  5:00         ` Paul Marquis
  2000-11-03 13:05           ` Richard B. Johnson
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Marquis @ 2000-11-03  5:00 UTC (permalink / raw)
  To: root; +Cc: Alan Cox, Linux Kernel Mailing List

It shows that even though select() says a file descriptor is not
writable, a write() can still succeed.  This code is not used anywhere
in the real world, or at least my real world :-P.  It just demonstrates
"the bug".

Richard B. Johnson" wrote:
> Not as I see it. This is clearly a code bug. Look:
> iteration 1
> select says pipe is writable
> write succeeded - 1 bytes written
> iteration 2
> select returned 0 -- pipe not writable
> 
>       for (i = 0; i < 4; i++)
>       {
>          printf("iteration %d\n", i + 1);
> 
>          status = 0;
> 
>          /* do select */
>          FD_ZERO(&write_fds);
>          FD_SET(fd[1], &write_fds);
>          tv.tv_sec = 0;
>          tv.tv_usec = 0;
>          switch (select(fd[1] + 1, NULL, &write_fds, NULL, &tv))
>          {
>             case -1:
>                /* should probably check for EINTR and/or EWOULDBLOCK */
>                printf("select error - %s\n", strerror(errno));
>                break;
>             case 0:
>                /* no I/O */
>                puts("select returned 0 -- pipe not writable");
>                break;
> 
>             The BREAK breaks out of the switch-statement ---
>                                                            |
> |-----------------------------------------------------------
> |
> |            default:
> |               /* make sure our fd is writable */
> |               if (FD_ISSET(fd[1], &write_fds))
> |               {
> |                  puts("select says pipe is writable");
> |                  status = 1;
> |               }
> |               else
> |                  puts("select says pipe is not writable");
> |               break;
> |         }
> |
> |         /* do write */
> |------->  n = write(fd[1], buf + (i % 2), 1);
> 
> Then you write it anyway!
> 
> write succeeded - 1 bytes written
> select bug
> 
> Cheers,
> Dick Johnson
> 
> Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).
> 
> "Memory is like gasoline. You use it up when you are running. Of
> course you get it all back when you reboot..."; Actual explanation
> obtained from the Micro$oft help desk.

--
Paul Marquis
pmarquis@iname.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:55             ` Alan Cox
@ 2000-11-03  5:52               ` dean gaudet
  2000-11-03  7:05               ` Marc Lehmann
  1 sibling, 0 replies; 24+ messages in thread
From: dean gaudet @ 2000-11-03  5:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Paul Marquis, Linux Kernel Mailing List

> > Semantic issues aside, since Apache does the test I mentionned earlier
> > to determine child status and since it could be misled, should this
> > feature be turned off?
> 
> Or made smarter yes

i'm scratching my head wondering what i was thinking when i wrote that
code.  the specific thing the parent wants to handle there is the case of
a stuck logger... you'd think i would have at least put some debouncing
logic in there.

(the parent is able to replace a logger without disturbing the children
because it keeps a copy of both halves of the logging pipes.)

an alternative would be to look for many children stuck in the logging
state (they make a note in the scoreboard before going into it).

paul -- if you want to just eliminate that feature (it'll still be able to
replace the logger if the logger exits) go into src/http_log.c,
piped_log_maintenance and comment out the OC_REASON_UNWRITEABLE.

-dean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-02 23:55             ` Alan Cox
  2000-11-03  5:52               ` dean gaudet
@ 2000-11-03  7:05               ` Marc Lehmann
  1 sibling, 0 replies; 24+ messages in thread
From: Marc Lehmann @ 2000-11-03  7:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Paul Marquis, Linux Kernel Mailing List

On Thu, Nov 02, 2000 at 11:55:52PM +0000, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > - If I'm correct that pipes have a 4K kernel buffer, then writing 1
> > byte shouldn't cause this situation, as the buffer is well more than
> > half empty.  Is this still a bug?
> 
> The pipe code uses totally full/empty. Im not sure why that was chosen

Just a quick guess: maybe because of the POSIX atomicity guarantees (if
select returned, write might have to block which is not what is expected),
and maybe this limitation was used not only on write but on read (Although
it's not necessary on the read side, AFAIK).

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@opengroup.org |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
  2000-11-03  5:00         ` Paul Marquis
@ 2000-11-03 13:05           ` Richard B. Johnson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard B. Johnson @ 2000-11-03 13:05 UTC (permalink / raw)
  To: Paul Marquis; +Cc: Alan Cox, Linux Kernel Mailing List

On Fri, 3 Nov 2000, Paul Marquis wrote:

> It shows that even though select() says a file descriptor is not
> writable, a write() can still succeed.  This code is not used anywhere
> in the real world, or at least my real world :-P.  It just demonstrates
> "the bug".
> 

It demonstrates nothing except bad code. Anybody should know that
such a descriptor can become writable at any instant, simply because
the kernel may be finding room for more unread data.

So, the code sees that, at this instant, it's not writable. Then
it writes to it anyway. Sometimes the fd becomes writable between
the time that it was checked, and the time it was written. So
the write succeeded. So what. It shows nothing but bad code. It
demonstrates no bug whatsover.

There may be a bug. However, this code doesn't demonstrate anything
but bad coding practice.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: select() bug
@ 2000-11-05 16:14 Stanislav Meduna
  0 siblings, 0 replies; 24+ messages in thread
From: Stanislav Meduna @ 2000-11-05 16:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: pcg, alan

> > > - If I'm correct that pipes have a 4K kernel buffer, then writing 1 
> > > byte shouldn't cause this situation, as the buffer is well more than 
> > > half empty. Is this still a bug? 
> > 
> > The pipe code uses totally full/empty. Im not sure why that was chosen 
> 
> Just a quick guess: maybe because of the POSIX atomicity guarantees (if 
> select returned, write might have to block which is not what is expected), 
> and maybe this limitation was used not only on write but on read (Although 
> it's not necessary on the read side, AFAIK). 

FWIW: I tried to do some experiments (I was hit by the context
switch 'explosion': the writer writes one byte and then
cannot write another one until the reader wakes up and reads
the single one - for my application this was very bad).

Unfortunately my experiments made the most-used uses slower
and I don't know the kernel internals enough to analyse it :-(

I am repeating here my post from 14. 7. (this was with -test4,
I don't know whether there were any significant changes since then)
under the subject "pipe concurrent r/w performance". Last time
I did not get any replies.

- snip -

I am (again) playing with pipe.c trying to enlarge the
pipe buffer, so the pipe can select for write even when
non-empty (but with more than PIPE_BUF free), aiming
for reducing context switches when two applications
signal something via a pipe without waiting for
an answer. The functionality is (hopefully) OK and
I am testing the performance.

I have written two similar test programs. The
writer side selects the pipe and then writes a byte.
The reader side does either 1) blocking read of one
byte, or 2) selects the pipe for reading and then
reads a byte.

To my surprise the case 1) is slower than the original
version by 20% and there is actually more context switches.
Case 2) is at least 30% faster than the original version
and the number of context switches is dramatically reduced.

The difference probably is, that the reader is faster
than the writer and in the case 1) it nearly always
blocks in the pipe_wait call inside pipe_read.
It seems like as soon as the writer writes something
in the pipe, the reader immediately gets the CPU
instead of allowing the writer to continue.

In the case 2) it blocks in poll_wait inside pipe_poll.

I don't quite understand all the semantics behind
synchronisation primitives, but it seems like the
current code is not optimal for the concurrent
presence of reader and writer in pipe_read/pipe_write.
The code in pipe_poll excluded such possibility.
Could someone understanding these things better
take a look?

My patch and the test programs are available at
http://www.penguin.cz/~stano/code/kernelpipe.tar.gz
The patch is against 2.4.0-test4, my machine is
a SMP one (maybe the situation differs on UP).

- snip -

Regards
-- 
				Stano



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2000-11-05 16:15 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-11-02 22:11 select() bug Paul Marquis
2000-11-02 22:27 ` Alan Cox
2000-11-02 22:42   ` Richard B. Johnson
2000-11-02 22:58     ` Paul Marquis
2000-11-03  0:53       ` Richard B. Johnson
2000-11-03  5:00         ` Paul Marquis
2000-11-03 13:05           ` Richard B. Johnson
2000-11-02 22:53   ` Paul Marquis
2000-11-02 22:58     ` Alan Cox
2000-11-02 23:08       ` Paul Marquis
2000-11-02 23:20         ` Alan Cox
2000-11-02 23:44           ` Paul Marquis
2000-11-02 23:46             ` David S. Miller
2000-11-02 23:52               ` David S. Miller
2000-11-03  0:05                 ` David S. Miller
2000-11-03  0:38                   ` H. Peter Anvin
2000-11-03  0:13                 ` H. Peter Anvin
2000-11-03  0:04               ` H. Peter Anvin
2000-11-02 23:53             ` H. Peter Anvin
2000-11-03  0:01               ` Alan Cox
2000-11-02 23:55             ` Alan Cox
2000-11-03  5:52               ` dean gaudet
2000-11-03  7:05               ` Marc Lehmann
2000-11-05 16:14 Stanislav Meduna

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).