linux-kernel.vger.kernel.org archive mirror
* Problem with select?
@ 2003-06-18  8:49 Eli Barzilay
  2003-07-24  5:28 ` Repost: Bug " Eli Barzilay
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Barzilay @ 2003-06-18  8:49 UTC
  To: linux-kernel

Hello,

When I run the following program, and block the terminal's output
(C-s), the `select' doesn't seem to have any effect, resulting in a
100% cpu usage (this is on a RH8, with 2.4.18).  I wouldn't be
surprised if I'm doing something stupid, but it does seem to work fine
on Solaris.

Is there anything wrong with this, or is this some bug?

======================================================================
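#include <stdlib.h>      /* exit() */
#include <sys/select.h>  /* select(), fd_set, FD_* macros */
#include <unistd.h>
#include <fcntl.h>
int main() {
  int flags, fd, len; fd_set writefds;
  fd = 1;                                   /* stdout */
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* make stdout non-blocking */
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, NULL); /* wait until writable */
    if (!FD_ISSET(fd,&writefds)) exit(0);
    len = write(fd, "hi\n", 3);             /* result not checked in this stripped-down example */
  }
  fcntl(fd, F_SETFL, flags);                /* never reached */
}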
======================================================================

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


* Repost: Bug with select?
  2003-06-18  8:49 Problem with select? Eli Barzilay
@ 2003-07-24  5:28 ` Eli Barzilay
  2003-07-25 13:41   ` Marco Roeland
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Eli Barzilay @ 2003-07-24  5:28 UTC
  To: linux-kernel


[This is a second post, since I didn't get any replies the first time.
It looks more like a bug now, which sounds strange for something that
common...]

When I run the following program, and block the terminal's output
(C-s), the `select' doesn't seem to have any effect, resulting in a
100% cpu usage (this is on a RH8, with 2.4.18).  I wouldn't be
surprised if I'm doing something stupid, but it does seem to work fine
on Solaris.

Is there anything wrong with this, or is this some bug?

======================================================================
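#include <stdlib.h>      /* exit() */
#include <sys/select.h>  /* select(), fd_set, FD_* macros */
#include <unistd.h>
#include <fcntl.h>
int main() {
  int flags, fd, len; fd_set writefds;
  fd = 1;                                   /* stdout */
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* make stdout non-blocking */
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, NULL); /* wait until writable */
    if (!FD_ISSET(fd,&writefds)) exit(0);
    len = write(fd, "hi\n", 3);             /* result not checked in this stripped-down example */
  }
  fcntl(fd, F_SETFL, flags);                /* never reached */
}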
======================================================================

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


* Re: Repost: Bug with select?
  2003-07-24  5:28 ` Repost: Bug " Eli Barzilay
@ 2003-07-25 13:41   ` Marco Roeland
  2003-07-26  0:20     ` Ben Greear
  2003-07-26  0:35   ` Philippe Troin
  2003-07-26 14:25   ` Eli Barzilay
  2 siblings, 1 reply; 10+ messages in thread
From: Marco Roeland @ 2003-07-25 13:41 UTC
  To: Linux Kernel Development; +Cc: Eli Barzilay

On Thursday July 24th 2003 at 01:28 Eli Barzilay wrote:

> When I run the following program, and block the terminal's output
> (C-s), the `select' doesn't seem to have any effect, resulting in a
> 100% cpu usage (this is on a RH8, with 2.4.18).  I wouldn't be
> surprised if I'm doing something stupid, but it does seem to work fine
> on Solaris.
> 
> Is there anything wrong with this, or is this some bug?
> 
> ======================================================================
> #include <unistd.h>
> #include <fcntl.h>
> int main() {
>   int flags, fd, len; fd_set writefds;
>   fd = 1;
>   flags = fcntl(fd, F_GETFL, 0);
>   fcntl(fd, F_SETFL, flags | O_NONBLOCK);

You use non-blocking mode here.

>   while (1) {
>     FD_ZERO(&writefds);
>     FD_SET(fd, &writefds);
>     len = select(fd + 1, NULL, &writefds, NULL, NULL);

A select with no timeout, so it will immediately return.

>     if (!FD_ISSET(fd,&writefds)) exit(0);

This might be what Solaris does differently, by _not_ including '1' in
the returned descriptors? Linux will say (rightly) that a following call
will not block, which is something very different than 'will not fail'!

>     len = write(fd, "hi\n", 3);

You don't check the return value here, but when you press Ctrl-S (stdout
blocked) it will indicate an error (return value -1) with errno set
to EAGAIN, meaning you should try again, which is the appropriate result
for a non-blocking descriptor or socket. Anyway, the call "succeeds" and
we loop back into the while(1), indeed as you say creating a busy loop.
No surprises there I'd say.

>   }
>   fcntl(fd, F_SETFL, flags);
> }

You might start by checking for EAGAIN as result of the write, and then
reacting according to your needs (waiting a while or exiting the
program or whatever).
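A minimal sketch of the EAGAIN handling suggested above, assuming that sleeping briefly and retrying is acceptable for the application. Since select() keeps reporting the stopped terminal as writable in this thread, the back-off is placed on the write() path rather than relying on select(). The helper name `write_all_backoff` and the 10 ms delay are hypothetical choices, not taken from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write all of buf to fd,
 * which may be in non-blocking mode.
 *   EINTR              -> retry immediately
 *   EAGAIN/EWOULDBLOCK -> sleep briefly, then retry, so a descriptor that
 *                         select() keeps reporting as writable (such as a
 *                         stopped tty) does not cause a busy loop
 * Returns 0 once everything is written, -1 on any other error (errno set). */
static int write_all_backoff(int fd, const char *buf, size_t len)
{
    const struct timespec delay = { 0, 10L * 1000 * 1000 }; /* 10 ms, arbitrary */

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            buf += n;              /* partial writes are fine: continue */
            len -= (size_t)n;
        } else if (errno == EINTR) {
            continue;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            nanosleep(&delay, NULL);
        } else {
            return -1;
        }
    }
    return 0;
}
```

While output is stopped with C-s, a caller using this helper still waits, but it sleeps between attempts instead of spinning at 100% CPU. An event-driven program might instead return to its main loop on EAGAIN and retry on a timer.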


* Re: Repost: Bug with select?
  2003-07-25 13:41   ` Marco Roeland
@ 2003-07-26  0:20     ` Ben Greear
  2003-07-26  9:05       ` Marco Roeland
  0 siblings, 1 reply; 10+ messages in thread
From: Ben Greear @ 2003-07-26  0:20 UTC
  To: Marco Roeland; +Cc: Linux Kernel Development, Eli Barzilay

Marco Roeland wrote:
> On Thursday July 24th 2003 at 01:28 Eli Barzilay wrote:
> 
> 
>>When I run the following program, and block the terminal's output
>>(C-s), the `select' doesn't seem to have any effect, resulting in a
>>100% cpu usage (this is on a RH8, with 2.4.18).  I wouldn't be
>>surprised if I'm doing something stupid, but it does seem to work fine
>>on Solaris.
>>
>>Is there anything wrong with this, or is this some bug?
>>
>>======================================================================
>>#include <unistd.h>
>>#include <fcntl.h>
>>int main() {
>>  int flags, fd, len; fd_set writefds;
>>  fd = 1;
>>  flags = fcntl(fd, F_GETFL, 0);
>>  fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> 
> 
> You use non-blocking mode here.
> 
> 
>>  while (1) {
>>    FD_ZERO(&writefds);
>>    FD_SET(fd, &writefds);
>>    len = select(fd + 1, NULL, &writefds, NULL, NULL);
> 
> 
> A select with no timeout, so it will immediately return.
> 
> 
>>    if (!FD_ISSET(fd,&writefds)) exit(0);
> 
> 
> This might be what Solaris does differently, by _not_ including '1' in
> the returned descriptors? Linux will say (rightly) that a following call
> will not block, which is something very different than 'will not fail'!

I thought select is supposed to tell you when you can read/write at least something without
failing.  Otherwise it would be worthless when doing non-blocking IO because you can
both read and write w/out blocking at all times.  If you run similar code on a TCP
socket instead of stdout, do you see the same busy spin?  (To do it right, make
sure the network between source and destination is slower than the CPU can write,
e.g. a 10BaseT hub.)
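Ben's question can be approximated without a slow network. The sketch below assumes an AF_UNIX stream `socketpair()` is a reasonable local stand-in for a TCP connection whose peer is not reading (it is not TCP, so treat this as an assumption); it fills one end until write() returns EAGAIN and then asks select() again with a zero timeout. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd into non-blocking mode, as the test program in this thread does. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Poll (zero timeout) whether select() reports fd as writable right now.
 * Returns 1 if writable, 0 if not, -1 on error. */
static int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = { 0, 0 };
    int r;

    assert(fd >= 0 && fd < FD_SETSIZE); /* FD_SET is undefined outside this range */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r <= 0)
        return r;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Keep writing to the non-blocking fd (whose peer never reads) until
 * write() fails with EAGAIN. Returns bytes queued, or -1 on other errors. */
static long fill_until_eagain(int fd)
{
    char chunk[512];
    long total = 0;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0)
            total += (long)n;
        else if (n < 0 && errno == EINTR)
            continue;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;
        else
            return -1;
    }
}
```

If the final check reports 0, select() is tracking the socket's buffer space as expected, which is consistent with Marco's later report that sockets do not show the busy spin; the stopped tty is the case that behaves differently.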


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* Re: Repost: Bug with select?
  2003-07-24  5:28 ` Repost: Bug " Eli Barzilay
  2003-07-25 13:41   ` Marco Roeland
@ 2003-07-26  0:35   ` Philippe Troin
  2003-07-26 14:29     ` Eli Barzilay
  2003-07-26 14:25   ` Eli Barzilay
  2 siblings, 1 reply; 10+ messages in thread
From: Philippe Troin @ 2003-07-26  0:35 UTC
  To: Eli Barzilay; +Cc: linux-kernel

Eli Barzilay <eli@barzilay.org> writes:

> [This is a second post, since I didn't get any replies the first time.
> It looks more like a bug now, which sounds strange for something that
> common...]
> 
> When I run the following program, and block the terminal's output
> (C-s), the `select' doesn't seem to have any effect, resulting in a
> 100% cpu usage (this is on a RH8, with 2.4.18).  I wouldn't be
> surprised if I'm doing something stupid, but it does seem to work fine
> on Solaris.
> 
> Is there anything wrong with this, or is this some bug?
> 
> ======================================================================
> #include <unistd.h>
> #include <fcntl.h>
> int main() {
>   int flags, fd, len; fd_set writefds;
>   fd = 1;
>   flags = fcntl(fd, F_GETFL, 0);
>   fcntl(fd, F_SETFL, flags | O_NONBLOCK);
>   while (1) {
>     FD_ZERO(&writefds);
>     FD_SET(fd, &writefds);
>     len = select(fd + 1, NULL, &writefds, NULL, NULL);
>     if (!FD_ISSET(fd,&writefds)) exit(0);
>     len = write(fd, "hi\n", 3);
>   }
>   fcntl(fd, F_SETFL, flags);
> }
> ======================================================================

Looks like a bug to me.
Strace says:

select(2, NULL, [1], NULL, NULL)        = 1 (out [1])
write(1, "hi\n", 3)                     = -1 EAGAIN (Resource temporarily unavailable)

forever.

Then select() should not return fd 1 as writable, at least not
repeatedly.

Phil.


* Re: Repost: Bug with select?
  2003-07-26  0:20     ` Ben Greear
@ 2003-07-26  9:05       ` Marco Roeland
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Roeland @ 2003-07-26  9:05 UTC
  To: Ben Greear; +Cc: Linux Kernel Development

On Friday July 25th 2003 at 17:20 Ben Greear wrote:

> I thought select is supposed to tell you when you can read/write at least
> something without failing. Otherwise it would be worthless when doing
> non-blocking IO because you can both read and write w/out blocking at all
> times. If you run similar code on a tcp socket instead of std-out, do you see
> the same busy spin? (To do it right, make sure the network between source and
> destination is slower than the CPU can handle, ie 10bt hub.)

My 'analysis' was indeed based on experience with sockets, where you
don't get the busy spin. It's indeed a bit baffling why select keeps
insisting that fd 1 is writable. A quick test on kernel versions
2.2.12-20, 2.4.20 and 2.6.0-test1 all give the same results, so I
suppose select itself is doing its expected duty, and that in that case
the special underlying mechanics of stdout require special mechanics to
find out if it's blocked?! Beats me, but that's pretty easy... ;-)
 
Marco Roeland


* Re: Repost: Bug with select?
  2003-07-24  5:28 ` Repost: Bug " Eli Barzilay
  2003-07-25 13:41   ` Marco Roeland
  2003-07-26  0:35   ` Philippe Troin
@ 2003-07-26 14:25   ` Eli Barzilay
  2003-07-26 15:37     ` Marco Roeland
  2 siblings, 1 reply; 10+ messages in thread
From: Eli Barzilay @ 2003-07-26 14:25 UTC
  To: Marco Roeland, Ben Greear; +Cc: Linux Kernel Development

On Jul 25, Marco Roeland wrote:
> >     len = select(fd + 1, NULL, &writefds, NULL, NULL);
> 
> A select with no timeout, so it will immediately return.

The man page says:

       timeout  is  an  upper bound on the amount of time elapsed
       before select returns. It may be zero, causing  select  to
       return immediately. (This is useful for polling.) If
       timeout is NULL (no timeout), select can block indefinitely.

But I did (obviously) try adding one just in case -- the problem does
not go away.
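To make the man page's three cases concrete, here is a small sketch, assuming a hypothetical helper `wait_ready()` that maps a negative timeout to a NULL pointer (block indefinitely), zero to a zeroed `struct timeval` (poll), and a positive value to a bounded wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable (for_write == 0) or
 * writable (for_write != 0).
 *   timeout_ms <  0 : pass NULL, i.e. block until ready
 *   timeout_ms == 0 : zero timeval, i.e. poll and return at once
 *   timeout_ms >  0 : wait at most that many milliseconds
 * Returns 1 if ready, 0 on timeout, -1 on error (errno set by select()). */
static int wait_ready(int fd, int for_write, long timeout_ms)
{
    fd_set set;
    struct timeval tv;
    struct timeval *tvp = NULL;
    int r;

    assert(fd >= 0 && fd < FD_SETSIZE); /* FD_SET is undefined outside this range */
    FD_ZERO(&set);
    FD_SET(fd, &set);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    r = select(fd + 1, for_write ? NULL : &set, for_write ? &set : NULL,
               NULL, tvp);
    if (r <= 0)
        return r;
    return FD_ISSET(fd, &set) ? 1 : 0;
}
```

On an empty pipe, for example, the zero and 20 ms cases return 0 for the read end, and after one byte is written the blocking case returns 1 right away. So a NULL timeout does not make select() return immediately; it returns early only when a descriptor is ready, or is reported as ready.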


> >     if (!FD_ISSET(fd,&writefds)) exit(0);
> 
> This might be what Solaris does differently, by _not_ including '1'
> in the returned descriptors? Linux will say (rightly) that a
> following call will not block, which is something very different
> than 'will not fail'!

I just added that when trying to trace the problem, after reading
somewhere that FD_ISSET must be used...  It never had any effect: the
program never exits, and it still busy-spins on Linux while running
fine on Solaris.


> >     len = write(fd, "hi\n", 3);
> 
> You don't check the return value here, but when you press Ctrl-S
> (stdout blocked) it will indicate an error (return value -1)
> with errno set to EAGAIN, meaning you should try again, which is the
> appropriate result for a non-blocking descriptor or socket.
> Anyway, the call "succeeds" and we loop back into the
> while(1), indeed as you say creating a busy loop.  No surprises
> there I'd say.

Uh, that's just a stripped down example -- in the original the
returned value is checked and the write is retried if the result is
EINTR.  The problem is that AFAICT, select should wait until the fd is
writable, but then write fails with EAGAIN, only to have the next
select succeed as if there were no problem.


> >   }
> >   fcntl(fd, F_SETFL, flags);
> > }
> 
> You might start by checking for EAGAIN as result of the write, and
> then reacting according to your needs (waiting a while or exiting
> the program or whatever).

Yeah, when the problem occurs, write will result in an EAGAIN, but
the next select should block until writing is ok.

When I played with this now I saw another strange thing -- when there
is a timeout in place, FD_ISSET *will* return 0 after some output
was done (probably when it's waiting for output).  So I thought that it
might be a good place to put a sleep, but the problem is that 0 is not
returned when the output is stopped.

This is the program:
======================================================================
#include <stdlib.h>      /* exit() */
#include <sys/select.h>  /* select(), fd_set, struct timeval */
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
int main() {
  int flags, fd, len; fd_set writefds;
  struct timeval timeout; timeout.tv_sec = 1; timeout.tv_usec = 0;
  fd = 1;                                   /* stdout */
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* make stdout non-blocking */
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, &timeout);
    if (len<0) exit(1);
    while (!FD_ISSET(fd,&writefds)) {       /* not reported writable: wait, ask again */
      sleep(1);
      FD_ZERO(&writefds);
      FD_SET(fd, &writefds);
      len = select(fd + 1, NULL, &writefds, NULL, &timeout);
      if (len<0) exit(1);
    }
    do {
      len = write(fd, "hi\n", 3);
    } while ((len == -1) && (errno == EINTR)); /* retry if interrupted */
    if (len<0 && errno!=EAGAIN) exit(2);    /* unexpected write error */
    /* if (len<0 && errno==EAGAIN) exit(3); */
  }
  fcntl(fd, F_SETFL, flags);                /* never reached */
}
======================================================================
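One detail worth keeping in mind when reading the program above: the Linux select(2) man page states that select() modifies the timeout to reflect the amount of time not slept, and POSIX allows but does not require this. The sketch below, with a hypothetical helper name, passes the caller's `struct timeval` straight through so the effect can be observed; it makes no claim that this explains the behaviour reported in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to *tv for fd to become readable, handing tv directly to select().
 * On Linux, select() writes the time not slept back into *tv, so after a
 * timeout *tv is {0, 0} and reusing it turns the next call into a poll.
 * Other systems may leave *tv unchanged. */
static int wait_readable_reusing(int fd, struct timeval *tv)
{
    fd_set rfds;

    assert(fd >= 0 && fd < FD_SETSIZE); /* FD_SET is undefined outside this range */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, tv);
}
```

Re-initializing `timeout` right before each select() call (or copying it into a local variable) keeps a program's behaviour independent of this platform difference.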


On Jul 25, Ben Greear wrote:
> I thought select is supposed to tell you when you can read/write at
> least something without failing.  Otherwise it would be worthless
> when doing non-blocking IO because you can both read and write w/out
> blocking at all times.

That was the point I was trying to make.


On Jul 26, Marco Roeland wrote:
> My 'analysis' was indeed based on experience with sockets, where you
> don't get the busy spin. It's indeed a bit baffling why select keeps
> insisting that fd 1 is writable. A quick test on kernel versions
> 2.2.12-20, 2.4.20 and 2.6.0-test1 all give the same results, so I
> suppose select itself is doing its expected duty, and that in that
> case the special underlying mechanics of stdout require special
> mechanics to find out if it's blocked?! Beats me, but that's pretty
> easy... ;-)

This doesn't solve the problem, and in any case the code would get
ugly with special cases for terminal output.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


* Re: Repost: Bug with select?
  2003-07-26  0:35   ` Philippe Troin
@ 2003-07-26 14:29     ` Eli Barzilay
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Barzilay @ 2003-07-26 14:29 UTC
  To: Philippe Troin; +Cc: linux-kernel

On Jul 25, Philippe Troin wrote:
> Looks like a bug to me.
> Strace says:
> 
> select(2, NULL, [1], NULL, NULL)        = 1 (out [1])
> write(1, "hi\n", 3)                     = -1 EAGAIN (Resource temporarily unavailable)
> 
> forever.
> 
> Then select() should not return fd 1 as writable, at least not
> repeatedly.

Exactly -- I didn't even think of using strace, where this is made
obvious.  (I don't have a Solaris machine where I can run strace, but I
wonder what it would show there.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


* Re: Repost: Bug with select?
  2003-07-26 14:25   ` Eli Barzilay
@ 2003-07-26 15:37     ` Marco Roeland
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Roeland @ 2003-07-26 15:37 UTC
  To: Eli Barzilay; +Cc: Linux Kernel Development

On Saturday 26 July 2003 at 10:25 Eli Barzilay wrote:

> ...

> I just added that when trying to trace the problem and reading
> somewhere that ISSET must be used...  It never had any effect -- never
> exits and otherwise the program is still on a busy spin in Linux and
> fine on Solaris.

After some more testing the behaviour here seems indeed a bit odd. For
what it's worth I just tested the program under IBM AIX 4.2 on an old
RS/6000 machine, and it doesn't busy spin there either.
-- 
Marco Roeland


* Re: Repost: Bug with select?
@ 2003-07-27 19:29 Manfred Spraul
  0 siblings, 0 replies; 10+ messages in thread
From: Manfred Spraul @ 2003-07-27 19:29 UTC
  To: Eli Barzilay, linux-kernel; +Cc: viro

[-- Attachment #1: Type: text/plain, Size: 553 bytes --]

Hi Eli,

The problem is normal_poll in drivers/char/n_tty.c:

 > if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS)
 >                mask |= POLLOUT | POLLWRNORM;

It assumes that a following write will succeed if fewer than 256 bytes
are in the write buffer right now. This assumption is wrong for the
console: if the console is stopped, con_write_room() reports 0 bytes
of room. Ditto for pty_write_room().

The attached patch fixes your test case, but I don't understand tty
devices well enough to guarantee anything.

--
    Manfred

[-- Attachment #2: patch-tty-fix --]
[-- Type: text/plain, Size: 403 bytes --]

--- 2.5/drivers/char/n_tty.c	2003-07-05 09:13:01.000000000 +0200
+++ build-2.5/drivers/char/n_tty.c	2003-07-27 20:44:58.000000000 +0200
@@ -1251,7 +1251,8 @@
 		else
 			tty->minimum_to_wake = 1;
 	}
-	if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS)
+	if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS &&
+			tty->driver->write_room(tty) > 0)
 		mask |= POLLOUT | POLLWRNORM;
 	return mask;
 }
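The effect of the patch can be modelled in user space as a pure function of the two driver callbacks. This is a simplified sketch, not kernel code: `pollout_before` and `pollout_after` are hypothetical names, the `chars_in_buffer()` and `write_room()` callbacks are replaced by plain integer arguments, and WAKEUP_CHARS is taken to be 256 as stated in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>

#define WAKEUP_CHARS 256 /* threshold quoted in the message above */

/* Model of the write-readiness test in normal_poll() before the patch:
 * report writable whenever fewer than WAKEUP_CHARS bytes are buffered. */
static int pollout_before(int chars_in_buffer, int write_room)
{
    (void)write_room; /* not consulted before the patch */
    return chars_in_buffer < WAKEUP_CHARS ? (POLLOUT | POLLWRNORM) : 0;
}

/* The same test after the patch: additionally require that the driver
 * reports room for at least one more byte. */
static int pollout_after(int chars_in_buffer, int write_room)
{
    return (chars_in_buffer < WAKEUP_CHARS && write_room > 0)
               ? (POLLOUT | POLLWRNORM) : 0;
}
```

For a stopped console (nothing buffered, `write_room` of 0) the old condition reports the descriptor writable and the new one does not, which is consistent with the busy loop seen after C-s.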

