linux-kernel.vger.kernel.org archive mirror
* UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-06 14:52 Joris van Rantwijk
  2004-10-06 15:01 ` David S. Miller
                   ` (4 more replies)
  0 siblings, 5 replies; 191+ messages in thread
From: Joris van Rantwijk @ 2004-10-06 14:52 UTC (permalink / raw)
  To: linux-kernel

Hello,

I have a problem where the sequence of events is as follows:
 - application does select() on a UDP socket descriptor
 - select returns success with descriptor ready for reading
 - application does recvfrom() on this descriptor and this recvfrom()
   blocks forever

My understanding of POSIX is limited, but it seems to me that a read call
must never block after select just said that it's ok to read from the
descriptor. So any such behaviour would be a kernel bug.

This problem occurs repeatedly, but only once per week on average so it is
hard to debug but definitely a real problem. I know for a fact that the
sequence of events is as described above from strace output. My kernel
version is 2.6.7.

From a brief look at the kernel UDP code, I suspect a problem in
net/ipv4/udp.c, udp_recvmsg(): it reads the first available datagram
from the queue, then checks the UDP checksum. If the UDP checksum fails at
this point, the datagram is discarded and the process blocks until the next
datagram arrives.

Could someone please help me track this problem?
Am I correct in my reasoning that the select() -> recvmsg() sequence must
never block?
If yes, is it possible that this problem is triggered by a failed UDP
checksum in the udp_recvmsg() function?
If yes, can we do something to fix this?

Thanks,
  Joris.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
@ 2004-10-06 15:01 ` David S. Miller
  2004-10-06 15:13   ` Chris Friesen
  2004-10-06 15:15   ` Richard B. Johnson
  2004-10-06 15:09 ` Richard B. Johnson
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 15:01 UTC (permalink / raw)
  To: Joris van Rantwijk; +Cc: linux-kernel

On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
Joris van Rantwijk <joris@eljakim.nl> wrote:

> My understanding of POSIX is limited, but it seems to me that a read call
> must never block after select just said that it's ok to read from the
> descriptor. So any such behaviour would be a kernel bug.

There is no such guarantee.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
  2004-10-06 15:01 ` David S. Miller
@ 2004-10-06 15:09 ` Richard B. Johnson
  2004-10-06 15:18 ` bert hubert
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-06 15:09 UTC (permalink / raw)
  To: Joris van Rantwijk; +Cc: linux-kernel

On Wed, 6 Oct 2004, Joris van Rantwijk wrote:

> Hello,
>
> I have a problem where the sequence of events is as follows:
> - application does select() on a UDP socket descriptor
> - select returns success with descriptor ready for reading
> - application does recvfrom() on this descriptor and this recvfrom()
>   blocks forever
>
> My understanding of POSIX is limited, but it seems to me that a read call
> must never block after select just said that it's ok to read from the
> descriptor. So any such behaviour would be a kernel bug.
>

Can you check to see if you have an exception at the same time?
Also, please make sure that the first parameter to select() is
the file-descriptor value + 1.  There are things like out-of-band
data that could show up ( MSG_OOB ) under select(). Also, recvfrom()
takes a lot of parameters that need to be correct. There is a buffer
length plus a pointer to a socklen_t variable. I've seen people
mess these up and have everything "work" except sometimes...

Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:01 ` David S. Miller
@ 2004-10-06 15:13   ` Chris Friesen
  2004-10-06 15:15   ` Richard B. Johnson
  1 sibling, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 15:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: Joris van Rantwijk, linux-kernel

David S. Miller wrote:
> On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
> Joris van Rantwijk <joris@eljakim.nl> wrote:
> 
> 
>>My understanding of POSIX is limited, but it seems to me that a read call
>>must never block after select just said that it's ok to read from the
>>descriptor. So any such behaviour would be a kernel bug.
> 
> 
> There is no such guarantee.

Hmm, the man page for select() says:

"Those  listed  in readfds will be watched to see if characters become available 
for reading (more precisely, to see if a read will not block)"

Maybe it needs changing?

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:01 ` David S. Miller
  2004-10-06 15:13   ` Chris Friesen
@ 2004-10-06 15:15   ` Richard B. Johnson
  2004-10-06 15:21     ` David S. Miller
  2004-10-06 15:30     ` Chris Friesen
  1 sibling, 2 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-06 15:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: Joris van Rantwijk, linux-kernel

On Wed, 6 Oct 2004, David S. Miller wrote:

> On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
> Joris van Rantwijk <joris@eljakim.nl> wrote:
>
>> My understanding of POSIX is limited, but it seems to me that a read call
>> must never block after select just said that it's ok to read from the
>> descriptor. So any such behaviour would be a kernel bug.
>
> There is no such guarantee.

Huh?  Then why would anybody use select()?  It can't return a
'guess' or it's broken. When select() or poll() claims that
there are data available, there damn well better be data available
or software becomes a crap-game. And, if you've decided that
such a game-of-chance is okay then please keep your software
out of the new Ethernet Avionics Buses that Airbus now is using.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
  2004-10-06 15:01 ` David S. Miller
  2004-10-06 15:09 ` Richard B. Johnson
@ 2004-10-06 15:18 ` bert hubert
  2004-10-06 16:41 ` Alan Cox
  2004-10-07 19:31 ` David Schwartz
  4 siblings, 0 replies; 191+ messages in thread
From: bert hubert @ 2004-10-06 15:18 UTC (permalink / raw)
  To: Joris van Rantwijk; +Cc: linux-kernel

On Wed, Oct 06, 2004 at 04:52:27PM +0200, Joris van Rantwijk wrote:
> Hello,
> 
> I have a problem where the sequence of events is as follows:
>  - application does select() on a UDP socket descriptor
>  - select returns success with descriptor ready for reading
>  - application does recvfrom() on this descriptor and this recvfrom()
>    blocks forever

This can happen, and is fully to be expected. For a host of reasons the
packet might not in fact appear. Whenever using select for non-blocking IO
always set your sockets to non-blocking as well.

One of the legitimate reasons is the reception of packets which, on copying,
turn out to have a bad checksum.

Stevens has a section on this, recommended reading.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO
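
A minimal sketch of the pattern recommended above, written in Python for brevity (the C equivalents are select(), fcntl() with O_NONBLOCK, and recvfrom()). The helper name recv_ready_datagram, the buffer size and the loopback setup are illustrative assumptions, not taken from this thread:

```python
import select
import socket


def recv_ready_datagram(sock, timeout=1.0, bufsize=65535):
    """Wait up to `timeout` seconds, then try one non-blocking receive.

    Returns (data, address), or None if nothing could be read.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # select() timed out
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:
        # select() reported readiness, but the datagram is gone (for
        # example, dropped for a bad checksum). Go back to select().
        return None


# Usage on the loopback interface; the port is chosen by the OS.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.setblocking(False)  # same effect as setting O_NONBLOCK
receiver.bind(("127.0.0.1", 0))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", receiver.getsockname())

result = recv_ready_datagram(receiver)
```

Because the socket is non-blocking, a stale readiness report costs one extra trip through select() rather than a process stuck in recvfrom().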

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:15   ` Richard B. Johnson
@ 2004-10-06 15:21     ` David S. Miller
  2004-10-06 15:29       ` Richard B. Johnson
                         ` (3 more replies)
  2004-10-06 15:30     ` Chris Friesen
  1 sibling, 4 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 15:21 UTC (permalink / raw)
  To: root; +Cc: joris, linux-kernel

On Wed, 6 Oct 2004 11:15:13 -0400 (EDT)
"Richard B. Johnson" <root@chaos.analogic.com> wrote:

> On Wed, 6 Oct 2004, David S. Miller wrote:
> 
> > On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
> > Joris van Rantwijk <joris@eljakim.nl> wrote:
> >
> >> My understanding of POSIX is limited, but it seems to me that a read call
> >> must never block after select just said that it's ok to read from the
> >> descriptor. So any such behaviour would be a kernel bug.
> >
> > There is no such guarantee.
> 
> Huh?  Then why would anybody use select()?  It can't return a
> 'guess' or it's broken. When select() or poll() claims that
> there are data available, there damn well better be data available
> or software becomes a crap-game.

So if select returns true, and another one of your threads
reads all the data from the file descriptor, what would you
like the behavior to be for the current thread when it calls
read?

So like I said, there is no such guarantee.
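
The race described here can be reproduced deterministically without threads. In this sketch a duplicated descriptor stands in for "another thread"; the variable names and the loopback setup are assumptions made for illustration only:

```python
import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
sock.setblocking(False)

# A second handle on the same socket stands in for "another thread".
other_reader = sock.dup()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"only datagram", sock.getsockname())

# select() truthfully reports that a datagram is queued right now.
readable, _, _ = select.select([sock], [], [], 1.0)
reported_ready = sock in readable

# The other reader gets there first and consumes the datagram.
stolen, _ = other_reader.recvfrom(2048)

try:
    sock.recvfrom(2048)
    outcome = "got data"
except BlockingIOError:
    # With a blocking socket this call would sleep instead of raising.
    outcome = "would block"
```

The readiness report was correct at the time it was made; it simply describes a past state of the queue, not a reservation for the caller.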

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:21     ` David S. Miller
@ 2004-10-06 15:29       ` Richard B. Johnson
  2004-10-06 15:42         ` David S. Miller
                           ` (2 more replies)
  2004-10-06 15:31       ` Chris Friesen
                         ` (2 subsequent siblings)
  3 siblings, 3 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-06 15:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: joris, linux-kernel

On Wed, 6 Oct 2004, David S. Miller wrote:

> On Wed, 6 Oct 2004 11:15:13 -0400 (EDT)
> "Richard B. Johnson" <root@chaos.analogic.com> wrote:
>
>> On Wed, 6 Oct 2004, David S. Miller wrote:
>>
>>> On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
>>> Joris van Rantwijk <joris@eljakim.nl> wrote:
>>>
>>>> My understanding of POSIX is limited, but it seems to me that a read call
>>>> must never block after select just said that it's ok to read from the
>>>> descriptor. So any such behaviour would be a kernel bug.
>>>
>>> There is no such guarantee.
>>
>> Huh?  Then why would anybody use select()?  It can't return a
>> 'guess' or it's broken. When select() or poll() claims that
>> there are data available, there damn well better be data available
>> or software becomes a crap-game.
>
> So if select returns true, and another one of your threads
> reads all the data from the file descriptor, what would you
> like the behavior to be for the current thread when it calls
> read?
>
> So like I said, there is no such guarantee.
>

Any code that uses select() on the same file-descriptor
for several threads is broken. You can't explain away
a select() problem with a bad-coding example. Somebody
else responded that a bad checksum could do the same
thing --not. Select must return correct information.
The user-code doesn't know about, doesn't care, and
in many cases can't find out about, the inner workings
of an operating system.

When a function call like select() says there are data
available, there must be data available, period. If
not, it's broken and needs to be fixed. Requiring
one to set sockets to non-blocking is a poor work-
around for an otherwise fatal flaw.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:15   ` Richard B. Johnson
  2004-10-06 15:21     ` David S. Miller
@ 2004-10-06 15:30     ` Chris Friesen
  1 sibling, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 15:30 UTC (permalink / raw)
  To: root; +Cc: David S. Miller, Joris van Rantwijk, linux-kernel

Richard B. Johnson wrote:
> On Wed, 6 Oct 2004, David S. Miller wrote:

>> There is no such guarantee.
> Huh?  Then why would anybody use select()?  

To tell you when to try a nonblocking read?

> It can't return a
> 'guess' or it's broken. When select() or poll() claims that
> there are data available, there damn well better be data available
> or software becomes a crap-game.

In the single-threaded case, where you are the only one touching the socket, I 
would expect this to be true.  But since I'm a belt-and-suspenders kind of guy, 
I usually use nonblocking reads anyway.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:21     ` David S. Miller
  2004-10-06 15:29       ` Richard B. Johnson
@ 2004-10-06 15:31       ` Chris Friesen
  2004-10-06 15:41         ` David S. Miller
  2004-10-06 15:59       ` Paul Jackson
  2004-10-06 16:35       ` Martijn Sipkema
  3 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 15:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: root, joris, linux-kernel

David S. Miller wrote:

> So if select returns true, and another one of your threads
> reads all the data from the file descriptor, what would you
> like the behavior to be for the current thread when it calls
> read?

What about the single-threaded case?

Chris


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:31       ` Chris Friesen
@ 2004-10-06 15:41         ` David S. Miller
  2004-10-06 16:07           ` Richard B. Johnson
  2004-10-06 16:57           ` Neil Horman
  0 siblings, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 15:41 UTC (permalink / raw)
  To: Chris Friesen; +Cc: root, joris, linux-kernel

On Wed, 06 Oct 2004 09:31:46 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> David S. Miller wrote:
> 
> > So if select returns true, and another one of your threads
> > reads all the data from the file descriptor, what would you
> > like the behavior to be for the current thread when it calls
> > read?
> 
> What about the single-threaded case?

Incorrect UDP checksums could cause the read data to
be discarded.  We do the copy into userspace and checksum
computation in parallel.  This is totally legal and we've
been doing it since 2.4.x first got released.

Use non-blocking sockets with select()/poll() and be happy.
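
To make the mechanism concrete, here is a toy user-space model of a receive queue that reports readiness before verifying checksums. It is not the kernel's udp_recvmsg(); the class and method names are invented for illustration, and the checksum covers only the payload (real UDP also covers a pseudo-header):

```python
from collections import deque


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


class LazyChecksumQueue:
    """Toy receive queue: readiness is reported before verification."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, payload: bytes, checksum: int) -> None:
        self._queue.append((payload, checksum))  # no verification yet

    def poll_readable(self) -> bool:
        return bool(self._queue)  # what select() would report

    def recv(self):
        """Verify while 'copying'; drop bad datagrams and keep looking.

        Returns a payload, or None where a blocking socket would sleep.
        """
        while self._queue:
            payload, checksum = self._queue.popleft()
            if internet_checksum(payload) == checksum:
                return payload
        return None


q = LazyChecksumQueue()
# Flip one bit of the correct checksum so verification must fail.
q.enqueue(b"corrupted", internet_checksum(b"corrupted") ^ 1)
ready_before = q.poll_readable()  # True: something is queued
received = q.recv()               # None: the only datagram was discarded
```

In this model the readable report and the empty read are both "honest"; the gap between them is the deferred verification.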

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:29       ` Richard B. Johnson
@ 2004-10-06 15:42         ` David S. Miller
  2004-10-06 15:57           ` Chris Friesen
  2004-10-06 15:44         ` Lars Marowsky-Bree
  2004-10-07  1:16         ` Paul Jakma
  2 siblings, 1 reply; 191+ messages in thread
From: David S. Miller @ 2004-10-06 15:42 UTC (permalink / raw)
  To: root; +Cc: joris, linux-kernel

On Wed, 6 Oct 2004 11:29:22 -0400 (EDT)
"Richard B. Johnson" <root@chaos.analogic.com> wrote:

> Somebody else responded that a bad checksum could do the same
> thing --not. Select must return correct information.

Guess what, our UDP implementation does exactly that
and has done so for years.  It's perfectly fine.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:29       ` Richard B. Johnson
  2004-10-06 15:42         ` David S. Miller
@ 2004-10-06 15:44         ` Lars Marowsky-Bree
  2004-10-07  1:16         ` Paul Jakma
  2 siblings, 0 replies; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-06 15:44 UTC (permalink / raw)
  To: Richard B. Johnson, David S. Miller; +Cc: joris, linux-kernel

On 2004-10-06T11:29:22, "Richard B. Johnson" <root@chaos.analogic.com> wrote:

> Any code that uses select() on the same file-descriptor
> for several threads is broken. You can't explain away
> a select() problem with a bad-coding example.

Any code which expects non-blocking behaviour without O_NONBLOCK is
broken.

Go away.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:42         ` David S. Miller
@ 2004-10-06 15:57           ` Chris Friesen
  0 siblings, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 15:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: root, joris, linux-kernel

David S. Miller wrote:
> "Richard B. Johnson" <root@chaos.analogic.com> wrote:

>>Somebody else responded that a bad checksum could do the same
>>thing --not. Select must return correct information.

> Guess what, our UDP implementation does exactly that
> and has done so for years.  It's perfectly fine.

We may want to change the man page for select() so that this is made clear.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:21     ` David S. Miller
  2004-10-06 15:29       ` Richard B. Johnson
  2004-10-06 15:31       ` Chris Friesen
@ 2004-10-06 15:59       ` Paul Jackson
  2004-10-06 16:35       ` Martijn Sipkema
  3 siblings, 0 replies; 191+ messages in thread
From: Paul Jackson @ 2004-10-06 15:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: root, joris, linux-kernel

David wrote:
> So like I said, there is no such guarantee.

The select(2) man page states:

  more precisely, to see if a read will not block

It doesn't say _which_ read won't block.  Seems to me that the
successful non-block read in the other thread qualifies as the
promised non-blocking read.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:41         ` David S. Miller
@ 2004-10-06 16:07           ` Richard B. Johnson
  2004-10-06 16:57           ` Neil Horman
  1 sibling, 0 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-06 16:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: Chris Friesen, joris, linux-kernel

On Wed, 6 Oct 2004, David S. Miller wrote:

> On Wed, 06 Oct 2004 09:31:46 -0600
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:
>
>> David S. Miller wrote:
>>
>>> So if select returns true, and another one of your threads
>>> reads all the data from the file descriptor, what would you
>>> like the behavior to be for the current thread when it calls
>>> read?
>>
>> What about the single-threaded case?
>
> Incorrect UDP checksums could cause the read data to
> be discarded.  We do the copy into userspace and checksum
> computation in parallel.  This is totally legal and we've
> been doing it since 2.4.x first got released.
>
> Use non-blocking sockets with select()/poll() and be happy.

Gawd. This is damn awful. How could this possibly be justified?
You can't have a system call that lies. We already have an OS
that does that. Certainly, no other Unix OS in the past has
thrown away integrity with such aplomb. Next, in the interest
of "performance", you'll probably only occasionally provide
file-data, as well.

You can't do this. When there is some well-defined procedure
such as select() or poll(), that is designed to provide a
reliable way of knowing that a read will succeed, you or
anybody else don't have the authority to declare that it's
not important to actually have it work.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:21     ` David S. Miller
                         ` (2 preceding siblings ...)
  2004-10-06 15:59       ` Paul Jackson
@ 2004-10-06 16:35       ` Martijn Sipkema
  3 siblings, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-06 16:35 UTC (permalink / raw)
  To: David S. Miller, root; +Cc: joris, linux-kernel

From: "David S. Miller" <davem@davemloft.net>
Sent: Wednesday, October 06, 2004 16:21
> On Wed, 6 Oct 2004 11:15:13 -0400 (EDT)
> "Richard B. Johnson" <root@chaos.analogic.com> wrote:
> 
> > On Wed, 6 Oct 2004, David S. Miller wrote:
> > 
> > > On Wed, 6 Oct 2004 16:52:27 +0200 (CEST)
> > > Joris van Rantwijk <joris@eljakim.nl> wrote:
> > >
> > >> My understanding of POSIX is limited, but it seems to me that a read call
> > >> must never block after select just said that it's ok to read from the
> > >> descriptor. So any such behaviour would be a kernel bug.
> > >
> > > There is no such guarantee.
> > 
> > Huh?  Then why would anybody use select()?  It can't return a
> > 'guess' or it's broken. When select() or poll() claims that
> > there are data available, there damn well better be data available
> > or software becomes a crap-game.
> 
> So if select returns true, and another one of your threads
> reads all the data from the file descriptor, what would you
> like the behavior to be for the current thread when it calls
> read?
> 
> So like I said, there is no such guarantee.

Perhaps you should have elaborated then, because this is obviously
not what was meant.

--ms



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
                   ` (2 preceding siblings ...)
  2004-10-06 15:18 ` bert hubert
@ 2004-10-06 16:41 ` Alan Cox
  2004-10-06 18:04   ` Joris van Rantwijk
  2004-10-07 19:31 ` David Schwartz
  4 siblings, 1 reply; 191+ messages in thread
From: Alan Cox @ 2004-10-06 16:41 UTC (permalink / raw)
  To: Joris van Rantwijk; +Cc: Linux Kernel Mailing List

On Mer, 2004-10-06 at 15:52, Joris van Rantwijk wrote:
> My understanding of POSIX is limited, but it seems to me that a read call
> must never block after select just said that it's ok to read from the
> descriptor. So any such behaviour would be a kernel bug.

Select indicates there may be data. That is all - it might also be an
error, it might turn out to be wrong.

You should always combine select with nonblocking I/O


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:41         ` David S. Miller
  2004-10-06 16:07           ` Richard B. Johnson
@ 2004-10-06 16:57           ` Neil Horman
  1 sibling, 0 replies; 191+ messages in thread
From: Neil Horman @ 2004-10-06 16:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: Chris Friesen, root, joris, linux-kernel

David S. Miller wrote:
> On Wed, 06 Oct 2004 09:31:46 -0600
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:
> 
> 
>>David S. Miller wrote:
>>
>>
>>>So if select returns true, and another one of your threads
>>>reads all the data from the file descriptor, what would you
>>>like the behavior to be for the current thread when it calls
>>>read?
>>
>>What about the single-threaded case?
> 
> 
> Incorrect UDP checksums could cause the read data to
> be discarded.  We do the copy into userspace and checksum
> computation in parallel.  This is totally legal and we've
> been doing it since 2.4.x first got released.
> 
> Use non-blocking sockets with select()/poll() and be happy.


I think you could also pass the MSG_ERRQUEUE flag to the recvfrom call 
and receive the errored frame, eliminating the case where errored frames 
might cause you to block on a read after a good return from a select call.
Neil
-- 
/***************************************************
  *Neil Horman
  *Software Engineer
  *Red Hat, Inc.
  *nhorman@redhat.com
  *gpg keyid: 1024D / 0x92A74FA1
  *http://pgp.mit.edu
  ***************************************************/

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 16:41 ` Alan Cox
@ 2004-10-06 18:04   ` Joris van Rantwijk
  2004-10-06 19:30     ` Andries Brouwer
  0 siblings, 1 reply; 191+ messages in thread
From: Joris van Rantwijk @ 2004-10-06 18:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Hello,

Many thanks to you and everybody else for their helpful comments.

On Wed, 6 Oct 2004, Alan Cox wrote:
> On Mer, 2004-10-06 at 15:52, Joris van Rantwijk wrote:
> > My understanding of POSIX is limited, but it seems to me that a read call
> > must never block after select just said that it's ok to read from the
> > descriptor. So any such behaviour would be a kernel bug.
>
> Select indicates there may be data. That is all - it might also be an
> error, it might turn out to be wrong.
>
> You should always combine select with nonblocking I/O

Ok, thanks. It turns out now that I (and a lot of people with me) have
always been wrong about this. I will go fix the application (dnsmasq) and
try to get the fix to the author.

Sorry about my wrongly blaming the kernel. I do think this issue shows that
the select(2) manual needs fixing.

For clarity, I'd like to point out that my case has nothing to do with
multi-threading. Using select from multiple threads is a totally different
sort of mistake.

Bye,
  Joris.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:30     ` Andries Brouwer
@ 2004-10-06 19:23       ` Alan Cox
  2004-10-06 22:08         ` Martijn Sipkema
  2004-10-06 19:43       ` Hua Zhong
  1 sibling, 1 reply; 191+ messages in thread
From: Alan Cox @ 2004-10-06 19:23 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Joris van Rantwijk, Linux Kernel Mailing List

On Mer, 2004-10-06 at 20:30, Andries Brouwer wrote:
>        A  descriptor shall be considered ready for reading when a
>        call to an input function with O_NONBLOCK clear would  not
>        block,  whether  or  not  the function would transfer data
>        successfully. (The function might return data, an  end-of-
>        file  indication,  or  an  error other than one indicating
>        that it is  blocked,  and  in  each  of  these  cases  the
>        descriptor shall be considered ready for reading.)
> 
> As far as I can interpret these sentences, Linux does not conform.

Nor does anything else in that case. I guess we need a POSIX_ME_HARDER
socket option.

As to the Stevens reference - Stevens says nothing about read but does
mention the problem of accept, which is one of the "can't fix" type
examples.

	Connection setup pending
		select returns
	Connection destroyed
		accept blocks

Alan
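
The accept() case can be guarded the same way. The sketch below assumes a non-blocking listening socket; accept_if_pending is a hypothetical helper name, and which error (if any) a vanished connection produces varies between systems:

```python
import select
import socket


def accept_if_pending(listener, timeout=1.0):
    """Return (connection, address) or None; never sleep inside accept()."""
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return None  # nothing pending within the timeout
    try:
        return listener.accept()
    except (BlockingIOError, ConnectionAbortedError):
        # The pending connection vanished between select() and accept().
        return None


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
listener.setblocking(False)

nothing_pending = accept_if_pending(listener, timeout=0.1)

client = socket.create_connection(listener.getsockname())
accepted = accept_if_pending(listener)
```

With a blocking listener, the same sequence could leave the process asleep in accept() until some later client connects.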


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 18:04   ` Joris van Rantwijk
@ 2004-10-06 19:30     ` Andries Brouwer
  2004-10-06 19:23       ` Alan Cox
  2004-10-06 19:43       ` Hua Zhong
  0 siblings, 2 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-06 19:30 UTC (permalink / raw)
  To: Joris van Rantwijk; +Cc: Alan Cox, Linux Kernel Mailing List

On Wed, Oct 06, 2004 at 08:04:29PM +0200, Joris van Rantwijk wrote:

> > On Mer, 2004-10-06 at 15:52, Joris van Rantwijk wrote:
> > > My understanding of POSIX is limited, but it seems to me that a read call
> > > must never block after select just said that it's ok to read from the
> > > descriptor. So any such behaviour would be a kernel bug.

Alan answers - and I don't like his answer a bit:

> > Select indicates there may be data. That is all - it might also be an
> > error, it might turn out to be wrong.
> >
> > You should always combine select with nonblocking I/O

Joris replies again:

> Sorry about my wrongly blaming the kernel. I do think this issue shows that
> the select(2) manual needs fixing.

It may need fixing in the sense that it must point out that the Linux kernel
might not conform to POSIX in its handling of select on sockets.

We now not only have "man 2 select", but also "man 3p select".
This is the POSIX text:

       A  descriptor shall be considered ready for reading when a
       call to an input function with O_NONBLOCK clear would  not
       block,  whether  or  not  the function would transfer data
       successfully. (The function might return data, an  end-of-
       file  indication,  or  an  error other than one indicating
       that it is  blocked,  and  in  each  of  these  cases  the
       descriptor shall be considered ready for reading.)

As far as I can interpret these sentences, Linux does not conform.

Andries


Neil Horman wrote:

>> I think you could also pass the MSG_ERRQUEUE flag to the recvfrom call 
>> and receive the errored frame, eliminating the case where errored frames 
>> might cause you to block on a read after a good return from a select call.

davem wrote:

>> There is no such guarantee.

>> Incorrect UDP checksums could cause the read data to
>> be discarded.  We do the copy into userspace and checksum
>> computation in parallel.  This is totally legal and we've
>> been doing it since 2.4.x first got released.


ahu wrote:

>> This can happen, and is fully to be expected. For a host of reasons the
>> packet might not in fact appear. Whenever using select for non-blocking IO
>> always set your sockets to non-blocking as well.

>> One of the legitimate reasons is the reception of packets which, on copying,
>> turn out to have a bad checksum.

>> Stevens has a section on this, recommended reading.

Reference?

^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:30     ` Andries Brouwer
  2004-10-06 19:23       ` Alan Cox
@ 2004-10-06 19:43       ` Hua Zhong
  2004-10-06 19:54         ` Chris Friesen
                           ` (2 more replies)
  1 sibling, 3 replies; 191+ messages in thread
From: Hua Zhong @ 2004-10-06 19:43 UTC (permalink / raw)
  To: 'Andries Brouwer', 'Joris van Rantwijk'
  Cc: 'Alan Cox', 'Linux Kernel Mailing List'

> It may need fixing in the sense that it must point out that 
> the Linux kernel
> might not conform to POSIX in its handling of select on sockets.

Agreed.

> We now not only have "man 2 select", but also "man 3p select".
> This is the POSIX text:
> 
>        A  descriptor shall be considered ready for reading when a
>        call to an input function with O_NONBLOCK clear would  not
>        block,  whether  or  not  the function would transfer data
>        successfully. (The function might return data, an  end-of-
>        file  indication,  or  an  error other than one indicating
>        that it is  blocked,  and  in  each  of  these  cases  the
>        descriptor shall be considered ready for reading.)
> 
> As far as I can interpret these sentences, Linux does not conform.

How hard is it to treat the next read to the fd as NON_BLOCKING, even if
it's not set?

> Andries
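
A user-space approximation of this idea already exists: the MSG_DONTWAIT flag makes a single receive call non-blocking without changing the socket's mode. The sketch below assumes a Unix-like system (the flag is not available everywhere); recv_nowait is a hypothetical helper name:

```python
import select
import socket


def recv_nowait(sock, bufsize=65535):
    """One receive attempt that cannot block, even on a blocking socket."""
    try:
        return sock.recvfrom(bufsize, socket.MSG_DONTWAIT)
    except BlockingIOError:
        return None  # EAGAIN/EWOULDBLOCK: nothing to read right now


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # blocking mode
sock.bind(("127.0.0.1", 0))

empty = recv_nowait(sock)  # returns at once instead of sleeping

sock.sendto(b"x", sock.getsockname())     # send a datagram to ourselves
select.select([sock], [], [], 1.0)        # wait until it is queued
got = recv_nowait(sock)
```

The caller still has to handle the "nothing there" result, which is the same obligation EAGAIN places on code that uses O_NONBLOCK.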


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:43       ` Hua Zhong
@ 2004-10-06 19:54         ` Chris Friesen
  2004-10-06 19:59           ` Hua Zhong
  2004-10-06 20:06           ` David S. Miller
  2004-10-06 20:06         ` Olivier Galibert
  2004-10-06 20:41         ` Neil Horman
  2 siblings, 2 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 19:54 UTC (permalink / raw)
  To: hzhong
  Cc: 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

Hua Zhong wrote:

> How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> it's not set?

Userspace likely would not properly handle EAGAIN on a nonblocking socket.

As far as I can tell, either you block, or you have to scan the checksum before 
select() returns.

Would it be so bad to do the checksum before marking the socket readable? 
Chances are we're going to receive the message "soon" anyways, so there is at 
least a chance it will stay hot in the cache, no?

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:54         ` Chris Friesen
@ 2004-10-06 19:59           ` Hua Zhong
  2004-10-06 20:10             ` Chris Friesen
  2004-10-06 20:06           ` David S. Miller
  1 sibling, 1 reply; 191+ messages in thread
From: Hua Zhong @ 2004-10-06 19:59 UTC (permalink / raw)
  To: 'Chris Friesen'
  Cc: 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

> Hua Zhong wrote:
> 
> > How hard is it to treat the next read to the fd as 
> NON_BLOCKING, even if
> > it's not set?
> 
> Userspace likely would not properly handle EAGAIN on a 
> nonblocking socket.

But it's better than blocking the call, isn't it?

If the caller is using NON_BLOCKING already, no change in behavior,
otherwise it returns an error which the app may or may not handle, instead
of blocking it (which is usually fatal). Plus it hopefully gives Posix
compliance.

I can see there could be remote DoS attacks by just sending malformed UDP
packets.
 
> As far as I can tell, either you block, or you have to scan 
> the checksum before 
> select() returns.
> 
> Would it be so bad to do the checksum before marking the 
> socket readable? 
> Chances are we're going to receive the message "soon" 
> anyways, so there is at 
> least a chance it will stay hot in the cache, no?
> 
> Chris
> 


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:43       ` Hua Zhong
  2004-10-06 19:54         ` Chris Friesen
@ 2004-10-06 20:06         ` Olivier Galibert
  2004-10-06 23:35           ` David S. Miller
  2004-10-06 20:41         ` Neil Horman
  2 siblings, 1 reply; 191+ messages in thread
From: Olivier Galibert @ 2004-10-06 20:06 UTC (permalink / raw)
  To: 'Linux Kernel Mailing List'

On Wed, Oct 06, 2004 at 12:43:27PM -0700, Hua Zhong wrote:
> How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> it's not set?

Programs don't expect EAGAIN from blocking sockets.

  OG.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:54         ` Chris Friesen
  2004-10-06 19:59           ` Hua Zhong
@ 2004-10-06 20:06           ` David S. Miller
  2004-10-06 20:18             ` Chris Friesen
  1 sibling, 1 reply; 191+ messages in thread
From: David S. Miller @ 2004-10-06 20:06 UTC (permalink / raw)
  To: Chris Friesen; +Cc: hzhong, aebr, joris, alan, linux-kernel

On Wed, 06 Oct 2004 13:54:46 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> Would it be so bad to do the checksum before marking the socket readable? 

Yes, because if we do that we have to make two passes over the
data instead of one.  It does make a big difference.
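
To illustrate the cost being described, here is a rough userspace sketch,
assuming the issue is that the payload would be walked once to verify the
checksum and a second time to copy it to the caller. The names
csum_then_copy() and copy_and_csum() are hypothetical and are not the
kernel's routines; both compute the 16-bit one's-complement Internet
checksum over big-endian words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Two passes over the data: checksum everything, then copy everything.
 * len is assumed to be at most 65535, as for a UDP datagram. */
static uint16_t csum_then_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i + 1 < len; i += 2)              /* pass 1: checksum */
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    if (i < len)
        sum += (uint32_t)src[i] << 8;             /* odd trailing byte */
    memcpy(dst, src, len);                        /* pass 2: copy */
    return csum_fold(sum);
}

/* One pass: each word is copied and added to the sum while it is in hand. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return csum_fold(sum);
}
```

Both helpers return the same sum and leave the same bytes in dst; the only
difference is how many times the source buffer is read. Verifying before
marking the socket readable would force the first shape, because the copy
cannot happen until the application calls recvmsg().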

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:59           ` Hua Zhong
@ 2004-10-06 20:10             ` Chris Friesen
  2004-10-06 21:45               ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 20:10 UTC (permalink / raw)
  To: hzhong
  Cc: 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

Hua Zhong wrote:

>>>How hard is it to treat the next read to the fd as 
>>NON_BLOCKING, even if it's not set?
>>
>>Userspace likely would not properly handle EAGAIN on a 
>>nonblocking socket.

I meant blocking, of course, but you caught that.

> But it's better than blocking the call, isn't it?
> 
> If the caller is using NON_BLOCKING already, no change in behavior,
> otherwise it returns an error which the app may or may not handle, instead
> of blocking it (which is usually fatal). Plus it hopefully gives Posix
> compliance.

From what Andries posted, we can't block.  If select says it's readable, we can 
"return data, an  end-of-file  indication,  or  an  error other than one 
indicating that it is  blocked".

We have no data, network sockets don't have end-of-file indication (or would 
returning a length of zero count?), and there is no other suitable errno that I saw.

> I can see there could be remote DoS attacks by just sending malformed UDP
> packets.

Yep.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:06           ` David S. Miller
@ 2004-10-06 20:18             ` Chris Friesen
  2004-10-06 20:26               ` Hua Zhong
  2004-10-06 20:38               ` Andries Brouwer
  0 siblings, 2 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 20:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: hzhong, aebr, joris, alan, linux-kernel

David S. Miller wrote:
> On Wed, 06 Oct 2004 13:54:46 -0600
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:
> 
> 
>>Would it be so bad to do the checksum before marking the socket readable? 
> 
> 
> Yes, because if we do that we have to make two passes over the
> data instead of one.  It does make a big difference.

Hmm...no easy solution then.

In any case, the current behaviour is not compliant with the POSIX text that 
Andries posted.  Perhaps this should be documented somewhere?

Alternately, how about having the recvmsg() call return a zero, and (if 
appropriate) the length of the name set to zero?  This appears to comply with 
the man page for recvmsg().

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 22:08         ` Martijn Sipkema
@ 2004-10-06 20:25           ` Alan Cox
  2004-10-06 22:15             ` Andries Brouwer
  2004-10-06 23:11             ` Willy Tarreau
  0 siblings, 2 replies; 191+ messages in thread
From: Alan Cox @ 2004-10-06 20:25 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Andries Brouwer, Joris van Rantwijk, Linux Kernel Mailing List

On Mer, 2004-10-06 at 23:08, Martijn Sipkema wrote:
> > Nor does anything else in that case. I guess we need a POSIX_ME_HARDER
> > socket option.
> 
> The default should be a POSIX compliant socket IMHO; a POSIX_ME_NOT
> option could provide better performance.

The current setup has so far been found to break one app, after what,
three years? It can almost double performance. In this case it is very
much POSIX_ME_HARDER, and perhaps longer term suggests the posix/sus
people should revisit their API design.



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:18             ` Chris Friesen
@ 2004-10-06 20:26               ` Hua Zhong
  2004-10-06 20:38               ` Andries Brouwer
  1 sibling, 0 replies; 191+ messages in thread
From: Hua Zhong @ 2004-10-06 20:26 UTC (permalink / raw)
  To: 'Chris Friesen', 'David S. Miller'
  Cc: aebr, joris, alan, linux-kernel

> From what Andries posted, we can't block.  If select says
> it's readable, we can "return data, an  end-of-file  indication, 
> or  an  error other than one indicating that it is  blocked".

Arrrrgh..I misread it as "or an error indicating that it is blocked"..

So treating it simply as NON_BLOCKING isn't right.

> Hmm...no easy solution then.
> 
> In any case, the current behaviour is not compliant with the 
> POSIX text that Andries posted.  Perhaps this should be 
> documented somewhere?
> Alternately, how about having the recvmsg() call return a 
> zero, and (if appropriate) the length of the name set to zero?  This 
> appears to comply with the man page for recvmsg().

If it's a read, returning zero means end-of-file. Not sure what it means 
when recvmsg returns zero..But is this just the problem of UDP or 
recvmsg? I doubt it.

So I guess the easiest solution is just admit Linux select 
isn't posix compliant.

> Chris


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:18             ` Chris Friesen
  2004-10-06 20:26               ` Hua Zhong
@ 2004-10-06 20:38               ` Andries Brouwer
  2004-10-06 20:58                 ` Joris van Rantwijk
                                   ` (2 more replies)
  1 sibling, 3 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-06 20:38 UTC (permalink / raw)
  To: Chris Friesen; +Cc: David S. Miller, hzhong, aebr, joris, alan, linux-kernel

On Wed, Oct 06, 2004 at 02:18:23PM -0600, Chris Friesen wrote:

> In any case, the current behaviour is not compliant with the POSIX text 
> that Andries posted.  Perhaps this should be documented somewhere?

For the time being I wrote (in select.2)

BUGS
       It has been reported (Linux 2.6) that select may report  a
       socket  file descriptor as "ready for reading", while nev-
       ertheless a subsequent read  blocks.  This  could  perhaps
       happen  when  data  has  arrived  but upon examination has
       wrong checksum and is discarded. Thus it may be  safer  to
       use non-blocking I/O.

(I have not yet investigated, just read the lk posts. Does this
really happen? All kernel versions? Is this the explanation for
the reported behaviour?)
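
One way to follow the "safer to use non-blocking I/O" advice without
changing the descriptor's mode is to pass MSG_DONTWAIT on each receive.
The following is only a minimal sketch under that assumption; recv_ready()
is a hypothetical helper name, not something from the man page or the
kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Wait up to timeout_ms for fd to become readable, then attempt one
 * receive that cannot block.  Returns the datagram length, 0 if nothing
 * usable arrived (timeout, or select() reported "ready" but no datagram
 * could be delivered), or -1 with errno set on a real error.
 * Note: a zero-length datagram is also reported as 0 in this sketch.
 */
static ssize_t recv_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    ssize_t n;
    int rc;

    assert(buf != NULL);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;                      /* 0 = timeout, -1 = error */

    /* MSG_DONTWAIT makes this single call non-blocking even though the
     * socket itself is still in blocking mode. */
    n = recv(fd, buf, len, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                       /* "ready" turned out to be stale */
    return n;
}
```

A caller would simply loop on recv_ready() and treat a return of 0 as
"nothing yet", which is the case the BUGS text warns about.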

> Alternately, how about having the recvmsg() call return a zero, and (if 
> appropriate) the length of the name set to zero?  This appears to comply 
> with the man page for recvmsg().

Returning 0 for a read signifies end-of-file. Not what you want.

Andries

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:43       ` Hua Zhong
  2004-10-06 19:54         ` Chris Friesen
  2004-10-06 20:06         ` Olivier Galibert
@ 2004-10-06 20:41         ` Neil Horman
  2004-10-06 22:27           ` Chris Friesen
  2 siblings, 1 reply; 191+ messages in thread
From: Neil Horman @ 2004-10-06 20:41 UTC (permalink / raw)
  To: hzhong
  Cc: 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

Hua Zhong wrote:
>>It may need fixing in the sense that it must point out that 
>>the Linux kernel
>>might not conform to POSIX in its handling of select on sockets.
> 
> 
> Agreed.
> 
> 
>>We now not only have "man 2 select", but also "man 3p select".
>>This is the POSIX text:
>>
>>       A  descriptor shall be considered ready for reading when a
>>       call to an input function with O_NONBLOCK clear would  not
>>       block,  whether  or  not  the function would transfer data
>>       successfully. (The function might return data, an  end-of-
>>       file  indication,  or  an  error other than one indicating
>>       that it is  blocked,  and  in  each  of  these  cases  the
>>       descriptor shall be considered ready for reading.)
>>
>>As far as I can interpret these sentences, Linux does not conform.
> 

Again, shouldn't this just mean that recvfrom should not be called 
without the MSG_ERRQUEUE flag set?  From the above description, I read 
that select returning with an indication that a descriptor is ready for 
reading could mean that it has an error message queued to it, rather 
than in-band data.  If you then call recvfrom/recv/recvmsg without 
indicating that you want to receive the error indications as well, then 
isn't that just another example of improper coding, since recvfrom would 
have returned immediately, as select indicated, had the appropriate read 
flags been set?

Neil
> 
> How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> it's not set?
> 
> 
>>Andries
> 
> 


-- 
/***************************************************
  *Neil Horman
  *Software Engineer
  *Red Hat, Inc.
  *nhorman@redhat.com
  *gpg keyid: 1024D / 0x92A74FA1
  *http://pgp.mit.edu
  ***************************************************/

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:38               ` Andries Brouwer
@ 2004-10-06 20:58                 ` Joris van Rantwijk
  2004-10-06 22:29                 ` David S. Miller
  2004-10-07 16:08                 ` Adrian Phillips
  2 siblings, 0 replies; 191+ messages in thread
From: Joris van Rantwijk @ 2004-10-06 20:58 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: linux-kernel


On Wed, 6 Oct 2004, Andries Brouwer wrote:
> Does this really happen?

Yes. Finally got my raw-udp-with-wrong-checksum sending program to work
over localhost and it hangs recvfrom pretty good.
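
The test program itself was not posted. As a rough illustration of what
such a sender has to construct, the sketch below computes a UDP checksum
over the IPv4 pseudo-header and then deliberately corrupts it. All helper
names are hypothetical, and actually transmitting the result would need a
raw socket (and the privileges for one), which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add len bytes to a running one's-complement sum, as big-endian words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;
    return sum;
}

/* UDP checksum over pseudo-header + UDP header + payload (RFC 768).
 * seg points at the UDP header.  With the checksum field zeroed this
 * yields the value to send; with the field filled in, 0 means "valid". */
static uint16_t udp_csum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                         const uint8_t *seg, uint16_t seg_len)
{
    uint8_t pseudo[12];
    uint32_t sum;

    memcpy(pseudo, src_ip, 4);
    memcpy(pseudo + 4, dst_ip, 4);
    pseudo[8] = 0;
    pseudo[9] = 17;                               /* protocol: UDP */
    pseudo[10] = (uint8_t)(seg_len >> 8);
    pseudo[11] = (uint8_t)(seg_len & 0xff);

    sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, seg, seg_len);
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Write UDP header + payload into out and return the segment length.
 * If corrupt is nonzero, the checksum is made wrong on purpose. */
static uint16_t build_udp(uint8_t *out,
                          const uint8_t src_ip[4], const uint8_t dst_ip[4],
                          uint16_t sport, uint16_t dport,
                          const uint8_t *payload, uint16_t plen, int corrupt)
{
    uint16_t seg_len = (uint16_t)(8 + plen);
    uint16_t c;

    assert(out != NULL);
    out[0] = (uint8_t)(sport >> 8);   out[1] = (uint8_t)(sport & 0xff);
    out[2] = (uint8_t)(dport >> 8);   out[3] = (uint8_t)(dport & 0xff);
    out[4] = (uint8_t)(seg_len >> 8); out[5] = (uint8_t)(seg_len & 0xff);
    out[6] = 0;                       out[7] = 0;
    memcpy(out + 8, payload, plen);

    c = udp_csum(src_ip, dst_ip, out, seg_len);
    if (c == 0)
        c = 0xffff;           /* RFC 768: a zero sum is sent as all ones */
    if (corrupt) {
        c ^= 0x5555;
        if (c == 0)
            c = 0x0001;       /* 0 would mean "no checksum", not "bad" */
    }
    out[6] = (uint8_t)(c >> 8);
    out[7] = (uint8_t)(c & 0xff);
    return seg_len;
}
```

A receiver that runs udp_csum() over the corrupted segment gets a nonzero
result, which is the condition under which the datagram is discarded.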

> All kernel versions?

Quick guess: probably since late 2.4. Source of 2.4.27 udp.c is similar to
2.6.9, but 2.4.17 returns EAGAIN even for blocking sockets; apparently
this was "fixed" later on.

> Is this the explanation for the reported behaviour?)

Very hard to say since I can't predict when it will occur again.

Joris.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:10             ` Chris Friesen
@ 2004-10-06 21:45               ` Martijn Sipkema
  2004-10-06 23:35                 ` David S. Miller
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-06 21:45 UTC (permalink / raw)
  To: Chris Friesen, hzhong
  Cc: 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

From: "Chris Friesen" <cfriesen@nortelnetworks.com>
Sent: Wednesday, October 06, 2004 21:10
[...]
> From what Andries posted, we can't block.  If select says it's readable, we can 
> "return data, an  end-of-file  indication,  or  an  error other than one 
> indicating that it is  blocked".
> 
> We have no data, network sockets don't have end-of-file indication (or would 
> returning a length of zero count?), and there is no other suitable errno that I saw.

The current behaviour is definitely not POSIX compliant; returning an error
is better in any case, and that error does not necessarily have to be listed in the
ERRORS section of the recvmsg() function description in the standard.
Returning EIO would be an option, and it is listed among the errors in POSIX.

--ms



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 19:23       ` Alan Cox
@ 2004-10-06 22:08         ` Martijn Sipkema
  2004-10-06 20:25           ` Alan Cox
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-06 22:08 UTC (permalink / raw)
  To: Alan Cox, Andries Brouwer; +Cc: Joris van Rantwijk, Linux Kernel Mailing List

From: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
Sent: Wednesday, October 06, 2004 20:23
> On Mer, 2004-10-06 at 20:30, Andries Brouwer wrote:
> >        A  descriptor shall be considered ready for reading when a
> >        call to an input function with O_NONBLOCK clear would  not
> >        block,  whether  or  not  the function would transfer data
> >        successfully. (The function might return data, an  end-of-
> >        file  indication,  or  an  error other than one indicating
> >        that it is  blocked,  and  in  each  of  these  cases  the
> >        descriptor shall be considered ready for reading.)
> > 
> > As far as I can interpret these sentences, Linux does not conform.
> 
> Nor does anything else in that case. I guess we need a POSIX_ME_HARDER
> socket option.

The default should be a POSIX compliant socket IMHO; a POSIX_ME_NOT
option could provide better performance.

--ms



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:25           ` Alan Cox
@ 2004-10-06 22:15             ` Andries Brouwer
  2004-10-06 22:32               ` David S. Miller
  2004-10-06 23:25               ` YOSHIFUJI Hideaki / 吉藤英明
  2004-10-06 23:11             ` Willy Tarreau
  1 sibling, 2 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-06 22:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: davem, Martijn Sipkema, Andries Brouwer, Joris van Rantwijk,
	Linux Kernel Mailing List

On Wed, Oct 06, 2004 at 09:25:28PM +0100, Alan Cox wrote:

> The current setup has so far been found to break one app, after what
> three years. It can almost double performance. In this case it is very
> much POSIX_ME_HARDER, and perhaps longer term suggests the posix/sus
> people should revisit their API design.

Maybe. Have we really investigated and concluded that there is no
reasonable way to follow POSIX and not harm performance?

I would hope that checksum failure is not the fast path,
so at zeroth sight, not having looked at the code, it seems
that we could do rather elaborate things on checksum failure
if we wanted to.

One such thing might be to raise a flag "I/O error seen since last read"
where the flag is cleared by read and causes an EIO when there is no
other input.
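
The flag idea can be modelled in userspace to see how it would interact
with select() and read. This is purely a toy model of the proposal, not
kernel code and not how Linux behaves; struct toy_sock and its functions
are hypothetical names, and -EAGAIN stands in for the point where a real
blocking read would sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy receive queue: just counts of queued datagrams. */
struct toy_sock {
    int good;        /* valid datagrams waiting */
    int bad;         /* corrupt datagrams waiting, checksum not yet checked */
    bool io_error;   /* "I/O error seen since last read" */
};

/* What select() would report: anything queued, or a pending error flag. */
static bool toy_readable(const struct toy_sock *s)
{
    return s->good > 0 || s->bad > 0 || s->io_error;
}

/* One read attempt.  Returns 1 for a delivered datagram, -EIO for a
 * reported checksum failure, or -EAGAIN where a blocking read would sleep. */
static int toy_read(struct toy_sock *s)
{
    assert(s != NULL);
    if (s->bad > 0) {            /* checksum fails: drop and remember it */
        s->bad = 0;
        s->io_error = true;
    }
    if (s->good > 0) {           /* other input exists: deliver, clear flag */
        s->good--;
        s->io_error = false;
        return 1;
    }
    if (s->io_error) {           /* no other input: report instead of block */
        s->io_error = false;
        return -EIO;
    }
    return -EAGAIN;
}
```

In this model a read that follows a positive select() never reaches the
-EAGAIN branch, because either a good datagram or the error flag is there
to be consumed.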

(There may be many objections - maybe such a setup would break
more user space programs. Or maybe there are more ways select
is broken than just the "discarded because of bad checksum" way.
But it seems too early to just say "too bad, our select is not
the POSIX one".)

Andries

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:41         ` Neil Horman
@ 2004-10-06 22:27           ` Chris Friesen
  2004-10-06 23:32             ` Neil Horman
  2004-10-06 23:36             ` David S. Miller
  0 siblings, 2 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-06 22:27 UTC (permalink / raw)
  To: Neil Horman
  Cc: hzhong, 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

Neil Horman wrote:

> Again, shouldn't this just mean that recvfrom should not be called 
> without the MSG_ERRQUEUE flag set?

Does a message with a bad udp checksum even get sent up as a queued error message?

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:38               ` Andries Brouwer
  2004-10-06 20:58                 ` Joris van Rantwijk
@ 2004-10-06 22:29                 ` David S. Miller
  2004-10-07 16:08                 ` Adrian Phillips
  2 siblings, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 22:29 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: cfriesen, hzhong, aebr, joris, alan, linux-kernel

On Wed, 6 Oct 2004 22:38:18 +0200
Andries Brouwer <aebr@win.tue.nl> wrote:

> On Wed, Oct 06, 2004 at 02:18:23PM -0600, Chris Friesen wrote:
> 
> > In any case, the current behaviour is not compliant with the POSIX text 
> > that Andries posted.  Perhaps this should be documented somewhere?
> 
> For the time being I wrote (in select.2)
> 
> BUGS
>        It has been reported (Linux 2.6) that select may report  a

2.4.x has identical behavior

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 22:15             ` Andries Brouwer
@ 2004-10-06 22:32               ` David S. Miller
  2004-10-06 23:25               ` YOSHIFUJI Hideaki / 吉藤英明
  1 sibling, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 22:32 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: alan, martijn, aebr, joris, linux-kernel

On Thu, 7 Oct 2004 00:15:12 +0200
Andries Brouwer <aebr@win.tue.nl> wrote:

> I would hope that checksum failure is not the fast path,
> so at zeroth sight, not having looked at the code, it seems
> that we could do rather elaborate things on checksum failure
> if we wanted to.

The code in question is in net/ipv4/udp.c:udp_recvmsg()

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:25           ` Alan Cox
  2004-10-06 22:15             ` Andries Brouwer
@ 2004-10-06 23:11             ` Willy Tarreau
  1 sibling, 0 replies; 191+ messages in thread
From: Willy Tarreau @ 2004-10-06 23:11 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martijn Sipkema, Andries Brouwer, Joris van Rantwijk,
	Linux Kernel Mailing List

Hi,

On Wed, Oct 06, 2004 at 09:25:28PM +0100, Alan Cox wrote:
 
> The current setup has so far been found to break one app, after what
> three years. It can almost double performance. In this case it is very
> much POSIX_ME_HARDER, and perhaps longer term suggests the posix/sus
> people should revisit their API design.

Couldn't we simply make recvfrom() return 0 (no data) or -1 (whatever error)
in a case where select() had a reason to believe that there were data, but
that the copy function discovered that it was corrupted data ?

This should not impact performance and would let recvfrom() behave in a
smarter way. After all, I don't see a problem receiving 0 bytes.

Anyway, I'm all for non-blocking I/O, but I can understand the stupidity
of the situation.

Just a few thoughts, of course.
Willy


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 22:15             ` Andries Brouwer
  2004-10-06 22:32               ` David S. Miller
@ 2004-10-06 23:25               ` YOSHIFUJI Hideaki / 吉藤英明
  1 sibling, 0 replies; 191+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2004-10-06 23:25 UTC (permalink / raw)
  To: aebr; +Cc: alan, davem, martijn, joris, linux-kernel, yoshfuji

In article <20041006221512.GE4523@pclin040.win.tue.nl> (at Thu, 7 Oct 2004 00:15:12 +0200), Andries Brouwer <aebr@win.tue.nl> says:

> (There may be many objections - maybe such a setup would break
> more user space programs. Or maybe there are more ways select
> is broken than just the "discarded because of bad checksum" way.
> But it seems too early to just say "too bad, our select is not
> the POSIX one".)

select() != pselect(). :-)

--yoshfuji

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 22:27           ` Chris Friesen
@ 2004-10-06 23:32             ` Neil Horman
  2004-10-06 23:36             ` David S. Miller
  1 sibling, 0 replies; 191+ messages in thread
From: Neil Horman @ 2004-10-06 23:32 UTC (permalink / raw)
  To: Chris Friesen
  Cc: hzhong, 'Andries Brouwer', 'Joris van Rantwijk',
	'Alan Cox', 'Linux Kernel Mailing List'

Chris Friesen wrote:

> Neil Horman wrote:
>
>> Again, shouldn't this just mean that recvfrom should not be called 
>> without the MSG_ERRQUEUE flag set?
>
>
> Does a message with a bad udp checksum even get sent up as a queued 
> error message?

I thought that's exactly what MSG_ERRQUEUE was for, or am I mistaken?
Neil

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 21:45               ` Martijn Sipkema
@ 2004-10-06 23:35                 ` David S. Miller
  0 siblings, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 23:35 UTC (permalink / raw)
  To: Martijn Sipkema; +Cc: cfriesen, hzhong, aebr, joris, alan, linux-kernel

On Wed, 6 Oct 2004 22:45:08 +0100
"Martijn Sipkema" <martijn@entmoot.nl> wrote:

> The current behaviour is definitely not POSIX compliant; returning an error
> is better in any case

Not true at all.  We in fact used to return -EAGAIN if the
checksum failed and the socket was blocking.  Olaf Kirch
pointed that out, so we changed it to block and wait for the
next packet to arrive instead, which is the correct behavior.

People, get the heck over this.  The kernel has behaved this way
for more than 3 years both in 2.4.x and 2.6.x.  The code in question
even exists in the 2.2.x sources as well.

Therefore, it would be totally pointless to change the behavior
now since anyone writing an application wishing it to work on
all existing Linux kernels needs to handle this case anyways.

Like stated earlier, use non-blocking sockets with select()/poll()
and be happy.
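
As a minimal sketch of that pattern (assuming a datagram socket; the
helper names set_nonblocking() and recv_datagram() are hypothetical): set
O_NONBLOCK once, sleep only in select(), and treat EAGAIN from recv() as
a false alarm that sends you back to select().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Sleep in select(), never in recv(): loop until a datagram is actually
 * delivered.  Returns its length, or -1 with errno set on a real error.
 * fd is expected to be non-blocking already. */
static ssize_t recv_datagram(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        fd_set rfds;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            continue;        /* "ready" was a false alarm: wait again */
        return -1;
    }
}
```

With this shape, a datagram dropped for a bad checksum costs one extra
trip through select() rather than an indefinite hang in recv().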

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:06         ` Olivier Galibert
@ 2004-10-06 23:35           ` David S. Miller
  2004-10-07  0:19             ` Olivier Galibert
  0 siblings, 1 reply; 191+ messages in thread
From: David S. Miller @ 2004-10-06 23:35 UTC (permalink / raw)
  To: Olivier Galibert; +Cc: linux-kernel

On Wed, 6 Oct 2004 22:06:08 +0200
Olivier Galibert <galibert@pobox.com> wrote:

> On Wed, Oct 06, 2004 at 12:43:27PM -0700, Hua Zhong wrote:
> > How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> > it's not set?
> 
> Programs don't expect EAGAIN from blocking sockets.

That's right, which is why we block instead.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 22:27           ` Chris Friesen
  2004-10-06 23:32             ` Neil Horman
@ 2004-10-06 23:36             ` David S. Miller
  1 sibling, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-06 23:36 UTC (permalink / raw)
  To: Chris Friesen; +Cc: nhorman, hzhong, aebr, joris, alan, linux-kernel

On Wed, 06 Oct 2004 16:27:11 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> Neil Horman wrote:
> 
> > Again, shouldn't this just mean that recvfrom should not be called 
> > without the MSG_ERRQUEUE flag set?
> 
> Does a message with a bad udp checksum even get sent up as a queued error message?

No, it doesn't, MSG_ERRQUEUE is used for other things.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 23:35           ` David S. Miller
@ 2004-10-07  0:19             ` Olivier Galibert
  2004-10-07  0:29               ` David S. Miller
  0 siblings, 1 reply; 191+ messages in thread
From: Olivier Galibert @ 2004-10-07  0:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

On Wed, Oct 06, 2004 at 04:35:21PM -0700, David S. Miller wrote:
> On Wed, 6 Oct 2004 22:06:08 +0200
> Olivier Galibert <galibert@pobox.com> wrote:
> 
> > On Wed, Oct 06, 2004 at 12:43:27PM -0700, Hua Zhong wrote:
> > > How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> > > it's not set?
> > 
> > Programs don't expect EAGAIN from blocking sockets.
> 
> That's right, which is why we block instead.

Programs don't expect a read to block after a positive select either,
so it doesn't really help.

  OG.


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  0:19             ` Olivier Galibert
@ 2004-10-07  0:29               ` David S. Miller
  2004-10-07 10:56                 ` Martijn Sipkema
  2004-10-08  6:41                 ` Willy Tarreau
  0 siblings, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07  0:29 UTC (permalink / raw)
  To: Olivier Galibert; +Cc: linux-kernel

On Thu, 7 Oct 2004 02:19:37 +0200
Olivier Galibert <galibert@pobox.com> wrote:

> On Wed, Oct 06, 2004 at 04:35:21PM -0700, David S. Miller wrote:
> > On Wed, 6 Oct 2004 22:06:08 +0200
> > Olivier Galibert <galibert@pobox.com> wrote:
> > 
> > > On Wed, Oct 06, 2004 at 12:43:27PM -0700, Hua Zhong wrote:
> > > > How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> > > > it's not set?
> > > 
> > > Programs don't expect EAGAIN from blocking sockets.
> > 
> > That's right, which is why we block instead.
> 
> Programs don't expect a read to block after a positive select either,
> so it doesn't really help.

It absolutely does help the programs not using select(), using
blocking sockets, and not expecting -EAGAIN.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 15:29       ` Richard B. Johnson
  2004-10-06 15:42         ` David S. Miller
  2004-10-06 15:44         ` Lars Marowsky-Bree
@ 2004-10-07  1:16         ` Paul Jakma
  2004-10-07  7:10           ` Chris Friesen
  2 siblings, 1 reply; 191+ messages in thread
From: Paul Jakma @ 2004-10-07  1:16 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: David S. Miller, joris, linux-kernel

On Wed, 6 Oct 2004, Richard B. Johnson wrote:

> thing --not. Select must return correct information.

It does; it's just that the state select() reported on had changed by 
the time the user called recvmsg.

> When a function call like select() says there are data available, 
> there must be data available, period.

There was, but there wasn't when recvmsg() was called. Time changes 
things..

> If not, it's broken and needs to be fixed. Requiring one to set 
> sockets to non-blocking is a poor work- around for an otherwise 
> fatal flaw.

Any application that expects socket read not to block should set 
O_NONBLOCK.

> Cheers,
> Dick Johnson

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Kansas state law requires pedestrians crossing the highways at night to
wear tail lights.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  1:16         ` Paul Jakma
@ 2004-10-07  7:10           ` Chris Friesen
  2004-10-07 11:53             ` Paul Jakma
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07  7:10 UTC (permalink / raw)
  To: Paul Jakma; +Cc: Richard B. Johnson, David S. Miller, joris, linux-kernel

Paul Jakma wrote:
> On Wed, 6 Oct 2004, Richard B. Johnson wrote:
>> thing --not. Select must return correct information.
> It does, it's just state that select() reported on changed by the time 
> user called recvmsg.

Actually, in the single threaded case, the state did not change.  We just didn't 
actually check the state before returning from select().

>> When a function call like select() says there are data available, 
>> there must be data available, period.

> There was, but there wasn't when recvmsg() was called. Time changes things.

Actually, there wasn't.  The data was corrupt, therefore there was no data. 
Nothing changed with time, as the corrupt data was already present before we 
returned from select().

> Any application that expects socket read not to block should set 
> O_NONBLOCK.

POSIX says that if select() says a socket is readable, a read call will not 
block.  Obviously, we are not POSIX compliant.

There's nothing wrong with not being compliant, but it should be documented and 
we shouldn't claim to be compliant.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  0:29               ` David S. Miller
@ 2004-10-07 10:56                 ` Martijn Sipkema
  2004-10-08  6:41                 ` Willy Tarreau
  1 sibling, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 10:56 UTC (permalink / raw)
  To: David S. Miller, Olivier Galibert; +Cc: linux-kernel

From: "David S. Miller" <davem@davemloft.net>
Sent: Thursday, October 07, 2004 01:29

> On Thu, 7 Oct 2004 02:19:37 +0200
> Olivier Galibert <galibert@pobox.com> wrote:
> 
> > On Wed, Oct 06, 2004 at 04:35:21PM -0700, David S. Miller wrote:
> > > On Wed, 6 Oct 2004 22:06:08 +0200
> > > Olivier Galibert <galibert@pobox.com> wrote:
> > > 
> > > > On Wed, Oct 06, 2004 at 12:43:27PM -0700, Hua Zhong wrote:
> > > > > How hard is it to treat the next read to the fd as NON_BLOCKING, even if
> > > > > it's not set?
> > > > 
> > > > Programs don't expect EAGAIN from blocking sockets.
> > > 
> > > That's right, which is why we block instead.
> > 
> > Programs don't expect a read to block after a positive select either,
> > so it doesn't really help.
> 
> It absolutely does help the programs not using select(), using
> blocking sockets, and not expecting -EAGAIN.

Neither returning EAGAIN nor blocking is POSIX compliant. Returning EIO
from blocking sockets is an option. Another option would be to have select()
check the data so that when it returns it can guarantee available data. This would
not affect performance of recvmsg() without using select().


--ms




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  7:10           ` Chris Friesen
@ 2004-10-07 11:53             ` Paul Jakma
  2004-10-07 13:32               ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Paul Jakma @ 2004-10-07 11:53 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Richard B. Johnson, David S. Miller, joris, linux-kernel

On Thu, 7 Oct 2004, Chris Friesen wrote:

> Actually, in the single threaded case, the state did not change.  We just 
> didn't actually check the state before returning from select().

Right, so our perception of state (which for all useful purposes /is/ 
the state) changed - "we have data" -> "we had to throw out data due 
to bad checksum" is a change in kernel state at least, if not in the 
(now gone) data.

I'm not really a kernel person. From the application POV, in the 
single-threaded case (because the multi-threaded case is fairly 
pathological anyway), there /will/ be time between the select and the 
recvmsg, things /can/ change, and obviously they do.

Treating select as anything other than a useful hint mechanism is 
going to get you into trouble - just see the Stevens example others 
gave for a long-standing precedent, in addition to this (sane imho) 
Linuxism.

> Actually, there wasn't.  The data was corrupt, therefore there was 
> no data. Nothing changed with time, as the corrupt data was already 
> present before we returned from select().

Perception of state is as good as state here.

> POSIX says that if select() says a socket is readable, a read call 
> will not block.  Obviously, we are not POSIX compliant.

Right, yes, that seems to be clear now.

Though, I'd still say that any app that calls read/write functions 
without O_NONBLOCK set and that expects it will not block, is broken. 
Basic common sense really, never mind the fine details of POSIX on 
select(). ;)

> There's nothing wrong with not being compliant, but it should be 
> documented and we shouldn't claim to be compliant.

Right.

> Chris

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
A good reputation is more valuable than money.
 		-- Publilius Syrus


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 13:32               ` Martijn Sipkema
@ 2004-10-07 12:53                 ` Paul Jakma
  2004-10-07 13:12                   ` Richard B. Johnson
  2004-10-07 14:07                   ` Martijn Sipkema
  0 siblings, 2 replies; 191+ messages in thread
From: Paul Jakma @ 2004-10-07 12:53 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Chris Friesen, Richard B. Johnson, David S. Miller, joris, linux-kernel

On Thu, 7 Oct 2004, Martijn Sipkema wrote:

> That there is time between the select() and recvmsg() calls is not 
> the issue; the data is only checked in the call to recvmsg(). 
> Actually the longer the time between select() and recvmsg(), the 
> larger the probability that valid data has been received.

But it is the issue.

Much can change between the select() and recvmsg - things outside of 
kernel control too, and it's long been known.

> But the standard clearly says otherwise.

Standards can have bugs too.

It's not healthy to take a corner-case situation from the standard on 
select() and apply it globally to all IO. (not in my mind anyway, 
whatever the standard says).

> Perhaps select()'s perception of state should be made to take possible
> corruption into account.

You'll /still/ run into problems, on other platforms too. Set socket 
to O_NONBLOCK and deal with it ;)

> Why would the POSIX standard say recvmsg() should not block if
> it did not intend it to be used in that way?

POSIX_ME_HARDER? ;)

> Wrong. IMHO it is not exactly a good thing to not be compliant on 
> such basic functionality.

Like I said, to my mind, any sane app should avoid assuming that 
kernel state remains the same between select() and read/write - and 
O_NONBLOCK exists to deal nicely with the situation.

You really shouldn't assume select state is guaranteed not to change 
by the time you get round to doing IO. It's not safe, and not just on 
Linux - whatever POSIX says.

> --ms

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
"An ounce of prevention is worth a pound of purge."


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 12:53                 ` Paul Jakma
@ 2004-10-07 13:12                   ` Richard B. Johnson
  2004-10-07 14:07                   ` Martijn Sipkema
  1 sibling, 0 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-07 13:12 UTC (permalink / raw)
  To: Paul Jakma
  Cc: Martijn Sipkema, Chris Friesen, David S. Miller, joris, linux-kernel

On Thu, 7 Oct 2004, Paul Jakma wrote:

> On Thu, 7 Oct 2004, Martijn Sipkema wrote:
>
>> That there is time between the select() and recvmsg() calls is not the 
>> issue; the data is only checked in the call to recvmsg(). Actually the 
>> longer the time between select() and recvmsg(), the larger the probability 
>> that valid data has been received.
>
> But it is the issue.
>
> Much can change between the select() and recvmsg - things outside of kernel 
> control too, and it's long been known.
>
>> But the standard clearly says otherwise.
>
> Standards can have bugs too.
>
> It's not healthy to take a corner-case situation from the standard on 
> select() and apply it globally to all IO. (not in my mind anyway, whatever 
> the standard says).
>
>> Perhaps select()'s perception of state should be made to take possible
>> corruption into account.
>
> You'll /still/ run into problems, on other platforms too. Set socket to 
> O_NONBLOCK and deal with it ;)
>
>> Why would the POSIX standard say recvmsg() should not block if
>> it did not intend it to be used in that way?
>
> POSIX_ME_HARDER? ;)
>
>> Wrong. IMHO it is not exactly a good thing to not be compliant on such 
>> basic functionality.
>
> Like I said, to my mind, any sane app should avoid assuming that kernel
> state remains the same between select() and read/write - and O_NONBLOCK
> exists to deal nicely with the situation.
>
> You really shouldn't assume select state is guaranteed not to change by the
> time you get round to doing IO. It's not safe, and not just on Linux -
> whatever POSIX says.
>
>> --ms
>
> regards,

Hmmm. When somebody is sleeping in select(), waiting for that
wake_up_interruptible() call that signals that data are
present, how could data not be present anymore when the
only listener makes the call to read() the data?

Simple. It's a BUG. A no-nonsense BUG. The wake_up_interruptible()
call must be made only after valid data are available for the
listener to read. Anything else is a BUG. There are several
bugs present in the logic. The first is the assumption that,
because it takes some time for a listener to wake up, it's
okay to signal the listener before data are ready. This
is the primary BUG. The code is just details. With the
new preemptive kernel, it becomes increasingly necessary
to perform things in the correct order. The second known
BUG is that somebody decided to remove basic functionality
to improve some benchmarks.

Now, one can update the man pages and state to not use poll()
or select() for their intended purposes, or they can fix the
BUG. It's just that simple.

In the meantime, what is the function that should be used
to emulate the correct behavior of select()? If there isn't
an answer to that, then the BUG needs to be fixed.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 14:07                   ` Martijn Sipkema
@ 2004-10-07 13:19                     ` Paul Jakma
  2004-10-07 13:36                     ` Paul Jakma
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 191+ messages in thread
From: Paul Jakma @ 2004-10-07 13:19 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Chris Friesen, Richard B. Johnson, David S. Miller, joris, linux-kernel

On Thu, 7 Oct 2004, Martijn Sipkema wrote:

> Would you care to provide any real answers or are you just telling
> me to shut up because whatever Linux does is good, and not appear
> unreasonable by adding a ;) ..?

No, I'm saying it's simple good practice to set O_NONBLOCK on sockets 
if one expects not to block - any other expectation is not robust, 
never mind what POSIX says about select().

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
You know you are getting old when you think you should drive the speed limit.
 		-- E.A. Gilliam


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 11:53             ` Paul Jakma
@ 2004-10-07 13:32               ` Martijn Sipkema
  2004-10-07 12:53                 ` Paul Jakma
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 13:32 UTC (permalink / raw)
  To: Paul Jakma, Chris Friesen
  Cc: Richard B. Johnson, David S. Miller, joris, linux-kernel

From: "Paul Jakma" <paul@clubi.ie>
Sent: Thursday, October 07, 2004 12:53


> On Thu, 7 Oct 2004, Chris Friesen wrote:
> 
> > Actually, in the single threaded case, the state did not change.  We just 
> > didn't actually check the state before returning from select().
> 
> Right, so our perception of state (which for all useful purposes /is/ 
> the state) changed - "we have data" -> "we had to throw out data due 
> to bad checksum" is a change in kernel state at least, if not in the 
> (now gone) data.
> 
> I'm not really a kernel person. From the application POV, in the 
> single-threaded case (because the multi-threaded case is fairly 
> pathological anyway), there /will/ be time between the select and the 
> recvmsg, things /can/ change, and obviously they do.

That there is time between the select() and recvmsg() calls is not the
issue; the data is only checked in the call to recvmsg(). Actually the
longer the time between select() and recvmsg(), the larger the probability
that valid data has been received.

> Treating select as anything other than a useful hints mechanism is 
> going to get you into trouble - just see the Stevens' example others 
> gave for a long-standing example, in addition to this (sane imho) 
> Linuxism.

But the standard clearly says otherwise.

> > Actually, there wasn't.  The data was corrupt, therefore there was 
> > no data. Nothing changed with time, as the corrupt data was already 
> > present before we returned from select().
> 
> Perception of state is as good as state here.

Perhaps select()'s perception of state should be made to take possible
corruption into account.

> > POSIX says that if select() says a socket is readable, a read call 
> > will not block.  Obviously, we are not POSIX compliant.
> 
> Right, yes, that seems to be clear now.
> 
> Though, I'd still say that any app that calls read/write functions 
> without O_NONBLOCK set and that expects it will not block, is broken. 
> Basic common sense really, never mind the fine details of POSIX on 
> select(). ;)

Why would the POSIX standard say recvmsg() should not block if
it did not intend it to be used in that way?

> > There's nothing wrong with not being compliant, but it should be 
> > documented and we shouldn't claim to be compliant.
> 
> Right.

Wrong. IMHO it is not exactly a good thing to not be compliant on
such basic functionality.


--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 14:07                   ` Martijn Sipkema
  2004-10-07 13:19                     ` Paul Jakma
@ 2004-10-07 13:36                     ` Paul Jakma
  2004-10-07 15:01                       ` Jean-Sebastien Trottier
  2004-10-07 13:45                     ` Alan Cox
  2004-10-07 13:48                     ` UDP recvmsg blocks after select(), 2.6 bug? Alan Cox
  3 siblings, 1 reply; 191+ messages in thread
From: Paul Jakma @ 2004-10-07 13:36 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Chris Friesen, Richard B. Johnson, David S. Miller, joris, Linux Kernel

On Thu, 7 Oct 2004, Martijn Sipkema wrote:

> Any sane application would be written for the POSIX API as 
> described in the standard, and a sane kernel should IMHO implement 
> that standard whenever possible.

NB: I don't disagree with you.

It's just that the impression I get is that there is no way to avoid 
this situation without a serious performance impact, and that the 
optimisation shouldn't really affect any healthy app (any which are 
affected really should be setting O_NONBLOCK).

If you could follow the spec without significantly harming 
performance, then I'd agree spec should be followed.

I don't really have anything useful to say other than that, IMHO, a 
sane app should be using O_NONBLOCK if it really does not want to 
block, so I shall now quietly back away from this thread.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
What this country needs is a good five cent microcomputer.


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 14:07                   ` Martijn Sipkema
  2004-10-07 13:19                     ` Paul Jakma
  2004-10-07 13:36                     ` Paul Jakma
@ 2004-10-07 13:45                     ` Alan Cox
  2004-10-07 16:32                       ` Martijn Sipkema
  2004-10-07 13:48                     ` UDP recvmsg blocks after select(), 2.6 bug? Alan Cox
  3 siblings, 1 reply; 191+ messages in thread
From: Alan Cox @ 2004-10-07 13:45 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Paul Jakma, Chris Friesen, Richard B. Johnson, David S. Miller,
	joris, Linux Kernel Mailing List

> Read the standard. The behaviour of select() on sockets is explicitly
> described.

For a strict posix system, but then if we were a strict posix/sus system
you wouldn't be able to use mmap. Also the kernel doesn't claim to
implement posix behaviour, it avoids those areas where posix is stupid.

> > POSIX_ME_HARDER? ;)
> 
> Would you care to provide any real answers or are you just telling
> me to shut up because whatever Linux does is good, and not appear
> unreasonable by adding a ;) ..?

POSIX_ME_HARDER was an environment variable GNU tools used when users
wanted them to do stupid but posix mandated things instead of sensible
things. It was later changed to POSIXLY_CORRECT, which lost the point
somewhat.

> > You really shouldn't assume select state is guaranteed not to change 
> > by the time you get round to doing IO. It's not safe, and not just on 
> > Linux - whatever POSIX says.
> 
> Any sane application would be written for the POSIX API as described
> in the standard, and a sane kernel should IMHO implement that standard
> whenever possible.

I doubt that. Sane applications are written to the BSD socket API not
POSIX 1003.1g draft 6.4 and relatives.

Alan



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 14:07                   ` Martijn Sipkema
                                       ` (2 preceding siblings ...)
  2004-10-07 13:45                     ` Alan Cox
@ 2004-10-07 13:48                     ` Alan Cox
  2004-10-07 14:57                       ` Richard B. Johnson
  2004-10-07 15:18                       ` Adam Heath
  3 siblings, 2 replies; 191+ messages in thread
From: Alan Cox @ 2004-10-07 13:48 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Paul Jakma, Chris Friesen, Richard B. Johnson, David S. Miller,
	joris, Linux Kernel Mailing List

On Thu, 2004-10-07 at 15:07, Martijn Sipkema wrote:
> > Much can change between the select() and recvmsg - things outside of 
> > kernel control too, and it's long been known.
> 
> There is no change; the current implementation just checks the validity of
> the data in the recvmsg() call and not during select().

The accept one is documented by Stevens and well known. In the UDP case
currently we could get precise behaviour - by halving performance of UDP
applications like video streaming. We probably don't want to  because we
can respond intelligently to OOM situations by freeing the queue if we
don't enforce such a silly rule.



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 12:53                 ` Paul Jakma
  2004-10-07 13:12                   ` Richard B. Johnson
@ 2004-10-07 14:07                   ` Martijn Sipkema
  2004-10-07 13:19                     ` Paul Jakma
                                       ` (3 more replies)
  1 sibling, 4 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 14:07 UTC (permalink / raw)
  To: Paul Jakma
  Cc: Chris Friesen, Richard B. Johnson, David S. Miller, joris, linux-kernel

From: "Paul Jakma" <paul@clubi.ie>
Sent: Thursday, October 07, 2004 13:53


> On Thu, 7 Oct 2004, Martijn Sipkema wrote:
> 
> > That there is time between the select() and recvmsg() calls is not 
> > the issue; the data is only checked in the call to recvmsg(). 
> > Actually the longer the time between select() and recvmsg(), the 
> > larger the probability that valid data has been received.
> 
> But it is the issue.
> 
> Much can change between the select() and recvmsg - things outside of 
> kernel control too, and it's long been known.

There is no change; the current implementation just checks the validity of
the data in the recvmsg() call and not during select().

> > But the standard clearly says otherwise.
> 
> Standards can have bugs too.
> 
> It's not healthy to take a corner-case situation from the standard on 
> select() and apply it globally to all IO. (not in my mind anyway, 
> whatever the standard says).


Read the standard. The behaviour of select() on sockets is explicitly
described.

> > Perhaps select()'s perception of state should be made to take possible
> > corruption into account.
> 
> You'll /still/ run into problems, on other platforms too. Set socket 
> to O_NONBLOCK and deal with it ;)

What problems?

> > Why would the POSIX standard say recvmsg() should not block if
> > it did not intend it to be used in that way?
> 
> POSIX_ME_HARDER? ;)

Would you care to provide any real answers or are you just telling
me to shut up because whatever Linux does is good, and not appear
unreasonable by adding a ;) ..?

> > Wrong. IMHO it is not exactly a good thing to not be compliant on 
> > such basic functionality.
> 
> Like I said, to my mind, any sane app should avoid assuming that 
> kernel state remains the same between select() and read/write - and 
> O_NONBLOCK exists to deal nicely with the situation.
> 
> You really shouldn't assume select state is guaranteed not to change 
> by the time you get round to doing IO. It's not safe, and not just on 
> Linux - whatever POSIX says.

Any sane application would be written for the POSIX API as described
in the standard, and a sane kernel should IMHO implement that standard
whenever possible.

--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 16:32                       ` Martijn Sipkema
@ 2004-10-07 14:50                         ` Alan Cox
  2004-10-07 21:58                           ` mmap specification - was: ... select specification Andries Brouwer
  0 siblings, 1 reply; 191+ messages in thread
From: Alan Cox @ 2004-10-07 14:50 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Paul Jakma, Chris Friesen, Richard B. Johnson, David S. Miller,
	joris, Linux Kernel Mailing List

On Thu, 2004-10-07 at 17:32, Martijn Sipkema wrote:
> > For a strict posix system, but then if we were a strict posix/sus system
> > you wouldn't be able to use mmap. Also the kernel doesn't claim to
> > implement posix behaviour, it avoids those areas where posix is stupid.
> 
> mmap() _is_ in POSIX AFAIK. Also, there are other standards for things
> that aren't in POSIX, but these are supersets.

I'll quote SuSv3 to illustrate the danger of "specifications"

"The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified portions
of the last page of an object which are beyond its end. References
within the address range starting at pa and continuing for len bytes to
whole pages following the end of an object shall result in delivery of a
SIGBUS signal."

It's a mistake; it's not apparent whether it actually meant something
entirely different, or someone forgot a "not", or what is going on.

It certainly isn't a useful definition of mmap.

> > POSIX_ME_HARDER was an environment variable GNU tools used when users
> > wanted them to do stupid but posix mandated things instead of sensible
> > things. It was later changed to POSIXLY_CORRECT, which lost the point
> > somewhat.
> 
> I actually also don't agree with this behaviour of the GNU tools..

Thankfully you seem to be in the minority

> > I doubt that. Sane applications are written to the BSD socket API not
> > POSIX 1003.1g draft 6.4 and relatives.
> 
> Perhaps... I get the idea that I just seem to value a standard operating
> system interface more than you do; it would be a loss IMHO if people
> were forced to write for Linux instead of being able to write portable
> applications.

Portable applications don't work if you get that close to the grey areas
of a standard. Also you'll find the SuS spec simply doesn't work sanely
for some applications. In the spec, bind() is instant and connect() can
block, for example. Now try implementing NetBIOS - bind blocks, connect is
instant, and the POSIX spec is toilet paper grade at this point.

Likewise we have an MS-DOS FS. It's not POSIX behaviour. Should we remove
it, require a different set of system calls, or decide that religious
application of standards is not useful?

Linux applies standards pragmatically. A setsockopt for strict socket
compliance might make sense. It'll trash the app performance for UDP but
it would allow users to select useful vs. POSIX.

Alan



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 13:48                     ` UDP recvmsg blocks after select(), 2.6 bug? Alan Cox
@ 2004-10-07 14:57                       ` Richard B. Johnson
  2004-10-07 15:18                       ` Adam Heath
  1 sibling, 0 replies; 191+ messages in thread
From: Richard B. Johnson @ 2004-10-07 14:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Martijn Sipkema, Paul Jakma, Chris Friesen, David S. Miller,
	joris, Linux Kernel Mailing List

On Thu, 7 Oct 2004, Alan Cox wrote:

> On Iau, 2004-10-07 at 15:07, Martijn Sipkema wrote:
>>> Much can change between the select() and recvmsg - things outside of
>>> kernel control too, and it's long been known.
>>
>> There is no change; the current implementation just checks the validity of
>> the data in the recvmsg() call and not during select().
>
> The accept one is documented by Stevens and well known. In the UDP case
> currently we could get precise behaviour - by halving performance of UDP
> applications like video streaming. We probably don't want to  because we
> can respond intelligently to OOM situations by freeing the queue if we
> don't enforce such a silly rule.
>

Well if you accept(pun_intended) what Stevens says, then check his
web-site on what he teaches about select() and sockets. His demo
code certainly requires that select() not fail.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.5-1.358-noreg on an i686 machine (5537.79 BogoMips).
             Note 96.31% of all statistics are fiction.



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 13:36                     ` Paul Jakma
@ 2004-10-07 15:01                       ` Jean-Sebastien Trottier
  2004-10-07 16:20                         ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: Jean-Sebastien Trottier @ 2004-10-07 15:01 UTC (permalink / raw)
  To: Linux Kernel

Just an outsider's view of someone that has been following this thread:

Could select() have 2 different behaviors depending on whether the
O_NONBLOCK flag is set or not on the socket?

1. If O_NONBLOCK is set, it can immediately return that the socket is
ready to be read (while checksum and possibly other checks are being done
in the background). A subsequent call to recvfrom may return EWOULDBLOCK if
in the meantime the data was discarded by the kernel. This is current
behavior.

An application using O_NONBLOCK should be ready to deal with
consequences.

2. In the case where O_NONBLOCK is not set, select() could wait for all
the checks to be done before deciding to return or not. In this case the
meaning would be "there is data ready", NOT "there *might* be data
ready".

This way, there should not be any performance hits for (IMHO) well built
applications that use O_NONBLOCK. And the POSIX standard would not be
broken otherwise and applications that don't use O_NONBLOCK can rely on
actual data being read in recvfrom.

Just my 2 cents...
Sebastien

On Thu, Oct 07, 2004 at 02:36:26PM +0100, Paul Jakma wrote:
> On Thu, 7 Oct 2004, Martijn Sipkema wrote:
> 
> >Any sane application would be written for the POSIX API as 
> >described in the standard, and a sane kernel should IMHO implement 
> >that standard whenever possible.
> 
> NB: I don't disagree with you.
> 
> It's just that the impression I get is that there is no way to avoid 
> this situation without a serious performance impact, and that the 
> optimisation shouldn't really affect any healthy app (any which are 
> affected really should be setting O_NONBLOCK).
> 
> If you could follow the spec without significantly harming 
> performance, then I'd agree spec should be followed.
> 
> I don't really have anything useful to say other than that, IMHO, a 
> sane app should be using O_NONBLOCK if it really does not want to 
> block, so I shall now quietly back away from this thread.
> 
> regards,
> -- 
> Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
> Fortune:
> What this country needs is a good five cent microcomputer.


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 13:48                     ` UDP recvmsg blocks after select(), 2.6 bug? Alan Cox
  2004-10-07 14:57                       ` Richard B. Johnson
@ 2004-10-07 15:18                       ` Adam Heath
  2004-10-07 16:39                         ` Martijn Sipkema
  1 sibling, 1 reply; 191+ messages in thread
From: Adam Heath @ 2004-10-07 15:18 UTC (permalink / raw)
  Cc: Linux Kernel Mailing List

On Thu, 7 Oct 2004, Alan Cox wrote:

> On Iau, 2004-10-07 at 15:07, Martijn Sipkema wrote:
> > > Much can change between the select() and recvmsg - things outside of
> > > kernel control too, and it's long been known.
> >
> > There is no change; the current implementation just checks the validity of
> > the data in the recvmsg() call and not during select().
>
> The accept one is documented by Stevens and well known. In the UDP case
> currently we could get precise behaviour - by halving performance of UDP
> applications like video streaming. We probably don't want to  because we
> can respond intelligently to OOM situations by freeing the queue if we
> don't enforce such a silly rule.

Also, pkts could be dropped even for TCP sockets.  TCP will cause the pkt to
be retransmitted, but while that is occurring, the read that was prompted by
the select will still block.

So, any app that does not use O_NONBLOCK is broken, if they assume that a
successful select will indicate a nonblocking read/recvmsg.



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 20:38               ` Andries Brouwer
  2004-10-06 20:58                 ` Joris van Rantwijk
  2004-10-06 22:29                 ` David S. Miller
@ 2004-10-07 16:08                 ` Adrian Phillips
  2 siblings, 0 replies; 191+ messages in thread
From: Adrian Phillips @ 2004-10-07 16:08 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: linux-kernel

>>>>> "Andries" == Andries Brouwer <aebr@win.tue.nl> writes:

    Andries> On Wed, Oct 06, 2004 at 02:18:23PM -0600, Chris Friesen
    Andries> wrote:
    >> In any case, the current behaviour is not compliant with the
    >> POSIX text that Andries posted.  Perhaps this should be
    >> documented somewhere?

    Andries> For the time being I wrote (in select.2)

    Andries> BUGS It has been reported (Linux 2.6) that select may
    Andries> report a socket file descriptor as "ready for reading",
    Andries> while nevertheless a subsequent read blocks.  This
    Andries> could perhaps happen when data has arrived but upon
    Andries> examination has wrong checksum and is discarded. Thus it
    Andries> may be safer to use non-blocking I/O.

On my Debian stable and testing boxes the following text is in man 2
select (obtained from ftp://ftp.win.tue.nl/pub/linux-local/manpages)
:-

       Three independent sets of descriptors are watched.  Those listed in readfds will be watched to
       see  if  characters  become  available  for reading (more precisely, to see if a read will not
                                                                                        ^^^^^^^^^^^^^
       block - in particular, a file descriptor is also ready on end-of-file), those in writefds will
       ^^^^^

(and seems to be the same in the latest tarball) which means that
there is a good possibility that a number of people, myself included,
rely on read not blocking on a fd after select indicates that
"characters have become available". This section should be altered in
some way as well (perhaps referencing the BUGS section).

Sincerely,

Adrian Phillips

-- 
Who really wrote the works of William Shakespeare ?
http://www.pbs.org/wgbh/pages/frontline/shakespeare/


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 16:39                         ` Martijn Sipkema
@ 2004-10-07 16:09                           ` Mark Mielke
  2004-10-07 17:18                             ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-07 16:09 UTC (permalink / raw)
  To: Martijn Sipkema; +Cc: Adam Heath, Linux Kernel Mailing List

On Thu, Oct 07, 2004 at 05:39:06PM +0100, Martijn Sipkema wrote:
> Aaargh... I'm going to shut up about this now, because this is clearly going
> nowhere, but you are saying that any application that expects behaviour as
> defined in POSIX is broken, and that bothers me..

I like the idea submitted by somebody else. If O_NONBLOCK is enabled,
the current semantics apply. If O_NONBLOCK is not enabled, select()
takes longer to run and verifies that data is actually available. No
cost for applications that people consider to be 'proper', and the
standard, and saner behaviour, is implemented for the rest. If
somebody could provide a patch for this, that was clean, easy to
maintain, and could be proven to have a minimal impact on performance,
I bet the opponents to this would quiet down.

I don't have the time or experience to do this right now. It would take
me over a month just to learn what would need to be done. So, it will
have to be somebody else...

There is one claim I'd like to question - the claim that select()
would be slowed down unnecessarily, even if the behaviour was changed
for both O_NONBLOCK enabled and disabled. Isn't it more expensive to allow the
application to be woken up, and poll using read(), than to just do a
quick check in the kernel and not tell the application there is data,
when there really isn't? This sounds to me like a question of
implementation - if select() did the read check, including checksums,
or whatever, the checks are done. The application doesn't get woken
up, or by the time it gets to read(), the data is already
available. No loss. I'm thinking that it must be the current
implementation that would make this expensive to implement, rather
than some convincing theoretical explanation. If this is true, it's
fine - we all live in the practical world, but we should admit it.

Cheers,
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 15:01                       ` Jean-Sebastien Trottier
@ 2004-10-07 16:20                         ` Chris Friesen
  2004-10-07 18:20                           ` Hua Zhong
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 16:20 UTC (permalink / raw)
  To: Jean-Sebastien Trottier; +Cc: Linux Kernel, Alan Cox, David S. Miller

Jean-Sebastien Trottier wrote:
> Just an outsider's view of someone that has been following this thread:
> 
> Could select() have 2 different behaviors depending on whether the
> O_NONBLOCK flag is set or not on the socket?
> 
> 1. If O_NONBLOCK is set, it can immediately return that the socket is
> ready to be read

> 2. In the case where O_NONBLOCK is not set, select() could wait for all
> the checks to be done before deciding to return or not. In this case the
> meaning would be "there is data ready", NOT "there *might* be data
> ready".

This actually sounds quite interesting.

For applications that are prepared to handle the nonblocking case, you get full 
speed.  For applications coded to POSIX, you get correctness.

It does mean that select() is now a bit more complicated, but applications 
become much easier to write.
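
An application can approximate the proposed "strict" readiness check in
userspace today. The following is a minimal sketch, assuming a Linux-style
recv() that accepts MSG_PEEK and MSG_DONTWAIT; the helper name
datagram_really_ready() is hypothetical, and whether the peek triggers
checksum verification depends on the kernel version in use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if a datagram can be peeked right now,
 * 0 if the socket turned out to be empty, -1 on any other error.
 */
int datagram_really_ready(int fd)
{
	char probe;
	ssize_t n;

	assert(fd >= 0);
	/* MSG_PEEK leaves the datagram queued; MSG_DONTWAIT never blocks. */
	n = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_DONTWAIT);
	if (n >= 0)
		return 1;	/* includes zero-length datagrams */
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;
	return -1;
}
```

Calling this after select() reports the descriptor readable costs an extra
system call per datagram, which is the userspace analogue of the extra work
select() itself would have to do.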

Chris



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 13:45                     ` Alan Cox
@ 2004-10-07 16:32                       ` Martijn Sipkema
  2004-10-07 14:50                         ` Alan Cox
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 16:32 UTC (permalink / raw)
  To: Alan Cox
  Cc: Paul Jakma, Chris Friesen, Richard B. Johnson, David S. Miller,
	joris, Linux Kernel Mailing List

From: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
> > Read the standard. The behaviour of select() on sockets is explicitly
> > described.
> 
> For a strict posix system, but then if we were a strict posix/sus system
> you wouldn't be able to use mmap. Also the kernel doesn't claim to
> implement posix behaviour, it avoids those areas where posix is stupid.

mmap() _is_ in POSIX AFAIK. Also, there are other standards for things
that aren't in POSIX, but these are supersets.

> > > POSIX_ME_HARDER? ;)
> > 
> > Would you care to provide any real answers or are you just telling
> > me to shut up because whatever Linux does is good, and not appear
> > unreasonable by adding a ;) ..?
> 
> POSIX_ME_HARDER was an environment variable GNU tools used when users
> wanted them to do stupid but posix mandated things instead of sensible
> things. It was later changed to POSIXLY_CORRECT, which lost the point
> somewhat.

I actually also don't agree with this behaviour of the GNU tools..

> > > You really shouldn't assume select state is guaranteed not to change 
> > > by time you get round to doing IO. It's not safe, and not just on 
> > > Linux - whatever POSIX says.
> > 
> > Any sane application would be written for the POSIX API as described
> > in the standard, and a sane kernel should IMHO implement that standard
> > whenever possible.
> 
> I doubt that. Sane applications are written to the BSD socket API not
> POSIX 1003.1g draft 6.4 and relatives.

Perhaps... I get the idea that I just seem to value a standard operating
system interface more than you do; it would be a loss IMHO if people
were forced to write for Linux instead of being able to write portable
applications.

The POSIX interface shouldn't become something one reads to get an idea
of how things will _not_ work on Linux.


--ms




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 15:18                       ` Adam Heath
@ 2004-10-07 16:39                         ` Martijn Sipkema
  2004-10-07 16:09                           ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 16:39 UTC (permalink / raw)
  To: Adam Heath; +Cc: Linux Kernel Mailing List

From: "Adam Heath" <doogie@debian.org>
> On Thu, 7 Oct 2004, Alan Cox wrote:
> 
> > On Iau, 2004-10-07 at 15:07, Martijn Sipkema wrote:
> > > > Much can change between the select() and recvmsg - things outside of
> > > > kernel control too, and it's long been known.
> > >
> > > There is no change; the current implementation just checks the validity of
> > > the data in the recvmsg() call and not during select().
> >
> > The accept one is documented by Stevens and well known. In the UDP case
> > currently we could get precise behaviour - by halving performance of UDP
> > applications like video streaming. We probably don't want to  because we
> > can respond intelligently to OOM situations by freeing the queue if we
> > don't enforce such a silly rule.
> 
> Also, pkts could be dropped even for TCP sockets.  TCP will cause the pkt to
> be retransmitted, but while that is occurring, the read that was prompted by
> the select will still block.
> 
> So, any app that does not use O_NONBLOCK is broken, if they assume that a
> successful select will indicate a nonblocking read/recvmsg.

Aaargh... I'm going to shut up about this now, because this is clearly going
nowhere, but you are saying that any application that expects behaviour as
defined in POSIX is broken, and that bothers me..


--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 16:09                           ` Mark Mielke
@ 2004-10-07 17:18                             ` Chris Friesen
  0 siblings, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 17:18 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Martijn Sipkema, Adam Heath, Linux Kernel Mailing List

Mark Mielke wrote:

> Isn't it more expensive to allow the
> application to be woken up, and poll using read(), than to just do a
> quick check in the kernel and not tell the application there is data,
> when there really isn't?

The issue is caching.

We have to do a pass over the data to copy it to userspace.  If we do the 
checksum verification then, it's basically free since it's already in the cache.

If we do it at select() time, we end up having to do two passes over every 
packet, one to verify the checksum, and one to pass it to userspace.
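
The difference between the two approaches can be sketched in userspace as
follows. This is only an illustration of the one-pass versus two-pass idea,
not the kernel's implementation; the function names are hypothetical, and the
checksum is the 16-bit ones' complement sum over big-endian words described
in RFC 1071.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator into a 16-bit ones' complement sum. */
static uint16_t fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Pass over the data once just to sum it (the "select() time" check). */
uint16_t csum_only(const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint64_t)src[i] << 8) | src[i + 1];
	if (len & 1)
		sum += (uint64_t)src[len - 1] << 8;	/* pad odd byte */
	return fold(sum);
}

/* Two passes: verify first, then touch every byte again to copy it. */
uint16_t two_pass(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint16_t sum = csum_only(src, len);

	memcpy(dst, src, len);
	return sum;
}

/* One pass: each byte is summed while it is being copied. */
uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint64_t)src[i] << 8) | src[i + 1];
	}
	if (len & 1) {
		dst[len - 1] = src[len - 1];
		sum += (uint64_t)src[len - 1] << 8;
	}
	return fold(sum);
}
```

Both variants produce the same sum and the same copy; the one-pass version
simply reads each byte once, while it is already in the cache.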

I agree with you though, it'd be nice to have select() change behaviour based on 
whether the socket is blocking or not.

Chris


* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 16:20                         ` Chris Friesen
@ 2004-10-07 18:20                           ` Hua Zhong
  2004-10-07 18:33                             ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: Hua Zhong @ 2004-10-07 18:20 UTC (permalink / raw)
  To: 'Chris Friesen', 'Jean-Sebastien Trottier'
  Cc: 'Linux Kernel', 'Alan Cox', 'David S. Miller'

> This actually sounds quite interesting.
> 
> For applications that are prepared to handle the nonblocking 
> case, you get full speed.  For applications coded to POSIX, 
> you get correctness.
> 
> It does mean that select() is now a bit more complicated, but 
> applications become much easier to write.

It was my original proposal. The only question is which error code to
return. We cannot return EAGAIN, as POSIX explicitly disallows it. Is EIO good?
Or some other new error code?

As far as implementation goes, we probably need a "check_data" function in
the f_ops. By default it could be NULL.

The question is whether it is worth the trouble to be POSIX compliant. It seems
most developers think not, especially since it has been this way for so
long.

Hua



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 18:20                           ` Hua Zhong
@ 2004-10-07 18:33                             ` Chris Friesen
  2004-10-07 22:41                               ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 18:33 UTC (permalink / raw)
  To: hzhong
  Cc: 'Jean-Sebastien Trottier', 'Linux Kernel',
	'Alan Cox', 'David S. Miller'

Hua Zhong wrote:

> It was my original proposal. The only question is which error code to
> return. We cannot return EAGAIN, as POSIX explicitly disallows it. Is EIO
> good? Or some other new error code?

Since we wouldn't be posix compliant anyway in the nonblocking case, we may as 
well return EAGAIN--it's the most appropriate.

Chris


* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
                   ` (3 preceding siblings ...)
  2004-10-06 16:41 ` Alan Cox
@ 2004-10-07 19:31 ` David Schwartz
  2004-10-07 22:36   ` Martijn Sipkema
  4 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-07 19:31 UTC (permalink / raw)
  To: linux-kernel


> I have a problem where the sequence of events is as follows:
>  - application does select() on a UDP socket descriptor
>  - select returns success with descriptor ready for reading
>  - application does recvfrom() on this descriptor and this recvfrom()
>    blocks forever

	POSIX does not require the kernel to predict the future. The only guarantee
against having a socket operation block is found in non-blocking sockets.

> My understanding of POSIX is limited, but it seems to me that a read call
> must never block after select just said that it's ok to read from the
> descriptor. So any such behaviour would be a kernel bug.

	Suppose hypothetically that we add a new network protocol that permits the
sender to 'invalidate' data after it's received by the remote network stack
and before it's accepted by the remote application. Would you argue that
'select'ing must be considered a read in this case? Even though an
application might 'select' on a socket with no intention to follow up with a
read? Remember, the 'select' operation is supposed to be protocol neutral.

> From a brief look at the kernel UDP code, I suspect a problem in
> net/ipv4/udp.c, udp_recvmsg(): it reads the first available datagram
> from the queue, then checks the UDP checksum. If the UDP checksum fails at
> this point, the datagram is discarded and the process blocks until the next
> datagram arrives.

	You should understand a hit on 'select' to mean that something happened,
and that it would therefore behoove your application to try the operation it
wants to perform again. The 'select' operation is not fine-grained enough to
know what operation you planned, and whether that particular operation would
block.

	Suppose, for example, that instead of using 'read' you used 'recvmsg', and
we add an option to 'recvmsg' to allow you to read datagrams with bad
checksums. What should 'select' do if a datagram is received with a bad
checksum? It has no idea what flavor of 'recvmsg' you're going to call, so
it can't know if your operation is going to block or not.

> Could someone please help me track this problem?
> Am I correct in my reasoning that the select() -> recvmsg() sequence must
> never block?

	No, you are incorrect. Consider, again, a 'recvmsg' flag to allow you to
receive messages even if they have bad checksums versus one that blocks
until a message with a valid checksum is received. The 'select' function
just isn't smart enough.

	Consider a 'select' for write on a TCP socket. How does 'select' know how
many bytes you're going to write? Again, a 'select' hit just indicates
something relevant has happened, it *cannot* guarantee that a future
operation won't block both because 'select' has no idea what operation is
going to take place in the future and because things can change between now
and then.
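
	The write case can be sketched as below. This is a minimal example,
assuming a connected stream socket and a platform where send() accepts
MSG_DONTWAIT (as Linux does); send_all() is a hypothetical helper name. It
treats a 'select' hit only as a hint to retry and copes with partial writes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write all len bytes to a stream socket without ever
 * blocking inside send(); wait in select() when the kernel says EAGAIN.
 * Returns 0 on success, -1 on error.
 */
int send_all(int fd, const char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off < len) {
		ssize_t n = send(fd, buf + off, len - off, MSG_DONTWAIT);

		if (n > 0) {
			off += (size_t)n;	/* possibly a partial write */
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
			fd_set wfds;

			/* Wait for "something relevant", then simply retry. */
			FD_ZERO(&wfds);
			FD_SET(fd, &wfds);
			if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 &&
			    errno != EINTR)
				return -1;
			continue;
		}
		return -1;
	}
	return 0;
}
```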

> If yes, is it possible that this problem is triggered by a failed UDP
> checksum in the udp_recvmsg() function?
> If yes, can we do something to fix this?

	The bug is in your application. The kernel behavior might be considered
undesirable, but it's your application that is failing to tell the kernel
that it must not block.
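
	Telling the kernel that it must not block could look like the
following sketch. It uses only fcntl() and recv(); the helper names
set_nonblocking() and try_recv() are hypothetical, and the return-value
convention is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Mark fd as non-blocking; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical wrapper around recv() for a non-blocking socket.
 * Returns 1 and stores the datagram length in *out_len if one was read,
 * 0 if there was nothing to read after all (go back to select()),
 * -1 on a real error.
 */
int try_recv(int fd, void *buf, size_t cap, size_t *out_len)
{
	ssize_t n;

	assert(buf != NULL && out_len != NULL);
	n = recv(fd, buf, cap, 0);
	if (n >= 0) {
		*out_len = (size_t)n;
		return 1;
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return 0;
	return -1;
}
```

	With this in place, a 'select' hit followed by a discarded datagram
shows up as a 0 return from try_recv() rather than a blocked process.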

	DS




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:41                               ` Martijn Sipkema
@ 2004-10-07 21:49                                 ` Chris Friesen
  2004-10-07 22:00                                   ` David S. Miller
  2004-10-07 23:17                                   ` Martijn Sipkema
  0 siblings, 2 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 21:49 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: hzhong, 'Jean-Sebastien Trottier', 'Linux Kernel',
	'Alan Cox', 'David S. Miller'

Martijn Sipkema wrote:
> From: "Chris Friesen" <cfriesen@nortelnetworks.com>

>>Since we wouldn't be posix compliant anyway in the nonblocking case, we may as 
>>well return EAGAIN--it's the most appropriate.
> 
> 
> No, I don't think so, since POSIX says to return EAGAIN when:
> 
>   The socket's file descriptor is marked O_NONBLOCK and no data is waiting to
>   be received; or MSG_OOB is set and no out-of-band data is available and either
>   the socket's file descriptor is marked O_NONBLOCK or the socket does not
>   support blocking to await out-of-band data

We are discussing the case where the socket is nonblocking and the udp checksum 
is corrupt, right?  (Because in the blocking case select() would verify the 
checksum.)

In this case, select() returns with the socket readable, we call recvmsg() and 
discover the message is corrupt.  At this point we throw away the corrupt 
message, so we now have no data waiting to be received.  We return EAGAIN, and 
userspace goes merrily on its way, handling anything else in its loop, then 
going back to select().

Seems perfectly suitable.

Chris


* mmap specification - was: ... select specification
  2004-10-07 14:50                         ` Alan Cox
@ 2004-10-07 21:58                           ` Andries Brouwer
  2004-10-07 22:17                             ` Chris Wedgwood
  2004-10-07 22:32                             ` Kyle Moffett
  0 siblings, 2 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-07 21:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Thu, Oct 07, 2004 at 03:50:31PM +0100, Alan Cox wrote:

> I'll quote SuSv3 to illustrate the danger of "specifications"
> 
> "The system shall always zero-fill any partial page at the end of an
> object. Further, the system shall never write out any modified portions
> of the last page of an object which are beyond its end. References
> within the address range starting at pa and continuing for len bytes to
> whole pages following the end of an object shall result in delivery of a
> SIGBUS signal."
> 
> It's a mistake; it's not apparent if it actually meant something entirely
> different or someone forgot a "not", or what is going on.

What precisely are you thinking of?
You seem to think this is ridiculous.
The way I read this, Linux is compliant, I think.

[I read this as follows: If you mmap a file with MAP_SHARED and modify
the memory at an address so far beyond EOF that it is not in a page
containing stuff from the file, then you get a SIGBUS. -- Linux does this.
Also, that if you modify the memory at an address beyond EOF, then
the file is not modified. -- Again Linux does this.]

Andries


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 21:49                                 ` Chris Friesen
@ 2004-10-07 22:00                                   ` David S. Miller
  2004-10-07 22:24                                     ` Chris Friesen
  2004-10-07 23:19                                     ` Martijn Sipkema
  2004-10-07 23:17                                   ` Martijn Sipkema
  1 sibling, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:00 UTC (permalink / raw)
  To: Chris Friesen; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, 07 Oct 2004 15:49:17 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> In this case, select() returns with the socket readable, we call recvmsg() and 
> discover the message is corrupt.  At this point we throw away the corrupt 
> message, so we now have no data waiting to be received.  We return EAGAIN, and 
> userspace goes merrily on its way, handling anything else in its loop, then 
> going back to select().

Incorrect.  When the user specifies blocking on the file descriptor
we must give it what it asked for.  -EAGAIN on a blocking file descriptor
is always a bug, in all situations, that's what this code used to do and we
fixed it because it's a bug.

It goes "merrily on its way" if it marks the file descriptor as non-blocking.


* Re: mmap specification - was: ... select specification
  2004-10-07 21:58                           ` mmap specification - was: ... select specification Andries Brouwer
@ 2004-10-07 22:17                             ` Chris Wedgwood
  2004-10-07 22:34                               ` Andries Brouwer
  2004-10-07 22:32                             ` Kyle Moffett
  1 sibling, 1 reply; 191+ messages in thread
From: Chris Wedgwood @ 2004-10-07 22:17 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Alan Cox, Linux Kernel Mailing List

On Thu, Oct 07, 2004 at 11:58:34PM +0200, Andries Brouwer wrote:

> [I read this as follows: If you mmap a file with MAP_SHARED and
> modify the memory at an address so far beyond EOF that it is not in
> a page containing stuff from the file, then you get a SIGBUS. --
> Linux does this.  Also, that if you modify the memory at an address
> beyond EOF, then the file is not modified. -- Again Linux does
> this.]

consider mmapping a 1-byte file ... you can modify bytes 0..4095.
bytes 1..4095 shouldn't be recorded to disk ideally

at one point one fs did actually store this data and it caused cpp
problems (it would expect to see zeroes and where it didn't it got
upset)
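
the 1-byte case can be tried with a small experiment like the one below.
it is a sketch under assumptions: a writable /tmp on a filesystem that
supports shared mappings, and a hypothetical function name. it checks that
the tail of the page reads as zero and that writing there does not grow
the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical experiment: map a 1-byte file MAP_SHARED, scribble on the
 * tail of its only page, and report the file size afterwards.
 * Returns the size in bytes, or -1 on any failure.
 */
long tail_write_file_size(void)
{
	char path[] = "/tmp/mmap-tail-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	struct stat st;
	char *p;
	int fd;

	if (page <= 1)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on until fd is closed */
	if (write(fd, "A", 1) != 1)
		goto fail;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto fail;
	assert(p[0] == 'A');		/* byte 0 comes from the file */
	assert(p[1] == 0);		/* bytes past EOF are zero-filled */
	p[page - 1] = 'Z';		/* modify memory beyond EOF */

	if (msync(p, (size_t)page, MS_SYNC) != 0 ||
	    munmap(p, (size_t)page) != 0 ||
	    fstat(fd, &st) != 0)
		goto fail;
	close(fd);
	return (long)st.st_size;
fail:
	close(fd);
	return -1;
}
```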


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 23:19                                     ` Martijn Sipkema
@ 2004-10-07 22:24                                       ` David S. Miller
  2004-10-07 22:33                                         ` Alan Curry
                                                           ` (2 more replies)
  0 siblings, 3 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:24 UTC (permalink / raw)
  To: Martijn Sipkema; +Cc: cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Fri, 8 Oct 2004 00:19:52 +0100
"Martijn Sipkema" <msipkema@sipkema-digital.com> wrote:

> So why not return EIO instead? It would be even better to have select()
> validate the data, but I think returning EIO is better than blocking and most
> likely POSIX compliant.

It's not EIO either, it's "queue empty" which means block.
-EIO is a hard error.  You're trying to find some clever way
to say "try again", but:

1) -EAGAIN is illegal on blocking sockets
2) -EIO is a hard error and does not mean "try again"

Therefore we block, which is the correct thing to do in this
situation.

Applications which wish not to block should (SURPRISE SURPRISE)
use non-blocking sockets, otherwise blocking is ok for them.

I can't believe this thread has lasted this long.  I think people
had cotton in their ears when I mentioned that every single 2.4.x
and 2.6.x existing system out there has this behavior, therefore
even if we changed the behavior some way today people still need
to handle this to work on all existing Linux systems.

Furthermore, returning -EAGAIN or any other kind of "try again"
_DOES_ break applications, that's why we changed it to block instead.


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:00                                   ` David S. Miller
@ 2004-10-07 22:24                                     ` Chris Friesen
  2004-10-07 22:26                                       ` David S. Miller
  2004-10-07 23:19                                     ` Martijn Sipkema
  1 sibling, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 22:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

David S. Miller wrote:
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:

>>In this case, select() returns with the socket readable, we call recvmsg() and 
>>discover the message is corrupt.  At this point we throw away the corrupt 
>>message, so we now have no data waiting to be received.  We return EAGAIN, and 
>>userspace goes merrily on its way, handling anything else in its loop, then 
>>going back to select().

> Incorrect.  When the user specifies blocking on the file descriptor
> we must give it what it asked for.  -EAGAIN on a blocking file descriptor
> is always a bug, in all situations, that's what this code used to do and we
> fixed it because it's a bug.

I believe you misread what I said.  Just before the above quote, I said "We are 
discussing the case where the socket is nonblocking and the udp checksum is 
corrupt, right? "

What I had in mind was that the non-blocking file descriptor have select() 
return without verifying the checksum, and if it was discovered to be bad at 
recvmsg() time, we return EAGAIN.  For a blocking file descriptor, we would 
verify the checksum before returning from select().

That way, the blocking case gets the semantics it expects (although worse 
performance), while the nonblocking case gets full performance as well as 
semantics that it will handle properly.

Chris


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:24                                     ` Chris Friesen
@ 2004-10-07 22:26                                       ` David S. Miller
  2004-10-07 22:39                                         ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:26 UTC (permalink / raw)
  To: Chris Friesen; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, 07 Oct 2004 16:24:13 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> I believe you misread what I said.  Just before the above quote, I said "We are 
> discussing the case where the socket is nonblocking and the udp checksum is 
> corrupt, right? "

And in that case we return -EAGAIN and always have.

> What I had in mind was that the non-blocking file descriptor have select() 
> return without verifying the checksum, and if it was discovered to be bad at 
> recvmsg() time, we return EAGAIN.

That's what we do.  In net/ipv4/udp.c:udp_recvmsg()

	if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else if (msg->msg_flags&MSG_TRUNC) {
		if (__udp_checksum_complete(skb))
			goto csum_copy_err;
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else {
		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);

		if (err == -EINVAL)
			goto csum_copy_err;
	}
 ...
csum_copy_err:
	UDP_INC_STATS_BH(UDP_MIB_INERRORS);

	/* Clear queue. */
	if (flags&MSG_PEEK) {
		int clear = 0;
		spin_lock_irq(&sk->sk_receive_queue.lock);
		if (skb == skb_peek(&sk->sk_receive_queue)) {
			__skb_unlink(skb, &sk->sk_receive_queue);
			clear = 1;
		}
		spin_unlock_irq(&sk->sk_receive_queue.lock);
		if (clear)
			kfree_skb(skb);
	}

	skb_free_datagram(sk, skb);

	if (noblock)
		return -EAGAIN;	
	goto try_again;

If socket is non-blocking, return -EAGAIN, else go back to "try_again"
where we block on packet arrival or error.
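
The control flow above can be modelled in a few lines of plain C. This is a
toy sketch, not kernel code: the structures, the function toy_recvmsg() and
the TOY_WOULD_BLOCK value are all hypothetical, and "blocking" is represented
by a return value instead of an actual sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a queued datagram; not a kernel structure. */
struct toy_dgram {
	int csum_ok;	/* 1 if the checksum would verify */
	int payload;
};

struct toy_queue {
	const struct toy_dgram *items;
	size_t head, count;
};

#define TOY_WOULD_BLOCK 1	/* model only: a real blocking call sleeps here */

/*
 * Returns 0 and stores the payload on success, -EAGAIN for a non-blocking
 * caller that found nothing usable, TOY_WOULD_BLOCK where a blocking
 * caller would go to sleep waiting for the next datagram.
 */
int toy_recvmsg(struct toy_queue *q, int noblock, int *payload)
{
	assert(q != NULL && payload != NULL);
	for (;;) {				/* plays the role of try_again */
		const struct toy_dgram *d;

		if (q->head == q->count)
			return noblock ? -EAGAIN : TOY_WOULD_BLOCK;
		d = &q->items[q->head++];	/* dequeue the first datagram */
		if (d->csum_ok) {
			*payload = d->payload;
			return 0;
		}
		/* Bad checksum: the datagram is dropped at this point. */
		if (noblock)
			return -EAGAIN;
	}
}
```

With a queue holding one corrupt datagram, the non-blocking caller gets
-EAGAIN and the blocking caller ends up waiting, which is the situation the
original report describes.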


* Re: mmap specification - was: ... select specification
  2004-10-07 21:58                           ` mmap specification - was: ... select specification Andries Brouwer
  2004-10-07 22:17                             ` Chris Wedgwood
@ 2004-10-07 22:32                             ` Kyle Moffett
  2004-10-07 22:46                               ` Andries Brouwer
  1 sibling, 1 reply; 191+ messages in thread
From: Kyle Moffett @ 2004-10-07 22:32 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Linux Kernel Mailing List, Alan Cox

On Oct 07, 2004, at 17:58, Andries Brouwer wrote:
> On Thu, Oct 07, 2004 at 03:50:31PM +0100, Alan Cox wrote:
>> "References
>> within the address range starting at pa and continuing for len bytes to
>> whole pages following the end of an object shall result in delivery of a
>> SIGBUS signal."
>
> [I read this as follows: If you mmap a file with MAP_SHARED and modify
> the memory at an address so far beyond EOF that it is not in a page
> containing stuff from the file, then you get a SIGBUS. -- Linux does this.
> Also, that if you modify the memory at an address beyond EOF, then
> the file is not modified. -- Again Linux does this.]

The last bit of the SuS text means:

pa <-- len --> eof <-> page boundary

Anywhere from pa to page boundary will generate SIGBUS.  This is a
rather useless definition, no?  I think what they meant is:

"References within the address range starting on the first whole page
at least len bytes after pa shall result in delivery of a SIGBUS signal."

(I am assuming, of course, that pa is the result of the mmap call. If I'm
wrong please tell me, thanks!)

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a17 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r  
!y?(-)
------END GEEK CODE BLOCK------




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:24                                       ` David S. Miller
@ 2004-10-07 22:33                                         ` Alan Curry
  2004-10-07 22:42                                         ` Mark Mielke
  2004-10-07 22:46                                         ` Hua Zhong
  2 siblings, 0 replies; 191+ messages in thread
From: Alan Curry @ 2004-10-07 22:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: Martijn Sipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

David S. Miller writes the following:
>
>I can't believe this thread has lasted this long.  I think people
>had cotton in their ears when I mentioned that every single 2.4.x
>and 2.6.x existing system out there has this behavior, therefore
>even if we changed the behavior some way today people still need
>to handle this to work on all existing Linux systems.

That argument sounds like "I broke something a few years ago, and nobody
complained quickly enough, so I got away with it." It's not impressive.



* Re: mmap specification - was: ... select specification
  2004-10-07 22:17                             ` Chris Wedgwood
@ 2004-10-07 22:34                               ` Andries Brouwer
  0 siblings, 0 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-07 22:34 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Andries Brouwer, Alan Cox, Linux Kernel Mailing List

On Thu, Oct 07, 2004 at 03:17:45PM -0700, Chris Wedgwood wrote:
> On Thu, Oct 07, 2004 at 11:58:34PM +0200, Andries Brouwer wrote:
> 
> > [I read this as follows: If you mmap a file with MAP_SHARED and
> > modify the memory at an address so far beyond EOF that it is not in
> > a page containing stuff from the file, then you get a SIGBUS. --
> > Linux does this.  Also, that if you modify the memory at an address
> > beyond EOF, then the file is not modified. -- Again Linux does
> > this.]
> 
> consider mmaping a 1-byte file ... you can modify bytes 0..4095.
> bytes 1..4095 shouldn't be recorded to disk ideally
> 
> at one point one fs did actually store this data and it caused cpp
> problems (it would expect to see zeroes and where it didn't it got
> upset)

Is this an answer? Or an anecdote?

[You seem to say anecdotally that at one point in time Linux mmap
was not POSIX compliant, and problems arose.
Alan on the other hand seems to say that POSIX comes with
ridiculous requirements.]

Andries


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 19:31 ` David Schwartz
@ 2004-10-07 22:36   ` Martijn Sipkema
  2004-10-08  0:19     ` David Schwartz
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 22:36 UTC (permalink / raw)
  To: davids, linux-kernel

From: "David Schwartz" <davids@webmaster.com>
> > I have a problem where the sequence of events is as follows:
> >  - application does select() on a UDP socket descriptor
> >  - select returns success with descriptor ready for reading
> >  - application does recvfrom() on this descriptor and this recvfrom()
> >    blocks forever
> 
> POSIX does not require the kernel to predict the future. The only guarantee
> against having a socket operation block is found in non-blocking sockets.

It is one thing to implement select()/recvmsg() in a non POSIX compliant
way; it is another thing to make false claims about that standard. POSIX
_does_ guarantee that a call to recvmsg() does not block after a call
to select().

> > My understanding of POSIX is limited, but it seems to me that a read call
> > must never block after select just said that it's ok to read from the
> > descriptor. So any such behaviour would be a kernel bug.
> 
> Suppose hypothetically that we add a new network protocol that permits the
> sender to 'invalidate' data after it's received by the remote network stack
> and before it's accepted by the remote application. Would you argue that
> 'select'ing must be considered a read in this case? Even though an
> application might 'select' on a socket with no intention to follow up with a
> read? Remember, the 'select' operation is supposed to be protocol neutral.

Consider the data accepted by the remote application at the moment it
calls select().

> > From a brief look at the kernel UDP code, I suspect a problem in
> > net/ipv4/udp.c, udp_recvmsg(): it reads the first available datagram
> > from the queue, then checks the UDP checksum. If the UDP checksum fails at
> > this point, the datagram is discarded and the process blocks
> > until the next
> > datagram arrives.
> 
> You should understand a hit on 'select' to mean that something happened,
> and that it would therefore behoove your application to try the operation it
> wants to perform again. The 'select' operation is not fine-grained enough to
> know what operation you planned, and whether that particular operation would
> block.
> 
> Suppose, for example, that instead of using 'read' you used 'recvmsg', and
> we add an option to 'recvmsg' to allow you to read datagrams with bad
> checksums. What should 'select' do if a datagram is received with a bad
> checksum? It has no idea what flavor of 'recvmsg' you're going to call, so
> it can't know if your operation is going to block or not.

This is all described in detail in the standard.

> > Could someone please help me track this problem?
> > Am I correct in my reasoning that the select() -> recvmsg() sequence must
> > never block?
> 
> No, you are incorrect. Consider, again, a 'recvmsg' flag to allow you to
> receive messages even if they have bad checksums versus one that blocks
> until a message with a valid checksum is received. The 'select' function
> just isn't smart enough.
>
> Consider a 'select' for write on a TCP socket. How does 'select' know how
> many bytes you're going to write? Again, a 'select' hit just indicates
> something relevant has happened, it *cannot* guarantee that a future
> operation won't block both because 'select' has no idea what operation is
> going to take place in the future and because things can change between now
> and then.

You really should read the standard on this..

> > If yes, is it possible that this problem is triggered by a failed UDP
> > checksum in the udp_recvmsg() function?
> > If yes, can we do something to fix this?
> 
> The bug is in your application. The kernel behavior might be considered
> undesirable, but it's your application that is failing to tell the kernel
> that it must not block.

Actually, the application may be correct, but its author should change it
anyway, since it is unlikely that Linux will follow the standard on select()
anytime soon...
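
A minimal sketch of that application-side change, assuming a POSIX sockets
environment. The names set_nonblocking(), try_recv() and RECV_WOULD_BLOCK
are hypothetical, chosen for illustration. The socket is switched to
O_NONBLOCK once, and a "readable" result from select() is then treated as
a hint that may turn out to be empty:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

#define RECV_WOULD_BLOCK (-2)   /* hypothetical "nothing there" result */

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to read one datagram after select() reported the socket readable.
 * Returns the datagram length, RECV_WOULD_BLOCK if no datagram was
 * available after all (go back to select()), or -1 on a real error. */
ssize_t try_recv(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return RECV_WOULD_BLOCK;
    return n;
}
```

With this shape, a datagram that is discarded between select() and the read
costs one extra trip around the event loop instead of an indefinite block.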

--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:26                                       ` David S. Miller
@ 2004-10-07 22:39                                         ` Chris Friesen
  2004-10-07 22:42                                           ` David S. Miller
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 22:39 UTC (permalink / raw)
  To: David S. Miller; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

David S. Miller wrote:
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:

>>What I had in mind was that the non-blocking file descriptor have select() 
>>return without verifying the checksum, and if it was discovered to be bad at 
>>recvmsg() time, we return EAGAIN.
> 
> 
> That's what we do.  In net/ipv4/udp.c:udp_recvmsg()

Yes.  I realize this, and agree with that behaviour.

However, you chopped off what I consider the interesting part of my post.   I 
propose that if we call select() on a blocking file descriptor, we verify the 
checksum before saying that the socket is readable.  Then, at recvmsg() time, if 
it hasn't been checked already we would check it (to allow for the case of 
blocking socket without select()).

This allows for easy porting of apps that expect a blocking recvmsg() after 
select() to always succeed.

Thus, you end up with:

nonblocking socket -- exactly as current
blocking socket without select -- exactly as current
blocking socket with select -- checksum verified before select() returns
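
The third case can be modelled in user space roughly as follows. This is
only a sketch under stated assumptions: struct dgram, struct rxqueue and
readable_verified() are hypothetical names, not kernel code, and the
checksum is the plain RFC 1071 sum without the UDP pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                       /* odd trailing byte, zero padded */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)                  /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

struct dgram {                 /* hypothetical queued datagram */
    const uint8_t *payload;
    size_t len;
    uint16_t csum;             /* checksum carried with the datagram */
    int verified;              /* set once the checksum has been checked */
};

struct rxqueue {               /* hypothetical receive queue */
    struct dgram *slots;
    size_t head, count;
};

/* Model of "blocking socket with select": discard corrupt datagrams at
 * the head of the queue and report readable only if a verified datagram
 * remains. The verified flag lets a later read skip the second check. */
int readable_verified(struct rxqueue *q)
{
    while (q->count > 0) {
        struct dgram *d = &q->slots[q->head];

        if (d->verified || inet_checksum(d->payload, d->len) == d->csum) {
            d->verified = 1;
            return 1;
        }
        q->head++;             /* drop the corrupt datagram */
        q->count--;
    }
    return 0;
}
```

In this model each datagram is summed at most once, because the read path
can trust the verified flag instead of repeating the check.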


Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 18:33                             ` Chris Friesen
@ 2004-10-07 22:41                               ` Martijn Sipkema
  2004-10-07 21:49                                 ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 22:41 UTC (permalink / raw)
  To: Chris Friesen, hzhong
  Cc: 'Jean-Sebastien Trottier', 'Linux Kernel',
	'Alan Cox', 'David S. Miller'

From: "Chris Friesen" <cfriesen@nortelnetworks.com>
> Hua Zhong wrote:
> 
> > It was my original proposal. The only question is to return which error
> > code. We cannot return EAGAIN as Posix explicitly disallows it. Is EIO good?
> > Or some other new error code?
> 
> Since we wouldn't be posix compliant anyway in the nonblocking case, we may as 
> well return EAGAIN--it's the most appropriate.

No, I don't think so, since POSIX says to return EAGAIN when:

  The socket's file descriptor is marked O_NONBLOCK and no data is waiting to
  be received; or MSG_OOB is set and no out-of-band data is available and either
  the socket's file descriptor is marked O_NONBLOCK or the socket does not
  support blocking to await out-of-band data

So, I think returning EIO is probably better; I think that would be POSIX
compliant.


--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:39                                         ` Chris Friesen
@ 2004-10-07 22:42                                           ` David S. Miller
  2004-10-07 23:27                                             ` Chris Friesen
  2004-10-08  2:51                                             ` Mark Mielke
  0 siblings, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:42 UTC (permalink / raw)
  To: Chris Friesen; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, 07 Oct 2004 16:39:06 -0600
Chris Friesen <cfriesen@nortelnetworks.com> wrote:

> However, you chopped off what I consider the interesting part of my post.   I 
> propose that if we call select() on a blocking file descriptor, we verify the 
> checksum before saying that the socket is readable.  Then, at recvmsg() time, if 
> it hasn't been checked already we would check it (to allow for the case of 
> blocking socket without select()).

So people who improperly use select() with blocking sockets get punished
in a different way, with half the performance compared to today?

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:24                                       ` David S. Miller
  2004-10-07 22:33                                         ` Alan Curry
@ 2004-10-07 22:42                                         ` Mark Mielke
  2004-10-07 22:47                                           ` David S. Miller
  2004-10-07 22:46                                         ` Hua Zhong
  2 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-07 22:42 UTC (permalink / raw)
  To: David S. Miller
  Cc: Martijn Sipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Thu, Oct 07, 2004 at 03:24:00PM -0700, David S. Miller wrote:
> I can't believe this thread has lasted this long.  I think people
> had cotton in their ears when I mentioned that every single 2.4.x
> and 2.6.x existing system out there has this behavior, therefore
> even if we changed the behavior some way today people still need
> to handle this to work on all existing Linux systems.

Nah. Some people hate it when their operating system doesn't do what
it is documented to do. These people include me.

We all know you should use non-blocking sockets for the application
domain we are talking about (ones that use select() / poll() to
determine whether data is available). Sometimes, that isn't possible.
When, you ask? When you are using a higher-level language, such as a
version of Perl that didn't provide IO::blocking(), on an operating
system, such as HP-UX, where it wasn't possible to portably enable
O_NONBLOCK for your sockets. We're talking practice here, so
I'm talking a real live example. Sure, it's nice to demand that people
upgrade to a later version of Perl. Guess what? It isn't happening. It
will be another year or two before we can guarantee people have Perl
5.006 on their system.
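
Where the descriptor itself cannot be switched to O_NONBLOCK, a per-call
flag is one possible workaround. The sketch below assumes a platform whose
recv() accepts MSG_DONTWAIT (Linux does; other systems may not), and the
helper name recv_dontwait() is hypothetical:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read one datagram without blocking, even on a blocking descriptor.
 * Returns the datagram length, or -1 on failure. *would_block is set to 1
 * when the failure only means that no datagram was available. */
ssize_t recv_dontwait(int fd, void *buf, size_t len, int *would_block)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

    *would_block = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

The descriptor's own flags are left untouched, so other code sharing the
socket still sees blocking behaviour.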

Anyways, I suspect that these failures occur so rarely that they have
either been unreproducible, or people haven't even
noticed. Sometimes a blocking read happens to be ok - you are expecting
data, and eventually it will come. Then, the select() / poll() starts
up again, and the application continues on. For all the observer knows,
there was a load spike, and the application wasn't given enough cpu
seconds to process the request.

> Furthermore, returning -EAGAIN or any other kind of "try again"
> _DOES_ break applications, that's why we changed it to block instead.

This is good. The bad part is select() returning "data available" when
the information it bases that decision on is invalid, and it hasn't yet
spent the resources to check. It's wrong. It's acceptable for
O_NONBLOCK, as no properly implemented application that uses O_NONBLOCK
should fail from seeing EAGAIN in these cases.

Applications that do not set O_NONBLOCK have different expectations.
Blocking after select() says data is ready is wrong. If we want to say
'yes, it is wrong, but it's a corner case, too complicated to work around
and we're assuming that people are using O_NONBLOCK', that is fine. I just
want you to say it. :-)

I don't really care to hear "it's right, because that's how I coded it, and
that's how fast I want it to run."

Cheers,
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:24                                       ` David S. Miller
  2004-10-07 22:33                                         ` Alan Curry
  2004-10-07 22:42                                         ` Mark Mielke
@ 2004-10-07 22:46                                         ` Hua Zhong
  2004-10-07 22:48                                           ` David S. Miller
  2 siblings, 1 reply; 191+ messages in thread
From: Hua Zhong @ 2004-10-07 22:46 UTC (permalink / raw)
  To: 'David S. Miller', 'Martijn Sipkema'
  Cc: cfriesen, jst1, linux-kernel, alan, davem

> I can't believe this thread has lasted this long.

The reason is that you haven't just admitted very clearly that 
"Linux select isn't Posix compliant and it was a design decision 
not to do so for performance reasons". I think this kind of 
authoritative answer would shut up many people. :-)

> I think people had cotton in their ears when I mentioned 
> that every single 2.4.x and 2.6.x existing system out there 
> has this behavior, therefore even if we changed the behavior 
> some way today people still need to handle this to work on 
> all existing Linux systems.

Unfortunately this isn't the best argument..

I think most people just hope Linux would follow the standard.
The old Linux threads weren't posix-compliant either for years,
and people still fixed (most of?) it in 2.6. By your argument
that would not have happened.

Hua


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: mmap specification - was: ... select specification
  2004-10-07 22:32                             ` Kyle Moffett
@ 2004-10-07 22:46                               ` Andries Brouwer
  2004-10-07 23:30                                 ` Kyle Moffett
  0 siblings, 1 reply; 191+ messages in thread
From: Andries Brouwer @ 2004-10-07 22:46 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andries Brouwer, Linux Kernel Mailing List, Alan Cox

On Thu, Oct 07, 2004 at 06:32:43PM -0400, Kyle Moffett wrote:

>>>"References within the address range starting at pa and continuing
>>> for len bytes to whole pages following the end of an object shall
>>> result in delivery of a SIGBUS signal."
> 
> The last bit of the SuS text means:
> 
> pa <-- len --> eof <-> page boundary
> 
> Anywhere from pa to page boundary will generate SIGBUS.

The POSIX text is clear to me, and Linux is compliant.
On the other hand, I have no idea what you try to say.

Andries

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:42                                         ` Mark Mielke
@ 2004-10-07 22:47                                           ` David S. Miller
  2004-10-07 23:00                                             ` Mark Mielke
  2004-10-08  0:37                                             ` Lee Revell
  0 siblings, 2 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:47 UTC (permalink / raw)
  To: Mark Mielke; +Cc: msipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Thu, 7 Oct 2004 18:42:42 -0400
Mark Mielke <mark@mark.mielke.cc> wrote:

> Sure, it's nice to demand that people
> upgrade to a later version of Perl. Guess what? It isn't happening. It
> will be another year or two before we can guarantee people have Perl
> 5.006 on their system.

If those people are tepid about upgrading perl, I think it would
be even less likely that they would upgrade their kernels.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:46                                         ` Hua Zhong
@ 2004-10-07 22:48                                           ` David S. Miller
  0 siblings, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 22:48 UTC (permalink / raw)
  To: hzhong; +Cc: msipkema, cfriesen, jst1, linux-kernel, alan, davem

On Thu, 7 Oct 2004 15:46:23 -0700
"Hua Zhong" <hzhong@cisco.com> wrote:

> The reason is that you haven't just admitted very clearly that 
> "Linux select isn't Posix compliant and it was a design decision 
> not to do so for performance reasons".

It is, happy now? :-)

I never claimed it to be POSIX compliant.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:47                                           ` David S. Miller
@ 2004-10-07 23:00                                             ` Mark Mielke
  2004-10-07 23:07                                               ` David S. Miller
  2004-10-08  6:10                                               ` Theodore Ts'o
  2004-10-08  0:37                                             ` Lee Revell
  1 sibling, 2 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-07 23:00 UTC (permalink / raw)
  To: David S. Miller
  Cc: msipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Thu, Oct 07, 2004 at 03:47:22PM -0700, David S. Miller wrote:
> On Thu, 7 Oct 2004 18:42:42 -0400
> Mark Mielke <mark@mark.mielke.cc> wrote:
> > Sure, it's nice to demand that people
> > upgrade to a later version of Perl. Guess what? It isn't happening. It
> > will be another year or two before we can guarantee people have Perl
> > 5.006 on their system.
> If those people are tepid about upgrading perl, I think it would
> be even less likely that they would upgrade their kernels.

Good practical point for the here and now. :-)

The discussion, though, is more about what it should have been.
The combined frustrations of many of us.

To colour the discussion a bit, many of us have had these same
frustrations with Sun, and HP on various issues.

Just say "it's a bug, but one we have chosen not to fix for practical
reasons." That would have kept me out of this discussion. Saying the
behaviour is correct and that POSIX is wrong - that raises hairs -
both the question kind, and the concern kind.

Cheers,
mark


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 23:00                                             ` Mark Mielke
@ 2004-10-07 23:07                                               ` David S. Miller
  2004-10-08  6:10                                               ` Theodore Ts'o
  1 sibling, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-07 23:07 UTC (permalink / raw)
  To: Mark Mielke; +Cc: msipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Thu, 7 Oct 2004 19:00:19 -0400
Mark Mielke <mark@mark.mielke.cc> wrote:

> Just say "it's a bug, but one we have chosen not to fix for practical
> reasons." That would have kept me out of this discussion. Saying the
> behaviour is correct and that POSIX is wrong - that raises hairs -
> both the question kind, and the concern kind.

If anything, I'm saying we're not POSIX compliant in this case
by choice.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 21:49                                 ` Chris Friesen
  2004-10-07 22:00                                   ` David S. Miller
@ 2004-10-07 23:17                                   ` Martijn Sipkema
  1 sibling, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 23:17 UTC (permalink / raw)
  To: Chris Friesen
  Cc: hzhong, 'Jean-Sebastien Trottier', 'Linux Kernel',
	'Alan Cox', 'David S. Miller'


From: "Chris Friesen" <cfriesen@nortelnetworks.com>
> Martijn Sipkema wrote:
> > From: "Chris Friesen" <cfriesen@nortelnetworks.com>
> 
> >>Since we wouldn't be posix compliant anyway in the nonblocking case, we may as 
> >>well return EAGAIN--it's the most appropriate.
> > 
> > 
> > No, I don't think so, since POSIX says to return EAGAIN when:
> > 
> >   The socket's file descriptor is marked O_NONBLOCK and no data is waiting to
> >   be received; or MSG_OOB is set and no out-of-band data is available and either
> >   the socket's file descriptor is marked O_NONBLOCK or the socket does not
> >   support blocking to await out-of-band data
> 
> We are discussing the case where the socket is nonblocking and the udp checksum 
> is corrupt, right?  (Because in the blocking case select() would verify the 
> checksum.)
> 
> In this case, select() returns with the socket readable, we call recvmsg() and 
> discover the message is corrupt.  At this point we throw away the corrupt 
> message, so we now have no data waiting to be received.  We return EAGAIN, and 
> userspace goes merrily on its way, handling anything else in its loop, then 
> going back to select().
> 
> Seems perfectly suitable.

Oh, I thought it was about the case where select() does not check the data and
a blocking socket is used; I would think EIO is better in that case. But shouldn't
a nonblocking socket return EIO also, since the blocking socket would not in
fact block?

--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:00                                   ` David S. Miller
  2004-10-07 22:24                                     ` Chris Friesen
@ 2004-10-07 23:19                                     ` Martijn Sipkema
  2004-10-07 22:24                                       ` David S. Miller
  1 sibling, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 23:19 UTC (permalink / raw)
  To: David S. Miller, Chris Friesen; +Cc: hzhong, jst1, linux-kernel, alan, davem

From: "David S. Miller" <davem@davemloft.net>
> On Thu, 07 Oct 2004 15:49:17 -0600
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:
> 
> > In this case, select() returns with the socket readable, we call recvmsg() and 
> > discover the message is corrupt.  At this point we throw away the corrupt 
> > message, so we now have no data waiting to be received.  We return EAGAIN, and 
> > userspace goes merrily on its way, handling anything else in its loop, then 
> > going back to select().
> 
> Incorrect.  When the user specifies blocking on the file descriptor
> we must give it what it asked for.  -EAGAIN on a blocking file descriptor
> is always a bug, in all situations, that's what this code used to do and we
> fixed it because it's a bug.

So why not return EIO instead? It would be even better to have select()
validate the data, but I think returning EIO is better than blocking and most
likely POSIX compliant.


--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:42                                           ` David S. Miller
@ 2004-10-07 23:27                                             ` Chris Friesen
  2004-10-08  0:04                                               ` Ben Greear
  2004-10-08  2:51                                             ` Mark Mielke
  1 sibling, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-07 23:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: martijn, hzhong, jst1, linux-kernel, alan, davem

David S. Miller wrote:

> So people who improperly use select() with blocking sockets get punished
> in a different way, with half the performance compared to today?

Yes.  Rather than having an app that doesn't work at all in the case of 
corrupted packets, they get half the performance.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: mmap specification - was: ... select specification
  2004-10-07 22:46                               ` Andries Brouwer
@ 2004-10-07 23:30                                 ` Kyle Moffett
  2004-10-08  9:19                                   ` Andries Brouwer
  0 siblings, 1 reply; 191+ messages in thread
From: Kyle Moffett @ 2004-10-07 23:30 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Linux Kernel Mailing List, Alan Cox

On Oct 07, 2004, at 18:46, Andries Brouwer wrote:
> The POSIX text is clear to me, and Linux is compliant.
> On the other hand, I have no idea what you try to say.

> On Thu, Oct 07, 2004 at 06:32:43PM -0400, Kyle Moffett wrote:
>
>>>> "References within the address range starting at pa and continuing
>>>> for len bytes to whole pages following the end of an object shall
>>>> result in delivery of a SIGBUS signal."

Reviewing this once more:

> References within the address range starting at pa and continuing for
> len bytes:
range = {pa ... pa+len};

> To whole pages following the end of an object:
range = {pa ... PAGE_ROUND_UP(pa+len)};

> shall result in delivery of a SIGBUS signal:
pa[ range[n] ]; => SIGBUS


This is clearly not what is meant.
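
For comparison, a sketch of the other reading of the quoted sentence, in
which only whole pages following the end of the object raise SIGBUS. The
function name sigbus_start() is hypothetical, and the page size is assumed
to be a power of two:

```c
#include <stddef.h>

/* Offset from the start of the object of the first byte whose reference
 * would raise SIGBUS under this reading: the start of the first whole
 * page following the end of the object. page must be a power of two. */
size_t sigbus_start(size_t file_size, size_t page)
{
    return (file_size + page - 1) & ~(page - 1);
}
```

For a 1-byte file and 4096-byte pages this gives 4096, so bytes 0..4095 of
the mapping stay accessible and references from 4096 onwards would fault.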

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a17 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r  
!y?(-)
------END GEEK CODE BLOCK------



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 23:27                                             ` Chris Friesen
@ 2004-10-08  0:04                                               ` Ben Greear
  0 siblings, 0 replies; 191+ messages in thread
From: Ben Greear @ 2004-10-08  0:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

Can't a lot of NICs do UDP checksum in hardware, basically for free?

At least in this case there would be no penalty for select()
looking for the corrupted packet?

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:36   ` Martijn Sipkema
@ 2004-10-08  0:19     ` David Schwartz
  2004-10-09 19:21       ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-08  0:19 UTC (permalink / raw)
  To: martijn, linux-kernel


> > POSIX does not require the kernel to predict the future. The
> > only guarantee
> > against having a socket operation block is found in
> > non-blocking sockets.

> It is one thing to implement select()/recvmsg() in a non POSIX compliant
> way; it is another thing to make false claims about that standard. POSIX
> _does_ guarantee that a call to recvmsg() does not block after a call
> to select().

	I do not believe this.

> > Suppose, for example, that instead of using 'read' you used
> > 'recvmsg', and
> > we add an option to 'recvmsg' to allow you to read datagrams with bad
> > checksums. What should 'select' do if a datagram is received with a bad
> > checksum? It has no idea what flavor of 'recvmsg' you're going
> > to call, so
> > it can't know if your operation is going to block or not.

> This is all described in detail in the standard.

	Where, specifically, does the standard guarantee that a subsequent call to
'recvmsg' will not block?

> > No, you are incorrect. Consider, again, a 'recvmsg' flag to allow you to
> > receive messages even if they have bad checksums versus one that blocks
> > until a message with a valid checksum is received. The 'select' function
> > just isn't smart enough.
> >
> > Consider a 'select' for write on a TCP socket. How does
> > 'select' know how
> > many bytes you're going to write? Again, a 'select' hit just indicates
> > something relevant has happened, it *cannot* guarantee that a future
> > operation won't block both because 'select' has no idea what
> > operation is
> > going to take place in the future and because things can change
> > between now
> > and then.

> You really should read the standard on this..

	I have. We obviously disagree on what it says. Since you're the one
claiming a guarantee that I claim does not exist, perhaps you could cite
where you think this guarantee appears.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:47                                           ` David S. Miller
  2004-10-07 23:00                                             ` Mark Mielke
@ 2004-10-08  0:37                                             ` Lee Revell
  1 sibling, 0 replies; 191+ messages in thread
From: Lee Revell @ 2004-10-08  0:37 UTC (permalink / raw)
  To: David S. Miller
  Cc: Mark Mielke, msipkema, cfriesen, hzhong, jst1, linux-kernel, alan, davem

On Thu, 2004-10-07 at 18:47, David S. Miller wrote:
> On Thu, 7 Oct 2004 18:42:42 -0400
> Mark Mielke <mark@mark.mielke.cc> wrote:
> 
> > Sure, it's nice to demand that people
> > upgrade to a later version of Perl. Guess what? It isn't happening. It
> > will be another year or two before we can guarantee people have Perl
> > 5.006 on their system.
> 
> If those people are tepid about upgrading perl, I think it would
> be even less likely that they would upgrade their kernels.

Not true.  If you upgrade the kernel you may incur a little downtime. 
Upgrade perl and you potentially break 1000's of customer CGI scripts.

Lee


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 22:42                                           ` David S. Miller
  2004-10-07 23:27                                             ` Chris Friesen
@ 2004-10-08  2:51                                             ` Mark Mielke
  2004-10-08  3:39                                               ` David S. Miller
  1 sibling, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-08  2:51 UTC (permalink / raw)
  To: David S. Miller
  Cc: Chris Friesen, martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, Oct 07, 2004 at 03:42:04PM -0700, David S. Miller wrote:
> On Thu, 07 Oct 2004 16:39:06 -0600
> Chris Friesen <cfriesen@nortelnetworks.com> wrote:
> > However, you chopped off what I consider the interesting part of my
> > post.  I propose that if we call select() on a blocking file
> > descriptor, we verify the checksum before saying that the socket is
> > readable.  Then, at recvmsg() time, if it hasn't been checked
> > already we would check it (to allow for the case of blocking socket
> > without select()).
> So people who improperly use select() with blocking sockets get punished
> in a different way, with half the performance compared to today?

No. People who use select() and read() in the *other* documented
manner, however ill-advised, would see the expected, and at least from
the perspective of myself and a few other people around here, correct
behaviour. How much it costs to implement the correct behaviour is a
different concern, and perhaps one these people should have considered
when determining whether or not to use O_NONBLOCK...

Your position, I believe has been that the use of select() on a blocking
file descriptor is invalid. Taking this to an extreme, select() should
return EBADF or something to that effect, when passed a file descriptor
that does not have O_NONBLOCK set...

Cheers,
mark


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  2:51                                             ` Mark Mielke
@ 2004-10-08  3:39                                               ` David S. Miller
  2004-10-08  3:48                                                 ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: David S. Miller @ 2004-10-08  3:39 UTC (permalink / raw)
  To: Mark Mielke; +Cc: cfriesen, martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, 7 Oct 2004 22:51:48 -0400
Mark Mielke <mark@mark.mielke.cc> wrote:

> Your position, I believe has been that the use of select() on a blocking
> file descriptor is invalid.

Incorrect.

My position is that expecting a blocking file descriptor not to
block is invalid.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  3:39                                               ` David S. Miller
@ 2004-10-08  3:48                                                 ` Mark Mielke
  2004-10-08  3:59                                                   ` David S. Miller
  0 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-08  3:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: cfriesen, martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, Oct 07, 2004 at 08:39:43PM -0700, David S. Miller wrote:
> On Thu, 7 Oct 2004 22:51:48 -0400
> Mark Mielke <mark@mark.mielke.cc> wrote:
> > Your position, I believe has been that the use of select() on a blocking
> > file descriptor is invalid.
> Incorrect.
> My position is that expecting a blocking file descriptor not to
> block is invalid.

Extrapolated, this would be - use of select() on a blocking file descriptor
is invalid.

Otherwise, what would be the point of using select() only to accidentally
block a small percentage of the time that one couldn't predict?

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  3:48                                                 ` Mark Mielke
@ 2004-10-08  3:59                                                   ` David S. Miller
  0 siblings, 0 replies; 191+ messages in thread
From: David S. Miller @ 2004-10-08  3:59 UTC (permalink / raw)
  To: Mark Mielke; +Cc: cfriesen, martijn, hzhong, jst1, linux-kernel, alan, davem

On Thu, 7 Oct 2004 23:48:48 -0400
Mark Mielke <mark@mark.mielke.cc> wrote:

> Extrapolated, this would be - use of select() on a blocking file descriptor
> is invalid.

It's valid, but it's asking for trouble.  It is going to block on
you under certain circumstances.
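
The defensive pattern implied here can be sketched as follows. This is a minimal illustration in Python, not code from the thread: the helper name recv_ready, the loopback addresses and the 65535-byte buffer are assumptions made for the example. The socket is put in non-blocking mode, so a receive that finds nothing after select() reported readability returns immediately instead of hanging.

```python
import select
import socket


def recv_ready(sock, timeout):
    """Wait for readability, then attempt one non-blocking receive.

    Returns (data, addr), or None if nothing could be read.
    (Hypothetical helper, for illustration only.)
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # select() timed out
    try:
        return sock.recvfrom(65535)
    except BlockingIOError:
        # select() reported the socket readable, but no datagram could be
        # delivered (for example, it was discarded in the meantime).
        return None


rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.setblocking(False)  # equivalent to setting O_NONBLOCK

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"ping", rx.getsockname())

first = recv_ready(rx, 1.0)   # the datagram sent above
second = recv_ready(rx, 0.1)  # nothing queued: returns None, never hangs

tx.close()
rx.close()
```

With a blocking socket the same recvfrom() call would have no way to return early, which is the situation the original report describes.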


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 23:00                                             ` Mark Mielke
  2004-10-07 23:07                                               ` David S. Miller
@ 2004-10-08  6:10                                               ` Theodore Ts'o
  2004-10-08 15:20                                                 ` Mark Mielke
  1 sibling, 1 reply; 191+ messages in thread
From: Theodore Ts'o @ 2004-10-08  6:10 UTC (permalink / raw)
  To: David S. Miller, msipkema, cfriesen, hzhong, jst1, linux-kernel,
	alan, davem

On Thu, Oct 07, 2004 at 07:00:19PM -0400, Mark Mielke wrote:
> 
> Just say "it's a bug, but one we have chosen not to fix for practical
> reasons." That would have kept me out of this discussion. Saying the
> behaviour is correct and that POSIX is wrong - that raises hairs -
> both the question kind, and the concern kind.

Why?  POSIX has gotten *lots* of things wrong in the past.  

For example, using 512 byte units for df and du (which we ignore, and
for which POSIX will hopefully eventually catch up with sanity)
and fcntl unlocking semantics (which we adhere to despite the fact
that they are broken beyond belief, and very likely have caused and will
continue to cause application bugs in the future).  What we do when POSIX does
something idiotic is something that has to be addressed on a
case-by-case basis.

						- Ted

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  0:29               ` David S. Miller
  2004-10-07 10:56                 ` Martijn Sipkema
@ 2004-10-08  6:41                 ` Willy Tarreau
  2004-10-08 15:27                   ` Mark Mielke
  2004-10-15 22:42                   ` Robert White
  1 sibling, 2 replies; 191+ messages in thread
From: Willy Tarreau @ 2004-10-08  6:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: Olivier Galibert, linux-kernel

Hi David,

On Wed, Oct 06, 2004 at 05:29:59PM -0700, David S. Miller wrote:
> It absolutely does help the programs not using select(), using
> blocking sockets, and not expecting -EAGAIN.

As I asked in a previous mail in this overly long thread, why not return
zero bytes instead? It is perfectly valid to receive a UDP packet with 0
data bytes, and any application should be able to support this case anyway.
In case of TCP, this would be a problem because the app would think it
indicates the last byte has been received. But in case of UDP, there is no
problem.

BTW, could we enumerate the known cases where select() might report an FD
as readable when it ultimately is not? If there are only very few cases which can
all be worked around at nearly no cost, it might be worth doing it, or at
least documenting them. From this thread, I gathered :

   1) multi-threaded apps -> broken design anyway, and should be obvious.
      at most, the case can be documented
   2) bad UDP checksums -> this is currently the object of this thread
   3) packet dropped because of buffer size, load, etc... (not confirmed)
   4) others ? any TCP/unix socket/pipe known case ? 
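
Case 1 in the list above can be reproduced without threads. The sketch below is an illustration in Python under stated assumptions (loopback addresses, one datagram, two select() calls standing in for two readers); it is not code from the thread. Both "readers" see the descriptor as readable because select() is level-triggered, but only one of them gets the datagram.

```python
import select
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.setblocking(False)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"one", rx.getsockname())

# Two readers poll before either of them reads: both are told "readable".
ready_a = select.select([rx], [], [], 1.0)[0]
ready_b = select.select([rx], [], [], 1.0)[0]

# Reader A consumes the only datagram.
data_a, _ = rx.recvfrom(65535)

# Reader B acts on its (now stale) readiness report.
try:
    rx.recvfrom(65535)
    lost_race = False
except BlockingIOError:
    lost_race = True  # on a blocking socket this call would have hung

tx.close()
rx.close()
```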

Regards,
Willy


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: mmap specification - was: ... select specification
  2004-10-07 23:30                                 ` Kyle Moffett
@ 2004-10-08  9:19                                   ` Andries Brouwer
  2004-10-09 21:10                                     ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Andries Brouwer @ 2004-10-08  9:19 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andries Brouwer, Linux Kernel Mailing List, Alan Cox

On Thu, Oct 07, 2004 at 07:30:53PM -0400, Kyle Moffett wrote:
> On Oct 07, 2004, at 18:46, Andries Brouwer wrote:
> >The POSIX text is clear to me, and Linux is compliant.
> >On the other hand, I have no idea what you try to say.
> 
> >On Thu, Oct 07, 2004 at 06:32:43PM -0400, Kyle Moffett wrote:
> >
> >>>>"References within the address range starting at pa and continuing
> >>>>for len bytes to whole pages following the end of an object shall
> >>>>result in delivery of a SIGBUS signal."
> 
> Reviewing this once more:
> 
> >References within the address range starting at pa and continuing for
> >len bytes:
> range = {pa ... pa+len};
> 
> >To whole pages following the end of an object:
> range = {pa ... PAGE_ROUND_UP(pa+len)};

It is here you are wrong.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  6:10                                               ` Theodore Ts'o
@ 2004-10-08 15:20                                                 ` Mark Mielke
  0 siblings, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-08 15:20 UTC (permalink / raw)
  To: Theodore Ts'o, linux-kernel

On Fri, Oct 08, 2004 at 02:10:52AM -0400, Theodore Ts'o wrote:
> On Thu, Oct 07, 2004 at 07:00:19PM -0400, Mark Mielke wrote:
> > Just say "it's a bug, but one we have chosen not to fix for practical
> > reasons." That would have kept me out of this discussion. Saying the
> > behaviour is correct and that POSIX is wrong - that raises hairs -
> > both the question kind, and the concern kind.
> Why?  POSIX have gotten *lots* of things wrong in the past.  
> [ non-relevant complaints about POSIX ]
> What we do when POSIX does
> something idiotic is something that has to be addressed on a
> case-by-case basis.

In this case, POSIX defines select() / blocking read() to be useful.
Linux defines it to be dangerous.

I have no question in my mind which behaviour is 'correct', in this
case. Deciding between something that works, and something that doesn't,
is a no brainer for me. Talking about performance, and so on, is just a
complete distraction. Who cares about performance when a percentage of
the time the caller will be in a confused state as a result?

I'm ok with case-by-case. I'm not ok with a generic "POSIX sucks lots -
why should we be POSIX compliant?"

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  6:41                 ` Willy Tarreau
@ 2004-10-08 15:27                   ` Mark Mielke
  2004-10-15 22:42                   ` Robert White
  1 sibling, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-08 15:27 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David S. Miller, Olivier Galibert, linux-kernel

On Fri, Oct 08, 2004 at 08:41:04AM +0200, Willy Tarreau wrote:
> On Wed, Oct 06, 2004 at 05:29:59PM -0700, David S. Miller wrote:
> > It absolutely does help the programs not using select(), using
> > blocking sockets, and not expecting -EAGAIN.
> As I asked in a previous mail in this overly long thread, why not returning
> zero bytes at all. It is perfectly valid to receive an UDP packet with 0
> data bytes, and any application should be able to support this case anyway.
> In case of TCP, this would be a problem because the app would think it
> indicates the last byte has been received. But in case of UDP, there is no
> problem.

0 isn't correct either. No zero length packet was successfully received.

I agree with the current read() behaviour. It's the select() behaviour
that I consider to be wrong. Patching return values is just a hacky way
of avoiding the issue.

The issue can be more easily avoided by saying 'the Linux developers
believe that the use of select() with blocking file descriptors is
invalid or not recommended, and have chosen not to ensure that this
use of system calls is reliable'. "We're not POSIX compliant in this
case" isn't good enough for me. One acknowledges the issue. The other
ignores it.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-09 19:21       ` Martijn Sipkema
@ 2004-10-09 18:28         ` David Schwartz
  2004-10-09 18:49           ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-09 18:28 UTC (permalink / raw)
  To: martijn, linux-kernel


> [...]
> > Where, specifically, does the standard guarantee that a
> > subsequent call to
> > 'recvmsg' will not block?
>
> When using select() on a socket for reading, select will block until
> that socket is ready.
>
> According to POSIX:
>
>   A descriptor shall be considered ready for reading when a call to an
>   input function with O_NONBLOCK clear would not block, whether or not
>   the function would transfer data successfully.

	Note that it says *would* not block, not *will* not block. The definition
of the word "would" is "an expression of probability or likelihood" (or a
"presumption or expectation"). This is *not* a guarantee.

> and
>
>   If a descriptor refers to a socket, the implied input function is the
>   recvmsg() function with parameters requesting normal and ancillary data,
>   such that the presence of either type shall cause the socket to
>   be marked
>   as readable. The presence of out-of-band data shall be checked if the
>   socket option SO_OOBINLINE has been enabled, as out-of-band data is
>   enqueued with normal data. If the socket is currently listening, then it
>   shall be marked as readable if an incoming connection request has been
>   received, and a call to the accept() function shall complete without
>   blocking.
>
> Thus recvmsg() shouldn't in any case block after a select() on a socket.

	I don't draw that conclusion from that paragraph. It does say the presence
of normal data shall mark the socket readable, but it doesn't require the
kernel to keep that data available, at least not as far as I can see.

	As far as I can tell, neither of these two excerpts prohibits an
implementation from, for example, discarding UDP data (say, to save memory)
after it triggered a read hit on a 'select' call. Yes, the 'recvmsg' call
would not have blocked, had it been made at the time the data was available.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-09 18:28         ` David Schwartz
@ 2004-10-09 18:49           ` Mark Mielke
  2004-10-09 21:00             ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-09 18:49 UTC (permalink / raw)
  To: David Schwartz; +Cc: martijn, linux-kernel

On Sat, Oct 09, 2004 at 11:28:28AM -0700, David Schwartz wrote:
> > > Where, specifically, does the standard guarantee that a
> > > subsequent call to
> > > 'recvmsg' will not block?
> > When using select() on a socket for reading, select will block until
> > that socket is ready.
> > According to POSIX:
> >   A descriptor shall be considered ready for reading when a call to an
> >   input function with O_NONBLOCK clear would not block, whether or not
> >   the function would transfer data successfully.
> Note that it says *would* not block, not *will* not block. The definition
> of the word "would" is "an expression of probability or likelihood" (or a
> "presumption or expectation"). This is *not* a guarantee.

Are you sure you aren't confusing 'would' with 'should'?

Would and will are the same, except in terms of time.

This is ridiculous. Everybody in this discussion *knows* that the
existing behaviour is broken. As another poster appeared to show, it can
be used as a denial of service attack against any
application that doesn't use O_NONBLOCK for UDP packets under Linux.

Please - people who don't agree, just ensure that Linux is documented
to not implement select() on sockets without O_NONBLOCK properly. No
more silly excuses. 'Would' vs 'will' meaning 'probably'... sheesh...

> >   If a descriptor refers to a socket, the implied input function is the
> >   recvmsg() function with parameters requesting normal and ancillary data,
> >   such that the presence of either type shall cause the socket to
> >   be marked
> >   as readable. The presence of out-of-band data shall be checked if the
> >   socket option SO_OOBINLINE has been enabled, as out-of-band data is
> >   enqueued with normal data. If the socket is currently listening, then it
> >   shall be marked as readable if an incoming connection request has been
> >   received, and a call to the accept() function shall complete without
> >   blocking.
> > Thus recvmsg() shouldn't in any case block after a select() on a socket.
> I don't draw that conclusion from that paragraph. It does say the presence
> of normal data shall mark the socket readable, but it doesn't require the
> kernel to keep that data available, at least not as far as I can see.

The data was *never* available. select() lied.

> 	As far as I can tell, neither of these two excerpts prohibit an
> implementation from, for example, discarding UDP data (say, to save memory)
> after it triggered a read hit on a 'select' call.

Your reading lets you have a broken system call interface, and declare that
it is acceptable. Why? What is the purpose of this position? Who does it
benefit?

> Yes, the 'recvmsg' call
> would not have blocked, had it been made at the time the data was available.

Wrong. The data was never available. If the select() was replaced by
recvmsg() it most certainly *would* have blocked. Therefore, select()
should not have said 'data is ready'.

If this understanding isn't clear, I begin to seriously worry about
the competency of a few people...

It's ok to say "we chose to leave it broken because we feel O_NONBLOCK
should be mandatory". Nobody with any sort of authority seems to want
to say this. They would prefer to talk about "POSIX compliancy" as if
the issue was theoretical and irrelevant.

It is disconcerting, to say the least.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  0:19     ` David Schwartz
@ 2004-10-09 19:21       ` Martijn Sipkema
  2004-10-09 18:28         ` David Schwartz
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-09 19:21 UTC (permalink / raw)
  To: davids, linux-kernel

From: "David Schwartz" <davids@webmaster.com>
> > > POSIX does not require the kernel to predict the future. The
> > > only guarantee
> > > against having a socket operation block is found in
> > > non-blocking sockets.
> 
> > It is one thing to implement select()/recvmsg() in a non POSIX compliant
> > way; it is another thing to make false claims about that standard. POSIX
> > _does_ guarantee that a call to recvmsg() does not block after a call
> > to select().
> 
> I do not believe this.
> 
[...]
> Where, specifically, does the standard guarantee that a subsequent call to
> 'recvmsg' will not block?

When using select() on a socket for reading, select will block until
that socket is ready.

According to POSIX:

  A descriptor shall be considered ready for reading when a call to an
  input function with O_NONBLOCK clear would not block, whether or not
  the function would transfer data successfully.

and

  If a descriptor refers to a socket, the implied input function is the
  recvmsg() function with parameters requesting normal and ancillary data,
  such that the presence of either type shall cause the socket to be marked
  as readable. The presence of out-of-band data shall be checked if the
  socket option SO_OOBINLINE has been enabled, as out-of-band data is
  enqueued with normal data. If the socket is currently listening, then it
  shall be marked as readable if an incoming connection request has been
  received, and a call to the accept() function shall complete without
  blocking.

Thus recvmsg() shouldn't in any case block after a select() on a socket.


--ms




^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-09 18:49           ` Mark Mielke
@ 2004-10-09 21:00             ` Martijn Sipkema
  2004-10-09 22:59               ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-09 21:00 UTC (permalink / raw)
  To: Mark Mielke, David Schwartz; +Cc: linux-kernel

[...]
> Please - people who don't agree, just ensure that Linux is documented
> to not implement select() on sockets without O_NONBLOCK properly.

Actually, the behaviour isn't correct for sockets with O_NONBLOCK
either, since EAGAIN may only be returned when recvmsg() would not
block without O_NONBLOCK.


--ms



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: mmap specification - was: ... select specification
  2004-10-08  9:19                                   ` Andries Brouwer
@ 2004-10-09 21:10                                     ` Martijn Sipkema
  0 siblings, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-09 21:10 UTC (permalink / raw)
  To: Andries Brouwer, Kyle Moffett
  Cc: Andries Brouwer, Linux Kernel Mailing List, Alan Cox

From: "Andries Brouwer" <aebr@win.tue.nl>
> On Thu, Oct 07, 2004 at 07:30:53PM -0400, Kyle Moffett wrote:
> > On Oct 07, 2004, at 18:46, Andries Brouwer wrote:
> > >The POSIX text is clear to me, and Linux is compliant.
> > >On the other hand, I have no idea what you try to say.
> > 
> > >On Thu, Oct 07, 2004 at 06:32:43PM -0400, Kyle Moffett wrote:
> > >
> > >>>>"References within the address range starting at pa and continuing
> > >>>>for len bytes to whole pages following the end of an object shall
> > >>>>result in delivery of a SIGBUS signal."
> > 
> > Reviewing this once more:
> > 
> > >References within the address range starting at pa and continuing for
> > >len bytes:
> > range = {pa ... pa+len};
> > 
> > >To whole pages following the end of an object:
> > range = {pa ... PAGE_ROUND_UP(pa+len)};
> 
> It is here you are wrong.

Indeed, I think Kyle took ``end of an object'' to mean the end of the
mapping instead of the end of what is mapped, e.g. EOF in case of a file.
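
The distinction can be made concrete with a little arithmetic. The sketch below is in Python and assumes a 4096-byte page size, a mapping that starts at file offset 0, and a mapping length that is a multiple of the page size; the function name classify is made up for the example. It labels each byte offset of a mapping according to the reading described above: bytes inside the file are file data, the rest of the last page that contains file data is zero-filled, and only whole pages beyond that raise SIGBUS.

```python
PAGE = 4096  # assumed page size, for illustration only


def classify(offset, file_size, map_len, page=PAGE):
    """Say what a reference to byte `offset` of the mapping hits.

    Hypothetical helper: assumes the mapping starts at file offset 0
    and that map_len is a multiple of the page size.
    """
    # End of the last page that still contains file data (round up).
    last_backed = -(-file_size // page) * page
    if offset >= map_len:
        return "outside mapping"
    if offset < file_size:
        return "file data"
    if offset < last_backed:
        return "zero-filled tail of the last page"
    return "SIGBUS"


# A 5000-byte file mapped with len = 4 pages (16384 bytes).
results = {off: classify(off, 5000, 16384)
           for off in (0, 4999, 5000, 8191, 8192, 16383, 16384)}
```

Under these assumptions the boundary that matters is the end of the file rounded up to a page (8192 here), not the end of the mapping (16384).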


--ms



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-09 21:00             ` Martijn Sipkema
@ 2004-10-09 22:59               ` Mark Mielke
  0 siblings, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-09 22:59 UTC (permalink / raw)
  To: Martijn Sipkema; +Cc: David Schwartz, linux-kernel

On Sat, Oct 09, 2004 at 10:00:35PM +0100, Martijn Sipkema wrote:
> [...]
> > Please - people who don't agree, just ensure that Linux is documented
> > to not implement select() on sockets without O_NONBLOCK properly.
> Actually, the behaviour isn't correct for sockets with O_NONBLOCK
> either, since EAGAIN may only be returned when recvmsg() would not
> block without O_NONBLOCK.

At least this one is acceptable, though, as most current applications
won't break, or be open to attack. I agree - it should also be documented
as improper, but with enough words to point out that it isn't a big deal.

I think you and I are on the same page.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-08  6:41                 ` Willy Tarreau
  2004-10-08 15:27                   ` Mark Mielke
@ 2004-10-15 22:42                   ` Robert White
  2004-10-15 23:33                     ` David Schwartz
  2004-10-16 10:24                     ` Willy Tarreau
  1 sibling, 2 replies; 191+ messages in thread
From: Robert White @ 2004-10-15 22:42 UTC (permalink / raw)
  To: 'Willy Tarreau', 'David S. Miller'
  Cc: 'Olivier Galibert', linux-kernel



-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org]
On Behalf Of Willy Tarreau

> As I asked in a previous mail in this overly long thread, why not returning
> zero bytes at all. It is perfectly valid to receive an UDP packet with 0


Zero bytes is "end of file".  Don't go trying to co-opt end of file.  That way lies
madness and despair.

You would *then* need a flag on each file descriptor to determine if the most
recent call before the read op was a select that returned the file as readable so
you knew whether to block or return the not-really-end-of-file.  Your *app* would
then also need a flag/context to determine whether the end of file just read was
contextually an aborted read after select.


Nope, very very very very very bad idea... 8-)

[On the larger issues, I am surprised that select() doesn't guarantee available data
and one subsequent non-blocking read, but again in the case of a UDP discard after
the select but before the read, that is the only thing that makes sense.  I would
vote (were this a democracy 8-) to put a CAVEAT in the manual that listed the _rare_
cases, as examples, where the warrant of available data may prove false; give a nod
to real life, and _firmly suggest_ that if you are using select, you *probably* want
nonblocking file descriptors too.]
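
For code that has to keep its descriptors in blocking mode, one alternative is to make only the receive that follows select() non-blocking. The sketch below is an illustration in Python, not something proposed in the thread: it assumes a platform that provides the MSG_DONTWAIT flag (Linux does), and the helper name try_recv and the loopback addresses are made up for the example.

```python
import select
import socket


def try_recv(sock):
    """One receive that never blocks, even on a blocking socket.

    Hypothetical helper; relies on MSG_DONTWAIT being available.
    """
    try:
        return sock.recvfrom(65535, socket.MSG_DONTWAIT)
    except BlockingIOError:
        return None  # nothing deliverable right now


rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # left in blocking mode
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

before = try_recv(rx)  # empty queue: returns None instead of hanging

tx.sendto(b"hello", rx.getsockname())
select.select([rx], [], [], 1.0)
after = try_recv(rx)

still_blocking = rx.getblocking()  # the socket's own mode is unchanged

tx.close()
rx.close()
```

The design trade-off is that every receive path has to remember to pass the flag, whereas O_NONBLOCK is set once per descriptor.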


^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-15 22:42                   ` Robert White
@ 2004-10-15 23:33                     ` David Schwartz
  2004-10-16  0:59                       ` Chris Friesen
  2004-10-16  2:35                       ` Mark Mielke
  2004-10-16 10:24                     ` Willy Tarreau
  1 sibling, 2 replies; 191+ messages in thread
From: David Schwartz @ 2004-10-15 23:33 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> > As I asked in a previous mail in this overly long thread, why
> > not returning
> > zero bytes at all. It is perfectly valid to receive an UDP packet with 0

> Zero bytes is "end of file".  Don't go trying to co-opt end of
> file.  That way lies
> madness and despair.

	Not for UDP. Zero bytes means that zero bytes of data were received, a
perfectly legitimate (though seldom useful) number.
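
A short check of this point, sketched in Python with loopback addresses chosen for the example (none of this code is from the thread): an empty UDP datagram is delivered like any other, and the receiver gets zero bytes together with a sender address, which is different from the end-of-stream meaning a zero-byte read has on TCP.

```python
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(1.0)  # fail with an exception rather than hang if nothing arrives

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"", rx.getsockname())  # a datagram with no payload

data, sender = rx.recvfrom(65535)
sender_port = tx.getsockname()[1]

tx.close()
rx.close()
```

This is also why returning 0 for a discarded datagram would be ambiguous: the caller could not tell it apart from a genuine empty datagram except by the missing sender address.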

> You would *then* need a flag on each file descriptor to determine
> if the most
> previous call before the read op was a select that returned the
> file as readable so
> you knew whether to block or return the not-really-end-of-file.
> Your *app* would
> then also need a flag/context to determine whether the end of
> file just read was
> contextually an aborted read after select.

	You mean whether it was a zero-byte datagram or some sort of error.

> [On the larger issues, I am surprised that select() doesn't
> guarantee available data
> and one subsequent non-blocking read, but again in the case of a
> UDP discard after
> the select but before the read, that is the only thing that makes
> sense.  I would
> vote (were this a democracy 8-) to put a CAVEAT in the manual
> that listed the _rare_
> cases, as examples, where the warrant of available data may prove
> false; give a nod
> to real life, and _firmly suggest_ that if you are using select,
> you *probably* want
> nonblocking file descriptors too.]

	I think it's a really bad idea to make 'select' more complicated by trying
to nail down precise semantics for every possible protocol. The 'select'
function is supposed to be protocol-neutral and trying to say it guarantees
X on protocol Y, where such guarantees constrict what the kernel can do and
do not make user code anything but more fragile, doesn't seem to be a good
idea to me.

	For UDP specifically, datagrams are fundamentally discardable whenever that
seems to be a good idea. In general, there are any number of corner cases
for various combinations of protocols and situations where a 'select' hit
will not be followed by an operation that doesn't block.

	It just happens to be that 'select' works best when it's a hint that
something has changed and the operation can/should be re-tried. It works
very badly when the results of a 'select' are supposed to change something
because you're supposed to be able to 'select' (and then not perform the
operation) without affecting things. This is level semantics, not edge.

	The CAVEAT is that 'select', like every other status information function
provided by the kernel, does not guarantee anything about the future. Just
like 'stat' does not guarantee that the file size will still be the same
later when you call 'read'.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-15 23:33                     ` David Schwartz
@ 2004-10-16  0:59                       ` Chris Friesen
  2004-10-16  2:35                       ` Mark Mielke
  1 sibling, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-16  0:59 UTC (permalink / raw)
  To: davids; +Cc: Linux-Kernel@Vger. Kernel. Org

David Schwartz wrote:

> 	The CAVEAT is that 'select', like every other status information function
> provided by the kernel, does not guarantee anything about the future. Just
> like 'stat' does not guarantee that the file size will still be the same
> later when you call 'read'.

This is not a very good counterexample.  If I'm the only one accessing a file, 
then the file size should not just change all by itself.

As you say, select is level triggered.  However, the very definition of select 
returning a file descriptor as readable is that a subsequent read will not 
block.  This seems very straightforward.  Maybe it's not the best from a 
performance point of view, but it's very straightforward.

So if we change the semantics slightly to say that select returning readable 
really means a subsequent *blocking* read will not block, then apps that use 
blocking sockets will get proper semantics, and apps using nonblocking reads 
will get full performance.

As it stands, it's fairly straightforward to do a DoS and hang various apps with 
a single corrupt packet each.  This is suboptimal.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-15 23:33                     ` David Schwartz
  2004-10-16  0:59                       ` Chris Friesen
@ 2004-10-16  2:35                       ` Mark Mielke
  2004-10-16  4:23                         ` David Schwartz
  1 sibling, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-16  2:35 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Fri, Oct 15, 2004 at 04:33:40PM -0700, David Schwartz wrote:
> I think it's a really bad idea to make 'select' more complicated by trying
> to nail down precise semantics for every possible protocol. The 'select'
> function is supposed to be protocol-neutral and trying to say it guarantees
> X on protocol Y, where such guarantees constrict what the kernel can do and
> do not make user code anything but more fragile, doesn't seem to be a good
> idea to me.

Are you saying that a minimal operating system should feel free to implement
select() to always return true with all bits set?

> For UDP specifically, datagrams are fundamentally discardable whenever that
> seems to be a good idea. In general, there are any number of corner cases
> for various combinations of protocols and situations where a 'select' hit
> will not be followed by an operation that doesn't block.

Like?

In the accept() case, you genuinely have a 'if you had substituted the
select() with an accept(), the accept() would have succeeded.'

In the UDP case, you do NOT have this situation. A recvmsg() in place
of select() would block; therefore, select() should not return 'readable'.

> 	It just happens to be that 'select' works best when it's a hint that
> something has changed and the operation can/should be re-tried. It works
> very badly when the results of a 'select' are supposed to change something
> because you're supposed to be able to 'select' (and then not perform the
> operation) without affecting things. This is level semantics, not edge.

We're talking about a packet that was never readable. If you had an
efficient enough check ahead of time (perhaps implemented in
hardware?), it would never get to the point where select() was in this
position. The Linux developer of this section of code decided that they
wanted UDP to be more efficient by delaying the checksum validation until
the last minute. The cost of this, is that they broke the API with regard
to select(). This isn't being admitted. Instead, off-topic challenges of
POSIX, and impractical claims regarding the use of blocking file descriptors
with select() being not recommended have been offered.

These answers are simply wrong. If the decision is to make select() with
blocking file descriptors unreliable, then the decision *IS* to recommend
that select() never be used with blocking file descriptors. What kind of
operating system developers would recommend the use of select() if it is
known that the behaviour is unreliable?

> The CAVEAT is that 'select', like every other status information function
> provided by the kernel, does not guarantee anything about the future. Just
> like 'stat' does not guarantee that the file size will still be the same
> later when you call 'read'.

We're not talking about the future. Get off that horse.

We're talking about the present. At the time select() is invoked, there is
*no* available data. select() is lying.

Cheers,
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  2:35                       ` Mark Mielke
@ 2004-10-16  4:23                         ` David Schwartz
  2004-10-16  4:35                           ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-16  4:23 UTC (permalink / raw)
  To: mark; +Cc: Linux-Kernel@Vger. Kernel. Org


> On Fri, Oct 15, 2004 at 04:33:40PM -0700, David Schwartz wrote:
> > I think it's a really bad idea to make 'select' more
> > complicated by trying
> > to nail down precise semantics for every possible protocol. The 'select'
> > function is supposed to be protocol-neutral and trying to say
> > it guarantees
> > X on protocol Y, where such guarantees constrict what the
> > kernel can do and
> > do not make user code anything but more fragile, doesn't seem
> > to be a good
> > idea to me.

> Are you saying that a minimal operating system should feel free
> to implement
> select() to always return true with all bits set?

	Yes, though that would obviously be a very poor implementation.

> > For UDP specifically, datagrams are fundamentally discardable
> > whenever that
> > seems to be a good idea. In general, there are any number of
> > corner cases
> > for various combinations of protocols and situations where a
> > 'select' hit
> > will not be followed by an operation that doesn't block.
>
> Like?

	Like a TCP 'write'.

> In the accept() case, you genuinely have a 'if you had substituted the
> select() with an accept(), the accept() would have succeeded.'

	While this is true, there is an 'as if' here. There's no difference to the
application between a datagram that dropped and a datagram that got
corrupted in transit.

> In the UDP case, you do NOT have this situation. A recvmsg() in place
> of select() would block; therefore, select() should block too.

	But it would block if the UDP datagram had been dropped after the 'select'
hit but before the 'recvmsg'. There is no logical reason there should be a
semantic difference between a UDP packet that was dropped and a UDP packet
that was corrupted.

> > 	It just happens to be that 'select' works best when it's a hint that
> > something has changed and the operation can/should be re-tried. It works
> > very badly when the results of a 'select' are supposed to
> > change something
> > because you're supposed to be able to 'select' (and then not perform the
> > operation) without affecting things. This is level semantics, not edge.

> We're talking about a packet that was never readable.

	There is no *application* difference between a dropped packet and a corrupt
packet. If the packet was dropped, it would have been readable.

> If you had an
> efficient enough check ahead of time (perhaps implemented in
> hardware?), it would never get to the point where select() was in this
> position. The Linux developer of this section of code decided that they
> wanted UDP to be more efficient by delaying the checksum validation until
> the last minute. The cost of this, is that they broke the API with regard
> to select(). This isn't being admitted. Instead, off-topic challenges of
> POSIX, and impractical claims regarding the use of blocking file
> descriptors
> with select() being not recommended have been offered.

> These answers are simply wrong. If the decision is to make select() with
> blocking file descriptors unreliable, then the decision *IS* to recommend
> that select() never be used with blocking file descriptors. What kind of
> operating system developers would recommend the use of select() if it is
> known that the behaviour is unreliable?

	I have never seen anyone recommend anything different. Every time this
comes up, it astonishes me that anyone would ever use 'select' on a blocking
file descriptor except in the very special case where you plan to change it
to non-blocking before performing the I/O operation. There are so many well
known corner cases, TCP write being one, UDP discard due to memory pressure
being another, connection abort being yet another.

> > The CAVEAT is that 'select', like every other status
> > information function
> > provided by the kernel, does not guarantee anything about the
> > future. Just
> > like 'stat' does not guarantee that the file size will still be the same
> > later when you call 'read'.
>
> We're not talking about the future. Get off that horse.
>
> We're talking about the present. At the time select() is invoked, there is
> *no* available data. select() is lying.

	There is no difference to the application between a UDP packet that was
discarded due to memory pressure in-between a 'select' hit and a packet that
was received with a bad checksum. If you can write an application that can
detect the difference, do so. Then I'll agree with you. Until that time,
this argument fails due to the 'as if' rule. If the API can perform
identically under conditions that do not differ as far as the application is
concerned, then the behavior is legal.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  4:23                         ` David Schwartz
@ 2004-10-16  4:35                           ` Mark Mielke
  2004-10-16  4:58                             ` David Schwartz
  0 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-16  4:35 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Fri, Oct 15, 2004 at 09:23:44PM -0700, David Schwartz wrote:
> > Are you saying that a minimal operating system should feel free
> > to implement
> > select() to always return true with all bits set?
> Yes, though that would obviously be a very poor implementation.

The 'obviously' ... 'poor' should tell you something.

> > > For UDP specifically, datagrams are fundamentally discardable
> > > whenever that
> > > seems to be a good idea. In general, there are any number of
> > > corner cases
> > > for various combinations of protocols and situations where a
> > > 'select' hit
> > > will not be followed by an operation that doesn't block.
> > Like?
> Like a TCP 'write'.

Not applicable. See the definition of select() in most reasonable
standards. We're talking about read of a packet. You're talking about
write of an arbitrary number of bytes. No standard I have seen
declares that select() guarantees a write of an arbitrary number of
bytes.

> > In the accept() case, you genuinely have a 'if you had substituted the
> > select() with an accept(), the accept() would have succeeded.'
> While this is true, there is an 'as if' here. There's no difference to the
> application between a datagram that dropped and a datagram that got
> corrupted in transit.

There *shouldn't* be, and yet there is. The application can *currently*
watch select() until it returns true, then, if read() returns EAGAIN in
the non-blocking case, or if it blocks (harder to tell - but don't assume
this means impossible), the application can now be aware that a datagram
was corrupted in transit (as opposed to being dropped).

The internal implementation - an OPTIMIZATION - is being exposed to the
application. In the case of non-blocking, the effect is limited to behaviour
which most of us agree is acceptable. In the case of blocking, the effect
is *not* acceptable. As suggested earlier in this thread, daemons which
receive UDP packets that use blocking file descriptors, and select(), can
be easily DOS'd. You think this is acceptable?

> > In the UDP case, you do NOT have this situation. A recvmsg() in place
> > of select() would block; therefore, select() should block too.
> But it would block if the UDP datagram had been dropped after the 'select'
> hit but before the 'recvmsg'. There is no logical reason there should be a
> semantic difference between a UDP packet that was dropped and a UDP packet
> that was corrupted.

You're thinking too fast, and skipping the most important point here:

    1) packet was dropped earlier (or was never sent)
         - if select() is issued, it blocks
         - if recvmsg() is issued, it blocks
    2) packet was received, but is corrupt
         - if select() is issued, it does not block
         - if recvmsg() is issued, it blocks

See the problem?
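
The two cases above can be reproduced in a toy model. This is only a sketch
under stated assumptions, not kernel code: the class `ToyUdpSocket` and its
method names are hypothetical, and the only behaviour modelled is that
readiness looks at whether anything is queued, while the read path discards
datagrams whose checksum failed.

```python
from collections import deque


class ToyUdpSocket:
    """Toy receive queue whose checksum is only checked at read time."""

    def __init__(self):
        # Each entry is (payload, checksum_ok); validation is deferred.
        self.queue = deque()

    def deliver(self, payload, checksum_ok=True):
        # The stack queues the datagram without validating it yet.
        self.queue.append((payload, checksum_ok))

    def select_readable(self):
        # Readiness only asks "is anything queued?".
        return len(self.queue) > 0

    def recv_would_block(self):
        # The read path throws away corrupt datagrams first, then
        # blocks if nothing valid is left.
        while self.queue and not self.queue[0][1]:
            self.queue.popleft()
        return len(self.queue) == 0


# Case 1: the packet was dropped earlier (nothing was ever queued).
dropped = ToyUdpSocket()
case1 = (dropped.select_readable(), dropped.recv_would_block())

# Case 2: the packet was received, but is corrupt.
corrupt = ToyUdpSocket()
corrupt.deliver(b"garbage", checksum_ok=False)
case2 = (corrupt.select_readable(), corrupt.recv_would_block())
```

In this model `case1` is "not readable, read would block", while `case2` is
"readable, read would block", which is the asymmetry being argued about.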

> > > 	It just happens to be that 'select' works best when it's a hint that
> > > something has changed and the operation can/should be re-tried. It works
> > > very badly when the results of a 'select' are supposed to
> > > change something
> > > because you're supposed to be able to 'select' (and then not perform the
> > > operation) without affecting things. This is level semantics, not edge.
> 
> > We're talking about a packet that was never readable.
> There is no *application* difference between a dropped packet and a corrupt
> packet. If the packet was dropped, it would have been readable.

So I and a few other people have tried to explain to you. Since it seems
we agree, why do you contradict yourself by allowing select() to expose
the difference to the application?

> > These answers are simply wrong. If the decision is to make select() with
> > blocking file descriptors unreliable, then the decision *IS* to recommend
> > that select() never be used with blocking file descriptors. What kind of
> > operating system developers would recommend the use of select() if it is
> > known that the behaviour is unreliable?
> 	I have never seen anyone recommend anything different. Every time this
> comes up, it astonishes me that anyone would ever use 'select' on a blocking
> file descriptor except in the very special case where you plan to change it
> to non-blocking before performing the I/O operation. There are so many well
> known corner cases, TCP write being one, UDP discard due to memory pressure
> being another, connection abort being yet another.

It astonishes you that somebody reads any of the UNIX manuals or standards,
and comes to the conclusion that they can use select() and read() together
on a blocking file descriptor?

What astonishes me is how few of these limitations are openly documented.

> > > The CAVEAT is that 'select', like every other status
> > > information function
> > > provided by the kernel, does not guarantee anything about the
> > > future. Just
> > > like 'stat' does not guarantee that the file size will still be the same
> > > later when you call 'read'.
> > We're not talking about the future. Get off that horse.
> > We're talking about the present. At the time select() is invoked, there is
> > *no* available data. select() is lying.
> 	There is no difference to the application between a UDP packet that was
> discarded due to memory pressure in-between a 'select' hit and a packet that
> was received with a bad checksum. If you can write an application that can
> detect the difference, do so. Then I'll agree with you. Until that time,
> this argument fails due to the 'as if' rule. If the API can perform
> identically under conditions that do not differ as far as the application is
> concerned, then the behavior is legal.

As I said, there obviously is. select() and read() expose the difference.

Please consider the argument outside of your pre-conceived conclusion. :-)

As I've also said before - I'm OK with optimization being important - IF,
it is readily admitted that Linux has a limitation in this regard. Instead,
I see only arguments to try and suggest that Linux is 'correct', and that
there is no real problem. There *IS* a problem, and developers who author
applications for Linux that make use of select() *SHOULD* be *FORCED* to
be aware of these limitations.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  4:35                           ` Mark Mielke
@ 2004-10-16  4:58                             ` David Schwartz
  2004-10-16  6:25                               ` Mark Mielke
  2004-10-16 18:25                               ` Andries Brouwer
  0 siblings, 2 replies; 191+ messages in thread
From: David Schwartz @ 2004-10-16  4:58 UTC (permalink / raw)
  To: mark; +Cc: Linux-Kernel@Vger. Kernel. Org


> You're thinking too fast, and skipping the most important point here:
>
>     1) packet was dropped earlier (or was never sent)
>          - if select() is issued, it blocks
>          - if recvmsg() is issued, it blocks
>     2) packet was received, but is corrupt
>          - if select() is issued, it does not block
>          - if recvmsg() is issued, it blocks
>
> See the problem?

	I'm talking about the case where it is dropped after the 'select' hit but
before the call to 'recvmsg'. In that case, the select does not block but
the recvmsg does.

> > > We're talking about a packet that was never readable.
> > There is no *application* difference between a dropped packet
> > and a corrupt
> > packet. If the packet was dropped, it would have been readable.

> So I and a few other people have tried to explain to you.
> Since it seems
> we agree, why do you contradict yourself by allowing select() to expose
> the difference to the application?

	It does not. You cannot tell the difference between a packet that was
dropped right before you call 'recvmsg' and a packet that was received
corrupted.

> It astonishes you that somebody reads any of the UNIX manuals or
> standards,
> and comes to the conclusion that they can use select() and read() together
> on a blocking file descriptor?

	They certainly can. They just have to understand what behavior they're going
to get. If you absolutely must not ever block, you must use non-blocking sockets
because the kernel cannot guarantee future behavior. Period. End of story.
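
For a caller that must never block, the usual pattern is select() followed
by a read on a non-blocking descriptor. A minimal sketch, assuming a UDP
socket on loopback; the helper names `recv_if_ready` and
`make_nonblocking_udp_socket` are made up for illustration and are not part
of any library:

```python
import select
import socket


def make_nonblocking_udp_socket(host="127.0.0.1", port=0):
    """Create a bound UDP socket that never blocks in recvfrom()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.setblocking(False)
    return sock


def recv_if_ready(sock, timeout, bufsize=65535):
    """Wait up to `timeout` seconds, then attempt a single read.

    Returns (data, addr), or None if nothing could be read.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # select() timed out
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:
        # select() said "readable", but the datagram was gone by the
        # time we read (for example, discarded on a bad checksum).
        # Treat the select() hit as a hint and go back to waiting.
        return None
```

The point of the `except BlockingIOError` branch is that a select() hit
followed by an empty read becomes a harmless retry instead of a hang.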

> What astonishes me is how few of these limitations are openly documented.

	That is a *very* good point. But it doesn't help to deny the limitations. A
hit on 'select' does not guarantee that a future operation will not block
unless you have very tight control over circumstances that typical
applications do not have tight control over.

> > 	There is no difference to the application between a UDP
> > packet that was
> > discarded due to memory pressure in-between a 'select' hit and
> > a packet that
> > was received with a bad checksum. If you can write an
> > application that can
> > detect the difference, do so. Then I'll agree with you. Until that time,
> > this argument fails due to the 'as if' rule. If the API can perform
> > identically under conditions that do not differ as far as the
> > application is
> > concerned, then the behavior is legal.
>
> As I said, there obviously is. select() and read() exposes the difference.
>
> Please consider the argument outside of your pre-conceived conclusion. :-)

	It does not. In fact, quite literally, the UDP packet is dropped right at
the call to 'recvmsg'. This is totally legal behavior -- a UDP packet can be
discarded at *any* time.

> As I've also said before - I'm OK with optimization being important - IF,
> it is readily admitted that Linux has a limitation in this
> regard. Instead,
> I see only arguments to try and suggest that Linux is 'correct', and that
> there is no real problem. There *IS* a problem, and developers who author
> applications for Linux that make use of select() *SHOULD* be *FORCED* to
> be aware of these limitations.

	Linux's behavior is correct in the literal sense that it is doing something
that is allowed. It's incorrect in the sense that it's sub-optimal. However,
it will not break any application that could not already be broken by other
circumstances.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  4:58                             ` David Schwartz
@ 2004-10-16  6:25                               ` Mark Mielke
  2004-10-16 21:44                                 ` Roland Kuhn
  2004-10-17  0:28                                 ` David Schwartz
  2004-10-16 18:25                               ` Andries Brouwer
  1 sibling, 2 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-16  6:25 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Fri, Oct 15, 2004 at 09:58:38PM -0700, David Schwartz wrote:
> > You're thinking too fast, and skipping the most important point here:
> >     1) packet was dropped earlier (or was never sent)
> >          - if select() is issued, it blocks
> >          - if recvmsg() is issued, it blocks
> >     2) packet was received, but is corrupt
> >          - if select() is issued, it does not block
> >          - if recvmsg() is issued, it blocks
> > See the problem?
> I'm talking about the case where it is dropped after the 'select' hit but
> before the call to 'recvmsg'. In that case, the select does not block but
> the recvmsg does.

You are talking about the make-believe case that only exists due to
the *current* implementation of Linux UDP packet reading. It doesn't
have to exist. It exists only because nobody saw fit to implement it
with semantics that were reliable, because the implementors didn't foresee
blocking file descriptors being used. It's an implementation oversight.

> > It astonishes you that somebody reads any of the UNIX manuals or
> > standards,
> > and comes to the conclusion that they can use select() and read() together
> > on a blocking file descriptor?
> They certainly can. They just have to understand what behavior they're going
> to get. If you absolutely must not ever block, you must use non-blocking sockets
> because the kernel cannot guarantee future behavior. Period. End of story.

We're not talking about future behaviour. We're talking about past behaviour.

That the kernel chose to delay making a decision too long to tell the truth
in select() is not reasonable for blocking sockets. The answer *MUST* be,
fix it for blocking sockets, or tell the truth, and just say - blocking
file descriptors for UDP sockets should not be used with select(). Why is
that so hard? Why all the distracting and minimizing language about
recommendations?

> > What astonishes me is how few of these limitations are openly documented.
> That is a *very* good point. But it doesn't help to deny the limitations. A
> hit on 'select' does not guarantee that a future operation will not block
> unless you have very tight control over circumstances that typical
> applications do not have tight control over.

Sure, but this isn't about the future, remember. The kernel already has the
information to know whether there is data, or whether there isn't. It just
isn't doing the work to find out.

> > Please consider the argument outside of your pre-conceived conclusion. :-)
> It does not. In fact, quite literally, the UDP packet is dropped right at
> the call to 'recvmsg'. This is totally legal behavior -- a UDP packet can be
> discarded at *any* time.

That's a liberal understanding. It's also distracting. select() could know
that the packet will be discarded. It chooses not to.
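
What "select() could know" might look like, as a toy model only: validate
the checksum when readiness is polled, and drop corrupt datagrams before
answering. Everything here is an assumption made for illustration. The
class `EagerQueue` is hypothetical, and the checksum covers just the
payload, whereas the real UDP checksum also covers a pseudo-header. The
sketch also ignores the cost that deferring the checksum was meant to avoid.

```python
from collections import deque


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of `data`, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


class EagerQueue:
    """Toy receive queue that validates datagrams at poll time."""

    def __init__(self):
        self.queue = deque()  # entries: (payload, claimed_checksum)

    def deliver(self, payload, claimed_checksum):
        self.queue.append((payload, claimed_checksum))

    def readable(self):
        # Discard corrupt datagrams at the head before reporting readiness.
        while self.queue and internet_checksum(self.queue[0][0]) != self.queue[0][1]:
            self.queue.popleft()
        return bool(self.queue)

    def recv(self):
        if not self.readable():
            return None  # a real blocking socket would sleep here
        return self.queue.popleft()[0]


q = EagerQueue()
q.deliver(b"corrupt", 0x0000)             # claimed checksum does not match
ready_after_corrupt = q.readable()        # corrupt datagram is dropped here
good = b"hello"
q.deliver(good, internet_checksum(good))  # matching checksum
ready_after_good = q.readable()
```

With validation moved to poll time, the corrupt datagram never produces a
readiness report, so in this model a "readable" answer is always followed
by a successful read of the datagram that caused it.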

> Linux's behavior is correct in the literal sense that it is doing something
> that is allowed. It's incorrect in the sense that it's sub-optimal. However,
> it will not break any application that could not already be broken by other
> circumstances.

Allowed by who? For select() to say data is ready, and read() to block,
is not allowed by all the standards that I have read. This is the first
time I have ever heard of a situation like this, for select(). It is *not*
the same as writing an arbitrary number of bytes, or accepting.

You say it will not break any application that could not already be broken
by other circumstances. I disagree. For example, a UDP-based server that
only receives, and never sends, would be perfectly happy to select() on
several file descriptors, and recvmsg() whenever the UDP file descriptor
says readable. It would not break on other operating systems that implement
select() to be useful for determining whether or not to read() from a
blocking file descriptor.

mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-15 22:42                   ` Robert White
  2004-10-15 23:33                     ` David Schwartz
@ 2004-10-16 10:24                     ` Willy Tarreau
  2004-10-16 13:21                       ` Mark Mielke
  2004-10-18 22:25                       ` Robert White
  1 sibling, 2 replies; 191+ messages in thread
From: Willy Tarreau @ 2004-10-16 10:24 UTC (permalink / raw)
  To: Robert White
  Cc: 'David S. Miller', 'Olivier Galibert', linux-kernel

On Fri, Oct 15, 2004 at 03:42:55PM -0700, Robert White wrote:
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org]
> On Behalf Of Willy Tarreau
> 
> > As I asked in a previous mail in this overly long thread, why not returning
> > zero bytes at all. It is perfectly valid to receive an UDP packet with 0
> 
> Zero bytes is "end of file".  Don't go trying to co-opt end of file.  That way lies
> madness and despair.

Please explain to me what "end of file" means with UDP. If your UDP-based app
expects to receive a zero when the other end stops transmitting, then it
might wait for a very long time. As opposed to TCP, there's no FIN control
flag to tell the remote host that you sent your last packet.
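
A quick way to see that a zero-length read on UDP is just an empty
datagram and not an end-of-file condition. This is a small sketch over
loopback; the function name is made up for the example, and the 2-second
timeout is only there so the demo cannot hang:

```python
import socket


def zero_length_datagram_roundtrip():
    """Send an empty datagram, then a normal one, and return both reads."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2.0)  # do not wait forever if a datagram is lost
        tx.sendto(b"", rx.getsockname())      # valid zero-length datagram
        first, _ = rx.recvfrom(1024)
        tx.sendto(b"more", rx.getsockname())  # the socket is still usable
        second, _ = rx.recvfrom(1024)
        return first, second
    finally:
        tx.close()
        rx.close()
```

The first read returns zero bytes and the second still returns data, so
on a datagram socket a zero-length result cannot be taken to mean that the
peer has finished sending.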

Willy


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16 10:24                     ` Willy Tarreau
@ 2004-10-16 13:21                       ` Mark Mielke
  2004-10-18 22:25                       ` Robert White
  1 sibling, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-16 13:21 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Robert White, 'David S. Miller',
	'Olivier Galibert',
	linux-kernel

On Sat, Oct 16, 2004 at 12:24:33PM +0200, Willy Tarreau wrote:
> On Fri, Oct 15, 2004 at 03:42:55PM -0700, Robert White wrote:
> > > As I asked in a previous mail in this overly long thread, why not
> > > returning zero bytes at all. It is perfectly valid to receive an
> > > UDP packet with 0
> > Zero bytes is "end of file".  Don't go trying to co-opt end of file.
> > That way lies madness and despair.
> Please explain to me what "end of file" means with UDP. If your UDP-based app
> expects to receive a zero when the other end stops transmitting, then it
> might wait for a very long time. As opposed to TCP, there's no FIN control
> flag to tell the remote host that you sent your last packet.

He means zero byte packet. :-)

mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  4:58                             ` David Schwartz
  2004-10-16  6:25                               ` Mark Mielke
@ 2004-10-16 18:25                               ` Andries Brouwer
  2004-10-17  0:28                                 ` David Schwartz
  1 sibling, 1 reply; 191+ messages in thread
From: Andries Brouwer @ 2004-10-16 18:25 UTC (permalink / raw)
  To: David Schwartz; +Cc: mark, Linux-Kernel@Vger. Kernel. Org

On Fri, Oct 15, 2004 at 09:58:38PM -0700, David Schwartz wrote:

> Linux's behavior is correct in the literal sense that it is doing something
> that is allowed. It's incorrect in the sense that it's sub-optimal.

"Allowed" by whom? By you?

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  6:25                               ` Mark Mielke
@ 2004-10-16 21:44                                 ` Roland Kuhn
  2004-10-17  0:06                                   ` Mark Mielke
  2004-10-17  0:28                                 ` David Schwartz
  1 sibling, 1 reply; 191+ messages in thread
From: Roland Kuhn @ 2004-10-16 21:44 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Linux-Kernel@Vger. Kernel. Org, David Schwartz

[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]

Hi Mark!

On Oct 16, 2004, at 8:25 AM, Mark Mielke wrote:

> On Fri, Oct 15, 2004 at 09:58:38PM -0700, David Schwartz wrote:
>>> You're thinking too fast, and skipping the most important point here:
>>>     1) packet was dropped earlier (or was never sent)
>>>          - if select() is issued, it blocks
>>>          - if recvmsg() is issued, it blocks
>>>     2) packet was received, but is corrupt
>>>          - if select() is issued, it does not block
>>>          - if recvmsg() is issued, it blocks
>>> See the problem?
>> I'm talking about the case where it is dropped after the 'select' hit 
>> but
>> before the call to 'recvmsg'. In that case, the select does not block 
>> but
>> the recvmsg does.
>
> You are talking about the make believe case that only exists due to
> the *current* implementation of Linux UDP packet reading. It doesn't
> have to exist. It exists only because nobody saw fit to implement it
> with semantics that were reliable, because the implementors didn't
> foresee
> blocking file descriptors being used. It's an implementation oversight.
>
Well, I haven't read the source to see what would be necessary to 
create this behaviour, but David was talking about the situation where 
the UDP packet is dropped because of memory pressure. This event cannot 
possibly be foretold by select()...

Ciao,
					Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16 21:44                                 ` Roland Kuhn
@ 2004-10-17  0:06                                   ` Mark Mielke
  2004-10-17  0:30                                     ` David Schwartz
  0 siblings, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-17  0:06 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: Linux-Kernel@Vger. Kernel. Org, David Schwartz

On Sat, Oct 16, 2004 at 11:44:21PM +0200, Roland Kuhn wrote:
> >You are talking about the make believe case that only exists due to
> >the *current* implementation of Linux UDP packet reading. It doesn't
> >have to exist. It exists only because nobody saw fit to implement it
> >with semantics that were reliable, because the implementors didn't
> >foresee
> >blocking file descriptors being used. It's an implementation oversight.
> Well, I haven't read the source to see what would be necessary to 
> create this behaviour, but David was talking about the situation where 
> the UDP packet is dropped because of memory pressure. This event cannot 
> possibly be foretold by select()...

I don't think he is, but if he is:

I'm not sure that either is reasonable behaviour. The socket buffers
don't increase or decrease at run time, do they? If they do shrink at
run time, this is news to me...

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16 18:25                               ` Andries Brouwer
@ 2004-10-17  0:28                                 ` David Schwartz
  2004-10-17 12:22                                   ` Andries Brouwer
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-17  0:28 UTC (permalink / raw)
  To: aebr; +Cc: mark, Linux-Kernel@Vger. Kernel. Org


> On Fri, Oct 15, 2004 at 09:58:38PM -0700, David Schwartz wrote:
>
> > Linux's behavior is correct in the literal sense that it is
> > doing something
> > that is allowed. It's incorrect in the sense that it's sub-optimal.
>
> "Allowed" by whom? By you?

	I clearly explained what I meant in context that you snipped. In summary, I
mean 'allowed' in the sense that it's not prohibited by the standard and
arguing that it's not allowed leads to direct logical contradictions.
Nothing prohibits an implementation from dropping a UDP packet after it has
been received. Nothing in POSIX requires that a subsequent operation
actually does not block.

	Would you argue that an implementation cannot drop a UDP packet after it
has indicated a read hit on 'select' because of that packet? If so, where
does POSIX say this? And if not, then it can drop a corrupt packet on a call
to 'recvmsg' as well.

	DS




^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16  6:25                               ` Mark Mielke
  2004-10-16 21:44                                 ` Roland Kuhn
@ 2004-10-17  0:28                                 ` David Schwartz
  2004-10-17 13:35                                   ` Lars Marowsky-Bree
  2004-10-17 14:52                                   ` Mark Mielke
  1 sibling, 2 replies; 191+ messages in thread
From: David Schwartz @ 2004-10-17  0:28 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> Sure, but this isn't about the future, remember. The kernel
> already has the
> information to know whether there is data, or whether there isn't. It just
> isn't doing the work to find out.

	The kernel does not have the information yet. It could get it, but it does
not have it.

> Allowed by who? For select() to say data is ready, and read() to block,
> is not allowed by all the standards that I have read. This is the first
> time I have ever heard of a situation like this, for select(). It is *not*
> the same as writing an arbitrary number of bytes, or accepting.

	So is it your position that a kernel cannot drop a UDP packet at any time
after it indicated that the socket was readable because of that packet? If
so, please cite where in POSIX you think you find this requirement.

	The kernel elects to drop the packet on the call to 'recvmsg'. This is its
right -- it can drop a UDP packet at any time. POSIX is careful not to imply
that 'select' guarantees future behavior because this is not possible in
principle.

> You say it will not break any application that could not already be broken
> by other circumstances. I disagree. For example, a UDP-based server that
> only receives, and never sends, would be perfectly happy to select() on
> several file descriptors, and recvmsg() whenever the UDP file descriptor
> says readable. It would not break on other operating systems that
> implement
> select() to be useful for determining whether or not to read() from a
> blocking file descriptor.

	Sure it would. It would break on any platform that dropped a UDP packet
after having triggered a read hit on 'select' because of that packet. POSIX
does not say that a future read will not block because it cannot guarantee
that under at least some circumstances.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17  0:06                                   ` Mark Mielke
@ 2004-10-17  0:30                                     ` David Schwartz
  2004-10-17 14:47                                       ` Mark Mielke
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-17  0:30 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> I don't think he is, but if he is:
>
> I'm not sure that either is reasonable behaviour. The socket buffers
> don't increase or decrease at run time, do they? If they do shrink at
> run time, this is news to me...

	The socket buffers are not guaranteed to indicate a particular number of
bytes in a sense that is meaningful to the application. In fact, on Linux,
they don't mean application bytes.

	In any event, we aren't talking about any particular implementation, we are
talking about a standard. So what Linux does or doesn't do in response to
memory pressure isn't relevant. What's relevant is what the standard
actually guarantees and what the semantics of the protocols themselves are.

	UDP is not reliable. Packets can be dropped, mangled, and lost. Nothing in
POSIX prohibits an implementation from dropping a packet right before you
call 'recvmsg'.

	DS



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17  0:28                                 ` David Schwartz
@ 2004-10-17 12:22                                   ` Andries Brouwer
  0 siblings, 0 replies; 191+ messages in thread
From: Andries Brouwer @ 2004-10-17 12:22 UTC (permalink / raw)
  To: David Schwartz; +Cc: aebr, mark, Linux-Kernel@Vger. Kernel. Org

On Sat, Oct 16, 2004 at 05:28:22PM -0700, David Schwartz wrote:

> > > Linux's behavior is correct in the literal sense that it is
> > > doing something
> > > that is allowed. It's incorrect in the sense that it's sub-optimal.
> >
> > "Allowed" by whom? By you?
> 
> 	I clearly explained what I meant in context that you snipped. In summary, I
> mean 'allowed' in the sense that it's not prohibited by the standard

Yes, but it is prohibited by the standard in case you refer to POSIX.
I quoted chapter and verse. If you refer to a different standard, be explicit.

Whether the standard is reasonable or not, and whether we care or not,
that is a different matter.  But you must keep the facts straight.

Andries

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17  0:28                                 ` David Schwartz
@ 2004-10-17 13:35                                   ` Lars Marowsky-Bree
  2004-10-17 14:17                                     ` Buddy Lucas
  2004-10-20 21:31                                     ` H. Peter Anvin
  2004-10-17 14:52                                   ` Mark Mielke
  1 sibling, 2 replies; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 13:35 UTC (permalink / raw)
  To: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On 2004-10-16T17:28:24, David Schwartz <davids@webmaster.com> wrote:

> The kernel elects to drop the packet on the call to 'recvmsg'. This is its
> right -- it can drop a UDP packet at any time. POSIX is careful not to imply
> that 'select' guarantees future behavior because this is not possible in
> principle.

I'm sorry, but according to my reading of POSIX and the Austin spec,
this is exactly what select() returning 'ready to read' implies.

The SuV spec is actually quite detailed about the options here:

	A descriptor shall be considered ready for reading when a call
	to an input function with O_NONBLOCK clear would not block,
	whether or not the function would transfer data successfully.
	(The function might return data, an end-of-file indication, or
	an error other than one indicating that it is blocked, and in
	each of these cases the descriptor shall be considered ready for
	reading.)

This actually forbids recvmsg() to return EAGAIN and EWOULDBLOCK as
has been suggested. EIO seems to be the best fit.

But I'd have to agree that blocking on recvmsg() after select() has
indicated ready to read does violate the specification, unless the
socket has actually been opened with O_NONBLOCK.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 13:35                                   ` Lars Marowsky-Bree
@ 2004-10-17 14:17                                     ` Buddy Lucas
  2004-10-17 15:05                                       ` Mark Mielke
  2004-10-17 17:22                                       ` Lars Marowsky-Bree
  2004-10-20 21:31                                     ` H. Peter Anvin
  1 sibling, 2 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 14:17 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2004-10-16T17:28:24, David Schwartz <davids@webmaster.com> wrote:
> 
> > The kernel elects to drop the packet on the call to 'recvmsg'. This is its
> > right -- it can drop a UDP packet at any time. POSIX is careful not to imply
> > that 'select' guarantees future behavior because this is not possible in
> > principle.
> 
> I'm sorry, but according to my reading of POSIX and the Austin spec,
> this is exactly what select() returning 'ready to read' implies.
> 
> The SuV spec is actually quite detailed about the options here:
> 
>         A descriptor shall be considered ready for reading when a call
>         to an input function with O_NONBLOCK clear would not block,
>         whether or not the function would transfer data successfully.
>         (The function might return data, an end-of-file indication, or
>         an error other than one indicating that it is blocked, and in
>         each of these cases the descriptor shall be considered ready for
>         reading.)

But it says nowhere that the select()/recvmsg() operation is atomic, right?


Cheers,
Buddy

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17  0:30                                     ` David Schwartz
@ 2004-10-17 14:47                                       ` Mark Mielke
  0 siblings, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-17 14:47 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Sat, Oct 16, 2004 at 05:30:23PM -0700, David Schwartz wrote:
> > I'm not sure that either is reasonable behaviour. The socket buffers
> > don't increase or decrease at run time, do they? If they do shrink at
> > run time, this is news to me...
> The socket buffers are not guaranteed to indicate a particular number of
> bytes in a sense that is meaningful to the application. In fact, on Linux,
> they don't mean application bytes.

I believe this also means, that it doesn't happen, correct? So why is the
subject being changed?

> In any event, we aren't talking about any particular implementation, we are
> talking about a standard. So what Linux does or doesn't do in response to
> memory pressure isn't relevant. What's relevant is what the standard
> actually guarantees and what the semantics of the protocols themselves are.

Yes, we are talking about a standard. I'm talking about POSIX, and the
related standards that describe select(), and read()/recvmsg().

> UDP is not reliable. Packets can be dropped, mangled, and lost. Nothing in
> POSIX prohibits an implementation from dropping a packet right before you
> call 'recvmsg'.

You seem to believe that the definition of UDP supersedes any and all
standards that describe select() and read()/recvmsg(). You are selecting
your standards based on your preference and comfort level. Who gave
you the right to do this for Linux? Why does your comfort level allow
you to take the 'drop' freedom of UDP to the extreme, ignore POSIX and
the other standards that make the behaviour of select() on blocking file
descriptors reliable in certain specific regards (*including* this one,
by the interpretation of more than one person), and then not honestly
declare outright that *nobody* should use select() on blocking file
descriptors, because it isn't reliable under Linux? Why talk of
recommendations, or explanations? select() for blocking file descriptors
under Linux leaves programs susceptible to DoS attacks. You don't
recommend against it. You fix it, or just say: Don't use the system calls
this way - it isn't supported under Linux.

I've said all this before, and you've said your piece before. Is there
a particular reason you want to just recommend against its use, instead
of declaring it invalid under Linux? (blocking sockets with select())

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17  0:28                                 ` David Schwartz
  2004-10-17 13:35                                   ` Lars Marowsky-Bree
@ 2004-10-17 14:52                                   ` Mark Mielke
  1 sibling, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-17 14:52 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Sat, Oct 16, 2004 at 05:28:24PM -0700, David Schwartz wrote:
> > Sure, but this isn't about the future, remember. The kernel
> > already has the
> > information to know whether there is data, or whether there isn't. It just
> > isn't doing the work to find out.
> The kernel does not have the information yet. It could get it, but it does
> not have it.

You are wrong, David. The kernel *does* have the information. I haven't
seen anybody except you deny this.

What has been said, is that it is considered too expensive to do the check
at select() time, based on the information that *is* available.
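
For a sense of what "the check" involves: verifying a UDP checksum means
summing every 16-bit word of the datagram, so its cost grows with the
payload size. The following is a minimal sketch of the RFC 1071
ones-complement checksum, not the kernel's code. The function name
internet_checksum() is made up for illustration, and the UDP pseudo-header
that a real verification also covers is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones-complement Internet checksum (RFC 1071) over a byte buffer.
 * Illustrative only: a real UDP check also sums a pseudo-header
 * (addresses, protocol, length), which is omitted here.
 */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add the buffer as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver verifies by running the same sum over the data with the
transmitted checksum included; a result of 0 means the data is intact.
Either way, every byte has to be touched once, which is the work being
deferred from select() time to recvmsg() time.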

> > Allowed by who? For select() to say data is ready, and read() to block,
> > is not allowed by all the standards that I have read. This is the first
> > time I have ever heard of a situation like this, for select(). It is *not*
> > the same as writing an arbitrary number of bytes, or accepting.
> So is it your position that a kernel cannot drop a UDP packet at any time
> after it indicated that the socket was readable because of that packet? If
> so, please cite where in POSIX you think you find this requirement.

Not after select() has stated that the packet is ready for
reading. That is correct. I would point to the definition of select()
that was posted earlier in this thread. You've chosen to ignore this
definition, it seems.

> The kernel elects to drop the packet on the call to 'recvmsg'. This is its
> right -- it can drop a UDP packet at any time. POSIX is careful not to imply
> that 'select' guarantees future behavior because this is not possible in
> principle.

Not future behaviour. POSIX, based on the definition of select(), would
seem to suggest that in a delayed-drop implementation such as Linux's,
select() should feel free to drop the corrupt packet. Dropping it in
recvmsg() just makes the use of select() with blocking sockets
unreasonable. Don't you see this?

> > You say it will not break any application that could not already be broken
> > by other circumstances. I disagree. For example, a UDP-based server that
> > only receives, and never sends, would be perfectly happy to select() on
> > several file descriptors, and recvmsg() whenever the UDP file descriptor
> > says readable. It would not break on other operating systems that
> > implement
> > select() to be useful for determining whether or not to read() from a
> > blocking file descriptor.
> 	Sure it would. It would break on any platform that dropped a UDP packet
> after having triggered a read hit on 'select' because of that packet. POSIX
> does not say that a future read will not block because it cannot guarantee
> that under at least some circumstances.

Again - we're not talking about the future. Get off that. The packet
is already in the queue. Linux already has the information to know whether
the packet will be dropped or not.

If you want to say that select() on blocking sockets is invalid, please
say it. Don't recommend against it. Say - the use of select() on blocking
sockets is invalid under Linux. Put it in documentation that people will
read.

That's all we want.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 14:17                                     ` Buddy Lucas
@ 2004-10-17 15:05                                       ` Mark Mielke
  2004-10-17 15:40                                         ` Buddy Lucas
  2004-10-17 17:22                                       ` Lars Marowsky-Bree
  1 sibling, 1 reply; 191+ messages in thread
From: Mark Mielke @ 2004-10-17 15:05 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > The SuV spec is actually quite detailed about the options here:
> >         A descriptor shall be considered ready for reading when a call
> >         to an input function with O_NONBLOCK clear would not block,
> >         whether or not the function would transfer data successfully.
> >         (The function might return data, an end-of-file indication, or
> >         an error other than one indicating that it is blocked, and in
> >         each of these cases the descriptor shall be considered ready for
> >         reading.)
> But it says nowhere that the select()/recvmsg() operation is atomic, right?

This is a distraction. If the call to select() had been substituted
with a call to recvmsg(), it would have blocked. Instead, select() is
returning 'yes, you can read', and then recvmsg() is blocking. The
select() lied. The information is all sitting in the kernel packet
queue. The content of the packet, and the checksum are sitting in
memory waiting to be considered.  select() has all the information it
needs to make a decision as to whether to say 'yes, you can read' or
'there is nothing to read'. The current implementation delays this
check until the very last instant (recvmsg()) in order to get a few
more performance points. For non-blocking sockets, this works fine -
applications should be able to handle EAGAIN unless they are extremely
poorly written. For blocking sockets, it makes select() useless as a
reliable mechanism for determining whether or not the recvmsg() will
block. I say useless, because I don't know why any professional
programmer would ever use an interface that was unreliable in a
production system. Other people here aren't willing to make this claim.
I'm not sure why...

In the above paragraph, I only prove that the atomic argument is
irrelevant and a distraction. The current behaviour might be
acceptable - but only if it is widely known and understood that
select() should not be used with blocking sockets *AT ALL* under
Linux. Somebody showed what looks to be a successful DOS of inetd on
Linux based on this new knowledge. The existence of this thread
suggests that it isn't widely known or understood.

I want to trust Linux with production systems. Any sort of opinion
that 'specs are only to be used as recommendations' or 'we can
interpret a spec however we want to get performance points, and we
don't have to be careful to document the Linux limitations' is
*scary*. The last time this kind of thinking caused such a mess was
the implementation of kernel-level threads. Linux developers shouldn't
be making these decisions in the dark. It isn't comforting. This
thread has disconcerted me, and my continued participation is an
attempt on my part to aid in the representation of people like me. I'm
disappointed. Please feel my emotions on this and respect my
motivations and don't write this off so quickly as "POSIX is wrong" as
some have done.

Many of us want Linux to continue to become professional grade. We
have a love for it. When your love betrays you, it hurts.

Cheers,
mark



^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 15:05                                       ` Mark Mielke
@ 2004-10-17 15:40                                         ` Buddy Lucas
  2004-10-17 16:13                                           ` Lee Revell
                                                             ` (3 more replies)
  0 siblings, 4 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 15:40 UTC (permalink / raw)
  To: Buddy Lucas, Lars Marowsky-Bree, David Schwartz,
	Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 11:05:09 -0400, Mark Mielke <mark@mark.mielke.cc> wrote:
> On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> > On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > > The SuV spec is actually quite detailed about the options here:
> > >         A descriptor shall be considered ready for reading when a call
> > >         to an input function with O_NONBLOCK clear would not block,
> > >         whether or not the function would transfer data successfully.
> > >         (The function might return data, an end-of-file indication, or
> > >         an error other than one indicating that it is blocked, and in
> > >         each of these cases the descriptor shall be considered ready for
> > >         reading.)
> > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> 
> This is a distraction. If the call to select() had been substituted
> with a call to recvmsg(), it would have blocked. Instead, select() is
> returning 'yes, you can read', and then recvmsg() is blocking. The
> select() lied. The information is all sitting in the kernel packet

No. A million things might happen between select() and recvmsg(), both
in kernel and application. For a consistent behaviour throughout all
possibilities, you *have* to assume that any read on a blocking fd may
block, and that a fd ready for reading at select() time might not be
readable once the app gets to recvmsg() -- for whatever reason.

And indeed, that implies that select() on blocking fds is generally
not useful if you expect to bypass the blocking through select().
Personally,  I think any application that implements this expectation
is broken. (If only because you might have to do a second read() or
recvmsg() which will either result in a crappy select() loop or a
broken read()/recvmsg() loop).
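
To illustrate the "second read" point: one way to read more than once per
select() wakeup without risking a hang is to keep the descriptor
non-blocking and read until the queue reports empty. The sketch below
assumes exactly that; drain_datagrams() and the datagram_handler callback
type are names invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback invoked once per received datagram. */
typedef void (*datagram_handler)(const char *data, size_t len, void *ctx);

/*
 * Read datagrams from a non-blocking socket until none are left.
 * Returns the number of datagrams read, or -1 on a real error (errno set).
 * handle may be NULL if the caller only wants the count.
 */
int drain_datagrams(int fd, datagram_handler handle, void *ctx)
{
    char buf[2048];
    int count = 0;
    int flags = fcntl(fd, F_GETFL);

    /* On a blocking descriptor the loop below would hang once empty. */
    assert(flags != -1 && (flags & O_NONBLOCK));

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);

        if (n >= 0) {
            if (handle)
                handle(buf, (size_t)n, ctx);
            count++;
            continue;
        }
        if (errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return count;            /* queue is empty for now */
        return -1;
    }
}
```

Called after select() reports the socket readable, this returns 0 rather
than blocking if the datagram that triggered the wakeup turns out to be
gone.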

> [snip]

> poorly written. For blocking sockets, it makes select() useless as a
> reliable mechanism for determining whether or not the recvmsg() will
> block. I say useless, because I don't know why any professional

That use of select() *is* useless, there's no doubt about that. It is
an application problem though.

> [snip]
 
> In the above paragraph, I only prove that the atomic argument is

Where's the proof?! You *define* some behaviour that you think makes most sense.

> irrelevant and a distraction. The current behaviour might be
> acceptable - but only if it is widely known and understood that
> select() should not be used with blocking sockets *AT ALL* under
> Linux. Somebody showed what looks to be a successful DOS of inetd on
> Linux based on this new knowledge. The existence of this thread
> suggests that it isn't widely known or understood.

That is not a DoS, it is an application feature or bug, depending on
what the programmer was thinking.

> I want to trust Linux with production systems. Any sort of opinion

From a pragmatic point of view, it may be comforting to know that most
applications that can now be considered broken will *still* be broken
even if a recvmsg() will never block after select() has given the
verdict "Thou shalt read".


Cheers,
Buddy

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 15:40                                         ` Buddy Lucas
@ 2004-10-17 16:13                                           ` Lee Revell
  2004-10-17 17:35                                           ` Jesper Juhl
                                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 191+ messages in thread
From: Lee Revell @ 2004-10-17 16:13 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 2004-10-17 at 11:40, Buddy Lucas wrote:
> > poorly written. For blocking sockets, it makes select() useless as a
> > reliable mechanism for determining whether or not the recvmsg() will
> > block. I say useless, because I don't know why any professional
> 
> That use of select() *is* useless, there's no doubt about that. It is
> an application problem though.
> 

At this point it's a documentation problem.  It's clear that the
behavior is by design and is not changing.  It's also clear that this is
not the expected behavior for some people.  Can't we just note this in
CAVEATS or something and move on?

Lee




^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 14:17                                     ` Buddy Lucas
  2004-10-17 15:05                                       ` Mark Mielke
@ 2004-10-17 17:22                                       ` Lars Marowsky-Bree
  2004-10-17 17:54                                         ` Buddy Lucas
  1 sibling, 1 reply; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 17:22 UTC (permalink / raw)
  To: Buddy Lucas; +Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On 2004-10-17T16:17:06, Buddy Lucas <buddy.lucas@gmail.com> wrote:

> > The SuV spec is actually quite detailed about the options here:
> > 
> >         A descriptor shall be considered ready for reading when a call
> >         to an input function with O_NONBLOCK clear would not block,
> >         whether or not the function would transfer data successfully.
> >         (The function might return data, an end-of-file indication, or
> >         an error other than one indicating that it is blocked, and in
> >         each of these cases the descriptor shall be considered ready for
> >         reading.)
> But it says nowhere that the select()/recvmsg() operation is atomic, right?

See, Buddy, the point here is that Linux _does_ violate the
specification. You can try weaseling out of it, but it's not going to
work.

This isn't per se the same as saying that it's not a sensible violation,
but very clearly the specs disagree with the current Linux behaviour.

It's impossible to claim that you are allowed by the spec to block on a
recvmsg directly following a successful select. You are not. You could
claim that, but you'd be wrong.

If the packet has been dropped in between, which _could_ have happened
because UDP is allowed to be dropped basically anywhere, EIO may be
returned. But blocking or returning EAGAIN/EWOULDBLOCK is verboten. The
spec is very clear on that.

(Now I'd claim that returning EIO after a successful select is also
slightly suboptimal - the performance optimizations should be turned off
for blocking sockets, IMHO, and the data which caused the select() to
return should be considered committed - but it would be allowed.)

I'm not so sure what's so hard to accept about that. It may well be that
Linux is following the de-facto industry standard (or even setting it)
here, and I'd agree that if you don't want blocking use O_NONBLOCK, but
in no way can Linux claim POSIX/SuV spec compliance for this behaviour.

I'm not getting why people argue so much to try and weasel the words so
that it comes out as compliant. It's not. It may make sense due to
practical reasons, but it's not compliant.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:35                                           ` Martijn Sipkema
@ 2004-10-17 17:33                                             ` Buddy Lucas
  2004-10-17 19:58                                               ` Martijn Sipkema
  2004-10-17 18:53                                             ` David Schwartz
  1 sibling, 1 reply; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 17:33 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 18:35:34 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> From: "Buddy Lucas" <buddy.lucas@gmail.com>
> > On Sun, 17 Oct 2004 11:05:09 -0400, Mark Mielke <mark@mark.mielke.cc> wrote:
> > > On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> > > > On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > > > > The SuV spec is actually quite detailed about the options here:
> > > > >         A descriptor shall be considered ready for reading when a call
> > > > >         to an input function with O_NONBLOCK clear would not block,
> > > > >         whether or not the function would transfer data successfully.
> > > > >         (The function might return data, an end-of-file indication, or
> > > > >         an error other than one indicating that it is blocked, and in
> > > > >         each of these cases the descriptor shall be considered ready for
> > > > >         reading.)
> > > > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> > >
> > > This is a distraction. If the call to select() had been substituted
> > > with a call to recvmsg(), it would have blocked. Instead, select() is
> > > returning 'yes, you can read', and then recvmsg() is blocking. The
> > > select() lied. The information is all sitting in the kernel packet
> >
> > No. A million things might happen between select() and recvmsg(), both
> > in kernel and application. For a consistent behaviour throughout all
> > possibilities, you *have* to assume that any read on a blocking fd may
> > block, and that a fd ready for reading at select() time might not be
> > readable once the app gets to recvmsg() -- for whatever reason.
> 
> It is perfectly possible to not have a million things happen between
> select() and recvmsg() and POSIX defines what can happen and what
> can't; it states that a process calling select() on a socket will not block
> on a subsequent recvmsg() on that socket.
> 
> > And indeed, that implies that select() on blocking fds is generally
> > not useful if you expect to bypass the blocking through select().
> > Personally,  I think any application that implements this expectation
> > is broken. (If only because you might have to do a second read() or
> > recvmsg() which will either result in a crappy select() loop or a
> > broken read()/recvmsg() loop).
> 
> The way select() is defined in POSIX effectively means that once an
> application has done a select() on a socket, the data that caused
> select() to return is committed, i.e. it can no longer be dropped and
> should be considered received by the application; this has nothing

That is plainly wrong. Data is never received by an application before
recvmsg() has succeeded.

> to do with UDP being unreliable and being unreliable for the sake
> of it is not what UDP was meant for.
> 
> Whether you think an application that is written to use select() as
> defined in POSIX is broken is not really important. The fact remains
> that Linux currently implements a select() that is _not_ POSIX
> compliant and is so solely for performance reasons. I personally think
> correct behaviour is much more important.

All I'm saying is that applications that are not correct now will
probably not be correct even if we change the way Linux handles this
situation. The sanest thing really seems to be to accept the fact that
any read() on a blocking fd might block, even if the programmer thinks
it really shouldn't.

But then I am one of those who thinks it's sane to check for
EWOULDBLOCK on a nonblocking socket after blocking in select().

Let's just document this and move on to something more important.


Cheers,
Buddy

> --ms
> 
>


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 15:40                                         ` Buddy Lucas
  2004-10-17 16:13                                           ` Lee Revell
@ 2004-10-17 17:35                                           ` Jesper Juhl
  2004-10-17 18:04                                             ` Buddy Lucas
  2004-10-17 17:35                                           ` Martijn Sipkema
  2004-10-17 19:21                                           ` Hua Zhong
  3 siblings, 1 reply; 191+ messages in thread
From: Jesper Juhl @ 2004-10-17 17:35 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004, Buddy Lucas wrote:

> On Sun, 17 Oct 2004 11:05:09 -0400, Mark Mielke <mark@mark.mielke.cc> wrote:
> > On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> > > On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > > > The SuV spec is actually quite detailed about the options here:
> > > >         A descriptor shall be considered ready for reading when a call
> > > >         to an input function with O_NONBLOCK clear would not block,
> > > >         whether or not the function would transfer data successfully.
> > > >         (The function might return data, an end-of-file indication, or
> > > >         an error other than one indicating that it is blocked, and in
> > > >         each of these cases the descriptor shall be considered ready for
> > > >         reading.)
> > > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> > 
> > This is a distraction. If the call to select() had been substituted
> > with a call to recvmsg(), it would have blocked. Instead, select() is
> > returning 'yes, you can read', and then recvmsg() is blocking. The
> > select() lied. The information is all sitting in the kernel packet
> 
> No. A million things might happen between select() and recvmsg(), both
> in kernel and application. For a consistent behaviour throughout all
> possibilities, you *have* to assume that any read on a blocking fd may
> block, and that a fd ready for reading at select() time might not be
> readable once the app gets to recvmsg() -- for whatever reason.
> 
> And indeed, that implies that select() on blocking fds is generally
> not useful if you expect to bypass the blocking through select().
> Personally,  I think any application that implements this expectation
> is broken. (If only because you might have to do a second read() or
> recvmsg() which will either result in a crappy select() loop or a
> broken read()/recvmsg() loop).
> 
Personally I agree that if you want non-blocking sockets that's what you
should use, but many people expect that if select says a socket is
readable then attempting to receive that data will not block. Many people
refer to Stevens' "UNIX Network Programming" to find out how select and
related networking functions are supposed to behave, and Stevens has this
to say on page 153 under the heading "Under What Conditions Is a
Descriptor Ready?" [1]:

[...]
1. A socket is ready for reading if any of the following four conditions 
is true:

   a. The number of bytes of data in the socket receive buffer is greater
      than or equal to the current size of the low-water mark for the
      socket receive buffer. A read operation on the socket will not block
      and will return a value greater than 0 ...

[...]

He doesn't say this is specific to UDP or TCP sockets, or to blocking
or non-blocking sockets, so I suspect that, regardless of what POSIX might
say, a lot of people will read the above and assume that if select
says a socket (any type of socket) is ready for reading, then attempting
to read will not block.
Using blocking sockets with select in this manner may be silly or
inefficient, but I think many people expect it to work without the
subsequent read blocking in any case.


[1] I'm only quoting what I think is relevant to the thread, but I can 
quote the rest if wanted. My copy of the book has ISBN 0-13-490012-X in 
case someone wants to check.


--
Jesper Juhl




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 15:40                                         ` Buddy Lucas
  2004-10-17 16:13                                           ` Lee Revell
  2004-10-17 17:35                                           ` Jesper Juhl
@ 2004-10-17 17:35                                           ` Martijn Sipkema
  2004-10-17 17:33                                             ` Buddy Lucas
  2004-10-17 18:53                                             ` David Schwartz
  2004-10-17 19:21                                           ` Hua Zhong
  3 siblings, 2 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-17 17:35 UTC (permalink / raw)
  To: Buddy Lucas, Lars Marowsky-Bree, David Schwartz,
	Linux-Kernel@Vger. Kernel. Org

From: "Buddy Lucas" <buddy.lucas@gmail.com>
> On Sun, 17 Oct 2004 11:05:09 -0400, Mark Mielke <mark@mark.mielke.cc> wrote:
> > On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> > > On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > > > The SuV spec is actually quite detailed about the options here:
> > > >         A descriptor shall be considered ready for reading when a call
> > > >         to an input function with O_NONBLOCK clear would not block,
> > > >         whether or not the function would transfer data successfully.
> > > >         (The function might return data, an end-of-file indication, or
> > > >         an error other than one indicating that it is blocked, and in
> > > >         each of these cases the descriptor shall be considered ready for
> > > >         reading.)
> > > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> > 
> > This is a distraction. If the call to select() had been substituted
> > with a call to recvmsg(), it would have blocked. Instead, select() is
> > returning 'yes, you can read', and then recvmsg() is blocking. The
> > select() lied. The information is all sitting in the kernel packet
> 
> No. A million things might happen between select() and recvmsg(), both
> in kernel and application. For a consistent behaviour throughout all
> possibilities, you *have* to assume that any read on a blocking fd may
> block, and that a fd ready for reading at select() time might not be
> readable once the app gets to recvmsg() -- for whatever reason.

It is perfectly possible to not have a million things happen between
select() and recvmsg() and POSIX defines what can happen and what
can't; it states that a process calling select() on a socket will not block
on a subsequent recvmsg() on that socket.

> And indeed, that implies that select() on blocking fds is generally
> not useful if you expect to bypass the blocking through select().
> Personally,  I think any application that implements this expectation
> is broken. (If only because you might have to do a second read() or
> recvmsg() which will either result in a crappy select() loop or a
> broken read()/recvmsg() loop).

The way select() is defined in POSIX effectively means that once an
application has done a select() on a socket, the data that caused 
select() to return is committed, i.e. it can no longer be dropped and
should be considered received by the application. This has nothing
to do with UDP being unreliable, and being unreliable for the sake
of it is not what UDP was meant for.

Whether you think an application that is written to use select() as
defined in POSIX is broken is not really important. The fact remains
that Linux currently implements a select() that is _not_ POSIX
compliant and is so solely for performance reasons. I personally think
correct behaviour is much more important.


--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:22                                       ` Lars Marowsky-Bree
@ 2004-10-17 17:54                                         ` Buddy Lucas
  2004-10-17 18:05                                           ` Lars Marowsky-Bree
  2004-10-17 18:06                                           ` Mark Mielke
  0 siblings, 2 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 17:54 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 19:22:44 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2004-10-17T16:17:06, Buddy Lucas <buddy.lucas@gmail.com> wrote:
> 
> > > The SuV spec is actually quite detailed about the options here:
> > >
> > >         A descriptor shall be considered ready for reading when a call
> > >         to an input function with O_NONBLOCK clear would not block,
> > >         whether or not the function would transfer data successfully.
> > >         (The function might return data, an end-of-file indication, or
> > >         an error other than one indicating that it is blocked, and in
> > >         each of these cases the descriptor shall be considered ready for
> > >         reading.)
> > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> 
> See, Buddy, the point here is that Linux _does_ violate the
> specification. You can try weaseling out of it, but it's not going to
> work.

Sigh. Read the quote to which I responded again. Not a word about
atomicity. Nowhere does it say that a descriptor which was ready for
reading at select() time is still readable at recvmsg() time. There is
no doubt that it would be very nice if select() would say something
useful, but that's not the issue here.

> This isn't per se the same as saying that it's not a sensible violation,
> but very clearly the specs disagree with the current Linux behaviour.

So document it.

> It's impossible to claim that you are allowed by the spec to block on a
> recvmsg directly following a successful select. You are not. You could
> claim that, but you'd be wrong.

Empty statement.

> If the packet has been dropped in between, which _could_ have happened
> because UDP is allowed to be dropped basically anywhere, EIO may be
> returned. But blocking or returning EAGAIN/EWOULDBLOCK is verboten. The
> spec is very clear on that.

Obviously returning EAGAIN/EWOULDBLOCK while reading from a blocking
fd is not what we want (in the situation at hand). I don't see how it
relates to the discussion.

> (Now I'd claim that returning EIO after a successful select is also
> slightly suboptimal - the performance optimizations should be turned off
> for blocking sockets, IMHO, and the data which caused the select() to
> return should be considered committed - but it would be allowed.)
> I'm not so sure what's so hard to accept about that. It may well be that
> Linux is following the de-facto industry standard (or even setting it)
> here, and I'd agree that if you don't want blocking use O_NONBLOCK, but
> in no way can Linux claim POSIX/SuV spec compliance for this behaviour.

It doesn't.


Cheers,
Buddy


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:35                                           ` Jesper Juhl
@ 2004-10-17 18:04                                             ` Buddy Lucas
  2004-10-17 18:06                                               ` Lars Marowsky-Bree
  0 siblings, 1 reply; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 18:04 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 19:35:03 +0200 (CEST), Jesper Juhl <juhl-lkml@dif.dk> wrote:

> Personally I agree that if you want non-blocking sockets that's what you
> should use, but many people expect that if select says a socket is
> readable then attempting to receive that data will not block. Many people
> refer to Stevens "UNIX Network Programming" to find out how select and
> related networking functions are supposed to behave, and Stevens has this

[ snip ]

Also note the examples that Stevens gives. For instance, he explicitly
checks for EWOULDBLOCK after a read on a nonblocking fd that has been
reported readable by select().


Cheers,
Buddy


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:54                                         ` Buddy Lucas
@ 2004-10-17 18:05                                           ` Lars Marowsky-Bree
  2004-10-17 18:06                                           ` Mark Mielke
  1 sibling, 0 replies; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 18:05 UTC (permalink / raw)
  To: Buddy Lucas; +Cc: Linux-Kernel@Vger. Kernel. Org

On 2004-10-17T19:54:04, Buddy Lucas <buddy.lucas@gmail.com> wrote:

> Sigh. Read the quote to which I responded again. Not a word about
> atomicity. Nowhere does it say that a descriptor which was ready for
> reading at select() time is still readable at recvmsg() time.

That is the _whole point_ of select() on non-O_NONBLOCK sockets.

> There is no doubt that it would be very nice if select() would say
> something useful, but that's not the issue here.

According to the specs, it says something useful. It doesn't on Linux,
agreed.

> > This isn't per se the same as saying that it's not a sensible violation,
> > but very clearly the specs disagree with the current Linux behaviour.
> So document it.

That's one way of doing it, yes.

> > If the packet has been dropped in between, which _could_ have happened
> > because UDP is allowed to be dropped basically anywhere, EIO may be
> > returned. But blocking or returning EAGAIN/EWOULDBLOCK is verboten. The
> > spec is very clear on that.
> Obviously returning EAGAIN/EWOULDBLOCK while reading from a blocking
> fd is not what we want (in the situation at hand). I don't see how it
> relates to the discussion.

Others have suggested it in this thread as a possible error code, so
that's how it relates to this discussion. Surprise ;-)

> > I'm not so sure what's so hard to accept about that. It may well be that
> > Linux is following the de-facto industry standard (or even setting it)
> > here, and I'd agree that if you don't want blocking use O_NONBLOCK, but
> > in no way can Linux claim POSIX/SuV spec compliance for this behaviour.
> It doesn't.

*sigh* According to the man pages and the Linux Standard Base it does,
and it has been claimed repeatedly in this thread too.

Please get your facts straight.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 18:04                                             ` Buddy Lucas
@ 2004-10-17 18:06                                               ` Lars Marowsky-Bree
  2004-10-17 18:21                                                 ` Buddy Lucas
  2004-10-17 20:04                                                 ` Martijn Sipkema
  0 siblings, 2 replies; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 18:06 UTC (permalink / raw)
  To: Buddy Lucas, Jesper Juhl; +Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On 2004-10-17T20:04:21, Buddy Lucas <buddy.lucas@gmail.com> wrote:

> [ snip ]
> 
> Also note the examples that Stevens gives. For instance, he explicitly
> checks for EWOULDBLOCK after a read on a nonblocking fd that has been
> reported readable by select().

The specs don't disagree with that. On a O_NONBLOCK socket, that is
allowed.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:54                                         ` Buddy Lucas
  2004-10-17 18:05                                           ` Lars Marowsky-Bree
@ 2004-10-17 18:06                                           ` Mark Mielke
  1 sibling, 0 replies; 191+ messages in thread
From: Mark Mielke @ 2004-10-17 18:06 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, Oct 17, 2004 at 07:54:04PM +0200, Buddy Lucas wrote:
> > This isn't per se the same as saying that it's not a sensible violation,
> > but very clearly the specs disagree with the current Linux behaviour.
> So document it.

This is all I'm expecting to see. :-)

Buddy: You took the opposite stance, but still stuck to the truth. I give
you points for that.

Cheers,
mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 18:06                                               ` Lars Marowsky-Bree
@ 2004-10-17 18:21                                                 ` Buddy Lucas
  2004-10-17 20:04                                                 ` Martijn Sipkema
  1 sibling, 0 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 18:21 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Jesper Juhl, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 20:06:29 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2004-10-17T20:04:21, Buddy Lucas <buddy.lucas@gmail.com> wrote:
> 
> > [ snip ]
> >
> > Also note the examples that Stevens gives. For instance, he explicitly
> > checks for EWOULDBLOCK after a read on a nonblocking fd that has been
> > reported readable by select().
> 
> The specs don't disagree with that. On a O_NONBLOCK socket, that is
> allowed.

I think the specs got to you, man!


Cheers,
Buddy


* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:35                                           ` Martijn Sipkema
  2004-10-17 17:33                                             ` Buddy Lucas
@ 2004-10-17 18:53                                             ` David Schwartz
  2004-10-17 19:26                                               ` Hua Zhong
  2004-10-17 20:32                                               ` Martijn Sipkema
  1 sibling, 2 replies; 191+ messages in thread
From: David Schwartz @ 2004-10-17 18:53 UTC (permalink / raw)
  To: martijn, Linux-Kernel@Vger. Kernel. Org


> It is perfectly possible to not have a million things happen between
> select() and recvmsg() and POSIX defines what can happen and what
> can't; it states that a process calling select() on a socket will
> not block
> on a subsequent recvmsg() on that socket.

	I'm sorry, that's an absolutely preposterous view. For one thing, Linux
violates this by allowing processes and threads to share file descriptors
(since another process can steal the data before the call to 'recvmsg'). Oh
well, I guess we'll have to take that out if we want to comply with POSIX on
'select' semantics.

> The way select() is defined in POSIX effectively means that once an
> application has done a select() on a socket, the data that caused
> select() to return is committed, i.e. it can no longer be dropped and
> should be considered received by the application; this has nothing
> to do with UDP being unreliable and being unreliable for the sake
> of it is not what UDP was meant for.

	Again, I think this is an absurd reading of the standard. No other status
function provides a future guarantee. And it's semantically ugly to have
'select' change the status of network data when it's purely intended to be a
'get status' function.

> Whether you think an application that is written to use select() as
> defined in POSIX is broken is not really important. The fact remains
> that Linux currently implements a select() that is _not_ POSIX
> compliant and is so solely for performance reasons. I personally think
> correct behaviour is much more important.

	This is only because you interpret the standard as providing a future
guarantee that it is literally impossible for any modern operating system to
provide. I certainly don't interpret the standard that way. Look up the word
'would' in the dictionary.

	Linux does in fact make the decision to discard the data *after* the call
to 'select'. This is not in any way different from another process that
shared the file descriptor consuming the data after the call to 'select'.
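
The shared-descriptor case can be reproduced deterministically with a small sketch. Assumptions, all for illustration only: an AF_UNIX datagram socketpair stands in for the UDP socket, a dup()'d descriptor in the same process stands in for the second process, MSG_DONTWAIT is used so the demo cannot hang, and the names readable_now() and demo_stolen_datagram() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Poll fd with a zero timeout: 1 if readable, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/*
 * Returns 1 if the sequence was observed: select() reports the socket
 * readable, another holder of the same socket consumes the datagram,
 * and the original reader then finds nothing to read.
 * Returns 0 if it was not observed, -1 on setup failure.
 */
int demo_stolen_datagram(void)
{
    int sv[2];
    char buf[16];
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    int rx = sv[0], tx = sv[1];
    int other = dup(rx);            /* second reference to the same socket */

    if (other >= 0 && send(tx, "x", 1, 0) == 1) {
        int ready = readable_now(rx);                 /* select(): readable */
        ssize_t stolen = recv(other, buf, sizeof buf, MSG_DONTWAIT);
        ssize_t n = recv(rx, buf, sizeof buf, MSG_DONTWAIT);
        int would_block =
            n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);

        result = (ready == 1 && stolen == 1 && would_block);
    }

    if (other >= 0)
        close(other);
    close(rx);
    close(tx);
    return result;
}
```

Without MSG_DONTWAIT on the last recv(), a blocking socket would simply wait there for the next datagram, which is the behaviour under discussion.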

	DS




* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 15:40                                         ` Buddy Lucas
                                                             ` (2 preceding siblings ...)
  2004-10-17 17:35                                           ` Martijn Sipkema
@ 2004-10-17 19:21                                           ` Hua Zhong
  3 siblings, 0 replies; 191+ messages in thread
From: Hua Zhong @ 2004-10-17 19:21 UTC (permalink / raw)
  To: 'Buddy Lucas', 'Lars Marowsky-Bree',
	'David Schwartz',
	'Linux-Kernel@Vger. Kernel. Org'

> > This is a distraction. If the call to select() had been substituted
> > with a call to recvmsg(), it would have blocked. Instead, 
> select() is
> > returning 'yes, you can read', and then recvmsg() is blocking. The
> > select() lied. The information is all sitting in the kernel packet
> 
> No. A million things might happen between select() and recvmsg(), both
> in kernel and application. For a consistent behaviour throughout all
> possibilities, you *have* to assume that any read on a blocking fd may
> block.

Care to provide a real example?

UDP isn't one. It was done for performance reasons as David admitted and it
could very well be done otherwise: do the checksum before select returns.
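
For reference, the check in question is the 16-bit one's-complement Internet checksum (RFC 1071). The sketch below is a plain userspace illustration, not the kernel's implementation: the name inet_checksum() is hypothetical, and the UDP pseudo-header that a real UDP checksum also covers is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over big-endian 16-bit words, folded to 16 bits
 * and complemented. The 32-bit accumulator is sufficient for inputs up
 * to the maximum UDP datagram size (64 KiB).
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[0] << 8;

    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver verifies a packet by running the same sum over the data including the transmitted checksum field; an intact packet yields 0 from this function. The cost is one pass over every byte of the datagram.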

David has admitted the only reason Linux chose to do so is performance.

It might be the case that a million things might happen between select and
recvmsg, but none of them, as far as I can see, *has* to force Linux to work
this way. The only reason that I can see is performance and implementation
convenience.

Hua



* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 18:53                                             ` David Schwartz
@ 2004-10-17 19:26                                               ` Hua Zhong
  2004-10-17 20:32                                               ` Martijn Sipkema
  1 sibling, 0 replies; 191+ messages in thread
From: Hua Zhong @ 2004-10-17 19:26 UTC (permalink / raw)
  To: davids, martijn, 'Linux-Kernel@Vger. Kernel. Org'

> 	I'm sorry, that's an absolutely preposterous view. For 
> one thing, Linux violates this by allowing processes and 
> threads to share file descriptors (since another process 
> can steal the data before the call to 'recvmsg'). Oh
> well, I guess we'll have to take that out if we want to 
> comply with POSIX on 'select' semantics.

Another typical fake argument. Do not mix kernel and user space problems.

The standard says the [following] recvmsg would not block, not the following
recvmsg *from the same* thread would not block.

Hua




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 19:58                                               ` Martijn Sipkema
@ 2004-10-17 19:33                                                 ` Buddy Lucas
  2004-10-17 20:11                                                   ` Lars Marowsky-Bree
  2004-10-17 20:42                                                   ` Martijn Sipkema
  0 siblings, 2 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 19:33 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 20:58:39 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> >
> > But then I am one of those who thinks it's sane to check for
> > EWOULDBLOCK on a nonblocking socket after blocking in select().
> 
> A POSIX compliant implementation would never do this.

Here's your own quote, from a couple of hundred mails ago:

> According to POSIX:

> A descriptor shall be considered ready for reading when a call to an
>  input function with O_NONBLOCK clear would not block, whether or not
>  the function would transfer data successfully.

You concluded from this that, if select() says a descriptor is
readable, the subsequent recvmsg() must not block. The point is, from
your quote I cannot deduce anything but: a recvmsg() on a descriptor
that is readable must not block -- which makes perfect sense.

But unless POSIX also says something about the persistence of
"readability" of descriptors, specifically in between select() and
recvmsg(), your conclusion is just wrong.

> > Let's just document this and move on to something more important.
> 
> It actually _is_ important. Just implement select() and recvmsg() as
> described in the standard.

I am very glad Linux makes sane decisions while trying to adhere to
the standards as much as possible.


Cheers,
Buddy


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 17:33                                             ` Buddy Lucas
@ 2004-10-17 19:58                                               ` Martijn Sipkema
  2004-10-17 19:33                                                 ` Buddy Lucas
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-17 19:58 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

From: "Buddy Lucas" <buddy.lucas@gmail.com>
> On Sun, 17 Oct 2004 18:35:34 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> > From: "Buddy Lucas" <buddy.lucas@gmail.com>
> > > On Sun, 17 Oct 2004 11:05:09 -0400, Mark Mielke <mark@mark.mielke.cc> wrote:
> > > > On Sun, Oct 17, 2004 at 04:17:06PM +0200, Buddy Lucas wrote:
> > > > > On Sun, 17 Oct 2004 15:35:37 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> > > > > > The SuV spec is actually quite detailed about the options here:
> > > > > >         A descriptor shall be considered ready for reading when a call
> > > > > >         to an input function with O_NONBLOCK clear would not block,
> > > > > >         whether or not the function would transfer data successfully.
> > > > > >         (The function might return data, an end-of-file indication, or
> > > > > >         an error other than one indicating that it is blocked, and in
> > > > > >         each of these cases the descriptor shall be considered ready for
> > > > > >         reading.)
> > > > > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> > > >
> > > > This is a distraction. If the call to select() had been substituted
> > > > with a call to recvmsg(), it would have blocked. Instead, select() is
> > > > returning 'yes, you can read', and then recvmsg() is blocking. The
> > > > select() lied. The information is all sitting in the kernel packet
> > >
> > > No. A million things might happen between select() and recvmsg(), both
> > > in kernel and application. For a consistent behaviour throughout all
> > > possibilities, you *have* to assume that any read on a blocking fd may
> > > block, and that a fd ready for reading at select() time might not be
> > > readable once the app gets to recvmsg() -- for whatever reason.
> > 
> > It is perfectly possible to not have a million things happen between
> > select() and recvmsg() and POSIX defines what can happen and what
> > can't; it states that a process calling select() on a socket will not block
> > on a subsequent recvmsg() on that socket.
> > 
> > > And indeed, that implies that select() on blocking fds is generally
> > > not useful if you expect to bypass the blocking through select().
> > > Personally,  I think any application that implements this expectation
> > > is broken. (If only because you might have to do a second read() or
> > > recvmsg() which will either result in a crappy select() loop or a
> > > broken read()/recvmsg() loop).
> > 
> > The way select() is defined in POSIX effectively means that once an
> > application has done a select() on a socket, the data that caused
> > select() to return is committed, i.e. it can no longer be dropped and
> > should be considered received by the application; this has nothing
> 
> That is plainly wrong. Data is never received by an application before
> recvmsg() has succeeded.

I didn't say it was; I said that from the point of view of the UDP protocol
it is, i.e. a UDP packet cannot be dropped from that point onwards.

> > to do with UDP being unreliable and being unreliable for the sake
> > of it is not what UDP was meant for.
> > 
> > Whether you think an application that is written to use select() as
> > defined in POSIX is broken is not really important. The fact remains
> > that Linux currently implements a select() that is _not_ POSIX
> > compliant and is so solely for performance reasons. I personally think
> > correct behaviour is much more important.
> 
> All I'm saying is, that applications that are not correct now, will
> probably not be correct even if we change the way Linux handles this
> situation. The sanest thing really seems to accept the fact that any
> read() on a blocking fd might block, even if the programmer thinks it
> really shouldn't.
> 
> But then I am one of those who thinks it's sane to check for
> EWOULDBLOCK on a nonblocking socket after blocking in select().

A POSIX compliant implementation would never do this.

> Let's just document this and move on to something more important.

It actually _is_ important. Just implement select() and recvmsg() as
described in the standard.


--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 20:42                                                   ` Martijn Sipkema
@ 2004-10-17 20:02                                                     ` Buddy Lucas
  0 siblings, 0 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 20:02 UTC (permalink / raw)
  To: Martijn Sipkema
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 21:42:04 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> From: "Buddy Lucas" <buddy.lucas@gmail.com>
> > On Sun, 17 Oct 2004 20:58:39 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> > > >
> > > > But then I am one of those who thinks it's sane to check for
> > > > EWOULDBLOCK on a nonblocking socket after blocking in select().
> > >
> > > A POSIX compliant implementation would never do this.
> >
> > Here's your own quote, from a couple of hundred mails ago:
> >
> > > According to POSIX:
> >
> > > A descriptor shall be considered ready for reading when a call to an
> > >  input function with O_NONBLOCK clear would not block, whether or not
> > >  the function would transfer data successfully.
> >
> > You concluded from this that, if select() says a descriptor is
> > readable, the subsequent recvmsg() must not block. The point is, from
> > your quote I cannot deduce anything but: a recvmsg() on a descriptor
> > that is readable must not block -- which makes perfect sense.
> >
> > But unless POSIX also says something about the conservability of
> > "readability" of descriptors, specifically in between select() and
> > recvmsg(), your conclusion is just wrong.
> 
> But with the current Linux implementation it is possible that a call to select()
> returns while a call to recvmsg() would have blocked, and that is not
> correct according to POSIX.

Come on. Those are different things. (If POSIX says this explicitly,
it's even more broken than I thought.)

> And I indeed read this as meaning that the first recvmsg() after the select()
> may not block.

Well, then you *are* wrong. ;-)


Cheers,
Buddy


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 18:06                                               ` Lars Marowsky-Bree
  2004-10-17 18:21                                                 ` Buddy Lucas
@ 2004-10-17 20:04                                                 ` Martijn Sipkema
  2004-10-17 20:08                                                   ` Lars Marowsky-Bree
  1 sibling, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-17 20:04 UTC (permalink / raw)
  To: Lars Marowsky-Bree, Buddy Lucas, Jesper Juhl
  Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

From: "Lars Marowsky-Bree" <lmb@suse.de>
>> [ snip ]
>> 
>> Also note the examples that Stevens gives. For instance, he explicitly
>> checks for EWOULDBLOCK after a read on a nonblocking fd that has been
>> reported readable by select().
>
> The specs don't disagree with that. On a O_NONBLOCK socket, that is
> allowed.

No, it isn't. select() may not behave differently based on the O_NONBLOCK
flag at the moment of the select() call. And if a call to recvmsg() with O_NONBLOCK
cleared doesn't block and since it can't return EAGAIN, then I don't think a recvmsg()
call with O_NONBLOCK set should return EAGAIN where something like
EIO should have been returned otherwise.


--ms



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 20:04                                                 ` Martijn Sipkema
@ 2004-10-17 20:08                                                   ` Lars Marowsky-Bree
  0 siblings, 0 replies; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 20:08 UTC (permalink / raw)
  To: Martijn Sipkema, Buddy Lucas, Jesper Juhl
  Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On 2004-10-17T21:04:40, Martijn Sipkema <martijn@entmoot.nl> wrote:

> > The specs don't disagree with that. On a O_NONBLOCK socket, that is
> > allowed.
> No, it isn't. select() may not behave differently based on the O_NONBLOCK
> flag at the moment of the select() call. 

Yes it may. Though this is getting nitpicking; please re-read:

"A descriptor shall be considered ready for reading when a call to an
input function with O_NONBLOCK clear would not block, whether or not the
function would transfer data successfully. (The function might return
data, an end-of-file indication, or an error other than one indicating
that it is blocked, and in each of these cases the descriptor shall be
considered ready for reading.)"

No claim is made for O_NONBLOCK set; so in that case we can do something
sane.

> And if a call to recvmsg() with O_NONBLOCK cleared doesn't block and
> since it can't return EAGAIN, then I don't think a recvmsg() call with
> O_NONBLOCK set should return EAGAIN where something like EIO should
> have been returned otherwise.

Point taken. For UDP, returning EIO in both cases is just fine (it's
actually also correct, for a checksum error constitutes a "network read
error"...). At least according to the spec.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 19:33                                                 ` Buddy Lucas
@ 2004-10-17 20:11                                                   ` Lars Marowsky-Bree
  2004-10-17 20:25                                                     ` Buddy Lucas
  2004-10-17 20:42                                                   ` Martijn Sipkema
  1 sibling, 1 reply; 191+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-17 20:11 UTC (permalink / raw)
  To: Buddy Lucas, Martijn Sipkema
  Cc: David Schwartz, Linux-Kernel@Vger. Kernel. Org

On 2004-10-17T21:33:27, Buddy Lucas <buddy.lucas@gmail.com> wrote:

> You concluded from this that, if select() says a descriptor is
> readable, the subsequent recvmsg() must not block. The point is, from
> your quote I cannot deduce anything but: a recvmsg() on a descriptor
> that is readable must not block -- which makes perfect sense.
> 
> But unless POSIX also says something about the conservability of
> "readability" of descriptors, specifically in between select() and
> recvmsg(), your conclusion is just wrong.

What kind of idiotic (and most of all, wrong) hairsplitting are you
doing here, for heaven's sake? That's obviously exactly what the
standard implies.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 20:11                                                   ` Lars Marowsky-Bree
@ 2004-10-17 20:25                                                     ` Buddy Lucas
  0 siblings, 0 replies; 191+ messages in thread
From: Buddy Lucas @ 2004-10-17 20:25 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Martijn Sipkema, David Schwartz, Linux-Kernel@Vger. Kernel. Org

On Sun, 17 Oct 2004 22:11:18 +0200, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2004-10-17T21:33:27, Buddy Lucas <buddy.lucas@gmail.com> wrote:
> 
> > You concluded from this that, if select() says a descriptor is
> > readable, the subsequent recvmsg() must not block. The point is, from
> > your quote I cannot deduce anything but: a recvmsg() on a descriptor
> > that is readable must not block -- which makes perfect sense.
> >
> > But unless POSIX also says something about the conservability of
> > "readability" of descriptors, specifically in between select() and
> > recvmsg(), your conclusion is just wrong.
> 
> What kind of idiotic (and most of all, wrong) hairsplitting are you
> doing here, for heaven's sake? That's obviously exactly what the
> standard implies.

Take this discussion off-list please.


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 18:53                                             ` David Schwartz
  2004-10-17 19:26                                               ` Hua Zhong
@ 2004-10-17 20:32                                               ` Martijn Sipkema
  1 sibling, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-17 20:32 UTC (permalink / raw)
  To: davids, Linux-Kernel@Vger. Kernel. Org

From: "David Schwartz" <davids@webmaster.com>
> > It is perfectly possible to not have a million things happen between
> > select() and recvmsg() and POSIX defines what can happen and what
> > can't; it states that a process calling select() on a socket will
> > not block on a subsequent recvmsg() on that socket.
> 
> I'm sorry, that's an absolutely preposterous view. For one thing, Linux
> violates this by allowing processes and threads to share file descriptors
> (since another process can steal the data before the call to 'recvmsg'). Oh
> well, I guess we'll have to take that out if we want to comply with POSIX on
> 'select' semantics.

I don't think this is comparable, see below.

> > The way select() is defined in POSIX effectively means that once an
> > application has done a select() on a socket, the data that caused
> > select() to return is committed, i.e. it can no longer be dropped and
> > should be considered received by the application; this has nothing
> > to do with UDP being unreliable and being unreliable for the sake
> > of it is not what UDP was meant for.
> 
> Again, I think this is an absurd reading of the standard. No other status
> function provides a future guarantee. And it's semantically ugly to have
> 'select' change the status of network data when it's purely intended to be a
> 'get status' function.

It is not about a future guarantee; the information as to whether the data
is corrupt or not is available at the time when select() is called and POSIX
requires it to be.

You _chose_ to implement your select() in a way that is not POSIX
compliant.

> > Whether you think an application that is written to use select() as
> > defined in POSIX is broken is not really important. The fact remains
> > that Linux currently implements a select() that is _not_ POSIX
> > compliant and is so solely for performance reasons. I personally think
> > correct behaviour is much more important.
> 
> This is only because you interpret the standard as providing a future
> guarantee that it is literally impossible for any modern operating system to
> provide.

Hardly so.

> I certainly don't interpret the standard that way. Look up the word
> 'would' in the dictionary.

"would" is meant here as "if you were to call recvmsg(), then" and not as
"may or may not..".

Your interpretation of select() is that it merely provides a hint that the
socket may be ready; that may be convenient for you, but it is not what
POSIX describes.

> Linux does in fact make the decision to discard the data *after* the call
> to 'select'. This is not in any way different from another process that
> shared the file descriptor consuming the data after the call to 'select'.

I think it is different. The first recvmsg() from one of these processes
doesn't block; it is the recvmsg() on that file descriptor that select()
guarantees will not block.


--ms




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 19:33                                                 ` Buddy Lucas
  2004-10-17 20:11                                                   ` Lars Marowsky-Bree
@ 2004-10-17 20:42                                                   ` Martijn Sipkema
  2004-10-17 20:02                                                     ` Buddy Lucas
  1 sibling, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-17 20:42 UTC (permalink / raw)
  To: Buddy Lucas
  Cc: Lars Marowsky-Bree, David Schwartz, Linux-Kernel@Vger. Kernel. Org

From: "Buddy Lucas" <buddy.lucas@gmail.com>
> On Sun, 17 Oct 2004 20:58:39 +0100, Martijn Sipkema <martijn@entmoot.nl> wrote:
> > >
> > > But then I am one of those who thinks it's sane to check for
> > > EWOULDBLOCK on a nonblocking socket after blocking in select().
> > 
> > A POSIX compliant implementation would never do this.
> 
> Here's your own quote, from a couple of hundred mails ago:
> 
> > According to POSIX:
> 
> > A descriptor shall be considered ready for reading when a call to an
> >  input function with O_NONBLOCK clear would not block, whether or not
> >  the function would transfer data successfully.
> 
> You concluded from this that, if select() says a descriptor is
> readable, the subsequent recvmsg() must not block. The point is, from
> your quote I cannot deduce anything but: a recvmsg() on a descriptor
> that is readable must not block -- which makes perfect sense.
> 
> But unless POSIX also says something about the conservability of
> "readability" of descriptors, specifically in between select() and
> recvmsg(), your conclusion is just wrong.

But with the current Linux implementation it is possible that a call to select()
returns while a call to recvmsg() would have blocked, and that is not
correct according to POSIX.

And I indeed read this as meaning that the first recvmsg() after the select()
may not block.

> > > Let's just document this and move on to something more important.
> > 
> > It actually _is_ important. Just implement select() and recvmsg() as
> > described in the standard.
> 
> I am very glad Linux makes sane decisions while trying to adhere to
> the standards as much as possible.

I don't think this is ``trying to adhere to the standards''... not by a long shot.


--ms



* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-16 10:24                     ` Willy Tarreau
  2004-10-16 13:21                       ` Mark Mielke
@ 2004-10-18 22:25                       ` Robert White
  1 sibling, 0 replies; 191+ messages in thread
From: Robert White @ 2004-10-18 22:25 UTC (permalink / raw)
  To: 'Willy Tarreau'
  Cc: 'David S. Miller', 'Olivier Galibert', linux-kernel

Sorry, I was thinking in the generic case of "a protocol read() after select()" and
not specifically a UDP read() after select(); as any semantic chosen needs to be
generic.  That may be out of scope for your considerations.

If you allow for zero-length packets on the transport, then you are introducing a
semantic entity to silently displace a procedural entity [e.g. if your messaging
scheme allows for zero-length packets and you have the kernel "fake" a zero-length
packet, then the kernel is triggering semantics instead of an error recognition
process; if zero-length packets are impossible, then the kernel is creating a "new"
return condition with unforeseen consequences in all existing code.]

If your process is *not* prepared to deal with a zero-length return from a receive
message, then you will get a semantic error. [e.g. you "know" that all the packets
you receive have a certain structure, but here you are "receiving" a non-error event
that is outside of your semantic set and so not allowed-for in your existing state.
Etc.]

For every other file handle a zero-length read is end of file.  So there is this "well
established" semantic meaning for "if ( 0 == read(fd,...))" and you are proposing to
non-trivially create a one-off for the specific case of fd==UDP-socket.  So now if
you pass this file handle through a generic mechanism then you break the generic
semantics by creating a "different class of files" where a state problem leads to the
generation of an "in-band, originless, valid receive event" that is completely
dissimilar to the expected meaning of the return value from a standard function call.

Basically, if it is possible to send and receive a zero-length message in a
connectionless protocol, you are _stealing_ the possible semantic meaning of that
message and retroactively claiming it as a signal from the kernel to the program.  IF
it isn't already possible to send and receive a zero-length message in that
connectionless transport, then you are adding a semantic that all the existing code
may be completely unable to interpret, or which may "trick" applications into
deciding they are getting the end-of-file condition because they don't know or care
that the transport in question is UDP.

So if you have a generic handling mechanism, centered on select(), that "knows" that
if it sees 0==read(...) then it should close the file handle, and if that mechanism
is given sockets conforming to this proposed modification, then that mechanism will
break.

[I *am* out of my depth about whether UDP allows zero-length messages, it has never
come up for me, but I don't think it matters.  If it isn't UDP legal, then you are
adding semantic.  If it is legal, then you are overloading known semantic.  Both
actions are surprising, so both are wrong.]
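
For what it's worth, the open question above can be checked from user space. The following is a minimal sketch, assuming a Linux host with a working loopback interface; recv_empty_datagram() is a hypothetical helper name, not something from this thread. It sends one zero-length UDP datagram to itself and returns whatever recv() reports for it, which illustrates why a return of 0 from a UDP socket cannot be read as "end of file".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): send one empty UDP datagram
 * over loopback and return what recv() reports for it.
 * Returns the recv() result, or -1 if any step fails.
 */
ssize_t recv_empty_datagram(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };   /* do not hang forever if delivery fails */
    char buf[16];
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */

    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto out;

    /* A datagram with no payload: only the UDP header goes on the wire. */
    if (sendto(tx, "", 0, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* The empty datagram is delivered like any other; recv() returns 0. */
    n = recv(rx, buf, sizeof(buf), 0);

out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Calling recv_empty_datagram() from a small main() and printing the result should be enough to see the behaviour on a given kernel; generic code that closes a descriptor on "0 == read(fd, ...)" would misfire on such a packet.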

So returning zero from a read function on a file descriptor that "can not"
meaningfully know end-of-file (because it doesn't FIN etc) is still a very bad idea
because of the odd-out cases where it will have "impossible" or at least wildly
incorrect semantic consequences.

Not a good space to be mucking around in.

But that's just my opinion.  And I am now rambling...  8-)

Rob White.


-----Original Message-----
From: Willy Tarreau [mailto:willy@w.ods.org] 
Sent: Saturday, October 16, 2004 3:25 AM
To: Robert White
Cc: 'David S. Miller'; 'Olivier Galibert'; linux-kernel@vger.kernel.org
Subject: Re: UDP recvmsg blocks after select(), 2.6 bug?

On Fri, Oct 15, 2004 at 03:42:55PM -0700, Robert White wrote:
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org]
> On Behalf Of Willy Tarreau
> 
> > As I asked in a previous mail in this overly long thread, why not return
> > zero bytes at all. It is perfectly valid to receive a UDP packet with 0
> 
> Zero bytes is "end of file".  Don't go trying to co-opt end of file.  That way lies
> madness and despair.

Please explain to me what "end of file" means with UDP. If your UDP-based app
expects to receive a zero when the other end stops transmitting, then it
might wait for a very long time. As opposed to TCP, there's no FIN control
flag to tell the remote host that you sent your last packet.

Willy



* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-17 13:35                                   ` Lars Marowsky-Bree
  2004-10-17 14:17                                     ` Buddy Lucas
@ 2004-10-20 21:31                                     ` H. Peter Anvin
  2004-10-20 21:58                                       ` Chris Friesen
  1 sibling, 1 reply; 191+ messages in thread
From: H. Peter Anvin @ 2004-10-20 21:31 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20041017133537.GL7468@marowsky-bree.de>
By author:    Lars Marowsky-Bree <lmb@suse.de>
In newsgroup: linux.dev.kernel
> 
> This actually forbids recvmsg() to return EAGAIN and EWOULDBLOCK as
> has been suggested. EIO seems to be the best fit.
> 
> But I'd have to agree that blocking on recvmsg() after select() has
> indicated ready to read does violate the specification, unless the
> socket has actually been opened with O_NONBLOCK.
> 

EIO seems to be The Right Thing[TM]... it pretty much says "yes, we
received something, but it was bad."  What isn't clear to me is how
applications react to EIO.  It could easily be considered a fatal
error... :-/

	-hpa


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 21:31                                     ` H. Peter Anvin
@ 2004-10-20 21:58                                       ` Chris Friesen
  2004-10-20 22:00                                         ` H. Peter Anvin
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-20 21:58 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:

> EIO seems to be The Right Thing[TM]... it pretty much says "yes, we
> received something, but it was bad."  What isn't clear to me is how
> applications react to EIO.  It could easily be considered a fatal
> error... :-/

From an application point of view, The Right Thing would be to do the checksum 
validation at select() time if the socket is blocking.

If it's nonblocking, then just do as we do now and return EAGAIN at recvmsg() time.

This would ensure that all existing apps get the expected semantics, but the 
ones based on blocking sockets would see a performance degradation.

Chris


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 21:58                                       ` Chris Friesen
@ 2004-10-20 22:00                                         ` H. Peter Anvin
  2004-10-20 22:12                                           ` Chris Friesen
  2004-10-21  3:01                                           ` Michael Clark
  0 siblings, 2 replies; 191+ messages in thread
From: H. Peter Anvin @ 2004-10-20 22:00 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

Chris Friesen wrote:
> H. Peter Anvin wrote:
> 
>> EIO seems to be The Right Thing[TM]... it pretty much says "yes, we
>> received something, but it was bad."  What isn't clear to me is how
>> applications react to EIO.  It could easily be considered a fatal
>> error... :-/
> 
> 
>  From an application point of view, The Right Thing would be to do the 
> checksum validation at select() time if the socket is blocking.
> 
> If it's nonblocking, then just do as we do now and return EAGAIN at 
> recvmsg() time.
> 
> This would ensure that all existing apps get the expected semantics, but 
> the ones based on blocking sockets would see a performance degradation.
> 

Doing work twice can hardly be considered The Right Thing.

	-hpa


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 22:00                                         ` H. Peter Anvin
@ 2004-10-20 22:12                                           ` Chris Friesen
  2004-10-20 23:16                                             ` David Schwartz
  2004-10-21  3:01                                           ` Michael Clark
  1 sibling, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-20 22:12 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:

> Doing work twice can hardly be considered The Right Thing.

If the alternative is to have apps that can be DOS'd with a single corrupt 
packet, I think that yes, it is.

Going forward, all apps should use nonblocking sockets, correct?  (With the 
current design, because it's the only way to get correctness, with my proposal 
because it's the only way to get full speed.)

Given that, we then have to worry about the installed base of binaries out there 
that will use blocking sockets.  And it's going to take some time before they 
all convert.  For all those binaries, (which include basic things like syslog, 
inetd, portmap, and statd) the existing kernel behaviour does not match what the 
app is expecting.  With a minor change, we can give the behaviour that the app 
expects, though at a performance penalty.  Once the app switches over to 
nonblocking sockets (which it has to do *anyway* under the current model) then 
it gets full performance.

To summarize:
1) apps have to switch to nonblocking sockets  (for correctness)
2) my proposal makes them work as expected in the meantime, with a performance cost


Chris


* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 22:12                                           ` Chris Friesen
@ 2004-10-20 23:16                                             ` David Schwartz
  2004-10-21  1:03                                               ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: David Schwartz @ 2004-10-20 23:16 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel


> 2) my proposal makes them work as expected in the meantime, with
> a performance cost
>
> Chris

	Perhaps I missed the details, but under your proposal, how do you predict
at 'select' time what mode the socket will be in at 'recvmsg' time?!

	DS




* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 23:16                                             ` David Schwartz
@ 2004-10-21  1:03                                               ` Chris Friesen
  2004-10-21  1:38                                                 ` David Schwartz
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-21  1:03 UTC (permalink / raw)
  To: davids; +Cc: H. Peter Anvin, linux-kernel

David Schwartz wrote:

> 	Perhaps I missed the details, but under your proposal, how do you predict
> at 'select' time what mode the socket will be in at 'recvmsg' time?!

Well, if you've got a blocking socket, and do a nonblocking read with 
MSG_DONTWAIT, everything works fine.  You lose a bit of performance, but it works.

The problem case is if you create a socket, set O_NONBLOCK, do select, clear 
O_NONBLOCK, then do a recvmsg().

I suspect it's not a very common thing to do, so my proposal would still help 
the vast majority of existing apps.

Chris


* RE: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  1:03                                               ` Chris Friesen
@ 2004-10-21  1:38                                                 ` David Schwartz
  0 siblings, 0 replies; 191+ messages in thread
From: David Schwartz @ 2004-10-21  1:38 UTC (permalink / raw)
  To: cfriesen; +Cc: linux-kernel


> David Schwartz wrote:

> > 	Perhaps I missed the details, but under your proposal, how do you
> > predict at 'select' time what mode the socket will be in at 'recvmsg' time?!

> Well, if you've got a blocking socket, and do a nonblocking read with
> MSG_DONTWAIT, everything works fine.  You lose a bit of
> performance, but it works.
>
> The problem case is if you create a socket, set O_NONBLOCK, do select,
> clear O_NONBLOCK, then do a recvmsg().
>
> I suspect it's not a very common thing to do, so my proposal would still
> help the vast majority of existing apps.
>
> Chris

	I think this is a reasonable thing to do. Applications that select in one
mode and then operate in another are rare, and the suggested change won't
break anything.

	DS





* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-20 22:00                                         ` H. Peter Anvin
  2004-10-20 22:12                                           ` Chris Friesen
@ 2004-10-21  3:01                                           ` Michael Clark
  2004-10-21  3:52                                             ` Michael Clark
  2004-10-21  4:10                                             ` H. Peter Anvin
  1 sibling, 2 replies; 191+ messages in thread
From: Michael Clark @ 2004-10-21  3:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Chris Friesen, linux-kernel

On 10/21/04 06:00, H. Peter Anvin wrote:
> Chris Friesen wrote:
> 
>> H. Peter Anvin wrote:
>>
>>> EIO seems to be The Right Thing[TM]... it pretty much says "yes, we
>>> received something, but it was bad."  What isn't clear to me is how
>>> applications react to EIO.  It could easily be considered a fatal
>>> error... :-/
>>
>>
>>
>>  From an application point of view, The Right Thing would be to do the 
>> checksum validation at select() time if the socket is blocking.
>>
>> If it's nonblocking, then just do as we do now and return EAGAIN at 
>> recvmsg() time.
>>
>> This would ensure that all existing apps get the expected semantics, 
>> but the ones based on blocking sockets would see a performance 
>> degradation.
>>
> 
> Doing work twice can hardly be considered The Right Thing.

Optimisations that break documented interfaces and age old assumptions
can hardly be considered The Right Thing :)

And you only do the checksum once (just earlier), and the copy_to_user
should be cache hot as most of these UDP apps will call recvmsg right
after the select.

That said, an app with many connected sockets will have a high chance
of losing the cache. Although a handful of unconnected UDP sockets
(one per interface) are more common in the use case of a large number
of clients, so in general the performance difference should be minor.

Doing the same amount of work (with a chance of lower performance because of
cache loss) is good IMHO if it means the choice of a reliable vs an
unreliable interface. You can only take the performance optimisation
argument so far, and when these optimisations start breaking interfaces,
I think that's too far, i.e. which do we give up: efficiency or correctness?

Just my 2c in favour of !O_NONBLOCK early UDP checksum test in select.

~mc


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  3:01                                           ` Michael Clark
@ 2004-10-21  3:52                                             ` Michael Clark
  2004-10-21  4:10                                             ` H. Peter Anvin
  1 sibling, 0 replies; 191+ messages in thread
From: Michael Clark @ 2004-10-21  3:52 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

Hi All,

I'm actually trying to write a test case to demonstrate
the problem in a repeatable way.

I first looked at socket(PF_INET, SOCK_RAW) to inject the
packets but it appears I have no control over the IP
checksum, and an invalid UDP sum didn't cause any problems
(the packet was accepted fine).

So I've hacked up something with socket(PF_PACKET, SOCK_RAW),
but the problem I'm getting is that the packets (although tcpdumps of
them look perfect) are not being accepted by my UDP listening
socket. It uses real interfaces, not loopback, for the raw
packet injection, as the checksumming appears to be bypassed
on the loopback interface (I always see bad UDP checksums
on normally originated packets on the loopback interface).

Anyway, I've probably done something dumb. Anyone care to look
at my test code? The current code sends the packets out on all
interfaces (using the correct associated IP and MAC). You can change
the #if in main() to make it use normal UDP instead of my cooked packets
(which currently have correct checksums).

~mc

[-- Attachment #2: udptest.c --]
[-- Type: text/x-csrc, Size: 7189 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* memcpy(), strcpy(), strlen() */
#include <unistd.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <netpacket/packet.h>
#include <net/ethernet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <linux/if.h>


typedef struct cooked_packet {
  struct ether_header ethp;
  struct iphdr ipp;
  struct udphdr udpp;
} __attribute__ ((__packed__)) cooked_packet;


unsigned short ip_cksum(unsigned short *ptr, int nbytes)
{
  int sum;
  unsigned short oddbyte, answer;

  sum = 0;
  while (nbytes > 1)  {
    sum += *ptr++;
    nbytes -= 2;
  }

  /* mop up an odd byte, if necessary */
  if (nbytes == 1) {
    oddbyte = 0;		/* make sure top half is zero */
    *((unsigned char *) &oddbyte) =
      *(unsigned char *)ptr;   /* one byte only */
    sum += oddbyte;
  }

  /*
   * Add back carry outs from top 16 bits to low 16 bits.
   */
  sum  = (sum >> 16) + (sum & 0xffff);	/* add high-16 to low-16 */
  sum += (sum >> 16);			/* add carry */
  answer = ~sum;		/* ones-complement, then truncate to 16 bits */
  return(answer);
}

unsigned short udp_cksum(unsigned short *ptr, int nbytes, in_addr_t *src, in_addr_t *dst)
{
  int sum;
  unsigned short oddbyte, answer;

  sum = 0;

  /* add the UDP pseudo header which contains the IP src and
   * destination addresses */
  sum += *((unsigned short*)src);
  sum += *((unsigned short*)src+1);
  sum += *((unsigned short*)dst);
  sum += *((unsigned short*)dst+1);

  /* add protocol number + packet length */
  sum += htons(17) + htons(nbytes);

  while (nbytes > 1)  {
    sum += *ptr++;
    nbytes -= 2;
  }

  /* mop up an odd byte, if necessary */
  if (nbytes == 1) {
    oddbyte = 0;		/* make sure top half is zero */
    *((unsigned char *) &oddbyte) =
      *(unsigned char *)ptr;   /* one byte only */
    sum += oddbyte;
  }

  /*
   * Add back carry outs from top 16 bits to low 16 bits.
   */
  sum  = (sum >> 16) + (sum & 0xffff);	/* add high-16 to low-16 */
  sum += (sum >> 16);			/* add carry */
  answer = ~sum;		/* ones-complement, then truncate to 16 bits */
  return(answer);
}

void cook_udp_packet(cooked_packet *p, int len,
		     char *ether_dhost, char *ether_shost,
		     in_addr_t src, in_addr_t dst,
		     unsigned short src_port, unsigned short dst_port)
{
  p->ethp.ether_type = htons(ETH_P_IP);
  memcpy(&p->ethp.ether_dhost, ether_dhost, ETH_ALEN);
  memcpy(&p->ethp.ether_shost, ether_shost, ETH_ALEN);
  p->ipp.version = 0x4;
  p->ipp.ihl = 0x5;
  p->ipp.tos = 0;
  p->ipp.tot_len = htons(sizeof(struct iphdr) + sizeof(struct udphdr) + len);
  p->ipp.id = 0;
  p->ipp.frag_off = htons(0x4000); /* DF */
  p->ipp.ttl = 64;
  p->ipp.protocol = 17; /* UDP */
  p->ipp.check = 0;
  p->ipp.saddr = src;
  p->ipp.daddr = dst;
  p->udpp.source = src_port;
  p->udpp.dest = dst_port;
  p->udpp.len = htons(sizeof(struct udphdr) + len);
  p->udpp.check = 0x0;
  p->udpp.check = udp_cksum((unsigned short*)&p->udpp,
			    sizeof(struct udphdr) + len, &src, &dst);
  p->ipp.check = ip_cksum((unsigned short*)&p->ipp, sizeof(struct ip));
}


void send_raw_udp(unsigned short src_port, unsigned short dst_port,
		  char *payload)
{
  int raw_sock, i;
  struct sockaddr_ll dest_addr;
  struct ifconf ifc;
  struct ifreq *ifr;
  /* +1: the payload's terminating NUL is copied and sent as well */
  char *packet = malloc(sizeof(cooked_packet) + strlen(payload) + 1);

  if((raw_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) < 0) {
    perror("socket");
    exit(1);
  }

  ifc.ifc_len = 0;
  ifc.ifc_buf = NULL;
  if (ioctl(raw_sock, SIOCGIFCONF, &ifc) < 0) {
    perror("ioctl(SIOCGIFCONF)");
    exit(1);
  }
  ifc.ifc_buf = malloc(ifc.ifc_len);
  if (ioctl(raw_sock, SIOCGIFCONF, &ifc) < 0) {
    perror("ioctl(SIOCGIFCONF)");
    exit(1);
  }

  /* send the packet out on all interfaces - just while testing */
  ifr = ifc.ifc_req;
  for (i = ifc.ifc_len / sizeof(struct ifreq); --i >= 0; ifr++) {
      struct ifreq ixifr, ipifr, hwifr;
      struct sockaddr_in *ipaddr;
      char *hw;

      strcpy(ixifr.ifr_name, ifr->ifr_name);
      if(ioctl(raw_sock, SIOCGIFINDEX, &ixifr) < 0) continue;
      strcpy(ipifr.ifr_name, ifr->ifr_name);
      if(ioctl(raw_sock, SIOCGIFADDR, &ipifr) < 0) continue;
      ipaddr = (struct sockaddr_in*)&ipifr.ifr_addr;
      strcpy(hwifr.ifr_name, ifr->ifr_name);
      if(ioctl(raw_sock, SIOCGIFHWADDR, &hwifr) < 0) continue;
      hw = (char*)&hwifr.ifr_hwaddr.sa_data;

      printf("sending via %s addr=%s hw=%02X:%02X:%02X:%02X:%02X:%02X\n",
	     ifr->ifr_name, inet_ntoa(ipaddr->sin_addr),
	     (hw[0] & 0377), (hw[1] & 0377), (hw[2] & 0377),
	     (hw[3] & 0377), (hw[4] & 0377), (hw[5] & 0377));
      memcpy(packet + sizeof(cooked_packet), payload, strlen(payload)+1);
      cook_udp_packet((cooked_packet*)packet, strlen(payload)+1,
		      hw, hw, ipaddr->sin_addr.s_addr, ipaddr->sin_addr.s_addr,
		      src_port, dst_port);

      memset(&dest_addr, 0, sizeof(dest_addr));
      dest_addr.sll_family = AF_PACKET;
      dest_addr.sll_ifindex = ixifr.ifr_ifindex;

      sendto(raw_sock, packet, sizeof(cooked_packet) + strlen(payload)+1,
	     0, (struct sockaddr*)&dest_addr, sizeof(dest_addr));
  }
  close(raw_sock);
}

void send_normal_udp(unsigned short src_port, unsigned short dst_port,
		     char *payload)
{
  int udp_sock;
  struct sockaddr_in src_addr, dest_addr;

  src_addr.sin_family = AF_INET;
  src_addr.sin_port = src_port;
  src_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  dest_addr.sin_family = AF_INET;
  dest_addr.sin_port = dst_port;
  dest_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  if((udp_sock = socket(PF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
  }

  if(bind(udp_sock, (struct sockaddr*)&src_addr, sizeof(src_addr)) < 0 ) {
    perror("bind");
    exit(1);
  }

  sendto(udp_sock, payload, strlen(payload) + 1, 0,
	 (struct sockaddr*)&dest_addr, sizeof(dest_addr));

  close(udp_sock);
}


int main(int argc, char **argv)
{
  int listen_sock, ret;
  socklen_t addr_len;	/* recvfrom() takes a socklen_t *, not an int * */
  struct sockaddr_in bind_addr, peer_addr;
  unsigned short src_port = htons(1234);
  unsigned short dst_port = htons(5678);
  char *payload = "hello";
  char buf[6];
  fd_set readfds;
  struct timeval timeout;

  bind_addr.sin_family = AF_INET;
  bind_addr.sin_port = dst_port;
  bind_addr.sin_addr.s_addr = INADDR_ANY;

  if((listen_sock = socket(PF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
  }

  if(bind(listen_sock, (struct sockaddr*)&bind_addr, sizeof(bind_addr)) < 0 ) {
    perror("bind");
    exit(1);
  }

#if 1
  send_raw_udp(src_port, dst_port, payload);
#else
  send_normal_udp(src_port, dst_port, payload);
#endif

  printf("calling select\n");
  FD_ZERO(&readfds);
  FD_SET(listen_sock, &readfds);
  if((ret = select(listen_sock+1, &readfds, NULL, NULL, NULL)) < 0) {
    perror("select");
    exit(1);
  }
  if(!FD_ISSET(listen_sock, &readfds)) {
    printf("hmmm, socket is not readable\n");
    exit(1);
  }
  printf("socket is readable\n");

  addr_len = sizeof(peer_addr);
  if(recvfrom(listen_sock, buf, sizeof(buf), 0,
	      (struct sockaddr*)&peer_addr, &addr_len) < 0) {
    perror("recvfrom");
    exit(1);
  }
  printf("recvfrom success: %s\n", buf);
  return 0;

}

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  3:01                                           ` Michael Clark
  2004-10-21  3:52                                             ` Michael Clark
@ 2004-10-21  4:10                                             ` H. Peter Anvin
  2004-10-21  5:06                                               ` Chris Friesen
  1 sibling, 1 reply; 191+ messages in thread
From: H. Peter Anvin @ 2004-10-21  4:10 UTC (permalink / raw)
  To: Michael Clark; +Cc: Chris Friesen, linux-kernel

Michael Clark wrote:
>>
>> Doing work twice can hardly be considered The Right Thing.
> 
> Optimisations that break documented interfaces and age old assumptions
> can hardly be considered The Right Thing :)

The whole point is that it doesn't break the *documented* interface.

	-hpa

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  4:10                                             ` H. Peter Anvin
@ 2004-10-21  5:06                                               ` Chris Friesen
  2004-10-21  5:11                                                 ` H. Peter Anvin
  0 siblings, 1 reply; 191+ messages in thread
From: Chris Friesen @ 2004-10-21  5:06 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Michael Clark, linux-kernel

H. Peter Anvin wrote:

> The whole point is that it doesn't break the *documented* interface.

In my view (and apparently others, as has been verified in current apps using 
blocking sockets), current behaviour *does* break the documented interface.

The man page for select says:

"Those  listed  in  readfds  will  be watched  to  see if characters become 
available for reading (more precisely, to see if a read will not block..."

If I'm the only one touching the socket, select returns with it readable, and I 
block when calling recvmsg, then by definition that behaviour does not match the 
documented interface.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  5:06                                               ` Chris Friesen
@ 2004-10-21  5:11                                                 ` H. Peter Anvin
  2004-10-21  5:50                                                   ` Chris Friesen
  0 siblings, 1 reply; 191+ messages in thread
From: H. Peter Anvin @ 2004-10-21  5:11 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Michael Clark, linux-kernel

Chris Friesen wrote:
> H. Peter Anvin wrote:
> 
>> The whole point is that it doesn't break the *documented* interface.
> 
> 
> In my view (and apparently others, as has been verified in current apps 
> using blocking sockets), current behaviour *does* break the documented 
> interface.
> 
> The man page for select says:
> 
> "Those  listed  in  readfds  will  be watched  to  see if characters 
> become available for reading (more precisely, to see if a read will not 
> block..."
> 
> If I'm the only one touching the socket, select returns with it 
> readable, and I block when calling recvmsg, then by definition that 
> behaviour does not match the documented interface.
> 

I'm talking about returning -1, EIO.

	-hpa

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  5:11                                                 ` H. Peter Anvin
@ 2004-10-21  5:50                                                   ` Chris Friesen
  2004-10-21  5:58                                                     ` H. Peter Anvin
  2004-10-21  6:14                                                     ` Michael Clark
  0 siblings, 2 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-21  5:50 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Michael Clark, linux-kernel

H. Peter Anvin wrote:
>> H. Peter Anvin wrote:
>>
>>> The whole point is that it doesn't break the *documented* interface.

> I'm talking about returning -1, EIO.


Ah.  By "it", I thought you meant the current performance optimizations, not the 
EIO.  My apologies.

I think returning EIO is suboptimal, as it is not an expected error value for 
recvmsg().  (It's not listed in the man pages for recvmsg() or ip.)  It would 
certainly work for new apps, but probably not for many existing binaries.

On the other hand, if you simply do the checksum verification at select() time 
for blocking sockets, then the existing binaries get exactly the behaviour they 
expect.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  5:50                                                   ` Chris Friesen
@ 2004-10-21  5:58                                                     ` H. Peter Anvin
  2004-10-21 15:18                                                       ` Chris Friesen
  2004-10-21  6:14                                                     ` Michael Clark
  1 sibling, 1 reply; 191+ messages in thread
From: H. Peter Anvin @ 2004-10-21  5:58 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Michael Clark, linux-kernel

Chris Friesen wrote:
> H. Peter Anvin wrote:
> 
>>> H. Peter Anvin wrote:
>>>
>>>> The whole point is that it doesn't break the *documented* interface.
> 
> 
>> I'm talking about returning -1, EIO.
> 
> 
> 
> Ah.  By "it", I thought you meant the current performance optimizations, 
> not the EIO.  My apologies.
> 
> I think returning EIO is suboptimal, as it is not an expected error 
> value for recvmsg().  (It's not listed in the man pages for recvmsg() or 
> ip.)  It would certainly work for new apps, but probably not for many 
> existing binaries.

POSIX specifies:

The recvmsg( ) function shall fail if:

[EAGAIN] or [EWOULDBLOCK] The socket's file descriptor is marked 
O_NONBLOCK and no data is waiting to be received; or MSG_OOB is set and 
no out-of-band data is available and either the socket's file descriptor
is marked O_NONBLOCK or the socket does not support blocking to await 
out-of-band data.

[EBADF] The socket argument is not a valid open file descriptor.

[ECONNRESET] A connection was forcibly closed by a peer.

[EINTR] This function was interrupted by a signal before any data was 
available.

[EINVAL] The sum of the iov_len values overflows a ssize_t, or the
MSG_OOB flag is set and no out-of-band data is available.

[EMSGSIZE] The msg_iovlen member of the msghdr structure pointed to by
message is less than or equal to 0, or is greater than {IOV_MAX}.

[ENOTCONN] A receive is attempted on a connection-mode socket that is 
not connected.

[ENOTSOCK] The socket argument does not refer to a socket.

[EOPNOTSUPP] The specified flags are not supported for this socket type.

[ETIMEDOUT] The connection timed out during connection establishment, or 
due to a transmission timeout on active connection.

The recvmsg( ) function may fail if:

[EIO] An I/O error occurred while reading from or writing to the file 
system.

[ENOBUFS] Insufficient resources were available in the system to perform 
the operation.

[ENOMEM] Insufficient memory was available to fulfill the request.

Since you didn't code to Linux, and didn't code to POSIX... what did you 
code to?
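
For concreteness, here is a minimal sketch of how an application might
sort the errno values listed above after a failed recvmsg() on a UDP
socket. The enum and function names are made up for illustration, and
EIO being returned for a discarded datagram is only the behaviour
proposed in this thread, not something the kernel is known to do.

```c
#include <errno.h>

/* Hypothetical action codes; these names are not from any real API. */
enum recv_action { RECV_RETRY_NOW, RECV_BACK_TO_SELECT, RECV_FATAL };

/*
 * Map the errno left by a failed recvmsg() to what an event loop
 * should do next.  Treating EIO as "datagram was discarded" is an
 * assumption taken from the proposal in this thread.
 */
enum recv_action classify_recv_errno(int err)
{
    if (err == EINTR)
        return RECV_RETRY_NOW;          /* interrupted before any data */
    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_BACK_TO_SELECT;     /* nothing queued after all */
    if (err == EIO)
        return RECV_BACK_TO_SELECT;     /* assumed: bad datagram dropped */
    return RECV_FATAL;                  /* EBADF, ENOTSOCK, ... */
}
```

An application that only ever expected the errors in the Linux man
page would fall into the RECV_FATAL branch on EIO, which is the
compatibility concern raised above.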

> On the other hand, if you simply do the checksum verification at 
> select() time for blocking sockets, then the existing binaries get 
> exactly the behaviour they expect.
> 

... unless the blocking changes.  In which case you either have to do 
work twice, or it might *never* happen.  Not to mention the extra code 
complexity.

The performance overhead of checksumming is substantial; I have seen 
some real horror examples of what happens when you do it badly.

	-hpa

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  5:50                                                   ` Chris Friesen
  2004-10-21  5:58                                                     ` H. Peter Anvin
@ 2004-10-21  6:14                                                     ` Michael Clark
  1 sibling, 0 replies; 191+ messages in thread
From: Michael Clark @ 2004-10-21  6:14 UTC (permalink / raw)
  To: Chris Friesen, H. Peter Anvin; +Cc: linux-kernel

On 10/21/04 13:50, Chris Friesen wrote:
> H. Peter Anvin wrote:
> 
>>> H. Peter Anvin wrote:
>>>
>>>> The whole point is that it doesn't break the *documented* interface.
> 
> 
>> I'm talking about returning -1, EIO.
> 
> 
> 
> Ah.  By "it", I thought you meant the current performance optimizations, 
> not the EIO.  My apologies.

Same.

> I think returning EIO is suboptimal, as it is not an expected error 
> value for recvmsg().  (It's not listed in the man pages for recvmsg() or 
> ip.)  It would certainly work for new apps, but probably not for many 
> existing binaries.

Yes, big likelihood of breaking apps, although it does give you the
deterministic behaviour of recvmsg not blocking after select.

> On the other hand, if you simply do the checksum verification at 
> select() time for blocking sockets, then the existing binaries get 
> exactly the behaviour they expect.

This would be the best for existing userspace apps, although I'd much
rather have EIO returned than the current behaviour if that was
the only choice (although the offshoot would be the need for patches
to quite a few apps).

~mc

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-21  5:58                                                     ` H. Peter Anvin
@ 2004-10-21 15:18                                                       ` Chris Friesen
  0 siblings, 0 replies; 191+ messages in thread
From: Chris Friesen @ 2004-10-21 15:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Michael Clark, linux-kernel

H. Peter Anvin wrote:

> POSIX specifies:

<useful stuff snipped>

> The recvmsg( ) function may fail if:
> 
> [EIO] An I/O error occurred while reading from or writing to the file 
> system.

<snipped>

> Since you didn't code to Linux, and didn't code to POSIX... what did you 
> code to?

I didn't code it--my code generally uses nonblocking sockets or doesn't use 
select at all.  I'm just commenting on existing apps.

What do you mean by "didn't code to Linux"?  The Linux man pages for recvmsg() 
and ip do not mention EIO.  Hence, I suspect that not many people coding for 
Linux will have handled it.  Furthermore, the Linux man page for select() says 
that a socket that is returned as readable will not block on a subsequent read.

>> On the other hand, if you simply do the checksum verification at 
>> select() time for blocking sockets, then the existing binaries get 
>> exactly the behaviour they expect.

> ... unless the blocking changes.  In which case you either have to do 
> work twice, or it might *never* happen.  Not to mention the extra code 
> complexity.

If you verify the checksum at select time, you could just flag the packet as 
verified.  Then even if you do a recvmsg() with MSG_DONTWAIT, you wouldn't have 
to verify it again.  It means an extra pass over the data compared to a full-on 
O_NONBLOCK socket, but it's still correct.
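
A toy userspace model of that idea, purely as a sketch: the select()
side verifies the head of the queue, drops bad packets, and marks the
first good one so the recvmsg() side does not checksum it again. Every
name below is hypothetical, the byte-sum checksum is a stand-in for the
real ones-complement sum, and none of this is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy datagram; hypothetical, not a kernel structure. */
struct toy_pkt {
    const uint8_t *data;
    size_t len;
    uint8_t csum;        /* checksum carried by the packet */
    int verified;        /* set once the checksum has been checked */
};

struct toy_queue {
    struct toy_pkt *pkts;
    size_t head, count;  /* pkts[head .. count-1] are still queued */
    unsigned passes;     /* number of checksum passes made so far */
};

/* Stand-in checksum: byte sum modulo 256. */
static uint8_t toy_sum(const struct toy_pkt *p)
{
    uint8_t s = 0;
    size_t i;

    for (i = 0; i < p->len; i++)
        s += p->data[i];
    return s;
}

/* Drop bad packets at the head; return 1 if a verified one is left. */
static int toy_head_ok(struct toy_queue *q)
{
    while (q->head < q->count) {
        struct toy_pkt *p = &q->pkts[q->head];

        if (!p->verified) {
            q->passes++;
            if (toy_sum(p) != p->csum) {
                q->head++;          /* discard the corrupt packet */
                continue;
            }
            p->verified = 1;        /* later callers skip the pass */
        }
        return 1;
    }
    return 0;
}

/* select() side: report readable only if a good packet is queued. */
int toy_select_readable(struct toy_queue *q)
{
    return toy_head_ok(q);
}

/* recvmsg() side: returns the packet, or NULL where it would block. */
const struct toy_pkt *toy_recv(struct toy_queue *q)
{
    return toy_head_ok(q) ? &q->pkts[q->head++] : NULL;
}
```

In this model a packet that select() has already vetted costs no
second checksum pass in toy_recv(), whichever blocking mode the caller
uses.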

If you change from nonblocking to blocking between select() and recvmsg(), then 
you have a problem, but you're still no worse off than the current situation.

The extra complexity is a valid point, but I suggest that the expectations of 
the installed base are more important.


> The performance overhead of checksumming is substantial; I have seen 
> some real horror examples of what happens when you do it badly.

Again, this is only a backwards compatibility thing.  All new apps should use 
nonblocking sockets anyways, right?  So this way, old apps don't suffer from 
single-packet-DOS attacks, at the cost of a performance penalty.

Chris

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-19  1:21 John Pearson
@ 2004-10-19 13:50 ` Colin Phipps
  0 siblings, 0 replies; 191+ messages in thread
From: Colin Phipps @ 2004-10-19 13:50 UTC (permalink / raw)
  To: linux-kernel

On Tue, Oct 19, 2004 at 10:51:03AM +0930, John Pearson wrote:
> As far as I can see:
> 
>   - YES, Linux select 'lies' and violates POSIX wrt checksums:
>     a call to recvmsg() might well have blocked when select()
>     said data was ready, as a result of a corrupt UDP packet;
> 
>   - NO, 'fixing' select() won't guarantee that recvmsg()
>     will not block/return EAGAIN, because select() only
>     guarantees that a call to recvmsg() would not have blocked
>     at that time - as others have observed, it cannot guarantee
>     that 'valid' data won't subsequently be discarded; any
>     subsequent call to recvmsg() is only 'immediate' in a fuzzy,
>     imprecise and inadequate sense.
> 
> Can we get back to arguing about something less repetitive
> (or at least, make the circle larger and more scenic)?

I would put a third point on the list; the behaviour makes the failure
case for a lot of broken apps much more likely, and easy to trigger
remotely.

In the interest of making things more "scenic", let's have a few more
broken apps:

glibc RPC - so portmap, statd, ...

It seems there's a common idiom in most of the broken programs.
Programmers assume that they can take a collection of library functions
that do blocking IO, and then multiplex them by sticking a select on the
front to choose when to call them. Given the wording of the POSIX
standard, it could be naively read to endorse this idiom - it says a
socket is readable if it won't block to read.  glibc RPC does this; the
underlying functions all assume blocking fds, and it then sticks a
select on the front. This occurs again in inetd, again in syslog, again
in net-snmp, and those are just the ones I see on my desktop machine.
You can easily patch the kernel to have it report them all (just
remember to disable syslog first, as it is one of the culprits).

Sure they are all broken, but now they are all exposed to bad UDP
checksums. Perhaps the people who benefit from the time saved by this
micro-optimisation would care to use the time saved to fix glibc?


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-19  1:21 John Pearson
  2004-10-19 13:50 ` Colin Phipps
  0 siblings, 1 reply; 191+ messages in thread
From: John Pearson @ 2004-10-19  1:21 UTC (permalink / raw)
  To: linux-kernel



This has probably been said before, but just in case I missed
it; there are many viewpoints in this thread, but the two main camps
appear to be arguing at cross-purposes.

As far as I can see:

  - YES, Linux select 'lies' and violates POSIX wrt checksums:
    a call to recvmsg() might well have blocked when select()
    said data was ready, as a result of a corrupt UDP packet;

  - NO, 'fixing' select() won't guarantee that recvmsg()
    will not block/return EAGAIN, because select() only
    guarantees that a call to recvmsg() would not have blocked
    at that time - as others have observed, it cannot guarantee
    that 'valid' data won't subsequently be discarded; any
    subsequent call to recvmsg() is only 'immediate' in a fuzzy,
    imprecise and inadequate sense.

Can we get back to arguing about something less repetitive
(or at least, make the circle larger and more scenic)?



John.


On Sun, Oct 17, 2004 at 07:22:44PM +0200, Lars Marowsky-Bree wrote
> On 2004-10-17T16:17:06, Buddy Lucas <buddy.lucas@gmail.com> wrote:
> 
> > > The SuV spec is actually quite detailed about the options here:
> > > 
> > >         A descriptor shall be considered ready for reading when a call
> > >         to an input function with O_NONBLOCK clear would not block,
> > >         whether or not the function would transfer data successfully.
> > >         (The function might return data, an end-of-file indication, or
> > >         an error other than one indicating that it is blocked, and in
> > >         each of these cases the descriptor shall be considered ready for
> > >         reading.)
> > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> 
> See, Buddy, the point here is that Linux _does_ violate the
> specification. You can try weaseling out of it, but it's not going to
> work.
> 
> This isn't per se the same as saying that it's not a sensible violation,
> but very clearly the specs disagree with the current Linux behaviour.
> 
> It's impossible to claim that you are allowed by the spec to block on a
> recvmsg directly following a successful select. You are not. You could
> claim that, but you'd be wrong.
> 
> If the packet has been dropped in between, which _could_ have happened
> because UDP is allowed to be dropped basically anywhere, EIO may be
> returned. But blocking or returning EAGAIN/EWOULDBLOCK is verboten. The
> spec is very clear on that.
> 
> (Now I'd claim that returning EIO after a successful select is also
> slightly suboptimal - the performance optimizations should be turned off
> for blocking sockets, IMHO, and the data which caused the select() to
> return should be considered committed - but it would be allowed.)
> 
> I'm not so sure what's so hard to accept about that. It may well be that
> Linux is following the de-facto industry standard (or even setting it)
> here, and I'd agree that if you don't want blocking use O_NONBLOCK, but
> in no way can Linux claim POSIX/SuV spec compliance for this behaviour.
> 
> I'm not getting why people argue so much to try and weasel the words so
> that it comes out as compliant. It's not. It may make sense due to
> practical reasons, but it's not compliant.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
> 
> -- 
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX AG - A Novell company
> 

-- 
Voice: +61 8 8202 9040
Email: jpearson@sa.pracom.com.au

Pracom Ltd
288 Glen Osmond Road
Fullarton, South Australia 5063

Ph: + 61 8 82029000
Fax: +61 8 82029001


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
       [not found] ` <fa.isqjio8.ok2coo@ifi.uio.no>
@ 2004-10-09 13:24   ` Bodo Eggert
  0 siblings, 0 replies; 191+ messages in thread
From: Bodo Eggert @ 2004-10-09 13:24 UTC (permalink / raw)
  To: linux-kernel

David S. Miller wrote:

> People, get the heck over this.  The kernel has behaved this way
> for more than 3 years both in 2.4.x and 2.6.x.  The code in question
> even exists in the 2.2.x sources as well.
> 
> Therefore, it would be totally pointless to change the behavior
> now since anyone writing an application wishing it to work on
> all existing Linux kernels needs to handle this case anyways.

If you want people to write workarounds for functions that intentionally
break applications depending on dysfunctional behaviour, you should
document it. You didn't, and therefore most applications will be broken:

google survey:
Results 1 - 10 of about 13,100 for udp select recv. (0.18 seconds)
Results 1 - 10 of about 636 for udp select recv o_nonblock. (0.16 seconds)

Results 1 - 10 of about 4,350 for select recv SOCK_DGRAM. (0.40 seconds)
Results 1 - 10 of about 685 for select recv SOCK_DGRAM o_nonblock. (0.39 seconds)


I think nobody complaining results from nobody sending bad UDP packets,
and nobody sending bad UDP packets resulted from nobody complaining.


BTW: If you're breaking select() for blocking sockets, you can as well
return -EBROKEN. It's as close to the specification as waiting after
guaranteeing not to wait, but it will not result in hidden flaws.
-- 
Fun things to slip into your budget
Request for 'supermodel access' to the UNIX server.
        Just don't tell the PHB why your home directory is named 'jpgs.'

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 18:16 linux
@ 2004-10-09 12:07 ` Colin Phipps
  0 siblings, 0 replies; 191+ messages in thread
From: Colin Phipps @ 2004-10-09 12:07 UTC (permalink / raw)
  To: linux-kernel

So the performance gain is significant. And programs that break were
buggy anyway. But that still leaves the question of whether it benefits
users, given that there is a lot of software, buggy by this
interpretation, that can break. In particular, exposing UDP daemons to
denial of service using bad-checksum UDP packets looks like a rather
interesting security issue.

I have just tried syslog and inetd on a couple of machines running
2.6.8.1, and both hang when given a single bad-checksum udp packet.
hping2 -2 -c 1 -b is the tool of choice.  Sure, they could have broken
anyway, but this makes them easy targets - and presumably they are the
tip of the iceberg.

-- 
Colin Phipps <cph@cph.demon.co.uk>

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-07 18:16 linux
  2004-10-09 12:07 ` Colin Phipps
  0 siblings, 1 reply; 191+ messages in thread
From: linux @ 2004-10-07 18:16 UTC (permalink / raw)
  To: linux-kernel, mark

> There is one claim I'd like to question - the claim that select()
> would be slowed down unnecessarily, even if the behaviour was changed
> for both O_NONBLOCK enabled. Isn't it more expensive to allow the
> application to be woken up, and poll using read(), than to just do a
> quick check in the kernel and not tell the application there is data,
> when there really isn't?

It depends on how often the second check fails.

The 99.9+% case is that the checksum is good, and in that case, you have
to pay the wakeup cost anyway.  (So sending bad-checksum packets isn't
even a useful DoS; good-checksum packets still steal more cycles.)

You only get the benefit of saving the wakeup if the check fails.
But every time it passes, you get the benefit of making the check cheaper.
You have to multiply by the relative probabilities to see the net effect.

For something like UDP checksums on packets which have already passed the
ethernet checksum, that's a pretty overwhelming ratio.
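
A rough way to put numbers on that trade-off, as a sketch only: the
cost model, the parameter names and any values plugged in are
assumptions for illustration, not measurements.

```c
/*
 * Expected per-packet cost of the two strategies, in arbitrary units.
 * The model and all field names are assumptions:
 *   wake   cost of waking the application plus its extra syscall
 *   fused  cost of the combined copy-and-checksum pass in recvmsg()
 *   csum   cost of a standalone checksum pass done before the wakeup
 *   copy   cost of a plain copy to user space
 */
struct costs { double wake, fused, csum, copy; };

/* Check in recvmsg(): every packet pays a wakeup and one fused pass. */
double cost_deferred(const struct costs *c)
{
    return c->wake + c->fused;
}

/* Check before wakeup: all pay csum; only good packets wake and copy. */
double cost_eager(const struct costs *c, double p_bad)
{
    return c->csum + (1.0 - p_bad) * (c->wake + c->copy);
}

/* Bad-packet rate above which checking before the wakeup is cheaper. */
double break_even_p_bad(const struct costs *c)
{
    return (c->csum + c->copy - c->fused) / (c->wake + c->copy);
}
```

With made-up unit costs of wake=20, fused=5, csum=4 and copy=4, the
break-even point is a bad-checksum rate of 1 in 8, far above anything
expected on a link that already has an ethernet CRC.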

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 10:07       ` Adam Heath
@ 2004-10-07 11:29         ` Martijn Sipkema
  0 siblings, 0 replies; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 11:29 UTC (permalink / raw)
  To: Adam Heath; +Cc: Linux Kernel Mailing List

From: "Adam Heath" <doogie@debian.org>
Sent: Thursday, October 07, 2004 11:07


> On Thu, 7 Oct 2004, Martijn Sipkema wrote:
> 
> > > > It does not matter - this behaviour should not be depended upon. There are
> > > > lots of other reasons why a packet might in fact not be available, kernels
> > > > are allowed to drop UDP packets at will.
> > >
> > > I've been lurking and reading this thread with great interest.  I had been
> > > leaning towards thinking the kernel was wrong, until I read this email.
> > >
> > > This is a very excellent point.
> >
> > No, it isn't. If the kernel drops a UDP packet, select() should not return
> > indicating available data.
> 
> The kernel can drop a packet after select() returns, and before read() is
> called.  That's the whole point of *U*DP.

I don't think that is the point of UDP and I don't think the kernel should
do that, but there are two options for handling this:

If recvmsg() blocks until valid data is available then so should select().
If recvmsg() returns an error on invalid data then select() would indicate the
socket() as ready without knowing if the data was valid.


--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  8:28   ` Adam Heath
@ 2004-10-07 10:38     ` Martijn Sipkema
  2004-10-07 10:07       ` Adam Heath
  0 siblings, 1 reply; 191+ messages in thread
From: Martijn Sipkema @ 2004-10-07 10:38 UTC (permalink / raw)
  To: Adam Heath; +Cc: Linux Kernel Mailing List

From: "Adam Heath" <doogie@debian.org>
Sent: Thursday, October 07, 2004 09:28
> On Thu, 7 Oct 2004, bert hubert wrote:
> 
> > On Wed, Oct 06, 2004 at 09:50:10PM -0700, Dan Kegel wrote:
> >
> > > It would be nice to know how other operating systems behave
> > > when receiving UDP packets with bad checksums.  Can someone
> > > try BSD and Solaris?
> >
> > It does not matter - this behaviour should not be depended upon. There are
> > lots of other reasons why a packet might in fact not be available, kernels
> > are allowed to drop UDP packets at will.
> 
> I've been lurking and reading this thread with great interest.  I had been
> leaning towards thinking the kernel was wrong, until I read this email.
> 
> This is a very excellent point.

No, it isn't. If the kernel drops a UDP packet, select() should not return
indicating available data.


--ms


^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07 10:38     ` Martijn Sipkema
@ 2004-10-07 10:07       ` Adam Heath
  2004-10-07 11:29         ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Adam Heath @ 2004-10-07 10:07 UTC (permalink / raw)
  To: Martijn Sipkema; +Cc: Linux Kernel Mailing List

On Thu, 7 Oct 2004, Martijn Sipkema wrote:

> > > It does not matter - this behaviour should not be depended upon. There are
> > > lots of other reasons why a packet might in fact not be available, kernels
> > > are allowed to drop UDP packets at will.
> >
> > I've been lurking and reading this thread with great interest.  I had been
> > leaning towards thinking the kernel was wrong, until I read this email.
> >
> > This is a very excellent point.
>
> No, it isn't. If the kernel drops a UDP packet, select() should not return
> indicating available data.

The kernel can drop a packet after select() returns, and before read() is
called.  That's the whole point of *U*DP.

^ permalink raw reply	[flat|nested] 191+ messages in thread

* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  8:04 ` bert hubert
@ 2004-10-07  8:28   ` Adam Heath
  2004-10-07 10:38     ` Martijn Sipkema
  0 siblings, 1 reply; 191+ messages in thread
From: Adam Heath @ 2004-10-07  8:28 UTC (permalink / raw)
  Cc: Linux Kernel Mailing List

On Thu, 7 Oct 2004, bert hubert wrote:

> On Wed, Oct 06, 2004 at 09:50:10PM -0700, Dan Kegel wrote:
>
> > It would be nice to know how other operating systems behave
> > when receiving UDP packets with bad checksums.  Can someone
> > try BSD and Solaris?
>
> It does not matter - this behaviour should not be depended upon. There are
> lots of other reasons why a packet might in fact not be available, kernels
> are allowed to drop UDP packets at will.

I've been lurking and reading this thread with great interest.  I had been
leaning towards thinking the kernel was wrong, until I read this email.

This is a very excellent point.


* Re: UDP recvmsg blocks after select(), 2.6 bug?
  2004-10-07  4:50 Dan Kegel
@ 2004-10-07  8:04 ` bert hubert
  2004-10-07  8:28   ` Adam Heath
  0 siblings, 1 reply; 191+ messages in thread
From: bert hubert @ 2004-10-07  8:04 UTC (permalink / raw)
  To: Dan Kegel; +Cc: Linux Kernel Mailing List, joris

On Wed, Oct 06, 2004 at 09:50:10PM -0700, Dan Kegel wrote:

> It would be nice to know how other operating systems behave
> when receiving UDP packets with bad checksums.  Can someone
> try BSD and Solaris?

It does not matter - this behaviour should not be depended upon. There are
lots of other reasons why a packet might in fact not be available, kernels
are allowed to drop UDP packets at will.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-07  4:50 Dan Kegel
  2004-10-07  8:04 ` bert hubert
  0 siblings, 1 reply; 191+ messages in thread
From: Dan Kegel @ 2004-10-07  4:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List, joris

Joris wrote:
> On Wed, 6 Oct 2004, Andries Brouwer wrote:
>> Does this really happen?
> 
> Yes. Finally got my raw-udp-with-wrong-checksum sending program to work
> over localhost and it hangs recvfrom pretty good.
> 
>> All kernel versions?
> 
> Quick guess: probably since late 2.4. Source of 2.4.27 udp.c is similar to
> 2.6.9, but 2.4.17 returns EAGAIN even for blocking sockets, apparently
> this was "fixed" later on.

It would be nice to know how other operating systems behave
when receiving UDP packets with bad checksums.  Can someone
try BSD and Solaris?

- Dan

-- 
My technical stuff: http://kegel.com
My politics: see http://www.misleader.org for examples of why I'm for regime change


* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-06 15:30 Dan Kegel
  0 siblings, 0 replies; 191+ messages in thread
From: Dan Kegel @ 2004-10-06 15:30 UTC (permalink / raw)
  To: Linux Kernel Mailing List, davem

David S. Miller wrote:
> Incorrect UDP checksums could cause the read data to
> be discarded.  We do the copy into userspace and checksum
> computation in parallel.  This is totally legal and we've
> been doing it since 2.4.x first got released.

Might there be a similar effect for packets with bad IP or TCP checksums?
(http://citeseer.ist.psu.edu/stone00when.html)

And as Bert says, Stevens mentions that with TCP accepts,
the other side might close before you call accept.

BTW this is why I insisted in JSR-51 that Java's NIO not allow
the use of Selector with blocking sockets:

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/SelectableChannel.html
"A channel must be placed into non-blocking mode before being
registered with a selector, and may not be returned to blocking mode until it has been deregistered."

So at least Java programs that use NIO are immune to this
particular user error (unless your implementation of NIO
relaxes this sanity check, tsk).
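
The same rule can be followed by hand outside Java. Below is a rough Python sketch, assuming the standard selectors module. The names make_nonblocking_udp and poll_once are hypothetical. The socket is switched to non-blocking mode before it is registered, and every receive is prepared for a "would block" result even after the selector has reported the socket readable:

```python
import selectors
import socket


def make_nonblocking_udp(addr):
    """Create a bound UDP socket that is already in non-blocking mode."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(addr)
    sock.setblocking(False)  # do this before registering with a selector
    return sock


def poll_once(selector, timeout, bufsize=65535):
    """Return the datagrams readable right now; recvfrom() never blocks."""
    received = []
    for key, _events in selector.select(timeout):
        sock = key.fileobj
        while True:
            try:
                received.append(sock.recvfrom(bufsize))
            except BlockingIOError:
                # Queue drained, or the readiness report was stale.
                break
    return received
```

Typical use is to register the socket once with selectors.EVENT_READ and call poll_once() from the main loop; an empty list simply means there was nothing to read this time.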

- Dan

-- 
My technical stuff: http://kegel.com
My politics: see http://www.misleader.org for examples of why I'm for regime change

