linux-kernel.vger.kernel.org archive mirror
* Does Linux select() violate POSIX?
@ 2011-06-18 17:06 Nemo Publius
  2011-06-18 17:43 ` Eric Dumazet
  2011-06-22 18:20 ` Chris Friesen
  0 siblings, 2 replies; 10+ messages in thread
From: Nemo Publius @ 2011-06-18 17:06 UTC (permalink / raw)
  To: linux-kernel

Suppose I have a file descriptor referencing a TCP/IP socket in blocking mode.

Suppose select() reports that the descriptor is ready for reading.

If I then call recv() on that descriptor, can it _ever_ block?


According to the Linux select man page
(http://linux.die.net/man/2/select), the answer is yes:

"Under Linux, select() may report a socket file descriptor as "ready
for reading", while nevertheless a subsequent read blocks. This could
for example happen when data has arrived but upon examination has
wrong checksum and is discarded. There may be other circumstances in
which a file descriptor is spuriously reported as ready."


According to the POSIX specification for select
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html),
the answer is no:

"A descriptor shall be considered ready for reading when a call to an
input function with O_NONBLOCK clear would not block, whether or not
the function would transfer data successfully. (The function might
return data, an end-of-file indication, or an error other than one
indicating that it is blocked, and in each of these cases the
descriptor shall be considered ready for reading.)"


There are only three possibilities:

1) I am mis-reading the POSIX spec.
2) The Linux select() man page is wrong.
3) Linux select() violates the POSIX spec.

So, which is it?

Thanks!


* Re: Does Linux select() violate POSIX?
  2011-06-18 17:06 Does Linux select() violate POSIX? Nemo Publius
@ 2011-06-18 17:43 ` Eric Dumazet
  2011-06-18 18:22   ` Nemo Publius
  2011-06-22 18:20 ` Chris Friesen
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2011-06-18 17:43 UTC (permalink / raw)
  To: Nemo Publius; +Cc: linux-kernel

On Saturday, 18 June 2011 at 10:06 -0700, Nemo Publius wrote:
> Suppose I have a file descriptor referencing a TCP/IP socket in blocking mode.
> 
> Suppose select() reports that the descriptor is ready for reading.
> 
> If I then call recv() on that descriptor, can it _ever_ block?
> 
> 
> According to the Linux select man page
> (http://linux.die.net/man/2/select), the answer is yes:
> 
> "Under Linux, select() may report a socket file descriptor as "ready
> for reading", while nevertheless a subsequent read blocks. This could
> for example happen when data has arrived but upon examination has
> wrong checksum and is discarded. There may be other circumstances in
> which a file descriptor is spuriously reported as ready."
> 

Only UDP can currently do that (not TCP), and only if the NIC has not
already verified the checksum.

So the answer to your question is no.

> 
> According to the POSIX specification for select
> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html),
> the answer is no:
> 
> "A descriptor shall be considered ready for reading when a call to an
> input function with O_NONBLOCK clear would not block, whether or not
> the function would transfer data successfully. (The function might
> return data, an end-of-file indication, or an error other than one
> indicating that it is blocked, and in each of these cases the
> descriptor shall be considered ready for reading.)"
> 
> 
> There are only three possibilities:
> 
> 1) I am mis-reading the POSIX spec.
> 2) The Linux select() man page is wrong.
> 3) Linux select() violates the POSIX spec.

We don't care, since every sane application using select() should also
use sockets in non-blocking mode.

Between the time select()/poll() says 'OK, you can go' and the time you
enter the kernel, conditions might have changed. For example, maybe kernel
memory is not available and a send() would _block_, even if the socket
queue is empty.





* Re: Does Linux select() violate POSIX?
  2011-06-18 17:43 ` Eric Dumazet
@ 2011-06-18 18:22   ` Nemo Publius
  2011-06-18 18:33     ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: Nemo Publius @ 2011-06-18 18:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

First, thank you for your reply.

On Sat, Jun 18, 2011 at 10:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Saturday, 18 June 2011 at 10:06 -0700, Nemo Publius wrote:
>> Suppose I have a file descriptor referencing a TCP/IP socket in blocking mode.
>>
>> Suppose select() reports that the descriptor is ready for reading.
>>
>> If I then call recv() on that descriptor, can it _ever_ block?
>>
>>
>> According to the Linux select man page
>> (http://linux.die.net/man/2/select), the answer is yes:
>> ...
>
> Only UDP can currently do that (not TCP), and only if the NIC has not
> already verified the checksum.
>
> So the answer to your question is no.

You mean the answer happens to be "no" for reading a TCP socket with
the current Linux implementation, but that is not considered part of
the interface, so the behavior could change in the future and so I
cannot depend on it?

>> According to the POSIX specification for select
>> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html),
>> the answer is no:
>> ...
>
>
> We don't care, since every sane application using select() should also
> use sockets in non-blocking mode.

This is simply not true for any POSIX-compliant operating system.
Which in this case happens to include every Unix ever written since
the beginning of time, apart from Linux.

Put another way...  The whole point of the POSIX spec is to allow me
to write portable code.  If every random Unix implementation makes up
its own mind about what is "sane" and violates the spec in arbitrary
and unpredictable ways, what is the point of having a spec?

> Between the time select()/poll() says 'OK, you can go' and the time you
> enter the kernel, conditions might have changed. For example, maybe kernel
> memory is not available and a send() would _block_, even if the socket
> queue is empty.

Sounds like a kernel bug.


* Re: Does Linux select() violate POSIX?
  2011-06-18 18:22   ` Nemo Publius
@ 2011-06-18 18:33     ` Alan Cox
  2011-06-18 18:51       ` Nemo Publius
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2011-06-18 18:33 UTC (permalink / raw)
  To: Nemo Publius; +Cc: Eric Dumazet, linux-kernel

> > We don't care, since every sane application using select() should also
> > use sockets in non-blocking mode.
> 
> This is simply not true for any POSIX-compliant operating system.
> Which in this case happens to include every Unix ever written since
> the beginning of time, apart from Linux.

Actually no - there are lots of device cases where instantaneously it is
true that a read would not block but the condition then changes again.

An obvious simple example beyond that is a socket with two readers.
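The two-reader case can be replayed deterministically inside one process. This sketch (the function name is hypothetical; `MSG_DONTWAIT` is a Linux/BSD extension rather than POSIX) uses a socketpair and a dup()ed descriptor to stand in for the two readers: both are told the socket is readable, the first takes the only byte, and the second finds nothing, so a plain blocking recv() there would block.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Replays one interleaving of two readers sharing a socket: both poll()
 * and are told "readable", reader A consumes the only byte, then reader B
 * calls recv(). Returns 1 if B's recv() would have blocked (observed as
 * EAGAIN because MSG_DONTWAIT is used), 0 otherwise, -1 on setup error. */
int second_reader_would_block(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int reader_b = dup(sv[0]);  /* second descriptor for the same socket */
    if (reader_b < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    int result = -1;
    char c = 'x';
    if (send(sv[1], &c, 1, 0) == 1) {
        struct pollfd pa = { .fd = sv[0], .events = POLLIN };
        struct pollfd pb = { .fd = reader_b, .events = POLLIN };
        /* Both readers are told the socket is ready... */
        if (poll(&pa, 1, 1000) == 1 && poll(&pb, 1, 1000) == 1) {
            recv(sv[0], &c, 1, 0);                            /* A wins */
            ssize_t n = recv(reader_b, &c, 1, MSG_DONTWAIT);  /* B is late */
            result = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
        }
    }
    close(reader_b);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```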

> Put another way...  The whole point of the POSIX spec is to allow me
> to write portable code.  If every random Unix implementation makes up
> its own mind about what is "sane" and violates the spec in arbitrary
> and unpredictable ways, what is the point of having a spec?

Linux follows Posix generally, but nobody writes portable code that does
blocking reads on a poll/select interface because there are a bazillion
ways it can then block - events read by other tasks, discards due to
memory exhaustion, events that are cleared the other end, etc.

> > Between the time select()/poll() says 'OK, you can go' and the time you
> > enter the kernel, conditions might have changed. For example, maybe kernel
> > memory is not available and a send() would _block_, even if the socket
> > queue is empty.
> 
> Sounds like a kernel bug.

It's a design decision and a huge performance win. It's one of the areas
where POSIX read in its strictest form cripples your performance.

Alan


* Re: Does Linux select() violate POSIX?
  2011-06-18 18:33     ` Alan Cox
@ 2011-06-18 18:51       ` Nemo Publius
  2011-06-19 14:41         ` Bernd Petrovitsch
  0 siblings, 1 reply; 10+ messages in thread
From: Nemo Publius @ 2011-06-18 18:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: Eric Dumazet, linux-kernel

On Sat, Jun 18, 2011 at 11:33 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

>> This is simply not true for any POSIX-compliant operating system.
>> Which in this case happens to include every Unix ever written since
>> the beginning of time, apart from Linux.
>
> Actually no - there are lots of device cases where instantaneously it is
> true that a read would not block but the condition then changes again.

Well, not to be contentious, but...  Can you identify any Unix other
than Linux where this is allowed to happen?  I am pretty sure BSD (for
example) takes pains to avoid it.

> An obvious simple example beyond that is a socket with two readers.

With any "test something, then assume result of test" sequence,
obviously I can have race conditions with multiple processes or
threads.  I mean, had I asked, "I call write() and then lseek() to
where I started and then read() on a file; am I guaranteed to read
back what I wrote?"  And you said no, because some other process could
write in the meantime...  I would say that is technically true but not
at all what I was asking.

This is the same thing.  Of course I am talking about select()
followed by recv() without any intervening user-space operations on
the descriptor.

> Linux follows Posix generally, but nobody writes portable code that does
> blocking reads on a poll/select interface because there are a bazillion
> ways it can then block - events read by other tasks, discards due to
> memory exhaustion, events that are cleared the other end, etc.

Only if I wrote my program that way...  Or if I am running on Linux.

> It's a design decision and a huge performance win. It's one of the areas
> where POSIX read in its strictest form cripples your performance.

A fair answer.  :-)

So in short, Linux deliberately chooses non-compliance here because
(a) it is a "huge performance win" and (b) there is an easy
work-around that (c) you usually want to be doing anyway.  That
answers my question.

Thank you for taking the time to reply, Alan.  I was hoping for an
"authoritative" response, and yours certainly qualifies.


* Re: Does Linux select() violate POSIX?
  2011-06-18 18:51       ` Nemo Publius
@ 2011-06-19 14:41         ` Bernd Petrovitsch
  2011-06-19 22:21           ` Nemo Publius
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Petrovitsch @ 2011-06-19 14:41 UTC (permalink / raw)
  To: Nemo Publius; +Cc: Alan Cox, Eric Dumazet, linux-kernel

On Sat, 2011-06-18 at 11:51 -0700, Nemo Publius wrote:
[...] 
> With any "test something, then assume result of test" sequence,
> obviously I can have race conditions with multiple processes or
> threads.  I mean, had I asked, "I call write() and then lseek() to

ACK.

> where I started and then read() on a file; am I guaranteed to read
> back what I wrote?"  And you said no, because some other process could
> write in the meantime...  I would say that is technically true but not
> at all what I was asking.

Then you should reformulate your question because the answer is
technically correct.
If the (technically correct!) answer does not help you, you asked the
wrong question.
It's as simple as that.

Kind regards,
Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                     LUGA : http://www.luga.at



* Re: Does Linux select() violate POSIX?
  2011-06-19 14:41         ` Bernd Petrovitsch
@ 2011-06-19 22:21           ` Nemo Publius
  2011-06-19 22:32             ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: Nemo Publius @ 2011-06-19 22:21 UTC (permalink / raw)
  To: Bernd Petrovitsch; +Cc: Alan Cox, Eric Dumazet, linux-kernel

On Sun, Jun 19, 2011 at 7:41 AM, Bernd Petrovitsch
<bernd@petrovitsch.priv.at> wrote:
> On Sat, 2011-06-18 at 11:51 -0700, Nemo Publius wrote:
>
> Then you should reformulate your question because the answer is
> technically correct.

First of all, had you bothered to read the very next sentence in the
Email to which you felt the need to reply, you would find I did
precisely that:

    "Of course I am talking about select() followed by recv() without
any intervening user-space operations on the descriptor."

Second, you are wrong.  I basically asked, "Is select() followed by
recv() guaranteed not to block?"

Possible answers include:

"No; your computer might crash."

"No; your kernel image might be corrupt."

"No; space aliens might destroy the earth."

"No; some other process might access the descriptor in the meantime."

All of these are "technically" correct.  All of them are also
completely useless.  You -- and everyone else who read my question --
know _exactly_ what I was asking.

And I got my answer, which is yes, Linux select() violates POSIX, and
that decision is deliberate.

But again, thank you so much for your valuable contribution to the discussion.


* Re: Does Linux select() violate POSIX?
  2011-06-19 22:21           ` Nemo Publius
@ 2011-06-19 22:32             ` Alan Cox
  2011-06-19 22:45               ` Nemo Publius
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2011-06-19 22:32 UTC (permalink / raw)
  To: Nemo Publius; +Cc: Bernd Petrovitsch, Eric Dumazet, linux-kernel

> And I got my answer, which is yes, Linux select() violates POSIX, and
> that decision is deliberate.
> 
> But again, thank you so much for your valuable contribution to the discussion.

It's worth noting that the POSIX semantics are actually unimplementable
for some network protocols anyway, particularly on send. TCP is a fine
example. A remote TCP isn't *supposed* to shrink its window, but it can,
and at that point the space select() saw for a send is closed down again
by the remote host.

All sorts of similar issues appear all over the place. There are also
interesting API corner cases such as the behaviour of

	listen()
	select
			connection made
	select returns
			remote closes connection
	accept
			behaviour is not determinate

(and in general POSIX doesn't address sockets well)

So for portable code, always mix select and poll with non-blocking I/O. It
doesn't matter what the specs say; the real world says drive defensively
8)

Alan


* Re: Does Linux select() violate POSIX?
  2011-06-19 22:32             ` Alan Cox
@ 2011-06-19 22:45               ` Nemo Publius
  0 siblings, 0 replies; 10+ messages in thread
From: Nemo Publius @ 2011-06-19 22:45 UTC (permalink / raw)
  To: Alan Cox; +Cc: Bernd Petrovitsch, Eric Dumazet, linux-kernel

On Sun, Jun 19, 2011 at 3:32 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> It's worth noting that the POSIX semantics are actually unimplementable
> for some network protocols anyway, particularly on send. TCP is a fine
> example. A remote TCP isn't *supposed* to shrink its window, but it can,
> and at that point the space select() saw for a send is closed down again
> by the remote host.

Which makes me wonder what *BSD does for such a situation.  Although
not enough to check the source.  :-)

> All sorts of similar issues appear all over the place. There are also
> interesting API corner cases such as the behaviour of
>
>        listen()
>        select
>                        connection made
>        select returns
>                        remote closes connection
>        accept
>                        behaviour is not determinate

Hm, I thought this was what ECONNABORTED was for?

That is, accept() might return ECONNABORTED, or it might return a
descriptor and then a later operation on that descriptor would fail
with ECONNRESET...  But either way, select() followed by accept() need
not block.

> (and in general POSIX doesn't address sockets well)

Well, no argument there.

> So for portable code, always mix select and poll with non-blocking I/O. It
> doesn't matter what the specs say; the real world says drive defensively
> 8)

No argument here, either.  This was mostly for a barroom bet (well,
StackOverflow...  same thing), but also because I was curious.  There
are not a lot of ways in which Linux chooses to violate POSIX.  Which
might make a fun list to put together, come to think of it.

Thanks again, Alan.


* Re: Does Linux select() violate POSIX?
  2011-06-18 17:06 Does Linux select() violate POSIX? Nemo Publius
  2011-06-18 17:43 ` Eric Dumazet
@ 2011-06-22 18:20 ` Chris Friesen
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Friesen @ 2011-06-22 18:20 UTC (permalink / raw)
  To: Nemo Publius; +Cc: linux-kernel

On 06/18/2011 11:06 AM, Nemo Publius wrote:
> Suppose I have a file descriptor referencing a TCP/IP socket in blocking mode.
>
> Suppose select() reports that the descriptor is ready for reading.
>
> If I then call recv() on that descriptor, can it _ever_ block?

There was a long discussion about this back in 2004.

http://lkml.org/lkml/2004/10/6/117

Based on that discussion and the need to deal with legacy apps, 
udp_poll() has special-case code to handle blocking sockets--it 
validates the checksum before declaring the socket readable.  This costs 
some performance, so for non-blocking sockets the checksum validation is 
deferred until later when it will be hot in the cache due to the copy to 
userspace.

Other protocols may not handle this and so the warning is still valid in 
general.
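Given that description, a non-blocking UDP socket can be reported readable and then yield nothing, because checksum validation is deferred until the copy to userspace. A reader can simply drain whatever is queued and treat an early EAGAIN as normal. A minimal sketch for datagram sockets (the function and handler names are hypothetical; `MSG_DONTWAIT` is a Linux/BSD extension, so a portable version would set O_NONBLOCK on the socket instead):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type for handing each datagram to the application. */
typedef void (*datagram_handler)(const char *data, size_t len, void *ctx);

/* Read every datagram currently queued on a *datagram* socket without
 * blocking, passing each one to handle (if non-NULL). Returns how many
 * were read; 0 is a normal answer, since the datagram that made the
 * socket "readable" may have been discarded (e.g. bad checksum) before
 * recv() got to it. Returns -1 on a real error. Not for stream sockets:
 * there a 0-byte read means EOF and this loop would never stop. */
int drain_datagrams(int fd, char *buf, size_t cap,
                    datagram_handler handle, void *ctx)
{
    int count = 0;
    for (;;) {
        /* MSG_DONTWAIT makes only this call non-blocking. */
        ssize_t n = recv(fd, buf, cap, MSG_DONTWAIT);
        if (n >= 0) {
            if (handle)
                handle(buf, (size_t)n, ctx);
            count++;
        } else if (errno == EINTR) {
            continue;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return count;   /* queue is empty: go back to poll()/select() */
        } else {
            return -1;
        }
    }
}
```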

Chris


-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com



Thread overview: 10+ messages
2011-06-18 17:06 Does Linux select() violate POSIX? Nemo Publius
2011-06-18 17:43 ` Eric Dumazet
2011-06-18 18:22   ` Nemo Publius
2011-06-18 18:33     ` Alan Cox
2011-06-18 18:51       ` Nemo Publius
2011-06-19 14:41         ` Bernd Petrovitsch
2011-06-19 22:21           ` Nemo Publius
2011-06-19 22:32             ` Alan Cox
2011-06-19 22:45               ` Nemo Publius
2011-06-22 18:20 ` Chris Friesen
