All of lore.kernel.org
 help / color / mirror / Atom feed
* How do I avoid recvmsg races with IP_RECVERR?
@ 2015-06-02 19:40 Andy Lutomirski
  2015-06-02 21:17 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2015-06-02 19:40 UTC (permalink / raw)
  To: Network Development

As far as I can tell, enabling IP_RECVERR causes the presence of a
queued error to cause recvmsg, etc to return an error (once).  It's
worse, though: a new error can be queued asynchronously at any time,
this setting sk_err to a nonzero value.  How do I sensibly distinguish
recvmsg failures to to genuine errors receiving messages from recvmsg
failures because there's a queued error?

The only way I can see to get reliable error handling is to literally
call recvmsg in a loop:

while (true /* or while POLLIN is set */) {
  int ret = recvmsg(..., MSG_ERRQUEUE not set);
  if (ret < 0 && /* what goes here? */) {
    whoops!  this might be a harmless asynchronous error!
    take no action!
  }

  /* if POLLERR (or maybe unconditionally), recvmsg(..., MSG_ERRQUEUE);
}

The problem is that, if I'm screwing something up (thus causing EINVAL
or something similar), this will just spin forever.

Am I missing something here?  Would it make sense to add
MSG_IGNORE_ERROR to suppress the sock_error check or IP_RECVERR=2 to
stop setting sk_err?


Thanks,
Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 19:40 How do I avoid recvmsg races with IP_RECVERR? Andy Lutomirski
@ 2015-06-02 21:17 ` Hannes Frederic Sowa
  2015-06-02 21:33   ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Frederic Sowa @ 2015-06-02 21:17 UTC (permalink / raw)
  To: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
> As far as I can tell, enabling IP_RECVERR causes the presence of a
> queued error to cause recvmsg, etc to return an error (once).  It's
> worse, though: a new error can be queued asynchronously at any time,
> this setting sk_err to a nonzero value.  How do I sensibly distinguish
> recvmsg failures to to genuine errors receiving messages from recvmsg
> failures because there's a queued error?
> 
> The only way I can see to get reliable error handling is to literally
> call recvmsg in a loop:
> 
> while (true /* or while POLLIN is set */) {
>   int ret = recvmsg(..., MSG_ERRQUEUE not set);
>   if (ret < 0 && /* what goes here? */) {
>     whoops!  this might be a harmless asynchronous error!
>     take no action!
>   }

I see either two possibilities:

We export the icmp_err_convert tables along with the udp_lib_err error
conversions to user space and spice them up with flags to mark if they
are transient (icmp_err_convert already has a fatal flag).

Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
sock_extended_err, no?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 21:17 ` Hannes Frederic Sowa
@ 2015-06-02 21:33   ` Andy Lutomirski
  2015-06-02 21:42     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2015-06-02 21:33 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
>> As far as I can tell, enabling IP_RECVERR causes the presence of a
>> queued error to cause recvmsg, etc to return an error (once).  It's
>> worse, though: a new error can be queued asynchronously at any time,
>> this setting sk_err to a nonzero value.  How do I sensibly distinguish
>> recvmsg failures to to genuine errors receiving messages from recvmsg
>> failures because there's a queued error?
>>
>> The only way I can see to get reliable error handling is to literally
>> call recvmsg in a loop:
>>
>> while (true /* or while POLLIN is set */) {
>>   int ret = recvmsg(..., MSG_ERRQUEUE not set);
>>   if (ret < 0 && /* what goes here? */) {
>>     whoops!  this might be a harmless asynchronous error!
>>     take no action!
>>   }
>
> I see either two possibilities:
>
> We export the icmp_err_convert tables along with the udp_lib_err error
> conversions to user space and spice them up with flags to mark if they
> are transient (icmp_err_convert already has a fatal flag).

This seems overcomplicated.  I'd rather have a flag I pass to tell the
kernel that I don't want to see transient errors (nd that I'll clear
them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR.

>
> Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
> you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
> sock_extended_err, no?

I do this already, which makes me think that there's a bug or another
race somewhere.  I've only seen a failure once in several years of
operation.

The failure happened on a ping socket.  I suspect that the race is:

ping_err: ip_icmp_error(...);

user: recvmsg(MSG_ERRQUEUE) and dequeues the error.

ping_err: sk_err = err;

user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
error via sock_error.

user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.

Now the user code thinks that it was a real (non-transient) error and aborts.

Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?

Even if this race were fixed, this interface still sucks IMO.

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 21:33   ` Andy Lutomirski
@ 2015-06-02 21:42     ` Hannes Frederic Sowa
  2015-06-02 21:49       ` Andy Lutomirski
  2015-06-02 21:50       ` Hannes Frederic Sowa
  0 siblings, 2 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2015-06-02 21:42 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
> On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
> >> As far as I can tell, enabling IP_RECVERR causes the presence of a
> >> queued error to cause recvmsg, etc to return an error (once).  It's
> >> worse, though: a new error can be queued asynchronously at any time,
> >> this setting sk_err to a nonzero value.  How do I sensibly distinguish
> >> recvmsg failures to to genuine errors receiving messages from recvmsg
> >> failures because there's a queued error?
> >>
> >> The only way I can see to get reliable error handling is to literally
> >> call recvmsg in a loop:
> >>
> >> while (true /* or while POLLIN is set */) {
> >>   int ret = recvmsg(..., MSG_ERRQUEUE not set);
> >>   if (ret < 0 && /* what goes here? */) {
> >>     whoops!  this might be a harmless asynchronous error!
> >>     take no action!
> >>   }
> >
> > I see either two possibilities:
> >
> > We export the icmp_err_convert tables along with the udp_lib_err error
> > conversions to user space and spice them up with flags to mark if they
> > are transient (icmp_err_convert already has a fatal flag).
> 
> This seems overcomplicated.  I'd rather have a flag I pass to tell the
> kernel that I don't want to see transient errors (nd that I'll clear
> them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR.
> 
> >
> > Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
> > you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
> > sock_extended_err, no?
> 
> I do this already, which makes me think that there's a bug or another
> race somewhere.  I've only seen a failure once in several years of
> operation.
> 
> The failure happened on a ping socket.  I suspect that the race is:
> 
> ping_err: ip_icmp_error(...);
> 
> user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
> 
> ping_err: sk_err = err;
> 
> user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
> error via sock_error.
> 
> user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
> 
> Now the user code thinks that it was a real (non-transient) error and
> aborts.
> 
> Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?

Hmm, I don't think this will help.

> Even if this race were fixed, this interface still sucks IMO.

Yes. :/

My proposal would be to make the error conversion lazy:

Keeping duplicate data is not a good idea in general: So we shouldn't
use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
the sk_error_queue and extract the error code from there.

Only if IP_RECVERR was not set, we use sk->sk_err logic.

What do you think?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 21:42     ` Hannes Frederic Sowa
@ 2015-06-02 21:49       ` Andy Lutomirski
  2015-06-02 21:50       ` Hannes Frederic Sowa
  1 sibling, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2015-06-02 21:49 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015 at 2:42 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
>> On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
>> >> As far as I can tell, enabling IP_RECVERR causes the presence of a
>> >> queued error to cause recvmsg, etc to return an error (once).  It's
>> >> worse, though: a new error can be queued asynchronously at any time,
>> >> this setting sk_err to a nonzero value.  How do I sensibly distinguish
>> >> recvmsg failures to to genuine errors receiving messages from recvmsg
>> >> failures because there's a queued error?
>> >>
>> >> The only way I can see to get reliable error handling is to literally
>> >> call recvmsg in a loop:
>> >>
>> >> while (true /* or while POLLIN is set */) {
>> >>   int ret = recvmsg(..., MSG_ERRQUEUE not set);
>> >>   if (ret < 0 && /* what goes here? */) {
>> >>     whoops!  this might be a harmless asynchronous error!
>> >>     take no action!
>> >>   }
>> >
>> > I see either two possibilities:
>> >
>> > We export the icmp_err_convert tables along with the udp_lib_err error
>> > conversions to user space and spice them up with flags to mark if they
>> > are transient (icmp_err_convert already has a fatal flag).
>>
>> This seems overcomplicated.  I'd rather have a flag I pass to tell the
>> kernel that I don't want to see transient errors (nd that I'll clear
>> them myself using POLLERR and either MSG_ERRQUEUE or SO_ERROR.
>>
>> >
>> > Otherwise you should be able to call recvmsg with MSG_ERRQUEUE set after
>> > you got a ret < 0 when calling without MSG_ERRQUEUE and inspect the
>> > sock_extended_err, no?
>>
>> I do this already, which makes me think that there's a bug or another
>> race somewhere.  I've only seen a failure once in several years of
>> operation.
>>
>> The failure happened on a ping socket.  I suspect that the race is:
>>
>> ping_err: ip_icmp_error(...);
>>
>> user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
>>
>> ping_err: sk_err = err;
>>
>> user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
>> error via sock_error.
>>
>> user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
>>
>> Now the user code thinks that it was a real (non-transient) error and
>> aborts.
>>
>> Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?
>
> Hmm, I don't think this will help.

It won't help this race, but it'll at least make it clearer that the
code has some kind of reasonably well-defined semantics.

>
>> Even if this race were fixed, this interface still sucks IMO.
>
> Yes. :/
>
> My proposal would be to make the error conversion lazy:
>
> Keeping duplicate data is not a good idea in general: So we shouldn't
> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
> the sk_error_queue and extract the error code from there.
>
> Only if IP_RECVERR was not set, we use sk->sk_err logic.
>
> What do you think?

That seems entirely sensible to me, except that it might break some
existing application.  There's also this code:

    if ((family == AF_INET && !inet_sock->recverr) ||
        (family == AF_INET6 && !inet6_sk(sk)->recverr)) {
        if (!harderr || sk->sk_state != TCP_ESTABLISHED)
            goto out;  <-- skips the assignment to sk_err

which means that recverr kind of has the opposite semantics right now.

In fact, the man page agrees with the current behavior (minus the race):

       IP_RECVERR (since Linux 2.2)
              Enable extended reliable error message passing.  When enabled on
              a datagram socket, all generated errors will be queued in a per-
              socket error queue.  When the user  receives  an  error  from  a
              socket   operation,  the  errors  can  be  received  by  calling
              recvmsg(2)   with    the    MSG_ERRQUEUE    flag    set.     The
              sock_extended_err  structure describing the error will be passed
              in an ancillary message with the type IP_RECVERR and  the  level
              IPPROTO_IP.   This  is  useful  for  reliable  error handling on
              unconnected sockets.  The received data  portion  of  the  error
              queue contains the error packet.

The sensible semantics would be to change this to "When the user
receives POLLERR, the errors can be received...".  So maybe there
should be another value for IP_RECVERR to opt in to the alternate
semantics.

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 21:42     ` Hannes Frederic Sowa
  2015-06-02 21:49       ` Andy Lutomirski
@ 2015-06-02 21:50       ` Hannes Frederic Sowa
  2015-06-03  0:03         ` Andy Lutomirski
  1 sibling, 1 reply; 10+ messages in thread
From: Hannes Frederic Sowa @ 2015-06-02 21:50 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015, at 23:42, Hannes Frederic Sowa wrote:
> On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
> > On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
> >
> > [...]
> >
> > I do this already, which makes me think that there's a bug or another
> > race somewhere.  I've only seen a failure once in several years of
> > operation.
> > 
> > The failure happened on a ping socket.  I suspect that the race is:
> > 
> > ping_err: ip_icmp_error(...);
> > 
> > user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
> > 
> > ping_err: sk_err = err;
> > 
> > user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
> > error via sock_error.
> > 
> > user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
> > 
> > Now the user code thinks that it was a real (non-transient) error and
> > aborts.
> > 
> > Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?
> 
> Hmm, I don't think this will help.
> 
> > Even if this race were fixed, this interface still sucks IMO.
> 
> Yes. :/
> 
> My proposal would be to make the error conversion lazy:
> 
> Keeping duplicate data is not a good idea in general: So we shouldn't
> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
> the sk_error_queue and extract the error code from there.
> 
> Only if IP_RECVERR was not set, we use sk->sk_err logic.
> 
> What do you think?

I just noticed that this will probably break existing user space
applications which require that icmp errors are transient even with
IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
and xchg the pointer, hmmm....

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-02 21:50       ` Hannes Frederic Sowa
@ 2015-06-03  0:03         ` Andy Lutomirski
  2015-06-03  0:33           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2015-06-03  0:03 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015 at 2:50 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Jun 2, 2015, at 23:42, Hannes Frederic Sowa wrote:
>> On Tue, Jun 2, 2015, at 23:33, Andy Lutomirski wrote:
>> > On Tue, Jun 2, 2015 at 2:17 PM, Hannes Frederic Sowa
>> > <hannes@stressinduktion.org> wrote:
>> > > On Tue, Jun 2, 2015, at 21:40, Andy Lutomirski wrote:
>> >
>> > [...]
>> >
>> > I do this already, which makes me think that there's a bug or another
>> > race somewhere.  I've only seen a failure once in several years of
>> > operation.
>> >
>> > The failure happened on a ping socket.  I suspect that the race is:
>> >
>> > ping_err: ip_icmp_error(...);
>> >
>> > user: recvmsg(MSG_ERRQUEUE) and dequeues the error.
>> >
>> > ping_err: sk_err = err;
>> >
>> > user: recvmsg(MSG_ERRQUEUE not set), and recvmsg sees and clears the
>> > error via sock_error.
>> >
>> > user: recvmsg(MSG_ERRQUEUE), and recvmsg returns -EAGAIN.
>> >
>> > Now the user code thinks that it was a real (non-transient) error and
>> > aborts.
>> >
>> > Shouldn't that sk->sk_err = err assignment at least use WRITE_ONCE?
>>
>> Hmm, I don't think this will help.
>>
>> > Even if this race were fixed, this interface still sucks IMO.
>>
>> Yes. :/
>>
>> My proposal would be to make the error conversion lazy:
>>
>> Keeping duplicate data is not a good idea in general: So we shouldn't
>> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
>> the sk_error_queue and extract the error code from there.
>>
>> Only if IP_RECVERR was not set, we use sk->sk_err logic.
>>
>> What do you think?
>
> I just noticed that this will probably break existing user space
> applications which require that icmp errors are transient even with
> IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
> and xchg the pointer, hmmm....

Do you mean to fix the race like this but to otherwise leave the semantics
alone?  That would be an improvement, but it might be nice to also add
a non-crappy API for this, too.

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-03  0:03         ` Andy Lutomirski
@ 2015-06-03  0:33           ` Hannes Frederic Sowa
  2015-06-03  0:56             ` Andy Lutomirski
  2016-04-09  0:02             ` Andy Lutomirski
  0 siblings, 2 replies; 10+ messages in thread
From: Hannes Frederic Sowa @ 2015-06-03  0:33 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, Network Development

On Wed, Jun 3, 2015, at 02:03, Andy Lutomirski wrote:
> On Tue, Jun 2, 2015 at 2:50 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> >> My proposal would be to make the error conversion lazy:
> >>
> >> Keeping duplicate data is not a good idea in general: So we shouldn't
> >> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
> >> the sk_error_queue and extract the error code from there.
> >>
> >> Only if IP_RECVERR was not set, we use sk->sk_err logic.
> >>
> >> What do you think?
> >
> > I just noticed that this will probably break existing user space
> > applications which require that icmp errors are transient even with
> > IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
> > and xchg the pointer, hmmm....
> 
> Do you mean to fix the race like this but to otherwise leave the
> semantics
> alone?  That would be an improvement, but it might be nice to also add
> a non-crappy API for this, too.

Yes, keep current semantics but fix the race you reported.

I currently don't have good proposals for a decent API to handle this
besides adding some ancillary cmsg data to msg_control. This still would
not solve the problem fundamentally, as a -EFAULT/-EINVAL return value
could also mean that msg_control should not be touched, thus we end up
again relying on errno checking. :/ Thus checking error queue after
receiving an error indications is my best hunch so far.

Your proposal with MSG_IGNORE_ERROR seems reasonable so far for ping or
udp, but I haven't fully grasped the TCP semantics of sk->sk_err, yet.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-03  0:33           ` Hannes Frederic Sowa
@ 2015-06-03  0:56             ` Andy Lutomirski
  2016-04-09  0:02             ` Andy Lutomirski
  1 sibling, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2015-06-03  0:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015 at 5:33 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Wed, Jun 3, 2015, at 02:03, Andy Lutomirski wrote:
>> On Tue, Jun 2, 2015 at 2:50 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> >> My proposal would be to make the error conversion lazy:
>> >>
>> >> Keeping duplicate data is not a good idea in general: So we shouldn't
>> >> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
>> >> the sk_error_queue and extract the error code from there.
>> >>
>> >> Only if IP_RECVERR was not set, we use sk->sk_err logic.
>> >>
>> >> What do you think?
>> >
>> > I just noticed that this will probably break existing user space
>> > applications which require that icmp errors are transient even with
>> > IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
>> > and xchg the pointer, hmmm....
>>
>> Do you mean to fix the race like this but to otherwise leave the
>> semantics
>> alone?  That would be an improvement, but it might be nice to also add
>> a non-crappy API for this, too.
>
> Yes, keep current semantics but fix the race you reported.
>
> I currently don't have good proposals for a decent API to handle this
> besides adding some ancillary cmsg data to msg_control. This still would
> not solve the problem fundamentally, as a -EFAULT/-EINVAL return value
> could also mean that msg_control should not be touched, thus we end up
> again relying on errno checking. :/ Thus checking error queue after
> receiving an error indications is my best hunch so far.
>
> Your proposal with MSG_IGNORE_ERROR seems reasonable so far for ping or
> udp, but I haven't fully grasped the TCP semantics of sk->sk_err, yet.

I always assumed that TCP didn't have transient errors.  Shouldn't a
connection either be up or down but not up with errors?  If that's
wrong, then it's probably worth understanding what's going on before
trying to design a fix.

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How do I avoid recvmsg races with IP_RECVERR?
  2015-06-03  0:33           ` Hannes Frederic Sowa
  2015-06-03  0:56             ` Andy Lutomirski
@ 2016-04-09  0:02             ` Andy Lutomirski
  1 sibling, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2016-04-09  0:02 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Andy Lutomirski, Network Development

On Tue, Jun 2, 2015 at 5:33 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Wed, Jun 3, 2015, at 02:03, Andy Lutomirski wrote:
>> On Tue, Jun 2, 2015 at 2:50 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> >> My proposal would be to make the error conversion lazy:
>> >>
>> >> Keeping duplicate data is not a good idea in general: So we shouldn't
>> >> use sk->sk_err if IP_RECVERR is set at all but let sock_error just use
>> >> the sk_error_queue and extract the error code from there.
>> >>
>> >> Only if IP_RECVERR was not set, we use sk->sk_err logic.
>> >>
>> >> What do you think?
>> >
>> > I just noticed that this will probably break existing user space
>> > applications which require that icmp errors are transient even with
>> > IP_RECVERR. We can mark that with a bit in the sk_error_queue pointer
>> > and xchg the pointer, hmmm....
>>
>> Do you mean to fix the race like this but to otherwise leave the
>> semantics
>> alone?  That would be an improvement, but it might be nice to also add
>> a non-crappy API for this, too.
>
> Yes, keep current semantics but fix the race you reported.
>
> I currently don't have good proposals for a decent API to handle this
> besides adding some ancillary cmsg data to msg_control. This still would
> not solve the problem fundamentally, as a -EFAULT/-EINVAL return value
> could also mean that msg_control should not be touched, thus we end up
> again relying on errno checking. :/ Thus checking error queue after
> receiving an error indications is my best hunch so far.
>
> Your proposal with MSG_IGNORE_ERROR seems reasonable so far for ping or
> udp, but I haven't fully grasped the TCP semantics of sk->sk_err, yet.

I was looking at this a bit, and I was thinking about adding a new
socket option, but I'm a bit vague on how all this fits together.

One option would be a socket option that simply causes sock_error to
return 0 (and change SO_ERROR to peek at sk_err directly).  But there
seem to be sock_error callers all over the place, and maybe this
change would cause problems.

Another option would be to add a socket option that explicitly turns
off everything that queues soft errors to sk_err.

I think that, for IP datagrams at least, the ideal semantics would be
for soft errors not to affect sk_err and for POLLERR to be set if the
error queue is nonempty.

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-09  0:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-02 19:40 How do I avoid recvmsg races with IP_RECVERR? Andy Lutomirski
2015-06-02 21:17 ` Hannes Frederic Sowa
2015-06-02 21:33   ` Andy Lutomirski
2015-06-02 21:42     ` Hannes Frederic Sowa
2015-06-02 21:49       ` Andy Lutomirski
2015-06-02 21:50       ` Hannes Frederic Sowa
2015-06-03  0:03         ` Andy Lutomirski
2015-06-03  0:33           ` Hannes Frederic Sowa
2015-06-03  0:56             ` Andy Lutomirski
2016-04-09  0:02             ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.