* udp sendmsg ENOBUFS clarification
@ 2019-09-17 20:20 Josh Hunt
  2019-09-18 15:35 ` Willem de Bruijn
  0 siblings, 1 reply; 3+ messages in thread
From: Josh Hunt @ 2019-09-17 20:20 UTC (permalink / raw)
  To: netdev, Eric Dumazet, David Miller, Willem de Bruijn

I was running some tests recently with the udpgso_bench_tx benchmark in
selftests and noticed that in some configurations it reported sending at
more than line rate! Looking into it further, I found that I was
overflowing the qdisc queue, so the qdisc was returning NET_XMIT_DROP;
however, this error did not propagate back up to the application, which
therefore assumed everything it sent had gone out successfully. That's
when I learned about IP_RECVERR and saw that the benchmark isn't using
that socket option.
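
(For anyone following along, this is roughly what I mean -- a minimal,
untested sketch of enabling IP_RECVERR on a UDP socket; the loopback
destination and port are arbitrary and just for illustration:)

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int one = 1;
        struct sockaddr_in dst = {
                .sin_family = AF_INET,
                .sin_port = htons(9),   /* arbitrary, illustration only */
                .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
        };
        char buf[1400];

        memset(buf, 0, sizeof(buf));

        /* With IP_RECVERR set, a drop on qdisc enqueue (NET_XMIT_DROP)
         * comes back from send() as -1/ENOBUFS; without it, send()
         * succeeds and the drop only shows up in SndbufErrors. */
        if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &one, sizeof(one)) < 0)
                perror("setsockopt IP_RECVERR");

        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                perror("connect");

        if (send(fd, buf, sizeof(buf), 0) < 0 && errno == ENOBUFS)
                fprintf(stderr, "send: ENOBUFS (backpressure reported)\n");

        close(fd);
        return 0;
}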

That's all fairly straightforward, but what I was hoping to get
clarification on is: where is the line drawn on when ENOBUFS is, or is
not, sent back to the application if IP_RECVERR is *not* set? My guess,
based on going through the code, is that once a packet leaves the stack
(in this case, once it is handed to the qdisc) we stop reporting
ENOBUFS back to the application, but can someone confirm?

For example, we sanitize the error in udp_send_skb():
send:
         err = ip_send_skb(sock_net(sk), skb);
         if (err) {
                 if (err == -ENOBUFS && !inet->recverr) {
                         UDP_INC_STATS(sock_net(sk),
                                       UDP_MIB_SNDBUFERRORS, is_udplite);
                         err = 0;
                 }
         } else


but in udp_sendmsg() we don't:

         if (err == -ENOBUFS ||
             test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
                 UDP_INC_STATS(sock_net(sk),
                               UDP_MIB_SNDBUFERRORS, is_udplite);
         }
         return err;

In the case above, it looks like udp_sendmsg() can only see ENOBUFS
from allocation failures inside the stack, and that's why we propagate
the error back up to the application?

Somewhat related: while I was trying to find an answer to the above, I
came across this thread: https://patchwork.ozlabs.org/patch/32857/
It looks like the send(2) man page still only says the following about
ENOBUFS:

  "The output queue for a network interface was full.
   This generally indicates that the interface has stopped sending,
   but may be caused by transient congestion.
   (Normally, this does not occur in Linux. Packets are just silently
   dropped when a device queue overflows.) "

but as Eric points out there, that's not true when IP_RECVERR is set on
the socket. Was there an attempt to update the man page to reflect this
that was rejected? I couldn't find any discussion on it.

Thanks
Josh


* Re: udp sendmsg ENOBUFS clarification
  2019-09-17 20:20 udp sendmsg ENOBUFS clarification Josh Hunt
@ 2019-09-18 15:35 ` Willem de Bruijn
  2019-09-19  2:59   ` Josh Hunt
  0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2019-09-18 15:35 UTC (permalink / raw)
  To: Josh Hunt; +Cc: netdev, Eric Dumazet, David Miller

On Tue, Sep 17, 2019 at 4:20 PM Josh Hunt <johunt@akamai.com> wrote:
>
> I was running some tests recently with the udpgso_bench_tx benchmark in
> selftests and noticed that in some configurations it reported sending at
> more than line rate! Looking into it further, I found that I was
> overflowing the qdisc queue, so the qdisc was returning NET_XMIT_DROP;
> however, this error did not propagate back up to the application, which
> therefore assumed everything it sent had gone out successfully. That's
> when I learned about IP_RECVERR and saw that the benchmark isn't using
> that socket option.
>
> That's all fairly straightforward, but what I was hoping to get
> clarification on is: where is the line drawn on when ENOBUFS is, or is
> not, sent back to the application if IP_RECVERR is *not* set? My guess,
> based on going through the code, is that once a packet leaves the stack
> (in this case, once it is handed to the qdisc) we stop reporting
> ENOBUFS back to the application, but can someone confirm?

Once a packet is queued the system call may return, so any subsequent
drops after dequeue are not propagated back. The relevant rc is set in
__dev_xmit_skb on q->enqueue. On setups with multiple devices, such as
a tunnel or bonding path, an error on enqueue to the lower device is
similarly not propagated.
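
For reference, the relevant logic is roughly this (paraphrased and
heavily trimmed from __dev_xmit_skb() in net/core/dev.c; not the exact
source):

        /* Only the return value of this first enqueue is masked and
         * handed back up the transmit path to the sendmsg caller
         * (net_xmit_errno() maps NET_XMIT_DROP to -ENOBUFS). Anything
         * dropped later, after dequeue, has no caller left to report to.
         */
        rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
        qdisc_run(q);
        return rc;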

> For example, we sanitize the error in udp_send_skb():
> send:
>          err = ip_send_skb(sock_net(sk), skb);
>          if (err) {
>                  if (err == -ENOBUFS && !inet->recverr) {
>                          UDP_INC_STATS(sock_net(sk),
>                                        UDP_MIB_SNDBUFERRORS, is_udplite);
>                          err = 0;
>                  }
>          } else
>
>
> but in udp_sendmsg() we don't:
>
>          if (err == -ENOBUFS ||
>              test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
>                  UDP_INC_STATS(sock_net(sk),
>                                UDP_MIB_SNDBUFERRORS, is_udplite);
>          }
>          return err;

That's interesting. My --incorrect-- understanding until now had been
that IP_RECVERR does nothing but enable optional, more detailed error
reporting on top of the system call error codes.

But indeed it also causes backpressure to be reported as a system call
error that is otherwise suppressed. I don't know why; the behavior
predates git history.

> In the case above, it looks like udp_sendmsg() can only see ENOBUFS
> from allocation failures inside the stack, and that's why we propagate
> the error back up to the application?

Both the udp lockless fast path and the slow corked path go through
udp_send_skb, so the backpressure is suppressed consistently in both
cases.

Indeed, the error handling in udp_sendmsg is then not related to
backpressure, but to other causes of ENOBUFS, i.e., allocation failure.


* Re: udp sendmsg ENOBUFS clarification
  2019-09-18 15:35 ` Willem de Bruijn
@ 2019-09-19  2:59   ` Josh Hunt
  0 siblings, 0 replies; 3+ messages in thread
From: Josh Hunt @ 2019-09-19  2:59 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, Eric Dumazet, David Miller

On 9/18/19 8:35 AM, Willem de Bruijn wrote:
> On Tue, Sep 17, 2019 at 4:20 PM Josh Hunt <johunt@akamai.com> wrote:
>>
>> I was running some tests recently with the udpgso_bench_tx benchmark in
>> selftests and noticed that in some configurations it reported sending at
>> more than line rate! Looking into it further, I found that I was
>> overflowing the qdisc queue, so the qdisc was returning NET_XMIT_DROP;
>> however, this error did not propagate back up to the application, which
>> therefore assumed everything it sent had gone out successfully. That's
>> when I learned about IP_RECVERR and saw that the benchmark isn't using
>> that socket option.
>>
>> That's all fairly straightforward, but what I was hoping to get
>> clarification on is: where is the line drawn on when ENOBUFS is, or is
>> not, sent back to the application if IP_RECVERR is *not* set? My guess,
>> based on going through the code, is that once a packet leaves the stack
>> (in this case, once it is handed to the qdisc) we stop reporting
>> ENOBUFS back to the application, but can someone confirm?
> 
> Once a packet is queued the system call may return, so any subsequent
> drops after dequeue are not propagated back. The relevant rc is set in
> __dev_xmit_skb on q->enqueue. On setups with multiple devices, such as
> a tunnel or bonding path, an error on enqueue to the lower device is
> similarly not propagated.

Yeah, that makes total sense. Once it's enqueued you'd expect no error
to come back, but in this particular case we get an error on enqueue
itself, so I was surprised when it did not make it back to the
application.

> 
>> For example, we sanitize the error in udp_send_skb():
>> send:
>>           err = ip_send_skb(sock_net(sk), skb);
>>           if (err) {
>>                   if (err == -ENOBUFS && !inet->recverr) {
>>                           UDP_INC_STATS(sock_net(sk),
>>                                         UDP_MIB_SNDBUFERRORS, is_udplite);
>>                           err = 0;
>>                   }
>>           } else
>>
>>
>> but in udp_sendmsg() we don't:
>>
>>           if (err == -ENOBUFS ||
>>               test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
>>                   UDP_INC_STATS(sock_net(sk),
>>                                 UDP_MIB_SNDBUFERRORS, is_udplite);
>>           }
>>           return err;
> 
> That's interesting. My --incorrect-- understanding until now had been
> that IP_RECVERR does nothing but enable optional, more detailed error
> reporting on top of the system call error codes.
> 
> But indeed it also causes backpressure to be reported as a system call
> error that is otherwise suppressed. I don't know why; the behavior
> predates git history.

Yeah, it's interesting. I wasn't able to find any documentation or
discussion on it either, which is why I figured I'd ask the question on
netdev in case others know.

> 
>> In the case above, it looks like udp_sendmsg() can only see ENOBUFS
>> from allocation failures inside the stack, and that's why we propagate
>> the error back up to the application?
> 
> Both the udp lockless fast path and the slow corked path go through
> udp_send_skb, so the backpressure is suppressed consistently in both
> cases.
> 
> Indeed, the error handling in udp_sendmsg is then not related to
> backpressure, but to other causes of ENOBUFS, i.e., allocation failure.
> 
> 

Yep. Thanks for going through this. We'll see if others have any
comments. I will likely send a patch for the man page noting that you
can get ENOBUFS on Linux, but only if IP_RECVERR is set, as Eric
pointed out in the patch I linked to previously.
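
Something along these lines, maybe (rough draft only; final wording
TBD):

       ENOBUFS
              The output queue for a network interface was full.  This
              generally indicates that the interface has stopped
              sending, but may be caused by transient congestion.  (On
              Linux, this is reported only if the socket has enabled
              the IP_RECVERR option; otherwise, packets are silently
              dropped when a device queue overflows.)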

Josh
