netdev.vger.kernel.org archive mirror
* UDP sendto() fails with EINVAL when host under network load
@ 2018-12-27 21:13 charles cross
  2018-12-27 23:48 ` Stephen Hemminger
  0 siblings, 1 reply; 2+ messages in thread
From: charles cross @ 2018-12-27 21:13 UTC (permalink / raw)
  To: netdev

Hi netdev,
I've got an application that handles network traffic using various protocols. The application is composed of a supervisor process and one or more worker processes, which implement a watchdog that enables the supervisor to kill hung workers, or detect when they've crashed, and start new ones. Originally we had only a single worker process, and the watchdog consisted of a UDP socket on the loopback address through which the supervisor sends a health check to the worker and the healthy worker replies. When we improved the application to support multiple worker processes, we were able to simply extend the watchdog to use multicast. This required no significant change to the watchdog logic: the workers join the multicast group and reply with an ID when the supervisor sends to the group.
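For readers unfamiliar with this kind of setup, here is a minimal sketch of the socket options such a multicast watchdog typically needs. It is not the application's code: the group address, the port, the use of the loopback interface for multicast, and the names `watchdog_join` and `watchdog_sender_init` are all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE          /* expose struct ip_mreq under strict -std= modes */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical values, not taken from the original application. */
#define WATCHDOG_GROUP "239.255.0.1"
#define WATCHDOG_PORT  5007

/*
 * Worker side: join `group` on the loopback interface (an assumption).
 * Returns 0 on success or a negative errno value on failure.
 */
int watchdog_join(int fd, const char *group)
{
    struct ip_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1)
        return -EINVAL;                       /* not a dotted-quad address */
    if (!IN_MULTICAST(ntohl(mreq.imr_multiaddr.s_addr)))
        return -EINVAL;                       /* outside 224.0.0.0/4 */
    mreq.imr_interface.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
        return -errno;
    return 0;
}

/*
 * Supervisor side: send multicast out of the loopback interface and make
 * sure locally sent datagrams are looped back to local group members.
 * Returns 0 on success or a negative errno value on failure.
 */
int watchdog_sender_init(int fd)
{
    struct in_addr ifaddr;
    unsigned char loop = 1;

    ifaddr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &ifaddr, sizeof(ifaddr)) < 0)
        return -errno;
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) < 0)
        return -errno;
    return 0;
}
```

In this sketch each worker would bind a UDP socket to `WATCHDOG_PORT`, call `watchdog_join(fd, WATCHDOG_GROUP)`, and answer each health check with its ID; the supervisor would call `watchdog_sender_init()` once and then `sendto()` the group address.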

The new multicast watchdog works fine except under heavy load. Using the test program curl-loader, we ramp up to several thousand HTTP connections to the worker process. As the load builds, the supervisor health check starts to fail intermittently, until it reaches 100% failure at peak load. The failure occurs when the health check is originated: sendto() fails with EINVAL. As the load drops, sendto() begins to succeed again. The arguments to sendto() do not change during the test. Using printk, I have isolated the failure to udp_sendmsg() in net/ipv4/udp.c:
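A userspace wrapper along the following lines can make this kind of intermittent failure easier to correlate with load, since it records the exact errno for every failed send. This is only a sketch: the function name `send_health_check` and the log format are hypothetical, not part of the application described above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send one health check datagram.
 * Returns the number of bytes sent, or a negative errno value on failure.
 * Every failure is logged to stderr with the errno name and number.
 */
ssize_t send_health_check(int fd, const void *buf, size_t len,
                          const struct sockaddr *dst, socklen_t dst_len)
{
    ssize_t n = sendto(fd, buf, len, 0, dst, dst_len);

    if (n < 0) {
        int saved = errno;   /* save first: fprintf() may overwrite errno */

        fprintf(stderr, "sendto(fd=%d, len=%zu) failed: %s (errno %d)\n",
                fd, len, strerror(saved), saved);
        return -saved;
    }
    return n;
}
```

Counting the negative returns per interval and plotting them against the connection count would show whether the EINVAL rate tracks the load curve.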

int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len)

Within this function at this block

    /* Lockless fast path for the non-corking case. */
    if (!corkreq) {
        skb = ip_make_skb(sk, fl4, getfrag, msg->msg_iov, ulen,
                  sizeof(struct udphdr), &ipc, &rt,
                  msg->msg_flags);
        err = PTR_ERR(skb);
        if (!IS_ERR_OR_NULL(skb))
            err = udp_send_skb(skb, fl4);

        printk(KERN_ERR "%s goto out from line: %d\n", __FUNCTION__, __LINE__);
        goto out;
    }

the function udp_send_skb() is returning EINVAL.
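For context on how `err` is set in the block above: the kernel encodes errors as negative errno values, and a pointer-returning function such as ip_make_skb() packs that value into the pointer itself. The sketch below is a userspace re-creation of that convention, written for illustration only. The helpers are modeled on the kernel's include/linux/err.h, while `fast_path_result` and `fake_send_einval` are hypothetical stand-ins for the quoted control flow and for udp_send_skb().

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded errors. */
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
static inline int IS_ERR_OR_NULL(const void *ptr)
{
    return !ptr || IS_ERR(ptr);
}

/* Hypothetical stand-in for udp_send_skb() failing as in the report. */
long fake_send_einval(void *skb)
{
    (void)skb;
    return -EINVAL;
}

/*
 * Same shape as the quoted fast path:
 *   - NULL skb       -> err is 0, nothing is sent
 *   - error pointer  -> err is the encoded negative errno
 *   - valid skb      -> err is whatever the send function returns
 */
long fast_path_result(void *skb, long (*send_skb)(void *skb))
{
    long err = PTR_ERR(skb);

    if (!IS_ERR_OR_NULL(skb))
        err = send_skb(skb);
    return err;
}
```

So when the printk fires after udp_send_skb() has run, the in-kernel value is -EINVAL, and the syscall layer hands it to userspace as a -1 return with errno set to EINVAL.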

The kernel is v3.10.0 from upstream RHEL 7.5. Can anyone offer advice before I proceed down the stack to look for the root cause? The behavior (failure under load but recovery after the load is removed) suggests contention for resources but the EINVAL return code makes no sense to me given the arguments to sendto() do not change. I am totally unfamiliar with this code so any help is appreciated.

Thanks,
Chris


* Re: UDP sendto() fails with EINVAL when host under network load
  2018-12-27 21:13 UDP sendto() fails with EINVAL when host under network load charles cross
@ 2018-12-27 23:48 ` Stephen Hemminger
  0 siblings, 0 replies; 2+ messages in thread
From: Stephen Hemminger @ 2018-12-27 23:48 UTC (permalink / raw)
  To: charles cross; +Cc: netdev

On Thu, 27 Dec 2018 16:13:29 -0500
charles cross <xcross59@icloud.com> wrote:

> The kernel is v3.10.0 from upstream RHEL 7.5. Can anyone offer advice before I proceed down the stack to look for the root cause? The behavior (failure under load but recovery after the load is removed) suggests contention for resources but the EINVAL return code makes no sense to me given the arguments to sendto() do not change. I am totally unfamiliar with this code so any help is appreciated.

RHEL is on a 5 1/2-year-old kernel, so I doubt the upstream kernel developers are going to be much help here.
Can you reproduce it with 4.20?
