From: Willem de Bruijn
Date: Fri, 27 Jan 2023 17:03:39 -0500
Subject: Re: [PATCH] selftests: net: udpgso_bench_tx: Introduce exponential back-off retries
To: Andrei Gherzan
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shuah Khan,
    netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230127181625.286546-1-andrei.gherzan@canonical.com>
In-Reply-To: <20230127181625.286546-1-andrei.gherzan@canonical.com>

On Fri, Jan 27, 2023 at 1:16 PM Andrei Gherzan wrote:
>
> The tx and rx test programs are used in a couple of test scripts including
> "udpgro_bench.sh". Taking this as an example, when the rx/tx programs
> are invoked subsequently, there is a chance that the rx one is not ready to
> accept socket connections. This racing bug could fail the test with at
> least one of the following:
>
>   ./udpgso_bench_tx: connect: Connection refused
>   ./udpgso_bench_tx: sendmsg: Connection refused
>   ./udpgso_bench_tx: write: Connection refused
>
> This change addresses this by adding routines that retry the socket
> operations with an exponential back off algorithm from 100ms to 2s.
>
> Fixes: 3a687bef148d ("selftests: udp gso benchmark")
> Signed-off-by: Andrei Gherzan

Synchronizing the two processes is indeed tricky.

Perhaps more robust is opening an initial TCP connection, with
SO_RCVTIMEO to bound the waiting time. That covers all tests in one
go.

> ---
>  tools/testing/selftests/net/udpgso_bench_tx.c | 57 +++++++++++++------
>  1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
> index f1fdaa270291..4dea9ee7eb46 100644
> --- a/tools/testing/selftests/net/udpgso_bench_tx.c
> +++ b/tools/testing/selftests/net/udpgso_bench_tx.c
> @@ -53,6 +53,9 @@
>
>  #define NUM_PKT		100
>
> +#define MAX_DELAY	2000000
> +#define INIT_DELAY	100000
> +
>  static bool	cfg_cache_trash;
>  static int	cfg_cpu		= -1;
>  static int	cfg_connected	= true;
> @@ -257,13 +260,18 @@ static void flush_errqueue(int fd, const bool do_poll)
>  static int send_tcp(int fd, char *data)
>  {
>  	int ret, done = 0, count = 0;
> +	useconds_t delay = INIT_DELAY;
>
>  	while (done < cfg_payload_len) {
> -		ret = send(fd, data + done, cfg_payload_len - done,
> -			   cfg_zerocopy ? MSG_ZEROCOPY : 0);
> -		if (ret == -1)
> -			error(1, errno, "write");
> -
> +		delay = INIT_DELAY;
> +		while ((ret = send(fd, data + done, cfg_payload_len - done,
> +				cfg_zerocopy ? MSG_ZEROCOPY : 0)) == -1) {
> +			usleep(delay);
> +			if (delay < MAX_DELAY)
> +				delay *= 2;
> +			else
> +				error(1, errno, "write");
> +		}
>  		done += ret;
>  		count++;
>  	}

send_tcp should not be affected, as connect will by then already have
succeeded. Also, as a reliable protocol it will internally retry,
after returning with success.

> @@ -274,17 +282,23 @@ static int send_tcp(int fd, char *data)
>  static int send_udp(int fd, char *data)
>  {
>  	int ret, total_len, len, count = 0;
> +	useconds_t delay = INIT_DELAY;
>
>  	total_len = cfg_payload_len;
>
>  	while (total_len) {
>  		len = total_len < cfg_mss ? total_len : cfg_mss;
>
> -		ret = sendto(fd, data, len, cfg_zerocopy ? MSG_ZEROCOPY : 0,
> -			     cfg_connected ? NULL : (void *)&cfg_dst_addr,
> -			     cfg_connected ? 0 : cfg_alen);
> -		if (ret == -1)
> -			error(1, errno, "write");
> +		delay = INIT_DELAY;
> +		while ((ret = sendto(fd, data, len, cfg_zerocopy ? MSG_ZEROCOPY : 0,
> +				cfg_connected ? NULL : (void *)&cfg_dst_addr,
> +				cfg_connected ? 0 : cfg_alen)) == -1) {

This should ideally retry only on the expected errno. An unreliable
datagram sendto will succeed initially. It will fail with an error
later, after reception of an ICMP destination unreachable response.

> +			usleep(delay);
> +			if (delay < MAX_DELAY)
> +				delay *= 2;
> +			else
> +				error(1, errno, "write");
> +		}
>  		if (ret != len)
>  			error(1, errno, "write: %uB != %uB\n", ret, len);
>
> @@ -378,6 +392,7 @@ static int send_udp_segment(int fd, char *data)
>  	struct iovec iov = {0};
>  	size_t msg_controllen;
>  	struct cmsghdr *cmsg;
> +	useconds_t delay = INIT_DELAY;
>  	int ret;
>
>  	iov.iov_base = data;
> @@ -401,9 +416,13 @@ static int send_udp_segment(int fd, char *data)
>  	msg.msg_name = (void *)&cfg_dst_addr;
>  	msg.msg_namelen = cfg_alen;
>
> -	ret = sendmsg(fd, &msg, cfg_zerocopy ? MSG_ZEROCOPY : 0);
> -	if (ret == -1)
> -		error(1, errno, "sendmsg");
> +	while ((ret = sendmsg(fd, &msg, cfg_zerocopy ? MSG_ZEROCOPY : 0)) == -1) {
> +		usleep(delay);
> +		if (delay < MAX_DELAY)
> +			delay *= 2;
> +		else
> +			error(1, errno, "sendmsg");
> +	}
>  	if (ret != iov.iov_len)
>  		error(1, 0, "sendmsg: %u != %llu\n", ret,
>  			(unsigned long long)iov.iov_len);
> @@ -616,6 +635,7 @@ int main(int argc, char **argv)
>  {
>  	unsigned long num_msgs, num_sends;
>  	unsigned long tnow, treport, tstop;
> +	useconds_t delay = INIT_DELAY;
>  	int fd, i, val, ret;
>
>  	parse_opts(argc, argv);
> @@ -648,9 +668,14 @@ int main(int argc, char **argv)
>  		}
>  	}
>
> -	if (cfg_connected &&
> -	    connect(fd, (void *)&cfg_dst_addr, cfg_alen))
> -		error(1, errno, "connect");
> +	if (cfg_connected)
> +		while (connect(fd, (void *)&cfg_dst_addr, cfg_alen)) {
> +			usleep(delay);
> +			if (delay < MAX_DELAY)
> +				delay *= 2;
> +			else
> +				error(1, errno, "connect");
> +		}
>
>  	if (cfg_segment)
>  		set_pmtu_discover(fd, cfg_family == PF_INET);
> --
> 2.34.1
>