From: Eric Dumazet
Date: Thu, 13 Oct 2022 15:02:26 -0700
Subject: Re: [PATCH net-next] net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
To: Jakub Kicinski
Cc: Wei Wang, netdev@vger.kernel.org, "David S. Miller", cgroups@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt, Roman Gushchin, Neil Spring, ycheng@google.com
In-Reply-To: <20221013144950.44b52f90@kernel.org>
References: <20210817194003.2102381-1-weiwan@google.com> <20221012163300.795e7b86@kernel.org> <20221013144950.44b52f90@kernel.org>

On Thu, Oct 13, 2022 at 2:49 PM Jakub Kicinski wrote:
>
> On Wed, 12 Oct 2022 16:33:00 -0700 Jakub Kicinski wrote:
> > This patch is causing a little bit of pain to us, to workloads running
> > with just memory.max set. After this change the TCP rx path no longer
> > forces the charging.
> >
> > Any recommendation for the fix? Setting memory.high a few MB under
> > memory.max seems to remove the failures.
>
> Eric, is there anything that would make the TCP perform particularly
> poorly under mem pressure?
>
> Dropping and pruning happens a lot here:
>
> # nstat -a | grep -i -E 'Prune|Drop'
> TcpExtPruneCalled               1202577            0.0
> TcpExtOfoPruned                 734606             0.0
> TcpExtTCPOFODrop                64191              0.0
> TcpExtTCPRcvQDrop               384305             0.0
>
> Same workload on 5.6 kernel:
>
> TcpExtPruneCalled               1223043            0.0
> TcpExtOfoPruned                 3377               0.0
> TcpExtListenDrops               10596              0.0
> TcpExtTCPOFODrop                22                 0.0
> TcpExtTCPRcvQDrop               734                0.0
>
> From a quick look at the code and with what Shakeel explained in mind -
> previously we would have "loaded up the cache" after the first failed
> try, so we never got into the loop inside tcp_try_rmem_schedule() which
> most likely nukes the entire OFO queue:
>
> static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
>                                  unsigned int size)
> {
>         if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
>             !sk_rmem_schedule(sk, skb, size)) {
>             /* ^ would fail but "load up the cache" ^ */
>
>                 if (tcp_prune_queue(sk) < 0)
>                         return -1;
>
>                 /* v this one would not fail due to the cache v */
>                 while (!sk_rmem_schedule(sk, skb, size)) {
>                         if (!tcp_prune_ofo_queue(sk))
>                                 return -1;
>
> Neil mentioned that he's seen multi-second stalls when SACKed segments
> get dropped from the OFO queue. Sender waits for a very long time before
> retrying something that was already SACKed if the receiver keeps
> sacking new, later segments. Even when ACK reaches the previously-SACKed
> block which should prove to the sender that something is very wrong.
>
> I tried to repro this with a packet drill and it's not what I see
> exactly, I need to keep shortening the RTT otherwise the retx comes
> out before the next SACK arrives.
>
> I'll try to read the code, and maybe I'll get lucky and manage to capture
> the exact impacted flows :S But does anything of this nature ring a
> bell?
>
> `../common/defaults.sh`
>
>     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
>
>    +0 < S 0:0(0) win 65535
>    +0 > S. 0:0(0) ack 1
>   +.1 < . 1:1(0) ack 1 win 2048
>    +0 accept(3, ..., ...) = 4
>
>    +0 write(4, ..., 60000) = 60000
>    +0 > P. 1:10001(10000) ack 1
>
> // Do some SACK-ing
>   +.1 < . 1:1(0) ack 1 win 513
> +.001 < . 1:1(0) ack 1 win 513
> // ..and we pretend we lost 1001:2001
> +.001 < . 1:1(0) ack 1 win 513
>
> // re-xmit holes and send more
>    +0 > . 10001:11001(1000) ack 1
>    +0 > . 1:1001(1000) ack 1
>    +0 > . 2001:3001(1000) ack 1 win 256
>    +0 > P. 11001:13001(2000) ack 1 win 256
>    +0 > P. 13001:15001(2000) ack 1 win 256
>
>   +.1 < . 1:1(0) ack 1001 win 513
>
>    +0 > P. 15001:18001(3000) ack 1 win 256
>    +0 > P. 18001:20001(2000) ack 1 win 256
>    +0 > P. 20001:22001(2000) ack 1 win 256
>
>   +.1 < . 1:1(0) ack 1001 win 513
>
>    +0 > P. 22001:24001(2000) ack 1 win 256
>    +0 > P. 24001:26001(2000) ack 1 win 256
>    +0 > P. 26001:28001(2000) ack 1 win 256
>    +0 > . 28001:29001(1000) ack 1 win 256
>
> +0.05 < . 1:1(0) ack 1001 win 257
>
>    +0 > P. 29001:31001(2000) ack 1 win 256
>    +0 > P. 31001:33001(2000) ack 1 win 256
>    +0 > P. 33001:35001(2000) ack 1 win 256
>    +0 > . 35001:36001(1000) ack 1 win 256
>
> +0.05 < . 1:1(0) ack 1001 win 257
>
>    +0 > P. 36001:38001(2000) ack 1 win 256
>    +0 > P. 38001:40001(2000) ack 1 win 256
>    +0 > P. 40001:42001(2000) ack 1 win 256
>    +0 > . 42001:43001(1000) ack 1 win 256
>
> +0.05 < . 1:1(0) ack 1001 win 257
>
>    +0 > P. 43001:45001(2000) ack 1 win 256
>    +0 > P. 45001:47001(2000) ack 1 win 256
>    +0 > P. 47001:49001(2000) ack 1 win 256
>    +0 > . 49001:50001(1000) ack 1 win 256
>
> +0.04 < . 1:1(0) ack 1001 win 257
>
>    +0 > P. 50001:52001(2000) ack 1 win 256
>    +0 > P. 52001:54001(2000) ack 1 win 256
>    +0 > P. 54001:56001(2000) ack 1 win 256
>    +0 > . 56001:57001(1000) ack 1 win 256
>
> +0.04 > . 1001:2001(1000) ack 1 win 256
>

This is SACK reneging. I would have to double-check the Linux behavior,
but reverting to a timeout could very well happen.

>   +.1 < . 1:1(0) ack 1001 win 257
>
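
For readers following the thread, the reneging condition discussed above (a cumulative ACK that stops at the start of a block the receiver had already SACKed) can be sketched in userspace C. This is a toy model, not kernel code: `struct toy_sender`, `toy_sack()` and `toy_ack_detects_reneging()` are hypothetical names, the 1000-byte segment size is taken from the script, and the SACK block 1001:2001 used in the usage note is an assumption inferred from the "pretend we lost 1001:2001" comment, since the script as quoted here does not show its SACK options. The sketch only flags the condition; what a real stack does next (for example, waiting for a retransmission timeout) is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>

#define SEG_SIZE 1000U                     /* bytes per segment, as in the script */
#define NSEGS    10U                       /* scoreboard covers sequences 1..10001 */
#define SEQ_END  (1U + NSEGS * SEG_SIZE)   /* first sequence past the scoreboard */

/* Hypothetical sender-side scoreboard; not a kernel structure. */
struct toy_sender {
    unsigned int snd_una;   /* first sequence number not yet cumulatively ACKed */
    bool sacked[NSEGS];     /* sacked[i]: segment starting at 1 + i * SEG_SIZE was SACKed */
};

/* Map a segment-aligned sequence number to its scoreboard slot. */
unsigned int toy_seg_index(unsigned int seq)
{
    return (seq - 1U) / SEG_SIZE;
}

/* Record a SACK block [start, end) reported by the receiver. */
void toy_sack(struct toy_sender *s, unsigned int start, unsigned int end)
{
    /* The toy model only accepts segment-aligned blocks inside the scoreboard. */
    assert(start >= 1U && start <= end && end <= SEQ_END);
    assert((start - 1U) % SEG_SIZE == 0U);

    for (unsigned int seq = start; seq < end; seq += SEG_SIZE)
        s->sacked[toy_seg_index(seq)] = true;
}

/*
 * Process a cumulative ACK. Returns true when the new ACK point rests on a
 * segment that the receiver had previously SACKed: had the receiver still
 * held that data, the cumulative ACK would have moved past it, so the
 * sender's remembered SACK state no longer matches the receiver.
 */
bool toy_ack_detects_reneging(struct toy_sender *s, unsigned int ack)
{
    if (ack > s->snd_una)
        s->snd_una = ack;

    if (s->snd_una >= SEQ_END)
        return false;   /* everything in the scoreboard is ACKed */

    return s->sacked[toy_seg_index(s->snd_una)];
}
```

With a sender initialized as `struct toy_sender s = { .snd_una = 1 };`, calling `toy_sack(&s, 1001, 2001)` and then `toy_ack_detects_reneging(&s, 1001)` returns true, while the same ACK on a sender that has only seen 2001:10001 SACKed returns false.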