All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <quic_qiancai@quicinc.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
Date: Fri, 29 Apr 2022 12:18:10 -0400	[thread overview]
Message-ID: <20220429161810.GA175@qian> (raw)
In-Reply-To: <20220422201237.416238-1-eric.dumazet@gmail.com>

On Fri, Apr 22, 2022 at 01:12:37PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket
> lock is released") helped bulk TCP flows to move the cost of skbs
> frees outside of critical section where socket lock was held.
> 
> But for RPC traffic, or hosts with RFS enabled, the solution is far from
> being ideal.
> 
> For RPC traffic, recvmsg() has to return to user space right after
> skb payload has been consumed, meaning that BH handler has no chance
> to pick the skb before recvmsg() thread. This issue is more visible
> with BIG TCP, as more RPC fit one skb.
> 
> For RFS, even if BH handler picks the skbs, they are still picked
> from the cpu on which user thread is running.
> 
> Ideally, it is better to free the skbs (and associated page frags)
> on the cpu that originally allocated them.
> 
> This patch removes the per socket anchor (sk->defer_list) and
> instead uses a per-cpu list, which will hold more skbs per round.
> 
> This new per-cpu list is drained at the end of net_action_rx(),
> after incoming packets have been processed, to lower latencies.
> 
> In normal conditions, skbs are added to the per-cpu list with
> no further action. In the (unlikely) cases where the cpu does not
> run net_action_rx() handler fast enough, we use an IPI to raise
> NET_RX_SOFTIRQ on the remote cpu.
> 
> Also, we do not bother draining the per-cpu list from dev_cpu_dead()
> This is because skbs in this list have no requirement on how fast
> they should be freed.
> 
> Note that we can add in the future a small per-cpu cache
> if we see any contention on sd->defer_lock.

Hmm, we started to see some memory leak reports from kmemleak that have
been around for hours without being freed since yesterday's linux-next
tree which included this commit. Any thoughts?

unreferenced object 0xffff400610f55cc0 (size 216):
  comm "git-remote-http", pid 781180, jiffies 4314091475 (age 4323.740s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 c0 7e 87 ff 3f ff ff 00 00 00 00 00 00 00 00  ..~..?..........
  backtrace:
     kmem_cache_alloc_node
     __alloc_skb
     __tcp_send_ack.part.0
     tcp_send_ack
     tcp_cleanup_rbuf
     tcp_recvmsg_locked
     tcp_recvmsg
     inet_recvmsg
     __sys_recvfrom
     __arm64_sys_recvfrom
     invoke_syscall
     el0_svc_common.constprop.0
     do_el0_svc
     el0_svc
     el0t_64_sync_handler
     el0t_64_sync
unreferenced object 0xffff4001e58f0c40 (size 216):
  comm "git-remote-http", pid 781180, jiffies 4314091483 (age 4323.968s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 c0 7e 87 ff 3f ff ff 00 00 00 00 00 00 00 00  ..~..?..........
  backtrace:
     kmem_cache_alloc_node
     __alloc_skb
     __tcp_send_ack.part.0
     tcp_send_ack
     tcp_cleanup_rbuf
     tcp_recvmsg_locked
     tcp_recvmsg
     inet_recvmsg
     __sys_recvfrom
     __arm64_sys_recvfrom
     invoke_syscall
     el0_svc_common.constprop.0
     do_el0_svc
     el0_svc
     el0t_64_sync_handler
     el0t_64_sync

  parent reply	other threads:[~2022-04-29 16:18 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22 20:12 [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists Eric Dumazet
2022-04-26  7:38 ` Paolo Abeni
2022-04-26 13:11   ` Eric Dumazet
2022-04-26 15:28     ` Paolo Abeni
2022-04-26 16:13       ` Eric Dumazet
2022-04-27  0:50 ` patchwork-bot+netdevbpf
2022-04-27 15:34 ` Ido Schimmel
2022-04-27 16:53   ` Eric Dumazet
2022-04-27 17:11     ` Eric Dumazet
2022-04-27 17:59       ` Eric Dumazet
2022-04-27 18:54         ` Eric Dumazet
2022-04-29 16:18 ` Qian Cai [this message]
2022-04-29 16:23   ` Eric Dumazet
2022-04-29 20:44     ` Qian Cai
2022-05-06  6:44 ` [net] 72fd55c0db: invoked_oom-killer:gfp_mask=0x kernel test robot
2022-05-06  6:44   ` kernel test robot
2022-05-06  8:15   ` Eric Dumazet
2022-05-06  8:15     ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220429161810.GA175@qian \
    --to=quic_qiancai@quicinc.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.