From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH v3 net-next 3/3] tcp: add one skb cache for rx
Date: Tue, 2 Apr 2019 18:17:38 -0700
Message-ID: <20190402181738.09980a62@cakuba.hsd1.ca.comcast.net>
In-Reply-To: <20190322155640.248144-4-edumazet@google.com>

On Fri, 22 Mar 2019 08:56:40 -0700, Eric Dumazet wrote:
> Oftentimes, recvmsg() system calls and BH handling for a particular
> TCP socket are done on different cpus.
> 
> This means the incoming skb had to be allocated on a cpu,
> but freed on another.
> 
> This incurs high spinlock contention in the slab layer for small RPCs,
> but also a high number of cache line ping-pongs for larger packets.
> 
> A full size GRO packet might use 45 page fragments, meaning
> that up to 45 put_page() can be involved.
> 
> Moreover, performing the __kfree_skb() in the recvmsg() context
> adds latency for user applications, and increases the probability
> of trapping them in backlog processing, since the BH handler
> might find the socket owned by the user.
> 
> This patch, combined with the prior one, increases the rpc
> performance by about 10 % on servers with a large number of cores.
> 
> (tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
>  instead of 8 Mpps)
> 
> This also increases single bulk flow performance on 40Gbit+ links,
> since in this case there are often two cpus working in tandem :
> 
>  - CPU handling the NIC rx interrupts, feeding the receive queue,
>   and (after this patch) freeing the skbs that were consumed.
> 
>  - CPU in recvmsg() system call, essentially 100 % busy copying out
>   data to user space.
> 
> Having at most one skb in a per-socket cache has very little risk
> of memory exhaustion, and since it is protected by socket lock,
> its management is essentially free.
> 
> Note that if rps/rfs is used, we do not enable this feature, because
> there is a high chance that the same cpu is handling both the recvmsg()
> system call and the TCP rx path, but that another cpu did the skb
> allocations in the device driver right before the RPS/RFS logic.
> 
> To properly handle this case, it seems we would need to record
> on which cpu the skb was allocated, and use a different channel
> to give skbs back to this cpu.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
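
For reference, the "at most one skb in a per-socket cache" idea from the
quoted commit message can be modelled in user space roughly as below. This
is only a sketch under stated assumptions: struct buf, struct sock_model,
consumer_release() and producer_drain() are hypothetical names invented for
illustration, the "locked" flag merely stands in for the socket lock, and
none of this is taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the kernel's struct sk_buff / struct sock. */
struct buf {
	size_t len;
};

struct sock_model {
	bool locked;          /* models "socket lock is held" */
	struct buf *rx_cache; /* at most one consumed buffer is parked here */
};

/*
 * Consumer side (the recvmsg()-like context): instead of freeing a consumed
 * buffer on this cpu, park it in the one-slot cache if the slot is empty.
 * Returns true if the buffer was parked, false if it was freed locally.
 */
static bool consumer_release(struct sock_model *sk, struct buf *b)
{
	assert(sk->locked); /* the slot is only touched under the lock */
	if (sk->rx_cache == NULL) {
		sk->rx_cache = b;
		return true;
	}
	free(b); /* slot already occupied: fall back to a local free */
	return false;
}

/*
 * Producer side (the rx-path-like context): free whatever the consumer
 * parked, so the free happens on the side that did the allocation.
 * Returns true if a buffer was freed.
 */
static bool producer_drain(struct sock_model *sk)
{
	struct buf *b;

	assert(sk->locked);
	b = sk->rx_cache;
	if (b == NULL)
		return false;
	sk->rx_cache = NULL;
	free(b);
	return true;
}
```

Because the slot holds a single pointer and is only read or written with the
lock held, the bookkeeping is one load and one store, which matches the
"essentially free" remark in the quoted text.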

Hi Eric!

Somehow this appears to make ktls run out of stack:

[  132.022746][ T1597] BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
[  132.034492][ T1597] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
[  132.042733][ T1597] CPU: 1 PID: 1597 Comm: hurl Not tainted 5.1.0-rc2-perf-00642-g179e7e21995d-dirty #683
[  132.053500][ T1597] Hardware name: ...
[  132.062714][ T1597] RIP: 0010:free_one_page+0x2b/0x490
[  132.068526][ T1597] Code: 1f 44 00 00 41 57 48 8d 87 40 05 00 00 49 89 f7 41 56 49 89 d6 41 55 41 54 49 89 fc 48 89 c7 55 89 cd 532
[  132.090369][ T1597] RSP: 0018:ffffb46c03d9fff8 EFLAGS: 00010092
[  132.097054][ T1597] RAX: ffff91ed7fffd240 RBX: 0000000000000000 RCX: 0000000000000003
[  132.105874][ T1597] RDX: 0000000000469c68 RSI: ffffd6e151a71a00 RDI: ffff91ed7fffd240
[  132.114697][ T1597] RBP: 0000000000000003 R08: 0000000000000000 R09: dead000000000200
[  132.123521][ T1597] R10: ffffd6e151a71808 R11: 0000000000000000 R12: ffff91ed7fffcd00
[  132.132344][ T1597] R13: ffffd6e140000000 R14: 0000000000469c68 R15: ffffd6e151a71a00
[  132.141209][ T1597] FS:  00007f1545154700(0000) GS:ffff91f16f600000(0000) knlGS:0000000000000000
[  132.151143][ T1597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  132.158433][ T1597] CR2: ffffb46c03d9ffe8 CR3: 00000004587e6006 CR4: 00000000003606e0
[  132.167299][ T1597] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  132.176166][ T1597] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  132.185027][ T1597] Call Trace:
[  132.188628][ T1597]  __free_pages_ok+0x143/0x2c0
[  132.193881][ T1597]  skb_release_data+0x8e/0x140
[  132.199131][ T1597]  ? skb_release_data+0xad/0x140
[  132.204566][ T1597]  kfree_skb+0x32/0xb0

[...]

[  135.889113][ T1597]  skb_release_data+0xad/0x140
[  135.894363][ T1597]  ? skb_release_data+0xad/0x140
[  135.899806][ T1597]  kfree_skb+0x32/0xb0
[  135.904279][ T1597]  skb_release_data+0xad/0x140
[  135.909528][ T1597]  ? skb_release_data+0xad/0x140
[  135.914972][ T1597]  kfree_skb+0x32/0xb0
[  135.919444][ T1597]  skb_release_data+0xad/0x140
[  135.924694][ T1597]  ? skb_release_data+0xad/0x140
[  135.930138][ T1597]  kfree_skb+0x32/0xb0
[  135.934610][ T1597]  skb_release_data+0xad/0x140
[  135.939860][ T1597]  ? skb_release_data+0xad/0x140
[  135.945295][ T1597]  kfree_skb+0x32/0xb0
[  135.949767][ T1597]  skb_release_data+0xad/0x140
[  135.955017][ T1597]  __kfree_skb+0xe/0x20
[  135.959578][ T1597]  tcp_disconnect+0xd6/0x4d0
[  135.964632][ T1597]  tcp_close+0xf4/0x430
[  135.969200][ T1597]  ? tcp_check_oom+0xf0/0xf0
[  135.974255][ T1597]  tls_sk_proto_close+0xe4/0x1e0 [tls]
[  135.980283][ T1597]  inet_release+0x36/0x60
[  135.985047][ T1597]  __sock_release+0x37/0xa0
[  135.990004][ T1597]  sock_close+0x11/0x20
[  135.994574][ T1597]  __fput+0xa2/0x1d0
[  135.998853][ T1597]  task_work_run+0x89/0xb0
[  136.003715][ T1597]  exit_to_usermode_loop+0x9a/0xa0
[  136.009345][ T1597]  do_syscall_64+0xc0/0xf0
[  136.014207][ T1597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  136.020710][ T1597] RIP: 0033:0x7f1546cb5447
[  136.025570][ T1597] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c24
[  136.047476][ T1597] RSP: 002b:00007f1545153ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[  136.056827][ T1597] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007f1546cb5447
[  136.065692][ T1597] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
[  136.074556][ T1597] RBP: 00007f1538000b20 R08: 0000000000000008 R09: 0000000000000000
[  136.083419][ T1597] R10: 00007f1545153bc0 R11: 0000000000000293 R12: 00005631f41cf1a0
[  136.092285][ T1597] R13: 00005631f41cf1b8 R14: 00007f1538003330 R15: 00007f1538003330
[  136.101151][ T1597] Modules linked in: ctr ghash_generic gf128mul gcm rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bis
[  136.150271][ T1597] ---[ end trace 67081a0c8ea38611 ]---


This is hurl <> nginx running over loopback doing a 100 MB GET.
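
The repeating kfree_skb / skb_release_data pairs in the trace are the shape
one gets when freeing a buffer leads to freeing another buffer it refers to,
one level per pair of stack frames. The sketch below is a user-space model
of that pattern only, not a diagnosis of this bug and not a proposed fix:
struct node, make_chain(), free_recursive() and free_iterative() are
hypothetical names, and the assumption that the buffers form a simple nested
chain is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: each node owns at most one nested child. */
struct node {
	struct node *child;
};

/* Build a chain of 'depth' nodes, each owning the next one. */
static struct node *make_chain(size_t depth)
{
	struct node *head = NULL;

	for (size_t i = 0; i < depth; i++) {
		struct node *n = malloc(sizeof(*n));

		if (n == NULL)
			abort();
		n->child = head;
		head = n;
	}
	return head;
}

/*
 * Recursive free: one stack frame per nesting level, so stack usage grows
 * linearly with the chain depth. Returns the number of nodes freed, which
 * here equals the recursion depth reached.
 */
static size_t free_recursive(struct node *n)
{
	size_t below;

	if (n == NULL)
		return 0;
	below = free_recursive(n->child);
	free(n);
	return below + 1;
}

/* Iterative free: constant stack usage regardless of chain depth. */
static size_t free_iterative(struct node *n)
{
	size_t freed = 0;

	while (n != NULL) {
		struct node *next = n->child;

		free(n);
		n = next;
		freed++;
	}
	return freed;
}
```

With a small fixed-size stack, the recursive variant is only safe while the
nesting depth stays bounded; an unexpectedly deep chain would walk off the
end of the stack in the way the guard-page report above describes.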

🙄

Thread overview: 11+ messages
2019-03-22 15:56 [PATCH v3 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Eric Dumazet
2019-03-22 15:56 ` [PATCH v3 net-next 1/3] net: convert rps_needed and rfs_needed to new static branch api Eric Dumazet
2019-03-22 15:56 ` [PATCH v3 net-next 2/3] tcp: add one skb cache for tx Eric Dumazet
2019-03-22 15:56 ` [PATCH v3 net-next 3/3] tcp: add one skb cache for rx Eric Dumazet
2019-04-03  1:17   ` Jakub Kicinski [this message]
2019-04-03  8:15     ` Eric Dumazet
2019-04-12 14:43   ` [tcp] 01b4c2aab8: lmbench3.TCP.socket.bandwidth.10MB.MB/sec -20.2% regression kernel test robot
2019-04-12 14:43     ` kernel test robot
2019-03-22 16:37 ` [PATCH v3 net-next 0/3] tcp: add rx/tx cache to reduce lock contention Tariq Toukan
2019-03-22 16:55   ` Eric Dumazet
2019-03-24  1:58 ` David Miller
