Date: Tue, 2 Apr 2019 18:17:38 -0700
From: Jakub Kicinski
To: Eric Dumazet
Cc: "David S . Miller", netdev, Eric Dumazet, Soheil Hassas Yeganeh, Willem de Bruijn
Subject: Re: [PATCH v3 net-next 3/3] tcp: add one skb cache for rx
Message-ID: <20190402181738.09980a62@cakuba.hsd1.ca.comcast.net>
In-Reply-To: <20190322155640.248144-4-edumazet@google.com>
References: <20190322155640.248144-1-edumazet@google.com> <20190322155640.248144-4-edumazet@google.com>
Organization: Netronome Systems, Ltd.
X-Mailing-List: netdev@vger.kernel.org

On Fri, 22 Mar 2019 08:56:40 -0700, Eric Dumazet wrote:
> Often times, recvmsg() system calls and BH handling for a particular
> TCP socket are done on different cpus.
>
> This means the incoming skb had to be allocated on a cpu,
> but freed on another.
>
> This incurs a high spinlock contention in slab layer for small rpc,
> but also a high number of cache line ping pongs for larger packets.
>
> A full size GRO packet might use 45 page fragments, meaning
> that up to 45 put_page() can be involved.
>
> More over performing the __kfree_skb() in the recvmsg() context
> adds a latency for user applications, and increase probability
> of trapping them in backlog processing, since the BH handler
> might found the socket owned by the user.
>
> This patch, combined with the prior one increases the rpc
> performance by about 10 % on servers with large number of cores.
>
> (tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps
> instead of 8 Mpps)
>
> This also increases single bulk flow performance on 40Gbit+ links,
> since in this case there are often two cpus working in tandem :
>
> - CPU handling the NIC rx interrupts, feeding the receive queue,
> and (after this patch) freeing the skbs that were consumed.
>
> - CPU in recvmsg() system call, essentially 100 % busy copying out
> data to user space.
>
> Having at most one skb in a per-socket cache has very little risk
> of memory exhaustion, and since it is protected by socket lock,
> its management is essentially free.
>
> Note that if rps/rfs is used, we do not enable this feature, because
> there is high chance that the same cpu is handling both the recvmsg()
> system call and the TCP rx path, but that another cpu did the skb
> allocations in the device driver right before the RPS/RFS logic.
>
> To properly handle this case, it seems we would need to record
> on which cpu skb was allocated, and use a different channel
> to give skbs back to this cpu.
>
> Signed-off-by: Eric Dumazet
> Acked-by: Soheil Hassas Yeganeh
> Acked-by: Willem de Bruijn

Hi Eric!

Somehow this appears to make ktls run out of stack:

[ 132.022746][ T1597] BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
[ 132.034492][ T1597] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
[ 132.042733][ T1597] CPU: 1 PID: 1597 Comm: hurl Not tainted 5.1.0-rc2-perf-00642-g179e7e21995d-dirty #683
[ 132.053500][ T1597] Hardware name: ...
[ 132.062714][ T1597] RIP: 0010:free_one_page+0x2b/0x490
[ 132.068526][ T1597] Code: 1f 44 00 00 41 57 48 8d 87 40 05 00 00 49 89 f7 41 56 49 89 d6 41 55 41 54 49 89 fc 48 89 c7 55 89 cd 532
[ 132.090369][ T1597] RSP: 0018:ffffb46c03d9fff8 EFLAGS: 00010092
[ 132.097054][ T1597] RAX: ffff91ed7fffd240 RBX: 0000000000000000 RCX: 0000000000000003
[ 132.105874][ T1597] RDX: 0000000000469c68 RSI: ffffd6e151a71a00 RDI: ffff91ed7fffd240
[ 132.114697][ T1597] RBP: 0000000000000003 R08: 0000000000000000 R09: dead000000000200
[ 132.123521][ T1597] R10: ffffd6e151a71808 R11: 0000000000000000 R12: ffff91ed7fffcd00
[ 132.132344][ T1597] R13: ffffd6e140000000 R14: 0000000000469c68 R15: ffffd6e151a71a00
[ 132.141209][ T1597] FS: 00007f1545154700(0000) GS:ffff91f16f600000(0000) knlGS:0000000000000000
[ 132.151143][ T1597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 132.158433][ T1597] CR2: ffffb46c03d9ffe8 CR3: 00000004587e6006 CR4: 00000000003606e0
[ 132.167299][ T1597] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 132.176166][ T1597] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 132.185027][ T1597] Call Trace:
[ 132.188628][ T1597] __free_pages_ok+0x143/0x2c0
[ 132.193881][ T1597] skb_release_data+0x8e/0x140
[ 132.199131][ T1597] ? skb_release_data+0xad/0x140
[ 132.204566][ T1597] kfree_skb+0x32/0xb0
[...]
[ 135.889113][ T1597] skb_release_data+0xad/0x140
[ 135.894363][ T1597] ? skb_release_data+0xad/0x140
[ 135.899806][ T1597] kfree_skb+0x32/0xb0
[ 135.904279][ T1597] skb_release_data+0xad/0x140
[ 135.909528][ T1597] ? skb_release_data+0xad/0x140
[ 135.914972][ T1597] kfree_skb+0x32/0xb0
[ 135.919444][ T1597] skb_release_data+0xad/0x140
[ 135.924694][ T1597] ? skb_release_data+0xad/0x140
[ 135.930138][ T1597] kfree_skb+0x32/0xb0
[ 135.934610][ T1597] skb_release_data+0xad/0x140
[ 135.939860][ T1597] ? skb_release_data+0xad/0x140
[ 135.945295][ T1597] kfree_skb+0x32/0xb0
[ 135.949767][ T1597] skb_release_data+0xad/0x140
[ 135.955017][ T1597] __kfree_skb+0xe/0x20
[ 135.959578][ T1597] tcp_disconnect+0xd6/0x4d0
[ 135.964632][ T1597] tcp_close+0xf4/0x430
[ 135.969200][ T1597] ? tcp_check_oom+0xf0/0xf0
[ 135.974255][ T1597] tls_sk_proto_close+0xe4/0x1e0 [tls]
[ 135.980283][ T1597] inet_release+0x36/0x60
[ 135.985047][ T1597] __sock_release+0x37/0xa0
[ 135.990004][ T1597] sock_close+0x11/0x20
[ 135.994574][ T1597] __fput+0xa2/0x1d0
[ 135.998853][ T1597] task_work_run+0x89/0xb0
[ 136.003715][ T1597] exit_to_usermode_loop+0x9a/0xa0
[ 136.009345][ T1597] do_syscall_64+0xc0/0xf0
[ 136.014207][ T1597] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 136.020710][ T1597] RIP: 0033:0x7f1546cb5447
[ 136.025570][ T1597] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c24
[ 136.047476][ T1597] RSP: 002b:00007f1545153ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 136.056827][ T1597] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007f1546cb5447
[ 136.065692][ T1597] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
[ 136.074556][ T1597] RBP: 00007f1538000b20 R08: 0000000000000008 R09: 0000000000000000
[ 136.083419][ T1597] R10: 00007f1545153bc0 R11: 0000000000000293 R12: 00005631f41cf1a0
[ 136.092285][ T1597] R13: 00005631f41cf1b8 R14: 00007f1538003330 R15: 00007f1538003330
[ 136.101151][ T1597] Modules linked in: ctr ghash_generic gf128mul gcm rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bis
[ 136.150271][ T1597] ---[ end trace 67081a0c8ea38611 ]---

This is hurl <> nginx running over loopback doing a 100 MB GET. 🙄
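
For readers less used to this kind of trace: the kfree_skb / skb_release_data pair repeats until the stack guard page is hit, which is the general shape of a teardown that recurses once per nested buffer. The sketch below is a user-space toy in Python, not kernel code, and it makes no claim about the actual cause of the overflow reported here. ToyBuf, make_chain, free_recursive and free_iterative are all hypothetical names invented for the illustration. It only shows that a recursive free uses one call frame per nesting level, while a loop frees the same chain at constant depth.

```python
class ToyBuf:
    """Hypothetical stand-in for a buffer that may own one nested buffer."""

    def __init__(self, child=None):
        self.child = child
        self.freed = False


def make_chain(n):
    """Build n buffers, each one owning the next."""
    head = None
    for _ in range(n):
        head = ToyBuf(head)
    return head


def free_recursive(buf, depth=1):
    """Free buf and everything it owns, one call frame per nesting level.

    Returns the deepest call nesting reached (0 for an empty chain).
    """
    if buf is None:
        return depth - 1
    deepest = free_recursive(buf.child, depth + 1)
    buf.freed = True
    return max(deepest, depth)


def free_iterative(buf):
    """Free the same kind of chain with a loop; returns the number freed."""
    count = 0
    while buf is not None:
        nxt = buf.child
        buf.freed = True
        buf.child = None  # unlink so nothing is left nested
        buf = nxt
        count += 1
    return count


if __name__ == "__main__":
    print(free_recursive(make_chain(200)))  # call depth grows with chain length
    print(free_iterative(make_chain(200)))  # same work, no nested calls

    chain = make_chain(5000)
    try:
        free_recursive(chain)
    except RecursionError:
        # Python's analogue of running out of stack; fall back to the loop.
        print("recursive teardown ran out of stack, freeing iteratively")
        free_iterative(chain)
```

With Python's default recursion limit the 5000-deep chain cannot be freed recursively, while the loop handles it without trouble. Whether anything comparable is needed in the kernel path above is a question for the patch author.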