linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daire Byrne <daire@dneg.com>
To: Frank van der Linden <fllinden@amazon.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	bfields <bfields@fieldses.org>,
	linux-nfs <linux-nfs@vger.kernel.org>,
	linux-cachefs <linux-cachefs@redhat.com>
Subject: Re: Adventures in NFS re-exporting
Date: Sat, 26 Sep 2020 10:00:22 +0100 (BST)	[thread overview]
Message-ID: <108670779.52656705.1601110822013.JavaMail.zimbra@dneg.com> (raw)
In-Reply-To: <20200923210157.GA1650@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com>

----- On 23 Sep, 2020, at 22:01, Frank van der Linden fllinden@amazon.com wrote:

> On Wed, Sep 23, 2020 at 09:25:07PM +0100, Daire Byrne wrote:
>> 
>> ----- On 22 Sep, 2020, at 17:43, Chuck Lever chuck.lever@oracle.com wrote:
>> >> On Sep 17, 2020, at 4:23 PM, Frank van der Linden <fllinden@amazon.com> wrote:
>> >>
>> >> On Thu, Sep 17, 2020 at 03:09:31PM -0400, bfields wrote:
>> >>>
>> >>> On Thu, Sep 17, 2020 at 05:01:11PM +0100, Daire Byrne wrote:
>> >>>>
>> >>>> ----- On 15 Sep, 2020, at 18:21, bfields bfields@fieldses.org wrote:
>> >>>>
>> >>>>>> 4) With an NFSv4 re-export, lots of open/close requests (hundreds per
>> >>>>>> second) quickly eat up the CPU on the re-export server and perf top
>> >>>>>> shows we are mostly in native_queued_spin_lock_slowpath.
>> >>>>>
>> >>>>> Any statistics on who's calling that function?
>> >>>>
>> >>>> I've always struggled to reproduce this with a simple open/close simulation, so
>> >>>> I suspect some other operations need to be mixed in too. But I have one
>> >>>> production workload that I know has lots of opens & closes (buggy software)
>> >>>> included in amongst the usual reads, writes etc.
>> >>>>
>> >>>> With just 40 clients mounting the reexport server (v5.7.6) using NFSv4.2, we see
>> >>>> the CPU of the nfsd threads increase rapidly and by the time we have 100
>> >>>> clients, we have maxed out the 32 cores of the server with most of that in
>> >>>> native_queued_spin_lock_slowpath.
>> >>>
>> >>> That sounds a lot like what Frank Van der Linden reported:
>> >>>
>> >>>        https://lore.kernel.org/linux-nfs/20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/
>> >>>
>> >>> It looks like a bug in the filehandle caching code.
>> >>>
>> >>> --b.
>> >>
>> >> Yes, that does look like the same one.
>> >>
>> >> I still think that not caching v4 files at all may be the best way to go
>> >> here, since the intent of the filecache code was to speed up v2/v3 I/O,
>> >> where you end up doing a lot of opens/closes, but it doesn't make as
>> >> much sense for v4.
>> >>
>> >> However, short of that, I tested a local patch a few months back, that
>> >> I never posted here, so I'll do so now. It just makes v4 opens in to
>> >> 'long term' opens, which do not get put on the LRU, since that doesn't
>> >> make sense (they are in the hash table, so they are still cached).
>> >>
>> >> Also, the file caching code seems to walk the LRU a little too often,
>> >> but that's another issue - and this change keeps the LRU short, so it's
>> >> not a big deal.
>> >>
>> >> I don't particularly love this patch, but it does keep the LRU short, and
>> >> did significantly speed up my testcase (by about 50%). So, maybe you can
>> >> give it a try.
>> >>
>> >> I'll also attach a second patch, that converts the hash table to an rhashtable,
>> >> which automatically grows and shrinks in size with usage. That patch also
>> >> helped, but not by nearly as much (I think it yielded another 10%).
>> >
>> > For what it's worth, I applied your two patches to my test server, along
>> > with my patch that force-closes cached file descriptors during NFSv4
>> > CLOSE processing. The patch combination improves performance (faster
>> > elapsed time) for my workload as well.
>> 
>> I tested Frank's NFSv4 filecache patches with some production workloads and I've
>> hit the below refcount issue a couple of times in the last 48 hours with
>> v5.8.10. This server was re-exporting an NFS client mount at the time.
>> 
>> Apologies for the spam if I've just hit something unrelated to the patches that
>> is present in v5.8.10.... In truth, I have not used this kernel version before
>> with this workload and just patched it because I had it ready to go. I'll
>> remove the 2 patches and verify.
>> 
>> Daire
>> 
>> 
>> [ 8930.027838] ------------[ cut here ]------------
>> [ 8930.032769] refcount_t: addition on 0; use-after-free.
>> [ 8930.038251] WARNING: CPU: 2 PID: 3624 at lib/refcount.c:25
>> refcount_warn_saturate+0x6e/0xf0
>> [ 8930.046799] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4
>> dns_resolver act_mirred sch_ingress ifb nfsv3 nfs cls_u32 sch_fq sch_prio
>> cachefiles fscache ext4 mbcache jbd2 sb_edac rapl sg virtio_rng i2c_piix4
>> input_leds nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs
>> libcrc32c sd_mod t10_pi 8021q garp mrp virtio_net net_failover failover
>> virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
>> aesni_intel scsi_transport_iscsi crypto_simd cryptd glue_helper virtio_pci
>> virtio_ring virtio serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
>> [ 8930.098703] CPU: 2 PID: 3624 Comm: nfsd Tainted: G        W
>> 5.8.10-1.dneg.x86_64 #1
>> [ 8930.107391] Hardware name: Google Google Compute Engine/Google Compute
>> Engine, BIOS Google 01/01/2011
>> [ 8930.116775] RIP: 0010:refcount_warn_saturate+0x6e/0xf0
>> [ 8930.122078] Code: 49 91 18 01 01 e8 57 d6 c2 ff 0f 0b 5d c3 80 3d 38 91 18 01
>> 00 75 d1 48 c7 c7 d0 5c 13 82 c6 05 28 91 18 01 01 e8 37 d6 c2 ff <0f> 0b 5d c3
>> 80 3d 1a 91 18 01 00 75 b1 48 c7 c7 a8 5c 13 82 c6 05
>> [ 8930.141107] RSP: 0018:ffffc900012efc70 EFLAGS: 00010282
>> [ 8930.146497] RAX: 0000000000000000 RBX: ffff888cc12811e0 RCX: 0000000000000000
>> [ 8930.153793] RDX: ffff888d0bca8f20 RSI: ffff888d0bc98d40 RDI: ffff888d0bc98d40
>> [ 8930.161087] RBP: ffffc900012efc70 R08: ffff888d0bc98d40 R09: 0000000000000019
>> [ 8930.168380] R10: 000000000000072e R11: ffffc900012efad8 R12: ffff888b8bdad600
>> [ 8930.175680] R13: ffff888cd428ebe0 R14: ffff8889264f9170 R15: 0000000000000000
>> [ 8930.182976] FS:  0000000000000000(0000) GS:ffff888d0bc80000(0000)
>> knlGS:0000000000000000
>> [ 8930.191231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 8930.197139] CR2: 00007fbe43ca1248 CR3: 0000000ce48ee004 CR4: 00000000001606e0
>> [ 8930.204436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 8930.211734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 8930.219027] Call Trace:
>> [ 8930.221665]  nfsd4_process_open2+0xa48/0xec0 [nfsd]
>> [ 8930.226724]  ? nfsd_permission+0x6b/0x100 [nfsd]
>> [ 8930.231524]  ? fh_verify+0x167/0x210 [nfsd]
>> [ 8930.235893]  nfsd4_open+0x407/0x820 [nfsd]
>> [ 8930.240248]  nfsd4_proc_compound+0x3c2/0x760 [nfsd]
>> [ 8930.245296]  ? nfsd4_decode_compound.constprop.0+0x3a9/0x450 [nfsd]
>> [ 8930.251734]  nfsd_dispatch+0xe2/0x220 [nfsd]
>> [ 8930.256213]  svc_process_common+0x47b/0x6f0 [sunrpc]
>> [ 8930.261355]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
>> [ 8930.266707]  ? nfsd_svc+0x330/0x330 [nfsd]
>> [ 8930.270981]  svc_process+0xc5/0x100 [sunrpc]
>> [ 8930.275423]  nfsd+0xe8/0x150 [nfsd]
>> [ 8930.280028]  kthread+0x114/0x150
>> [ 8930.283434]  ? nfsd_destroy+0x60/0x60 [nfsd]
>> [ 8930.287875]  ? kthread_park+0x90/0x90
>> [ 8930.291700]  ret_from_fork+0x22/0x30
>> [ 8930.295447] ---[ end trace c551536c3520545c ]---
> 
> It's entirely possible that my patch introduces a refcounting error - it was
> intended as a proof-of-concept on how to fix the LRU locking issue for v4
> open file caching (while keeping it enabled) - which is why I didn't
> "formally" send it in.
> 
> Having said that, I don't immediately see the problem.
> 
> Maybe try it without the rhashtable patch, that is much less of an
> optimization.
> 
> The problem would have to be nf_ref as part of nfsd_file, or fi_ref as part
> of nfs4_file. If it's the latter, it's probably the rhashtable change.

Thanks Frank; I think you are right in that it seems to be a problem with the rhashtable patch. Another 48 hours using the same workload with just the main patch and I have not seen the same issue again so far.

Also, it still has the effect of reducing the CPU usage dramatically such that there are plenty of cores still left idle. This is actually helping us buy some more time while we fix our obviously broken software so that it doesn't open/close so crazily.

So, many thanks for that.

Daire

  reply	other threads:[~2020-09-26  9:00 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne [this message]
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=108670779.52656705.1601110822013.JavaMail.zimbra@dneg.com \
    --to=daire@dneg.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=fllinden@amazon.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).