linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frank van der Linden <fllinden@amazon.com>
To: Daire Byrne <daire@dneg.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	bfields <bfields@fieldses.org>,
	linux-nfs <linux-nfs@vger.kernel.org>,
	linux-cachefs <linux-cachefs@redhat.com>
Subject: Re: Adventures in NFS re-exporting
Date: Wed, 23 Sep 2020 21:01:57 +0000	[thread overview]
Message-ID: <20200923210157.GA1650@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com> (raw)
In-Reply-To: <1790619463.44171163.1600892707423.JavaMail.zimbra@dneg.com>

On Wed, Sep 23, 2020 at 09:25:07PM +0100, Daire Byrne wrote:
> 
> ----- On 22 Sep, 2020, at 17:43, Chuck Lever chuck.lever@oracle.com wrote:
> >> On Sep 17, 2020, at 4:23 PM, Frank van der Linden <fllinden@amazon.com> wrote:
> >>
> >> On Thu, Sep 17, 2020 at 03:09:31PM -0400, bfields wrote:
> >>>
> >>> On Thu, Sep 17, 2020 at 05:01:11PM +0100, Daire Byrne wrote:
> >>>>
> >>>> ----- On 15 Sep, 2020, at 18:21, bfields bfields@fieldses.org wrote:
> >>>>
> >>>>>> 4) With an NFSv4 re-export, lots of open/close requests (hundreds per
> >>>>>> second) quickly eat up the CPU on the re-export server and perf top
> >>>>>> shows we are mostly in native_queued_spin_lock_slowpath.
> >>>>>
> >>>>> Any statistics on who's calling that function?
> >>>>
> >>>> I've always struggled to reproduce this with a simple open/close simulation, so
> >>>> I suspect some other operations need to be mixed in too. But I have one
> >>>> production workload that I know has lots of opens & closes (buggy software)
> >>>> included in amongst the usual reads, writes etc.
> >>>>
> >>>> With just 40 clients mounting the reexport server (v5.7.6) using NFSv4.2, we see
> >>>> the CPU of the nfsd threads increase rapidly and by the time we have 100
> >>>> clients, we have maxed out the 32 cores of the server with most of that in
> >>>> native_queued_spin_lock_slowpath.
> >>>
> >>> That sounds a lot like what Frank Van der Linden reported:
> >>>
> >>>        https://lore.kernel.org/linux-nfs/20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/
> >>>
> >>> It looks like a bug in the filehandle caching code.
> >>>
> >>> --b.
> >>
> >> Yes, that does look like the same one.
> >>
> >> I still think that not caching v4 files at all may be the best way to go
> >> here, since the intent of the filecache code was to speed up v2/v3 I/O,
> >> where you end up doing a lot of opens/closes, but it doesn't make as
> >> much sense for v4.
> >>
> >> However, short of that, I tested a local patch a few months back, that
> >> I never posted here, so I'll do so now. It just makes v4 opens in to
> >> 'long term' opens, which do not get put on the LRU, since that doesn't
> >> make sense (they are in the hash table, so they are still cached).
> >>
> >> Also, the file caching code seems to walk the LRU a little too often,
> >> but that's another issue - and this change keeps the LRU short, so it's
> >> not a big deal.
> >>
> >> I don't particularly love this patch, but it does keep the LRU short, and
> >> did significantly speed up my testcase (by about 50%). So, maybe you can
> >> give it a try.
> >>
> >> I'll also attach a second patch, that converts the hash table to an rhashtable,
> >> which automatically grows and shrinks in size with usage. That patch also
> >> helped, but not by nearly as much (I think it yielded another 10%).
> >
> > For what it's worth, I applied your two patches to my test server, along
> > with my patch that force-closes cached file descriptors during NFSv4
> > CLOSE processing. The patch combination improves performance (faster
> > elapsed time) for my workload as well.
> 
> I tested Frank's NFSv4 filecache patches with some production workloads and I've hit the below refcount issue a couple of times in the last 48 hours with v5.8.10. This server was re-exporting an NFS client mount at the time.
> 
> Apologies for the spam if I've just hit something unrelated to the patches that is present in v5.8.10.... In truth, I have not used this kernel version before with this workload and just patched it because I had it ready to go. I'll remove the 2 patches and verify.
> 
> Daire
> 
> 
> [ 8930.027838] ------------[ cut here ]------------
> [ 8930.032769] refcount_t: addition on 0; use-after-free.
> [ 8930.038251] WARNING: CPU: 2 PID: 3624 at lib/refcount.c:25 refcount_warn_saturate+0x6e/0xf0
> [ 8930.046799] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4 dns_resolver act_mirred sch_ingress ifb nfsv3 nfs cls_u32 sch_fq sch_prio cachefiles fscache ext4 mbcache jbd2 sb_edac rapl sg virtio_rng i2c_piix4 input_leds nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi 8021q garp mrp virtio_net net_failover failover virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel scsi_transport_iscsi crypto_simd cryptd glue_helper virtio_pci virtio_ring virtio serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
> [ 8930.098703] CPU: 2 PID: 3624 Comm: nfsd Tainted: G        W         5.8.10-1.dneg.x86_64 #1
> [ 8930.107391] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 8930.116775] RIP: 0010:refcount_warn_saturate+0x6e/0xf0
> [ 8930.122078] Code: 49 91 18 01 01 e8 57 d6 c2 ff 0f 0b 5d c3 80 3d 38 91 18 01 00 75 d1 48 c7 c7 d0 5c 13 82 c6 05 28 91 18 01 01 e8 37 d6 c2 ff <0f> 0b 5d c3 80 3d 1a 91 18 01 00 75 b1 48 c7 c7 a8 5c 13 82 c6 05
> [ 8930.141107] RSP: 0018:ffffc900012efc70 EFLAGS: 00010282
> [ 8930.146497] RAX: 0000000000000000 RBX: ffff888cc12811e0 RCX: 0000000000000000
> [ 8930.153793] RDX: ffff888d0bca8f20 RSI: ffff888d0bc98d40 RDI: ffff888d0bc98d40
> [ 8930.161087] RBP: ffffc900012efc70 R08: ffff888d0bc98d40 R09: 0000000000000019
> [ 8930.168380] R10: 000000000000072e R11: ffffc900012efad8 R12: ffff888b8bdad600
> [ 8930.175680] R13: ffff888cd428ebe0 R14: ffff8889264f9170 R15: 0000000000000000
> [ 8930.182976] FS:  0000000000000000(0000) GS:ffff888d0bc80000(0000) knlGS:0000000000000000
> [ 8930.191231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8930.197139] CR2: 00007fbe43ca1248 CR3: 0000000ce48ee004 CR4: 00000000001606e0
> [ 8930.204436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 8930.211734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 8930.219027] Call Trace:
> [ 8930.221665]  nfsd4_process_open2+0xa48/0xec0 [nfsd]
> [ 8930.226724]  ? nfsd_permission+0x6b/0x100 [nfsd]
> [ 8930.231524]  ? fh_verify+0x167/0x210 [nfsd]
> [ 8930.235893]  nfsd4_open+0x407/0x820 [nfsd]
> [ 8930.240248]  nfsd4_proc_compound+0x3c2/0x760 [nfsd]
> [ 8930.245296]  ? nfsd4_decode_compound.constprop.0+0x3a9/0x450 [nfsd]
> [ 8930.251734]  nfsd_dispatch+0xe2/0x220 [nfsd]
> [ 8930.256213]  svc_process_common+0x47b/0x6f0 [sunrpc]
> [ 8930.261355]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
> [ 8930.266707]  ? nfsd_svc+0x330/0x330 [nfsd]
> [ 8930.270981]  svc_process+0xc5/0x100 [sunrpc]
> [ 8930.275423]  nfsd+0xe8/0x150 [nfsd]
> [ 8930.280028]  kthread+0x114/0x150
> [ 8930.283434]  ? nfsd_destroy+0x60/0x60 [nfsd]
> [ 8930.287875]  ? kthread_park+0x90/0x90
> [ 8930.291700]  ret_from_fork+0x22/0x30
> [ 8930.295447] ---[ end trace c551536c3520545c ]---

It's entirely possible that my patch introduces a refcounting error - it was
intended as a proof-of-concept on how to fix the LRU locking issue for v4
open file caching (while keeping it enabled) - which is why I didn't
"formally" send it in.

Having said that, I don't immediately see the problem.

Maybe try it without the rhashtable patch, that is much less of an
optimization.

The problem would have to be nf_ref as part of nfsd_file, or fi_ref as part
of nfs4_file. If it's the latter, it's probably the rhashtable change.

- Frank

  reply	other threads:[~2020-09-23 21:02 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden [this message]
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200923210157.GA1650@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com \
    --to=fllinden@amazon.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=daire@dneg.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).