Subject: Re: [PATCH] ceph: ensure we flush delayed caps when unmounting
From: Jeff Layton
To: Luis Henriques
Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com
Date: Fri, 04 Jun 2021 08:26:21 -0400

On Fri, 2021-06-04 at 10:35 +0100, Luis Henriques wrote:
> On Thu, Jun 03, 2021 at 12:57:22PM -0400, Jeff Layton wrote:
> > On Thu, 2021-06-03 at 09:48 -0400, Jeff Layton wrote:
> > > I've seen some warnings when testing recently that indicate that
> > > there are caps still delayed on the delayed list even after we've
> > > started unmounting.
> > >
> > > When checking delayed caps, process the whole list if we're
> > > unmounting, and check for delayed caps after setting the stopping
> > > var and flushing dirty caps.
> > >
> > > Signed-off-by: Jeff Layton
> > > ---
> > >  fs/ceph/caps.c       | 3 ++-
> > >  fs/ceph/mds_client.c | 1 +
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index a5e93b185515..68b4c6dfe4db 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -4236,7 +4236,8 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
> > >  		ci = list_first_entry(&mdsc->cap_delay_list,
> > >  				      struct ceph_inode_info,
> > >  				      i_cap_delay_list);
> > > -		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > > +		if (!mdsc->stopping &&
> > > +		    (ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
> > >  		    time_before(jiffies, ci->i_hold_caps_max))
> > >  			break;
> > >  		list_del_init(&ci->i_cap_delay_list);
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index e5af591d3bd4..916af5497829 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -4691,6 +4691,7 @@ void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc)
> > >
> > >  	lock_unlock_sessions(mdsc);
> > >  	ceph_flush_dirty_caps(mdsc);
> > > +	ceph_check_delayed_caps(mdsc);
> > >  	wait_requests(mdsc);
> > >
> > >  	/*
> >
> > I'm going to self-NAK this patch for now. Initially this looked good
> > in testing, but I think it's just papering over the real problem,
> > which is that ceph_async_iput can queue a job to a workqueue after
> > the point where we've flushed that workqueue on umount.
>
> Ah, yeah. I think I saw this a few times with generic/014 (and I
> believe we chatted about it on irc). I've been on and off trying to
> figure out the way to fix it but it's really tricky.
>

Yeah, that's putting it mildly. The biggest issue here is the
session->s_mutex, which is held over large swaths of the code, but it's
not fully clear what it protects. The original patch that added
ceph_async_iput did it to avoid the session mutex that gets held for
ceph_iterate_session_caps.

My current thinking is that we probably don't need to hold the session
mutex over that function in some cases, if we can guarantee that the
ceph_cap objects we're iterating over don't go away when the lock is
dropped. So, I'm trying to add some refcounting to the ceph_cap
structures themselves to see if that helps. It may turn out to be a
dead end, but if we don't chip away at the edges of the fundamental
problem, we'll never get there...

-- 
Jeff Layton
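
The umount race cited in the self-NAK has roughly the following shape.
This is a minimal, hypothetical sketch built on the generic workqueue
API; the example_* names are illustrative and are not the actual ceph
code paths:

#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;  /* stand-in for the fs workqueue */

static void example_iput_work(struct work_struct *work)
{
	/* drops the final inode reference asynchronously */
}

/* Caller side: can run concurrently with unmount. */
static void example_async_iput(struct work_struct *work)
{
	/*
	 * If this queue_work() lands after the flush below, the work
	 * item is still pending (or running) after unmount believes
	 * the queue has been drained.
	 */
	queue_work(example_wq, work);
}

/* Unmount side. */
static void example_pre_umount(void)
{
	/* flush_workqueue() only waits for work queued before this call */
	flush_workqueue(example_wq);
	/*
	 * Nothing prevents example_async_iput() from queueing more
	 * work after this point; that is the window the self-NAK
	 * refers to.
	 */
}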
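The refcounting idea in the last paragraph could be built on the
kernel's generic kref API. A minimal sketch, assuming a simplified
stand-in for struct ceph_cap (the real structure and its lifetime rules
are more involved):

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>

struct example_cap {		/* illustrative stand-in for struct ceph_cap */
	struct kref ref;
	/* ... cap state ... */
};

static void example_cap_release(struct kref *ref)
{
	struct example_cap *cap = container_of(ref, struct example_cap, ref);

	kfree(cap);
}

static struct example_cap *example_cap_get(struct example_cap *cap)
{
	kref_get(&cap->ref);	/* pin the cap across a lock drop */
	return cap;
}

static void example_cap_put(struct example_cap *cap)
{
	kref_put(&cap->ref, example_cap_release);
}

An iterator could then take a reference before releasing s_mutex, use
the cap while unlocked, and drop the reference after retaking the lock,
so the object cannot be freed during the window where the lock is not
held.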