Message-ID: <3f6d555b3a2e6ccbc82631bc58c18de0be10383e.camel@kernel.org>
Subject: Re: [PATCH v5 0/2] ceph: metrics for opened files, pinned caps and opened inodes
From: Jeff Layton
To: Ilya Dryomov
Cc: Xiubo Li, "Yan, Zheng", Patrick Donnelly, Ceph Development
Date: Sun, 13 Sep 2020 08:45:18 -0400
References: <20200903130140.799392-1-xiubli@redhat.com>
 <449a56624f3dd4e2a4a4cf95cd24d69c53700b6d.camel@kernel.org>
 <9a5c5d2f-d105-21c4-327e-5ad18bf49518@redhat.com>
X-Mailing-List: ceph-devel@vger.kernel.org

On Sun, 2020-09-13 at 12:40 +0200, Ilya Dryomov wrote:
> On Fri, Sep 11, 2020 at 9:46 PM Jeff Layton wrote:
> > On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote:
> > > On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote:
> > > > On 2020/9/10 20:13, Jeff Layton wrote:
> > > > > On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote:
> > > > > > On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li wrote:
> > > > > > > On 2020/9/10 4:34, Ilya Dryomov wrote:
> > > > > > > > On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li wrote:
> > > > > > > > > On 2020/9/3 22:18, Jeff Layton wrote:
> > > > > > > > > > On Thu, 2020-09-03 at 09:01 -0400, xiubli@redhat.com wrote:
> > > > > > > > > > > From: Xiubo Li
> > > > > > > > > > >
> > > > > > > > > > > Changed in V5:
> > > > > > > > > > > - Remove mdsc parsing helpers except the ceph_sb_to_mdsc()
> > > > > > > > > > > - Remove the is_opened member.
> > > > > > > > > > >
> > > > > > > > > > > Changed in V4:
> > > > > > > > > > > - A small fix about the total_inodes.
> > > > > > > > > > >
> > > > > > > > > > > Changed in V3:
> > > > > > > > > > > - Resend for V2 just forgot one patch, which is adding some helpers
> > > > > > > > > > >   support to simplify the code.
> > > > > > > > > > >
> > > > > > > > > > > Changed in V2:
> > > > > > > > > > > - Add number of inodes that have opened files.
> > > > > > > > > > > - Remove the dir metrics and fold into files.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Xiubo Li (2):
> > > > > > > > > > >   ceph: add ceph_sb_to_mdsc helper support to parse the mdsc
> > > > > > > > > > >   ceph: metrics for opened files, pinned caps and opened inodes
> > > > > > > > > > >
> > > > > > > > > > >  fs/ceph/caps.c    | 41 +++++++++++++++++++++++++++++++++++++----
> > > > > > > > > > >  fs/ceph/debugfs.c | 11 +++++++++++
> > > > > > > > > > >  fs/ceph/dir.c     | 20 +++++++-------------
> > > > > > > > > > >  fs/ceph/file.c    | 13 ++++++-------
> > > > > > > > > > >  fs/ceph/inode.c   | 11 ++++++++---
> > > > > > > > > > >  fs/ceph/locks.c   |  2 +-
> > > > > > > > > > >  fs/ceph/metric.c  | 14 ++++++++++++++
> > > > > > > > > > >  fs/ceph/metric.h  |  7 +++++++
> > > > > > > > > > >  fs/ceph/quota.c   | 10 +++++-----
> > > > > > > > > > >  fs/ceph/snap.c    |  2 +-
> > > > > > > > > > >  fs/ceph/super.h   |  6 ++++++
> > > > > > > > > > >  11 files changed, 103 insertions(+), 34 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > Looks good. I went ahead and merged this into testing.
> > > > > > > > > >
> > > > > > > > > > Small merge conflict in quota.c, which I guess is probably due to not
> > > > > > > > > > basing this on testing branch. I also dropped what looks like an
> > > > > > > > > > unrelated hunk in the second patch.
> > > > > > > > > >
> > > > > > > > > > In the future, if you can be sure that patches you post apply cleanly to
> > > > > > > > > > testing branch then that would make things easier.
> > > > > > > > > Okay, will do it.
> > > > > > > > Hi Xiubo,
> > > > > > > >
> > > > > > > > There is a problem with lifetimes here. mdsc isn't guaranteed to exist
> > > > > > > > when ->free_inode() is called. This can lead to crashes on a NULL mdsc
> > > > > > > > in ceph_free_inode() in case of e.g. "umount -f". I know it was Jeff's
> > > > > > > > suggestion to move the decrement of total_inodes into ceph_free_inode(),
> > > > > > > > but it doesn't look like it can be easily deferred past ->evict_inode().
> > > > > > > Okay, I will take a look.
> > > > > > Given that it's just a counter which we don't care about if the
> > > > > > mount is going away, some form of "if (mdsc)" check might do, but
> > > > > > need to make sure that it covers possible races, if any.
> > > > > >
> > > > > Good catch, Ilya.
> > > > >
> > > > > What may be best is to move the increment out of ceph_alloc_inode and
> > > > > instead put it in ceph_set_ino_cb. Then the decrement can go back into
> > > > > ceph_evict_inode.
> > > > >
> > > > Hi Jeff, Ilya
> > > >
> > > > Checked the code, it seems in the ceph_evict_inode() we will also hit
> > > > the same issue.
> > > >
> > > > With the '-f' options when umounting, it will skip the inodes whose
> > > > i_count ref > 0. And then free the fsc/mdsc in ceph. And later the
> > > > iput_final() will call the ceph_evict_inode() and then ceph_free_inode().
> > > >
> > > > Could we just check if !!(sb->s_flags & SB_ACTIVE) is false will we skip
> > > > the counting?
> > > >
> > > >
> > > Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is
> > > called before unmounting.
> > >
> > > If what you're saying is true, then we have bigger problems.
> > > ceph_evict_inode does this today when ci->i_snap_realm is set:
> > >
> > > 	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
> > >
> > > ...and then goes on to use that mdsc pointer.
> > >
> >
> > Now that I look, I don't think that this is a problem. ceph_kill_sb
> > calls generic_shutdown_super, which calls evict_inodes before the client
> > is torn down. That should ensure that the mdsc is still good when evict
> > is called.
> >
> > We will need to move the increment into the iget5_locked "set" function.
> > Maybe we can squash the patch below into yours?
> >
> > ----------------------8<---------------------------
> >
> > ceph: use total_inodes to count hashed inodes instead of allocated ones
> >
> > We can't guarantee that the mdsc will still be around when free_inode is
> > called, so move this into evict_inode instead. The increment then will
> > need to be moved when the thing is hashed, so move that into the set
> > callback.
> >
> > Reported-by: Ilya Dryomov
> > Signed-off-by: Jeff Layton
> > ---
> >  fs/ceph/inode.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 5b9d2ff8af34..39c13fefba8a 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work);
> >  static int ceph_set_ino_cb(struct inode *inode, void *data)
> >  {
> >  	struct ceph_inode_info *ci = ceph_inode(inode);
> > +	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
> >
> >  	ci->i_vino = *(struct ceph_vino *)data;
> >  	inode->i_ino = ceph_vino_to_ino_t(ci->i_vino);
> >  	inode_set_iversion_raw(inode, 0);
> > +	percpu_counter_inc(&mdsc->metric.total_inodes);
> > +
> >  	return 0;
> >  }
> >
> > @@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode,
> >   */
> >  struct inode *ceph_alloc_inode(struct super_block *sb)
> >  {
> > -	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
> >  	struct ceph_inode_info *ci;
> >  	int i;
> >
> > @@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
> >
> >  	ci->i_meta_err = 0;
> >
> > -	percpu_counter_inc(&mdsc->metric.total_inodes);
> > -
> >  	return &ci->vfs_inode;
> >  }
> >
> >  void ceph_free_inode(struct inode *inode)
> >  {
> >  	struct ceph_inode_info *ci = ceph_inode(inode);
> > -	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
> > -
> > -	percpu_counter_dec(&mdsc->metric.total_inodes);
> >
> >  	kfree(ci->i_symlink);
> >  	kmem_cache_free(ceph_inode_cachep, ci);
> > @@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode)
> >  void ceph_evict_inode(struct inode *inode)
> >  {
> >  	struct ceph_inode_info *ci = ceph_inode(inode);
> > +	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
>
> I'd also remove a duplicate mdsc variable declared in ci->i_snap_realm
> branch.
>

Good catch. Fixed in testing branch.

Thanks,
-- 
Jeff Layton