From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v5 0/2] ceph: metrics for opened files, pinned caps and opened inodes
From: Jeff Layton
To: Xiubo Li, Ilya Dryomov
Cc: "Yan, Zheng", Patrick Donnelly, Ceph Development
Date: Fri, 11 Sep 2020 15:46:51 -0400
In-Reply-To: 
References: <20200903130140.799392-1-xiubli@redhat.com>
	<449a56624f3dd4e2a4a4cf95cd24d69c53700b6d.camel@kernel.org>
	<9a5c5d2f-d105-21c4-327e-5ad18bf49518@redhat.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.36.5 (3.36.5-1.fc32)
MIME-Version: 1.0
List-ID: ceph-devel@vger.kernel.org

On Fri, 2020-09-11 at 07:49 -0400, Jeff Layton wrote:
> On Fri, 2020-09-11 at 11:43 +0800, Xiubo Li wrote:
> > On 2020/9/10 20:13, Jeff Layton wrote:
> > > On Thu, 2020-09-10 at 08:00 +0200, Ilya Dryomov wrote:
> > > > On Thu, Sep 10, 2020 at 2:59 AM Xiubo Li wrote:
> > > > > On 2020/9/10 4:34, Ilya Dryomov wrote:
> > > > > > On Thu, Sep 3, 2020 at 4:22 PM Xiubo Li wrote:
> > > > > > > On 2020/9/3 22:18, Jeff Layton wrote:
> > > > > > > > On Thu, 2020-09-03 at 09:01 -0400, xiubli@redhat.com wrote:
> > > > > > > > > From: Xiubo Li
> > > > > > > > >
> > > > > > > > > Changed in V5:
> > > > > > > > > - Remove mdsc parsing helpers except the ceph_sb_to_mdsc()
> > > > > > > > > - Remove the is_opened member.
> > > > > > > > >
> > > > > > > > > Changed in V4:
> > > > > > > > > - A small fix about the total_inodes.
> > > > > > > > >
> > > > > > > > > Changed in V3:
> > > > > > > > > - Resend for V2 just forgot one patch, which is adding some helpers
> > > > > > > > >   support to simplify the code.
> > > > > > > > >
> > > > > > > > > Changed in V2:
> > > > > > > > > - Add number of inodes that have opened files.
> > > > > > > > > - Remove the dir metrics and fold into files.
> > > > > > > > >
> > > > > > > > > Xiubo Li (2):
> > > > > > > > >   ceph: add ceph_sb_to_mdsc helper support to parse the mdsc
> > > > > > > > >   ceph: metrics for opened files, pinned caps and opened inodes
> > > > > > > > >
> > > > > > > > >  fs/ceph/caps.c    | 41 +++++++++++++++++++++++++++++++++++++----
> > > > > > > > >  fs/ceph/debugfs.c | 11 +++++++++++
> > > > > > > > >  fs/ceph/dir.c     | 20 +++++++-------------
> > > > > > > > >  fs/ceph/file.c    | 13 ++++++-------
> > > > > > > > >  fs/ceph/inode.c   | 11 ++++++++---
> > > > > > > > >  fs/ceph/locks.c   |  2 +-
> > > > > > > > >  fs/ceph/metric.c  | 14 ++++++++++++++
> > > > > > > > >  fs/ceph/metric.h  |  7 +++++++
> > > > > > > > >  fs/ceph/quota.c   | 10 +++++-----
> > > > > > > > >  fs/ceph/snap.c    |  2 +-
> > > > > > > > >  fs/ceph/super.h   |  6 ++++++
> > > > > > > > >  11 files changed, 103 insertions(+), 34 deletions(-)
> > > > > > > > >
> > > > > > > > Looks good. I went ahead and merged this into testing.
> > > > > > > >
> > > > > > > > Small merge conflict in quota.c, which I guess is probably due to not
> > > > > > > > basing this on the testing branch. I also dropped what looks like an
> > > > > > > > unrelated hunk in the second patch.
> > > > > > > >
> > > > > > > > In the future, if you can be sure that patches you post apply cleanly
> > > > > > > > to the testing branch then that would make things easier.
> > > > > > >
> > > > > > > Okay, will do it.
> > > > > >
> > > > > > Hi Xiubo,
> > > > > >
> > > > > > There is a problem with lifetimes here. mdsc isn't guaranteed to exist
> > > > > > when ->free_inode() is called. This can lead to crashes on a NULL mdsc
> > > > > > in ceph_free_inode() in case of e.g. "umount -f". I know it was Jeff's
> > > > > > suggestion to move the decrement of total_inodes into ceph_free_inode(),
> > > > > > but it doesn't look like it can be easily deferred past ->evict_inode().
> > > > >
> > > > > Okay, I will take a look.
> > > > Given that it's just a counter which we don't care about if the
> > > > mount is going away, some form of "if (mdsc)" check might do, but
> > > > we need to make sure that it covers possible races, if any.
> > >
> > > Good catch, Ilya.
> > >
> > > What may be best is to move the increment out of ceph_alloc_inode and
> > > instead put it in ceph_set_ino_cb. Then the decrement can go back into
> > > ceph_evict_inode.
> >
> > Hi Jeff, Ilya,
> >
> > Checked the code, and it seems we will also hit the same issue in
> > ceph_evict_inode().
> >
> > With the '-f' option when umounting, it will skip the inodes whose
> > i_count ref > 0, and then free the fsc/mdsc in ceph. Later the
> > iput_final() will call ceph_evict_inode() and then ceph_free_inode().
> >
> > Could we just skip the counting when (sb->s_flags & SB_ACTIVE) is
> > false?
> >
>
> Note that umount -f (MNT_FORCE) just means that ceph_umount_begin is
> called before unmounting.
>
> If what you're saying is true, then we have bigger problems.
> ceph_evict_inode does this today when ci->i_snap_realm is set:
>
>     struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
>
> ...and then goes on to use that mdsc pointer.
>

Now that I look, I don't think that this is a problem. ceph_kill_sb
calls generic_shutdown_super, which calls evict_inodes before the
client is torn down. That should ensure that the mdsc is still good
when evict is called.

We will need to move the increment into the iget5_locked "set"
function. Maybe we can squash the patch below into yours?

----------------------8<---------------------------

ceph: use total_inodes to count hashed inodes instead of allocated ones

We can't guarantee that the mdsc will still be around when free_inode
is called, so move this into evict_inode instead. The increment then
will need to be moved to when the inode is hashed, so move that into
the "set" callback.
Reported-by: Ilya Dryomov
Signed-off-by: Jeff Layton
---
 fs/ceph/inode.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 5b9d2ff8af34..39c13fefba8a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -42,10 +42,13 @@ static void ceph_inode_work(struct work_struct *work);
 static int ceph_set_ino_cb(struct inode *inode, void *data)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 
 	ci->i_vino = *(struct ceph_vino *)data;
 	inode->i_ino = ceph_vino_to_ino_t(ci->i_vino);
 	inode_set_iversion_raw(inode, 0);
+	percpu_counter_inc(&mdsc->metric.total_inodes);
+
 	return 0;
 }
 
@@ -425,7 +428,6 @@ static int ceph_fill_fragtree(struct inode *inode,
  */
 struct inode *ceph_alloc_inode(struct super_block *sb)
 {
-	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(sb);
 	struct ceph_inode_info *ci;
 	int i;
 
@@ -525,17 +527,12 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 
 	ci->i_meta_err = 0;
 
-	percpu_counter_inc(&mdsc->metric.total_inodes);
-
 	return &ci->vfs_inode;
 }
 
 void ceph_free_inode(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
-
-	percpu_counter_dec(&mdsc->metric.total_inodes);
 
 	kfree(ci->i_symlink);
 	kmem_cache_free(ceph_inode_cachep, ci);
@@ -544,11 +541,14 @@ void ceph_free_inode(struct inode *inode)
 void ceph_evict_inode(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 	struct ceph_inode_frag *frag;
 	struct rb_node *n;
 
 	dout("evict_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));
 
+	percpu_counter_dec(&mdsc->metric.total_inodes);
+
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 
-- 
2.26.2