From: Luis Henriques <lhenriques@suse.com>
To: "Yan\, Zheng" <ukernel@gmail.com>
Cc: "Yan\, Zheng" <zyan@redhat.com>, Sage Weil <sage@redhat.com>,
	Ilya Dryomov <idryomov@gmail.com>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH] ceph: Fix a memory leak in ci->i_head_snapc
Date: Fri, 22 Mar 2019 10:04:44 +0000	[thread overview]
Message-ID: <87o963jkcj.fsf@suse.com> (raw)
In-Reply-To: <87a7hr19v7.fsf@suse.com> (Luis Henriques's message of "Tue, 19 Mar 2019 09:39:24 +0000")

Luis Henriques <lhenriques@suse.com> writes:

> "Yan, Zheng" <ukernel@gmail.com> writes:
>
>> On Tue, Mar 19, 2019 at 12:22 AM Luis Henriques <lhenriques@suse.com> wrote:
>>>
>>> "Yan, Zheng" <ukernel@gmail.com> writes:
>>>
>>> > On Mon, Mar 18, 2019 at 6:33 PM Luis Henriques <lhenriques@suse.com> wrote:
>>> >>
>>> >> "Yan, Zheng" <ukernel@gmail.com> writes:
>>> >>
>>> >> > On Fri, Mar 15, 2019 at 7:13 PM Luis Henriques <lhenriques@suse.com> wrote:
>>> >> >>
>>> >> >> I'm occasionally seeing a kmemleak warning in xfstest generic/013:
>>> >> >>
>>> >> >> unreferenced object 0xffff8881fccca940 (size 32):
>>> >> >>   comm "kworker/0:1", pid 12, jiffies 4295005883 (age 130.648s)
>>> >> >>   hex dump (first 32 bytes):
>>> >> >>     01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
>>> >> >>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> >> >>   backtrace:
>>> >> >>     [<00000000d741a1ea>] build_snap_context+0x5b/0x2a0
>>> >> >>     [<0000000021a00533>] rebuild_snap_realms+0x27/0x90
>>> >> >>     [<00000000ac538600>] rebuild_snap_realms+0x42/0x90
>>> >> >>     [<000000000e955fac>] ceph_update_snap_trace+0x2ee/0x610
>>> >> >>     [<00000000a9550416>] ceph_handle_snap+0x317/0x5f3
>>> >> >>     [<00000000fc287b83>] dispatch+0x362/0x176c
>>> >> >>     [<00000000a312c741>] ceph_con_workfn+0x9ce/0x2cf0
>>> >> >>     [<000000004168e3a9>] process_one_work+0x1d4/0x400
>>> >> >>     [<000000002188e9e7>] worker_thread+0x2d/0x3c0
>>> >> >>     [<00000000b593e4b3>] kthread+0x112/0x130
>>> >> >>     [<00000000a8587dca>] ret_from_fork+0x35/0x40
>>> >> >>     [<00000000ba1c9c1d>] 0xffffffffffffffff
>>> >> >>
>>> >> >> It looks like it is possible that we miss a flush_ack from the MDS when,
>>> >> >> for example, umounting the filesystem.  In that case, we can simply drop
>>> >> >> the reference to the ceph_snap_context obtained in ceph_queue_cap_snap().
>>> >> >>
>>> >> >> Link: https://tracker.ceph.com/issues/38224
>>> >> >> Cc: stable@vger.kernel.org
>>> >> >> Signed-off-by: Luis Henriques <lhenriques@suse.com>
>>> >> >> ---
>>> >> >>  fs/ceph/caps.c | 7 +++++++
>>> >> >>  1 file changed, 7 insertions(+)
>>> >> >>
>>> >> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>> >> >> index 36a8dc699448..208f4dc6f574 100644
>>> >> >> --- a/fs/ceph/caps.c
>>> >> >> +++ b/fs/ceph/caps.c
>>> >> >> @@ -1054,6 +1054,7 @@ int ceph_is_any_caps(struct inode *inode)
>>> >> >>  static void drop_inode_snap_realm(struct ceph_inode_info *ci)
>>> >> >>  {
>>> >> >>         struct ceph_snap_realm *realm = ci->i_snap_realm;
>>> >> >> +
>>> >> >>         spin_lock(&realm->inodes_with_caps_lock);
>>> >> >>         list_del_init(&ci->i_snap_realm_item);
>>> >> >>         ci->i_snap_realm_counter++;
>>> >> >> @@ -1063,6 +1064,12 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
>>> >> >>         spin_unlock(&realm->inodes_with_caps_lock);
>>> >> >>         ceph_put_snap_realm(ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc,
>>> >> >>                             realm);
>>> >> >> +       /*
>>> >> >> +        * ci->i_head_snapc should be NULL, but we may still be waiting for a
>>> >> >> +        * flush_ack from the MDS.  In that case, we still hold a ref for the
>>> >> >> +        * ceph_snap_context and we need to drop it.
>>> >> >> +        */
>>> >> >> +       ceph_put_snap_context(ci->i_head_snapc);
>>> >> >>  }
>>> >> >>
>>> >> >>  /*
>>> >> >
>>> >> > This does not seem right.  i_head_snapc is cleared when
>>> >> > (ci->i_wrbuffer_ref_head == 0 && ci->i_dirty_caps == 0 &&
>>> >> > ci->i_flushing_caps == 0).  It has nothing to do with dropping
>>> >> > ci->i_snap_realm.  Did you see 'reconnect denied' during the test?
>>> >> > If you did, the fix should be in iterate_session_caps().
>>> >> >
>>> >>
>>> >> No, I didn't see any 'reconnect denied' in the test.  The test actually
>>> >> seems to execute fine, except for the memory leak.
>>> >>
>>> >> It's very difficult to reproduce this issue, but last time I managed to
>>> >> get this memory leak to trigger I actually had some debugging code in
>>> >> drop_inode_snap_realm, something like:
>>> >>
>>> >>   if (ci->i_head_snapc)
>>> >>         printk("i_head_snapc: 0x%px\n", ci->i_head_snapc);
>>> >
>>> > Please add code that prints i_wrbuffer_ref_head, i_dirty_caps, and
>>> > i_flushing_caps, and try reproducing it again.
>>> >
>>>
>>> Ok, it took me a few hours, but I managed to reproduce the bug with
>>> those extra printks.  All those values are 0 when the bug triggers
>>> (and i_head_snapc != NULL).
>>>
>>
>> Thanks, which test triggers this bug?
>
> That's generic/013.  It usually triggers after a few hours of running it
> in a loop (I'm using a vstart cluster for that).
>
>>
>> I searched the code and found that we may fail to clean up i_head_snapc
>> in two places: one is in ceph_queue_cap_snap(), the other is in
>> remove_session_caps_cb().
>
> Ah, great!  I spent a lot of time looking but I couldn't really find it.
> My bet was that ceph_queue_cap_snap() was doing the ceph_get_snap_context()
> and that the corresponding ceph_put_snap_context() would be done in
> handle_cap_flush_ack().  That's why I mentioned the missing flush_ack
> from the MDS.

Something that I didn't say explicitly is that I *know* the unbalanced
ceph_get_snap_context() is the one in ceph_queue_cap_snap().  I know
this for sure because I've managed to reproduce the issue several times
with an instrumented kernel (with tracepoints) that allowed me to keep
track of all ceph_snap_context operations (create, get, put, delete).
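
For reference, a minimal sketch of the kind of debug printks discussed
above (the helper name is made up; it would be called from
drop_inode_snap_realm(), where the original debug printk lived, and it
only uses the ceph_inode_info fields and the snap context's nref counter
already mentioned in this thread):

  /* Sketch only: dump the state of the inode's head snap context. */
  static void dump_head_snapc_state(struct ceph_inode_info *ci)
  {
          /* Reaching past this check means i_head_snapc is unexpectedly still set. */
          if (!ci->i_head_snapc)
                  return;
          printk("i_head_snapc: 0x%px nref=%u\n", ci->i_head_snapc,
                 refcount_read(&ci->i_head_snapc->nref));
          printk("wrbuffer_ref_head=%d dirty_caps=%u flushing_caps=%u\n",
                 ci->i_wrbuffer_ref_head, ci->i_dirty_caps,
                 ci->i_flushing_caps);
  }

When the leak triggers, all three counters read 0 while i_head_snapc is
still non-NULL, which is what points at the unbalanced get in
ceph_queue_cap_snap().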

Cheers,
-- 
Luis

Thread overview: 12 messages
2019-03-15 11:11 [PATCH] ceph: Fix a memory leak in ci->i_head_snapc Luis Henriques
2019-03-18  3:26 ` Yan, Zheng
2019-03-18 10:33   ` Luis Henriques
2019-03-18 11:42     ` Yan, Zheng
2019-03-18 16:22       ` Luis Henriques
2019-03-19  3:13         ` Yan, Zheng
2019-03-19  9:39           ` Luis Henriques
2019-03-22 10:04             ` Luis Henriques [this message]
2019-04-03  2:54               ` Yan, Zheng
2019-04-03  9:47                 ` Luis Henriques
2019-04-16 13:30                   ` Luis Henriques
2019-04-18  3:37                     ` Yan, Zheng
