linux-bcache.vger.kernel.org archive mirror
From: Stefan Seyfried <stefan.seyfried@googlemail.com>
To: Kent Overstreet <kmo@daterainc.com>,
	Eric Wheeler <linux-bcache@lists.ewheeler.net>
Cc: linux-bcache@vger.kernel.org, Ross Anderson <rosander@dsotm.net>,
	Stefan Priebe <s.priebe@profihost.ag>
Subject: Re: 3.17-rc6: bcache_gc: BUG: soft lockup - CPU#2 stuck for 23s!
Date: Fri, 21 Nov 2014 23:54:44 +0100
Message-ID: <546FC2B4.5020201@message-id.invalid.com>
In-Reply-To: <20141101204447.GB22219@kmo-pixel>

Hi Kent,

Am 01.11.2014 um 21:44 schrieb Kent Overstreet:
> On Sun, Sep 28, 2014 at 05:25:37PM -0700, Eric Wheeler wrote:
>> Hello Kent, Ross, all:
>>
>> We're getting bcache_gc backtraces and soft lockups; the system continues to
>> be responsive and eventually recovers.  We are running 3.17-rc6. (This
>> appears to be a continuation of the thread from 2014-09-15)
>>
>> Please see the following two backtraces.  The first shows up in
>> btree_gc_count_keys(), the other is triggered somehow by rcu_sched.  We will
>> test with -rc7 this week, though I didn't see any bcache commits in rc7.
>>
>> The server is quite busy:
>>   dd in userspace from dm-thinp snapshots to another server
>>   two DRBD verify's active backed by dm-thinp volumes
>>   note that, dd fills up the buffers so this could be operating with few
>>   pages free. (Though we have min-mem set to 256MB.)
>>
>> I see we are hitting functions like bch_ptr_bad() and bch_extent_bad().
>> Could that indicate a cache corruption on our volume?
> 
> No - those are the normal "check the validity of metadata" functions.
> 
>> I'm happy to test patches if you have any suggestions or tests that I should
>> run it through.
> 
> I think it might just be a missing cond_resched()... there's a check during
> garbage collection for need_resched() but it appears we might not actually be
> calling schedule() then.

I'm still hitting this quite often (about once per week). The machine
does not recover, and I cannot shut it down cleanly but have to reboot
it hard.

I have seen this with 3.16.6 (openSUSE 13.2 standard kernel) and 3.17.2
(latest stable as of that boot).

This is on an old Core 2 Duo; one CPU is always spinning in the kernel
when this happens.
I have also seen the machine recover from this, but the last occurrences
have been deadly.

My setup is:
* a 60GB LV on a Crucial CT240M500 SSD as the cache device (other LVs on
  that SSD are for testing other stuff)
* 30GB /home on rotating rust (an LV on a 2TB WD 2.5" drive)
* 750GB /space, an LV on the same rotating rust
* 4GB /var/log/journal, again an LV on the 2.5" drive

/space is used for both big-file storage (ISOs, some videos) and for
lots-of-small-files storage (yocto project embedded development, ccache
directory, ....)
/var/log/journal is the latest addition to the bcache set, added after
updating to openSUSE 13.2. I would say that I have only seen the problems
since I added /var/log/journal, but that happened directly after
updating to 13.2, which also included a kernel update from 3.11.10 to
3.16.x, so either change could be the cause.

I cannot see that any specific action triggers the bug; the machine is
just idling along and suddenly the soft lockup detector triggers...

> 
> Try this patch:
> 
> commit a64afc92e17e709bdd1618edd04bc608f6a44c55
> Author: Kent Overstreet <kmo@daterainc.com>
> Date:   Sat Nov 1 13:44:13 2014 -0700
> 
>     bcache: Add a cond_resched() call to gc
>     
>     Change-Id: Id4f18c533b80ddb40df94ed0bb5e2a236a4bc325
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 00cde40db5..218f21ac02 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1741,6 +1741,7 @@ static void bch_btree_gc(struct cache_set *c)
>  	do {
>  		ret = btree_root(gc_root, c, &op, &writes, &stats);
>  		closure_sync(&writes);
> +		cond_resched();
>  
>  		if (ret && ret != -EAGAIN)
>  			pr_warn("gc failed!");
> 
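
For readers following along, here is a rough userspace analogy of what the
quoted patch is getting at; it is not bcache code. need_resched() only
tests whether a reschedule is pending, while cond_resched() actually gives
up the CPU in that case. In the sketch below, sched_yield() stands in for
cond_resched(). struct gc_state, gc_pass() and run_gc() are hypothetical
names made up for illustration, and retrying while the pass returns
-EAGAIN is an assumption, since the loop condition is not visible in the
quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>   /* sched_yield(): userspace stand-in for cond_resched() */
#include <stddef.h>

/* Hypothetical bookkeeping for a long-running scan. */
struct gc_state {
	size_t remaining;  /* units of work still to do */
	size_t yields;     /* how many times the CPU was offered to others */
};

/* One bounded pass: returns -EAGAIN while work remains, 0 when done. */
static int gc_pass(struct gc_state *s, size_t budget)
{
	size_t n = s->remaining < budget ? s->remaining : budget;

	s->remaining -= n;
	return s->remaining ? -EAGAIN : 0;
}

/*
 * Retry loop shaped like the do { ... } block in the quoted diff:
 * after every pass, explicitly offer the CPU to other runnable tasks
 * instead of spinning until all the work is finished.
 */
static int run_gc(struct gc_state *s, size_t budget)
{
	int ret;

	assert(budget > 0);  /* a zero budget would never make progress */

	do {
		ret = gc_pass(s, budget);
		sched_yield();   /* analogous to the added cond_resched() */
		s->yields++;
	} while (ret == -EAGAIN);

	return ret;
}
```

Calling run_gc() on a state with remaining = 10 and a budget of 3 takes
four passes and yields after each one. Without the yield, the same loop
would hold its CPU for the whole run, which is the kind of behaviour the
soft lockup detector complains about in the kernel.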

I have now rebuilt the 3.17.3 bcache module with this patch and will see
if that helps. The patch is not yet in 3.18-rc; is there a reason why it
is not going upstream? The issue is certainly annoying...

Best regards,

	Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer
Mail: seyfried@b1-systems.de GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
