* Debugging Deadlocks?
From: Sargun Dhillon @ 2017-05-30 16:12 UTC
  To: BTRFS ML

We've been running BtrFS for a couple of months now in production on
several clusters. We're running on Canonical's 4.8 kernel and are
currently in the process of moving to our own patchset atop vanilla
4.10+. I'm glad to say it's been a fairly good experience for us. Bar
some performance issues, it's been largely smooth sailing.

There is one class of persistent issues that has been plaguing our
clusters: deadlocks. We've seen a fair number of cases where several
background and user threads are performing operations, some of them
waiting to start a transaction while at least one background or user
thread is in the process of committing a transaction. Unfortunately,
these situations end in deadlocks, where no thread makes progress.

We've talked about a couple of ideas internally, like adding the
ability to time out transactions, abort commits or start_transactions
that are taking too long, and adding more debugging to get insight
into the state of the filesystem. Unfortunately, since our usage and
knowledge of BtrFS is still somewhat nascent, we're unsure what the
right investment is.

I'm curious, are other people seeing deadlocks crop up in production
often? How are you going about debugging them, and are there any good
pieces of advice on avoiding these for production workloads?


* Re: Debugging Deadlocks?
From: Duncan @ 2017-05-31  6:47 UTC
  To: linux-btrfs

Sargun Dhillon posted on Tue, 30 May 2017 09:12:39 -0700 as excerpted:

> We've been running BtrFS for a couple of months now in production on
> several clusters. We're running on Canonical's 4.8 kernel and are
> currently in the process of moving to our own patchset atop vanilla
> 4.10+. I'm glad to say it's been a fairly good experience for us. Bar
> some performance issues, it's been largely smooth sailing.
> 
> There is one class of persistent issues that has been plaguing our
> clusters: deadlocks.

Being just a list regular and btrfs (personal) user, not a dev or
big-time production user, I can't say I've seen a deadlock problem
either here or reported in significant numbers on-list, so beyond that
I can't help there.

I'm replying, however, regarding your kernel choices.

Good for getting off kernel 4.8, as in mainline kernel terms that's only 
a short-term stable release and support has now ended.

But I'm slightly concerned with your kernel 4.10+ choice on production
clusters.  4.9 is the most recent mainline LTS kernel series, and
therefore the btrfs LTS series as well, so it's what I was expecting.

Now don't get me wrong, 4.10+ is appropriate ATM as well, and if you're 
planning to stay current, within the 2-latest-current-kernel-cycles list 
recommendation, I'd consider it preferred.

However, most large-scale production deployments tend to prefer a 
somewhat slower upgrade cycle than that, in which case 4.9 is preferred 
as the latest mainline LTS series.

As far as LTS series go, this list tries to support the latest two LTS
series, as it does the latest two current stable series.  While that's
rather shorter than LTS support in general, it's in keeping with the
fact that btrfs is still stabilizing and as such under heavy
development, though it's far more stable than it was back in the kernel
3.x or early 4.x era.

At present that means 4.9 and the previous 4.4, although in practice
4.4 was long enough ago that we prefer 4.9 unless there's some definite
reason it's not going to work for you.

But you're not talking anything as old as 4.4 in any case, so it's a
question of either 4.9 LTS, staying with that series for a while, or
4.10+, upgrading every 10 weeks or so as each new kernel series is
released and the second-back series (now 4.10, with 4.11 the newest)
becomes the third back, slipping out of both mainline stable release
and btrfs list primary support range.

If you're comfortable with a ten-week upgrade cycle at the scale you're
running in production, then by all means go 4.10 or 4.11 at this point
and do the upgrades; that's preferable here for those who can accept
it.  If not, then I'd strongly recommend the 4.9 LTS series for now,
upgrading LTS kernel series once a year or so, after the next LTS
series comes out and has had a release or two to shake out the early
bugs.

OTOH, if there's something you really need 4.10 for but would otherwise
prefer LTS, then yes, go current now and try to do the 10-week cycle
until the next LTS, then if desired stick with that and drop back to
annual (or so) LTS series upgrades.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Debugging Deadlocks?
From: David Sterba @ 2017-05-31 12:54 UTC
  To: Sargun Dhillon; +Cc: BTRFS ML

On Tue, May 30, 2017 at 09:12:39AM -0700, Sargun Dhillon wrote:
> We've been running BtrFS for a couple of months now in production on
> several clusters. We're running on Canonical's 4.8 kernel and are
> currently in the process of moving to our own patchset atop vanilla
> 4.10+. I'm glad to say it's been a fairly good experience for us. Bar
> some performance issues, it's been largely smooth sailing.

Yay, thanks for the feedback.

> There is one class of persistent issues that has been plaguing our
> clusters: deadlocks. We've seen a fair number of cases where several
> background and user threads are performing operations, some of them
> waiting to start a transaction while at least one background or user
> thread is in the process of committing a transaction. Unfortunately,
> these situations end in deadlocks, where no thread makes progress.

In such situations, save the stacks of all processes (/proc/PID/stack).
I don't want to play with terminology here, so by a deadlock I could
also understand a system that's making progress so slowly that it's
effectively stuck. This could happen if the files are fragmented, so
e.g. traversing extents takes locks and does a lot of work before it
unlocks. Add some extent sharing and reference updates, and this adds
more points where the threads just wait.

The stacktraces could give an idea of what kind of hang it is.
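
Something as simple as the following would do for the collection, as a
rough sketch (it assumes it runs as root, so the per-task stack files
are readable):

#!/usr/bin/env python
# Rough sketch, not btrfs-specific: snapshot /proc/<pid>/task/<tid>/stack
# for every task so the traces can be attached to a report.
# Assumes it runs as root so the stack files are readable.
import glob, time

out = "/tmp/stacks-%d.txt" % int(time.time())
with open(out, "w") as dump:
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stack"):
        try:
            with open(path) as f:
                stack = f.read()
        except (IOError, OSError):
            continue  # task exited between glob and open
        dump.write("==> %s <==\n%s\n" % (path, stack))
print("wrote %s" % out)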

> We've talked about a couple of ideas internally, like adding the
> ability to time out transactions, abort commits or start_transactions
> that are taking too long, and adding more debugging to get insight
> into the state of the filesystem. Unfortunately, since our usage and
> knowledge of BtrFS is still somewhat nascent, we're unsure what the
> right investment is.

There's a kernel-wide hung task detector, but I think a similar
mechanism around just the transaction commits would be useful, as a
debugging option.

There are a number of ways a transaction can be blocked, though, so
we'd need to choose the starting point: extent-related locks, waiting
for writes, other locks, the internal transactional logic (and possibly
more).

> I'm curious, are other people seeing deadlocks crop up in production
> often? How are you going about debugging them, and are there any good
> pieces of advice on avoiding these for production workloads?

I saw hangs with kernel 4.9 a while back, triggered by a long-running
iozone stress test, but 4.8 was not affected and 4.10+ worked fine
again. I don't know if/which btrfs patches the 'canonical 4.8' kernel
has, so this might not be related.

As for deadlocks (double-taken locks, lock inversions), I haven't seen
them for a long time. The testing kernels run with lockdep, so we
should be able to see them early. You could try turning lockdep on if
the performance penalty is still acceptable for you.  But there are
still cases that lockdep does not cover IIRC, due to the higher-level
semantics of the various btrfs trees and locking of extent buffers.


* Re: Debugging Deadlocks?
From: Adam Borowski @ 2017-05-31 20:29 UTC
  To: linux-btrfs

On Wed, May 31, 2017 at 06:47:09AM +0000, Duncan wrote:
> Sargun Dhillon posted on Tue, 30 May 2017 09:12:39 -0700 as excerpted:
> > are currently in the process of moving to our own patchset atop
> > vanilla 4.10+
> 
> Good for getting off kernel 4.8, as in mainline kernel terms that's only 
> a short-term stable release and support has now ended.

In fact, support for 4.10 has already ended too.

You either stick with LTS (currently 4.9) or keep upgrading to the
newest-and-greatest bleeding edge as soon as it's released.
For production machines, the choice is obvious.

-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.


* Re: Debugging Deadlocks?
From: Sargun Dhillon @ 2017-06-01  0:32 UTC
  To: dsterba, Sargun Dhillon, BTRFS ML

On Wed, May 31, 2017 at 5:54 AM, David Sterba <dsterba@suse.cz> wrote:
> On Tue, May 30, 2017 at 09:12:39AM -0700, Sargun Dhillon wrote:
>> We've been running BtrFS for a couple of months now in production on
>> several clusters. We're running on Canonical's 4.8 kernel and are
>> currently in the process of moving to our own patchset atop vanilla
>> 4.10+. I'm glad to say it's been a fairly good experience for us. Bar
>> some performance issues, it's been largely smooth sailing.
>
> Yay, thanks for the feedback.
>
>> There is one class of persistent issues that has been plaguing our
>> clusters: deadlocks. We've seen a fair number of cases where several
>> background and user threads are performing operations, some of them
>> waiting to start a transaction while at least one background or user
>> thread is in the process of committing a transaction. Unfortunately,
>> these situations end in deadlocks, where no thread makes progress.
>
> In such situations, save the stacks of all processes (/proc/PID/stack).
> I don't want to play with terminology here, so by a deadlock I could
> also understand a system that's making progress so slowly that it's
> effectively stuck. This could happen if the files are fragmented, so
> e.g. traversing extents takes locks and does a lot of work before it
> unlocks. Add some extent sharing and reference updates, and this adds
> more points where the threads just wait.
>
> The stacktraces could give an idea of what kind of hang it is.
>
We're saving a dump of the tasks currently running. A recent dump can
be found here: http://cwillu.com:8080/50.19.255.106/1. This is the
only snapshot I have from a node that's not making any progress.

We also see the other case, where tasks are making progress, but so
slowly that the kernel hung task detector kicks in. This happens pretty
often, and it's difficult to catch when it's happening, but the
symptoms can be frustrating, including failed instance health checks,
poor performance, and high latency for interactive services. Some of
the traces we've gotten from the hung task detector include:
https://gist.github.com/sargun/9643c0c380d27a147ef3486e1d51dbdb
https://gist.github.com/sargun/8858263b8d04c8ab726738022725ec12


>> We've talked about a couple of ideas internally, like adding the
>> ability to time out transactions, abort commits or start_transactions
>> that are taking too long, and adding more debugging to get insight
>> into the state of the filesystem. Unfortunately, since our usage and
>> knowledge of BtrFS is still somewhat nascent, we're unsure what the
>> right investment is.
>
> There's a kernel-wide hung task detector, but I think a similar
> mechanism around just the transaction commits would be useful, as a
> debugging option.
>
> There are a number of ways a transaction can be blocked, though, so
> we'd need to choose the starting point: extent-related locks, waiting
> for writes, other locks, the internal transactional logic (and possibly
> more).
>
As a first step, it'd be nice to have the transaction wrapped in a
stack frame. We could then instrument it much more easily with
off-the-shelf tools like simple BPF-based kprobes / kretprobes or
ftrace, rather than having to write a custom probe that's familiar with
the innards of the transaction data structure and does its own
accounting to keep track of what's in flight.
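
For example, a rough bcc sketch along these lines (assuming
btrfs_commit_transaction is kprobe-able on the running kernel, and that
one second is a reasonable threshold to start with) could log slow
commits without knowing anything about the transaction internals:

#!/usr/bin/env python
# Rough sketch, not production code: log btrfs transaction commits that
# take longer than one second, using bcc kprobes/kretprobes.
from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;
    u64 delta_ms = (bpf_ktime_get_ns() - *tsp) / 1000000;
    if (delta_ms > 1000)    /* threshold: commits slower than 1s */
        bpf_trace_printk("slow btrfs commit: %llu ms\\n", delta_ms);
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="btrfs_commit_transaction", fn_name="trace_entry")
b.attach_kretprobe(event="btrfs_commit_transaction", fn_name="trace_return")
b.trace_print()

Pointing the same probes at btrfs_start_transaction should catch tasks
that are stuck waiting to join a transaction instead.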

I'll take a crack at something as simple as an in-memory list of
transactions that is periodically scanned for transactions taking too
long, logging whether they're stuck starting, committing, or in flight
and uncommitted.
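
Until that exists in the kernel, a crude user-space approximation might
help: periodically read /proc/<pid>/stack and flag tasks that sit in
the btrfs transaction paths across several consecutive scans. A sketch
(the function names matched here are just examples, and the interval
and threshold are guesses):

#!/usr/bin/env python
# Rough user-space stand-in for the in-kernel scanner described above:
# every INTERVAL seconds, read each task's kernel stack and warn if it
# has been sitting in a btrfs transaction path for several scans.
import glob, time
from collections import defaultdict

INTERVAL = 10    # seconds between scans
THRESHOLD = 6    # consecutive scans before we complain (~1 minute)
MARKERS = ("btrfs_commit_transaction", "start_transaction",
           "wait_current_trans", "btrfs_tree_lock")

stuck = defaultdict(int)
while True:
    seen = set()
    for path in glob.glob("/proc/[0-9]*/stack"):
        try:
            with open(path) as f:
                stack = f.read()
        except (IOError, OSError):
            continue                 # task exited mid-scan
        pid = path.split("/")[2]
        seen.add(pid)
        if any(m in stack for m in MARKERS):
            stuck[pid] += 1
            if stuck[pid] == THRESHOLD:
                print("task %s in btrfs transaction path for ~%ds:\n%s"
                      % (pid, stuck[pid] * INTERVAL, stack))
        else:
            stuck[pid] = 0
    for pid in list(stuck):
        if pid not in seen:
            del stuck[pid]           # forget exited tasks
    time.sleep(INTERVAL)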

>> I'm curious, are other people seeing deadlocks crop up in production
>> often? How are you going about debugging them, and are there any good
>> pieces of advice on avoiding these for production workloads?
>
> I saw hangs with kernel 4.9 a while back, triggered by a long-running
> iozone stress test, but 4.8 was not affected and 4.10+ worked fine
> again. I don't know if/which btrfs patches the 'canonical 4.8' kernel
> has, so this might not be related.
>
> As for deadlocks (double-taken locks, lock inversions), I haven't seen
> them for a long time. The testing kernels run with lockdep, so we
> should be able to see them early. You could try turning lockdep on if
> the performance penalty is still acceptable for you.  But there are
> still cases that lockdep does not cover IIRC, due to the higher-level
> semantics of the various btrfs trees and locking of extent buffers.

For some of these use cases, we can pretty easily recreate the pattern
on the machine. For others, it's more complicated to find out which
containers and datasets were scheduled to be processed on the machine.
We've run some sanity and stress tests, but we can rarely get the
filesystem to fall over in a predictable way in these tests compared to
some production workloads.

