* Debugging Deadlocks?
From: Sargun Dhillon
Date: 2017-05-30 16:12 UTC
To: BTRFS ML

We've been running BtrFS for a couple of months now in production on
several clusters. We're running on Canonical's 4.8 kernel, and are
currently in the process of moving to our own patchset atop vanilla
4.10+. I'm glad to say it's been a fairly good experience for us. Bar
some performance issues, it's been largely smooth sailing.

One class of persistent issue has been plaguing our clusters:
deadlocks. We've seen a fair number of cases where several background
and user threads are performing operations, some waiting to start a
transaction while at least one background or user thread is in the
process of committing a transaction. Unfortunately, these situations
end in deadlock, with no thread making progress.

We've talked about a couple of ideas internally, like adding the
ability to time out transactions, abort commits or start_transaction
calls that are taking too long, and adding more debugging to get
insight into the state of the filesystem. Unfortunately, since our
usage and knowledge of BtrFS is still somewhat nascent, we're unsure
what the right investment is.

I'm curious: are other people seeing deadlocks crop up in production
often? How are you going about debugging them, and is there any good
advice on avoiding them for production workloads?
* Re: Debugging Deadlocks?
From: Duncan
Date: 2017-05-31 6:47 UTC
To: linux-btrfs

Sargun Dhillon posted on Tue, 30 May 2017 09:12:39 -0700 as excerpted:

> We've been running BtrFS for a couple of months now in production on
> several clusters. We're running on Canonical's 4.8 kernel, and are
> currently in the process of moving to our own patchset atop vanilla
> 4.10+. I'm glad to say it's been a fairly good experience for us. Bar
> some performance issues, it's been largely smooth sailing.
>
> One class of persistent issue has been plaguing our clusters:
> deadlocks.

Being just a list regular and btrfs (personal) user, not a dev or
big-time production user, I can't say I've seen a deadlock problem
either here or reported in significant numbers on-list, but beyond
that I can't help there.

I'm replying, however, regarding your kernel choices.

Good for getting off kernel 4.8, as in mainline kernel terms that's
only a short-term stable release and support has now ended. But I'm
slightly concerned with your kernel 4.10+ choice on production
clusters. 4.9 is the most recent mainline and therefore btrfs LTS
kernel series, and as such, what I was expecting.

Now don't get me wrong, 4.10+ is appropriate ATM as well, and if
you're planning to stay current, within the
2-latest-current-kernel-cycles list recommendation, I'd consider it
preferred. However, most large-scale production deployments tend to
prefer a somewhat slower upgrade cycle than that, in which case 4.9 is
preferred as the latest mainline LTS series.

As far as LTS series go, this list tries to support the latest two LTS
series, as it does the latest two current stable series.
While that's rather shorter than the LTS series support in general,
it's in keeping with the fact that btrfs remains still stabilizing and
as such under heavy development, tho it's far more stable than it was
back in the kernel 3.x or early 4.x era. At present that means 4.9 and
the previous 4.4, altho in practice 4.4 was long enough ago that we
prefer 4.9 unless there's some definite reason it's not going to work
for you.

But you're not talking as old as 4.4 in any case, so it's a question
of 4.9 LTS, staying with that series for awhile, or 4.10+, upgrading
every 10 weeks or so as a new kernel series is released and the
second-back (now 4.10, as 4.11 is the newest) becomes the third back
and thus slips out of both mainline stable release and btrfs list
primary support range.

If you're comfortable with a ten-week upgrade cycle at the scale
you're running in production, then by all means go 4.10 or 4.11 at
this point and do the upgrades, as that's preferable here for those
where it's acceptable. If not, then I'd strongly recommend the 4.9 LTS
series for now, upgrading LTS kernel series once a year or so, after
the next LTS series comes out and has had a release or two to shake
out the early bugs.

OTOH, if there's something you really need 4.10 for but would
otherwise prefer LTS, then yes, go current now and try to do the
10-week cycle until the next LTS, then if desired stick with it and
drop back to annual (or whatever) LTS series upgrades.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: Debugging Deadlocks?
From: Adam Borowski
Date: 2017-05-31 20:29 UTC
To: linux-btrfs

On Wed, May 31, 2017 at 06:47:09AM +0000, Duncan wrote:
> Sargun Dhillon posted on Tue, 30 May 2017 09:12:39 -0700 as excerpted:
> > currently, in the process of moving to our own patchset atop vanilla
> > 4.10+
>
> Good for getting off kernel 4.8, as in mainline kernel terms that's only
> a short-term stable release and support has now ended.

In fact, support for 4.10 has already ended too. You either stick with
LTS (currently 4.9) or keep upgrading to the newest-and-greatest
bleeding edge as soon as it's released. For production machines, the
choice is obvious.

-- 
Don't be racist.  White, amber or black, all beers should be judged
based solely on their merits.  Heck, even if occasionally a cider
applies for a beer's job, why not?  On the other hand, corpo lager is
not a race.
* Re: Debugging Deadlocks?
From: David Sterba
Date: 2017-05-31 12:54 UTC
To: Sargun Dhillon; Cc: BTRFS ML

On Tue, May 30, 2017 at 09:12:39AM -0700, Sargun Dhillon wrote:
> We've been running BtrFS for a couple of months now in production on
> several clusters. We're running on Canonical's 4.8 kernel, and are
> currently in the process of moving to our own patchset atop vanilla
> 4.10+. I'm glad to say it's been a fairly good experience for us. Bar
> some performance issues, it's been largely smooth sailing.

Yay, thanks for the feedback.

> One class of persistent issue has been plaguing our clusters:
> deadlocks. We've seen a fair number of cases where several background
> and user threads are performing operations, some waiting to start a
> transaction while at least one background or user thread is in the
> process of committing a transaction. Unfortunately, these situations
> end in deadlock, with no thread making progress.

In such situations, save the stacks of all processes
(/proc/PID/stack). I don't want to play with terminology here, so by a
deadlock I could also understand a system making progress so slowly
that it's effectively stuck. This could happen if the files are
fragmented, so e.g. traversing extents takes locks and has a lot of
work to do before it unlocks. Add some extent sharing and updating of
references, and this adds points where the threads just wait.

The stack traces could give an idea of what kind of hang it is.

> We've talked about a couple of ideas internally, like adding the
> ability to time out transactions, abort commits or start_transaction
> calls that are taking too long, and adding more debugging to get
> insight into the state of the filesystem.
> Unfortunately, since our usage and knowledge of BtrFS is still
> somewhat nascent, we're unsure what the right investment is.

There's kernel-wide hung task detection, but I think a similar
mechanism around just the transaction commits would be useful, as a
debugging option.

There are a number of ways a transaction can be blocked, though, so
we'd need to choose the starting point: extent-related locks, waiting
for writes, other locks, the internal transactional logic (and
possibly more).

> I'm curious: are other people seeing deadlocks crop up in production
> often? How are you going about debugging them, and is there any good
> advice on avoiding them for production workloads?

I have seen hangs with kernel 4.9 a while back, triggered by a
long-running iozone stress test, but 4.8 was not affected, and 4.10+
worked fine again. I don't know if/which btrfs patches the 'canonical
4.8' kernel has, so this might not be related.

As for deadlocks (double-taken lock, lock inversion), I haven't seen
them for a long time. The testing kernels run with lockdep, so we
should be able to see them early. You could try to turn lockdep on if
the performance penalty is still acceptable for you. But there are
still cases that lockdep does not cover, IIRC, due to the higher-level
semantics of the various btrfs trees and the locking of extent
buffers.
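[David's suggestion to save the stacks of all processes can be
scripted. A minimal sketch, assuming a Linux host where
/proc/PID/stack is readable (reading it generally requires root); the
output file name is an arbitrary choice:]

```shell
#!/bin/sh
# Snapshot the kernel stack of every task on the system, so a hang can
# be analyzed after the fact. /proc/<pid>/task/<tid>/stack usually
# requires root to read; unreadable entries are skipped silently.
out="stacks-$(date +%Y%m%d-%H%M%S).txt"
for task in /proc/[0-9]*/task/[0-9]*; do
    tid=${task##*/}
    comm=$(cat "$task/comm" 2>/dev/null)
    printf '=== tid %s (%s) ===\n' "$tid" "$comm" >> "$out"
    cat "$task/stack" >> "$out" 2>/dev/null
done
echo "wrote $out"
```

[Taking two snapshots a few seconds apart and diffing them helps
distinguish a true deadlock, where the stacks are identical, from very
slow progress, where they change.]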
* Re: Debugging Deadlocks?
From: Sargun Dhillon
Date: 2017-06-01 0:32 UTC
To: dsterba, Sargun Dhillon, BTRFS ML

On Wed, May 31, 2017 at 5:54 AM, David Sterba <dsterba@suse.cz> wrote:
> On Tue, May 30, 2017 at 09:12:39AM -0700, Sargun Dhillon wrote:
>> We've been running BtrFS for a couple of months now in production on
>> several clusters. We're running on Canonical's 4.8 kernel, and are
>> currently in the process of moving to our own patchset atop vanilla
>> 4.10+. I'm glad to say it's been a fairly good experience for us.
>> Bar some performance issues, it's been largely smooth sailing.
>
> Yay, thanks for the feedback.
>
>> One class of persistent issue has been plaguing our clusters:
>> deadlocks. We've seen a fair number of cases where several
>> background and user threads are performing operations, some waiting
>> to start a transaction while at least one background or user thread
>> is in the process of committing a transaction. Unfortunately, these
>> situations end in deadlock, with no thread making progress.
>
> In such situations, save the stacks of all processes
> (/proc/PID/stack). I don't want to play with terminology here, so by
> a deadlock I could also understand a system making progress so slowly
> that it's effectively stuck. This could happen if the files are
> fragmented, so e.g. traversing extents takes locks and has a lot of
> work to do before it unlocks. Add some extent sharing and updating of
> references, and this adds points where the threads just wait.
>
> The stack traces could give an idea of what kind of hang it is.

We're saving a dump of the tasks currently running. A recent dump can
be found here: http://cwillu.com:8080/50.19.255.106/1. This is the
only snapshot I have from a node that's not making any progress.
We also see the other case, where tasks are making progress, but very
slowly, causing the kernel hung task detector to kick in. This happens
pretty often, and it's difficult to catch in the act, but the symptoms
can be frustrating, including failed instance healthchecks, poor
performance, and high latency for interactive services. Some of the
traces we've gotten from the stuck task detector include:
https://gist.github.com/sargun/9643c0c380d27a147ef3486e1d51dbdb
https://gist.github.com/sargun/8858263b8d04c8ab726738022725ec12

>> We've talked about a couple of ideas internally, like adding the
>> ability to time out transactions, abort commits or start_transaction
>> calls that are taking too long, and adding more debugging to get
>> insight into the state of the filesystem. Unfortunately, since our
>> usage and knowledge of BtrFS is still somewhat nascent, we're unsure
>> what the right investment is.
>
> There's kernel-wide hung task detection, but I think a similar
> mechanism around just the transaction commits would be useful, as a
> debugging option.
>
> There are a number of ways a transaction can be blocked, though, so
> we'd need to choose the starting point: extent-related locks, waiting
> for writes, other locks, the internal transactional logic (and
> possibly more).

As a first step, it'd be nice to have the transaction wrapped in a
stack frame. We could then instrument it much more easily with
off-the-shelf tools like simple BPF-based kprobes/kretprobes, or
ftrace, rather than having to write a custom probe that's familiar
with the innards of the transaction data structure and does its own
accounting to keep track of what's in flight.

I'll take a cut at something as simple as an in-memory list of
transactions which is periodically scanned for transactions that are
taking too long, logging whether they're stuck starting, committing,
or in-flight and uncommitted.

>> I'm curious: are other people seeing deadlocks crop up in production
>> often?
>> How are you going about debugging them, and is there any good advice
>> on avoiding them for production workloads?
>
> I have seen hangs with kernel 4.9 a while back, triggered by a
> long-running iozone stress test, but 4.8 was not affected, and 4.10+
> worked fine again. I don't know if/which btrfs patches the 'canonical
> 4.8' kernel has, so this might not be related.
>
> As for deadlocks (double-taken lock, lock inversion), I haven't seen
> them for a long time. The testing kernels run with lockdep, so we
> should be able to see them early. You could try to turn lockdep on if
> the performance penalty is still acceptable for you. But there are
> still cases that lockdep does not cover, IIRC, due to the
> higher-level semantics of the various btrfs trees and the locking of
> extent buffers.

For some of these use cases, we can pretty easily recreate the access
pattern on the machine. For others, it's more complicated to find out
which containers and datasets were scheduled to be processed on the
machine. We've run some sanity and stress tests, but we can rarely get
the filesystem to fall over in a predictable way in these tests,
compared to some production workloads.
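[The in-memory list of long-running transactions proposed above can be
illustrated in userspace terms. This is only a sketch of the
accounting logic under a timeout-based scan; the names (Txn,
TxnWatchdog, the stage strings) are hypothetical and not btrfs
internals:]

```python
import threading
import time

class Txn:
    """One in-flight transaction: when it started and what stage it is in."""
    def __init__(self, txid):
        self.txid = txid
        self.started = time.monotonic()
        self.stage = "starting"   # later: "in-flight", "committing"

class TxnWatchdog:
    """Registry of in-flight transactions, scanned for long-runners."""
    def __init__(self, timeout_secs):
        self.timeout = timeout_secs
        self.lock = threading.Lock()
        self.inflight = {}

    def begin(self, txid):
        with self.lock:
            self.inflight[txid] = Txn(txid)

    def advance(self, txid, stage):
        with self.lock:
            self.inflight[txid].stage = stage

    def end(self, txid):
        with self.lock:
            self.inflight.pop(txid, None)

    def scan(self):
        """Return (txid, stage, age_secs) for transactions over the timeout."""
        now = time.monotonic()
        with self.lock:
            return [(t.txid, t.stage, now - t.started)
                    for t in self.inflight.values()
                    if now - t.started > self.timeout]

# Simulate a transaction that gets stuck while committing.
wd = TxnWatchdog(timeout_secs=0.05)
wd.begin(1)
wd.advance(1, "committing")
time.sleep(0.1)
stuck = wd.scan()   # expect txid 1 reported as stuck in "committing"
print(stuck)
```

[In a kernel implementation the scan would run from a periodic worker
and print to dmesg; the point here is just that the per-transaction
state needed to say "stuck starting, committing, or in-flight" is one
timestamp and one stage field.]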