linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems
Date: Mon, 19 Aug 2019 18:37:04 +0200	[thread overview]
Message-ID: <20190819163704.GG24086@twin.jikos.cz> (raw)
In-Reply-To: <0bea516a54b26e4e1c42e6fe47548cb48cc4172b.1565112813.git.osandov@fb.com>

On Tue, Aug 06, 2019 at 10:34:52AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> We hit a the following very strange deadlock on a system with Btrfs on a
> loop device backed by another Btrfs filesystem:
> 
> 1. The top (loop device) filesystem queues an async_cow work item from
>    cow_file_range_async(). We'll call this work X.
> 2. Worker thread A starts work X (normal_work_helper()).
> 3. Worker thread A executes the ordered work for the top filesystem
>    (run_ordered_work()).
> 4. Worker thread A finishes the ordered work for work X and frees X
>    (work->ordered_free()).
> 5. Worker thread A executes another ordered work and gets blocked on I/O
>    to the bottom filesystem (still in run_ordered_work()).
> 6. Meanwhile, the bottom filesystem allocates and queues an async_cow
>    work item which happens to be the recently-freed X.
> 7. The workqueue code sees that X is already being executed by worker
>    thread A, so it schedules X to be executed _after_ worker thread A
>    finishes (see the find_worker_executing_work() call in
>    process_one_work()).
> 
> Now, the top filesystem is waiting for I/O on the bottom filesystem, but
> the bottom filesystem is waiting for the top filesystem to finish, so we
> deadlock.
> 
> This happens because we are breaking the workqueue assumption that a
> work item cannot be recycled while it still depends on other work. Fix
> it by waiting to free the work item until we are done with all of the
> related ordered work.
> 
> P.S.:
> 
> One might ask why the workqueue code doesn't try to detect a recycled
> work item. It actually does try by checking whether the work item has
> the same work function (find_worker_executing_work()), but in our case
> the function is the same. This is the only key that the workqueue code
> has available to compare, short of adding an additional, layer-violating
> "custom key". Considering that we're the only ones that have ever hit
> this, we should just play by the rules.
> 
> Unfortunately, we haven't been able to create a minimal reproducer other
> than our full container setup using a compress-force=zstd filesystem on
> top of another compress-force=zstd filesystem.
> 
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Added to misc-next, thanks.

      parent reply	other threads:[~2019-08-19 16:36 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-06 17:34 [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems Omar Sandoval
2019-08-07  7:17 ` Nikolay Borisov
2019-08-07 17:08   ` Omar Sandoval
2019-08-12 11:38 ` Filipe Manana
2019-08-12 18:48   ` Omar Sandoval
2019-08-12 18:53     ` Filipe Manana
2019-08-19 16:37 ` David Sterba [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190819163704.GG24086@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).