* [PATCH v2] btrfs: qgroup: exit the rescan worker during umount
From: Justin Maggard @ 2015-09-03 1:05 UTC
To: linux-btrfs; +Cc: Justin Maggard
v2: Fix stupid error while making formatting changes...

I was hitting a consistent NULL pointer dereference during shutdown that
showed the trace running through end_workqueue_bio(). I traced it back to
the endio_meta_workers workqueue being poked after it had already been
destroyed.

Eventually I found that the root cause was a qgroup rescan that was still
in progress while we were stopping all the btrfs workers.

Currently we explicitly pause balance and scrub operations in
close_ctree(), but we do nothing to stop the qgroup rescan. We should
probably be doing the same for qgroup rescan, but that's a much larger
change. This small change is good enough to allow me to unmount without
crashing.

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
---
fs/btrfs/qgroup.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d904ee1..5bfcee9 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2278,7 +2278,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 		goto out;

 	err = 0;
-	while (!err) {
+	while (!err && !btrfs_fs_closing(fs_info)) {
 		trans = btrfs_start_transaction(fs_info->fs_root, 0);
 		if (IS_ERR(trans)) {
 			err = PTR_ERR(trans);
@@ -2301,7 +2301,8 @@ out:
 	btrfs_free_path(path);

 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	if (!btrfs_fs_closing(fs_info))
+		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;

 	if (err > 0 &&
 	    fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
@@ -2330,7 +2331,9 @@ out:
 	}
 	btrfs_end_transaction(trans, fs_info->quota_root);

-	if (err >= 0) {
+	if (btrfs_fs_closing(fs_info)) {
+		btrfs_info(fs_info, "qgroup scan paused");
+	} else if (err >= 0) {
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
 	} else {
--
2.5.1
* Re: [PATCH v2] btrfs: qgroup: exit the rescan worker during umount
From: David Sterba @ 2015-09-22 14:45 UTC
To: Justin Maggard; +Cc: linux-btrfs, Justin Maggard
On Wed, Sep 02, 2015 at 06:05:17PM -0700, Justin Maggard wrote:
> v2: Fix stupid error while making formatting changes...
I haven't noticed any difference between the patches, what exactly did
you change?
> I was hitting a consistent NULL pointer dereference during shutdown that
> showed the trace running through end_workqueue_bio(). I traced it back to
> the endio_meta_workers workqueue being poked after it had already been
> destroyed.
>
> Eventually I found that the root cause was a qgroup rescan that was still
> in progress while we were stopping all the btrfs workers.
>
> Currently we explicitly pause balance and scrub operations in
> close_ctree(), but we do nothing to stop the qgroup rescan. We should
> probably be doing the same for qgroup rescan, but that's a much larger
> change. This small change is good enough to allow me to unmount without
> crashing.
>
> Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Can you please submit the test you've used to trigger the crash to
fstests?
Reviewed-by: David Sterba <dsterba@suse.com>
* Re: [PATCH v2] btrfs: qgroup: exit the rescan worker during umount
From: Justin Maggard @ 2015-09-26 0:25 UTC
To: dsterba, Justin Maggard, BTRFS, Justin Maggard
On Tue, Sep 22, 2015 at 7:45 AM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Sep 02, 2015 at 06:05:17PM -0700, Justin Maggard wrote:
>> v2: Fix stupid error while making formatting changes...
>
> I haven't noticed any difference between the patches, what exactly did
> you change?
>
I broke compiling while cleaning up some checkpatch.pl feedback.
Here's what changed between v1 and v2:
- if (!btrfs_fs_closing(fs_info)) {
+ if (!btrfs_fs_closing(fs_info))
>> I was hitting a consistent NULL pointer dereference during shutdown that
>> showed the trace running through end_workqueue_bio(). I traced it back to
>> the endio_meta_workers workqueue being poked after it had already been
>> destroyed.
>>
>> Eventually I found that the root cause was a qgroup rescan that was still
>> in progress while we were stopping all the btrfs workers.
>>
>> Currently we explicitly pause balance and scrub operations in
>> close_ctree(), but we do nothing to stop the qgroup rescan. We should
>> probably be doing the same for qgroup rescan, but that's a much larger
>> change. This small change is good enough to allow me to unmount without
>> crashing.
>>
>> Signed-off-by: Justin Maggard <jmaggard@netgear.com>
>
> Can you please submit the test you've used to trigger the crash to
> fstests?
>
Sure, I've got a reproducer coded up for xfstests now. Should I just
send that to this list, or is there a better place to send it?
-Justin
* Re: [PATCH v2] btrfs: qgroup: exit the rescan worker during umount
From: Filipe Manana @ 2015-09-26 11:49 UTC
To: Justin Maggard; +Cc: dsterba, BTRFS, Justin Maggard
On Sat, Sep 26, 2015 at 1:25 AM, Justin Maggard <jmaggard10@gmail.com> wrote:
> On Tue, Sep 22, 2015 at 7:45 AM, David Sterba <dsterba@suse.cz> wrote:
>> On Wed, Sep 02, 2015 at 06:05:17PM -0700, Justin Maggard wrote:
>>> v2: Fix stupid error while making formatting changes...
>>
>> I haven't noticed any difference between the patches, what exactly did
>> you change?
>>
>
> I broke compiling while cleaning up some checkpatch.pl feedback.
> Here's what changed between v1 and v2:
>
> - if (!btrfs_fs_closing(fs_info)) {
> + if (!btrfs_fs_closing(fs_info))
>
>
>>> I was hitting a consistent NULL pointer dereference during shutdown that
>>> showed the trace running through end_workqueue_bio(). I traced it back to
>>> the endio_meta_workers workqueue being poked after it had already been
>>> destroyed.
>>>
>>> Eventually I found that the root cause was a qgroup rescan that was still
>>> in progress while we were stopping all the btrfs workers.
>>>
>>> Currently we explicitly pause balance and scrub operations in
>>> close_ctree(), but we do nothing to stop the qgroup rescan. We should
>>> probably be doing the same for qgroup rescan, but that's a much larger
>>> change. This small change is good enough to allow me to unmount without
>>> crashing.
>>>
>>> Signed-off-by: Justin Maggard <jmaggard@netgear.com>
>>
>> Can you please submit the test you've used to trigger the crash to
>> fstests?
>>
>
> Sure, I've got a reproducer coded up for xfstests now. Should I just
> send that to this list, or is there a better place to send it?
Just send it to fstests@vger.kernel.org with the btrfs mailing list on
cc. If you take a look at test submission emails in the btrfs mailing
list, you'll see how it's usually done.
thanks
>
> -Justin
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
* Re: [PATCH v2] btrfs: qgroup: exit the rescan worker during umount
From: Filipe Manana @ 2015-10-08 9:25 UTC
To: Justin Maggard; +Cc: linux-btrfs, Justin Maggard
On Thu, Sep 3, 2015 at 2:05 AM, Justin Maggard <jmaggard10@gmail.com> wrote:
> v2: Fix stupid error while making formatting changes...
>
> I was hitting a consistent NULL pointer dereference during shutdown that
> showed the trace running through end_workqueue_bio(). I traced it back to
> the endio_meta_workers workqueue being poked after it had already been
> destroyed.
>
> Eventually I found that the root cause was a qgroup rescan that was still
> in progress while we were stopping all the btrfs workers.
>
> Currently we explicitly pause balance and scrub operations in
> close_ctree(), but we do nothing to stop the qgroup rescan. We should
> probably be doing the same for qgroup rescan, but that's a much larger
> change. This small change is good enough to allow me to unmount without
> crashing.
>
> Signed-off-by: Justin Maggard <jmaggard@netgear.com>
> ---
> fs/btrfs/qgroup.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index d904ee1..5bfcee9 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2278,7 +2278,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
> goto out;
>
> err = 0;
> - while (!err) {
> + while (!err && !btrfs_fs_closing(fs_info)) {
> trans = btrfs_start_transaction(fs_info->fs_root, 0);
> if (IS_ERR(trans)) {
> err = PTR_ERR(trans);
> @@ -2301,7 +2301,8 @@ out:
> btrfs_free_path(path);
>
> mutex_lock(&fs_info->qgroup_rescan_lock);
> - fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> + if (!btrfs_fs_closing(fs_info))
> + fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>
> if (err > 0 &&
> fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
> @@ -2330,7 +2331,9 @@ out:
> }
> btrfs_end_transaction(trans, fs_info->quota_root);
>
> - if (err >= 0) {
> + if (btrfs_fs_closing(fs_info)) {
> + btrfs_info(fs_info, "qgroup scan paused");
> + } else if (err >= 0) {
> btrfs_info(fs_info, "qgroup scan completed%s",
> err > 0 ? " (inconsistency flag cleared)" : "");
> } else {
Justin, this is still racy (though much less so than before).
Once we leave the loop because btrfs_fs_closing(fs_info) is true, we
still start a transaction and write to the quota btree. Before or during
that write, close_ctree() might already have completed, or be at a point
where the write results in another NULL pointer dereference, an access
through a dangling pointer, a leaked transaction that never gets
committed (because close_ctree() has already stopped the transaction
kthread), and so on.

So in addition to what you did, you need to call
btrfs_qgroup_wait_for_completion(fs_info) in close_ctree() (disk-io.c),
right after fs_info->closing is set to 1.
Otherwise it looks good.
Thanks.
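
The ordering described in this review can be illustrated with a small
userspace sketch. This is not kernel code and not the eventual btrfs fix:
it assumes POSIX threads, and every name in it (struct fs_state,
rescan_worker(), wait_for_rescan(), close_fs(), simulate_umount()) is
hypothetical. The worker leaves its loop once the closing flag is set but
still runs an epilogue afterwards, so the shutdown path has to wait for
the worker to finish before it tears anything down.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few fs_info fields that matter here. */
struct fs_state {
	atomic_int closing;         /* plays the role of fs_info->closing */
	pthread_mutex_t lock;       /* protects the three fields below */
	pthread_cond_t rescan_done; /* signalled when the worker has finished */
	bool rescan_running;
	bool torn_down;             /* true once the "workers" are destroyed */
	int late_writes;            /* epilogue writes that ran after teardown */
};

static void *rescan_worker(void *arg)
{
	struct fs_state *fs = arg;

	/* Main loop: stop taking new work once shutdown has begun. */
	while (!atomic_load(&fs->closing))
		sched_yield(); /* stand-in for scanning one leaf */

	/* Epilogue: the worker still "writes" after leaving the loop. */
	pthread_mutex_lock(&fs->lock);
	if (fs->torn_down)
		fs->late_writes++; /* this would be the use-after-destroy */
	fs->rescan_running = false;
	pthread_cond_broadcast(&fs->rescan_done);
	pthread_mutex_unlock(&fs->lock);
	return NULL;
}

/* Block until the worker has completed its epilogue. */
static void wait_for_rescan(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	while (fs->rescan_running)
		pthread_cond_wait(&fs->rescan_done, &fs->lock);
	pthread_mutex_unlock(&fs->lock);
}

/* Shutdown path: set the flag, wait for the worker, then tear down. */
static int close_fs(struct fs_state *fs, pthread_t worker)
{
	atomic_store(&fs->closing, 1);
	wait_for_rescan(fs); /* the extra step suggested in the review */

	pthread_mutex_lock(&fs->lock);
	fs->torn_down = true;
	pthread_mutex_unlock(&fs->lock);

	pthread_join(worker, NULL);
	return fs->late_writes;
}

/* Runs one worker/shutdown cycle and returns the number of late writes. */
static int simulate_umount(void)
{
	struct fs_state fs;
	pthread_t worker;
	int rc;

	atomic_init(&fs.closing, 0);
	pthread_mutex_init(&fs.lock, NULL);
	pthread_cond_init(&fs.rescan_done, NULL);
	fs.rescan_running = true; /* a rescan is in flight at "umount" time */
	fs.torn_down = false;
	fs.late_writes = 0;

	rc = pthread_create(&worker, NULL, rescan_worker, &fs);
	assert(rc == 0);
	(void)rc;

	rc = close_fs(&fs, worker);
	pthread_cond_destroy(&fs.rescan_done);
	pthread_mutex_destroy(&fs.lock);
	return rc;
}
```

In this model, dropping the wait_for_rescan() call from close_fs() would
let torn_down become true while the worker epilogue is still pending,
which is the window the review describes. With the wait in place,
simulate_umount() reports zero late writes.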
> --
> 2.5.1
>