linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix deadlock with memory reclaim during scrub
@ 2018-11-23 13:45 fdmanana
  2018-11-23 16:05 ` [PATCH v2] " fdmanana
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: fdmanana @ 2018-11-23 13:45 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock with reclaim. Checking
for pause requests is done early in the while loop of scrub_stripe(), and
later in the loop, scrub_extent() is called, which in turns calls
scrub_pages(), which does memory allocations using GFP_KERNEL. So use
GFP_NOFS for the memory allocations if there are any scrub pause requests.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/scrub.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3be1456b5116..5fcb9d1eb983 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2204,13 +2204,26 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 {
 	struct scrub_block *sblock;
 	int index;
+	bool pause_req = (atomic_read(&sctx->fs_info->scrub_pause_req) != 0);
+	unsigned int nofs_flag;
+	int ret = 0;
+
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, use GFP_NOFS. The pausing request is done when
+	 * the transaction commit starts, and it blocks the transaction until
+	 * scrub is paused (done at specific points at scrub_stripe()).
+	 */
+	if (pause_req)
+		nofs_flag = memalloc_nofs_save();
 
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
 		spin_unlock(&sctx->stat_lock);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	/* one ref inside this function, plus one for each page added to
@@ -2230,7 +2243,8 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			sctx->stat.malloc_errors++;
 			spin_unlock(&sctx->stat_lock);
 			scrub_block_put(sblock);
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto out;
 		}
 		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
 		scrub_page_get(spage);
@@ -2269,12 +2283,11 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 	} else {
 		for (index = 0; index < sblock->page_count; index++) {
 			struct scrub_page *spage = sblock->pagev[index];
-			int ret;
 
 			ret = scrub_add_page_to_rd_bio(sctx, spage);
 			if (ret) {
 				scrub_block_put(sblock);
-				return ret;
+				goto out;
 			}
 		}
 
@@ -2284,7 +2297,10 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 
 	/* last one frees, either here or in bio completion for last page */
 	scrub_block_put(sblock);
-	return 0;
+ out:
+	if (pause_req)
+		memalloc_nofs_restore(nofs_flag);
+	return ret;
 }
 
 static void scrub_bio_end_io(struct bio *bio)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 13:45 [PATCH] Btrfs: fix deadlock with memory reclaim during scrub fdmanana
@ 2018-11-23 16:05 ` fdmanana
  2018-11-23 16:13   ` Nikolay Borisov
  2018-11-23 16:41 ` [PATCH v3] " fdmanana
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2018-11-23 16:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock with reclaim. Checking
for pause requests is done early in the while loop of scrub_stripe(), and
later in the loop, scrub_extent() is called, which in turns calls
scrub_pages(), which does memory allocations using GFP_KERNEL. So use
GFP_NOFS for the memory allocations because at any time a scrub pause
request can happen from another task that started to commit a transaction.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
requests migth happen just after we checked for them.

 fs/btrfs/scrub.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3be1456b5116..0630ea0881bc 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2205,7 +2205,13 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 	struct scrub_block *sblock;
 	int index;
 
-	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, use GFP_NOFS. The pausing request is done when
+	 * the transaction commit starts, and it blocks the transaction until
+	 * scrub is paused (done at specific points at scrub_stripe()).
+	 */
+	sblock = kzalloc(sizeof(*sblock), GFP_NOFS);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -2223,7 +2229,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 		struct scrub_page *spage;
 		u64 l = min_t(u64, len, PAGE_SIZE);
 
-		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
+		spage = kzalloc(sizeof(*spage), GFP_NOFS);
 		if (!spage) {
 leave_nomem:
 			spin_lock(&sctx->stat_lock);
@@ -2250,7 +2256,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			spage->have_csum = 0;
 		}
 		sblock->page_count++;
-		spage->page = alloc_page(GFP_KERNEL);
+		spage->page = alloc_page(GFP_NOFS);
 		if (!spage->page)
 			goto leave_nomem;
 		len -= l;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 16:05 ` [PATCH v2] " fdmanana
@ 2018-11-23 16:13   ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2018-11-23 16:13 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 23.11.18 г. 18:05 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When a transaction commit starts, it attempts to pause scrub and it blocks
> until the scrub is paused. So while the transaction is blocked waiting for
> scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
> is running, we must use GFP_NOS to avoid deadlock with reclaim. Checking
> for pause requests is done early in the while loop of scrub_stripe(), and
> later in the loop, scrub_extent() is called, which in turns calls
> scrub_pages(), which does memory allocations using GFP_KERNEL. So use
> GFP_NOFS for the memory allocations because at any time a scrub pause
> request can happen from another task that started to commit a transaction.
> 
> Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> requests migth happen just after we checked for them.
> 
>  fs/btrfs/scrub.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 3be1456b5116..0630ea0881bc 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2205,7 +2205,13 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
>  	struct scrub_block *sblock;
>  	int index;
>  
> -	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
> +	/*
> +	 * In order to avoid deadlock with reclaim when there is a transaction
> +	 * trying to pause scrub, use GFP_NOFS. The pausing request is done when
> +	 * the transaction commit starts, and it blocks the transaction until
> +	 * scrub is paused (done at specific points at scrub_stripe()).
> +	 */
> +	sblock = kzalloc(sizeof(*sblock), GFP_NOFS);

Newer code shouldn't use GFP_NOFS, rather leave GFP_KERNEL as is and
instead use the memaloc_nofs_save/memalloc_nofs_restore. For background
information refer to: Documentation/core-api/gfp_mask-from-fs-io.rst

>  	if (!sblock) {
>  		spin_lock(&sctx->stat_lock);
>  		sctx->stat.malloc_errors++;
> @@ -2223,7 +2229,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
>  		struct scrub_page *spage;
>  		u64 l = min_t(u64, len, PAGE_SIZE);
>  
> -		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
> +		spage = kzalloc(sizeof(*spage), GFP_NOFS);
>  		if (!spage) {
>  leave_nomem:
>  			spin_lock(&sctx->stat_lock);
> @@ -2250,7 +2256,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
>  			spage->have_csum = 0;
>  		}
>  		sblock->page_count++;
> -		spage->page = alloc_page(GFP_KERNEL);
> +		spage->page = alloc_page(GFP_NOFS);
>  		if (!spage->page)
>  			goto leave_nomem;
>  		len -= l;
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 13:45 [PATCH] Btrfs: fix deadlock with memory reclaim during scrub fdmanana
  2018-11-23 16:05 ` [PATCH v2] " fdmanana
@ 2018-11-23 16:41 ` fdmanana
  2018-11-23 18:25 ` [PATCH v4] " fdmanana
  2018-11-26 20:07 ` [PATCH v5] " fdmanana
  3 siblings, 0 replies; 12+ messages in thread
From: fdmanana @ 2018-11-23 16:41 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock with reclaim. Checking
for pause requests is done early in the while loop of scrub_stripe(), and
later in the loop, scrub_extent() is called, which in turns calls
scrub_pages(), which does memory allocations using GFP_KERNEL. So use
GFP_NOFS for the memory allocations because at any time a scrub pause
request can happen from another task that started to commit a transaction.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
requests migth happen just after we checked for them.

V3: Use memalloc_nofs_save() just like V1 did.

 fs/btrfs/scrub.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3be1456b5116..8e9ead5073ec 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2204,13 +2204,24 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 {
 	struct scrub_block *sblock;
 	int index;
+	unsigned int nofs_flag;
+	int ret = 0;
+
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, use GFP_NOFS. The pausing request is done when
+	 * the transaction commit starts, and it blocks the transaction until
+	 * scrub is paused (done at specific points at scrub_stripe()).
+	 */
+	nofs_flag = memalloc_nofs_save();
 
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
 		spin_unlock(&sctx->stat_lock);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out;
 	}
 
 	/* one ref inside this function, plus one for each page added to
@@ -2230,7 +2241,8 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			sctx->stat.malloc_errors++;
 			spin_unlock(&sctx->stat_lock);
 			scrub_block_put(sblock);
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto out;
 		}
 		BUG_ON(index >= SCRUB_MAX_PAGES_PER_BLOCK);
 		scrub_page_get(spage);
@@ -2269,12 +2281,11 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 	} else {
 		for (index = 0; index < sblock->page_count; index++) {
 			struct scrub_page *spage = sblock->pagev[index];
-			int ret;
 
 			ret = scrub_add_page_to_rd_bio(sctx, spage);
 			if (ret) {
 				scrub_block_put(sblock);
-				return ret;
+				goto out;
 			}
 		}
 
@@ -2284,7 +2295,9 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 
 	/* last one frees, either here or in bio completion for last page */
 	scrub_block_put(sblock);
-	return 0;
+out:
+	memalloc_nofs_restore(nofs_flag);
+	return ret;
 }
 
 static void scrub_bio_end_io(struct bio *bio)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 13:45 [PATCH] Btrfs: fix deadlock with memory reclaim during scrub fdmanana
  2018-11-23 16:05 ` [PATCH v2] " fdmanana
  2018-11-23 16:41 ` [PATCH v3] " fdmanana
@ 2018-11-23 18:25 ` fdmanana
  2018-11-26  7:27   ` Nikolay Borisov
  2018-11-26 18:17   ` David Sterba
  2018-11-26 20:07 ` [PATCH v5] " fdmanana
  3 siblings, 2 replies; 12+ messages in thread
From: fdmanana @ 2018-11-23 18:25 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
otherwise we risk getting into a deadlock with reclaim.

Checking for scrub pause requests is done early at the beginning of the
while loop of scrub_stripe() and later in the loop, scrub_extent() and
scrub_raid56_parity() are called, which in turn call scrub_pages() and
scrub_pages_for_parity() respectively. These last two functions do memory
allocations using GFP_KERNEL. Same problem could happen while scrubbing
the super blocks, since it calls scrub_pages().

So make sure GFP_NOFS is used for the memory allocations because at any
time a scrub pause request can happen from another task that started to
commit a transaction.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
requests migth happen just after we checked for them.

V3: Use memalloc_nofs_save() just like V1 did.

V4: Similar problem happened for raid56, which was previously missed, so
    deal with it as well as the case for scrub_supers().

 fs/btrfs/scrub.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3be1456b5116..e08b7502d1f0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	struct scrub_ctx *sctx;
 	int ret;
 	struct btrfs_device *dev;
+	unsigned int nofs_flag;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EINVAL;
@@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	atomic_inc(&fs_info->scrubs_running);
 	mutex_unlock(&fs_info->scrub_lock);
 
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, make sure we use GFP_NOFS for all the
+	 * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
+	 * invoked by our callees. The pausing request is done when the
+	 * transaction commit starts, and it blocks the transaction until scrub
+	 * is paused (done at specific points at scrub_stripe() or right above
+	 * before incrementing fs_info->scrubs_running).
+	 */
+	nofs_flag = memalloc_nofs_save();
 	if (!is_dev_replace) {
 		/*
 		 * by holding device list mutex, we can
@@ -3895,6 +3906,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	if (!ret)
 		ret = scrub_enumerate_chunks(sctx, dev, start, end,
 					     is_dev_replace);
+	memalloc_nofs_restore(nofs_flag);
 
 	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
 	atomic_dec(&fs_info->scrubs_running);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 18:25 ` [PATCH v4] " fdmanana
@ 2018-11-26  7:27   ` Nikolay Borisov
  2018-11-26 18:17   ` David Sterba
  1 sibling, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2018-11-26  7:27 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 23.11.18 г. 20:25 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When a transaction commit starts, it attempts to pause scrub and it blocks
> until the scrub is paused. So while the transaction is blocked waiting for
> scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> otherwise we risk getting into a deadlock with reclaim.
> 
> Checking for scrub pause requests is done early at the beginning of the
> while loop of scrub_stripe() and later in the loop, scrub_extent() and
> scrub_raid56_parity() are called, which in turn call scrub_pages() and
> scrub_pages_for_parity() respectively. These last two functions do memory
> allocations using GFP_KERNEL. Same problem could happen while scrubbing
> the super blocks, since it calls scrub_pages().
> 
> So make sure GFP_NOFS is used for the memory allocations because at any
> time a scrub pause request can happen from another task that started to
> commit a transaction.
> 
> Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
> 
> V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> requests migth happen just after we checked for them.
> 
> V3: Use memalloc_nofs_save() just like V1 did.
> 
> V4: Similar problem happened for raid56, which was previously missed, so
>     deal with it as well as the case for scrub_supers().
> 
>  fs/btrfs/scrub.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 3be1456b5116..e08b7502d1f0 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	struct scrub_ctx *sctx;
>  	int ret;
>  	struct btrfs_device *dev;
> +	unsigned int nofs_flag;
>  
>  	if (btrfs_fs_closing(fs_info))
>  		return -EINVAL;
> @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	atomic_inc(&fs_info->scrubs_running);
>  	mutex_unlock(&fs_info->scrub_lock);
>  
> +	/*
> +	 * In order to avoid deadlock with reclaim when there is a transaction
> +	 * trying to pause scrub, make sure we use GFP_NOFS for all the
> +	 * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> +	 * invoked by our callees. The pausing request is done when the
> +	 * transaction commit starts, and it blocks the transaction until scrub
> +	 * is paused (done at specific points at scrub_stripe() or right above
> +	 * before incrementing fs_info->scrubs_running).
> +	 */
> +	nofs_flag = memalloc_nofs_save();
>  	if (!is_dev_replace) {
>  		/*
>  		 * by holding device list mutex, we can
> @@ -3895,6 +3906,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	if (!ret)
>  		ret = scrub_enumerate_chunks(sctx, dev, start, end,
>  					     is_dev_replace);
> +	memalloc_nofs_restore(nofs_flag);
>  
>  	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
>  	atomic_dec(&fs_info->scrubs_running);
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 18:25 ` [PATCH v4] " fdmanana
  2018-11-26  7:27   ` Nikolay Borisov
@ 2018-11-26 18:17   ` David Sterba
  2018-11-26 20:10     ` Filipe Manana
  1 sibling, 1 reply; 12+ messages in thread
From: David Sterba @ 2018-11-26 18:17 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Fri, Nov 23, 2018 at 06:25:40PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When a transaction commit starts, it attempts to pause scrub and it blocks
> until the scrub is paused. So while the transaction is blocked waiting for
> scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> otherwise we risk getting into a deadlock with reclaim.
> 
> Checking for scrub pause requests is done early at the beginning of the
> while loop of scrub_stripe() and later in the loop, scrub_extent() and
> scrub_raid56_parity() are called, which in turn call scrub_pages() and
> scrub_pages_for_parity() respectively. These last two functions do memory
> allocations using GFP_KERNEL. Same problem could happen while scrubbing
> the super blocks, since it calls scrub_pages().
> 
> So make sure GFP_NOFS is used for the memory allocations because at any
> time a scrub pause request can happen from another task that started to
> commit a transaction.
> 
> Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> requests migth happen just after we checked for them.
> 
> V3: Use memalloc_nofs_save() just like V1 did.
> 
> V4: Similar problem happened for raid56, which was previously missed, so
>     deal with it as well as the case for scrub_supers().

Enclosing the whole scrub to 'nofs' seems like the best option and
future proof. What I missed in 58c4e173847a was the "don't hold big lock
under GFP_KERNEL allocation" pattern.

>  fs/btrfs/scrub.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 3be1456b5116..e08b7502d1f0 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	struct scrub_ctx *sctx;
>  	int ret;
>  	struct btrfs_device *dev;
> +	unsigned int nofs_flag;
>  
>  	if (btrfs_fs_closing(fs_info))
>  		return -EINVAL;
> @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	atomic_inc(&fs_info->scrubs_running);
>  	mutex_unlock(&fs_info->scrub_lock);
>  
> +	/*
> +	 * In order to avoid deadlock with reclaim when there is a transaction
> +	 * trying to pause scrub, make sure we use GFP_NOFS for all the
> +	 * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> +	 * invoked by our callees. The pausing request is done when the
> +	 * transaction commit starts, and it blocks the transaction until scrub
> +	 * is paused (done at specific points at scrub_stripe() or right above
> +	 * before incrementing fs_info->scrubs_running).

This hilights why there's perhaps no point in trying to make the nofs
section smaller, handling all the interactions between scrub and
transaction would be too complex.

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-23 13:45 [PATCH] Btrfs: fix deadlock with memory reclaim during scrub fdmanana
                   ` (2 preceding siblings ...)
  2018-11-23 18:25 ` [PATCH v4] " fdmanana
@ 2018-11-26 20:07 ` fdmanana
  3 siblings, 0 replies; 12+ messages in thread
From: fdmanana @ 2018-11-26 20:07 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
otherwise we risk getting into a deadlock with reclaim.

Checking for scrub pause requests is done early at the beginning of the
while loop of scrub_stripe() and later in the loop, scrub_extent() and
scrub_raid56_parity() are called, which in turn call scrub_pages() and
scrub_pages_for_parity() respectively. These last two functions do memory
allocations using GFP_KERNEL. Same problem could happen while scrubbing
the super blocks, since it calls scrub_pages().

We also can not have any of the worker tasks, created by the scrub task,
doing GFP_KERNEL allocations, because before pausing, the scrub task waits
for all the worker tasks to complete (also done at scrub_stripe()).

So make sure GFP_NOFS is used for the memory allocations because at any
time a scrub pause request can happen from another task that started to
commit a transaction.

Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
requests migth happen just after we checked for them.

V3: Use memalloc_nofs_save() just like V1 did.

V4: Similar problem happened for raid56, which was previously missed, so
    deal with it as well as the case for scrub_supers().

V5: Make sure worker tasks, created by scrub, also don't do GFP_KERNEL
    allocations, because in order to pause, the scrub task waits for all
    the workers to complete first.

 fs/btrfs/scrub.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3be1456b5116..392f8a7f65ab 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -322,6 +322,7 @@ static struct full_stripe_lock *insert_full_stripe_lock(
 	struct rb_node *parent = NULL;
 	struct full_stripe_lock *entry;
 	struct full_stripe_lock *ret;
+	unsigned int nofs_flag;
 
 	lockdep_assert_held(&locks_root->lock);
 
@@ -339,8 +340,17 @@ static struct full_stripe_lock *insert_full_stripe_lock(
 		}
 	}
 
-	/* Insert new lock */
+	/*
+	 * Insert new lock.
+	 *
+	 * We must use GFP_NOFS because the scrub task might be waiting for a
+	 * worker task executing this function and in turn a transaction commit
+	 * might be waiting the scrub task to pause (which needs to wait for all
+	 * the worker tasks to complete before pausing).
+	 */
+	nofs_flag = memalloc_nofs_save();
 	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
+	memalloc_nofs_restore(nofs_flag);
 	if (!ret)
 		return ERR_PTR(-ENOMEM);
 	ret->logical = fstripe_logical;
@@ -1622,8 +1632,19 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 	mutex_lock(&sctx->wr_lock);
 again:
 	if (!sctx->wr_curr_bio) {
+		unsigned int nofs_flag;
+
+		/*
+		 * We must use GFP_NOFS because the scrub task might be waiting
+		 * for a worker task executing this function and in turn a
+		 * transaction commit might be waiting the scrub task to pause
+		 * (which needs to wait for all the worker tasks to complete
+		 * before pausing).
+		 */
+		nofs_flag = memalloc_nofs_save();
 		sctx->wr_curr_bio = kzalloc(sizeof(*sctx->wr_curr_bio),
 					      GFP_KERNEL);
+		memalloc_nofs_restore(nofs_flag);
 		if (!sctx->wr_curr_bio) {
 			mutex_unlock(&sctx->wr_lock);
 			return -ENOMEM;
@@ -3779,6 +3800,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	struct scrub_ctx *sctx;
 	int ret;
 	struct btrfs_device *dev;
+	unsigned int nofs_flag;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EINVAL;
@@ -3882,6 +3904,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	atomic_inc(&fs_info->scrubs_running);
 	mutex_unlock(&fs_info->scrub_lock);
 
+	/*
+	 * In order to avoid deadlock with reclaim when there is a transaction
+	 * trying to pause scrub, make sure we use GFP_NOFS for all the
+	 * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
+	 * invoked by our callees. The pausing request is done when the
+	 * transaction commit starts, and it blocks the transaction until scrub
+	 * is paused (done at specific points at scrub_stripe() or right above
+	 * before incrementing fs_info->scrubs_running).
+	 */
+	nofs_flag = memalloc_nofs_save();
 	if (!is_dev_replace) {
 		/*
 		 * by holding device list mutex, we can
@@ -3895,6 +3927,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	if (!ret)
 		ret = scrub_enumerate_chunks(sctx, dev, start, end,
 					     is_dev_replace);
+	memalloc_nofs_restore(nofs_flag);
 
 	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
 	atomic_dec(&fs_info->scrubs_running);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-26 18:17   ` David Sterba
@ 2018-11-26 20:10     ` Filipe Manana
  2018-11-28 14:22       ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2018-11-26 20:10 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Mon, Nov 26, 2018 at 6:17 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, Nov 23, 2018 at 06:25:40PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When a transaction commit starts, it attempts to pause scrub and it blocks
> > until the scrub is paused. So while the transaction is blocked waiting for
> > scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> > otherwise we risk getting into a deadlock with reclaim.
> >
> > Checking for scrub pause requests is done early at the beginning of the
> > while loop of scrub_stripe() and later in the loop, scrub_extent() and
> > scrub_raid56_parity() are called, which in turn call scrub_pages() and
> > scrub_pages_for_parity() respectively. These last two functions do memory
> > allocations using GFP_KERNEL. Same problem could happen while scrubbing
> > the super blocks, since it calls scrub_pages().
> >
> > So make sure GFP_NOFS is used for the memory allocations because at any
> > time a scrub pause request can happen from another task that started to
> > commit a transaction.
> >
> > Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >
> > V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> > requests migth happen just after we checked for them.
> >
> > V3: Use memalloc_nofs_save() just like V1 did.
> >
> > V4: Similar problem happened for raid56, which was previously missed, so
> >     deal with it as well as the case for scrub_supers().
>
> Enclosing the whole scrub to 'nofs' seems like the best option and
> future proof. What I missed in 58c4e173847a was the "don't hold big lock
> under GFP_KERNEL allocation" pattern.
>
> >  fs/btrfs/scrub.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > index 3be1456b5116..e08b7502d1f0 100644
> > --- a/fs/btrfs/scrub.c
> > +++ b/fs/btrfs/scrub.c
> > @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >       struct scrub_ctx *sctx;
> >       int ret;
> >       struct btrfs_device *dev;
> > +     unsigned int nofs_flag;
> >
> >       if (btrfs_fs_closing(fs_info))
> >               return -EINVAL;
> > @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >       atomic_inc(&fs_info->scrubs_running);
> >       mutex_unlock(&fs_info->scrub_lock);
> >
> > +     /*
> > +      * In order to avoid deadlock with reclaim when there is a transaction
> > +      * trying to pause scrub, make sure we use GFP_NOFS for all the
> > +      * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> > +      * invoked by our callees. The pausing request is done when the
> > +      * transaction commit starts, and it blocks the transaction until scrub
> > +      * is paused (done at specific points at scrub_stripe() or right above
> > +      * before incrementing fs_info->scrubs_running).
>
> This hilights why there's perhaps no point in trying to make the nofs
> section smaller, handling all the interactions between scrub and
> transaction would be too complex.
>
> Reviewed-by: David Sterba <dsterba@suse.com>

Well, the worker tasks can also not use gfp_kernel, since the scrub
task waits for them to complete before pausing.
I missed this, and 2 reviewers as well, so perhaps it wasn't that
trivial and I shouldn't feel that I miserably failed to identify all
cases for something rather trivial. V5 sent.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-26 20:10     ` Filipe Manana
@ 2018-11-28 14:22       ` David Sterba
  2018-11-28 14:40         ` Filipe Manana
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2018-11-28 14:22 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

On Mon, Nov 26, 2018 at 08:10:30PM +0000, Filipe Manana wrote:
> On Mon, Nov 26, 2018 at 6:17 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Fri, Nov 23, 2018 at 06:25:40PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > When a transaction commit starts, it attempts to pause scrub and it blocks
> > > until the scrub is paused. So while the transaction is blocked waiting for
> > > scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> > > otherwise we risk getting into a deadlock with reclaim.
> > >
> > > Checking for scrub pause requests is done early at the beginning of the
> > > while loop of scrub_stripe() and later in the loop, scrub_extent() and
> > > scrub_raid56_parity() are called, which in turn call scrub_pages() and
> > > scrub_pages_for_parity() respectively. These last two functions do memory
> > > allocations using GFP_KERNEL. Same problem could happen while scrubbing
> > > the super blocks, since it calls scrub_pages().
> > >
> > > So make sure GFP_NOFS is used for the memory allocations because at any
> > > time a scrub pause request can happen from another task that started to
> > > commit a transaction.
> > >
> > > Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > ---
> > >
> > > V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> > > requests migth happen just after we checked for them.
> > >
> > > V3: Use memalloc_nofs_save() just like V1 did.
> > >
> > > V4: Similar problem happened for raid56, which was previously missed, so
> > >     deal with it as well as the case for scrub_supers().
> >
> > Enclosing the whole scrub to 'nofs' seems like the best option and
> > future proof. What I missed in 58c4e173847a was the "don't hold big lock
> > under GFP_KERNEL allocation" pattern.
> >
> > >  fs/btrfs/scrub.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > > index 3be1456b5116..e08b7502d1f0 100644
> > > --- a/fs/btrfs/scrub.c
> > > +++ b/fs/btrfs/scrub.c
> > > @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > >       struct scrub_ctx *sctx;
> > >       int ret;
> > >       struct btrfs_device *dev;
> > > +     unsigned int nofs_flag;
> > >
> > >       if (btrfs_fs_closing(fs_info))
> > >               return -EINVAL;
> > > @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > >       atomic_inc(&fs_info->scrubs_running);
> > >       mutex_unlock(&fs_info->scrub_lock);
> > >
> > > +     /*
> > > +      * In order to avoid deadlock with reclaim when there is a transaction
> > > +      * trying to pause scrub, make sure we use GFP_NOFS for all the
> > > +      * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> > > +      * invoked by our callees. The pausing request is done when the
> > > +      * transaction commit starts, and it blocks the transaction until scrub
> > > +      * is paused (done at specific points at scrub_stripe() or right above
> > > +      * before incrementing fs_info->scrubs_running).
> >
> > This hilights why there's perhaps no point in trying to make the nofs
> > section smaller, handling all the interactions between scrub and
> > transaction would be too complex.
> >
> > Reviewed-by: David Sterba <dsterba@suse.com>
> 
> Well, the worker tasks can also not use gfp_kernel, since the scrub
> task waits for them to complete before pausing.
> I missed this, and 2 reviewers as well, so perhaps it wasn't that
> trivial and I shouldn't feel that I miserably failed to identify all
> cases for something rather trivial. V5 sent.

You can say that you left it there intentionally, such cookies are a
good drill for reviewers to stay sharp.

When I started the conversions of GFP_NOFS -> GFP_KERNEL, scrub looked
quite safe in this respect but turns out it's not. I was wondering if we
could add some lock assertions before GFP_KERNEL allocations, like:

	assert_lock_not_held(fs_info->device_list_mutex)
	kmalloc(GFP_KERNEL)

and add more locks per subsystem (eg. the scrub lock) and possibly some
convenience wrappers. Michal's scope GFS_NOFS patch series has a
debugging warning where NOFS is used in context where it does not need
to, while having the 'must not hold an important lock' would be a good
debugging helper too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-28 14:22       ` David Sterba
@ 2018-11-28 14:40         ` Filipe Manana
  2018-12-04 14:47           ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Filipe Manana @ 2018-11-28 14:40 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Wed, Nov 28, 2018 at 2:22 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Nov 26, 2018 at 08:10:30PM +0000, Filipe Manana wrote:
> > On Mon, Nov 26, 2018 at 6:17 PM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Fri, Nov 23, 2018 at 06:25:40PM +0000, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > When a transaction commit starts, it attempts to pause scrub and it blocks
> > > > until the scrub is paused. So while the transaction is blocked waiting for
> > > > scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
> > > > otherwise we risk getting into a deadlock with reclaim.
> > > >
> > > > Checking for scrub pause requests is done early at the beginning of the
> > > > while loop of scrub_stripe() and later in the loop, scrub_extent() and
> > > > scrub_raid56_parity() are called, which in turn call scrub_pages() and
> > > > scrub_pages_for_parity() respectively. These last two functions do memory
> > > > allocations using GFP_KERNEL. Same problem could happen while scrubbing
> > > > the super blocks, since it calls scrub_pages().
> > > >
> > > > So make sure GFP_NOFS is used for the memory allocations because at any
> > > > time a scrub pause request can happen from another task that started to
> > > > commit a transaction.
> > > >
> > > > Fixes: 58c4e173847a ("btrfs: scrub: use GFP_KERNEL on the submission path")
> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > > ---
> > > >
> > > > V2: Make using GFP_NOFS unconditionial. Previous version was racy, as pausing
> > > > requests migth happen just after we checked for them.
> > > >
> > > > V3: Use memalloc_nofs_save() just like V1 did.
> > > >
> > > > V4: Similar problem happened for raid56, which was previously missed, so
> > > >     deal with it as well as the case for scrub_supers().
> > >
> > > Enclosing the whole scrub to 'nofs' seems like the best option and
> > > future proof. What I missed in 58c4e173847a was the "don't hold big lock
> > > under GFP_KERNEL allocation" pattern.
> > >
> > > >  fs/btrfs/scrub.c | 12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > >
> > > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > > > index 3be1456b5116..e08b7502d1f0 100644
> > > > --- a/fs/btrfs/scrub.c
> > > > +++ b/fs/btrfs/scrub.c
> > > > @@ -3779,6 +3779,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > > >       struct scrub_ctx *sctx;
> > > >       int ret;
> > > >       struct btrfs_device *dev;
> > > > +     unsigned int nofs_flag;
> > > >
> > > >       if (btrfs_fs_closing(fs_info))
> > > >               return -EINVAL;
> > > > @@ -3882,6 +3883,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > > >       atomic_inc(&fs_info->scrubs_running);
> > > >       mutex_unlock(&fs_info->scrub_lock);
> > > >
> > > > +     /*
> > > > +      * In order to avoid deadlock with reclaim when there is a transaction
> > > > +      * trying to pause scrub, make sure we use GFP_NOFS for all the
> > > > +      * allocations done at btrfs_scrub_pages() and scrub_pages_for_parity()
> > > > +      * invoked by our callees. The pausing request is done when the
> > > > +      * transaction commit starts, and it blocks the transaction until scrub
> > > > +      * is paused (done at specific points at scrub_stripe() or right above
> > > > +      * before incrementing fs_info->scrubs_running).
> > >
> > > This hilights why there's perhaps no point in trying to make the nofs
> > > section smaller, handling all the interactions between scrub and
> > > transaction would be too complex.
> > >
> > > Reviewed-by: David Sterba <dsterba@suse.com>
> >
> > Well, the worker tasks can also not use gfp_kernel, since the scrub
> > task waits for them to complete before pausing.
> > I missed this, and 2 reviewers as well, so perhaps it wasn't that
> > trivial and I shouldn't feel that I miserably failed to identify all
> > cases for something rather trivial. V5 sent.
>
> You can say that you left it there intentionally, such cookies are a
> good drill for reviewers to stay sharp.

Unfortunately for me, it was not on purpose.

However there's the small drill of, for the workers only, potentially
moving the memalloc_nofs_save() and
restore to scrub_handle_errored_block(), since the only two possible
gfp_kernel allocations for workers
are during the case where a repair is needed:

scrub_bio_end_io_worker()
  scrub_block_complete()
    scrub_handle_errored_block()
      lock_full_stripe()
        insert_full_stripe_lock()
          -> kmalloc with GFP_KERNEL


scrub_bio_end_io_worker()
   scrub_block_complete()
     scrub_handle_errored_block()
       scrub_write_page_to_dev_replace()
         scrub_add_page_to_wr_bio()
           -> kzalloc with GFP_KERNEL

Just to avoid some duplication.

>
> When I started the conversions of GFP_NOFS -> GFP_KERNEL, scrub looked
> quite safe in this respect but turns out it's not. I was wondering if we
> could add some lock assertions before GFP_KERNEL allocations, like:
>
>         assert_lock_not_held(fs_info->device_list_mutex)
>         kmalloc(GFP_KERNEL)
>
> and add more locks per subsystem (eg. the scrub lock) and possibly some
> convenience wrappers. Michal's scope GFS_NOFS patch series has a
> debugging warning where NOFS is used in context where it does not need
> to, while having the 'must not hold an important lock' would be a good
> debugging helper too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub
  2018-11-28 14:40         ` Filipe Manana
@ 2018-12-04 14:47           ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2018-12-04 14:47 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

On Wed, Nov 28, 2018 at 02:40:07PM +0000, Filipe Manana wrote:
> > > Well, the worker tasks can also not use gfp_kernel, since the scrub
> > > task waits for them to complete before pausing.
> > > I missed this, and 2 reviewers as well, so perhaps it wasn't that
> > > trivial and I shouldn't feel that I miserably failed to identify all
> > > cases for something rather trivial. V5 sent.
> >
> > You can say that you left it there intentionally, such cookies are a
> > good drill for reviewers to stay sharp.
> 
> Unfortunately for me, it was not on purpose.
> 
> However there's the small drill of, for the workers only, potentially
> moving the memalloc_nofs_save() and
> restore to scrub_handle_errored_block(), since the only two possible
> gfp_kernel allocations for workers
> are during the case where a repair is needed:
> 
> scrub_bio_end_io_worker()
>   scrub_block_complete()
>     scrub_handle_errored_block()
>       lock_full_stripe()
>         insert_full_stripe_lock()
>           -> kmalloc with GFP_KERNEL
> 
> 
> scrub_bio_end_io_worker()
>    scrub_block_complete()
>      scrub_handle_errored_block()
>        scrub_write_page_to_dev_replace()
>          scrub_add_page_to_wr_bio()
>            -> kzalloc with GFP_KERNEL
> 
> Just to avoid some duplication.

Sounds like a nice cleanup and more in line with the idea of scoped
GFP_NOFS, the markers should be at a higher level. Starting at the
bottom of the callstack is fine as it's obvious where we want the nofs
protection, and then push it up the call chain. That way it's easier to
review. Please send a followup patch, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-12-04 14:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-23 13:45 [PATCH] Btrfs: fix deadlock with memory reclaim during scrub fdmanana
2018-11-23 16:05 ` [PATCH v2] " fdmanana
2018-11-23 16:13   ` Nikolay Borisov
2018-11-23 16:41 ` [PATCH v3] " fdmanana
2018-11-23 18:25 ` [PATCH v4] " fdmanana
2018-11-26  7:27   ` Nikolay Borisov
2018-11-26 18:17   ` David Sterba
2018-11-26 20:10     ` Filipe Manana
2018-11-28 14:22       ` David Sterba
2018-11-28 14:40         ` Filipe Manana
2018-12-04 14:47           ` David Sterba
2018-11-26 20:07 ` [PATCH v5] " fdmanana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).