linux-kernel.vger.kernel.org archive mirror
* [PATCH] dm-region-hash: fix strange usage of mempool_alloc.
@ 2017-04-10  2:11 NeilBrown
  2017-04-21 13:02 ` Mikulas Patocka
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2017-04-10  2:11 UTC (permalink / raw)
  To: Alasdair Kergon, Mike Snitzer; +Cc: Mikulas Patocka, dm-devel, linux-kernel



mempool_alloc() should only be called with GFP_ATOMIC when
it is not safe to wait. Passing __GFP_NOFAIL to kmalloc()
says that it is safe to wait indefinitely.  So this code is
inconsistent.

Clearly it is OK to wait indefinitely in this code, and
mempool_alloc() is able to do that.  So just use
mempool_alloc, and allow it to sleep.  If no memory is
available it will wait for something to be returned to the
pool, and will retry a normal allocation regularly.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/dm-region-hash.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c
index 85c32b22a420..a6279f5d779e 100644
--- a/drivers/md/dm-region-hash.c
+++ b/drivers/md/dm-region-hash.c
@@ -287,9 +287,7 @@ static struct dm_region *__rh_alloc(struct dm_region_hash *rh, region_t region)
 {
 	struct dm_region *reg, *nreg;
 
-	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
-	if (unlikely(!nreg))
-		nreg = kmalloc(sizeof(*nreg), GFP_NOIO | __GFP_NOFAIL);
+	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
 
 	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
 		      DM_RH_CLEAN : DM_RH_NOSYNC;
-- 
2.12.2



* Re: [PATCH] dm-region-hash: fix strange usage of mempool_alloc.
  2017-04-10  2:11 [PATCH] dm-region-hash: fix strange usage of mempool_alloc NeilBrown
@ 2017-04-21 13:02 ` Mikulas Patocka
  2017-04-24  1:41   ` [dm-devel] " NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Mikulas Patocka @ 2017-04-21 13:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alasdair Kergon, Mike Snitzer, dm-devel, linux-kernel



On Mon, 10 Apr 2017, NeilBrown wrote:

> mempool_alloc() should only be called with GFP_ATOMIC when
> it is not safe to wait. Passing __GFP_NOFAIL to kmalloc()
> says that it is safe to wait indefinitely.  So this code is
> inconsistent.
> 
> Clearly it is OK to wait indefinitely in this code, and
> mempool_alloc() is able to do that.  So just use
> mempool_alloc, and allow it to sleep.  If no memory is
> available it will wait for something to be returned to the
> pool, and will retry a normal allocation regularly.

The region hash code is buggy anyway, because it may allocate more entries 
than the size of the pool and not give them back.

That kmalloc was introduced in the commit c06aad854 to work around a 
deadlock due to incorrect mempool usage.

Your patch slightly increases the probability of the deadlock because 
mempool_alloc does all allocations with __GFP_NOMEMALLOC, so it uses a 
higher limit than kmalloc(GFP_NOIO).
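
A toy userspace model of the concern, as a sketch only: this is not the
kernel mempool API, and every toy_* name is hypothetical. It assumes the
backing allocator is failing and that nobody returns entries to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT the kernel mempool implementation. */
#define TOY_POOL_MIN 4

struct toy_pool {
	int reserve;      /* preallocated elements still available */
	bool backing_ok;  /* does the backing allocator currently succeed? */
	int outstanding;  /* elements handed out and not yet returned */
};

enum toy_result { TOY_OK, TOY_FAIL, TOY_WOULD_BLOCK };

/*
 * Model one allocation attempt. With can_wait == false (GFP_ATOMIC-like)
 * an exhausted pool yields TOY_FAIL. With can_wait == true (GFP_NOIO-like)
 * the caller would sleep until some element is returned to the pool.
 */
static enum toy_result toy_alloc(struct toy_pool *p, bool can_wait)
{
	if (p->backing_ok || p->reserve > 0) {
		if (!p->backing_ok)
			p->reserve--;   /* fall back to the reserve */
		p->outstanding++;
		return TOY_OK;
	}
	return can_wait ? TOY_WOULD_BLOCK : TOY_FAIL;
}

/* Return one element; refill the reserve first if it is below its minimum. */
static void toy_free(struct toy_pool *p)
{
	assert(p->outstanding > 0);
	p->outstanding--;
	if (p->reserve < TOY_POOL_MIN)
		p->reserve++;
}

/* Allocate without freeing; report how many succeed before we get stuck. */
static int toy_alloc_until_stuck(struct toy_pool *p, int wanted)
{
	int got = 0;

	while (got < wanted && toy_alloc(p, true) == TOY_OK)
		got++;
	return got;
}
```

With reserve == TOY_POOL_MIN and backing_ok == false, a caller that holds
on to its entries gets four of them and then blocks; only a toy_free()
from somewhere else lets it continue.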

Mikulas

> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/md/dm-region-hash.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c
> index 85c32b22a420..a6279f5d779e 100644
> --- a/drivers/md/dm-region-hash.c
> +++ b/drivers/md/dm-region-hash.c
> @@ -287,9 +287,7 @@ static struct dm_region *__rh_alloc(struct dm_region_hash *rh, region_t region)
>  {
>  	struct dm_region *reg, *nreg;
>  
> -	nreg = mempool_alloc(rh->region_pool, GFP_ATOMIC);
> -	if (unlikely(!nreg))
> -		nreg = kmalloc(sizeof(*nreg), GFP_NOIO | __GFP_NOFAIL);
> +	nreg = mempool_alloc(rh->region_pool, GFP_NOIO);
>  
>  	nreg->state = rh->log->type->in_sync(rh->log, region, 1) ?
>  		      DM_RH_CLEAN : DM_RH_NOSYNC;
> -- 
> 2.12.2
> 
> 


* Re: [dm-devel] [PATCH] dm-region-hash: fix strange usage of mempool_alloc.
  2017-04-21 13:02 ` Mikulas Patocka
@ 2017-04-24  1:41   ` NeilBrown
  2017-05-03 18:19     ` Mikulas Patocka
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2017-04-24  1:41 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Mike Snitzer, Alasdair Kergon, linux-kernel


On Fri, Apr 21 2017, Mikulas Patocka wrote:

> On Mon, 10 Apr 2017, NeilBrown wrote:
>
>> mempool_alloc() should only be called with GFP_ATOMIC when
>> it is not safe to wait. Passing __GFP_NOFAIL to kmalloc()
>> says that it is safe to wait indefinitely.  So this code is
>> inconsistent.
>> 
>> Clearly it is OK to wait indefinitely in this code, and
>> mempool_alloc() is able to do that.  So just use
>> mempool_alloc, and allow it to sleep.  If no memory is
>> available it will wait for something to be returned to the
>> pool, and will retry a normal allocation regularly.
>
> The region hash code is buggy anyway, because it may allocate more entries 
> than the size of the pool and not give them back.
>
> That kmalloc was introduced in the commit c06aad854 to work around a 
> deadlock due to incorrect mempool usage.
>
> Your patch slightly increases the probability of the deadlock because 
> mempool_alloc does all allocations with __GFP_NOMEMALLOC, so it uses a 
> higher limit than kmalloc(GFP_NOIO).
>

Thanks for the review.

I had a look at how the allocated 'dm_region' objects are used,
and it would take a bit of work to make it really safe.
My guess is __rh_find() should be allowed to fail, and the various
callers need to handle failure.
For example, dm_rh_inc_pending() would be given a second bio_list,
and would move any bios for which rh_inc() fails, onto that list.
Then do_writes() would merge that list back into ms->writes.
That way do_mirror() would not block indefinitely and forward progress
could be assured ... maybe.
It would take more work than I'm able to give at the moment, so
I'm happy to just drop this patch.
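
A rough userspace sketch of that idea, under stated assumptions: the
toy_* types and functions are hypothetical stand-ins for struct bio,
struct bio_list, rh_inc() and the do_writes() merge step, and every bio
is assumed to need one new region entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct bio_list. */
struct toy_bio {
	struct toy_bio *next;
	unsigned long region;
};

struct toy_bio_list {
	struct toy_bio *head, *tail;
};

static void toy_list_add(struct toy_bio_list *l, struct toy_bio *b)
{
	b->next = NULL;
	if (l->tail)
		l->tail->next = b;
	else
		l->head = b;
	l->tail = b;
}

static struct toy_bio *toy_list_pop(struct toy_bio_list *l)
{
	struct toy_bio *b = l->head;

	if (b) {
		l->head = b->next;
		if (!l->head)
			l->tail = NULL;
		b->next = NULL;
	}
	return b;
}

static size_t toy_list_len(const struct toy_bio_list *l)
{
	size_t n = 0;

	for (const struct toy_bio *b = l->head; b; b = b->next)
		n++;
	return n;
}

/* A fallible rh_inc(): only *budget region entries can be allocated. */
static bool toy_rh_inc(int *budget)
{
	if (*budget <= 0)
		return false;
	(*budget)--;
	return true;
}

/* Bios whose region could not be accounted for go onto 'deferred'. */
static void toy_inc_pending(struct toy_bio_list *writes,
			    struct toy_bio_list *ready,
			    struct toy_bio_list *deferred, int *budget)
{
	struct toy_bio *b;

	while ((b = toy_list_pop(writes)))
		toy_list_add(toy_rh_inc(budget) ? ready : deferred, b);
}

/* Merge deferred bios back into 'writes' so a later pass retries them. */
static void toy_merge_deferred(struct toy_bio_list *writes,
			       struct toy_bio_list *deferred)
{
	struct toy_bio *b;

	while ((b = toy_list_pop(deferred)))
		toy_list_add(writes, b);
}
```

The point of the second list is that the worker never sleeps inside the
allocation: anything that cannot be accounted for now is simply retried
on the next pass.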

Thanks,
NeilBrown


* Re: [dm-devel] [PATCH] dm-region-hash: fix strange usage of mempool_alloc.
  2017-04-24  1:41   ` [dm-devel] " NeilBrown
@ 2017-05-03 18:19     ` Mikulas Patocka
  2017-05-04  1:22       ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Mikulas Patocka @ 2017-05-03 18:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: dm-devel, Mike Snitzer, Alasdair Kergon, linux-kernel



On Mon, 24 Apr 2017, NeilBrown wrote:

> On Fri, Apr 21 2017, Mikulas Patocka wrote:
> 
> > On Mon, 10 Apr 2017, NeilBrown wrote:
> >
> >> mempool_alloc() should only be called with GFP_ATOMIC when
> >> it is not safe to wait. Passing __GFP_NOFAIL to kmalloc()
> >> says that it is safe to wait indefinitely.  So this code is
> >> inconsistent.
> >> 
> >> Clearly it is OK to wait indefinitely in this code, and
> >> mempool_alloc() is able to do that.  So just use
> >> mempool_alloc, and allow it to sleep.  If no memory is
> >> available it will wait for something to be returned to the
> >> pool, and will retry a normal allocation regularly.
> >
> > The region hash code is buggy anyway, because it may allocate more entries 
> > than the size of the pool and not give them back.
> >
> > That kmalloc was introduced in the commit c06aad854 to work around a 
> > deadlock due to incorrect mempool usage.
> >
> > Your patch slightly increases the probability of the deadlock because 
> > mempool_alloc does all allocations with __GFP_NOMEMALLOC, so it uses a 
> > higher limit than kmalloc(GFP_NOIO).
> >
> 
> Thanks for the review.
> 
> I had a look at how the allocated 'dm_region' objects are used,
> and it would take a bit of work to make it really safe.
> My guess is __rh_find() should be allowed to fail, and the various
> callers need to handle failure.
> For example, dm_rh_inc_pending() would be given a second bio_list,
> and would move any bios for which rh_inc() fails, onto that list.
> Then do_writes() would merge that list back into ms->writes.
> That way do_mirror() would not block indefinitely and forward progress
> could be assured ... maybe.
> It would take more work than I'm able to give at the moment, so
> I'm happy to just drop this patch.
> 
> Thanks,
> NeilBrown

I think that the only way to fix this would be to preallocate all the 
regions when the target is created.

But with the default region size of 512KiB, it would cause high memory 
consumption (approximately 1GB of RAM for a 20TB device).
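
The arithmetic behind that estimate can be sketched as below. The helper
names are hypothetical, binary units are assumed (20TB taken as 20 * 2^40
bytes), and the per-entry size is a free parameter rather than the real
sizeof(struct dm_region), which is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Regions needed to cover dev_bytes, rounding the last partial one up. */
static uint64_t region_count(uint64_t dev_bytes, uint64_t region_bytes)
{
	assert(region_bytes > 0);
	return (dev_bytes + region_bytes - 1) / region_bytes;
}

/* Memory needed to preallocate one entry of entry_bytes per region. */
static uint64_t prealloc_bytes(uint64_t dev_bytes, uint64_t region_bytes,
			       uint64_t entry_bytes)
{
	return region_count(dev_bytes, region_bytes) * entry_bytes;
}
```

For example, region_count(20ULL << 40, 512ULL << 10) is 41943040, so
every byte of per-region state costs about 40MiB on such a device, and
an entry of a few dozen bytes lands in the gigabyte range.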

Mikulas


* Re: [dm-devel] [PATCH] dm-region-hash: fix strange usage of mempool_alloc.
  2017-05-03 18:19     ` Mikulas Patocka
@ 2017-05-04  1:22       ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2017-05-04  1:22 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Alasdair Kergon, Mike Snitzer, linux-kernel


On Wed, May 03 2017, Mikulas Patocka wrote:

> On Mon, 24 Apr 2017, NeilBrown wrote:
>> 
>> I had a look at how the allocated 'dm_region' objects are used,
>> and it would take a bit of work to make it really safe.
>> My guess is __rh_find() should be allowed to fail, and the various
>> callers need to handle failure.
>> For example, dm_rh_inc_pending() would be given a second bio_list,
>> and would move any bios for which rh_inc() fails, onto that list.
>> Then do_writes() would merge that list back into ms->writes.
>> That way do_mirror() would not block indefinitely and forward progress
>> could be assured ... maybe.
>> It would take more work than I'm able to give at the moment, so
>> I'm happy to just drop this patch.
>> 
>> Thanks,
>> NeilBrown
>
> I think that the only way to fix this would be to preallocate all the 
> regions when the target is created.
>
> But with the default region size of 512KiB, it would cause high memory 
> consumption (approximately 1GB of RAM for a 20TB device).

Two reflections:
 1/ This is close to what md/bitmap does.
   It actually uses a 2-level array for the 'pending' field from
   dm_region, combined with something similar to 'state'.
   The top level is allocated when the device is created.
   Entries in this table are either
     - pointers to a second-level array for 2048 regions
     - entries for 2 giant regions, 1024 times the normal size.

   So if we cannot allocate a page when we need that second level,
    we just use an enormous region and so risk resync taking a bit
    longer if there is a crash.

 2/ Even though md does pre-alloc to a degree, I'm not convinced that it
    is necessary.
    We only need a region to be recorded when it is actively being
    written to, or when it is being recovered.
    We could, in theory, have just one region that is written to and one
    region that is being recovered.  If a write request arrives for a
    different region it blocks until the current region has no active
    requests.  Then that region is forgotten and the new region
    activated, and the new write allowed to proceed.
    Obviously this would be horribly slow, but it should be
    deadlock-free.
    Using a mempool instead of a single region would then allow multiple
    regions to be active in parallel, which would improve throughput
    without affecting the deadlock status.

    Maybe I'll try to code it and see what happens ... maybe not.
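
Reflection 1/ can be sketched roughly as follows. This is a simplified
userspace model, not the md/bitmap code: the toy_* names, the separate
'hijacked' flag, the uint16_t counters and the alloc_ok switch (used to
simulate a failed page allocation) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define REGIONS_PER_PAGE  2048u  /* regions covered by one second-level array */
#define GIANT_PER_ENTRY   2u     /* fallback: two giant regions per entry */
#define REGIONS_PER_GIANT (REGIONS_PER_PAGE / GIANT_PER_ENTRY)  /* 1024 */

/* One top-level slot; the top-level table is allocated at creation time. */
struct toy_top_entry {
	uint16_t *page;                   /* second-level counters, or NULL */
	uint16_t giant[GIANT_PER_ENTRY];  /* coarse counters used on fallback */
	bool hijacked;                    /* true once giant[] is in use */
};

/*
 * Return the pending counter covering 'region'. If the second level is
 * missing, try to allocate it; if that fails, fall back to a giant
 * counter that covers 1024 normal regions at once.
 */
static uint16_t *toy_get_counter(struct toy_top_entry *top, uint64_t region,
				 bool alloc_ok)
{
	struct toy_top_entry *e = &top[region / REGIONS_PER_PAGE];
	uint32_t off = (uint32_t)(region % REGIONS_PER_PAGE);

	if (!e->page && !e->hijacked) {
		e->page = alloc_ok ?
			calloc(REGIONS_PER_PAGE, sizeof(*e->page)) : NULL;
		if (!e->page)
			e->hijacked = true;  /* never blocks, just gets coarser */
	}
	if (e->page)
		return &e->page[off];
	return &e->giant[off / REGIONS_PER_GIANT];
}
```

In this model a failed allocation never blocks the caller; it only makes
the tracking coarser for that slot, which matches the "resync taking a
bit longer" trade-off described above.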

NeilBrown

 

