linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
@ 2020-10-23  7:47 Laurent Cremmer
  2020-10-23 10:12 ` David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Laurent Cremmer @ 2020-10-23  7:47 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Mina Almasry, David Rientjes, linux-mm,
	Shuah Khan, Laurent Cremmer

commit 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
introduced issues with regards to locking:

- Missing spin_unlock in hugetlb.c:region_add and hugetlb.c:region_cgh
  when returning after an error.
- Missing spin lock  in hugetlb.c:allocate_file_region_entries
  when returning after an error.

The first two errors were spotted using the coccinelle static code
analysis tool while focusing on the mini_lock.cocci script,
whose goal is to find missing unlocks.

make coccicheck mode=REPORT m=mm/hugetlb.c

    mm/hugetlb.c:514:3-9: preceding lock on line 487
    mm/hugetlb.c:564:2-8: preceding lock on line 554

The third instance spotted by manual examination.

In static long region_add(...) and static long region_cgh(...) , releasing
the acquired lock when exiting via their error path was removed.
This will cause these functions to return with a lock held if they do not
succeed.

This patch reintroduces the original error path making sure the lock is
properly released on all exits.

A a new function allocate_file_region_entries was also introduced  that
must be called with a lock held and returned with the lock held.
However the lock is temporarily released during processing but will not
be reacquired on error.

This patch ensures that the lock will be reacquired in the error path also.

Fixes: 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
Link: https://lists.elisa.tech/g/development-process/message/289
Signed-off-by: Laurent Cremmer <laurent@oss.volkswagen.com>
Reviewed-by: Oliver Hartkopp <hartkopp@oss.volkswagen.com>
---
 mm/hugetlb.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe76f8fd5a73..92bea6f77361 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -438,7 +438,7 @@ static int allocate_file_region_entries(struct resv_map *resv,
 		for (i = 0; i < to_allocate; i++) {
 			trg = kmalloc(sizeof(*trg), GFP_KERNEL);
 			if (!trg)
-				goto out_of_memory;
+				goto out_of_memory_unlocked;
 			list_add(&trg->link, &allocated_regions);
 		}
 
@@ -450,7 +450,8 @@ static int allocate_file_region_entries(struct resv_map *resv,
 
 	return 0;
 
-out_of_memory:
+out_of_memory_unlocked:
+	spin_lock(&resv->lock);
 	list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
 		list_del(&rg->link);
 		kfree(rg);
@@ -508,7 +509,8 @@ static long region_add(struct resv_map *resv, long f, long t,
 
 		if (allocate_file_region_entries(
 			    resv, actual_regions_needed - in_regions_needed)) {
-			return -ENOMEM;
+			add = -ENOMEM;
+			goto out_locked;
 		}
 
 		goto retry;
@@ -517,7 +519,7 @@ static long region_add(struct resv_map *resv, long f, long t,
 	add = add_reservation_in_range(resv, f, t, h_cg, h, NULL);
 
 	resv->adds_in_progress -= in_regions_needed;
-
+out_locked:
 	spin_unlock(&resv->lock);
 	VM_BUG_ON(add < 0);
 	return add;
@@ -557,11 +559,14 @@ static long region_chg(struct resv_map *resv, long f, long t,
 	if (*out_regions_needed == 0)
 		*out_regions_needed = 1;
 
-	if (allocate_file_region_entries(resv, *out_regions_needed))
-		return -ENOMEM;
+	if (allocate_file_region_entries(resv, *out_regions_needed)) {
+		chg = -ENOMEM;
+		goto out_locked;
+	}
 
 	resv->adds_in_progress += *out_regions_needed;
 
+out_locked:
 	spin_unlock(&resv->lock);
 	return chg;
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23  7:47 [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries Laurent Cremmer
@ 2020-10-23 10:12 ` David Hildenbrand
  2020-10-23 11:25   ` Laurent Cremmer
  2020-10-23 11:02 ` Oscar Salvador
  2020-10-28 19:18 ` Mina Almasry
  2 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2020-10-23 10:12 UTC (permalink / raw)
  To: Laurent Cremmer, Mike Kravetz
  Cc: Andrew Morton, Mina Almasry, David Rientjes, linux-mm, Shuah Khan

On 23.10.20 09:47, Laurent Cremmer wrote:
> commit 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> introduced issues with regards to locking:
> 
> - Missing spin_unlock in hugetlb.c:region_add and hugetlb.c:region_cgh
>   when returning after an error.
> - Missing spin lock  in hugetlb.c:allocate_file_region_entries
>   when returning after an error.
> 
> The first two errors were spotted using the coccinelle static code
> analysis tool while focusing on the mini_lock.cocci script,
> whose goal is to find missing unlocks.
> 
> make coccicheck mode=REPORT m=mm/hugetlb.c
> 
>     mm/hugetlb.c:514:3-9: preceding lock on line 487
>     mm/hugetlb.c:564:2-8: preceding lock on line 554
> 
> The third instance spotted by manual examination.
> 
> In static long region_add(...) and static long region_cgh(...) , releasing
> the acquired lock when exiting via their error path was removed.
> This will cause these functions to return with a lock held if they do not
> succeed.
> 
> This patch reintroduces the original error path making sure the lock is
> properly released on all exits.
> 
> A a new function allocate_file_region_entries was also introduced  that
> must be called with a lock held and returned with the lock held.
> However the lock is temporarily released during processing but will not
> be reacquired on error.
> 
> This patch ensures that the lock will be reacquired in the error path also.
> 
> Fixes: 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> Link: https://lists.elisa.tech/g/development-process/message/289
> Signed-off-by: Laurent Cremmer <laurent@oss.volkswagen.com>
> Reviewed-by: Oliver Hartkopp <hartkopp@oss.volkswagen.com>
> ---
>  mm/hugetlb.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fe76f8fd5a73..92bea6f77361 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -438,7 +438,7 @@ static int allocate_file_region_entries(struct resv_map *resv,
>  		for (i = 0; i < to_allocate; i++) {
>  			trg = kmalloc(sizeof(*trg), GFP_KERNEL);

So we're allocating memory with GFP_KERNEL while holding a spinlock?

If my memory isn't completely wrong, that's broken, no? this would
require GFP_ATOMIC?

But I might be messing up things :)

>  			if (!trg)
> -				goto out_of_memory;
> +				goto out_of_memory_unlocked;
>  			list_add(&trg->link, &allocated_regions);
>  		}
>  
> @@ -450,7 +450,8 @@ static int allocate_file_region_entries(struct resv_map *resv,
>  
>  	return 0;
>  
> -out_of_memory:
> +out_of_memory_unlocked:
> +	spin_lock(&resv->lock);
>  	list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
>  		list_del(&rg->link);
>  		kfree(rg);
> @@ -508,7 +509,8 @@ static long region_add(struct resv_map *resv, long f, long t,
>  
>  		if (allocate_file_region_entries(
>  			    resv, actual_regions_needed - in_regions_needed)) {
> -			return -ENOMEM;
> +			add = -ENOMEM;

dito, does this handle atomic context?

> +			goto out_locked;
>  		}
>  
>  		goto retry;
> @@ -517,7 +519,7 @@ static long region_add(struct resv_map *resv, long f, long t,
>  	add = add_reservation_in_range(resv, f, t, h_cg, h, NULL);
>  
>  	resv->adds_in_progress -= in_regions_needed;
> -
> +out_locked:
>  	spin_unlock(&resv->lock);
>  	VM_BUG_ON(add < 0);
>  	return add;
> @@ -557,11 +559,14 @@ static long region_chg(struct resv_map *resv, long f, long t,
>  	if (*out_regions_needed == 0)
>  		*out_regions_needed = 1;
>  
> -	if (allocate_file_region_entries(resv, *out_regions_needed))
> -		return -ENOMEM;
> +	if (allocate_file_region_entries(resv, *out_regions_needed)) {
> +		chg = -ENOMEM;
> +		goto out_locked;
> +	}

dito

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23  7:47 [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries Laurent Cremmer
  2020-10-23 10:12 ` David Hildenbrand
@ 2020-10-23 11:02 ` Oscar Salvador
  2020-10-23 11:32   ` Laurent Cremmer
  2020-10-28 19:18 ` Mina Almasry
  2 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2020-10-23 11:02 UTC (permalink / raw)
  To: Laurent Cremmer
  Cc: Mike Kravetz, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

On Fri, Oct 23, 2020 at 09:47:59AM +0200, Laurent Cremmer wrote:
> The third instance spotted by manual examination.

It is friday, I might be overlooking something obvious, so bear with me.

> In static long region_add(...) and static long region_cgh(...) , releasing
> the acquired lock when exiting via their error path was removed.
> This will cause these functions to return with a lock held if they do not
> succeed.

If they do not suceed, it means that allocate_file_region_entries returned
non-zero, which means that we hit:

	spin_unlock(&resv->lock);
	for (i = 0; i < to_allocate; i++) {
		trg = kmalloc(sizeof(*trg), GFP_KERNEL);
		if (!trg)
			goto out_of_memory;
		list_add(&trg->link, &allocated_regions);
	 }

So, we do spin_unlock before jumping to out_of_memory.

> A a new function allocate_file_region_entries was also introduced  that
> must be called with a lock held and returned with the lock held.
> However the lock is temporarily released during processing but will not
> be reacquired on error.

As stated above, region_add/region_chr exit immediately on allocate_file_region_entries
failure, so it does not need to acquire the lock.


-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 10:12 ` David Hildenbrand
@ 2020-10-23 11:25   ` Laurent Cremmer
  2020-10-23 11:39     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Laurent Cremmer @ 2020-10-23 11:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Kravetz, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

>
> On 23.10.20 09:47, Laurent Cremmer wrote:
> > commit 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> > introduced issues with regards to locking:
> >
> > - Missing spin_unlock in hugetlb.c:region_add and hugetlb.c:region_cgh
> >   when returning after an error.
> > - Missing spin lock  in hugetlb.c:allocate_file_region_entries
> >   when returning after an error.
> >
> > The first two errors were spotted using the coccinelle static code
> > analysis tool while focusing on the mini_lock.cocci script,
> > whose goal is to find missing unlocks.
> >
> > make coccicheck mode=REPORT m=mm/hugetlb.c
> >
> >     mm/hugetlb.c:514:3-9: preceding lock on line 487
> >     mm/hugetlb.c:564:2-8: preceding lock on line 554
> >
> > The third instance spotted by manual examination.
> >
> > In static long region_add(...) and static long region_cgh(...) , releasing
> > the acquired lock when exiting via their error path was removed.
> > This will cause these functions to return with a lock held if they do not
> > succeed.
> >
> > This patch reintroduces the original error path making sure the lock is
> > properly released on all exits.
> >
> > A a new function allocate_file_region_entries was also introduced  that
> > must be called with a lock held and returned with the lock held.
> > However the lock is temporarily released during processing but will not
> > be reacquired on error.
> >
> > This patch ensures that the lock will be reacquired in the error path also.
> >
> > Fixes: 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> > Link: https://lists.elisa.tech/g/development-process/message/289
> > Signed-off-by: Laurent Cremmer <laurent@oss.volkswagen.com>
> > Reviewed-by: Oliver Hartkopp <hartkopp@oss.volkswagen.com>
> > ---
> >  mm/hugetlb.c | 17 +++++++++++------
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index fe76f8fd5a73..92bea6f77361 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -438,7 +438,7 @@ static int allocate_file_region_entries(struct resv_map *resv,
> >               for (i = 0; i < to_allocate; i++) {
> >                       trg = kmalloc(sizeof(*trg), GFP_KERNEL);
>
> So we're allocating memory with GFP_KERNEL while holding a spinlock?
>
> If my memory isn't completely wrong, that's broken, no? this would
> require GFP_ATOMIC?
>
> But I might be messing up things :)
>

Ah, the beauty of patches sometimes not telling the whole story :-)
The lock is released in line 437, prior attempting to allocate with
GFP_KERNEL, so  GFP_ATOMIC is not required in this context.

> >                       if (!trg)
> > -                             goto out_of_memory;
> > +                             goto out_of_memory_unlocked;

The problem fixed here is that in this particular situation (i.e.
releasing the lock, attempting to kmalloc(..., GFP_KERNEL) ) if the
allocation failed, the lock would not be reacquired in the error path,
and the function would return without the lock being held on error and
with the lock being held on success, which doesn't feel right.

> >                       list_add(&trg->link, &allocated_regions);
> >               }
> >
> > @@ -450,7 +450,8 @@ static int allocate_file_region_entries(struct resv_map *resv,
> >
> >       return 0;
> >
> > -out_of_memory:
> > +out_of_memory_unlocked:

However, in light of your remarks, I would question re-acquiring the
lock before cleaning up using kfree and would defer it to after the
clean up as ended. Agree?

> > +     spin_lock(&resv->lock);
> >       list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
> >               list_del(&rg->link);
> >               kfree(rg);
> > @@ -508,7 +509,8 @@ static long region_add(struct resv_map *resv, long f, long t,
> >
> >               if (allocate_file_region_entries(
> >                           resv, actual_regions_needed - in_regions_needed)) {
> > -                     return -ENOMEM;
> > +                     add = -ENOMEM;
>
> dito, does this handle atomic context?
>

IMHO it does, since allocate_file_region_entries will drop the lock
when attempting to allocate entries and reacquires it on exit.

>
> > +                     goto out_locked;
> >               }
> >
> >               goto retry;
> > @@ -517,7 +519,7 @@ static long region_add(struct resv_map *resv, long f, long t,
> >       add = add_reservation_in_range(resv, f, t, h_cg, h, NULL);
> >
> >       resv->adds_in_progress -= in_regions_needed;
> > -
> > +out_locked:
> >       spin_unlock(&resv->lock);
> >       VM_BUG_ON(add < 0);
> >       return add;
> > @@ -557,11 +559,14 @@ static long region_chg(struct resv_map *resv, long f, long t,
> >       if (*out_regions_needed == 0)
> >               *out_regions_needed = 1;
> >
> > -     if (allocate_file_region_entries(resv, *out_regions_needed))
> > -             return -ENOMEM;
> > +     if (allocate_file_region_entries(resv, *out_regions_needed)) {
> > +             chg = -ENOMEM;
> > +             goto out_locked;
> > +     }
>
> dito
>

See above, allocate_file_region_entries will return the lock held.

Now, in defence of the original implementation what "looks" like a bug
at first sight might be an attempt to "optimize-out" a pair of
lock/unlock operations on the error path of region_add, and region_chg
by relying on the implicit knowledegfe that
allocate_file_region_entries would return with the lock not being held
on error.
Unfortunately it hinders the readability (and IMHO the
maintainability) of the code.

>
> --
> Thanks,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 11:02 ` Oscar Salvador
@ 2020-10-23 11:32   ` Laurent Cremmer
  2020-10-23 12:11     ` Oscar Salvador
  0 siblings, 1 reply; 10+ messages in thread
From: Laurent Cremmer @ 2020-10-23 11:32 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Mike Kravetz, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

>
> On Fri, Oct 23, 2020 at 09:47:59AM +0200, Laurent Cremmer wrote:
> > The third instance spotted by manual examination.
>
> It is friday, I might be overlooking something obvious, so bear with me.
>
> > In static long region_add(...) and static long region_cgh(...) , releasing
> > the acquired lock when exiting via their error path was removed.
> > This will cause these functions to return with a lock held if they do not
> > succeed.
>
> If they do not suceed, it means that allocate_file_region_entries returned
> non-zero, which means that we hit:
>
>         spin_unlock(&resv->lock);
>         for (i = 0; i < to_allocate; i++) {
>                 trg = kmalloc(sizeof(*trg), GFP_KERNEL);
>                 if (!trg)
>                         goto out_of_memory;
>                 list_add(&trg->link, &allocated_regions);
>          }
>
> So, we do spin_unlock before jumping to out_of_memory.
>
> > A a new function allocate_file_region_entries was also introduced  that
> > must be called with a lock held and returned with the lock held.
> > However the lock is temporarily released during processing but will not
> > be reacquired on error.
>
> As stated above, region_add/region_chr exit immediately on allocate_file_region_entries
> failure, so it does not need to acquire the lock.
>

I came to the same conclusion with the help of David's remarks :-) .
And in the end, this patch would be more about fixing the readability
of the code than fixing a live problem per-se. If it ain't broken,
don't fix it as they say :-)

It is friday, so thanks for taking the time to look into this.

>
> --
> Oscar Salvador
> SUSE L3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 11:25   ` Laurent Cremmer
@ 2020-10-23 11:39     ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2020-10-23 11:39 UTC (permalink / raw)
  To: Laurent Cremmer
  Cc: Mike Kravetz, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

On 23.10.20 13:25, Laurent Cremmer wrote:
>>
>> On 23.10.20 09:47, Laurent Cremmer wrote:
>>> commit 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
>>> introduced issues with regards to locking:
>>>
>>> - Missing spin_unlock in hugetlb.c:region_add and hugetlb.c:region_cgh
>>>   when returning after an error.
>>> - Missing spin lock  in hugetlb.c:allocate_file_region_entries
>>>   when returning after an error.
>>>
>>> The first two errors were spotted using the coccinelle static code
>>> analysis tool while focusing on the mini_lock.cocci script,
>>> whose goal is to find missing unlocks.
>>>
>>> make coccicheck mode=REPORT m=mm/hugetlb.c
>>>
>>>     mm/hugetlb.c:514:3-9: preceding lock on line 487
>>>     mm/hugetlb.c:564:2-8: preceding lock on line 554
>>>
>>> The third instance spotted by manual examination.
>>>
>>> In static long region_add(...) and static long region_cgh(...) , releasing
>>> the acquired lock when exiting via their error path was removed.
>>> This will cause these functions to return with a lock held if they do not
>>> succeed.
>>>
>>> This patch reintroduces the original error path making sure the lock is
>>> properly released on all exits.
>>>
>>> A a new function allocate_file_region_entries was also introduced  that
>>> must be called with a lock held and returned with the lock held.
>>> However the lock is temporarily released during processing but will not
>>> be reacquired on error.
>>>
>>> This patch ensures that the lock will be reacquired in the error path also.
>>>
>>> Fixes: 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
>>> Link: https://lists.elisa.tech/g/development-process/message/289
>>> Signed-off-by: Laurent Cremmer <laurent@oss.volkswagen.com>
>>> Reviewed-by: Oliver Hartkopp <hartkopp@oss.volkswagen.com>
>>> ---
>>>  mm/hugetlb.c | 17 +++++++++++------
>>>  1 file changed, 11 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index fe76f8fd5a73..92bea6f77361 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -438,7 +438,7 @@ static int allocate_file_region_entries(struct resv_map *resv,
>>>               for (i = 0; i < to_allocate; i++) {
>>>                       trg = kmalloc(sizeof(*trg), GFP_KERNEL);
>>
>> So we're allocating memory with GFP_KERNEL while holding a spinlock?
>>
>> If my memory isn't completely wrong, that's broken, no? this would
>> require GFP_ATOMIC?
>>
>> But I might be messing up things :)
>>
> 
> Ah, the beauty of patches sometimes not telling the whole story :-)
> The lock is released in line 437, prior attempting to allocate with
> GFP_KERNEL, so  GFP_ATOMIC is not required in this context.

Ah, I actually misread line 437, sorry :)

[...]

>>>
>>> @@ -450,7 +450,8 @@ static int allocate_file_region_entries(struct resv_map *resv,
>>>
>>>       return 0;
>>>
>>> -out_of_memory:
>>> +out_of_memory_unlocked:
> 
> However, in light of your remarks, I would question re-acquiring the
> lock before cleaning up using kfree and would defer it to after the
> clean up as ended. Agree?
> 
>>> +     spin_lock(&resv->lock);
>>>       list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
>>>               list_del(&rg->link);
>>>               kfree(rg);
>>> @@ -508,7 +509,8 @@ static long region_add(struct resv_map *resv, long f, long t,
>>>
>>>               if (allocate_file_region_entries(
>>>                           resv, actual_regions_needed - in_regions_needed)) {
>>> -                     return -ENOMEM;
>>> +                     add = -ENOMEM;
>>
>> dito, does this handle atomic context?
>>
> 
> IMHO it does, since allocate_file_region_entries will drop the lock
> when attempting to allocate entries and reacquires it on exit.

Right, that's the source of my confusion.

> 
>>
>>> +                     goto out_locked;
>>>               }
>>>
>>>               goto retry;
>>> @@ -517,7 +519,7 @@ static long region_add(struct resv_map *resv, long f, long t,
>>>       add = add_reservation_in_range(resv, f, t, h_cg, h, NULL);
>>>
>>>       resv->adds_in_progress -= in_regions_needed;
>>> -
>>> +out_locked:
>>>       spin_unlock(&resv->lock);
>>>       VM_BUG_ON(add < 0);
>>>       return add;
>>> @@ -557,11 +559,14 @@ static long region_chg(struct resv_map *resv, long f, long t,
>>>       if (*out_regions_needed == 0)
>>>               *out_regions_needed = 1;
>>>
>>> -     if (allocate_file_region_entries(resv, *out_regions_needed))
>>> -             return -ENOMEM;
>>> +     if (allocate_file_region_entries(resv, *out_regions_needed)) {
>>> +             chg = -ENOMEM;
>>> +             goto out_locked;
>>> +     }
>>
>> dito
>>
> 
> See above, allocate_file_region_entries will return the lock held.
> 
> Now, in defence of the original implementation what "looks" like a bug
> at first sight might be an attempt to "optimize-out" a pair of
> lock/unlock operations on the error path of region_add, and region_chg
> by relying on the implicit knowledegfe that
> allocate_file_region_entries would return with the lock not being held
> on error.
> Unfortunately it hinders the readability (and IMHO the
> maintainability) of the code.

Yeah, this use of locking is just prone for BUGs. Horrible if you ask me.

Temporarily dropping locks is one source of confusions and bugs.
Inconsistently returning with either locks held or not is another
horrible thing to do.

I'll have to stare at the code another couple of minutes to figure out
what's actually right or wrong.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 11:32   ` Laurent Cremmer
@ 2020-10-23 12:11     ` Oscar Salvador
  2020-10-23 16:47       ` Mike Kravetz
  0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2020-10-23 12:11 UTC (permalink / raw)
  To: Laurent Cremmer
  Cc: Mike Kravetz, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

On Fri, Oct 23, 2020 at 01:32:54PM +0200, Laurent Cremmer wrote:
> I came to the same conclusion with the help of David's remarks :-) .
> And in the end, this patch would be more about fixing the readability
> of the code than fixing a live problem per-se. If it ain't broken,
> don't fix it as they say :-)

Yeah, definitely the code could benefit from a cleanup, so no one
has to stare at it for a while before he gets the idea.

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 12:11     ` Oscar Salvador
@ 2020-10-23 16:47       ` Mike Kravetz
  2020-10-27  7:34         ` Laurent Cremmer
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2020-10-23 16:47 UTC (permalink / raw)
  To: Oscar Salvador, Laurent Cremmer
  Cc: Andrew Morton, Mina Almasry, David Rientjes, linux-mm, Shuah Khan

On 10/23/20 5:11 AM, Oscar Salvador wrote:
> On Fri, Oct 23, 2020 at 01:32:54PM +0200, Laurent Cremmer wrote:
>> I came to the same conclusion with the help of David's remarks :-) .
>> And in the end, this patch would be more about fixing the readability
>> of the code than fixing a live problem per-se. If it ain't broken,
>> don't fix it as they say :-)
> 
> Yeah, definitely the code could benefit from a cleanup, so no one
> has to stare at it for a while before he gets the idea.

INDEED, those *region* routines dealing with reservation maps are difficult
to understand, and could benefit from some cleanup.  Recent changes to add
support for reservation cgroup support made them even more complex.

Laurent, thank you for taking the time to look into this.  And thanks to
Oscar and David for their analysis.  The suggested patch would make the
code more readable.  However, that would be at the expense of another
(and unnecessary) lock/unlock cycle.  I'll add cleanup of these routines
to my 'todo' list, but would be happy to work with anyone else who wants
to take on this task.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23 16:47       ` Mike Kravetz
@ 2020-10-27  7:34         ` Laurent Cremmer
  0 siblings, 0 replies; 10+ messages in thread
From: Laurent Cremmer @ 2020-10-27  7:34 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Oscar Salvador, Andrew Morton, Mina Almasry, David Rientjes,
	linux-mm, Shuah Khan

>
> On 10/23/20 5:11 AM, Oscar Salvador wrote:
> > On Fri, Oct 23, 2020 at 01:32:54PM +0200, Laurent Cremmer wrote:
> >> I came to the same conclusion with the help of David's remarks :-) .
> >> And in the end, this patch would be more about fixing the readability
> >> of the code than fixing a live problem per-se. If it ain't broken,
> >> don't fix it as they say :-)
> >
> > Yeah, definitely the code could benefit from a cleanup, so no one
> > has to stare at it for a while before he gets the idea.
>
> INDEED, those *region* routines dealing with reservation maps are difficult
> to understand, and could benefit from some cleanup.  Recent changes to add
> support for reservation cgroup support made them even more complex.
>
> Laurent, thank you for taking the time to look into this.  And thanks to
> Oscar and David for their analysis.  The suggested patch would make the
> code more readable.  However, that would be at the expense of another
> (and unnecessary) lock/unlock cycle.  I'll add cleanup of these routines
> to my 'todo' list, but would be happy to work with anyone else who wants
> to take on this task.
> --

Hi Mike,

I gave it a deeper look (although  I'm not familiar enough with the
whole subsystem to
really qualify!) and could not come up with an easy  "way" out.
In the end it might depend on how often the error path is taken and if
the unnecessary
lock/unlock cycle does really hurt under pressure.
What really tripped me (and smatch) is the erroneous "
__must_hold(&resv->lock)"
annotation of the allocate_file_region_entries function.
It would be better suited to the function above (add_reservation_in_range).

If you think that adding documentation would be helpful, I can do that.

In any case, thanks for taking the time to review and consider adding to your
todo list. much appreciated.

Laurent.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries
  2020-10-23  7:47 [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries Laurent Cremmer
  2020-10-23 10:12 ` David Hildenbrand
  2020-10-23 11:02 ` Oscar Salvador
@ 2020-10-28 19:18 ` Mina Almasry
  2 siblings, 0 replies; 10+ messages in thread
From: Mina Almasry @ 2020-10-28 19:18 UTC (permalink / raw)
  To: Laurent Cremmer
  Cc: Mike Kravetz, Andrew Morton, David Rientjes, Linux-MM, Shuah Khan

On Fri, Oct 23, 2020 at 12:58 AM Laurent Cremmer
<laurent@oss.volkswagen.com> wrote:
>
> commit 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> introduced issues with regards to locking:
>
> - Missing spin_unlock in hugetlb.c:region_add and hugetlb.c:region_cgh
>   when returning after an error.
> - Missing spin lock  in hugetlb.c:allocate_file_region_entries
>   when returning after an error.
>
> The first two errors were spotted using the coccinelle static code
> analysis tool while focusing on the mini_lock.cocci script,
> whose goal is to find missing unlocks.
>
> make coccicheck mode=REPORT m=mm/hugetlb.c
>
>     mm/hugetlb.c:514:3-9: preceding lock on line 487
>     mm/hugetlb.c:564:2-8: preceding lock on line 554
>
> The third instance spotted by manual examination.
>
> In static long region_add(...) and static long region_cgh(...) , releasing
> the acquired lock when exiting via their error path was removed.
> This will cause these functions to return with a lock held if they do not
> succeed.
>
> This patch reintroduces the original error path making sure the lock is
> properly released on all exits.
>
> A a new function allocate_file_region_entries was also introduced  that
> must be called with a lock held and returned with the lock held.
> However the lock is temporarily released during processing but will not
> be reacquired on error.
>
> This patch ensures that the lock will be reacquired in the error path also.
>
> Fixes: 0db9d74ed884  ("hugetlb: disable region_add file_region coalescing")
> Link: https://lists.elisa.tech/g/development-process/message/289
> Signed-off-by: Laurent Cremmer <laurent@oss.volkswagen.com>
> Reviewed-by: Oliver Hartkopp <hartkopp@oss.volkswagen.com>
> ---
>  mm/hugetlb.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>

Sorry for the late reply, I've been out a few days. I'm catching up to
this now. It looks like I missed some unlocks/locks on error paths. I
don't recall this being an intentional optimization; likely just an
oversight. I think adding the proper locking is good for readability
and shouldn't impact perf too much since it's just error paths, but
I'll defer to Mike here. I'm adding some minor comments in case Mike
decides to go through with this.

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fe76f8fd5a73..92bea6f77361 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -438,7 +438,7 @@ static int allocate_file_region_entries(struct resv_map *resv,
>                 for (i = 0; i < to_allocate; i++) {
>                         trg = kmalloc(sizeof(*trg), GFP_KERNEL);
>                         if (!trg)
> -                               goto out_of_memory;
> +                               goto out_of_memory_unlocked;

nit: I would not rename this. IMO specifying _unlocked is only useful
if we had another label called 'out_of_memory'.

>                         list_add(&trg->link, &allocated_regions);
>                 }
>
> @@ -450,7 +450,8 @@ static int allocate_file_region_entries(struct resv_map *resv,
>
>         return 0;
>
> -out_of_memory:
> +out_of_memory_unlocked:
> +       spin_lock(&resv->lock);
>         list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
>                 list_del(&rg->link);
>                 kfree(rg);
> @@ -508,7 +509,8 @@ static long region_add(struct resv_map *resv, long f, long t,
>
>                 if (allocate_file_region_entries(
>                             resv, actual_regions_needed - in_regions_needed)) {
> -                       return -ENOMEM;
> +                       add = -ENOMEM;
> +                       goto out_locked;

I don't think this works, since we VM_BUG_ON(add < 0). I think it's
acceptable to just unlock and return here. Or, for better readability,
add a variable called err = 0 at the top, set that variable here, and
jump to a label that unlocks and returns err.

>                 }
>
>                 goto retry;
> @@ -517,7 +519,7 @@ static long region_add(struct resv_map *resv, long f, long t,
>         add = add_reservation_in_range(resv, f, t, h_cg, h, NULL);
>
>         resv->adds_in_progress -= in_regions_needed;
> -
> +out_locked:
>         spin_unlock(&resv->lock);
>         VM_BUG_ON(add < 0);
>         return add;
> @@ -557,11 +559,14 @@ static long region_chg(struct resv_map *resv, long f, long t,
>         if (*out_regions_needed == 0)
>                 *out_regions_needed = 1;
>
> -       if (allocate_file_region_entries(resv, *out_regions_needed))
> -               return -ENOMEM;
> +       if (allocate_file_region_entries(resv, *out_regions_needed)) {
> +               chg = -ENOMEM;
> +               goto out_locked;
> +       }
>
>         resv->adds_in_progress += *out_regions_needed;
>
> +out_locked:
>         spin_unlock(&resv->lock);
>         return chg;
>  }
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-28 19:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-23  7:47 [PATCH] hugetlb: fix locking in region_add,region_cgh,allocate_file_region_entries Laurent Cremmer
2020-10-23 10:12 ` David Hildenbrand
2020-10-23 11:25   ` Laurent Cremmer
2020-10-23 11:39     ` David Hildenbrand
2020-10-23 11:02 ` Oscar Salvador
2020-10-23 11:32   ` Laurent Cremmer
2020-10-23 12:11     ` Oscar Salvador
2020-10-23 16:47       ` Mike Kravetz
2020-10-27  7:34         ` Laurent Cremmer
2020-10-28 19:18 ` Mina Almasry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).