* [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
@ 2015-07-23 21:54 Spencer Baugh
  2015-07-23 22:08 ` David Rientjes
  2015-07-24  6:59 ` Michal Hocko
  0 siblings, 2 replies; 10+ messages in thread
From: Spencer Baugh @ 2015-07-23 21:54 UTC (permalink / raw)
  To: Andrew Morton, Naoya Horiguchi, David Rientjes, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list
  Cc: Joern Engel, Spencer Baugh, Joern Engel, Spencer Baugh

From: Joern Engel <joern@logfs.org>

~150ms scheduler latency for both observed in the wild.

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Spencer Baugh <sbaugh@catern.com>
---
 mm/hugetlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a8c3087..2eb6919 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1836,6 +1836,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
 		else
 			ret = alloc_fresh_huge_page(h, nodes_allowed);
+		cond_resched();
 		spin_lock(&hugetlb_lock);
 		if (!ret)
 			goto out;
@@ -3521,6 +3522,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				spin_unlock(ptl);
 			ret = hugetlb_fault(mm, vma, vaddr,
 				(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
+			cond_resched();
 			if (!(ret & VM_FAULT_ERROR))
 				continue;
 
-- 
2.5.0.rc3



* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 21:54 [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page Spencer Baugh
@ 2015-07-23 22:08 ` David Rientjes
  2015-07-23 22:36   ` Jörn Engel
  2015-07-24  6:59 ` Michal Hocko
  1 sibling, 1 reply; 10+ messages in thread
From: David Rientjes @ 2015-07-23 22:08 UTC (permalink / raw)
  To: Spencer Baugh
  Cc: Andrew Morton, Naoya Horiguchi, Davidlohr Bueso, Mike Kravetz,
	Luiz Capitulino, open list:MEMORY MANAGEMENT, open list,
	Joern Engel, Spencer Baugh, Joern Engel

On Thu, 23 Jul 2015, Spencer Baugh wrote:

> From: Joern Engel <joern@logfs.org>
> 
> ~150ms scheduler latency for both observed in the wild.
> 
> Signed-off-by: Joern Engel <joern@logfs.org>
> Signed-off-by: Spencer Baugh <sbaugh@catern.com>
> ---
>  mm/hugetlb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a8c3087..2eb6919 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1836,6 +1836,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
>  		else
>  			ret = alloc_fresh_huge_page(h, nodes_allowed);
> +		cond_resched();
>  		spin_lock(&hugetlb_lock);
>  		if (!ret)
>  			goto out;

This is wrong: you'd want to do any cond_resched() before the page 
allocation to avoid racing with an update to h->nr_huge_pages or 
h->surplus_huge_pages while hugetlb_lock was dropped, which would result 
in the page having been uselessly allocated.
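
I.e., something like this instead (sketch only, untested):

	spin_unlock(&hugetlb_lock);
	cond_resched();		/* reschedule before allocating, not after */
	ret = alloc_fresh_huge_page(h, nodes_allowed);	/* or the gigantic variant */
	spin_lock(&hugetlb_lock);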

> @@ -3521,6 +3522,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  				spin_unlock(ptl);
>  			ret = hugetlb_fault(mm, vma, vaddr,
>  				(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
> +			cond_resched();
>  			if (!(ret & VM_FAULT_ERROR))
>  				continue;
>  

This is almost certainly the wrong placement as well, since it's inserted 
inside a conditional inside a while loop, and there's no reason to 
hugetlb_fault(), schedule, and then check the return value.  You need to 
insert your cond_resched()'s in legitimate places.
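
One legitimate placement might be once per iteration at the top of the
surrounding loop, outside the fault conditional.  A sketch (untested; the
loop condition is from roughly this era's mm/hugetlb.c):

	while (vaddr < vma->vm_end && remainder) {
		cond_resched();	/* once per page, whether or not we fault */
		...
	}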


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 22:08 ` David Rientjes
@ 2015-07-23 22:36   ` Jörn Engel
  2015-07-23 22:54     ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2015-07-23 22:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Spencer Baugh, Joern Engel

On Thu, Jul 23, 2015 at 03:08:58PM -0700, David Rientjes wrote:
> On Thu, 23 Jul 2015, Spencer Baugh wrote:
> > From: Joern Engel <joern@logfs.org>
> > 
> > ~150ms scheduler latency for both observed in the wild.
> > 
> > Signed-off-by: Joern Engel <joern@logfs.org>
> > Signed-off-by: Spencer Baugh <sbaugh@catern.com>
> > ---
> >  mm/hugetlb.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a8c3087..2eb6919 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1836,6 +1836,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
> >  			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
> >  		else
> >  			ret = alloc_fresh_huge_page(h, nodes_allowed);
> > +		cond_resched();
> >  		spin_lock(&hugetlb_lock);
> >  		if (!ret)
> >  			goto out;
> 
> This is wrong: you'd want to do any cond_resched() before the page 
> allocation to avoid racing with an update to h->nr_huge_pages or 
> h->surplus_huge_pages while hugetlb_lock was dropped, which would result 
> in the page having been uselessly allocated.

There are three options.  Either
	/* some allocation */
	cond_resched();
or
	cond_resched();
	/* some allocation */
or
	if (cond_resched()) {
		spin_lock(&hugetlb_lock);
		continue;
	}
	/* some allocation */

I think you want the second option instead of the first.  That way we
have a little less memory allocation for the time we are scheduled out.
Sure, we can do that.  It probably doesn't make a big difference either
way, but why not.

If you are asking for the third option, I would rather avoid that.  It
makes the code more complex and doesn't change the fact that we have a
race and had better be able to handle it.  The code size growth will
likely cost us more performance than we would ever gain.  nr_huge_pages
tends to get updated once per system boot.
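
For concreteness, option two in context would be something like (sketch,
untested):

	while (count > persistent_huge_pages(h)) {
		spin_unlock(&hugetlb_lock);
		cond_resched();
		if (hstate_is_gigantic(h))
			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
		else
			ret = alloc_fresh_huge_page(h, nodes_allowed);
		spin_lock(&hugetlb_lock);
		if (!ret)
			goto out;
	}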

> > @@ -3521,6 +3522,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  				spin_unlock(ptl);
> >  			ret = hugetlb_fault(mm, vma, vaddr,
> >  				(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
> > +			cond_resched();
> >  			if (!(ret & VM_FAULT_ERROR))
> >  				continue;
> >  
> 
> This is almost certainly the wrong placement as well, since it's inserted 
> inside a conditional inside a while loop, and there's no reason to 
> hugetlb_fault(), schedule, and then check the return value.  You need to 
> insert your cond_resched()'s in legitimate places.

I assume you want the second option here as well.  Am I right?

Jörn

--
Sometimes it pays to stay in bed on Monday, rather than spending the rest
of the week debugging Monday's code.
-- Christopher Thompson


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 22:36   ` Jörn Engel
@ 2015-07-23 22:54     ` David Rientjes
  2015-07-23 23:09       ` Jörn Engel
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2015-07-23 22:54 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Spencer Baugh, Joern Engel


On Thu, 23 Jul 2015, Jörn Engel wrote:

> > This is wrong: you'd want to do any cond_resched() before the page 
> > allocation to avoid racing with an update to h->nr_huge_pages or 
> > h->surplus_huge_pages while hugetlb_lock was dropped, which would result 
> > in the page having been uselessly allocated.
> 
> There are three options.  Either
> 	/* some allocation */
> 	cond_resched();
> or
> 	cond_resched();
> 	/* some allocation */
> or
> 	if (cond_resched()) {
> 		spin_lock(&hugetlb_lock);
> 		continue;
> 	}
> 	/* some allocation */
> 
> I think you want the second option instead of the first.  That way we
> have a little less memory allocation for the time we are scheduled out.
> Sure, we can do that.  It probably doesn't make a big difference either
> way, but why not.
> 

The loop is dropping the lock simply to do the allocation and it needs to 
compare with the user-written number of hugepages to allocate.

What we don't want is to allocate, reschedule, and check if we really 
needed to allocate.  That's what your patch does because it races with 
persistent_huge_pages().  It's probably the worst place to do it.

Rather, what you want to do is check if you need to allocate, reschedule 
if needed (and if so, recheck), and then allocate.
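
In code, that ordering would be something like (sketch, untested):

	while (count > persistent_huge_pages(h)) {	/* check */
		spin_unlock(&hugetlb_lock);
		if (cond_resched()) {			/* reschedule if needed */
			spin_lock(&hugetlb_lock);
			continue;			/* and if so, recheck */
		}
		/* some allocation */
		spin_lock(&hugetlb_lock);
	}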

> If you are asking for the third option, I would rather avoid that.  It
> makes the code more complex and doesn't change the fact that we have a
> race and had better be able to handle it.  The code size growth will
> likely cost us more performance than we would ever gain.  nr_huge_pages
> tends to get updated once per system boot.
> 

Your third option is nonsensical, you didn't save the state of whether you 
locked the lock so you can't reliably unlock it, and you cannot hold a 
spinlock while allocating in this context.


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 22:54     ` David Rientjes
@ 2015-07-23 23:09       ` Jörn Engel
  2015-07-24 19:49         ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2015-07-23 23:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Spencer Baugh, Joern Engel

On Thu, Jul 23, 2015 at 03:54:43PM -0700, David Rientjes wrote:
> On Thu, 23 Jul 2015, Jörn Engel wrote:
> 
> > > This is wrong: you'd want to do any cond_resched() before the page 
> > > allocation to avoid racing with an update to h->nr_huge_pages or 
> > > h->surplus_huge_pages while hugetlb_lock was dropped, which would result 
> > > in the page having been uselessly allocated.
> > 
> > There are three options.  Either
> > 	/* some allocation */
> > 	cond_resched();
> > or
> > 	cond_resched();
> > 	/* some allocation */
> > or
> > 	if (cond_resched()) {
> > 		spin_lock(&hugetlb_lock);
> > 		continue;
> > 	}
> > 	/* some allocation */
> > 
> > I think you want the second option instead of the first.  That way we
> > have a little less memory allocation for the time we are scheduled out.
> > Sure, we can do that.  It probably doesn't make a big difference either
> > way, but why not.
> > 
> 
> The loop is dropping the lock simply to do the allocation and it needs to 
> compare with the user-written number of hugepages to allocate.

And at this point the existing code is racy.  Page allocation might
block for minutes trying to free some memory.  A cond_resched doesn't
change that - it only increases the odds of hitting the race window.

> What we don't want is to allocate, reschedule, and check if we really 
> needed to allocate.  That's what your patch does because it races with 
> persistent_huge_pages().  It's probably the worst place to do it.
> 
> Rather, what you want to do is check if you need to allocate, reschedule 
> if needed (and if so, recheck), and then allocate.
> 
> > If you are asking for the third option, I would rather avoid that.  It
> > makes the code more complex and doesn't change the fact that we have a
> > race and had better be able to handle it.  The code size growth will
> > likely cost us more performance than we would ever gain.  nr_huge_pages
> > tends to get updated once per system boot.
> 
> Your third option is nonsensical, you didn't save the state of whether you 
> locked the lock so you can't reliably unlock it, and you cannot hold a 
> spinlock while allocating in this context.

Are we looking at the same code?  Mine looks like this:
	while (count > persistent_huge_pages(h)) {
		/*
		 * If this allocation races such that we no longer need the
		 * page, free_huge_page will handle it by freeing the page
		 * and reducing the surplus.
		 */
		spin_unlock(&hugetlb_lock);
		if (hstate_is_gigantic(h))
			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
		else
			ret = alloc_fresh_huge_page(h, nodes_allowed);
		spin_lock(&hugetlb_lock);
		if (!ret)
			goto out;

		/* Bail for signals. Probably ctrl-c from user */
		if (signal_pending(current))
			goto out;
	}

What state is there to save?  We just called spin_unlock, we did a
schedule, and if we want to continue without doing the page allocation we
had better take the lock again.  Or do you want to go even more complex and
check for signals as well?

The case you are concerned about is rare.  It is so rare that it doesn't
matter from a performance point of view, only for correctness.  And if
we hit the rare case, the worst harm would be an unnecessary allocation
that we give back to the system.  How much complexity do you think it
is worth to avoid this allocation?  How much runtime will the bigger
text size cost you in the common cases?

What matters to me is the scheduler latency.  That is real and happens
reliably once per boot.

Jörn

--
Chance favors only the prepared mind.
-- Louis Pasteur


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 21:54 [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page Spencer Baugh
  2015-07-23 22:08 ` David Rientjes
@ 2015-07-24  6:59 ` Michal Hocko
  2015-07-24 17:12   ` Jörn Engel
  1 sibling, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2015-07-24  6:59 UTC (permalink / raw)
  To: Spencer Baugh
  Cc: Andrew Morton, Naoya Horiguchi, David Rientjes, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Joern Engel, Spencer Baugh, Joern Engel

On Thu 23-07-15 14:54:31, Spencer Baugh wrote:
> From: Joern Engel <joern@logfs.org>
> 
> ~150ms scheduler latency for both observed in the wild.

This is way too vague. Could you describe your problem in more detail,
please?
There are scheduling points in the page allocator (when it triggers
reclaim), so why are those not sufficient? Or do you manage to allocate
many hugetlb pages without ever triggering reclaim, and does that lead to
soft lockups?

> 
> Signed-off-by: Joern Engel <joern@logfs.org>
> Signed-off-by: Spencer Baugh <sbaugh@catern.com>
> ---
>  mm/hugetlb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a8c3087..2eb6919 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1836,6 +1836,7 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
>  		else
>  			ret = alloc_fresh_huge_page(h, nodes_allowed);
> +		cond_resched();
>  		spin_lock(&hugetlb_lock);
>  		if (!ret)
>  			goto out;
> @@ -3521,6 +3522,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  				spin_unlock(ptl);
>  			ret = hugetlb_fault(mm, vma, vaddr,
>  				(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
> +			cond_resched();
>  			if (!(ret & VM_FAULT_ERROR))
>  				continue;
>  
> -- 
> 2.5.0.rc3
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-24  6:59 ` Michal Hocko
@ 2015-07-24 17:12   ` Jörn Engel
  2015-07-24 20:28     ` Davidlohr Bueso
  0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2015-07-24 17:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, David Rientjes,
	Davidlohr Bueso, Mike Kravetz, Luiz Capitulino,
	open list:MEMORY MANAGEMENT, open list, Spencer Baugh,
	Joern Engel

On Fri, Jul 24, 2015 at 08:59:59AM +0200, Michal Hocko wrote:
> On Thu 23-07-15 14:54:31, Spencer Baugh wrote:
> > From: Joern Engel <joern@logfs.org>
> > 
> > ~150ms scheduler latency for both observed in the wild.
> 
> This is way too vague. Could you describe your problem in more detail,
> please?
> There are scheduling points in the page allocator (when it triggers
> reclaim), so why are those not sufficient? Or do you manage to allocate
> many hugetlb pages without ever triggering reclaim, and does that lead to
> soft lockups?

We don't use transparent hugepages - they cause too much latency.
Instead we reserve somewhere around 3/4 or so of physical memory for
hugepages.  "sysctl -w vm.nr_hugepages=100000" or something similar in a
startup script.

Since it is early in boot we don't go through page reclaim.

Jörn

--
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-23 23:09       ` Jörn Engel
@ 2015-07-24 19:49         ` David Rientjes
  2015-07-24 20:49           ` Jörn Engel
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2015-07-24 19:49 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Spencer Baugh, Joern Engel


On Thu, 23 Jul 2015, Jörn Engel wrote:

> > The loop is dropping the lock simply to do the allocation and it needs to 
> > compare with the user-written number of hugepages to allocate.
> 
> And at this point the existing code is racy.  Page allocation might
> block for minutes trying to free some memory.  A cond_resched doesn't
> change that - it only increases the odds of hitting the race window.
> 

The existing code has always been racy, and it explicitly admits this; the
problem is that your patch makes the race window larger.

> Are we looking at the same code?  Mine looks like this:
> 	while (count > persistent_huge_pages(h)) {
> 		/*
> 		 * If this allocation races such that we no longer need the
> 		 * page, free_huge_page will handle it by freeing the page
> 		 * and reducing the surplus.
> 		 */
> 		spin_unlock(&hugetlb_lock);
> 		if (hstate_is_gigantic(h))
> 			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
> 		else
> 			ret = alloc_fresh_huge_page(h, nodes_allowed);
> 		spin_lock(&hugetlb_lock);
> 		if (!ret)
> 			goto out;
> 
> 		/* Bail for signals. Probably ctrl-c from user */
> 		if (signal_pending(current))
> 			goto out;
> 	}
> 

I don't see the cond_resched() you propose to add, but the need for it is 
obvious with a large user-written nr_hugepages in the above loop.

The suggestion is to check the conditional, reschedule if needed (and if 
so, recheck the conditional), and then allocate.

Your third option looks fine and the best place to do the cond_resched().  
I was looking at your second option when I responded and compared it to 
the first.  We don't want to do cond_resched() immediately before or after 
the allocation, the net result is the same: we may be pointlessly 
allocating the hugepage and each hugepage allocation can be very 
heavyweight.

So I agree with your third option from the previous email.
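
Spelled out against the loop you quoted, the result would be something
like (sketch, untested):

	while (count > persistent_huge_pages(h)) {
		spin_unlock(&hugetlb_lock);
		if (cond_resched()) {
			/* we slept; retake the lock and recheck the count */
			spin_lock(&hugetlb_lock);
			continue;
		}
		if (hstate_is_gigantic(h))
			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
		else
			ret = alloc_fresh_huge_page(h, nodes_allowed);
		spin_lock(&hugetlb_lock);
		if (!ret)
			goto out;

		/* Bail for signals. Probably ctrl-c from user */
		if (signal_pending(current))
			goto out;
	}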

You may also want to include the actual text of the warning from the 
kernel log in your commit message.  When people encounter this, they will 
probably grep in the kernel logs for some keywords to see if it was 
already fixed, and I fear your current commit message may allow it to be 
missed.


* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-24 17:12   ` Jörn Engel
@ 2015-07-24 20:28     ` Davidlohr Bueso
  0 siblings, 0 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2015-07-24 20:28 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Michal Hocko, Spencer Baugh, Andrew Morton, Naoya Horiguchi,
	David Rientjes, Mike Kravetz, Luiz Capitulino,
	open list:MEMORY MANAGEMENT, open list, Spencer Baugh,
	Joern Engel

On Fri, 2015-07-24 at 10:12 -0700, Jörn Engel wrote:
> On Fri, Jul 24, 2015 at 08:59:59AM +0200, Michal Hocko wrote:
> > On Thu 23-07-15 14:54:31, Spencer Baugh wrote:
> > > From: Joern Engel <joern@logfs.org>
> > > 
> > > ~150ms scheduler latency for both observed in the wild.
> > 
> > This is way too vague. Could you describe your problem in more detail,
> > please?
> > There are scheduling points in the page allocator (when it triggers
> > reclaim), so why are those not sufficient? Or do you manage to allocate
> > many hugetlb pages without ever triggering reclaim, and does that lead to
> > soft lockups?
> 
> We don't use transparent hugepages - they cause too much latency.
> Instead we reserve somewhere around 3/4 or so of physical memory for
> hugepages.  "sysctl -w vm.nr_hugepages=100000" or something similar in a
> startup script.
> 
> Since it is early in boot we don't go through page reclaim.

Still, please be more verbose about what you _are_ encountering. IOW,
please have a decent changelog in v2.



* Re: [PATCH] hugetlb: cond_resched for set_max_huge_pages and follow_hugetlb_page
  2015-07-24 19:49         ` David Rientjes
@ 2015-07-24 20:49           ` Jörn Engel
  0 siblings, 0 replies; 10+ messages in thread
From: Jörn Engel @ 2015-07-24 20:49 UTC (permalink / raw)
  To: David Rientjes
  Cc: Spencer Baugh, Andrew Morton, Naoya Horiguchi, Davidlohr Bueso,
	Mike Kravetz, Luiz Capitulino, open list:MEMORY MANAGEMENT,
	open list, Spencer Baugh, Joern Engel

On Fri, Jul 24, 2015 at 12:49:14PM -0700, David Rientjes wrote:
> 
> I don't see the cond_resched() you propose to add, but the need for it is 
> obvious with a large user-written nr_hugepages in the above loop.
> 
> The suggestion is to check the conditional, reschedule if needed (and if 
> so, recheck the conditional), and then allocate.
> 
> Your third option looks fine and the best place to do the cond_resched().  
> I was looking at your second option when I responded and compared it to 
> the first.  We don't want to do cond_resched() immediately before or after 
> the allocation, the net result is the same: we may be pointlessly 
> allocating the hugepage and each hugepage allocation can be very 
> heavyweight.
> 
> So I agree with your third option from the previous email.

All right.  We are talking about the same thing now.  But I previously
argued that the pointless allocation will a) not impact correctness and
b) be so rare as to not impact performance.  The problem with the third
option is that it adds a bit of constant overhead all the time to
compensate for not doing the pointless allocation.

On my systems at least, the pointless allocation will happen, on
average, less than once per boot.  Unless my systems are vastly
unrepresentative, the third option doesn't look appealing to me.

> You may also want to include the actual text of the warning from the 
> kernel log in your commit message.  When people encounter this, they will 
> probably grep in the kernel logs for some keywords to see if it was 
> already fixed, and I fear your current commit message may allow it to be 
> missed.

Ack.

I should still have those warnings in logfiles somewhere and can hunt
them down.

Jörn

--
Act only according to that maxim whereby you can, at the same time,
will that it should become a universal law.
-- Immanuel Kant
