linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>, Hugh Dickins <hughd@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage
Date: Wed, 20 Mar 2013 08:57:36 +0100	[thread overview]
Message-ID: <20130320075736.GC20045@dhcp22.suse.cz> (raw)
In-Reply-To: <1363751733-1fg9kic6-mutt-n-horiguchi@ah.jp.nec.com>

On Tue 19-03-13 23:55:33, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 05:07:37PM +0100, Michal Hocko wrote:
> > On Thu 21-02-13 14:41:47, Naoya Horiguchi wrote:
[...]
> > > As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
> > > over them because it's larger than memory block. So we now simply leave
> > > it to fail as it is.
> > 
> > What we could do is to check whether there is a free gb huge page on
> > other node and migrate there.
> 
> Correct, and 1GB page migration needs more code in migration core code
> (mainly it's related to migration entry in pud) and enough testing,
> so I want to do it in separate patchset.

Sure, this was just a note that it is achievable not that it has to be
done in the patchset.

[...]
> > > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> > > index ccf9995..c28e6c9 100644
> > > --- v3.8.orig/mm/hugetlb.c
> > > +++ v3.8/mm/hugetlb.c
> > > @@ -843,6 +843,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >  	return ret;
> > >  }
> > >  
> > > +/* Dissolve a given free hugepage into free pages. */
> > > +void dissolve_free_huge_page(struct page *page)
> > > +{
> > > +	if (PageHuge(page) && !page_count(page)) {
> > 
> > Could you clarify why you are cheking page_count here? I assume it is to
> > make sure the page is free but what prevents it being increased before
> > you take hugetlb_lock?
> 
> There's nothing to prevent it, so it's not safe to check refcount outside
> hugetlb_lock.

OK, so the lock has to be moved up.

[...]
> > > diff --git v3.8.orig/mm/memory_hotplug.c v3.8/mm/memory_hotplug.c
> > > index d04ed87..6418de2 100644
> > > --- v3.8.orig/mm/memory_hotplug.c
> > > +++ v3.8/mm/memory_hotplug.c
> > > @@ -29,6 +29,7 @@
> > >  #include <linux/suspend.h>
> > >  #include <linux/mm_inline.h>
> > >  #include <linux/firmware-map.h>
> > > +#include <linux/hugetlb.h>
> > >  
> > >  #include <asm/tlbflush.h>
> > >  
> > > @@ -985,10 +986,12 @@ static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
> > >  }
> > >  
> > >  /*
> > > - * Scanning pfn is much easier than scanning lru list.
> > > - * Scan pfn from start to end and Find LRU page.
> > > + * Scan pfn range [start,end) to find movable/migratable pages (LRU pages
> > > + * and hugepages). We scan pfn because it's much easier than scanning over
> > > + * linked list. This function returns the pfn of the first found movable
> > > + * page if it's found, otherwise 0.
> > >   */
> > > -static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > > +static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> > >  {
> > >  	unsigned long pfn;
> > >  	struct page *page;
> > > @@ -997,6 +1000,12 @@ static unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > >  			page = pfn_to_page(pfn);
> > >  			if (PageLRU(page))
> > >  				return pfn;
> > > +			if (PageHuge(page)) {
> > > +				if (is_hugepage_movable(page))
> > > +					return pfn;
> > > +				else
> > > +					pfn += (1 << compound_order(page)) - 1;
> > > +			}
> > 
> > scan_lru_pages's name gets really confusing after this change because
> > hugetlb pages are not on the LRU. Maybe it would be good to rename it.
> 
> Yes, and that's done in right above chunk.

bahh, I am blind. I got confused by the name in the hunk header. Sorry
about that.

> 
> > 
> > >  		}
> > >  	}
> > >  	return 0;
> > > @@ -1019,6 +1028,30 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > >  		page = pfn_to_page(pfn);
> > >  		if (!get_page_unless_zero(page))
> > >  			continue;
> > 
> > All tail pages have 0 reference count (according to prep_compound_page)
> > so they would be skipped anyway. This makes the below pfn tweaks
> > pointless.
> 
> I was totally mistaken about what we should do here, sorry. If we call
> do_migrate_range() for 1GB hugepage, we should return with error (maybe -EBUSY)
> instead of just skipping it, otherwise the caller __offline_pages() repeats
> 'goto repeat' until timeout. In order to do that, we had better insert
> if(PageHuge) block before getting refcount. And ...
> 
> > > +		if (PageHuge(page)) {
> > > +			/*
> > > +			 * Larger hugepage (1GB for x86_64) is larger than
> > > +			 * memory block, so pfn scan can start at the tail
> > > +			 * page of larger hugepage. In such case,
> > > +			 * we simply skip the hugepage and move the cursor
> > > +			 * to the last tail page.
> > > +			 */
> > > +			if (PageTail(page)) {
> > > +				struct page *head = compound_head(page);
> > > +				pfn = page_to_pfn(head) +
> > > +					(1 << compound_order(head)) - 1;
> > > +				put_page(page);
> > > +				continue;
> > > +			}
> > > +			pfn = (1 << compound_order(page)) - 1;
> > > +			if (huge_page_size(page_hstate(page)) != PMD_SIZE) {
> > > +				put_page(page);
> > > +				continue;
> > > +			}
> > 
> > There might be other hugepage sizes which fit into memblock so this test
> > doesn't seem right.
> 
> yes, so compound_order(head) > PFN_SECTION_SHIFT would be better.

I would rather see compound_order(head) < MAX_ORDER to be more coupled
with the allocator.

> I'll replace this chunk with the following if I don't get any other
> suggestion.
> 
> @@ -1017,6 +1026,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		if (!pfn_valid(pfn))
>  			continue;
>  		page = pfn_to_page(pfn);
> +
> +		if (PageHuge(page)) {
> +			struct page *head = compound_head(page);
> +			pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1;

I do not think this is safe without an elevated ref count. Your page
might be on the way to be freed. So you need to put get_page_unless_zero
before compound_order check.

Besides that I do not see too much point in optimizing this path on the
code complexity behalf. Sure we would call get_page_unless_zero
pointlessly for all tail pages but this is hardly a hot path.

> +			if (compound_order(head) > PFN_SECTION_SHIFT) {
> +				ret = -EBUSY;
> +				break;
> +			}
> +			if (!get_page_unless_zero(page))

Should be head.

> +				continue;
> +			list_move_tail(&head->lru, &source);
> +			move_pages -= 1 << compound_order(head);
> +			continue;
> +		}
> +
>  		if (!get_page_unless_zero(page))
>  			continue;
>  		/*
> 
> Thanks,
> Naoya
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2013-03-20  7:57 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-21 19:41 [RFC][PATCH 0/9] extend hugepage migration Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 1/9] migrate: add migrate_entry_wait_huge() Naoya Horiguchi
2013-03-18 14:51   ` Michal Hocko
2013-03-19  0:06     ` Naoya Horiguchi
2013-03-19 23:57   ` Simon Jeons
2013-03-20 21:53     ` Naoya Horiguchi
2013-03-20 23:36       ` Simon Jeons
2013-04-04  4:57         ` Simon Jeons
2013-02-21 19:41 ` [PATCH 2/9] migrate: make core migration code aware of hugepage Naoya Horiguchi
2013-03-18 15:22   ` Michal Hocko
2013-03-18 15:33     ` Michal Hocko
2013-03-19  0:06       ` Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 3/9] soft-offline: use migrate_pages() instead of migrate_huge_page() Naoya Horiguchi
2013-02-27  7:25   ` Chen Gong
2013-02-27 17:06     ` Naoya Horiguchi
2013-02-27 17:57       ` Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 4/9] migrate: clean up migrate_huge_page() Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage Naoya Horiguchi
2013-03-18 15:40   ` Michal Hocko
2013-03-19  0:07     ` Naoya Horiguchi
2013-03-19  7:11       ` Michal Hocko
2013-03-20  6:12         ` Naoya Horiguchi
2013-03-20  7:41           ` Michal Hocko
2013-03-20  0:31       ` Simon Jeons
2013-03-20 21:59         ` Naoya Horiguchi
2013-03-21  0:06           ` Simon Jeons
2013-02-21 19:41 ` [PATCH 6/9] migrate: enable move_pages() " Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 7/9] mbind: enable mbind() " Naoya Horiguchi
2013-02-21 19:41 ` [PATCH 8/9] memory-hotplug: enable memory hotplug to handle hugepage Naoya Horiguchi
2013-02-23  7:05   ` Hillf Danton
2013-02-25 16:57     ` Naoya Horiguchi
2013-02-27  7:36   ` Chen Gong
2013-02-27 17:16     ` Naoya Horiguchi
2013-03-18 16:07   ` Michal Hocko
2013-03-20  3:55     ` Naoya Horiguchi
2013-03-20  7:57       ` Michal Hocko [this message]
2013-03-20  1:03   ` Simon Jeons
2013-03-20 22:05     ` Naoya Horiguchi
2013-03-20 23:55       ` Simon Jeons
2013-02-21 19:41 ` [PATCH 9/9] remove /proc/sys/vm/hugepages_treat_as_movable Naoya Horiguchi
2013-02-28  6:02   ` KOSAKI Motohiro
2013-02-28 18:16     ` Naoya Horiguchi
2013-03-18 15:51   ` Michal Hocko
2013-03-19  0:07     ` Naoya Horiguchi
2013-03-19 23:43 ` [RFC][PATCH 0/9] extend hugepage migration Simon Jeons
2013-03-20 21:35   ` Naoya Horiguchi
2013-03-20 23:49     ` Simon Jeons
2013-03-21 12:56       ` Michal Hocko
2013-03-21 23:46         ` Simon Jeons
     [not found]           ` <20130322081532.GC31457@dhcp22.suse.cz>
2013-04-05  1:14             ` Simon Jeons
2013-04-05  8:08               ` Michal Hocko
2013-04-05  9:00                 ` Simon Jeons
2013-04-05  9:30                   ` Michal Hocko
2013-04-07  0:32                     ` Simon Jeons
2013-04-07 14:05                       ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130320075736.GC20045@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=hughd@google.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).