* [PATCH 00 of 41] Transparent Hugepage Support #15
@ 2010-03-26 17:00 Andrea Arcangeli
From: Andrea Arcangeli @ 2010-03-26 17:00 UTC
  To: linux-mm, Andrew Morton
  Cc: Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus,
	Hugh Dickins, Nick Piggin, Rik van Riel, Mel Gorman, Dave Hansen,
	Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
	KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco,
	KOSAKI Motohiro, Balbir Singh, Arnd Bergmann, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner

Hello,

This fixes a potential issue with simultaneous 4k and 2M TLB entries in
split_huge_page (at practically zero cost, so I didn't need to add a fake
feature flag, and it's a lot safer to do it this way just in case).
split_large_page in change_page_attr has the same issue, but I've no idea
how to fix it there: the pmd cannot be marked non present at any time,
because change_page_attr may be running on ram below 640k and that is the
same pmd where the kernel .text resides. However I doubt it'll ever be a
practical problem. Other cpus also have a lot of warnings and risks in
allowing simultaneous TLB entries of different sizes.
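
To make the ordering concern concrete, here is a toy userspace model, not
kernel code. It assumes the fix works by making the pmd non present and
flushing the TLB before the 4k ptes become visible, which is what the remark
about change_page_attr suggests. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What the toy pmd currently maps (hypothetical states, not kernel types). */
enum toy_pmd_state { TOY_PMD_HUGE_2M, TOY_PMD_NOT_PRESENT, TOY_PMD_PTES_4K };

struct toy_mmu {
	enum toy_pmd_state pmd;
	bool tlb_has_2m;	/* some cpu cached the 2M translation */
	bool tlb_has_4k;	/* some cpu cached a 4k translation */
};

/* A cpu touching the range caches whatever the pmd maps right now. */
static void toy_access(struct toy_mmu *m)
{
	if (m->pmd == TOY_PMD_HUGE_2M)
		m->tlb_has_2m = true;
	else if (m->pmd == TOY_PMD_PTES_4K)
		m->tlb_has_4k = true;
	/* TOY_PMD_NOT_PRESENT: the access faults and nothing is cached. */
}

static void toy_flush_tlb(struct toy_mmu *m)
{
	m->tlb_has_2m = false;
	m->tlb_has_4k = false;
}

/*
 * Split the toy hugepage while another cpu keeps touching the range.
 * Returns true if 2M and 4k entries were ever cached at the same time.
 */
static bool toy_split(bool not_present_first)
{
	struct toy_mmu m = { .pmd = TOY_PMD_HUGE_2M };
	bool mixed;

	toy_access(&m);
	assert(m.tlb_has_2m);		/* the 2M entry is cached before the split */

	if (not_present_first) {
		m.pmd = TOY_PMD_NOT_PRESENT;
		toy_flush_tlb(&m);
		toy_access(&m);		/* faults, caches nothing */
		m.pmd = TOY_PMD_PTES_4K;
		toy_access(&m);		/* only 4k entries from here on */
		mixed = m.tlb_has_2m && m.tlb_has_4k;
	} else {
		m.pmd = TOY_PMD_PTES_4K;
		toy_access(&m);		/* 4k cached next to the stale 2M entry */
		mixed = m.tlb_has_2m && m.tlb_has_4k;
		toy_flush_tlb(&m);
	}
	return mixed;
}
```

In this model toy_split(true) never sees both sizes cached together, while
toy_split(false) does during the window before the flush.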

Johannes also sent a cute optimization for split_huge_page_vma/mm: he
converted those into a single split_huge_page_pmd, and in addition he sent
native support for hugepages in both mincore and mprotect, which shows how
deeply he already understands the whole huge_memory.c and its usage in the
callers. Seeing significant contributions like this, I think, further
confirms this is the way to go. Thanks a lot Johannes.

The ability to bisect before the mincore and mprotect native implementations
is one of the huge benefits of this approach. The hardest part will be
adding native swap support for 2M pages later (it involves making the
swapcache 2M capable, and that in turn means it explodes, more than the
rest, all over the pagecache code), but I think we have other priorities
first:

1) merge memory compaction
2) write a HPAGE_PMD_ORDER front slab allocator. I don't think memory
   compaction is capable of relocating in-use slab entries (correct me if I'm
   wrong, I think it's impossible as long as the slab entries are mapped by 2M
   pages and not 4k ptes like vmalloc). So the idea is that the slab should
   try to allocate 2M, if that fails 1M, if that fails 512k, etc., until it
   falls back to 4k. Otherwise the slab will fragment the memory badly by
   allocating with alloc_page(). Basically the buddy allocator guarantees
   the slab will generate as much fragmentation as possible, because it does
   its best to keep the high order pages for whoever asks for them. The
   fallback should probably happen inside the buddy allocator instead of
   calling alloc_pages repeatedly, which should avoid taking a flood of
   locks. Basically the buddy should give the worst possible fragmentation
   effect to users that can be relocated, while the other users that cannot
   be relocated and only use 4k pages would be better off using a front
   allocator on top of alloc_pages. Something like
   alloc_page_not_relocatable() that will do its stuff internally and try to
   keep those in the same 2M pages. This alone should help tremendously and
   I think it's orthogonal to the memory compaction of the relocatable
   stuff. Or maybe we should just live with a large chunk of the memory not
   being relocatable, but I like this idea because it's more dynamic and it
   won't have a fixed rule like "limit the slab to the 0-1G range". And it'd
   tend to keep fragmentation down even if we spill over the 1G range (1G is
   a purely made-up number).
3) teach ksm to merge hugepages. I talked about this with Izik and we agree
   the current ksm tree algorithm will be the best at that compared to other
   ksm algorithms.
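
The order fallback described in item 2 can be sketched in userspace as
follows. This only illustrates the 2M, 1M, 512k, ..., 4k sequence: the
toy_* names are hypothetical, aligned_alloc() stands in for the page
allocator, and the loop lives in the caller, whereas the mail suggests the
real fallback would probably belong inside the buddy allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT		12	/* 4k base pages */
#define TOY_HPAGE_PMD_ORDER	9	/* 4k << 9 = 2M */

/*
 * Hypothetical stand-in for the page allocator: it pretends that no free
 * block above max_free_order exists, to simulate fragmentation.
 */
static void *toy_alloc_pages(int order, int max_free_order)
{
	size_t size = (size_t)1 << (TOY_PAGE_SHIFT + order);

	if (order > max_free_order)
		return NULL;
	/* Naturally aligned chunk, like a buddy block of that order. */
	return aligned_alloc(size, size);
}

/*
 * Try 2M first, then 1M, 512k, ... down to 4k. The order that succeeded is
 * stored in *order_out. Release the chunk with free().
 */
static void *toy_alloc_not_relocatable(int max_free_order, int *order_out)
{
	assert(order_out != NULL);

	for (int order = TOY_HPAGE_PMD_ORDER; order >= 0; order--) {
		void *chunk = toy_alloc_pages(order, max_free_order);

		if (chunk) {
			*order_out = order;
			return chunk;
		}
	}
	return NULL;
}
```

With max_free_order set to 9 the first attempt succeeds with a 2M chunk;
with it set to 0 the loop walks all the way down to a single 4k page.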


To run KVM on top of this and take advantage of hugepages you need the
few-line patch I posted to qemu-devel, which takes care of aligning the
start of the guest memory so that the guest physical address and host
virtual address have the same subpage numbers.
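
A minimal sketch of that alignment requirement, not the qemu patch itself:
it assumes guest RAM starts at guest physical address 0 and is backed by one
contiguous host allocation, and all toy_* names are hypothetical. If the
host start is 2M aligned, every guest physical address has the same offset
inside its 2M region as the host virtual address that backs it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HPAGE_SIZE	((uintptr_t)1 << 21)	/* 2M */
#define TOY_HPAGE_MASK	(TOY_HPAGE_SIZE - 1)

/* Offset of an address inside its 2M-aligned region. */
static uintptr_t toy_hpage_offset(uintptr_t addr)
{
	return addr & TOY_HPAGE_MASK;
}

/*
 * Allocate size bytes of "guest RAM" starting on a 2M boundary, by
 * over-allocating one hugepage and rounding the start up. *raw_out gets
 * the pointer that must later be passed to free().
 */
static void *toy_alloc_guest_ram(size_t size, void **raw_out)
{
	uintptr_t start;

	assert(raw_out != NULL);
	*raw_out = malloc(size + TOY_HPAGE_SIZE);
	if (!*raw_out)
		return NULL;
	start = ((uintptr_t)*raw_out + TOY_HPAGE_MASK) & ~TOY_HPAGE_MASK;
	return (void *)start;
}

/* Host virtual address backing gpa, with guest RAM assumed to start at 0. */
static uintptr_t toy_gpa_to_hva(void *ram_base, uintptr_t gpa)
{
	return (uintptr_t)ram_base + gpa;
}
```

For any gpa inside the allocation, toy_hpage_offset(toy_gpa_to_hva(ram, gpa))
equals toy_hpage_offset(gpa), so a 2M-aligned guest region is backed by a
2M-aligned host region.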

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-15
	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-15.gz

It'd be nice to have this merged in -mm.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 00 of 41] Transparent Hugepage Support #17
@ 2010-04-02  0:41 Andrea Arcangeli
From: Andrea Arcangeli @ 2010-04-02  0:41 UTC
  To: linux-mm, Andrew Morton
  Cc: Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus,
	Hugh Dickins, Nick Piggin, Rik van Riel, Mel Gorman, Dave Hansen,
	Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
	KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco,
	KOSAKI Motohiro, Balbir Singh, Arnd Bergmann, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura

Hello,

With a testcase doing heavy forking and stressing split_huge_page, I found a
slight problem, probably made visible by the anon_vma_chain: during the
anon_vma walk of __split_huge_page_splitting, page_check_address_pmd ran on a
pmd that had the splitting bit set. The splitting bit was set by a previously
forked process calling split_huge_page on its private page belonging to the
child anon_vma. The parent still has visibility on the vma of the child, so
the rmap walk of the parent covers the child too, but the split of the child
page can happen in parallel now. This triggered a VM_BUG_ON false positive,
and moving the pmd_page check above the VM_BUG_ON was enough to fix it (it
would not have been noticeable with CONFIG_DEBUG_VM=n). All runs are
flawless again with the debug turned on.

@@ -1109,9 +1109,11 @@ new file mode 100644
 +	pmd = pmd_offset(pud, address);
 +	if (pmd_none(*pmd))
 +		goto out;
++	if (pmd_page(*pmd) != page)
++		goto out;
 +	VM_BUG_ON(flag == PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG &&
 +		  pmd_trans_splitting(*pmd));
-+	if (pmd_trans_huge(*pmd) && pmd_page(*pmd) == page) {
++	if (pmd_trans_huge(*pmd)) {
 +		VM_BUG_ON(flag == PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG &&
 +			  !pmd_trans_splitting(*pmd));
 +		ret = pmd;

Then there was one more issue while testing ksm and khugepaged co-existing,
merging and collapsing pages on the same vma simultaneously (which works fine
now in #17). One check for PageTransCompound was missing in ksm and another
had to be converted from PageTransHuge to PageTransCompound.

This also has the fixed version of the remove-PG_buddy patch, which moves the
memory_hotplug bootmem typing code to use page->lru.next with a proper enum,
to free up mapcount -2 for PG_buddy semantics.

Not included by email, but available in the directory, is the latest version
of the ksm-swapcache fix (I'm waiting for a comment from Hugh before
delivering it separately).

	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-17/
	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-17.gz

Thanks,
Andrea

* [PATCH 00 of 41] Transparent Hugepage Support #16
@ 2010-03-29 18:37 Andrea Arcangeli
From: Andrea Arcangeli @ 2010-03-29 18:37 UTC
  To: linux-mm, Andrew Morton
  Cc: Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus,
	Hugh Dickins, Nick Piggin, Rik van Riel, Mel Gorman, Dave Hansen,
	Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
	KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco,
	KOSAKI Motohiro, Balbir Singh, Arnd Bergmann, Michael S. Tsirkin,
	Peter Zijlstra, Johannes Weiner, Daisuke Nishimura

Hello Andrew,

This is again against 2.6.34-rc1-mm1+ as before (I didn't find any newer -mm).

This removes PG_buddy, allows the PAE 32bit build with CONFIG_X86_PAT=y &&
CONFIG_X86_PAE=y && CONFIG_SPARSEMEM=y, and fixes two bits in memcg_compound.

Removing an unconditional, unnecessary PG_ bitflag is overall a gain anyway
(the added one is conditional on CONFIG_TRANSPARENT_HUGEPAGE, which could be
turned off on 32bit archs depending on which feature is more or less
important to the user configuring the kernel).

        http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-16/
        http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.34-rc2-mm1/transparent_hugepage-16.gz

Thanks,
Andrea


Thread overview: 73+ messages
2010-03-26 17:00 [PATCH 00 of 41] Transparent Hugepage Support #15 Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 01 of 41] define MADV_HUGEPAGE Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 02 of 41] compound_lock Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 03 of 41] alter compound get_page/put_page Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 04 of 41] update futex compound knowledge Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 05 of 41] fix bad_page to show the real reason the page is bad Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 06 of 41] clear compound mapping Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 07 of 41] add native_set_pmd_at Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 08 of 41] add pmd paravirt ops Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 09 of 41] no paravirt version of pmd ops Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 10 of 41] export maybe_mkwrite Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 11 of 41] comment reminder in destroy_compound_page Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 12 of 41] config_transparent_hugepage Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 13 of 41] special pmd_trans_* functions Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 14 of 41] add pmd mangling generic functions Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 15 of 41] add pmd mangling functions to x86 Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 16 of 41] bail out gup_fast on splitting pmd Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 17 of 41] pte alloc trans splitting Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 18 of 41] add pmd mmu_notifier helpers Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 19 of 41] clear page compound Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 20 of 41] add pmd_huge_pte to mm_struct Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 21 of 41] split_huge_page_mm/vma Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 22 of 41] split_huge_page paging Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 23 of 41] clear_copy_huge_page Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 24 of 41] kvm mmu transparent hugepage support Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 25 of 41] _GFP_NO_KSWAPD Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 26 of 41] don't alloc harder for gfp nomemalloc even if nowait Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 27 of 41] transparent hugepage core Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 28 of 41] verify pmd_trans_huge isn't leaking Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 29 of 41] madvise(MADV_HUGEPAGE) Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 30 of 41] pmd_trans_huge migrate bugcheck Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 31 of 41] memcg compound Andrea Arcangeli
2010-03-29  1:57   ` Daisuke Nishimura
2010-03-29 18:23     ` Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 32 of 41] memcg huge memory Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 33 of 41] transparent hugepage vmstat Andrea Arcangeli
2010-03-29  2:13   ` Daisuke Nishimura
2010-03-29 18:21     ` Andrea Arcangeli
2010-03-30  0:40       ` Daisuke Nishimura
2010-03-26 17:00 ` [PATCH 34 of 41] khugepaged Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 35 of 41] don't leave orhpaned swap cache after ksm merging Andrea Arcangeli
2010-03-26 17:16   ` Rik van Riel
2010-03-26 17:23     ` Andrea Arcangeli
2010-03-26 21:32       ` Hugh Dickins
2010-03-27  1:08         ` Andrea Arcangeli
2010-03-29 14:01           ` Andrea Arcangeli
2010-03-30  6:56             ` Hugh Dickins
2010-04-01 16:47               ` Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 36 of 41] skip transhuge pages in ksm for now Andrea Arcangeli
2010-03-26 17:20   ` Rik van Riel
2010-03-26 17:00 ` [PATCH 37 of 41] add x86 32bit support Andrea Arcangeli
2010-03-26 17:45   ` Rik van Riel
2010-03-26 17:54   ` Johannes Weiner
2010-03-26 19:54     ` Andrea Arcangeli
2010-03-26 17:00 ` [PATCH 38 of 41] mincore transparent hugepage support Andrea Arcangeli
2010-03-26 18:13   ` Rik van Riel
2010-03-26 17:00 ` [PATCH 39 of 41] add pmd_modify Andrea Arcangeli
2010-03-26 18:24   ` Rik van Riel
2010-03-26 17:00 ` [PATCH 40 of 41] mprotect: pass vma down to page table walkers Andrea Arcangeli
2010-03-26 18:26   ` Rik van Riel
2010-03-26 17:00 ` [PATCH 41 of 41] mprotect: transparent huge page support Andrea Arcangeli
2010-03-26 18:27   ` Rik van Riel
2010-03-26 17:36 ` [PATCH 00 of 41] Transparent Hugepage Support #15 Mel Gorman
2010-03-26 18:07   ` Andrea Arcangeli
2010-03-26 21:09     ` Mel Gorman
2010-03-26 18:00 ` Christoph Lameter
2010-03-26 18:23   ` Andrea Arcangeli
2010-03-26 18:44     ` Christoph Lameter
2010-03-26 19:34       ` Andrea Arcangeli
2010-03-26 19:55         ` Christoph Lameter
  -- strict thread matches above, loose matches on Subject: below --
2010-04-02  0:41 [PATCH 00 of 41] Transparent Hugepage Support #17 Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 13 of 41] special pmd_trans_* functions Andrea Arcangeli
2010-03-29 18:37 [PATCH 00 of 41] Transparent Hugepage Support #16 Andrea Arcangeli
2010-03-29 18:37 ` [PATCH 13 of 41] special pmd_trans_* functions Andrea Arcangeli
2010-03-26 16:48 [PATCH 00 of 41] Transparent Hugepage Support #15 Andrea Arcangeli
2010-03-26 16:48 ` [PATCH 13 of 41] special pmd_trans_* functions Andrea Arcangeli
