From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: benh@kernel.crashing.org, paulus@samba.org
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH -V5 00/25] THP support for PPC64
Date: Thu, 4 Apr 2013 11:27:38 +0530
Message-ID: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Hi,
This patchset adds transparent hugepage support for PPC64.
TODO:
* hash preload support in update_mmu_cache_pmd (we don't do that for hugetlb)
Some numbers:
The latency measurements use Anton's code, found at
http://ozlabs.org/~anton/junkcode/latency2001.c
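latency2001.c itself is not reproduced in this mail. Load-latency benchmarks of this kind are commonly built on a dependent pointer chase, and the sketch below shows that technique under that assumption; it is not Anton's code, and build_chain() and chase_ns() are hypothetical names. Sizing the chain to the working set (for 8G, 8G / sizeof(size_t) slots) makes most steps miss in the caches and the TLB, which is where the page size matters.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: build a random single-cycle permutation of n slots
 * (Sattolo's algorithm). next[i] is the slot visited after slot i, so a
 * chase starting anywhere touches every slot before it repeats. */
static size_t *build_chain(size_t n, unsigned seed)
{
    if (n == 0)
        return NULL;
    size_t *next = malloc(n * sizeof(*next));
    if (next == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i keeps it a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Hypothetical helper: follow the chain `steps` times and return the mean
 * nanoseconds per step. Every load depends on the previous one, so the CPU
 * cannot overlap them and the figure approximates load latency. */
static double chase_ns(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    size_t p = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = p;   /* keep the result live so the loop is not optimised away */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```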
THP disabled 64K page size
------------------------
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 731.73 cycles 205.77 ns
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 743.39 cycles 209.05 ns
[root@llmp24l02 ~]#
THP disabled large page via hugetlbfs
-------------------------------------
[root@llmp24l02 ~]# ./latency2001 -l 8G
8589934592 416.09 cycles 117.01 ns
[root@llmp24l02 ~]# ./latency2001 -l 8G
8589934592 415.74 cycles 116.91 ns
THP enabled 64K page size
-------------------------
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 405.07 cycles 113.91 ns
[root@llmp24l02 ~]# ./latency2001 8G
8589934592 411.82 cycles 115.81 ns
[root@llmp24l02 ~]#
With THP we are close to hugetlbfs in latency, and we achieve this with zero
configuration and no page reservation. Most of the allocations above are made
at fault time.
Another test that does 50000000 random accesses over a 1GB area goes from
2.65 seconds to 1.07 seconds with this patchset.
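The random-access test program is not included in the mail. A hypothetical sketch of its shape, assuming one byte increment per access at a pseudo-random offset; random_touch() is an illustrative name and xorshift64 is a swapped-in choice of generator, not necessarily what the original test used:

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift64 (Marsaglia, shifts 13/7/17): cheap enough that the memory
 * access, not the random-number generation, dominates each iteration. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Hypothetical sketch: perform `count` increments at random byte offsets in
 * buf[0..len). With len = 1GB and count = 50000000 this has the shape of the
 * test described above. Returns a checksum so the work is not optimised away. */
static uint64_t random_touch(unsigned char *buf, size_t len, size_t count,
                             uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* xorshift state must be non-zero */
    uint64_t sum = 0;

    if (len == 0)
        return 0;
    for (size_t i = 0; i < count; i++) {
        size_t off = (size_t)(xorshift64(&state) % len);
        buf[off]++;
        sum += buf[off];
    }
    return sum;
}
```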
split_huge_page impact:
---------------------
To look at the performance impact of large page invalidation, I tried the
experiment below. The test accesses a large contiguous region of memory
as follows:
for (i = 0; i < size; i += PAGE_SIZE)
	data[i] = i;
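A self-contained version of the touch phase, assuming the region comes from an anonymous mmap() (the mail does not show how data is allocated). map_region() and touch_pages() are hypothetical helpers, and page_size stands in for PAGE_SIZE:

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>

/* Map `size` bytes of private anonymous memory. Such a mapping can be backed
 * by transparent hugepages when /sys/kernel/mm/transparent_hugepage/enabled
 * is "always" (or "madvise", after madvise(MADV_HUGEPAGE)).
 * Returns NULL on failure. */
static char *map_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Write one byte per base page, in address order, mirroring the loop above.
 * The first write to each page (or huge page) takes a page fault. */
static void touch_pages(char *data, size_t size, size_t page_size)
{
    for (size_t i = 0; i < size; i += page_size)
        data[i] = (char)i;
}
```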
We access the data in sequential order so that we look at the worst case
for THP (the case where it helps least): accessing the data sequentially
means the page table is cached and the overhead of a TLB miss is as small
as possible. We also don't touch the entire page, because that can result
in cache eviction.
After touching the full range as above, we call mprotect on each of those
pages. An mprotect on part of a huge page results in a hugepage split,
which lets us measure the impact of hugepage splitting.
for (i = 0; i < size; i += PAGE_SIZE)
	mprotect(&data[i], PAGE_SIZE, PROT_READ);
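A sketch of the mprotect phase with timing, assuming the region was touched as above. mprotect_each_page() is a hypothetical name; it queries the base page size with sysconf(_SC_PAGESIZE) in place of PAGE_SIZE, since mprotect() requires page-aligned addresses:

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Make each base page read-only with its own mprotect() call, mirroring the
 * loop above, and return the elapsed wall-clock seconds (-1.0 on error).
 * On a THP-backed region, the call that first lands inside a huge page
 * forces that huge page to be split (compare the vma_adjust ->
 * split_huge_page call chain in the profile later in this mail). */
static double mprotect_each_page(char *data, size_t size)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < size; i += page_size)
        if (mprotect(data + i, page_size, PROT_READ) != 0)
            return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```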
Split hugepage impact:
---------------------
THP enabled:  2.851561705 seconds for test completion
THP disabled: 3.599146098 seconds for test completion
We take 20.7% less time than the non-THP case, even though all the large pages get split.
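The 20.7% figure is the reduction in run time relative to the THP-disabled run: (3.599146098 - 2.851561705) / 3.599146098 is about 0.2077, i.e. roughly 20.77%, which the text truncates to 20.7%. A one-line helper (hypothetical name) that reproduces it; the same formula gives about 59.6% for the 2.65 s to 1.07 s random-access result:

```c
/* Percentage reduction in run time relative to the baseline run:
 * (baseline - candidate) / baseline * 100. */
static double pct_faster(double baseline_s, double candidate_s)
{
    return (baseline_s - candidate_s) / baseline_s * 100.0;
}
```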
Detailed output:
THP enabled:
---------------------------------------
[root@llmp24l02 ~]# cat /proc/vmstat | grep thp
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
[root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
time taken to touch all the data in ns: 2763096913
Performance counter stats for './split-huge-page-mpro 20G':
1,581 page-faults
3,159 dTLB-load-misses
2.851561705 seconds time elapsed
[root@llmp24l02 ~]#
[root@llmp24l02 ~]# cat /proc/vmstat | grep thp
thp_fault_alloc 1279
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 1279
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0
[root@llmp24l02 ~]#
77.05% split-huge-page [kernel.kallsyms] [k] .clear_user_page
7.10% split-huge-page [kernel.kallsyms] [k] .perf_event_mmap_ctx
1.51% split-huge-page split-huge-page-mpro [.] 0x0000000000000a70
0.96% split-huge-page [unknown] [H] 0x000000000157e3bc
0.81% split-huge-page [kernel.kallsyms] [k] .up_write
0.76% split-huge-page [kernel.kallsyms] [k] .perf_event_mmap
0.76% split-huge-page [kernel.kallsyms] [k] .down_write
0.74% split-huge-page [kernel.kallsyms] [k] .lru_add_page_tail
0.61% split-huge-page [kernel.kallsyms] [k] .split_huge_page
0.59% split-huge-page [kernel.kallsyms] [k] .change_protection
0.51% split-huge-page [kernel.kallsyms] [k] .release_pages
0.96% split-huge-page [unknown] [H] 0x000000000157e3bc
|
|--79.44%-- reloc_start
| |
| |--86.54%-- .__pSeries_lpar_hugepage_invalidate
| | .pSeries_lpar_hugepage_invalidate
| | .hpte_need_hugepage_flush
| | .split_huge_page
| | .__split_huge_page_pmd
| | .vma_adjust
| | .vma_merge
| | .mprotect_fixup
| | .SyS_mprotect
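The THP activity in these runs is read from the thp_* counters in /proc/vmstat: comparing the snapshots before and after the THP-enabled run shows thp_fault_alloc and thp_split both going from 0 to 1279. A small hypothetical parser for that "name value" line format, operating on text already read from the file (for example with fopen() and fread()):

```c
#include <stdio.h>
#include <string.h>

/* Look up one counter (e.g. "thp_split") in the text of /proc/vmstat, whose
 * lines have the form "name value". Only an exact name match counts, so
 * "thp_fault" does not match "thp_fault_alloc". Returns -1 if absent. */
static long long vmstat_get(const char *text, const char *name)
{
    size_t n = strlen(name);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, name, n) == 0 && line[n] == ' ') {
            long long v;
            if (sscanf(line + n, "%lld", &v) == 1)
                return v;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```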
THP disabled:
---------------
[root@llmp24l02 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
[root@llmp24l02 ~]# /root/thp/tools/perf/perf stat -e page-faults,dTLB-load-misses ./split-huge-page-mpro 20G
time taken to touch all the data in ns: 3513767220
Performance counter stats for './split-huge-page-mpro 20G':
3,27,726 page-faults
3,29,654 dTLB-load-misses
3.599146098 seconds time elapsed
[root@llmp24l02 ~]#
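The fault counts line up with the mapping sizes. Assuming "20G" means 20 x 2^30 bytes and that each THP fault maps a 16MB huge page (the PMD size with 64K base pages; an assumption here, though it matches thp_fault_alloc), populating the region takes 327,680 faults with 64K pages and 1,280 with huge pages. That is close to the 3,27,726 page faults measured with THP disabled and the 1,279 thp_fault_alloc events with THP enabled. faults_needed() is a hypothetical helper:

```c
#include <stdint.h>

/* First-touch faults needed to populate `region` bytes when each fault maps
 * `mapping` bytes (rounded up for a partial final mapping). */
static uint64_t faults_needed(uint64_t region, uint64_t mapping)
{
    return (region + mapping - 1) / mapping;
}
```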
Changes from V4:
* Fix bad page error in page_table_alloc
BUG: Bad page state in process stream pfn:f1a59
page:f0000000034dc378 count:1 mapcount:0 mapping: (null) index:0x0
[c000000f322c77d0] [c00000000015e198] .bad_page+0xe8/0x140
[c000000f322c7860] [c00000000015e3c4] .free_pages_prepare+0x1d4/0x1e0
[c000000f322c7910] [c000000000160450] .free_hot_cold_page+0x50/0x230
[c000000f322c79c0] [c00000000003ad18] .page_table_alloc+0x168/0x1c0
Changes from V3:
* PowerNV boot fixes
Changes from V2:
* Change patch "powerpc: Reduce PTE table memory wastage" to use a much simpler approach
for PTE page sharing.
* Changes to handle huge pages in KVM code.
* Address other review comments
Changes from V1:
* Address review comments
* More patch split
* Add batch hpte invalidate for hugepages.
Changes from RFC V2:
* Address review comments
* More code cleanup and patch split
Changes from RFC V1:
* HugeTLB fs now works
* Compile issues fixed
* Rebased to v3.8
* Patch series reordered so that ppc64 cleanups and MM THP changes are moved
early in the series. This should help in getting those patches picked up early.
Thanks,
-aneesh
Thread overview: 73+ messages
2013-04-04 5:57 Aneesh Kumar K.V [this message]
2013-04-04 5:57 ` [PATCH -V5 01/25] powerpc: Use signed formatting when printing error Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 02/25] powerpc: Save DAR and DSISR in pt_regs on MCE Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 03/25] powerpc: Don't hard code the size of pte page Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 04/25] powerpc: Reduce the PTE_INDEX_SIZE Aneesh Kumar K.V
2013-04-11 7:10 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 05/25] powerpc: Move the pte free routines from common header Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage Aneesh Kumar K.V
2013-04-10 4:46 ` David Gibson
2013-04-10 6:29 ` Aneesh Kumar K.V
2013-04-10 7:04 ` David Gibson
2013-04-10 7:53 ` Aneesh Kumar K.V
2013-04-10 17:47 ` Aneesh Kumar K.V
2013-04-11 1:20 ` David Gibson
2013-04-11 1:12 ` David Gibson
2013-04-10 7:14 ` Michael Ellerman
2013-04-10 7:54 ` Aneesh Kumar K.V
2013-04-10 8:52 ` Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 07/25] powerpc: Use encode avpn where we need only avpn values Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 08/25] powerpc: Decode the pte-lp-encoding bits correctly Aneesh Kumar K.V
2013-04-10 7:19 ` David Gibson
2013-04-10 8:11 ` Aneesh Kumar K.V
2013-04-10 17:49 ` Aneesh Kumar K.V
2013-04-11 1:28 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 09/25] powerpc: Fix hpte_decode to use the correct decoding for page sizes Aneesh Kumar K.V
2013-04-11 3:20 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 10/25] powerpc: print both base and actual page size on hash failure Aneesh Kumar K.V
2013-04-11 3:21 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 11/25] powerpc: Print page size info during boot Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 12/25] powerpc: Return all the valid pte ecndoing in KVM_PPC_GET_SMMU_INFO ioctl Aneesh Kumar K.V
2013-04-11 3:24 ` David Gibson
2013-04-11 5:11 ` Aneesh Kumar K.V
2013-04-11 5:57 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 13/25] powerpc: Update tlbie/tlbiel as per ISA doc Aneesh Kumar K.V
2013-04-11 3:30 ` David Gibson
2013-04-11 5:20 ` Aneesh Kumar K.V
2013-04-11 6:16 ` David Gibson
2013-04-11 6:36 ` Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 14/25] mm/THP: HPAGE_SHIFT is not a #define on some arch Aneesh Kumar K.V
2013-04-11 3:36 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 15/25] mm/THP: Add pmd args to pgtable deposit and withdraw APIs Aneesh Kumar K.V
2013-04-11 3:40 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 16/25] mm/THP: withdraw the pgtable after pmdp related operations Aneesh Kumar K.V
2013-04-04 5:57 ` [PATCH -V5 17/25] powerpc/THP: Implement transparent hugepages for ppc64 Aneesh Kumar K.V
2013-04-11 5:38 ` David Gibson
2013-04-11 7:40 ` Aneesh Kumar K.V
2013-04-12 0:51 ` David Gibson
2013-04-12 5:06 ` Aneesh Kumar K.V
2013-04-12 5:39 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 18/25] powerpc/THP: Double the PMD table size for THP Aneesh Kumar K.V
2013-04-11 6:18 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 19/25] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries Aneesh Kumar K.V
2013-04-10 7:21 ` Michael Ellerman
2013-04-10 18:26 ` Aneesh Kumar K.V
2013-04-12 1:28 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 20/25] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-04-12 4:01 ` David Gibson
2013-04-04 5:57 ` [PATCH -V5 21/25] powerpc: Handle hugepage in perf callchain Aneesh Kumar K.V
2013-04-12 1:34 ` David Gibson
2013-04-12 5:05 ` Aneesh Kumar K.V
2013-04-04 5:58 ` [PATCH -V5 22/25] powerpc/THP: get_user_pages_fast changes Aneesh Kumar K.V
2013-04-12 1:41 ` David Gibson
2013-04-04 5:58 ` [PATCH -V5 23/25] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-04-04 5:58 ` [PATCH -V5 24/25] powerpc: Optimize hugepage invalidate Aneesh Kumar K.V
2013-04-12 4:21 ` David Gibson
2013-04-14 10:02 ` Aneesh Kumar K.V
2013-04-15 1:18 ` David Gibson
2013-04-04 5:58 ` [PATCH -V5 25/25] powerpc: Handle hugepages in kvm Aneesh Kumar K.V
2013-04-04 6:00 ` [PATCH -V5 00/25] THP support for PPC64 Simon Jeons
2013-04-04 6:10 ` Aneesh Kumar K.V
2013-04-04 6:14 ` Simon Jeons
2013-04-04 8:38 ` Aneesh Kumar K.V
2013-04-19 1:55 ` Simon Jeons