* [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page
@ 2018-05-24  0:58 Huang, Ying
From: Huang, Ying @ 2018-05-24  0:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Andi Kleen, Jan Kara,
	Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov,
	Matthew Wilcox, Hugh Dickins, Minchan Kim, Shaohua Li,
	Christopher Lameter, Mike Kravetz

From: Huang Ying <ying.huang@intel.com>

Huge pages help to reduce the TLB miss rate, but they have a larger
cache footprint, which can sometimes cause problems.  For example, when
copying a huge page on the x86_64 platform, the cache footprint is 4M.
But a Xeon E5 v3 2699 CPU has 18 cores, 36 threads, and only 45M of
LLC (last level cache).  That is, on average, there is 2.5M of LLC for
each core and 1.25M of LLC for each thread.
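
The arithmetic above can be checked with a small sketch.  It assumes
that the 4M footprint is a 2M source huge page plus a 2M destination
huge page; the names below are hypothetical and exist only for this
illustration.

```c
#include <assert.h>

/* All sizes are in KiB so that the arithmetic stays in integers. */
enum {
	HUGE_PAGE_KIB = 2048,      /* assumption: 2M huge page on x86_64 */
	LLC_KIB       = 45 * 1024, /* 45M LLC, as quoted above */
	NR_CORES      = 18,
	NR_THREADS    = 36,
};

/* Copying touches the source and the destination huge page. */
unsigned int copy_footprint_kib(void)
{
	return 2 * HUGE_PAGE_KIB;
}

/* Average LLC share when the cache is split evenly among sharers. */
unsigned int llc_share_kib(unsigned int sharers)
{
	return LLC_KIB / sharers;
}

/* The copy footprint is larger than the average per-core LLC share. */
static_assert(2 * HUGE_PAGE_KIB > LLC_KIB / NR_CORES,
	      "copy footprint exceeds per-core LLC share");
```

With these numbers, llc_share_kib(NR_CORES) is 2560 KiB (2.5M) and
llc_share_kib(NR_THREADS) is 1280 KiB (1.25M), both smaller than the
4096 KiB copy footprint.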

If the cache contention is heavy when copying the huge page, and we
copy the huge page from beginning to end, it is possible that the
beginning of the huge page is evicted from the cache by the time we
finish copying the end of the huge page.  And it is possible for the
application to access the beginning of the huge page right after the
copy.

In commit c79b57e462b5d ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target subpage
hot, the order in which clear_huge_page() clears the subpages of the
huge page was changed: the subpage furthest from the target subpage is
cleared first, and the target subpage is cleared last.  A similar
change of order helps huge page copying too.  That is implemented in
this patchset.
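
To illustrate the ordering, here is a minimal user-space sketch.  It is
not the code from the patches: subpage_order() and its parameters are
hypothetical names, and the exact visiting pattern (a distant run
first, then alternating left and right towards the target) is one
possible way to make the target subpage come last.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[0..nr-1] with subpage indices in processing order.
 * Subpages far away from 'target' come first, then the remaining
 * window around the target is walked from both ends inwards, so that
 * 'target' is always the last index written.
 */
void subpage_order(size_t nr, size_t target, size_t *order)
{
	size_t i, k = 0, base, l;

	assert(nr > 0 && target < nr);

	if (2 * target <= nr) {
		/* Target in the first half: start with the tail. */
		base = 0;
		l = target;
		for (i = nr; i-- > 2 * target; )
			order[k++] = i;
	} else {
		/* Target in the second half: start with the head. */
		base = nr - 2 * (nr - target);
		l = nr - target;
		for (i = 0; i < base; i++)
			order[k++] = i;
	}

	/* Walk the remaining window left, right, left, right, ... */
	for (i = 0; i < l; i++) {
		order[k++] = base + i;
		order[k++] = base + 2 * l - 1 - i;
	}

	assert(k == nr);
}
```

For 8 subpages and target subpage 2, this gives the order
7 6 5 4 0 3 1 2.  A copy loop that follows such an order touches the
target subpage last, so its cache lines are the most recently used
when the copy finishes.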

The patchset is a generic optimization which should benefit quite a
few workloads, not just a specific use case.  To demonstrate the
performance benefit of the patchset, we have tested it with
vm-scalability running on transparent huge pages.

With this patchset, the throughput increases ~16.6% in the
vm-scalability anon-cow-seq test case with 36 processes on a 2 socket
Xeon E5 v3 2699 system (36 cores, 72 threads).  The test case sets
/sys/kernel/mm/transparent_hugepage/enabled to always, mmap()s a big
anonymous memory area and populates it, then forks 36 child processes,
each of which writes to the anonymous memory area from beginning to
end, triggering copy on write.  For each child process, the other child
processes can be seen as other workloads which generate heavy cache
pressure.  At the same time, the IPC (instructions per cycle) increases
from 0.63 to 0.78, and the time spent in user space is reduced by
~7.2%.
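
The shape of that workload can be sketched in a few lines of C.  This
is not the vm-scalability source: run_anon_cow_seq() is a hypothetical
helper, it is Linux-specific, and it requests huge pages with
madvise(MADV_HUGEPAGE) instead of writing to the sysfs file (an
assumption made so that the sketch does not need root).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map and populate an anonymous area, fork nr_children processes and
 * let each of them write the area sequentially, which triggers copy
 * on write.  Returns the number of children that exited cleanly, or
 * -1 on setup failure or if the parent's copy was modified.
 */
int run_anon_cow_seq(size_t bytes, int nr_children)
{
	int i, forked = 0, ok = 0;
	pid_t *pids;
	char *area;

	if (bytes == 0 || nr_children <= 0)
		return -1;
	pids = calloc((size_t)nr_children, sizeof(*pids));
	if (!pids)
		return -1;
	area = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED) {
		free(pids);
		return -1;
	}

	/* Assumption: per-mapping THP hint; failure is ignored. */
	(void)madvise(area, bytes, MADV_HUGEPAGE);

	/* Populate in the parent so that children share the pages. */
	memset(area, 1, bytes);

	for (i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			/* Child: sequential writes from beginning to end. */
			volatile char *p = area;
			size_t off;

			for (off = 0; off < bytes; off++)
				p[off] = 2;
			_exit(0);
		}
		pids[forked++] = pid;
	}

	for (i = 0; i < forked; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}

	/* The parent's private copy must be untouched by the children. */
	if (area[0] != 1 || area[bytes - 1] != 1)
		ok = -1;

	munmap(area, bytes);
	free(pids);
	return ok;
}
```

Calling run_anon_cow_seq() with a size that is much larger than the
LLC and with 36 children approximates the setup described above; the
sketch measures nothing by itself and would need to be timed
externally.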

Changelog:

V2:

- As suggested by Mike Kravetz, put the subpage order algorithm into a
  separate patch to avoid code duplication and reduce maintenance
  overhead.

- Add hugetlbfs support

Best Regards,
Huang, Ying


Thread overview: 11+ messages
2018-05-24  0:58 [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
2018-05-24  0:58 ` [PATCH -V2 -mm 1/4] mm, clear_huge_page: Move order algorithm into a separate function Huang, Ying
2018-05-24 20:55   ` Mike Kravetz
2018-05-24  0:58 ` [PATCH -V2 -mm 2/4] mm, huge page: Copy target sub-page last when copy huge page Huang, Ying
2018-05-24 21:25   ` Mike Kravetz
2018-05-24  0:58 ` [PATCH -V2 -mm 3/4] mm, hugetlbfs: Rename address to haddr in hugetlb_cow() Huang, Ying
2018-05-24 21:42   ` Mike Kravetz
2018-05-25  0:34     ` Huang, Ying
2018-05-24  0:58 ` [PATCH -V2 -mm 4/4] mm, hugetlbfs: Pass fault address to cow handler Huang, Ying
2018-05-24 22:27   ` Mike Kravetz
2018-05-25 15:38 ` [PATCH -V2 -mm 0/4] mm, huge page: Copy target sub-page last when copy huge page Christopher Lameter