* [PATCH 0/3] IPI: Avoid to use 2 cache lines for one call_single_data
From: Huang, Ying @ 2017-08-02  8:52 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, Tejun Heo, Christoph Lameter, Joerg Roedel,
	Ingo Molnar, Michael Ellerman, Borislav Petkov, Thomas Gleixner,
	Juergen Gross, Aaron Lu, Huang Ying

From: Huang Ying <ying.huang@intel.com>

struct call_single_data is used in IPIs to transfer information between
CPUs.  Its size is larger than sizeof(unsigned long) and smaller than
the cache line size.  Currently, it is allocated without any alignment
requirement, so an allocated call_single_data may straddle two cache
lines.  That doubles the number of cache lines that need to be
transferred among CPUs.  This is resolved by aligning the allocated
call_single_data to the cache line size.
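
To make the straddling problem concrete, here is a minimal user-space
sketch, not the kernel code itself.  It assumes a 64-byte cache line
(CACHE_LINE_SIZE is an illustrative constant), and struct csd_like is a
hypothetical stand-in for struct call_single_data; the helper names are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical stand-in for struct call_single_data: larger than
 * unsigned long, smaller than one cache line.
 */
struct csd_like {
	void *next;
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/* Count the cache lines touched by an object of `size` bytes at `addr`. */
static size_t cache_lines_spanned(uintptr_t addr, size_t size)
{
	assert(size > 0);
	uintptr_t first = addr / CACHE_LINE_SIZE;
	uintptr_t last = (addr + size - 1) / CACHE_LINE_SIZE;
	return (size_t)(last - first + 1);
}

/* Round `addr` up to the next cache line boundary. */
static uintptr_t align_up_to_cache_line(uintptr_t addr)
{
	return (addr + CACHE_LINE_SIZE - 1) & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}
```

With these helpers, a 32-byte object placed at offset 48 spans two
lines, while the same object placed at align_up_to_cache_line(48)
spans one.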

To allocate cache-line-aligned percpu memory dynamically,
alloc_percpu_aligned() is introduced, and it is used in the iova code
as well.
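
As a rough user-space analogy only (this is not the implementation of
alloc_percpu_aligned(), which lives in patch 1/3), the idea of giving
each CPU its own cache-line-aligned slot can be sketched with C11
aligned_alloc().  The 64-byte line size and the names
alloc_slots_aligned() and slot_for_cpu() are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption for illustration: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Round a size up to a whole number of cache lines. */
static size_t round_up_to_cache_line(size_t size)
{
	return (size + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/*
 * Hypothetical helper: allocate one cache-line-aligned slot per CPU.
 * Each slot is padded to a multiple of the line size, so no slot
 * shares a cache line with its neighbour.  Free the result with free().
 */
static void *alloc_slots_aligned(size_t nr_cpus, size_t obj_size)
{
	size_t stride = round_up_to_cache_line(obj_size);

	if (nr_cpus == 0 || stride == 0)
		return NULL;
	/* The total size is a multiple of the alignment, as C11 requires. */
	return aligned_alloc(CACHE_LINE_SIZE, nr_cpus * stride);
}

/* Return the slot belonging to `cpu` inside a block from the helper above. */
static void *slot_for_cpu(void *base, size_t cpu, size_t obj_size)
{
	assert(base != NULL);
	return (char *)base + cpu * round_up_to_cache_line(obj_size);
}
```

Padding every slot to a full line trades some memory for the guarantee
that an object smaller than a cache line never crosses a line boundary.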

To test the effect of the patch, we use the multi-threaded swap test
case (swap-w-seq-mt) from vm-scalability.  The test creates multiple
threads, and each thread consumes memory until all RAM and part of swap
are used, so that a huge number of IPIs are triggered when unmapping
memory.  In the test, memory write throughput improves by ~5% compared
with misaligned call_single_data because of faster IPIs.

Best Regards,
Huang, Ying


Thread overview: 22+ messages
2017-08-02  8:52 [PATCH 0/3] IPI: Avoid to use 2 cache lines for one call_single_data Huang, Ying
2017-08-02  8:52 ` [PATCH 1/3] percpu: Add alloc_percpu_aligned() Huang, Ying
2017-08-02 13:50   ` Christopher Lameter
2017-08-03  0:33     ` Huang, Ying
2017-08-02  8:52 ` [PATCH 2/3] iova: Use alloc_percpu_aligned() Huang, Ying
2017-08-02  8:52 ` [PATCH 3/3] IPI: Avoid to use 2 cache lines for one call_single_data Huang, Ying
2017-08-02 10:18   ` Eric Dumazet
2017-08-02 10:53     ` Peter Zijlstra
2017-08-03  8:35     ` Huang, Ying
2017-08-03  8:57       ` Peter Zijlstra
2017-08-04  1:28         ` Huang, Ying
2017-08-04  2:05           ` Huang, Ying
2017-08-04  9:27             ` Peter Zijlstra
2017-08-05  0:47               ` Huang, Ying
2017-08-07  8:28                 ` Peter Zijlstra
2017-08-08  4:30                   ` Huang, Ying
2017-08-14  5:44                     ` Huang, Ying
2017-08-28  5:19                       ` Huang, Ying
2017-08-28  8:49                         ` Peter Zijlstra
2017-08-29 14:23                     ` [tip:locking/core] smp: Avoid using two cache lines for struct call_single_data tip-bot for Ying Huang
2017-08-04  9:20           ` [PATCH 3/3] IPI: Avoid to use 2 cache lines for one call_single_data Peter Zijlstra
2017-08-02 13:54 ` [PATCH 0/3] " Christopher Lameter
