* [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count
@ 2016-08-11  7:44 Aaron Lu
  2016-08-11  9:13 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Aaron Lu @ 2016-08-11  7:44 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Alex Shi, Tomoki Sekiyama, Davidlohr Bueso, Huang Ying,
	Ingo Molnar, Thomas Gleixner, "H. Peter Anvin"

Since commit 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by
CALL_FUNCTION_VECTOR"), the TLB remote shootdown has been done through
the call function vector. That commit didn't take care of irq_tlb_count,
so a later commit fd0f5869724f ("x86: Distinguish TLB shootdown
interrupts from other functions call interrupts") tried to fix it.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count, so irq_call_count is always bigger than
irq_tlb_count and irq_tlb_count can safely be subtracted from
irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
a csd is added to it in generic_exec_single(). That means if two threads
both add flush-TLB csds to the same CPU's call_single_queue, only one
IPI is sent. In other words, irq_call_count is incremented by 1 but
irq_tlb_count is incremented by 2. Over time, irq_tlb_count becomes
bigger than irq_call_count and the subtraction produces a huge
irq_call_count value due to unsigned wrap-around.
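
To make the double-counting concrete, here is a minimal user-space toy
model (not the actual kernel code; the names only mirror the kernel's
counters) of two threads queueing flush-TLB csds to one CPU:

  #include <stdio.h>
  #include <stdbool.h>

  static int call_single_queue_len;       /* stand-in for the target CPU's queue */
  static unsigned int irq_call_count, irq_tlb_count;

  static void queue_flush_tlb_csd(void)
  {
          /* an IPI is raised only when the queue was empty before this add */
          bool send_ipi = (call_single_queue_len++ == 0);

          if (send_ipi)
                  irq_call_count++;       /* one CAL interrupt on the target CPU */
          irq_tlb_count++;                /* but every flush csd bumps the TLB count */
  }

  int main(void)
  {
          queue_flush_tlb_csd();          /* first caller: queue empty -> IPI sent */
          queue_flush_tlb_csd();          /* second caller: queue busy -> no IPI */
          printf("call=%u tlb=%u\n", irq_call_count, irq_tlb_count);
          return 0;                       /* prints call=1 tlb=2 */
  }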

Considering that:
1) it's not worth sending more IPIs just for the sake of accurately
   counting irq_call_count in generic_exec_single();
2) it's not easy to tell in __smp_call_function_single_interrupt()
   whether a call function interrupt is for a TLB shootdown.
Not excluding TLB shootdowns from the call function count seems to be
the simplest fix, and that is what this patch does.
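
For reference, a standalone snippet (the counter values are made up
purely for illustration) showing how the subtraction this patch removes
wraps around once irq_tlb_count exceeds irq_call_count:

  #include <stdio.h>

  int main(void)
  {
          /* hypothetical values once the double-counting has accumulated */
          unsigned int irq_call_count = 100;
          unsigned int irq_tlb_count  = 102;

          /* the expression arch_show_interrupts() used to print for "CAL" */
          printf("%10u\n", irq_call_count - irq_tlb_count); /* 4294967294 with 32-bit unsigned int */
          return 0;
  }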

This was found recently by LKP's cyclic performance regression tracking
with the vm-scalability test suite. I bisected it to commit
0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). That commit didn't do
anything wrong but revealed the irq_call_count problem. IIUC, the commit
makes rwc->rmap_one in rmap_walk_file() run concurrently in multiple
threads. When rmap_one is try_to_unmap_one(), multiple threads can queue
TLB flushes to the same CPU but only one IPI will be sent.

Since that commit entered Linux in v3.19, the counting problem only
shows up from v3.19 onwards. Considering this is a behaviour change, I'm
not sure whether I should add the stable tag here.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 arch/x86/include/asm/hardirq.h | 4 ----
 arch/x86/kernel/irq.c          | 3 +--
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 7178043b0e1d..59405a248fc2 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -22,10 +22,6 @@ typedef struct {
 #ifdef CONFIG_SMP
 	unsigned int irq_resched_count;
 	unsigned int irq_call_count;
-	/*
-	 * irq_tlb_count is double-counted in irq_call_count, so it must be
-	 * subtracted from irq_call_count when displaying irq_call_count
-	 */
 	unsigned int irq_tlb_count;
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 61521dc19c10..9f669fdd2010 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -102,8 +102,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	seq_puts(p, "  Rescheduling interrupts\n");
 	seq_printf(p, "%*s: ", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count -
-					irq_stats(j)->irq_tlb_count);
+		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count);
 	seq_puts(p, "  Function call interrupts\n");
 	seq_printf(p, "%*s: ", prec, "TLB");
 	for_each_online_cpu(j)
-- 
2.5.5


* Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count
  2016-08-11  7:44 [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count Aaron Lu
@ 2016-08-11  9:13 ` Ingo Molnar
  2016-08-11  9:16   ` Aaron Lu
  2016-08-11 11:57 ` [tip:x86/urgent] x86/irq: Do " tip-bot for Aaron Lu
  2016-08-11 15:13 ` [PATCH] x86/irq: do " Huang, Ying
  2 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2016-08-11  9:13 UTC (permalink / raw)
  To: Aaron Lu
  Cc: x86, linux-kernel, Alex Shi, Tomoki Sekiyama, Davidlohr Bueso,
	Huang Ying, Ingo Molnar, Thomas Gleixner,
	"H. Peter Anvin"


* Aaron Lu <aaron.lu@intel.com> wrote:

> This was found recently by LKP's cyclic performance regression tracking
> with the vm-scalability test suite. I bisected it to commit
> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). That commit didn't do
> anything wrong but revealed the irq_call_count problem. IIUC, the commit
> makes rwc->rmap_one in rmap_walk_file() run concurrently in multiple
> threads. When rmap_one is try_to_unmap_one(), multiple threads can queue
> TLB flushes to the same CPU but only one IPI will be sent.

Note, for some reason the commit ID you used is wrong, the real one is:

  3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")

I have fixed this in the changelog.

Thanks,

	Ingo


* Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count
  2016-08-11  9:13 ` Ingo Molnar
@ 2016-08-11  9:16   ` Aaron Lu
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Lu @ 2016-08-11  9:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, Alex Shi, Tomoki Sekiyama, Davidlohr Bueso,
	Huang Ying, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On 08/11/2016 05:13 PM, Ingo Molnar wrote:
> 
> * Aaron Lu <aaron.lu@intel.com> wrote:
> 
>> This was found recently by LKP's cyclic performance regression tracking
>> with the vm-scalability test suite. I bisected it to commit
>> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). That commit didn't do
>> anything wrong but revealed the irq_call_count problem. IIUC, the commit
>> makes rwc->rmap_one in rmap_walk_file() run concurrently in multiple
>> threads. When rmap_one is try_to_unmap_one(), multiple threads can queue
>> TLB flushes to the same CPU but only one IPI will be sent.
> 
> Note, for some reason the commit ID you used is wrong, the real one is:

Oops, I looked at a test branch where I cherry-picked that commit, sorry.

> 
>   3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")
> 
> I have fixed this in the changelog.

Thanks!


* [tip:x86/urgent] x86/irq: Do not substract irq_tlb_count from irq_call_count
  2016-08-11  7:44 [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count Aaron Lu
  2016-08-11  9:13 ` Ingo Molnar
@ 2016-08-11 11:57 ` tip-bot for Aaron Lu
  2016-08-11 15:13 ` [PATCH] x86/irq: do " Huang, Ying
  2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Aaron Lu @ 2016-08-11 11:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tomoki.sekiyama.qu, luto, jpoimboe, ying.huang, torvalds,
	aaron.lu, peterz, hpa, dave, mingo, dvlasenk, brgerst, alex.shi,
	linux-kernel, bp, tglx

Commit-ID:  82ba4faca1bffad429f15c90c980ffd010366c25
Gitweb:     http://git.kernel.org/tip/82ba4faca1bffad429f15c90c980ffd010366c25
Author:     Aaron Lu <aaron.lu@intel.com>
AuthorDate: Thu, 11 Aug 2016 15:44:30 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 11 Aug 2016 11:14:59 +0200

x86/irq: Do not substract irq_tlb_count from irq_call_count

Since commit:

  52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")

the TLB remote shootdown has been done through the call function
vector. That commit didn't take care of irq_tlb_count, which a later
commit:

  fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")

... tried to fix.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count, so irq_call_count is always bigger than
irq_tlb_count and irq_tlb_count can safely be subtracted from
irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
a csd is added to it in generic_exec_single(). That means if two threads
both add flush-TLB csds to the same CPU's call_single_queue, only one
IPI is sent. In other words, irq_call_count is incremented by 1 but
irq_tlb_count is incremented by 2. Over time, irq_tlb_count becomes
bigger than irq_call_count and the subtraction produces a huge
irq_call_count value due to unsigned wrap-around.

Considering that:

  1) it's not worth sending more IPIs just for the sake of accurately
     counting irq_call_count in generic_exec_single();

  2) it's not easy to tell in __smp_call_function_single_interrupt()
     whether a call function interrupt is for a TLB shootdown.

Not excluding TLB shootdowns from the call function count seems to be
the simplest fix, and that is what this patch does.

This bug was found recently by LKP's cyclic performance regression
tracking with the vm-scalability test suite. I bisected it to commit:

  3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")

That commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->rmap_one in rmap_walk_file() run
concurrently in multiple threads. When rmap_one is try_to_unmap_one(),
multiple threads can queue TLB flushes to the same CPU but only one IPI
will be sent.

Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/hardirq.h | 4 ----
 arch/x86/kernel/irq.c          | 3 +--
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 7178043..59405a2 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -22,10 +22,6 @@ typedef struct {
 #ifdef CONFIG_SMP
 	unsigned int irq_resched_count;
 	unsigned int irq_call_count;
-	/*
-	 * irq_tlb_count is double-counted in irq_call_count, so it must be
-	 * subtracted from irq_call_count when displaying irq_call_count
-	 */
 	unsigned int irq_tlb_count;
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 61521dc..9f669fd 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -102,8 +102,7 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	seq_puts(p, "  Rescheduling interrupts\n");
 	seq_printf(p, "%*s: ", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count -
-					irq_stats(j)->irq_tlb_count);
+		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count);
 	seq_puts(p, "  Function call interrupts\n");
 	seq_printf(p, "%*s: ", prec, "TLB");
 	for_each_online_cpu(j)


* Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count
  2016-08-11  7:44 [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count Aaron Lu
  2016-08-11  9:13 ` Ingo Molnar
  2016-08-11 11:57 ` [tip:x86/urgent] x86/irq: Do " tip-bot for Aaron Lu
@ 2016-08-11 15:13 ` Huang, Ying
  2016-08-12  2:16   ` Aaron Lu
  2 siblings, 1 reply; 6+ messages in thread
From: Huang, Ying @ 2016-08-11 15:13 UTC (permalink / raw)
  To: Aaron Lu
  Cc: x86, linux-kernel, Alex Shi, Tomoki Sekiyama, Davidlohr Bueso,
	Huang Ying, Ingo Molnar, Thomas Gleixner,
	"H. Peter Anvin"

Aaron Lu <aaron.lu@intel.com> writes:

> Since commit 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by
> CALL_FUNCTION_VECTOR"), the TLB remote shootdown has been done through
> the call function vector. That commit didn't take care of irq_tlb_count,
> so a later commit fd0f5869724f ("x86: Distinguish TLB shootdown
> interrupts from other functions call interrupts") tried to fix it.
>
> The fix assumes every increase of irq_tlb_count has a corresponding
> increase of irq_call_count, so irq_call_count is always bigger than
> irq_tlb_count and irq_tlb_count can safely be subtracted from
> irq_call_count.
>
> Unfortunately this is not true for the smp_call_function_single() case.
> The IPI is only sent if the target CPU's call_single_queue is empty when
> a csd is added to it in generic_exec_single(). That means if two threads
> both add flush-TLB csds to the same CPU's call_single_queue, only one
> IPI is sent. In other words, irq_call_count is incremented by 1 but
> irq_tlb_count is incremented by 2. Over time, irq_tlb_count becomes
> bigger than irq_call_count and the subtraction produces a huge
> irq_call_count value due to unsigned wrap-around.
>
> Considering that:
> 1) it's not worth sending more IPIs just for the sake of accurately
>    counting irq_call_count in generic_exec_single();
> 2) it's not easy to tell in __smp_call_function_single_interrupt()
>    whether a call function interrupt is for a TLB shootdown.
> Not excluding TLB shootdowns from the call function count seems to be
> the simplest fix, and that is what this patch does.
>
> This was found recently by LKP's cyclic performance regression tracking
> with the vm-scalability test suite. I bisected it to commit
> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). That commit didn't do
> anything wrong but revealed the irq_call_count problem. IIUC, the commit
> makes rwc->rmap_one in rmap_walk_file() run concurrently in multiple
> threads. When rmap_one is try_to_unmap_one(), multiple threads can queue
> TLB flushes to the same CPU but only one IPI will be sent.
>
> Since that commit entered Linux in v3.19, the counting problem only
> shows up from v3.19 onwards. Considering this is a behaviour change, I'm
> not sure whether I should add the stable tag here.
>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>

Thanks for the fix.  You forgot to add :)

Reported-by: "Huang, Ying" <ying.huang@intel.com>

Best Regards,
Huang, Ying


* Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count
  2016-08-11 15:13 ` [PATCH] x86/irq: do " Huang, Ying
@ 2016-08-12  2:16   ` Aaron Lu
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Lu @ 2016-08-12  2:16 UTC (permalink / raw)
  To: Huang, Ying
  Cc: x86, linux-kernel, Alex Shi, Tomoki Sekiyama, Davidlohr Bueso,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On 08/11/2016 11:13 PM, Huang, Ying wrote:
> Aaron Lu <aaron.lu@intel.com> writes:
> 
>> Since commit 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by
>> CALL_FUNCTION_VECTOR"), the TLB remote shootdown has been done through
>> the call function vector. That commit didn't take care of irq_tlb_count,
>> so a later commit fd0f5869724f ("x86: Distinguish TLB shootdown
>> interrupts from other functions call interrupts") tried to fix it.
>>
>> The fix assumes every increase of irq_tlb_count has a corresponding
>> increase of irq_call_count, so irq_call_count is always bigger than
>> irq_tlb_count and irq_tlb_count can safely be subtracted from
>> irq_call_count.
>>
>> Unfortunately this is not true for the smp_call_function_single() case.
>> The IPI is only sent if the target CPU's call_single_queue is empty when
>> a csd is added to it in generic_exec_single(). That means if two threads
>> both add flush-TLB csds to the same CPU's call_single_queue, only one
>> IPI is sent. In other words, irq_call_count is incremented by 1 but
>> irq_tlb_count is incremented by 2. Over time, irq_tlb_count becomes
>> bigger than irq_call_count and the subtraction produces a huge
>> irq_call_count value due to unsigned wrap-around.
>>
>> Considering that:
>> 1) it's not worth sending more IPIs just for the sake of accurately
>>    counting irq_call_count in generic_exec_single();
>> 2) it's not easy to tell in __smp_call_function_single_interrupt()
>>    whether a call function interrupt is for a TLB shootdown.
>> Not excluding TLB shootdowns from the call function count seems to be
>> the simplest fix, and that is what this patch does.
>>
>> This was found recently by LKP's cyclic performance regression tracking
>> with the vm-scalability test suite. I bisected it to commit
>> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). That commit didn't do
>> anything wrong but revealed the irq_call_count problem. IIUC, the commit
>> makes rwc->rmap_one in rmap_walk_file() run concurrently in multiple
>> threads. When rmap_one is try_to_unmap_one(), multiple threads can queue
>> TLB flushes to the same CPU but only one IPI will be sent.
>>
>> Since that commit entered Linux in v3.19, the counting problem only
>> shows up from v3.19 onwards. Considering this is a behaviour change, I'm
>> not sure whether I should add the stable tag here.
>>
>> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> 
> Thanks for the fix.  You forgot to add :)
> 
> Reported-by: "Huang, Ying" <ying.huang@intel.com>

Oh right, sorry about that.

Regards,
Aaron

