* linux-next-20110923: warning kernel/rcutree.c:1833
@ 2011-09-25  0:24 Kirill A. Shutemov
  2011-09-25  5:08 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Kirill A. Shutemov @ 2011-09-25  0:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Frederic Weisbecker, Dipankar Sarma, Paul E. McKenney,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

[   29.974288] ------------[ cut here ]------------
[   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
[   29.974316] Hardware name: HP EliteBook 8440p
[   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
[   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
[   29.974521] Call Trace:
[   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
[   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
[   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
[   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
[   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
[   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
[   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
[   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
[   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
[   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
[   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
[   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
[   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
[   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
[   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
[   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
[   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
[   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
[   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25  0:24 linux-next-20110923: warning kernel/rcutree.c:1833 Kirill A. Shutemov
@ 2011-09-25  5:08 ` Paul E. McKenney
  2011-09-25 11:26   ` Kirill A. Shutemov
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-25  5:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, Frederic Weisbecker, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> [   29.974288] ------------[ cut here ]------------
> [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> [   29.974316] Hardware name: HP EliteBook 8440p
> [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> [   29.974521] Call Trace:
> [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---

Do the following help?

	https://lkml.org/lkml/2011/9/17/47
	https://lkml.org/lkml/2011/9/17/45
	https://lkml.org/lkml/2011/9/17/43

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25  5:08 ` Paul E. McKenney
@ 2011-09-25 11:26   ` Kirill A. Shutemov
  2011-09-25 13:06     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Kirill A. Shutemov @ 2011-09-25 11:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > [   29.974288] ------------[ cut here ]------------
> > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > [   29.974316] Hardware name: HP EliteBook 8440p
> > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > [   29.974521] Call Trace:
> > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> 
> Do the following help?
> 
> 	https://lkml.org/lkml/2011/9/17/47
> 	https://lkml.org/lkml/2011/9/17/45
> 	https://lkml.org/lkml/2011/9/17/43

Yes. Thanks.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25 11:26   ` Kirill A. Shutemov
@ 2011-09-25 13:06     ` Frederic Weisbecker
  2011-09-25 14:19       ` Kirill A. Shutemov
  2011-09-25 16:48       ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-25 13:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Paul E. McKenney
  Cc: linux-kernel, Dipankar Sarma, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > [   29.974288] ------------[ cut here ]------------
> > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > [   29.974521] Call Trace:
> > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > 
> > Do the following help?
> > 
> > 	https://lkml.org/lkml/2011/9/17/47
> > 	https://lkml.org/lkml/2011/9/17/45
> > 	https://lkml.org/lkml/2011/9/17/43
> 
> Yes. Thanks.

I believe that doesn't really fix the issue. But the warning is not
easy to trigger. You simply haven't hit it by chance after applying
the patches.

This happens when the idle notifier call chain is invoked from idle
and is interrupted in the middle. So we have called rcu_read_lock()
but have not yet released it with rcu_read_unlock(), and at the end
of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu(),
which is illegal while in an RCU read-side critical section.

No idea how to solve that. Any use of RCU around the point where the tick
gets stopped is affected here. If it is really required that rcu_needs_cpu()
not be called from an RCU read-side critical section, then this is not
going to be easy to fix.

But I don't really understand that requirement. rcu_needs_cpu() simply
checks whether we have callbacks left to handle, so I don't see how the
read side is involved; it looks more like a write-side matter.
The rule I would imagine instead is: don't call __call_rcu() once the tick
is stopped.

But I'm certainly missing something.

Paul?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25 13:06     ` Frederic Weisbecker
@ 2011-09-25 14:19       ` Kirill A. Shutemov
  2011-09-25 16:48       ` Paul E. McKenney
  1 sibling, 0 replies; 57+ messages in thread
From: Kirill A. Shutemov @ 2011-09-25 14:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, linux-kernel, Dipankar Sarma, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > [   29.974288] ------------[ cut here ]------------
> > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > [   29.974521] Call Trace:
> > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > 
> > > Do the following help?
> > > 
> > > 	https://lkml.org/lkml/2011/9/17/47
> > > 	https://lkml.org/lkml/2011/9/17/45
> > > 	https://lkml.org/lkml/2011/9/17/43
> > 
> > Yes. Thanks.
> 
> I believe that doesn't really fix the issue. But the warning is not
> easy to trigger. You simply haven't hit it by chance after applying
> the patches.

Triggered a bit later:

[  154.617416] ------------[ cut here ]------------
[  154.617437] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff/0x110()
[  154.617444] Hardware name: HP EliteBook 8440p
[  154.617449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc acpi_cpufreq mperf rfcomm bnep cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats binfmt_misc fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm i915 arc4 iwlagn mac80211 snd_hda_codec_hdmi snd_hda_codec_idt drm_kms_helper snd_hda_intel pcmcia drm snd_hda_codec uvcvideo snd_hwdep ecb snd_pcm videodev tpm_infineon btusb snd_seq media bluetooth v4l2_compat_ioctl32 cfg80211 snd_timer snd_seq_device yenta_socket evdev pcmcia_rsrc psmouse snd parport_pc pcmcia_core parport serio_raw hp_accel intel_ips rfkill i2c_algo_bit tpm_tis lis3lv02d i2c_core soundcore tpm container tpm_bios snd_page_alloc input_polldev video battery ac power_supply processor button ext4 mbcache jbd2 crc16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sd_mod sr_mod cdrom crc_t10dif sdhci_pci sdhci mmc_core ahci libahci libata scsi_mod ehci_hcd e1000e usbcore thermal thermal_sys [last unloaded: scsi_wait_scan]
[  154.617681] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc7-next-20110923+ #3
[  154.617685] Call Trace:
[  154.617688]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
[  154.617704]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
[  154.617711]  [<ffffffff810c00df>] rcu_needs_cpu+0xff/0x110
[  154.617719]  [<ffffffff81083855>] tick_nohz_stop_sched_tick.isra.9+0x105/0x3c0
[  154.617726]  [<ffffffff81083cae>] tick_nohz_irq_exit+0x2e/0x40
[  154.617732]  [<ffffffff81055620>] irq_exit+0xa0/0xd0
[  154.617740]  [<ffffffff8100412e>] do_IRQ+0x5e/0xd0
[  154.617748]  [<ffffffff81432ac0>] ? notifier_call_chain+0x70/0x70
[  154.617756]  [<ffffffff8142f66e>] common_interrupt+0x6e/0x6e
[  154.617759]  <EOI>  [<ffffffff81297bbd>] ? acpi_hw_read+0x4a/0x51
[  154.617772]  [<ffffffff81087b07>] ? lock_acquire+0xa7/0x160
[  154.617777]  [<ffffffff81432ac0>] ? notifier_call_chain+0x70/0x70
[  154.617784]  [<ffffffff81432b16>] __atomic_notifier_call_chain+0x56/0xb0
[  154.617789]  [<ffffffff81432ac0>] ? notifier_call_chain+0x70/0x70
[  154.617797]  [<ffffffff8130ecb6>] ? cpuidle_idle_call+0x106/0x350
[  154.617803]  [<ffffffff81432b81>] atomic_notifier_call_chain+0x11/0x20
[  154.617809]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
[  154.617818]  [<ffffffff8141e44b>] start_secondary+0x1fd/0x204
[  154.617823] ---[ end trace 64c64a258e1aa463 ]---

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25 13:06     ` Frederic Weisbecker
  2011-09-25 14:19       ` Kirill A. Shutemov
@ 2011-09-25 16:48       ` Paul E. McKenney
  2011-09-26  1:04         ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-25 16:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > [   29.974288] ------------[ cut here ]------------
> > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > [   29.974521] Call Trace:
> > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > 
> > > Do the following help?
> > > 
> > > 	https://lkml.org/lkml/2011/9/17/47
> > > 	https://lkml.org/lkml/2011/9/17/45
> > > 	https://lkml.org/lkml/2011/9/17/43
> > 
> > Yes. Thanks.
> 
> I believe that doesn't really fix the issue. But the warning is not
> easy to trigger. You simply haven't hit it by chance after applying
> the patches.
> 
> This happens when the idle notifier callchain is called in idle
> and is interrupted in the middle. So we have called rcu_read_lock()
> but haven't yet released with rcu_read_unlock(), and in the end
> of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> which is illegal while in an rcu read side critical section.
> 
> No idea how to solve that. Any use of RCU after the tick gets stopped
> is concerned here. If it is really required that rcu_needs_cpu() can't
> > be called in an rcu read side critical section then it's not going
> to be easy to fix.
> 
> But I don't really understand that requirement. rcu_needs_cpu() simply
> checks if we don't have callbacks to handle. So I don't understand how
> read side is concerned. It's rather the write side.
> The rule I can imagine instead is: don't call __call_rcu() once the tick is
> stopped.
> 
> But I'm certainly missing something.
> 
> Paul?

This is required for RCU_FAST_NO_HZ, which checks to see whether the
current CPU can accelerate the current grace period so as to enter
dyntick-idle mode sooner than it would otherwise.  This takes effect
in the situation where rcu_needs_cpu() sees that there are callbacks.
It then notes a quiescent state (which is illegal in an RCU read-side
critical section), calls force_quiescent_state(), and so on.  For this
to work, the current CPU must be in an RCU read-side critical section.

If this cannot be made to work, another option is to call a new RCU
function in the case where rcu_needs_cpu() returned false, but after
the RCU read-side critical section has exited.  This new RCU function
could then attempt to rearrange RCU so as to allow the CPU to enter
dyntick-idle mode more quickly.  It is more important for this to
happen when the CPU is going idle than when it is executing a user
process.

So, is this doable?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-25 16:48       ` Paul E. McKenney
@ 2011-09-26  1:04         ` Frederic Weisbecker
  2011-09-26  1:10           ` Frederic Weisbecker
  2011-09-26  1:25           ` Paul E. McKenney
  0 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  1:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > [   29.974288] ------------[ cut here ]------------
> > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > [   29.974521] Call Trace:
> > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > 
> > > > Do the following help?
> > > > 
> > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > 	https://lkml.org/lkml/2011/9/17/43
> > > 
> > > Yes. Thanks.
> > 
> > I believe that doesn't really fix the issue. But the warning is not
> > easy to trigger. You simply haven't hit it by chance after applying
> > the patches.
> > 
> > This happens when the idle notifier callchain is called in idle
> > and is interrupted in the middle. So we have called rcu_read_lock()
> > but haven't yet released with rcu_read_unlock(), and in the end
> > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > which is illegal while in an rcu read side critical section.
> > 
> > No idea how to solve that. Any use of RCU after the tick gets stopped
> > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > be called in an rcu read side critical section then it's not going
> > to be easy to fix.
> > 
> > But I don't really understand that requirement. rcu_needs_cpu() simply
> > checks if we don't have callbacks to handle. So I don't understand how
> > read side is concerned. It's rather the write side.
> > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > stopped.
> > 
> > But I'm certainly missing something.
> > 
> > Paul?
> 
> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> current CPU can accelerate the current grace period so as to enter
> dyntick-idle mode sooner than it would otherwise.  This takes effect
> in the situation where rcu_needs_cpu() sees that there are callbacks.
> It then notes a quiescent state (which is illegal in an RCU read-side
> critical section), calls force_quiescent_state(), and so on.  For this
> to work, the current CPU must be in an RCU read-side critical section.

You mean it must *not* be in an RCU read-side critical section (ie: in a
quiescent state)?

That assumption at least fails anytime in idle for the RCU
sched flavour given that preemption is disabled in the idle loop.

> If this cannot be made to work, another option is to call a new RCU
> function in the case where rcu_needs_cpu() returned false, but after
> the RCU read-side critical section has exited.

You mean when rcu_needs_cpu() returns true (when we have callbacks
enqueued)?

> This new RCU function
> could then attempt to rearrange RCU so as to allow the CPU to enter
> dyntick-idle mode more quickly.  It is more important for this to
> happen when the CPU is going idle than when it is executing a user
> process.
> 
> So, is this doable?

At least not when we have RCU sched callbacks enqueued, given preemption
is disabled. But that sounds plausible in order to accelerate the switch
to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.

So if I understand correctly, we would check whether we are in an RCU
read-side critical section when rcu_needs_cpu() is called. If so, we keep
the tick alive. Later, when we exit the RCU read-side critical section
(rcu_read_unlock()/local_bh_enable()), we notice that specific state and
try to accelerate RCU callback processing from there, so we can switch
to dyntick-idle mode, right?

That requires some specific counter in rcu_read_lock() for the
!CONFIG_PREEMPT case, so that rcu_needs_cpu() can tell whether it is
interrupting an RCU read-side critical section. For the bh case we can
probably just check in_softirq().

Also, if we know we are interrupting a read-side section, why not just
keep the tick alive and retry on the next tick? Interrupting such a
section looks rare enough that this would have little impact, and it
avoids specific hooks in rcu_read_unlock() and local_bh_enable().

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:04         ` Frederic Weisbecker
@ 2011-09-26  1:10           ` Frederic Weisbecker
  2011-09-26  1:26             ` Paul E. McKenney
  2011-09-26  1:25           ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  1:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
>> This is required for RCU_FAST_NO_HZ, which checks to see whether the
>> current CPU can accelerate the current grace period so as to enter
>> dyntick-idle mode sooner than it would otherwise.  This takes effect
>> in the situation where rcu_needs_cpu() sees that there are callbacks.
>> It then notes a quiescent state (which is illegal in an RCU read-side
>> critical section), calls force_quiescent_state(), and so on.  For this
>> to work, the current CPU must be in an RCU read-side critical section.
>
> You mean it must *not* be in an RCU read-side critical section (ie: in a
> quiescent state)?
>
> That assumption at least fails anytime in idle for the RCU
> sched flavour given that preemption is disabled in the idle loop.
>
>> If this cannot be made to work, another option is to call a new RCU
>> function in the case where rcu_needs_cpu() returned false, but after
>> the RCU read-side critical section has exited.
>
> You mean when rcu_needs_cpu() returns true (when we have callbacks
> enqueued)?
>
>> This new RCU function
>> could then attempt to rearrange RCU so as to allow the CPU to enter
>> dyntick-idle mode more quickly.  It is more important for this to
>> happen when the CPU is going idle than when it is executing a user
>> process.
>>
>> So, is this doable?
>
> At least not when we have RCU sched callbacks enqueued, given preemption
> is disabled. But that sounds plausible in order to accelerate the switch
> to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.

But the RCU sched case could be dealt with if we embrace every use of
it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
version that just increases a local counter that rcu_needs_cpu() could check.

It's an easy thing to add: we can ensure preempt is disabled when we call it
and we can force rcu_dereference_sched() to depend on it.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:04         ` Frederic Weisbecker
  2011-09-26  1:10           ` Frederic Weisbecker
@ 2011-09-26  1:25           ` Paul E. McKenney
  2011-09-26  8:48             ` Frederic Weisbecker
  2011-09-26  8:49             ` Frederic Weisbecker
  1 sibling, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26  1:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 03:04:21AM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > > [   29.974288] ------------[ cut here ]------------
> > > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > > [   29.974521] Call Trace:
> > > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > > 
> > > > > Do the following help?
> > > > > 
> > > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > > 	https://lkml.org/lkml/2011/9/17/43
> > > > 
> > > > Yes. Thanks.
> > > 
> > > I believe that doesn't really fix the issue. But the warning is not
> > > easy to trigger. You simply haven't hit it by chance after applying
> > > the patches.
> > > 
> > > This happens when the idle notifier callchain is called in idle
> > > and is interrupted in the middle. So we have called rcu_read_lock()
> > > but haven't yet released with rcu_read_unlock(), and in the end
> > > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > > which is illegal while in an rcu read side critical section.
> > > 
> > > No idea how to solve that. Any use of RCU after the tick gets stopped
> > > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > be called in an rcu read side critical section then it's not going
> > > to be easy to fix.
> > > 
> > > But I don't really understand that requirement. rcu_needs_cpu() simply
> > > checks if we don't have callbacks to handle. So I don't understand how
> > > read side is concerned. It's rather the write side.
> > > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > > stopped.
> > > 
> > > But I'm certainly missing something.
> > > 
> > > Paul?
> > 
> > This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > current CPU can accelerate the current grace period so as to enter
> > dyntick-idle mode sooner than it would otherwise.  This takes effect
> > in the situation where rcu_needs_cpu() sees that there are callbacks.
> > It then notes a quiescent state (which is illegal in an RCU read-side
> > critical section), calls force_quiescent_state(), and so on.  For this
> > to work, the current CPU must be in an RCU read-side critical section.
> 
> You mean it must *not* be in an RCU read-side critical section (ie: in a
> quiescent state)?

Yes, you are right, it must -not- be in an RCU read-side critical section.

> That assumption at least fails anytime in idle for the RCU
> sched flavour given that preemption is disabled in the idle loop.

Except that the idle loop is a quiescent state.

> > If this cannot be made to work, another option is to call a new RCU
> > function in the case where rcu_needs_cpu() returned false, but after
> > the RCU read-side critical section has exited.
> 
> You mean when rcu_needs_cpu() returns true (when we have callbacks
> enqueued)?

Yes.  I definitely am having problems with polarity this weekend.  :-/

> > This new RCU function
> > could then attempt to rearrange RCU so as to allow the CPU to enter
> > dyntick-idle mode more quickly.  It is more important for this to
> > happen when the CPU is going idle than when it is executing a user
> > process.
> > 
> > So, is this doable?
> 
> At least not when we have RCU sched callbacks enqueued, given preemption
> is disabled. But that sounds plausible in order to accelerate the switch
> to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.

Again, the idle loop is a quiescent state for RCU-sched.

> So if I understand correctly we would check if we are in an rcu read side
> critical section when we call rcu_needs_cpu(). If so then we keep
> the tick alive. Afterward when we exit the rcu read side critical section
> (rcu_read_unlock/local_bh_enable), we notice that specific state and
> we try to accelerate the rcu callbacks processing from there to switch
> to dynticks idle mode, right?
> 
> So that requires some specific counter in rcu_read_lock() for the
> !CONFIG_PREEMPT case so that we know if we are interrupting an
> rcu read side critical section from rcu_needs_cpu(). For the
> bh case we probably can just check in_softirq().
> 
> Also if we know we are interrupting a read side section, why not just
> keep the tick alive and retry the next tick? Interrupting such
> section looks rare enough that it wouldn't have much impact
> and that avoids specific hooks in rcu_read_unlock() and local_bh_enable().

Good point.  Perhaps only bother with this if returning to idle, then?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:10           ` Frederic Weisbecker
@ 2011-09-26  1:26             ` Paul E. McKenney
  2011-09-26  1:41               ` Paul E. McKenney
  2011-09-26  9:20               ` Frederic Weisbecker
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26  1:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> >> current CPU can accelerate the current grace period so as to enter
> >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> >> It then notes a quiescent state (which is illegal in an RCU read-side
> >> critical section), calls force_quiescent_state(), and so on.  For this
> >> to work, the current CPU must be in an RCU read-side critical section.
> >
> > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > quiescent state)?
> >
> > That assumption at least fails anytime in idle for the RCU
> > sched flavour given that preemption is disabled in the idle loop.
> >
> >> If this cannot be made to work, another option is to call a new RCU
> >> function in the case where rcu_needs_cpu() returned false, but after
> >> the RCU read-side critical section has exited.
> >
> > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > enqueued)?
> >
> >> This new RCU function
> >> could then attempt to rearrange RCU so as to allow the CPU to enter
> >> dyntick-idle mode more quickly.  It is more important for this to
> >> happen when the CPU is going idle than when it is executing a user
> >> process.
> >>
> >> So, is this doable?
> >
> > At least not when we have RCU sched callbacks enqueued, given preemption
> > is disabled. But that sounds plausible in order to accelerate the switch
> > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> 
> But the RCU sched case could be dealt with if we embrace every use of
> it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> version that just increases a local counter that rcu_needs_cpu() could check.
> 
> It's an easy thing to add: we can ensure preempt is disabled when we call it
> and we can force rcu_dereference_sched() to depend on it.

Or just check to see if this is the first level of interrupt from the
idle task after the scheduler is up.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:26             ` Paul E. McKenney
@ 2011-09-26  1:41               ` Paul E. McKenney
  2011-09-26  9:39                 ` Frederic Weisbecker
  2011-09-26  9:42                 ` Frederic Weisbecker
  2011-09-26  9:20               ` Frederic Weisbecker
  1 sibling, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26  1:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > >> current CPU can accelerate the current grace period so as to enter
> > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > >> critical section), calls force_quiescent_state(), and so on.  For this
> > >> to work, the current CPU must be in an RCU read-side critical section.
> > >
> > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > quiescent state)?
> > >
> > > That assumption at least fails anytime in idle for the RCU
> > > sched flavour given that preemption is disabled in the idle loop.
> > >
> > >> If this cannot be made to work, another option is to call a new RCU
> > >> function in the case where rcu_needs_cpu() returned false, but after
> > >> the RCU read-side critical section has exited.
> > >
> > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > enqueued)?
> > >
> > >> This new RCU function
> > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > >> dyntick-idle mode more quickly.  It is more important for this to
> > >> happen when the CPU is going idle than when it is executing a user
> > >> process.
> > >>
> > >> So, is this doable?
> > >
> > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > is disabled. But that sounds plausible in order to accelerate the switch
> > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > 
> > But the RCU sched case could be dealt with if we embrace every use of
> > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > version that just increases a local counter that rcu_needs_cpu() could check.
> > 
> > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > and we can force rcu_dereference_sched() to depend on it.
> 
> Or just check to see if this is the first level of interrupt from the
> idle task after the scheduler is up.

Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
RCU read-side critical section only when called from an interrupt that
interrupted an RCU read-side critical section (keeping in mind that the
idle loop is a quiescent state regardless of preemption)?

If so, I should be able to do the appropriate checks within
rcu_needs_cpu().

The reason I didn't think of this earlier was that I thought that
rcu_needs_cpu() could be invoked from the idle notifier, which is itself
in an RCU read-side critical section.

								Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:25           ` Paul E. McKenney
@ 2011-09-26  8:48             ` Frederic Weisbecker
  2011-09-26  8:49             ` Frederic Weisbecker
  1 sibling, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  8:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:25:01PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 03:04:21AM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > > > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > > > [   29.974288] ------------[ cut here ]------------
> > > > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > > > [   29.974521] Call Trace:
> > > > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > > > 
> > > > > > Do the following help?
> > > > > > 
> > > > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > > > 	https://lkml.org/lkml/2011/9/17/43
> > > > > 
> > > > > Yes. Thanks.
> > > > 
> > > > I believe that doesn't really fix the issue. But the warning is not
> > > > easy to trigger. You simply haven't hit it by chance after applying
> > > > the patches.
> > > > 
> > > > This happens when the idle notifier callchain is called in idle
> > > > and is interrupted in the middle. So we have called rcu_read_lock()
> > > > but haven't yet released with rcu_read_unlock(), and in the end
> > > > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > > > which is illegal while in an rcu read side critical section.
> > > > 
> > > > No idea how to solve that. Any use of RCU after the tick gets stopped
> > > > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > > be called in an rcu read side critical section then it's not going
> > > > to be easy to fix.
> > > > 
> > > > But I don't really understand that requirement. rcu_needs_cpu() simply
> > > > checks if we don't have callbacks to handle. So I don't understand how
> > > > read side is concerned. It's rather the write side.
> > > > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > > > stopped.
> > > > 
> > > > But I'm certainly missing something.
> > > > 
> > > > Paul?
> > > 
> > > This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > current CPU can accelerate the current grace period so as to enter
> > > dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > It then notes a quiescent state (which is illegal in an RCU read-side
> > > critical section), calls force_quiescent_state(), and so on.  For this
> > > to work, the current CPU must be in an RCU read-side critical section.
> > 
> > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > quiescent state)?
> 
> Yes, you are right, it must -not- be in an RCU read-side critical section.
> 
> > That assumption at least fails anytime in idle for the RCU
> > sched flavour given that preemption is disabled in the idle loop.
> 
> Except that the idle loop is a quiescent state.

Oh right. That seems to exclude any tracing in idle.

> > So if I understand correctly we would check if we are in an rcu read side
> > critical section when we call rcu_needs_cpu(). If so then we keep
> > the tick alive. Afterward when we exit the rcu read side critical section
> > (rcu_read_unlock/local_bh_enable), we notice that specific state and
> > we try to accelerate the rcu callbacks processing from there to switch
> > to dynticks idle mode, right?
> > 
> > So that requires some specific counter in rcu_read_lock() for the
> > !CONFIG_PREEMPT case so that we know if we are interrupting an
> > rcu read side critical section from rcu_needs_cpu(). For the
> > bh case we probably can just check in_softirq().
> > 
> > Also if we know we are interrupting a read side section, why not just
> > keep the tick alive and retry the next tick? Interrupting such
> > section looks rare enough that it wouldn't have much impact
> > and that avoids specific hooks in rcu_read_unlock() and local_bh_enable().
> 
> Good point.  Perhaps only bother with this if returning to idle, then?

Looks good.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:25           ` Paul E. McKenney
  2011-09-26  8:48             ` Frederic Weisbecker
@ 2011-09-26  8:49             ` Frederic Weisbecker
  2011-09-26 22:30               ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  8:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:25:01PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 03:04:21AM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > > > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > > > [   29.974288] ------------[ cut here ]------------
> > > > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > > > [   29.974521] Call Trace:
> > > > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > > > 
> > > > > > Do the following help?
> > > > > > 
> > > > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > > > 	https://lkml.org/lkml/2011/9/17/43
> > > > > 
> > > > > Yes. Thanks.
> > > > 
> > > > I believe that doesn't really fix the issue. But the warning is not
> > > > easy to trigger. You simply haven't hit it by chance after applying
> > > > the patches.
> > > > 
> > > > This happens when the idle notifier callchain is called in idle
> > > > and is interrupted in the middle. So we have called rcu_read_lock()
> > > > but haven't yet released with rcu_read_unlock(), and in the end
> > > > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > > > which is illegal while in an rcu read side critical section.
> > > > 
> > > > No idea how to solve that. Any use of RCU after the tick gets stopped
> > > > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > > be called in an rcu read side critical section then it's not going
> > > > to be easy to fix.
> > > > 
> > > > But I don't really understand that requirement. rcu_needs_cpu() simply
> > > > checks if we don't have callbacks to handle. So I don't understand how
> > > > read side is concerned. It's rather the write side.
> > > > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > > > stopped.
> > > > 
> > > > But I'm certainly missing something.
> > > > 
> > > > Paul?
> > > 
> > > This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > current CPU can accelerate the current grace period so as to enter
> > > dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > It then notes a quiescent state (which is illegal in an RCU read-side
> > > critical section), calls force_quiescent_state(), and so on.  For this
> > > to work, the current CPU must be in an RCU read-side critical section.
> > 
> > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > quiescent state)?
> 
> Yes, you are right, it must -not- be in an RCU read-side critical section.
> 
> > That assumption at least fails anytime in idle for the RCU
> > sched flavour given that preemption is disabled in the idle loop.
> 
> Except that the idle loop is a quiescent state.
> 
> > > If this cannot be made to work, another option is to call a new RCU
> > > function in the case where rcu_needs_cpu() returned false, but after
> > > the RCU read-side critical section has exited.
> > 
> > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > enqueued)?
> 
> Yes.  I definitely am having problems with polarity this weekend.  :-/
> 
> > > This new RCU function
> > > could then attempt to rearrange RCU so as to allow the CPU to enter
> > > dyntick-idle mode more quickly.  It is more important for this to
> > > happen when the CPU is going idle than when it is executing a user
> > > process.
> > > 
> > > So, is this doable?
> > 
> > At least not when we have RCU sched callbacks enqueued, given preemption
> > is disabled. But that sounds plausible in order to accelerate the switch
> > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> 
> Again, the idle loop is a quiescent state for RCU-sched.

We need to add an idle_cpu() check in rcu_read_lock_sched_held()
and rcu_read_lock_bh_held().

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:26             ` Paul E. McKenney
  2011-09-26  1:41               ` Paul E. McKenney
@ 2011-09-26  9:20               ` Frederic Weisbecker
  2011-09-26 22:50                 ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  9:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > >> current CPU can accelerate the current grace period so as to enter
> > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > >> critical section), calls force_quiescent_state(), and so on.  For this
> > >> to work, the current CPU must be in an RCU read-side critical section.
> > >
> > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > quiescent state)?
> > >
> > > That assumption at least fails anytime in idle for the RCU
> > > sched flavour given that preemption is disabled in the idle loop.
> > >
> > >> If this cannot be made to work, another option is to call a new RCU
> > >> function in the case where rcu_needs_cpu() returned false, but after
> > >> the RCU read-side critical section has exited.
> > >
> > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > enqueued)?
> > >
> > >> This new RCU function
> > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > >> dyntick-idle mode more quickly.  It is more important for this to
> > >> happen when the CPU is going idle than when it is executing a user
> > >> process.
> > >>
> > >> So, is this doable?
> > >
> > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > is disabled. But that sounds plausible in order to accelerate the switch
> > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > 
> > But the RCU sched case could be dealt with if we embrace every use of
> > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > version that just increases a local counter that rcu_needs_cpu() could check.
> > 
> > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > and we can force rcu_dereference_sched() to depend on it.
> 
> Or just check to see if this is the first level of interrupt from the
> idle task after the scheduler is up.

I believe it's always the case. tick_nohz_stop_sched_tick() is only called
from the first level of interrupt in irq_exit().

There is always some race window, since the detection is based on the preempt
offset: between sub_preempt_count() and the beginning of softirq processing, and
between the end of softirq processing and the end of the interrupt. But an
"idle_cpu() || in_interrupt()" check in rcu_read_lock_sched_held()
should catch those offenders.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:41               ` Paul E. McKenney
@ 2011-09-26  9:39                 ` Frederic Weisbecker
  2011-09-26 22:34                   ` Paul E. McKenney
  2011-09-26  9:42                 ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  9:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:41:18PM -0700, Paul E. McKenney wrote:
> On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > >> current CPU can accelerate the current grace period so as to enter
> > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > >
> > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > quiescent state)?
> > > >
> > > > That assumption at least fails anytime in idle for the RCU
> > > > sched flavour given that preemption is disabled in the idle loop.
> > > >
> > > >> If this cannot be made to work, another option is to call a new RCU
> > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > >> the RCU read-side critical section has exited.
> > > >
> > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > enqueued)?
> > > >
> > > >> This new RCU function
> > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > >> happen when the CPU is going idle than when it is executing a user
> > > >> process.
> > > >>
> > > >> So, is this doable?
> > > >
> > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > 
> > > But the RCU sched case could be dealt with if we embrace every use of
> > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > 
> > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > and we can force rcu_dereference_sched() to depend on it.
> > 
> > Or just check to see if this is the first level of interrupt from the
> > idle task after the scheduler is up.
> 
> Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
> RCU read-side critical section only when called from an interrupt that
> interrupted an RCU read-side critical section (keeping in mind that the
> idle loop is a quiescent state regardless of preemption)?

Yeah. rcu_needs_cpu() can be called from an irq that either interrupted
an rcu read side critical section or a bh one. But not a sched one if
we forbid rcu sched uses in the preempt offset race windows I described
in a previous mail.

> 
> If so, I should be able to do the appropriate checks within
> rcu_needs_cpu().

Right. But to know if you interrupted an rcu read side, don't you
need a specific counter when !CONFIG_PREEMPT?

> The reason I didn't think of this earlier was that I thought that
> rcu_needs_cpu() could be invoked from the idle notifier, which is itself
> in an RCU read-side critical section.
> 
> 								Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  1:41               ` Paul E. McKenney
  2011-09-26  9:39                 ` Frederic Weisbecker
@ 2011-09-26  9:42                 ` Frederic Weisbecker
  2011-09-26 22:35                   ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-26  9:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Sep 25, 2011 at 06:41:18PM -0700, Paul E. McKenney wrote:
> On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > >> current CPU can accelerate the current grace period so as to enter
> > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > >
> > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > quiescent state)?
> > > >
> > > > That assumption at least fails anytime in idle for the RCU
> > > > sched flavour given that preemption is disabled in the idle loop.
> > > >
> > > >> If this cannot be made to work, another option is to call a new RCU
> > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > >> the RCU read-side critical section has exited.
> > > >
> > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > enqueued)?
> > > >
> > > >> This new RCU function
> > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > >> happen when the CPU is going idle than when it is executing a user
> > > >> process.
> > > >>
> > > >> So, is this doable?
> > > >
> > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > 
> > > But the RCU sched case could be dealt with if we embrace every use of
> > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > 
> > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > and we can force rcu_dereference_sched() to depend on it.
> > 
> > Or just check to see if this is the first level of interrupt from the
> > idle task after the scheduler is up.
> 
> Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
> RCU read-side critical section only when called from an interrupt that
> interrupted an RCU read-side critical section (keeping in mind that the
> idle loop is a quiescent state regardless of preemption)?
> 
> If so, I should be able to do the appropriate checks within
> rcu_needs_cpu().

It sounds better to me if you can do all the checks from rcu_needs_cpu()
so that all you need is to wait for another jiffy to escape the read side
critical section.

Doing something from the read side exit path would require some weird
trickiness.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  8:49             ` Frederic Weisbecker
@ 2011-09-26 22:30               ` Paul E. McKenney
  2011-09-27 11:55                 ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26 22:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 10:49:53AM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 06:25:01PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2011 at 03:04:21AM +0200, Frederic Weisbecker wrote:
> > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > > > > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > > > > [   29.974288] ------------[ cut here ]------------
> > > > > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > > > > [   29.974521] Call Trace:
> > > > > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > > > > 
> > > > > > > Do the following help?
> > > > > > > 
> > > > > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > > > > 	https://lkml.org/lkml/2011/9/17/43
> > > > > > 
> > > > > > Yes. Thanks.
> > > > > 
> > > > > I believe that doesn't really fix the issue. But the warning is not
> > > > > easy to trigger. You simply haven't hit it by chance after applying
> > > > > the patches.
> > > > > 
> > > > > This happens when the idle notifier callchain is called in idle
> > > > > and is interrupted in the middle. So we have called rcu_read_lock()
> > > > > but haven't yet released with rcu_read_unlock(), and in the end
> > > > > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > > > > which is illegal while in an rcu read side critical section.
> > > > > 
> > > > > No idea how to solve that. Any use of RCU after the tick gets stopped
> > > > > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > > > be called in an rcu read side critical section then it's not going
> > > > > to be easy to fix.
> > > > > 
> > > > > But I don't really understand that requirement. rcu_needs_cpu() simply
> > > > > checks if we don't have callbacks to handle. So I don't understand how
> > > > > read side is concerned. It's rather the write side.
> > > > > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > > > > stopped.
> > > > > 
> > > > > But I'm certainly missing something.
> > > > > 
> > > > > Paul?
> > > > 
> > > > This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > current CPU can accelerate the current grace period so as to enter
> > > > dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > It then notes a quiescent state (which is illegal in an RCU read-side
> > > > critical section), calls force_quiescent_state(), and so on.  For this
> > > > to work, the current CPU must be in an RCU read-side critical section.
> > > 
> > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > quiescent state)?
> > 
> > Yes, you are right, it must -not- be in an RCU read-side critical section.
> > 
> > > That assumption at least fails anytime in idle for the RCU
> > > sched flavour given that preemption is disabled in the idle loop.
> > 
> > Except that the idle loop is a quiescent state.
> > 
> > > > If this cannot be made to work, another option is to call a new RCU
> > > > function in the case where rcu_needs_cpu() returned false, but after
> > > > the RCU read-side critical section has exited.
> > > 
> > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > enqueued)?
> > 
> > Yes.  I definitely am having problems with polarity this weekend.  :-/
> > 
> > > > This new RCU function
> > > > could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > dyntick-idle mode more quickly.  It is more important for this to
> > > > happen when the CPU is going idle than when it is executing a user
> > > > process.
> > > > 
> > > > So, is this doable?
> > > 
> > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > is disabled. But that sounds plausible in order to accelerate the switch
> > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > 
> > Again, the idle loop is a quiescent state for RCU-sched.
> 
> We need to add an idle_cpu() check in rcu_read_lock_sched_held()
> and rcu_read_lock_bh_held().

OK, but just to make sure...  For this to work, the idle notifiers would
need to be invoked from outside of the idle task.  If the idle notifiers
are instead invoked from idle-task context, then RCU needs to get more
precise about what it thinks of as "idle", probably via a per-CPU variable
that gets set and cleared within each idle loop.

/me reminisces about the time when the definition of "idle" seemed so
simple...  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  9:39                 ` Frederic Weisbecker
@ 2011-09-26 22:34                   ` Paul E. McKenney
  2011-09-27 12:07                     ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26 22:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 11:39:41AM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 06:41:18PM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > >> current CPU can accelerate the current grace period so as to enter
> > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > >
> > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > quiescent state)?
> > > > >
> > > > > That assumption at least fails anytime in idle for the RCU
> > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > >
> > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > >> the RCU read-side critical section has exited.
> > > > >
> > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > enqueued)?
> > > > >
> > > > >> This new RCU function
> > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > >> happen when the CPU is going idle than when it is executing a user
> > > > >> process.
> > > > >>
> > > > >> So, is this doable?
> > > > >
> > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > 
> > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > 
> > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > and we can force rcu_dereference_sched() to depend on it.
> > > 
> > > Or just check to see if this is the first level of interrupt from the
> > > idle task after the scheduler is up.
> > 
> > Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
> > RCU read-side critical section only when called from an interrupt that
> > interrupted an RCU read-side critical section (keeping in mind that the
> > idle loop is a quiescent state regardless of preemption)?
> 
> Yeah. rcu_needs_cpu() can be called from an irq that either interrupted
> an rcu read side critical section or a bh one. But not a sched one if
> we forbid rcu sched uses in the preempt offset race windows I described
> in a previous mail.

But can't I just assume that if rcu_needs_cpu() is invoked within
a second-level interrupt handler, it might be in any type of
RCU read-side critical section?  I could determine this by checking
RCU's dyntick-idle nesting state.

Such checks are not necessary if CONFIG_NO_HZ=n because in that
case rcu_needs_cpu() is just checking the callback queues, with
no assumptions about quiescent states.

> > If so, I should be able to do the appropriate checks within
> > rcu_needs_cpu().
> 
> Right. But to know if you interrupted an rcu read side, don't you
> need a specific counter when !CONFIG_PREEMPT?

Not if it is OK to assume that rcu_needs_cpu() can be called from within
an RCU read-side critical section only if it is invoked from within a
second-level interrupt handler or if it interrupted some non-dyntick-idle
process-level code.

So, is this assumption valid?

							Thanx, Paul

> > The reason I didn't think of this earlier was that I thought that
> > rcu_needs_cpu() could be invoked from the idle notifier, which is itself
> > in an RCU read-side critical section.
> > 
> > 								Thanx, Paul
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  9:42                 ` Frederic Weisbecker
@ 2011-09-26 22:35                   ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26 22:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 11:42:06AM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 06:41:18PM -0700, Paul E. McKenney wrote:
> > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > >> current CPU can accelerate the current grace period so as to enter
> > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > >
> > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > quiescent state)?
> > > > >
> > > > > That assumption at least fails anytime in idle for the RCU
> > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > >
> > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > >> the RCU read-side critical section has exited.
> > > > >
> > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > enqueued)?
> > > > >
> > > > >> This new RCU function
> > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > >> happen when the CPU is going idle than when it is executing a user
> > > > >> process.
> > > > >>
> > > > >> So, is this doable?
> > > > >
> > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > 
> > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > 
> > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > and we can force rcu_dereference_sched() to depend on it.
> > > 
> > > Or just check to see if this is the first level of interrupt from the
> > > idle task after the scheduler is up.
> > 
> > Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
> > RCU read-side critical section only when called from an interrupt that
> > interrupted an RCU read-side critical section (keeping in mind that the
> > idle loop is a quiescent state regardless of preemption)?
> > 
> > If so, I should be able to do the appropriate checks within
> > rcu_needs_cpu().
> 
> It sounds better to me if you can do all the checks from rcu_needs_cpu()
> so that all you need is to wait for another jiffy to escape the read side
> critical section.
> 
> Doing something from the read side exit path would require some weird
> trickiness.

Agreed, I really do want to avoid checks in rcu_read_unlock().

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26  9:20               ` Frederic Weisbecker
@ 2011-09-26 22:50                 ` Paul E. McKenney
  2011-09-27 12:16                   ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-26 22:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 11:20:55AM +0200, Frederic Weisbecker wrote:
> On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > >> current CPU can accelerate the current grace period so as to enter
> > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > >
> > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > quiescent state)?
> > > >
> > > > That assumption at least fails anytime in idle for the RCU
> > > > sched flavour given that preemption is disabled in the idle loop.
> > > >
> > > >> If this cannot be made to work, another option is to call a new RCU
> > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > >> the RCU read-side critical section has exited.
> > > >
> > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > enqueued)?
> > > >
> > > >> This new RCU function
> > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > >> happen when the CPU is going idle than when it is executing a user
> > > >> process.
> > > >>
> > > >> So, is this doable?
> > > >
> > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > 
> > > But the RCU sched case could be dealt with if we embrace every use of
> > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > 
> > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > and we can force rcu_dereference_sched() to depend on it.
> > 
> > Or just check to see if this is the first level of interrupt from the
> > idle task after the scheduler is up.
> 
> I believe it's always the case. tick_nohz_stop_sched_tick() is only called
> from the first level of interrupt in irq_exit().

OK, good, let me see if I really understand this...

Case 1: The interrupt interrupted non-dyntick-idle code.  In this case,
	rcu_needs_cpu() can look at the dyntick-idle state and determine
	that it might not be in a quiescent state.

Case 2: The interrupt interrupted dyntick-idle code.  In this case,
	the interrupted code had better not be in an RCU read-side
	critical section, and rcu_needs_cpu() should be able to
	detect this as well.

Case 3: The interrupt interrupted the process of transitioning to
	or from dyntick-idle mode.  This should be prohibited by
	the local_irq_save() calls, right?

> There is always some race window, since the detection is based on the preempt
> offset: between sub_preempt_count() and the beginning of softirq processing, and
> between the end of softirq processing and the end of the interrupt. But an
> "idle_cpu() || in_interrupt()" check in rcu_read_lock_sched_held()
> should catch those offenders.

But all of this stuff looks to me to be called from the context
of the idle task, so that idle_cpu() will always return "true"...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26 22:30               ` Paul E. McKenney
@ 2011-09-27 11:55                 ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-27 11:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 03:30:10PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 10:49:53AM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 06:25:01PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2011 at 03:04:21AM +0200, Frederic Weisbecker wrote:
> > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Sep 25, 2011 at 03:06:25PM +0200, Frederic Weisbecker wrote:
> > > > > > On Sun, Sep 25, 2011 at 02:26:37PM +0300, Kirill A. Shutemov wrote:
> > > > > > > On Sat, Sep 24, 2011 at 10:08:26PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sun, Sep 25, 2011 at 03:24:09AM +0300, Kirill A. Shutemov wrote:
> > > > > > > > > [   29.974288] ------------[ cut here ]------------
> > > > > > > > > [   29.974308] WARNING: at /home/kas/git/public/linux-next/kernel/rcutree.c:1833 rcu_needs_cpu+0xff
> > > > > > > > > [   29.974316] Hardware name: HP EliteBook 8440p
> > > > > > > > > [   29.974321] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iple_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm bnep acpi_cpufreq mperfckd fscache auth_rpcgss nfs_acl sunrpc ext2 loop kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_idtideodev media v4l2_compat_ioctl32 snd_seq bluetooth drm_kms_helper snd_timer tpm_infineon snd_seq_drt tpm_tis hp_accel intel_ips soundcore lis3lv02d tpm rfkill i2c_algo_bit snd_page_alloc i2c_core c16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sg sr_mod sd_mod cd thermal_sys [last unloaded: scsi_wait_scan]
> > > > > > > > > [   29.974517] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc7-next-20110923 #2
> > > > > > > > > [   29.974521] Call Trace:
> > > > > > > > > [   29.974525]  <IRQ>  [<ffffffff8104d72a>] warn_slowpath_common+0x7a/0xb0
> > > > > > > > > [   29.974540]  [<ffffffff8104d775>] warn_slowpath_null+0x15/0x20
> > > > > > > > > [   29.974546]  [<ffffffff810bffdf>] rcu_needs_cpu+0xff/0x110
> > > > > > > > > [   29.974555]  [<ffffffff8108396f>] tick_nohz_stop_sched_tick+0x13f/0x3d0
> > > > > > > > > [   29.974563]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > > [   29.974571]  [<ffffffff81055622>] irq_exit+0xa2/0xd0
> > > > > > > > > [   29.974578]  [<ffffffff8101ee75>] smp_apic_timer_interrupt+0x85/0x1c0
> > > > > > > > > [   29.974585]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > > [   29.974592]  [<ffffffff81436e1e>] apic_timer_interrupt+0x6e/0x80
> > > > > > > > > [   29.974596]  <EOI>  [<ffffffff81297abd>] ? acpi_hw_read+0x4a/0x51
> > > > > > > > > [   29.974609]  [<ffffffff81087a07>] ? lock_acquire+0xa7/0x160
> > > > > > > > > [   29.974615]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > > [   29.974622]  [<ffffffff81432a16>] __atomic_notifier_call_chain+0x56/0xb0
> > > > > > > > > [   29.974631]  [<ffffffff814329c0>] ? notifier_call_chain+0x70/0x70
> > > > > > > > > [   29.974642]  [<ffffffff8130ebb6>] ? cpuidle_idle_call+0x106/0x350
> > > > > > > > > [   29.974651]  [<ffffffff81432a81>] atomic_notifier_call_chain+0x11/0x20
> > > > > > > > > [   29.974661]  [<ffffffff81001233>] cpu_idle+0xe3/0x120
> > > > > > > > > [   29.974672]  [<ffffffff8141e34b>] start_secondary+0x1fd/0x204
> > > > > > > > > [   29.974681] ---[ end trace 6c1d44095a3bb7c5 ]---
> > > > > > > > 
> > > > > > > > Do the following help?
> > > > > > > > 
> > > > > > > > 	https://lkml.org/lkml/2011/9/17/47
> > > > > > > > 	https://lkml.org/lkml/2011/9/17/45
> > > > > > > > 	https://lkml.org/lkml/2011/9/17/43
> > > > > > > 
> > > > > > > Yes. Thanks.
> > > > > > 
> > > > > > I believe that doesn't really fix the issue. But the warning is not
> > > > > > easy to trigger. You simply haven't hit it by chance after applying
> > > > > > the patches.
> > > > > > 
> > > > > > This happens when the idle notifier callchain is called in idle
> > > > > > and is interrupted in the middle. So we have called rcu_read_lock()
> > > > > > but haven't yet released it with rcu_read_unlock(), and at the end
> > > > > > of the interrupt we call tick_nohz_stop_sched_tick() -> rcu_needs_cpu()
> > > > > > which is illegal while in an rcu read side critical section.
> > > > > > 
> > > > > > No idea how to solve that. Any use of RCU after the tick gets stopped
> > > > > > is concerned here. If it is really required that rcu_needs_cpu() can't
> > > > > > be called in an rcu read side critical section, then it's not going
> > > > > > to be easy to fix.
> > > > > > 
> > > > > > But I don't really understand that requirement. rcu_needs_cpu() simply
> > > > > > checks if we don't have callbacks to handle. So I don't understand how
> > > > > > read side is concerned. It's rather the write side.
> > > > > > The rule I can imagine instead is: don't call __call_rcu() once the tick is
> > > > > > stopped.
> > > > > > 
> > > > > > But I'm certainly missing something.
> > > > > > 
> > > > > > Paul?
> > > > > 
> > > > > This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > > current CPU can accelerate the current grace period so as to enter
> > > > > dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > > in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > > It then notes a quiescent state (which is illegal in an RCU read-side
> > > > > critical section), calls force_quiescent_state(), and so on.  For this
> > > > > to work, the current CPU must be in an RCU read-side critical section.
> > > > 
> > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > quiescent state)?
> > > 
> > > Yes, you are right, it must -not- be in an RCU read-side critical section.
> > > 
> > > > That assumption at least fails anytime in idle for the RCU
> > > > sched flavour given that preemption is disabled in the idle loop.
> > > 
> > > Except that the idle loop is a quiescent state.
> > > 
> > > > > If this cannot be made to work, another option is to call a new RCU
> > > > > function in the case where rcu_needs_cpu() returned false, but after
> > > > > the RCU read-side critical section has exited.
> > > > 
> > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > enqueued)?
> > > 
> > > Yes.  I definitely am having problems with polarity this weekend.  :-/
> > > 
> > > > > This new RCU function
> > > > > could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > > dyntick-idle mode more quickly.  It is more important for this to
> > > > > happen when the CPU is going idle than when it is executing a user
> > > > > process.
> > > > > 
> > > > > So, is this doable?
> > > > 
> > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > 
> > > Again, the idle loop is a quiescent state for RCU-sched.
> > 
> > We need to add an idle_cpu() check in rcu_read_lock_sched_held()
> > and rcu_read_lock_bh_held().
> 
> OK, but just to make sure...  For this to work, the idle notifiers would
> need to be invoked from outside of the idle task.  If the idle notifiers
> are instead invoked from idle-task context, then RCU needs to get more
> precise about what it thinks of as "idle", probably via a per-CPU variable
> that gets set and cleared within each idle loop.

But the idle notifiers run under plain RCU, not RCU sched. Is the fact that
we are in idle a quiescent state for that flavor as well? Hmm, maybe in the
!PREEMPT case? :-s

> 
> /me reminisces about the time when the definition of "idle" seemed so
> simple...  ;-)

;-)


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26 22:34                   ` Paul E. McKenney
@ 2011-09-27 12:07                     ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-27 12:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 03:34:26PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 11:39:41AM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 06:41:18PM -0700, Paul E. McKenney wrote:
> > > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > > >> current CPU can accelerate the current grace period so as to enter
> > > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > > >
> > > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > > quiescent state)?
> > > > > >
> > > > > > That assumption at least fails anytime in idle for the RCU
> > > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > > >
> > > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > > >> the RCU read-side critical section has exited.
> > > > > >
> > > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > > enqueued)?
> > > > > >
> > > > > >> This new RCU function
> > > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > > >> happen when the CPU is going idle than when it is executing a user
> > > > > >> process.
> > > > > >>
> > > > > >> So, is this doable?
> > > > > >
> > > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > > 
> > > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > > 
> > > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > > and we can force rcu_dereference_sched() to depend on it.
> > > > 
> > > > Or just check to see if this is the first level of interrupt from the
> > > > idle task after the scheduler is up.
> > > 
> > > Hmmm...  Is it the case that rcu_needs_cpu() gets called from within an
> > > RCU read-side critical section only when called from an interrupt that
> > > interrupted an RCU read-side critical section (keeping in mind that the
> > > idle loop is a quiescent state regardless of preemption)?
> > 
> > Yeah. rcu_needs_cpu() can be called from an irq that either interrupted
> > an rcu read side critical section or a bh one. But not a sched one if
> > we forbid rcu sched uses in the preempt offset race windows I described
> > in a previous mail.
> 
> But can't I just assume that if rcu_needs_cpu is invoked within
> a second-level interrupt handler that it might be in any type of
> RCU read-side critical section?  I could determine this by checking
> RCU's dyntick-idle nesting state.

No, rcu_needs_cpu() can only be called from the first level of interrupt.

> 
> Such checks are not necessary if CONFIG_NO_HZ=n because in that
> case rcu_needs_cpu() is just checking the callback queues, with
> no assumptions about quiescent states.

I believe it's not even called when CONFIG_NO_HZ=n.

> 
> > > If so, I should be able to do the appropriate checks within
> > > rcu_needs_cpu().
> > 
> > Right. But to know if you interrupted an rcu read side, don't you
> > need a specific counter when !CONFIG_PREEMPT?
> 
> Not if it is OK to assume that rcu_needs_cpu() can only be called from
> within an RCU read-side interrupt handler if it is invoked from within a
> second-level interrupt handler or if it interrupted some non-dyntick-idle
> process-level code.
> 
> So, is this assumption valid?

Not sure I understand what you mean. But currently it can only be called from:

- idle
- first interrupt level, interrupting idle, but at a time when in_interrupt() returns 0

With idle being in an extended quiescent state or not.


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-26 22:50                 ` Paul E. McKenney
@ 2011-09-27 12:16                   ` Frederic Weisbecker
  2011-09-27 18:01                     ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-27 12:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Sep 26, 2011 at 03:50:32PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2011 at 11:20:55AM +0200, Frederic Weisbecker wrote:
> > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > >> current CPU can accelerate the current grace period so as to enter
> > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > >
> > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > quiescent state)?
> > > > >
> > > > > That assumption at least fails anytime in idle for the RCU
> > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > >
> > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > >> the RCU read-side critical section has exited.
> > > > >
> > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > enqueued)?
> > > > >
> > > > >> This new RCU function
> > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > >> happen when the CPU is going idle than when it is executing a user
> > > > >> process.
> > > > >>
> > > > >> So, is this doable?
> > > > >
> > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > 
> > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > 
> > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > and we can force rcu_dereference_sched() to depend on it.
> > > 
> > > Or just check to see if this is the first level of interrupt from the
> > > idle task after the scheduler is up.
> > 
> > I believe it's always the case. tick_nohz_stop_sched_tick() is only called
> > from the first level of interrupt in irq_exit().
> 
> OK, good, let me see if I really understand this...
> 
> Case 1: The interrupt interrupted non-dyntick-idle code.  In this case,
> 	rcu_needs_cpu() can look at the dyntick-idle state and determine
> 	that it might not be in a quiescent state.

I guess by dyntick-idle code you mean the fact that RCU is in an
extended quiescent state? (Not just that the tick is stopped.)

If so yeah that looks good.

> 
> Case 2: The interrupt interrupted dyntick-idle code.  In this case,
> 	the interrupted code had better not be in an RCU read-side
> 	critical section, and rcu_needs_cpu() should be able to
> 	detect this as well.

Yeah.

We already do the appropriate debug checks from the RCU read side
APIs so I guess rcu_needs_cpu() doesn't even need to do its own
debugging checks here about extended qs.

But indeed it can return right away if we are in extended qs.

> 
> Case 3: The interrupt interrupted the process of transitioning to
> 	or from dyntick-idle mode.  This should be prohibited by
> 	the local_irq_save() calls, right?

Indeed.

> 
> > There is always some race window, as it's based on preempt offset: between
> > the sub_preempt_count and the softirqs begin and between softirqs end and the end
> > of the interrupt. But an "idle_cpu() || in_interrupt()" check in rcu_read_lock_sched_held()
> > should catch those offenders.
> 
> But all of this stuff looks to me to be called from the context
> of the idle task, so that idle_cpu() will always return "true"...

I meant "idle_cpu() && !in_interrupt()" that should return false in
rcu_read_lock_sched_held().


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-27 12:16                   ` Frederic Weisbecker
@ 2011-09-27 18:01                     ` Paul E. McKenney
  2011-09-28 12:31                       ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-27 18:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 26, 2011 at 03:50:32PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2011 at 11:20:55AM +0200, Frederic Weisbecker wrote:
> > > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > > >> current CPU can accelerate the current grace period so as to enter
> > > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > > >
> > > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > > quiescent state)?
> > > > > >
> > > > > > That assumption at least fails anytime in idle for the RCU
> > > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > > >
> > > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > > >> the RCU read-side critical section has exited.
> > > > > >
> > > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > > enqueued)?
> > > > > >
> > > > > >> This new RCU function
> > > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > > >> happen when the CPU is going idle than when it is executing a user
> > > > > >> process.
> > > > > >>
> > > > > >> So, is this doable?
> > > > > >
> > > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > > 
> > > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > > 
> > > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > > and we can force rcu_dereference_sched() to depend on it.
> > > > 
> > > > Or just check to see if this is the first level of interrupt from the
> > > > idle task after the scheduler is up.
> > > 
> > > I believe it's always the case. tick_nohz_stop_sched_tick() is only called
> > > from the first level of interrupt in irq_exit().
> > 
> > OK, good, let me see if I really understand this...
> > 
> > Case 1: The interrupt interrupted non-dyntick-idle code.  In this case,
> > 	rcu_needs_cpu() can look at the dyntick-idle state and determine
> > 	that it might not be in a quiescent state.
> 
> I guess by dyntick-idle code you mean the fact that RCU is in an
> extended quiescent state? (Not just that the tick is stopped.)
> 
> If so yeah that looks good.
> 
> > 
> > Case 2: The interrupt interrupted dyntick-idle code.  In this case,
> > 	the interrupted code had better not be in an RCU read-side
> > 	critical section, and rcu_needs_cpu() should be able to
> > 	detect this as well.
> 
> Yeah.
> 
> We already do the appropriate debug checks from the RCU read side
> APIs so I guess rcu_needs_cpu() doesn't even need to do its own
> debugging checks here about extended qs.
> 
> But indeed it can return right away if we are in extended qs.
> 
> > 
> > Case 3: The interrupt interrupted the process of transitioning to
> > 	or from dyntick-idle mode.  This should be prohibited by
> > 	the local_irq_save() calls, right?
> 
> Indeed.
> 
> > 
> > > There is always some race window, as it's based on preempt offset: between
> > > the sub_preempt_count and the softirqs begin and between softirqs end and the end
> > > of the interrupt. But an "idle_cpu() || in_interrupt()" check in rcu_read_lock_sched_held()
> > > should catch those offenders.
> > 
> > But all of this stuff looks to me to be called from the context
> > of the idle task, so that idle_cpu() will always return "true"...
> 
> I meant "idle_cpu() && !in_interrupt()" that should return false in
> rcu_read_lock_sched_held().

The problem is that the idle tasks now seem to make quite a bit of use
of RCU on entry to and exit from the idle loop itself, for example,
via tracing.  So it seems like it is time to have the idle loop
explicitly tell RCU when the idle extended quiescent state is in effect.

An experimental patch along these lines is included below.  Does this
approach seem reasonable, or am I missing something subtle (or even
not so subtle) here?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Explicitly track idle CPUs.

In the good old days, RCU simply checked to see if it was running in
the context of an idle task to determine whether or not it was in the
idle extended quiescent state.  However, the entry to and exit from
idle has become more ornate over the years, and some of this processing
now uses RCU while running in the context of the idle task.  It is
therefore no longer reasonable to assume that anything running in the
context of one of the idle tasks is in an extended quiescent state.

This commit therefore explicitly tracks whether each CPU is in the
idle loop, allowing the idle task to use RCU anywhere except in those
portions of the idle loops where RCU has been explicitly informed that
it is in a quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9d40e42..5b7e62c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -177,6 +177,9 @@ extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
 struct notifier_block;
+extern void rcu_idle_enter(void);
+extern void rcu_idle_exit(void);
+extern int rcu_is_cpu_idle(void);
 
 #ifdef CONFIG_NO_HZ
 
@@ -187,10 +190,12 @@ extern void rcu_exit_nohz(void);
 
 static inline void rcu_enter_nohz(void)
 {
+	rcu_idle_enter();
 }
 
 static inline void rcu_exit_nohz(void)
 {
+	rcu_idle_exit();
 }
 
 #endif /* #else #ifdef CONFIG_NO_HZ */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 375e7d8..cd9e2d1 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
-static inline void tick_nohz_idle_exit(void) { }
+static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
+{
+	if (rcu_ext_qs)
+		rcu_idle_enter();
+}
+static inline void tick_nohz_idle_exit(void)
+{
+	if (rcu_ext_qs())
+		rcu_idle_exit();
+}
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/rcu.h b/kernel/rcu.h
index f600868..220b4fe 100644
--- a/kernel/rcu.h
+++ b/kernel/rcu.h
@@ -23,6 +23,8 @@
 #ifndef __LINUX_RCU_H
 #define __LINUX_RCU_H
 
+/* Avoid tracing overhead if not configured, mostly for RCU_TINY's benefit. */
+
 #ifdef CONFIG_RCU_TRACE
 #define RCU_TRACE(stmt) stmt
 #else /* #ifdef CONFIG_RCU_TRACE */
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index e4d8a98..daf3f92 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -114,6 +114,45 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
 
 #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
+/*
+ * In the good old days, RCU just checked to see if the idle task was
+ * running to check for being in the idle loop, which is an extended
+ * quiescent state.  However, RCU is now used by a number of architectures
+ * on the way to the idle loop, but still executing in the context of
+ * the idle tasks.
+ * Therefore, we now provide a per-CPU variable for tracking whether
+ * or not a given CPU is idle from an RCU perspective.
+ */
+DEFINE_PER_CPU(char, rcu_cpu_is_idle);
+
+/*
+ * Inform RCU that the current CPU is entering the idle extended quiescent
+ * state.  Preemption must be disabled.
+ */
+void rcu_idle_enter(void)
+{
+	__this_cpu_write(rcu_cpu_is_idle, 1);
+}
+
+/*
+ * Inform RCU that the current CPU is leaving the idle extended quiescent
+ * state.  Preemption must again be disabled.
+ */
+void rcu_idle_exit(void)
+{
+	__this_cpu_write(rcu_cpu_is_idle, 0);
+}
+
+/*
+ * Check to see whether RCU thinks that the current CPU is idle.
+ * Note that interrupt handlers from idle may or may not be counted
+ * as idle by this approach.  If you care about the difference,
+ * be sure to check explicitly like rcu_check_callbacks() does.
+ */
+int rcu_is_cpu_idle(void)
+{
+	return __this_cpu_read(rcu_cpu_is_idle);
+}
+
 struct rcu_synchronize {
 	struct rcu_head head;
 	struct completion completion;
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 9e493b9..6d7207d 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -65,8 +65,10 @@ static long rcu_dynticks_nesting = 1;
  */
 void rcu_enter_nohz(void)
 {
-	if (--rcu_dynticks_nesting == 0)
+	if (--rcu_dynticks_nesting == 0) {
 		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
+		rcu_idle_enter();
+	}
 }
 
 /*
@@ -75,7 +77,8 @@ void rcu_enter_nohz(void)
  */
 void rcu_exit_nohz(void)
 {
-	rcu_dynticks_nesting++;
+	if (rcu_dynticks_nesting++ == 0)
+		rcu_idle_exit();
 }
 
 
@@ -146,7 +149,7 @@ void rcu_bh_qs(int cpu)
 void rcu_check_callbacks(int cpu, int user)
 {
 	if (user ||
-	    (idle_cpu(cpu) &&
+	    (rcu_is_cpu_idle() &&
 	     !in_softirq() &&
 	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
 		rcu_sched_qs(cpu);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 6a7b9bb..00a0eb0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -370,6 +370,7 @@ void rcu_enter_nohz(void)
 	atomic_inc(&rdtp->dynticks);
 	smp_mb__after_atomic_inc();  /* Force ordering with next sojourn. */
 	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
+	rcu_idle_enter();
 	local_irq_restore(flags);
 }
 
@@ -396,6 +397,7 @@ void rcu_exit_nohz(void)
 	smp_mb__after_atomic_inc();  /* See above. */
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
 	trace_rcu_dyntick("End");
+	rcu_idle_exit();
 	local_irq_restore(flags);
 }
 
@@ -1360,7 +1362,7 @@ void rcu_check_callbacks(int cpu, int user)
 {
 	trace_rcu_utilization("Start scheduler-tick");
 	if (user ||
-	    (idle_cpu(cpu) && rcu_scheduler_active &&
+	    (rcu_is_cpu_idle() && rcu_scheduler_active &&
 	     !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
 
 		/*


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-27 18:01                     ` Paul E. McKenney
@ 2011-09-28 12:31                       ` Frederic Weisbecker
  2011-09-28 18:40                         ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-28 12:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 26, 2011 at 03:50:32PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2011 at 11:20:55AM +0200, Frederic Weisbecker wrote:
> > > > On Sun, Sep 25, 2011 at 06:26:11PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Sep 26, 2011 at 03:10:33AM +0200, Frederic Weisbecker wrote:
> > > > > > 2011/9/26 Frederic Weisbecker <fweisbec@gmail.com>:
> > > > > > > On Sun, Sep 25, 2011 at 09:48:04AM -0700, Paul E. McKenney wrote:
> > > > > > >> This is required for RCU_FAST_NO_HZ, which checks to see whether the
> > > > > > >> current CPU can accelerate the current grace period so as to enter
> > > > > > >> dyntick-idle mode sooner than it would otherwise.  This takes effect
> > > > > > >> in the situation where rcu_needs_cpu() sees that there are callbacks.
> > > > > > >> It then notes a quiescent state (which is illegal in an RCU read-side
> > > > > > >> critical section), calls force_quiescent_state(), and so on.  For this
> > > > > > >> to work, the current CPU must be in an RCU read-side critical section.
> > > > > > >
> > > > > > > You mean it must *not* be in an RCU read-side critical section (ie: in a
> > > > > > > quiescent state)?
> > > > > > >
> > > > > > > That assumption at least fails anytime in idle for the RCU
> > > > > > > sched flavour given that preemption is disabled in the idle loop.
> > > > > > >
> > > > > > >> If this cannot be made to work, another option is to call a new RCU
> > > > > > >> function in the case where rcu_needs_cpu() returned false, but after
> > > > > > >> the RCU read-side critical section has exited.
> > > > > > >
> > > > > > > You mean when rcu_needs_cpu() returns true (when we have callbacks
> > > > > > > enqueued)?
> > > > > > >
> > > > > > >> This new RCU function
> > > > > > >> could then attempt to rearrange RCU so as to allow the CPU to enter
> > > > > > >> dyntick-idle mode more quickly.  It is more important for this to
> > > > > > >> happen when the CPU is going idle than when it is executing a user
> > > > > > >> process.
> > > > > > >>
> > > > > > >> So, is this doable?
> > > > > > >
> > > > > > > At least not when we have RCU sched callbacks enqueued, given preemption
> > > > > > > is disabled. But that sounds plausible in order to accelerate the switch
> > > > > > > to dyntick-idle mode when we only have rcu and/or rcu bh callbacks.
> > > > > > 
> > > > > > But the RCU sched case could be dealt with if we embrace every use of
> > > > > > it with rcu_read_lock_sched() and rcu_read_unlock_sched(), or some light
> > > > > > version that just increases a local counter that rcu_needs_cpu() could check.
> > > > > > 
> > > > > > It's an easy thing to add: we can ensure preempt is disabled when we call it
> > > > > > and we can force rcu_dereference_sched() to depend on it.
> > > > > 
> > > > > Or just check to see if this is the first level of interrupt from the
> > > > > idle task after the scheduler is up.
> > > > 
> > > > I believe it's always the case. tick_nohz_stop_sched_tick() is only called
> > > > from the first level of interrupt in irq_exit().
> > > 
> > > OK, good, let me see if I really understand this...
> > > 
> > > Case 1: The interrupt interrupted non-dyntick-idle code.  In this case,
> > > 	rcu_needs_cpu() can look at the dyntick-idle state and determine
> > > 	that it might not be in a quiescent state.
> > 
> > I guess by dyntick-idle code you mean the fact that RCU is in an
> > extended quiescent state? (Not just that the tick is stopped)
> > 
> > If so yeah that looks good.
> > 
> > > 
> > > Case 2: The interrupt interrupted dyntick-idle code.  In this case,
> > > 	the interrupted code had better not be in an RCU read-side
> > > 	critical section, and rcu_needs_cpu() should be able to
> > > 	detect this as well.
> > 
> > Yeah.
> > 
> > We already do the appropriate debug checks from the RCU read side
> > APIs so I guess rcu_needs_cpu() doesn't even need to do its own
> > debugging checks here about extended qs.
> > 
> > But indeed it can return right away if we are in extended qs.
> > 
> > > 
> > > Case 3: The interrupt interrupted the process of transitioning to
> > > 	or from dyntick-idle mode.  This should be prohibited by
> > > 	the local_irq_save() calls, right?
> > 
> > Indeed.
> > 
> > > 
> > > > There is always some race window, as it's based on the preempt offset:
> > > > between sub_preempt_count() and the beginning of softirqs, and between
> > > > the end of softirqs and the end of the interrupt. But an
> > > > "idle_cpu() || in_interrupt()" check in rcu_read_lock_sched_held()
> > > > should catch those offenders.
> > > 
> > > But all of this stuff looks to me to be called from the context
> > > of the idle task, so that idle_cpu() will always return "true"...
> > 
> > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > rcu_read_lock_sched_held().
> 
> The problem is that the idle tasks now seem to make quite a bit of use
> of RCU on entry to and exit from the idle loop itself, for example,
> via tracing.  So it seems like it is time to have the idle loop
> explicitly tell RCU when the idle extended quiescent state is in effect.
> 
> An experimental patch along these lines is included below.  Does this
> approach seem reasonable, or am I missing something subtle (or even
> not so subtle) here?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Explicitly track idle CPUs.
> 
> In the good old days, RCU simply checked to see if it was running in
> the context of an idle task to determine whether or not it was in the
> idle extended quiescent state.  However, the entry to and exit from
> idle has become more ornate over the years, and some of this processing
> now uses RCU while running in the context of the idle task.  It is
> therefore no longer reasonable to assume that anything running in the
> context of one of the idle tasks is in an extended quiescent state.
> 
> This commit therefore explicitly tracks whether each CPU is in the
> idle loop, allowing the idle task to use RCU anywhere except in those
> portions of the idle loops where RCU has been explicitly informed that
> it is in a quiescent state.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

I fear we indeed need that now.

Just some comments:

> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 9d40e42..5b7e62c 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -177,6 +177,9 @@ extern void rcu_sched_qs(int cpu);
>  extern void rcu_bh_qs(int cpu);
>  extern void rcu_check_callbacks(int cpu, int user);
>  struct notifier_block;
> +extern void rcu_idle_enter(void);
> +extern void rcu_idle_exit(void);
> +extern int rcu_is_cpu_idle(void);
>  
>  #ifdef CONFIG_NO_HZ
>  
> @@ -187,10 +190,12 @@ extern void rcu_exit_nohz(void);
>  
>  static inline void rcu_enter_nohz(void)
>  {
> +	rcu_idle_enter();
>  }
>  
>  static inline void rcu_exit_nohz(void)
>  {
> +	rcu_idle_exit();
>  }
>  
>  #endif /* #else #ifdef CONFIG_NO_HZ */
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 375e7d8..cd9e2d1 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
>  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
>  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
>  # else
> -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> -static inline void tick_nohz_idle_exit(void) { }
> +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> +{
> +	if (rcu_ext_qs())
> +		rcu_idle_enter();
> +}

rcu_ext_qs is not a function.

> +static inline void tick_nohz_idle_exit(void)
> +{
> +	if (rcu_ext_qs())
> +		rcu_idle_exit();
> +}

So we probably need to track whether we entered the RCU extended QS
so that we can know whether to call rcu_idle_exit(). Or maybe
pass the rcu_ext_qs parameter down to tick_nohz_idle_exit() as well.

>  static inline ktime_t tick_nohz_get_sleep_length(void)
>  {
>  	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
> diff --git a/kernel/rcu.h b/kernel/rcu.h
> index f600868..220b4fe 100644
> --- a/kernel/rcu.h
> +++ b/kernel/rcu.h
> @@ -23,6 +23,8 @@
>  #ifndef __LINUX_RCU_H
>  #define __LINUX_RCU_H
>  
> +/* Avoid tracing overhead if not configured, mostly for RCU_TINY's benefit. */
> +
>  #ifdef CONFIG_RCU_TRACE
>  #define RCU_TRACE(stmt) stmt
>  #else /* #ifdef CONFIG_RCU_TRACE */
<snip>
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index 9e493b9..6d7207d 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -65,8 +65,10 @@ static long rcu_dynticks_nesting = 1;
>   */
>  void rcu_enter_nohz(void)
>  {
> -	if (--rcu_dynticks_nesting == 0)
> +	if (--rcu_dynticks_nesting == 0) {
>  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> +		rcu_idle_enter();

Although idle and rcu/nohz are still closely related notions, the
ordering sounds more logical the other way around:

tick_nohz_idle_enter() {
	rcu_idle_enter() {
		rcu_enter_nohz();
	}
}

tick_nohz_irq_exit() {
        rcu_idle_enter() {
                rcu_enter_nohz();
        }
}

Because the RCU extended QS is something used by idle, not the opposite.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-28 12:31                       ` Frederic Weisbecker
@ 2011-09-28 18:40                         ` Paul E. McKenney
  2011-09-28 23:46                           ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-28 18:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Wed, Sep 28, 2011 at 02:31:21PM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:

[ . . . ]

> > > > But all of this stuff looks to me to be called from the context
> > > > of the idle task, so that idle_cpu() will always return "true"...
> > > 
> > > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > > rcu_read_lock_sched_held().
> > 
> > The problem is that the idle tasks now seem to make quite a bit of use
> > of RCU on entry to and exit from the idle loop itself, for example,
> > via tracing.  So it seems like it is time to have the idle loop
> > explicitly tell RCU when the idle extended quiescent state is in effect.
> > 
> > An experimental patch along these lines is included below.  Does this
> > approach seem reasonable, or am I missing something subtle (or even
> > not so subtle) here?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Explicitly track idle CPUs.
> > 
> > In the good old days, RCU simply checked to see if it was running in
> > the context of an idle task to determine whether or not it was in the
> > idle extended quiescent state.  However, the entry to and exit from
> > idle has become more ornate over the years, and some of this processing
> > now uses RCU while running in the context of the idle task.  It is
> > therefore no longer reasonable to assume that anything running in the
> > context of one of the idle tasks is in an extended quiescent state.
> > 
> > This commit therefore explicitly tracks whether each CPU is in the
> > idle loop, allowing the idle task to use RCU anywhere except in those
> > portions of the idle loops where RCU has been explicitly informed that
> > it is in a quiescent state.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> I fear we indeed need that now.

And we probably need to refactor this patch stack.  Given the number
of warnings and errors due to RCU's confusion about what "idle" means,
the series simply is not bisectable as is.

Nevertheless, see below for an incremental patch based on your feedback.

> Just some comments:
> 
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 9d40e42..5b7e62c 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -177,6 +177,9 @@ extern void rcu_sched_qs(int cpu);
> >  extern void rcu_bh_qs(int cpu);
> >  extern void rcu_check_callbacks(int cpu, int user);
> >  struct notifier_block;
> > +extern void rcu_idle_enter(void);
> > +extern void rcu_idle_exit(void);
> > +extern int rcu_is_cpu_idle(void);
> >  
> >  #ifdef CONFIG_NO_HZ
> >  
> > @@ -187,10 +190,12 @@ extern void rcu_exit_nohz(void);
> >  
> >  static inline void rcu_enter_nohz(void)
> >  {
> > +	rcu_idle_enter();
> >  }
> >  
> >  static inline void rcu_exit_nohz(void)
> >  {
> > +	rcu_idle_exit();
> >  }
> >  
> >  #endif /* #else #ifdef CONFIG_NO_HZ */
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 375e7d8..cd9e2d1 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
> >  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> >  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> >  # else
> > -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> > -static inline void tick_nohz_idle_exit(void) { }
> > +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> > +{
> > +	if (rcu_ext_qs())
> > +		rcu_idle_enter();
> > +}
> 
> rcu_ext_qs is not a function.

Ooooh...  Good catch.  Would you believe that gcc didn't complain?
Or maybe my scripts are missing some gcc complaints.  But I would
expect the following to catch them:

	egrep -q "Stop|Error|error:|warning:|improperly set"

Anything I am missing?

(And yes, I did remove the "()" in both cases.)
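For what it's worth, a filter like the one above can be smoke-tested offline. A small sketch (the sample diagnostic line is made up for illustration; the pattern list is exactly the one quoted above):

```shell
#!/bin/sh
# Smoke-test the build-log filter from above against a fabricated
# gcc diagnostic (not actual build output).
filter='Stop|Error|error:|warning:|improperly set'
printf '%s\n' 'tick.h:134:2: error: called object is not a function' |
	egrep -q "$filter" && echo caught
```

So a "called object is not a function" error would be matched by the existing "error:" pattern, which suggests the offending file was simply never compiled in that configuration rather than slipping past the filter.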

> > +static inline void tick_nohz_idle_exit(void)
> > +{
> > +	if (rcu_ext_qs())
> > +		rcu_idle_exit();
> > +}
> 
> So we probably need to track whether we entered the RCU extended QS
> so that we can know whether to call rcu_idle_exit(). Or maybe
> pass the rcu_ext_qs parameter down to tick_nohz_idle_exit() as well.

Good point.

> >  static inline ktime_t tick_nohz_get_sleep_length(void)
> >  {
> >  	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
> > diff --git a/kernel/rcu.h b/kernel/rcu.h
> > index f600868..220b4fe 100644
> > --- a/kernel/rcu.h
> > +++ b/kernel/rcu.h
> > @@ -23,6 +23,8 @@
> >  #ifndef __LINUX_RCU_H
> >  #define __LINUX_RCU_H
> >  
> > +/* Avoid tracing overhead if not configured, mostly for RCU_TINY's benefit. */
> > +
> >  #ifdef CONFIG_RCU_TRACE
> >  #define RCU_TRACE(stmt) stmt
> >  #else /* #ifdef CONFIG_RCU_TRACE */
> <snip>
> > diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> > index 9e493b9..6d7207d 100644
> > --- a/kernel/rcutiny.c
> > +++ b/kernel/rcutiny.c
> > @@ -65,8 +65,10 @@ static long rcu_dynticks_nesting = 1;
> >   */
> >  void rcu_enter_nohz(void)
> >  {
> > -	if (--rcu_dynticks_nesting == 0)
> > +	if (--rcu_dynticks_nesting == 0) {
> >  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> > +		rcu_idle_enter();
> 
> Although idle and rcu/nohz are still closely related notions, the
> ordering sounds more logical the other way around:
> 
> tick_nohz_idle_enter() {
> 	rcu_idle_enter() {
> 		rcu_enter_nohz();
> 	}
> }
> 
> tick_nohz_irq_exit() {
>         rcu_idle_enter() {
>                 rcu_enter_nohz();
>         }
> }
> 
> Because the RCU extended QS is something used by idle, not the opposite.

The problem I have with this is that it is rcu_enter_nohz() that tracks
the irq nesting required to correctly decide whether or not we are going
to really go to idle state.  Furthermore, there are cases where we
do enter idle but do not enter nohz, and that has to be handled correctly
as well.

Now, it is quite possible that I am suffering a senior moment and just
failing to see how to structure this in the design where rcu_idle_enter()
invokes rcu_enter_nohz(), but regardless, I am failing to see how to
structure this so that it works correctly.

Please feel free to enlighten me!

							Thanx, Paul

------------------------------------------------------------------------

rcu: Add rcu_ext_qs argument to tick_nohz_idle_exit()

When the system is built with CONFIG_NO_HZ=n, tick_nohz_idle_exit()
does not have enough information to determine whether or not it should
tell RCU that an idle extended quiescent state has started.  This commit
therefore adds an rcu_ext_qs argument to tick_nohz_idle_exit() to supply
this information.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 51b0e39..e155474 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -208,7 +208,7 @@ void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 5041c84..a563d5f 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -37,7 +37,7 @@ void cpu_idle(void)
 		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index f22a0da..c99f57c 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -91,7 +91,7 @@ void cpu_idle(void)
 		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 0f5290f..843fab3 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -106,7 +106,7 @@ void cpu_idle(void)
 		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 20be814..58269b1 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index a0e31a7..6c124a4 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -93,7 +93,7 @@ void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index f239427..0df97a9 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 3dbaf59..1763390 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -93,7 +93,7 @@ void cpu_idle(void)
 		tick_nohz_idle_enter(true);
 		while (!need_resched())
 			default_idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index bb0a627..c238f42 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -109,7 +109,7 @@ void cpu_idle(void)
 			start_critical_timings();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 3c5d363..76b5786 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -100,7 +100,7 @@ void cpu_idle(void)
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 727dc85..bdcdafe 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -105,7 +105,7 @@ void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 5693d6d..72d2ffe 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -248,7 +248,7 @@ void default_idle(void)
 		tick_nohz_idle_enter(true);
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index afa50d9..371b935 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -63,7 +63,7 @@ void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8c2faa9..9e557d9 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -112,7 +112,7 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(true);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index dee2e6c..93e1b09 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -150,7 +150,7 @@ void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit(false);
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index cd9e2d1..ec481bd 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -125,7 +125,7 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 
 # ifdef CONFIG_NO_HZ
 extern void tick_nohz_idle_enter(bool rcu_ext_qs);
-extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_idle_exit(bool rcu_ext_qs);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
@@ -133,12 +133,12 @@ extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
 static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
 {
-	if (rcu_ext_qs())
+	if (rcu_ext_qs)
 		rcu_idle_enter();
 }
-static inline void tick_nohz_idle_exit(void)
+static inline void tick_nohz_idle_exit(bool rcu_ext_qs)
 {
-	if (rcu_ext_qs())
+	if (rcu_ext_qs)
 		rcu_idle_exit();
 }
 static inline ktime_t tick_nohz_get_sleep_length(void)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cd1a54e..914a0bd 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -535,7 +535,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
  *
  * Restart the idle tick when the CPU is woken up from idle
  */
-void tick_nohz_idle_exit(void)
+void tick_nohz_idle_exit(bool rcu_ext_qs)
 {
 	int cpu = smp_processor_id();
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
@@ -559,7 +559,7 @@ void tick_nohz_idle_exit(void)
 
 	ts->inidle = 0;
 
-	if (ts->rcu_ext_qs) {
+	if (rcu_ext_qs) {
 		rcu_exit_nohz();
 		ts->rcu_ext_qs = 0;
 	}


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-28 18:40                         ` Paul E. McKenney
@ 2011-09-28 23:46                           ` Frederic Weisbecker
  2011-09-29  0:55                             ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-28 23:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Wed, Sep 28, 2011 at 11:40:25AM -0700, Paul E. McKenney wrote:
> On Wed, Sep 28, 2011 at 02:31:21PM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> 
> [ . . . ]
> 
> > > > > But all of this stuff looks to me to be called from the context
> > > > > of the idle task, so that idle_cpu() will always return "true"...
> > > > 
> > > > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > > > rcu_read_lock_sched_held().
> > > 
> > > The problem is that the idle tasks now seem to make quite a bit of use
> > > of RCU on entry to and exit from the idle loop itself, for example,
> > > via tracing.  So it seems like it is time to have the idle loop
> > > explicitly tell RCU when the idle extended quiescent state is in effect.
> > > 
> > > An experimental patch along these lines is included below.  Does this
> > > approach seem reasonable, or am I missing something subtle (or even
> > > not so subtle) here?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > rcu: Explicitly track idle CPUs.
> > > 
> > > In the good old days, RCU simply checked to see if it was running in
> > > the context of an idle task to determine whether or not it was in the
> > > idle extended quiescent state.  However, the entry to and exit from
> > > idle has become more ornate over the years, and some of this processing
> > > now uses RCU while running in the context of the idle task.  It is
> > > therefore no longer reasonable to assume that anything running in the
> > > context of one of the idle tasks is in an extended quiescent state.
> > > 
> > > This commit therefore explicitly tracks whether each CPU is in the
> > > idle loop, allowing the idle task to use RCU anywhere except in those
> > > portions of the idle loops where RCU has been explicitly informed that
> > > it is in a quiescent state.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > I fear we indeed need that now.
> 
> And we probably need to refactor this patch stack.  Given the number
> of warnings and errors due to RCU's confusion about what "idle" means,
> the series simply is not bisectable as is.

Not sure what you mean. You want to split that specific patch or
others?

> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index 375e7d8..cd9e2d1 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
> > >  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > >  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > >  # else
> > > -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> > > -static inline void tick_nohz_idle_exit(void) { }
> > > +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> > > +{
> > > +	if (rcu_ext_qs())
> > > +		rcu_idle_enter();
> > > +}
> > 
> > rcu_ext_qs is not a function.
> 
> Ooooh...  Good catch.  Would you believe that gcc didn't complain?
> Or maybe my scripts are missing some gcc complaints.  But I would
> expect the following to catch them:
> 
> 	egrep -q "Stop|Error|error:|warning:|improperly set"
> 
> Anything I am missing?

No idea :)

> > Although idle and rcu/nohz are still closely related notions, the
> > ordering sounds more logical the other way around:
> > 
> > tick_nohz_idle_enter() {
> > 	rcu_idle_enter() {
> > 		rcu_enter_nohz();
> > 	}
> > }
> > 
> > tick_nohz_irq_exit() {
> >         rcu_idle_enter() {
> >                 rcu_enter_nohz();
> >         }
> > }
> > 
> > Because the RCU extended QS is something used by idle, not the opposite.
> 
> The problem I have with this is that it is rcu_enter_nohz() that tracks
> the irq nesting required to correctly decide whether or not we are going
> to really go to idle state.  Furthermore, there are cases where we
> do enter idle but do not enter nohz, and that has to be handled correctly
> as well.
> 
> Now, it is quite possible that I am suffering a senior moment and just
> failing to see how to structure this in the design where rcu_idle_enter()
> invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> structure this so that it works correctly.
> 
> Please feel free to enlighten me!

Ah, I realize that you want to call rcu_idle_exit() when we enter
the first-level interrupt and rcu_idle_enter() when we exit it
to return to the idle loop.

But we use that check:

	if (user ||
	    (rcu_is_cpu_idle() &&
 	     !in_softirq() &&
 	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
 		rcu_sched_qs(cpu);

So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
in another interrupt.

That said, we found RCU uses after we decrement the hardirq offset and
before we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these
places and account spurious quiescent states.

But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
AFAIK, so we can't be interrupted by rcu_check_callbacks(), except during
softirq processing. But we have this ordering:

add_preempt_count(SOFTIRQ_OFFSET)
local_irq_enable()

do softirqs

local_irq_disable()
sub_preempt_count(SOFTIRQ_OFFSET)

So the !in_softirq() check covers us during the time we process softirqs.

The only assumption we need is that there is no place between
sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_exit() that has
irqs enabled and that is an RCU read-side critical section.

I'm not aware of any automatic check to ensure that though.

Anyway, the delta patch looks good. Just a little thing:

> -void tick_nohz_idle_exit(void)
> +void tick_nohz_idle_exit(bool rcu_ext_qs)

It becomes weird to have both idle_enter/idle_exit take
that parameter.

Would it make sense to have tick_nohz_idle_[enter|exit]_norcu()
and a version without the _norcu suffix?

>  {
>  	int cpu = smp_processor_id();
>  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> @@ -559,7 +559,7 @@ void tick_nohz_idle_exit(void)
>  
>  	ts->inidle = 0;
>  
> -	if (ts->rcu_ext_qs) {
> +	if (rcu_ext_qs) {
>  		rcu_exit_nohz();
>  		ts->rcu_ext_qs = 0;
>  	}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-28 23:46                           ` Frederic Weisbecker
@ 2011-09-29  0:55                             ` Paul E. McKenney
  2011-09-29  4:49                               ` Paul E. McKenney
  2011-09-29 12:30                               ` Frederic Weisbecker
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-29  0:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Thu, Sep 29, 2011 at 01:46:36AM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 28, 2011 at 11:40:25AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 28, 2011 at 02:31:21PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> > 
> > [ . . . ]
> > 
> > > > > > But all of this stuff looks to me to be called from the context
> > > > > > of the idle task, so that idle_cpu() will always return "true"...
> > > > > 
> > > > > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > > > > rcu_read_lock_sched_held().
> > > > 
> > > > The problem is that the idle tasks now seem to make quite a bit of use
> > > > of RCU on entry to and exit from the idle loop itself, for example,
> > > > via tracing.  So it seems like it is time to have the idle loop
> > > > explicitly tell RCU when the idle extended quiescent state is in effect.
> > > > 
> > > > An experimental patch along these lines is included below.  Does this
> > > > approach seem reasonable, or am I missing something subtle (or even
> > > > not so subtle) here?
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > rcu: Explicitly track idle CPUs.
> > > > 
> > > > In the good old days, RCU simply checked to see if it was running in
> > > > the context of an idle task to determine whether or not it was in the
> > > > idle extended quiescent state.  However, the entry to and exit from
> > > > idle has become more ornate over the years, and some of this processing
> > > > now uses RCU while running in the context of the idle task.  It is
> > > > therefore no longer reasonable to assume that anything running in the
> > > > context of one of the idle tasks is in an extended quiescent state.
> > > > 
> > > > This commit therefore explicitly tracks whether each CPU is in the
> > > > idle loop, allowing the idle task to use RCU anywhere except in those
> > > > portions of the idle loops where RCU has been explicitly informed that
> > > > it is in a quiescent state.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > I fear we indeed need that now.
> > 
> > And we probably need to factor this patch stack.  Given the number
> > of warnings and errors due to RCU's confusion about what "idle" means,
> > we simply are not bisectable as is.
> 
> Not sure what you mean. You want to split that specific patch or
> others?

It looks to me that having my pair of patches on top of yours is
really ugly.  If we are going to introduce the per-CPU idle variable,
we should make a patch stack that uses that from the start.  This allows
me to bisect to track down the failures I am seeing on Power.

If you are too busy, I can take this on, but we might get better results
if you did it.  (And I certainly cannot complain about the large amount
of time and energy that you have put into this -- plus the reduction in
OS jitter will be really cool to have!)

> > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > index 375e7d8..cd9e2d1 100644
> > > > --- a/include/linux/tick.h
> > > > +++ b/include/linux/tick.h
> > > > @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
> > > >  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > > >  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > > >  # else
> > > > -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> > > > -static inline void tick_nohz_idle_exit(void) { }
> > > > +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> > > > +{
> > > > +	if (rcu_ext_qs())
> > > > +		rcu_idle_enter();
> > > > +}
> > > 
> > > rcu_ext_qs is not a function.
> > 
> > Ooooh...  Good catch.  Would you believe that gcc didn't complain?
> > Or maybe my scripts are missing some gcc complaints.  But I would
> > expect the following to catch them:
> > 
> > 	egrep -q "Stop|Error|error:|warning:|improperly set"
> > 
> > Anything I am missing?
> 
> No idea :)
> 
> > > Although idle and rcu/nohz are still close notions, it sounds
> > > more logical the other way around in the ordering:
> > > 
> > > tick_nohz_idle_enter() {
> > > 	rcu_idle_enter() {
> > > 		rcu_enter_nohz();
> > > 	}
> > > }
> > > 
> > > tick_nohz_irq_exit() {
> > >         rcu_idle_enter() {
> > >                 rcu_enter_nohz();
> > >         }
> > > }
> > > 
> > > Because rcu ext qs is something used by idle, not the opposite.

Re-reading this makes me realize that I would instead say that idle
is an example of an RCU extended quiescent state, or that the rcu_ext_qs
argument to the various functions is used to indicate whether or not
we are immediately entering/leaving idle from RCU's viewpoint.

So what were you really trying to say here?  ;-)

> > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > the irq nesting required to correctly decide whether or not we are going
> > to really go to idle state.  Furthermore, there are cases where we
> > do enter idle but do not enter nohz, and that has to be handled correctly
> > as well.
> > 
> > Now, it is quite possible that I am suffering a senior moment and just
> > failing to see how to structure this in the design where rcu_idle_enter()
> > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > structure this so that it works correctly.
> > 
> > Please feel free to enlighten me!
> 
> Ah I realize that you want to call rcu_idle_exit() when we enter
> the first level interrupt and rcu_idle_enter() when we exit it
> to return to idle loop.
> 
> But we use that check:
> 
> 	if (user ||
> 	    (rcu_is_cpu_idle() &&
>  	     !in_softirq() &&
>  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
>  		rcu_sched_qs(cpu);
> 
> So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> in another interrupt.

But I would like to enable checks for entering/exiting idle while
within an RCU read-side critical section.  The idea is to move
the checks from their currently somewhat problematic location in
rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
thought is to move them to rcu_enter_nohz() and rcu_exit_nohz(), near the
calls to rcu_idle_enter() and rcu_idle_exit(), respectively.

This would mean that they operated only in NO_HZ kernels with lockdep
enabled, but I am good with that because to do otherwise would require
adding nesting-level counters to the non-NO_HZ case, which I would like
to avoid, especially for TINY_RCU.

> That said we found RCU uses after we decrement the hardirq offset and until
> we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these places
> and account spurious quiescent states.
> 
> But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
> AFAIK so we can't be interrupted by rcu_check_callbacks(), except during the
> softirqs processing. But we have that ordering:
> 
> add_preempt_count(SOFTIRQ_OFFSET)
> local_irq_enable()
> 
> do softirqs
> 
> local_irq_disable()
> sub_preempt_count(SOFTIRQ_OFFSET)
> 
> So the !in_softirq() check covers us during the time we process softirqs.
> 
> The only assumption we need is that there is no place between
> sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_exit() that has
> irqs enabled and that is an rcu read side critical section.
> 
> I'm not aware of any automatic check to ensure that though.

Nor am I, which is why I am looking to the checks in
rcu_enter_nohz() and rcu_exit_nohz() called out above.

> Anyway, the delta patch looks good.

OK, my current plans are to start forward-porting to -rc8, and I would
like to have this pair of delta patches or something like them pulled
into your stack.

>                                     Just a little thing:
> 
> > -void tick_nohz_idle_exit(void)
> > +void tick_nohz_idle_exit(bool rcu_ext_qs)
> 
> It becomes weird to have both idle_enter/idle_exit having
> that parameter.
> 
> Would it make sense to have tick_nohz_idle_[exit|enter]_norcu()
> and a version without norcu?

Given that we need to make this work in CONFIG_NO_HZ=n kernels, I believe
that the current API is OK.  But if you would like to change the API
during the forward-port to -rc8, I am also OK with the alternative API
you suggest.

							Thanx, Paul

> >  {
> >  	int cpu = smp_processor_id();
> >  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> > @@ -559,7 +559,7 @@ void tick_nohz_idle_exit(void)
> >  
> >  	ts->inidle = 0;
> >  
> > -	if (ts->rcu_ext_qs) {
> > +	if (rcu_ext_qs) {
> >  		rcu_exit_nohz();
> >  		ts->rcu_ext_qs = 0;
> >  	}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29  0:55                             ` Paul E. McKenney
@ 2011-09-29  4:49                               ` Paul E. McKenney
  2011-09-29 12:30                               ` Frederic Weisbecker
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-29  4:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Wed, Sep 28, 2011 at 05:55:45PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 01:46:36AM +0200, Frederic Weisbecker wrote:
> > On Wed, Sep 28, 2011 at 11:40:25AM -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 28, 2011 at 02:31:21PM +0200, Frederic Weisbecker wrote:
> > > > On Tue, Sep 27, 2011 at 11:01:42AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Sep 27, 2011 at 02:16:50PM +0200, Frederic Weisbecker wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > > But all of this stuff looks to me to be called from the context
> > > > > > > of the idle task, so that idle_cpu() will always return "true"...
> > > > > > 
> > > > > > I meant "idle_cpu() && !in_interrupt()" that should return false in
> > > > > > rcu_read_lock_sched_held().
> > > > > 
> > > > > The problem is that the idle tasks now seem to make quite a bit of use
> > > > > of RCU on entry to and exit from the idle loop itself, for example,
> > > > > via tracing.  So it seems like it is time to have the idle loop
> > > > > explicitly tell RCU when the idle extended quiescent state is in effect.
> > > > > 
> > > > > An experimental patch along these lines is included below.  Does this
> > > > > approach seem reasonable, or am I missing something subtle (or even
> > > > > not so subtle) here?
> > > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > rcu: Explicitly track idle CPUs.
> > > > > 
> > > > > In the good old days, RCU simply checked to see if it was running in
> > > > > the context of an idle task to determine whether or not it was in the
> > > > > idle extended quiescent state.  However, the entry to and exit from
> > > > > idle has become more ornate over the years, and some of this processing
> > > > > now uses RCU while running in the context of the idle task.  It is
> > > > > therefore no longer reasonable to assume that anything running in the
> > > > > context of one of the idle tasks is in an extended quiescent state.
> > > > > 
> > > > > This commit therefore explicitly tracks whether each CPU is in the
> > > > > idle loop, allowing the idle task to use RCU anywhere except in those
> > > > > portions of the idle loops where RCU has been explicitly informed that
> > > > > it is in a quiescent state.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > I fear we indeed need that now.
> > > 
> > > And we probably need to factor this patch stack.  Given the number
> > > of warnings and errors due to RCU's confusion about what "idle" means,
> > > we simply are not bisectable as is.
> > 
> > Not sure what you mean. You want to split that specific patch or
> > others?
> 
> It looks to me that having my pair of patches on top of yours is
> really ugly.  If we are going to introduce the per-CPU idle variable,
> we should make a patch stack that uses that from the start.  This allows
> me to bisect to track down the failures I am seeing on Power.

And I should hasten to add that I am not blaming you for these problems.
You have been finding at least as many problems in other code than in
your own, in some cases making other problems easier to reproduce.  But
either way, they need to be fixed to make the upcoming merge window.

							Thanx, Paul

> If you are too busy, I can take this on, but we might get better results
> if you did it.  (And I certainly cannot complain about the large amount
> of time and energy that you have put into this -- plus the reduction in
> OS jitter will be really cool to have!)
> 
> > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > > index 375e7d8..cd9e2d1 100644
> > > > > --- a/include/linux/tick.h
> > > > > +++ b/include/linux/tick.h
> > > > > @@ -131,8 +131,16 @@ extern ktime_t tick_nohz_get_sleep_length(void);
> > > > >  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
> > > > >  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
> > > > >  # else
> > > > > -static inline void tick_nohz_idle_enter(bool rcu_ext_qs) { }
> > > > > -static inline void tick_nohz_idle_exit(void) { }
> > > > > +static inline void tick_nohz_idle_enter(bool rcu_ext_qs)
> > > > > +{
> > > > > +	if (rcu_ext_qs())
> > > > > +		rcu_idle_enter();
> > > > > +}
> > > > 
> > > > rcu_ext_qs is not a function.
> > > 
> > > Ooooh...  Good catch.  Would you believe that gcc didn't complain?
> > > Or maybe my scripts are missing some gcc complaints.  But I would
> > > expect the following to catch them:
> > > 
> > > 	egrep -q "Stop|Error|error:|warning:|improperly set"
> > > 
> > > Anything I am missing?
> > 
> > No idea :)
> > 
> > > > Although idle and rcu/nohz are still close notions, it sounds
> > > > more logical the other way around in the ordering:
> > > > 
> > > > tick_nohz_idle_enter() {
> > > > 	rcu_idle_enter() {
> > > > 		rcu_enter_nohz();
> > > > 	}
> > > > }
> > > > 
> > > > tick_nohz_irq_exit() {
> > > >         rcu_idle_enter() {
> > > >                 rcu_enter_nohz();
> > > >         }
> > > > }
> > > > 
> > > > Because rcu ext qs is something used by idle, not the opposite.
> 
> Re-reading this makes me realize that I would instead say that idle
> is an example of an RCU extended quiescent state, or that the rcu_ext_qs
> argument to the various functions is used to indicate whether or not
> we are immediately entering/leaving idle from RCU's viewpoint.
> 
> So what were you really trying to say here?  ;-)
> 
> > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > the irq nesting required to correctly decide whether or not we are going
> > > to really go to idle state.  Furthermore, there are cases where we
> > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > as well.
> > > 
> > > Now, it is quite possible that I am suffering a senior moment and just
> > > failing to see how to structure this in the design where rcu_idle_enter()
> > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > structure this so that it works correctly.
> > > 
> > > Please feel free to enlighten me!
> > 
> > Ah I realize that you want to call rcu_idle_exit() when we enter
> > the first level interrupt and rcu_idle_enter() when we exit it
> > to return to idle loop.
> > 
> > But we use that check:
> > 
> > 	if (user ||
> > 	    (rcu_is_cpu_idle() &&
> >  	     !in_softirq() &&
> >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> >  		rcu_sched_qs(cpu);
> > 
> > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > in another interrupt.
> 
> But I would like to enable checks for entering/exiting idle while
> within an RCU read-side critical section.  The idea is to move
> the checks from their currently somewhat problematic location in
> rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> thought is to move them to rcu_enter_nohz() and rcu_exit_nohz(), near the
> calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> 
> This would mean that they operated only in NO_HZ kernels with lockdep
> enabled, but I am good with that because to do otherwise would require
> adding nesting-level counters to the non-NO_HZ case, which I would like
> to avoid, especially for TINY_RCU.
> 
> > That said we found RCU uses after we decrement the hardirq offset and until
> > we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these places
> > and account spurious quiescent states.
> > 
> > But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
> > AFAIK so we can't be interrupted by rcu_check_callbacks(), except during the
> > softirqs processing. But we have that ordering:
> > 
> > add_preempt_count(SOFTIRQ_OFFSET)
> > local_irq_enable()
> > 
> > do softirqs
> > 
> > local_irq_disable()
> > sub_preempt_count(SOFTIRQ_OFFSET)
> > 
> > So the !in_softirq() check covers us during the time we process softirqs.
> > 
> > The only assumption we need is that there is no place between
> > sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_exit() that has
> > irqs enabled and that is an rcu read side critical section.
> > 
> > I'm not aware of any automatic check to ensure that though.
> 
> Nor am I, which is why I am looking to the checks in
> rcu_enter_nohz() and rcu_exit_nohz() called out above.
> 
> > Anyway, the delta patch looks good.
> 
> OK, my current plans are to start forward-porting to -rc8, and I would
> like to have this pair of delta patches or something like them pulled
> into your stack.
> 
> >                                     Just a little thing:
> > 
> > > -void tick_nohz_idle_exit(void)
> > > +void tick_nohz_idle_exit(bool rcu_ext_qs)
> > 
> > It becomes weird to have both idle_enter/idle_exit having
> > that parameter.
> > 
> > Would it make sense to have tick_nohz_idle_[exit|enter]_norcu()
> > and a version without norcu?
> 
> Given that we need to make this work in CONFIG_NO_HZ=n kernels, I believe
> that the current API is OK.  But if you would like to change the API
> during the forward-port to -rc8, I am also OK with the alternative API
> you suggest.
> 
> 							Thanx, Paul
> 
> > >  {
> > >  	int cpu = smp_processor_id();
> > >  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> > > @@ -559,7 +559,7 @@ void tick_nohz_idle_exit(void)
> > >  
> > >  	ts->inidle = 0;
> > >  
> > > -	if (ts->rcu_ext_qs) {
> > > +	if (rcu_ext_qs) {
> > >  		rcu_exit_nohz();
> > >  		ts->rcu_ext_qs = 0;
> > >  	}



* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29  0:55                             ` Paul E. McKenney
  2011-09-29  4:49                               ` Paul E. McKenney
@ 2011-09-29 12:30                               ` Frederic Weisbecker
  2011-09-29 17:12                                 ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-29 12:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Wed, Sep 28, 2011 at 05:55:45PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 01:46:36AM +0200, Frederic Weisbecker wrote:
> > Not sure what you mean. You want to split that specific patch or
> > others?
> 
> It looks to me that having my pair of patches on top of yours is
> really ugly.  If we are going to introduce the per-CPU idle variable,
> we should make a patch stack that uses that from the start.  This allows
> me to bisect to track down the failures I am seeing on Power.

Yeah right. My patches fix the use of extended qs in idle. But if
idle itself is considered as a quiescent state all along, that's about
useless. So it sounds indeed better in that order.

> If you are too busy, I can take this on, but we might get better results
> if you did it.  (And I certainly cannot complain about the large amount
> of time and energy that you have put into this -- plus the reduction in
> OS jitter will be really cool to have!)

No problem, I can take it.

> > > > Although idle and rcu/nohz are still close notions, it sounds
> > > > more logical the other way around in the ordering:
> > > > 
> > > > tick_nohz_idle_enter() {
> > > > 	rcu_idle_enter() {
> > > > 		rcu_enter_nohz();
> > > > 	}
> > > > }
> > > > 
> > > > tick_nohz_irq_exit() {
> > > >         rcu_idle_enter() {
> > > >                 rcu_enter_nohz();
> > > >         }
> > > > }
> > > > 
> > > > Because rcu ext qs is something used by idle, not the opposite.
> 
> Re-reading this makes me realize that I would instead say that idle
> is an example of an RCU extended quiescent state, or that the rcu_ext_qs
> argument to the various functions is used to indicate whether or not
> we are immediately entering/leaving idle from RCU's viewpoint.
> 
> So what were you really trying to say here?  ;-)

I was thinking about the fact that idle is a caller of rcu_enter_nohz().
And there may be more callers of it in the future. So I thought it may
be better to keep rcu_enter_nohz() idle-agnostic.

But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
from the right places other than from rcu_enter/exit_nohz().
We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
on the first interrupt level in idle.

So I can change that easily for the nohz cpusets.

> > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > the irq nesting required to correctly decide whether or not we are going
> > > to really go to idle state.  Furthermore, there are cases where we
> > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > as well.
> > > 
> > > Now, it is quite possible that I am suffering a senior moment and just
> > > failing to see how to structure this in the design where rcu_idle_enter()
> > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > structure this so that it works correctly.
> > > 
> > > Please feel free to enlighten me!
> > 
> > Ah I realize that you want to call rcu_idle_exit() when we enter
> > the first level interrupt and rcu_idle_enter() when we exit it
> > to return to idle loop.
> > 
> > But we use that check:
> > 
> > 	if (user ||
> > 	    (rcu_is_cpu_idle() &&
> >  	     !in_softirq() &&
> >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> >  		rcu_sched_qs(cpu);
> > 
> > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > in another interrupt.
> 
> But I would like to enable checks for entering/exiting idle while
> within an RCU read-side critical section. The idea is to move
> the checks from their currently somewhat problematic location in
> rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> thought is to move them to rcu_enter_nohz() and rcu_exit_nohz(), near the
> calls to rcu_idle_enter() and rcu_idle_exit(), respectively.

So, checking if we are calling rcu_idle_enter() while in an RCU
read side critical section?

But we already have checks that RCU read side API are not called in
extended quiescent state.

> This would mean that they operated only in NO_HZ kernels with lockdep
> enabled, but I am good with that because to do otherwise would require
> adding nesting-level counters to the non-NO_HZ case, which I would like
> to avoid, especially for TINY_RCU.

There can be a secondary check in rcu_read_lock_held() and friends to
ensure that rcu_is_idle_cpu() does not hold. In the non-NO_HZ case it's useful to
find similar issues.

In fact we could remove the check for rcu_extended_qs() in read side
APIs and check instead rcu_is_idle_cpu(). That would work in any
config and not only NO_HZ.

But I hope we can actually keep the check for RCU extended quiescent
state so that when rcu_enter_nohz() is called from other places than
idle, we are ready for it.

I believe it's fine to have both checks in PROVE_RCU.

> 
> > That said we found RCU uses after we decrement the hardirq offset and until
> > we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these places
> > and account spurious quiescent states.
> > 
> > But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
> > AFAIK so we can't be interrupted by rcu_check_callbacks(), except during the
> > softirqs processing. But we have that ordering:
> > 
> > add_preempt_count(SOFTIRQ_OFFSET)
> > local_irq_enable()
> > 
> > do softirqs
> > 
> > local_irq_disable()
> > sub_preempt_count(SOFTIRQ_OFFSET)
> > 
> > So the !in_softirq() check covers us during the time we process softirqs.
> > 
> > The only assumption we need is that there is no place between
> > sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_exit() that has
> > irqs enabled and that is an rcu read side critical section.
> > 
> > I'm not aware of any automatic check to ensure that though.
> 
> Nor am I, which is why I am looking to the checks in
> rcu_enter_nohz() and rcu_exit_nohz() called out above.

Yep.

> > Anyway, the delta patch looks good.
> 
> OK, my current plans are to start forward-porting to -rc8, and I would
> like to have this pair of delta patches or something like them pulled
> into your stack.

Sure I can take your patches (I'm going to merge the delta into the first).
But if you want a rebase against -rc8, it's going to be easier if you
do that rebase on the branch you want me to work on. Then I work on top
of it.

For example we can take your rcu/dynticks, rewind to
"rcu: Make synchronize_sched_expedited() better at work sharing"
771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
and I rebase my patches (yours included) on top of it and I repost.

Right?

> >                                     Just a little thing:
> > 
> > > -void tick_nohz_idle_exit(void)
> > > +void tick_nohz_idle_exit(bool rcu_ext_qs)
> > 
> > It becomes weird to have both idle_enter/idle_exit having
> > that parameter.
> > 
> > Would it make sense to have tick_nohz_idle_[exit|enter]_norcu()
> > and a version without norcu?
> 
> Given that we need to make this work in CONFIG_NO_HZ=n kernels, I believe
> that the current API is OK.  But if you would like to change the API
> during the forward-port to -rc8, I am also OK with the alternative API
> you suggest.

Fine. I'll do that rename.

Thanks.


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29 12:30                               ` Frederic Weisbecker
@ 2011-09-29 17:12                                 ` Paul E. McKenney
  2011-09-29 17:19                                   ` Paul E. McKenney
  2011-09-30 13:11                                   ` Frederic Weisbecker
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-29 17:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 28, 2011 at 05:55:45PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2011 at 01:46:36AM +0200, Frederic Weisbecker wrote:
> > > Not sure what you mean. You want to split that specific patch or
> > > others?
> > 
> > It looks to me that having my pair of patches on top of yours is
> > really ugly.  If we are going to introduce the per-CPU idle variable,
> > we should make a patch stack that uses that from the start.  This allows
> > me to bisect to track down the failures I am seeing on Power.
> 
> Yeah right. My patches fix the use of extended qs in idle. But if
> idle itself is considered as a quiescent state all along, that's about
> useless. So it sounds indeed better in that order.
> 
> > If you are too busy, I can take this on, but we might get better results
> > if you did it.  (And I certainly cannot complain about the large amount
> > of time and energy that you have put into this -- plus the reduction in
> > OS jitter will be really cool to have!)
> 
> No problem, I can take it.

Very good!  Rebasing and testing going well thus far.

> > > > > Although idle and rcu/nohz are still close notions, it sounds
> > > > > more logical the other way around in the ordering:
> > > > > 
> > > > > tick_nohz_idle_enter() {
> > > > > 	rcu_idle_enter() {
> > > > > 		rcu_enter_nohz();
> > > > > 	}
> > > > > }
> > > > > 
> > > > > tick_nohz_irq_exit() {
> > > > >         rcu_idle_enter() {
> > > > >                 rcu_enter_nohz();
> > > > >         }
> > > > > }
> > > > > 
> > > > > Because rcu ext qs is something used by idle, not the opposite.
> > 
> > Re-reading this makes me realize that I would instead say that idle
> > is an example of an RCU extended quiescent state, or that the rcu_ext_qs
> > argument to the various functions is used to indicate whether or not
> > we are immediately entering/leaving idle from RCU's viewpoint.
> > 
> > So what were you really trying to say here?  ;-)
> 
> I was thinking about the fact that idle is a caller of rcu_enter_nohz().
> And there may be more callers of it in the future. So I thought it may
> be better to keep rcu_enter_nohz() idle-agnostic.
> 
> But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
> from the right places other than from rcu_enter/exit_nohz().
> We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
> on the first interrupt level in idle.
> 
> So I can change that easily for the nohz cpusets.

Heh!  From what I can see, we were both wrong!

My thought at this point is to make it so that rcu_enter_nohz() and
rcu_exit_nohz() are renamed to rcu_enter_idle() and rcu_exit_idle()
respectively.  I drop the per-CPU variable and the added functions
from one of my patches.  These functions, along with rcu_irq_enter(),
rcu_irq_exit(), rcu_nmi_enter(), and rcu_nmi_exit(), are moved out from
under CONFIG_NO_HZ.  This allows these functions to track idle state
regardless of the setting of CONFIG_NO_HZ.  It also separates the state
of the scheduling-clock tick from RCU's view of CPU idleness, which
simplifies things.

I will put something together along these lines.

> > > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > > the irq nesting required to correctly decide whether or not we are going
> > > > to really go to idle state.  Furthermore, there are cases where we
> > > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > > as well.
> > > > 
> > > > Now, it is quite possible that I am suffering a senior moment and just
> > > > failing to see how to structure this in the design where rcu_idle_enter()
> > > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > > structure this so that it works correctly.
> > > > 
> > > > Please feel free to enlighten me!
> > > 
> > > Ah I realize that you want to call rcu_idle_exit() when we enter
> > > the first level interrupt and rcu_idle_enter() when we exit it
> > > to return to idle loop.
> > > 
> > > But we use that check:
> > > 
> > > 	if (user ||
> > > 	    (rcu_is_cpu_idle() &&
> > >  	     !in_softirq() &&
> > >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > >  		rcu_sched_qs(cpu);
> > > 
> > > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > > in another interrupt.
> > 
> > But I would like to enable checks for entering/exiting idle while
> > within an RCU read-side critical section. The idea is to move
> > the checks from their currently somewhat problematic location in
> > rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> > thought is to move them to rcu_enter_nohz() and rcu_exit_nohz(), near the
> > calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> 
> So, checking if we are calling rcu_idle_enter() while in an RCU
> read side critical section?
> 
> But we already have checks that RCU read side API are not called in
> extended quiescent state.

Both checks are good.  The existing checks catch this kind of error:

1.	CPU 0 goes idle, entering an RCU extended quiescent state.
2.	CPU 0 illegally enters an RCU read-side critical section.

The new check catches this kind of error:

1.	CPU 0 enters an RCU read-side critical section.
2.	CPU 0 goes idle, entering an RCU extended quiescent state,
	but illegally so because it is still in an RCU read-side
	critical section.

> > This would mean that they operated only in NO_HZ kernels with lockdep
> > enabled, but I am good with that because to do otherwise would require
> > adding nesting-level counters to the non-NO_HZ case, which I would like
> > to avoid, especially for TINY_RCU.

And my reworking of RCU's NO_HZ code to instead be idle code removes
the NO_HZ-only restriction.  Getting rid of the additional per-CPU
variable reduces the TINY_RCU overhead to acceptable levels.

> There can be a secondary check in rcu_read_lock_held() and friends to
> ensures that rcu_is_idle_cpu(). In the non-NO_HZ case it's useful to
> find similar issues.
> 
> In fact we could remove the check for rcu_extended_qs() in read side
> APIs and check instead rcu_is_idle_cpu(). That would work in any
> config and not only NO_HZ.
> 
> But I hope we can actually keep the check for RCU extended quiescent
> state so that when rcu_enter_nohz() is called from other places than
> idle, we are ready for it.
> 
> I believe it's fine to have both checks in PROVE_RCU.

Agreed, I have not yet revisited rcu_extended_qs(), but some change
might be useful.

> > > That said we found RCU uses after we decrement the hardirq offset and until
> > > we reach rcu_irq_exit(). So rcu_check_callbacks() may miss these places
> > > and account spurious quiescent states.
> > > 
> > > But between sub_preempt_count() and rcu_irq_exit(), irqs are disabled
> > > AFAIK so we can't be interrupted by rcu_check_callbacks(), except during the
> > > softirqs processing. But we have that ordering:
> > > 
> > > add_preempt_count(SOFTIRQ_OFFSET)
> > > local_irq_enable()
> > > 
> > > do softirqs
> > > 
> > > local_irq_disable()
> > > sub_preempt_count(SOFTIRQ_OFFSET)
> > > 
> > > So the !in_softirq() check covers us during the time we process softirqs.
> > > 
> > > The only assumption we need is that there is no place between
> > > sub_preempt_count(IRQ_EXIT_OFFSET) and rcu_irq_exit() that has
> > > irqs enabled and that is an rcu read side critical section.
> > > 
> > > I'm not aware of any automatic check to ensure that though.
> > 
> > Nor am I, which is why I am looking to the checks in
> > rcu_enter_nohz() and rcu_exit_nohz() called out above.
> 
> Yep.
> 
> > > Anyway, the delta patch looks good.
> > 
> > OK, my current plans are to start forward-porting to -rc8, and I would
> > like to have this pair of delta patches or something like them pulled
> > into your stack.
> 
> Sure I can take your patches (I'm going to merge the delta into the first).
> But if you want a rebase against -rc8, it's going to be easier if you
> do that rebase on the branch you want me to work on. Then I work on top
> of it.
> 
> For example we can take your rcu/dynticks, rewind to
> "rcu: Make synchronize_sched_expedited() better at work sharing"
> 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> and I rebase my patches (yours included) on top of it and I repost.
> 
> Right?

Yep!  Your earlier three patches look to need some extended-quiescent-state
rework as well:

b5566f3d: Detect illegal rcu dereference in extended quiescent state
ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state

So I will leave these out and let you rebase them.

> > >                                     Just a little thing:
> > > 
> > > > -void tick_nohz_idle_exit(void)
> > > > +void tick_nohz_idle_exit(bool rcu_ext_qs)
> > > 
> > > It becomes weird to have both idle_enter/idle_exit having
> > > that parameter.
> > > 
> > > Would it make sense to have tick_nohz_idle_[exit|enter]_norcu()
> > > and a version without norcu?
> > 
> > Given that we need to make this work in CONFIG_NO_HZ=n kernels, I believe
> > that the current API is OK.  But if you would like to change the API
> > during the forward-port to -rc8, I am also OK with the alternative API
> > you suggest.
> 
> Fine. I'll do that rename.

Works for me!  ;-)

							Thanx, Paul



* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29 17:12                                 ` Paul E. McKenney
@ 2011-09-29 17:19                                   ` Paul E. McKenney
  2011-09-29 23:18                                     ` Paul E. McKenney
  2011-09-30 13:11                                   ` Frederic Weisbecker
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-29 17:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:

[ . . . ]

> > Sure I can take your patches (I'm going to merge the delta into the first).
> > But if you want a rebase against -rc8, it's going to be easier if you
> > do that rebase on the branch you want me to work on. Then I work on top
> > of it.
> > 
> > For example we can take your rcu/dynticks, rewind to
> > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > and I rebase my patches (yours included) on top of it and I repost.
> > 
> > Right?
> 
> Yep!  Your earlier three patches look to need some extended-quiescent-state
> rework as well:
> 
> b5566f3d: Detect illegal rcu dereference in extended quiescent state
> ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> 
> So I will leave these out and let you rebase them.

And for whatever it is worth, my following patches depend on yours as
well, so I will not include them in the rebase:

61cf7640: Remove one layer of abstraction from PROVE_RCU checking
9a4d1ce3: Warn when srcu_read_lock() is used in an extended quiescent state

							Thanx, Paul


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29 17:19                                   ` Paul E. McKenney
@ 2011-09-29 23:18                                     ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-29 23:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Thu, Sep 29, 2011 at 10:19:01AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> 
> [ . . . ]
> 
> > > Sure I can take your patches (I'm going to merge the delta into the first).
> > > But if you want a rebase against -rc8, it's going to be easier if you
> > > do that rebase on the branch you want me to work on. Then I work on top
> > > of it.
> > > 
> > > For example we can take your rcu/dynticks, rewind to
> > > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > > and I rebase my patches (yours included) on top of it and I repost.
> > > 
> > > Right?
> > 
> > Yep!  Your earlier three patches look to need some extended-quiescent-state
> > rework as well:
> > 
> > b5566f3d: Detect illegal rcu dereference in extended quiescent state
> > ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> > fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> > 
> > So I will leave these out and let you rebase them.
> 
> And for whatever it is worth, my following patches depend on yours as
> well, so I will not include them in the rebase:
> 
> 61cf7640: Remove one layer of abstraction from PROVE_RCU checking
> 9a4d1ce3: Warn when srcu_read_lock() is used in an extended quiescent state

And the new stack passes moderate rcutorture testing, and is available
at https://github.com/paulmckrcu/linux on branch rcu/next.

							Thanx, Paul


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-29 17:12                                 ` Paul E. McKenney
  2011-09-29 17:19                                   ` Paul E. McKenney
@ 2011-09-30 13:11                                   ` Frederic Weisbecker
  2011-09-30 15:29                                     ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-09-30 13:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> > I was thinking about the fact that idle is a caller of rcu_enter_nohz().
> > And there may be more callers of it in the future. So I thought it may
> > be better to keep rcu_enter_nohz() idle-agnostic.
> > 
> > But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
> > from the right places other than from rcu_enter/exit_nohz().
> > We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
> > on the first interrupt level in idle.
> > 
> > So I can change that easily for the nohz cpusets.
> 
> Heh!  From what I can see, we were both wrong!
> 
> My thought at this point is to make it so that rcu_enter_nohz() and
> rcu_exit_nohz() are renamed to rcu_enter_idle() and rcu_exit_idle()
> respectively.  I drop the per-CPU variable and the added functions
> from one of my patches.  These functions, along with rcu_irq_enter(),
> rcu_irq_exit(), rcu_nmi_enter(), and rcu_nmi_exit(), are moved out from
> under CONFIG_NO_HZ.  This allows these functions to track idle state
> regardless of the setting of CONFIG_NO_HZ.  It also separates the state
> of the scheduling-clock tick from RCU's view of CPU idleness, which
> simplifies things.
> 
> I will put something together along these lines.

Should I wait for your updated patch before rebasing?

> 
> > > > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > > > the irq nesting required to correctly decide whether or not we are going
> > > > > to really go to idle state.  Furthermore, there are cases where we
> > > > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > > > as well.
> > > > > 
> > > > > Now, it is quite possible that I am suffering a senior moment and just
> > > > > failing to see how to structure this in the design where rcu_idle_enter()
> > > > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > > > structure this so that it works correctly.
> > > > > 
> > > > > Please feel free to enlighten me!
> > > > 
> > > > Ah I realize that you want to call rcu_idle_exit() when we enter
> > > > the first level interrupt and rcu_idle_enter() when we exit it
> > > > to return to idle loop.
> > > > 
> > > > But we use that check:
> > > > 
> > > > 	if (user ||
> > > > 	    (rcu_is_cpu_idle() &&
> > > >  	     !in_softirq() &&
> > > >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > >  		rcu_sched_qs(cpu);
> > > > 
> > > > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > > > in another interrupt.
> > > 
> > > But I would like to enable checks for entering/exiting idle while
> > > within an RCU read-side critical section. The idea is to move
> > > the checks from their currently somewhat problematic location in
> > > rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> > > thought is to move them to rcu_enter_nohz() and rcu_exit_nohz() near the
> > > calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> > 
> > So, checking if we are calling rcu_idle_enter() while in an RCU
> > read side critical section?
> > 
> > But we already have checks that RCU read side API are not called in
> > extended quiescent state.
> 
> Both checks are good.  The existing checks catch this kind of error:
> 
> 1.	CPU 0 goes idle, entering an RCU extended quiescent state.
> 2.	CPU 0 illegally enters an RCU read-side critical section.
> 
> The new check catches this kind of error:
> 
> 1.	CPU 0 enters an RCU read-side critical section.
> 2.	CPU 0 goes idle, entering an RCU extended quiescent state,
> 	but illegally so because it is still in an RCU read-side
> 	critical section.

Right.

> 
> > > This would mean that they operated only in NO_HZ kernels with lockdep
> > > enabled, but I am good with that because to do otherwise would require
> > > adding nesting-level counters to the non-NO_HZ case, which I would like
> > > to avoid, especially for TINY_RCU.
> 
> And my reworking of RCU's NO_HZ code to instead be idle code removes
> the NO_HZ-only restriction.  Getting rid of the additional per-CPU
> variable reduces the TINY_RCU overhead to acceptable levels.
> 
> > There can be a secondary check in rcu_read_lock_held() and friends to
> > check rcu_is_idle_cpu(). In the non-NO_HZ case it's useful to
> > find similar issues.
> > 
> > In fact we could remove the check for rcu_extended_qs() in read side
> > APIs and check instead rcu_is_idle_cpu(). That would work in any
> > config and not only NO_HZ.
> > 
> > But I hope we can actually keep the check for RCU extended quiescent
> > state so that when rcu_enter_nohz() is called from other places than
> > idle, we are ready for it.
> > 
> > I believe it's fine to have both checks in PROVE_RCU.
> 
> Agreed, I have not yet revisited rcu_extended_qs(), but some change
> might be useful.

Yep.

> > > OK, my current plans are to start forward-porting to -rc8, and I would
> > > like to have this pair of delta patches or something like them pulled
> > > into your stack.
> > 
> > Sure I can take your patches (I'm going to merge the delta into the first).
> > But if you want a rebase against -rc8, it's going to be easier if you
> > do that rebase on the branch you want me to work on. Then I work on top
> > of it.
> > 
> > For example we can take your rcu/dynticks, rewind to
> > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > and I rebase my patches (yours included) on top of it and I repost.
> > 
> > Right?
> 
> Yep!  Your earlier three patches look to need some extended-quiescent-state
> rework as well:
> 
> b5566f3d: Detect illegal rcu dereference in extended quiescent state
> ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> 
> So I will leave these out and let you rebase them.

Fine. Just need to know if they need an update against a patch from you
that is to come or something.


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 13:11                                   ` Frederic Weisbecker
@ 2011-09-30 15:29                                     ` Paul E. McKenney
  2011-09-30 19:24                                       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-30 15:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 03:11:09PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> > > I was thinking about the fact that idle is a caller of rcu_enter_nohz().
> > > And there may be more callers of it in the future. So I thought it may
> > > be better to keep rcu_enter_nohz() idle-agnostic.
> > > 
> > > But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
> > > from the right places other than from rcu_enter/exit_nohz().
> > > We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
> > > on the first interrupt level in idle.
> > > 
> > > So I can change that easily for the nohz cpusets.
> > 
> > Heh!  From what I can see, we were both wrong!
> > 
> > My thought at this point is to make it so that rcu_enter_nohz() and
> > rcu_exit_nohz() are renamed to rcu_enter_idle() and rcu_exit_idle()
> > respectively.  I drop the per-CPU variable and the added functions
> > from one of my patches.  These functions, along with rcu_irq_enter(),
> > rcu_irq_exit(), rcu_nmi_enter(), and rcu_nmi_exit(), are moved out from
> > under CONFIG_NO_HZ.  This allows these functions to track idle state
> > regardless of the setting of CONFIG_NO_HZ.  It also separates the state
> > of the scheduling-clock tick from RCU's view of CPU idleness, which
> > simplifies things.
> > 
> > I will put something together along these lines.
> 
> Should I wait for your updated patch before rebasing?

Gah!!!  I knew I was forgetting something!  I will get that out.

> > > > > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > > > > the irq nesting required to correctly decide whether or not we are going
> > > > > > to really go to idle state.  Furthermore, there are cases where we
> > > > > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > > > > as well.
> > > > > > 
> > > > > > Now, it is quite possible that I am suffering a senior moment and just
> > > > > > failing to see how to structure this in the design where rcu_idle_enter()
> > > > > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > > > > structure this so that it works correctly.
> > > > > > 
> > > > > > Please feel free to enlighten me!
> > > > > 
> > > > > Ah I realize that you want to call rcu_idle_exit() when we enter
> > > > > the first level interrupt and rcu_idle_enter() when we exit it
> > > > > to return to idle loop.
> > > > > 
> > > > > But we use that check:
> > > > > 
> > > > > 	if (user ||
> > > > > 	    (rcu_is_cpu_idle() &&
> > > > >  	     !in_softirq() &&
> > > > >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > > >  		rcu_sched_qs(cpu);
> > > > > 
> > > > > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > > > > in another interrupt.
> > > > 
> > > > But I would like to enable checks for entering/exiting idle while
> > > > within an RCU read-side critical section. The idea is to move
> > > > the checks from their currently somewhat problematic location in
> > > > rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> > > > thought is to move them to rcu_enter_nohz() and rcu_exit_nohz() near the
> > > > calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> > > 
> > > So, checking if we are calling rcu_idle_enter() while in an RCU
> > > read side critical section?
> > > 
> > > But we already have checks that RCU read side API are not called in
> > > extended quiescent state.
> > 
> > Both checks are good.  The existing checks catch this kind of error:
> > 
> > 1.	CPU 0 goes idle, entering an RCU extended quiescent state.
> > 2.	CPU 0 illegally enters an RCU read-side critical section.
> > 
> > The new check catches this kind of error:
> > 
> > 1.	CPU 0 enters an RCU read-side critical section.
> > 2.	CPU 0 goes idle, entering an RCU extended quiescent state,
> > 	but illegally so because it is still in an RCU read-side
> > 	critical section.
> 
> Right.
> 
> > 
> > > > This would mean that they operated only in NO_HZ kernels with lockdep
> > > > enabled, but I am good with that because to do otherwise would require
> > > > adding nesting-level counters to the non-NO_HZ case, which I would like
> > > > to avoid, especially for TINY_RCU.
> > 
> > And my reworking of RCU's NO_HZ code to instead be idle code removes
> > the NO_HZ-only restriction.  Getting rid of the additional per-CPU
> > variable reduces the TINY_RCU overhead to acceptable levels.
> > 
> > > There can be a secondary check in rcu_read_lock_held() and friends to
> > > check rcu_is_idle_cpu(). In the non-NO_HZ case it's useful to
> > > find similar issues.
> > > 
> > > In fact we could remove the check for rcu_extended_qs() in read side
> > > APIs and check instead rcu_is_idle_cpu(). That would work in any
> > > config and not only NO_HZ.
> > > 
> > > But I hope we can actually keep the check for RCU extended quiescent
> > > state so that when rcu_enter_nohz() is called from other places than
> > > idle, we are ready for it.
> > > 
> > > I believe it's fine to have both checks in PROVE_RCU.
> > 
> > Agreed, I have not yet revisited rcu_extended_qs(), but some change
> > might be useful.
> 
> Yep.
> 
> > > > OK, my current plans are to start forward-porting to -rc8, and I would
> > > > like to have this pair of delta patches or something like them pulled
> > > > into your stack.
> > > 
> > > Sure I can take your patches (I'm going to merge the delta into the first).
> > > But if you want a rebase against -rc8, it's going to be easier if you
> > > do that rebase on the branch you want me to work on. Then I work on top
> > > of it.
> > > 
> > > For example we can take your rcu/dynticks, rewind to
> > > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > > and I rebase my patches (yours included) on top of it and I repost.
> > > 
> > > Right?
> > 
> > Yep!  Your earlier three patches look to need some extended-quiescent-state
> > rework as well:
> > 
> > b5566f3d: Detect illegal rcu dereference in extended quiescent state
> > ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> > fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> > 
> > So I will leave these out and let you rebase them.
> 
> Fine. Just need to know if they need an update against a patch from you
> that is to come or something.

I am on it, apologies for the delay!

							Thanx, Paul


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 15:29                                     ` Paul E. McKenney
@ 2011-09-30 19:24                                       ` Paul E. McKenney
  2011-10-01  4:34                                         ` Paul E. McKenney
                                                           ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-09-30 19:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 08:29:46AM -0700, Paul E. McKenney wrote:
> On Fri, Sep 30, 2011 at 03:11:09PM +0200, Frederic Weisbecker wrote:
> > On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> > > > I was thinking about the fact that idle is a caller of rcu_enter_nohz().
> > > > And there may be more callers of it in the future. So I thought it may
> > > > be better to keep rcu_enter_nohz() idle-agnostic.
> > > > 
> > > > But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
> > > > from the right places other than from rcu_enter/exit_nohz().
> > > > We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
> > > > on the first interrupt level in idle.
> > > > 
> > > > So I can change that easily for the nohz cpusets.
> > > 
> > > Heh!  From what I can see, we were both wrong!
> > > 
> > > My thought at this point is to make it so that rcu_enter_nohz() and
> > > rcu_exit_nohz() are renamed to rcu_enter_idle() and rcu_exit_idle()
> > > respectively.  I drop the per-CPU variable and the added functions
> > > from one of my patches.  These functions, along with rcu_irq_enter(),
> > > rcu_irq_exit(), rcu_nmi_enter(), and rcu_nmi_exit(), are moved out from
> > > under CONFIG_NO_HZ.  This allows these functions to track idle state
> > > regardless of the setting of CONFIG_NO_HZ.  It also separates the state
> > > of the scheduling-clock tick from RCU's view of CPU idleness, which
> > > simplifies things.
> > > 
> > > I will put something together along these lines.
> > 
> > Should I wait for your updated patch before rebasing?
> 
> Gah!!!  I knew I was forgetting something!  I will get that out.
> 
> > > > > > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > > > > > the irq nesting required to correctly decide whether or not we are going
> > > > > > > to really go to idle state.  Furthermore, there are cases where we
> > > > > > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > > > > > as well.
> > > > > > > 
> > > > > > > Now, it is quite possible that I am suffering a senior moment and just
> > > > > > > failing to see how to structure this in the design where rcu_idle_enter()
> > > > > > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > > > > > structure this so that it works correctly.
> > > > > > > 
> > > > > > > Please feel free to enlighten me!
> > > > > > 
> > > > > > Ah I realize that you want to call rcu_idle_exit() when we enter
> > > > > > the first level interrupt and rcu_idle_enter() when we exit it
> > > > > > to return to idle loop.
> > > > > > 
> > > > > > But we use that check:
> > > > > > 
> > > > > > 	if (user ||
> > > > > > 	    (rcu_is_cpu_idle() &&
> > > > > >  	     !in_softirq() &&
> > > > > >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > > > >  		rcu_sched_qs(cpu);
> > > > > > 
> > > > > > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > > > > > in another interrupt.
> > > > > 
> > > > > But I would like to enable checks for entering/exiting idle while
> > > > > within an RCU read-side critical section. The idea is to move
> > > > > the checks from their currently somewhat problematic location in
> > > > > rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> > > > > thought is to move them to rcu_enter_nohz() and rcu_exit_nohz() near the
> > > > > calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> > > > 
> > > > So, checking if we are calling rcu_idle_enter() while in an RCU
> > > > read side critical section?
> > > > 
> > > > But we already have checks that RCU read side API are not called in
> > > > extended quiescent state.
> > > 
> > > Both checks are good.  The existing checks catch this kind of error:
> > > 
> > > 1.	CPU 0 goes idle, entering an RCU extended quiescent state.
> > > 2.	CPU 0 illegally enters an RCU read-side critical section.
> > > 
> > > The new check catches this kind of error:
> > > 
> > > 1.	CPU 0 enters an RCU read-side critical section.
> > > 2.	CPU 0 goes idle, entering an RCU extended quiescent state,
> > > 	but illegally so because it is still in an RCU read-side
> > > 	critical section.
> > 
> > Right.
> > 
> > > 
> > > > > This would mean that they operated only in NO_HZ kernels with lockdep
> > > > > enabled, but I am good with that because to do otherwise would require
> > > > > adding nesting-level counters to the non-NO_HZ case, which I would like
> > > > > to avoid, especially for TINY_RCU.
> > > 
> > > And my reworking of RCU's NO_HZ code to instead be idle code removes
> > > the NO_HZ-only restriction.  Getting rid of the additional per-CPU
> > > variable reduces the TINY_RCU overhead to acceptable levels.
> > > 
> > > > There can be a secondary check in rcu_read_lock_held() and friends to
> > > > check rcu_is_idle_cpu(). In the non-NO_HZ case it's useful to
> > > > find similar issues.
> > > > 
> > > > In fact we could remove the check for rcu_extended_qs() in read side
> > > > APIs and check instead rcu_is_idle_cpu(). That would work in any
> > > > config and not only NO_HZ.
> > > > 
> > > > But I hope we can actually keep the check for RCU extended quiescent
> > > > state so that when rcu_enter_nohz() is called from other places than
> > > > idle, we are ready for it.
> > > > 
> > > > I believe it's fine to have both checks in PROVE_RCU.
> > > 
> > > Agreed, I have not yet revisited rcu_extended_qs(), but some change
> > > might be useful.
> > 
> > Yep.
> > 
> > > > > OK, my current plans are to start forward-porting to -rc8, and I would
> > > > > like to have this pair of delta patches or something like them pulled
> > > > > into your stack.
> > > > 
> > > > Sure I can take your patches (I'm going to merge the delta into the first).
> > > > But if you want a rebase against -rc8, it's going to be easier if you
> > > > do that rebase on the branch you want me to work on. Then I work on top
> > > > of it.
> > > > 
> > > > For example we can take your rcu/dynticks, rewind to
> > > > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > > > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > > > and I rebase my patches (yours included) on top of it and I repost.
> > > > 
> > > > Right?
> > > 
> > > Yep!  Your earlier three patches look to need some extended-quiescent-state
> > > rework as well:
> > > 
> > > b5566f3d: Detect illegal rcu dereference in extended quiescent state
> > > ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> > > fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> > > 
> > > So I will leave these out and let you rebase them.
> > 
> > Fine. Just need to know if they need an update against a patch from you
> > that is to come or something.
> 
> I am on it, apologies for the delay!

And here is a first cut, probably totally broken, but a start.

With this change, I am wondering about tick_nohz_stop_sched_tick()'s
invocation of rcu_idle_enter() -- this now needs to be called regardless
of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
it looks like we should -not- call rcu_idle_enter().

I eventually just left the rcu_idle_enter() calls in their current
places due to paranoia about messing up and ending up with unbalanced
rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
make this work better?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Track idleness independent of idle tasks

Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of each idle-loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked at
the end of an idle-loop iteration.  This allows the idle task to use RCU
everywhere except between consecutive rcu_idle_enter() and rcu_idle_exit()
calls, in turn allowing architecture maintainers to specify where it is
permissible to use RCU.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index aaf65f6..49587ab 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -105,14 +105,10 @@ o	"dt" is the current value of the dyntick counter that is incremented
 	or one greater than the interrupt-nesting depth otherwise.
 	The number after the second "/" is the NMI nesting depth.
 
-	This field is displayed only for CONFIG_NO_HZ kernels.
-
 o	"df" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being in
 	dynticks-idle state.
 
-	This field is displayed only for CONFIG_NO_HZ kernels.
-
 o	"of" is the number of times that some other CPU has forced a
 	quiescent state on behalf of this CPU due to this CPU being
 	offline.  In a perfect world, this might never happen, but it
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index f743883..bb7f309 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,20 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
 extern void account_system_vtime(struct task_struct *tsk);
 #endif
 
-#if defined(CONFIG_NO_HZ)
 #if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-static inline void rcu_irq_enter(void)
-{
-	rcu_exit_nohz();
-}
-
-static inline void rcu_irq_exit(void)
-{
-	rcu_enter_nohz();
-}
 
 static inline void rcu_nmi_enter(void)
 {
@@ -163,17 +150,9 @@ static inline void rcu_nmi_exit(void)
 }
 
 #else
-extern void rcu_irq_enter(void);
-extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
 #endif
-#else
-# define rcu_irq_enter() do { } while (0)
-# define rcu_irq_exit() do { } while (0)
-# define rcu_nmi_enter() do { } while (0)
-# define rcu_nmi_exit() do { } while (0)
-#endif /* #if defined(CONFIG_NO_HZ) */
 
 /*
  * It is safe to do non-atomic ops on ->hardirq_context,
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 2cf4226..a90a850 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -177,23 +177,8 @@ extern void rcu_sched_qs(int cpu);
 extern void rcu_bh_qs(int cpu);
 extern void rcu_check_callbacks(int cpu, int user);
 struct notifier_block;
-
-#ifdef CONFIG_NO_HZ
-
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-#else /* #ifdef CONFIG_NO_HZ */
-
-static inline void rcu_enter_nohz(void)
-{
-}
-
-static inline void rcu_exit_nohz(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_NO_HZ */
+extern void rcu_idle_enter(void);
+extern void rcu_idle_exit(void);
 
 /*
  * Infrastructure to implement the synchronize_() primitives in
diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..35d2ffc 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -127,8 +127,14 @@ extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline void tick_nohz_stop_sched_tick(int inidle)
+{
+	rcu_idle_enter();
+}
+static inline void tick_nohz_restart_sched_tick(void)
+{
+	rcu_idle_exit();
+}
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
 	ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index da775c8..8b9b9d3 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -54,31 +54,47 @@ static void __call_rcu(struct rcu_head *head,
 
 #include "rcutiny_plugin.h"
 
-#ifdef CONFIG_NO_HZ
-
 static long rcu_dynticks_nesting = 1;
 
 /*
- * Enter dynticks-idle mode, which is an extended quiescent state
- * if we have fully entered that mode (i.e., if the new value of
- * dynticks_nesting is zero).
+ * Enter idle, which is an extended quiescent state if we have fully
+ * entered that mode (i.e., if the new value of dynticks_nesting is zero).
  */
-void rcu_enter_nohz(void)
+void rcu_idle_enter(void)
 {
 	if (--rcu_dynticks_nesting == 0)
 		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
 }
 
 /*
- * Exit dynticks-idle mode, so that we are no longer in an extended
- * quiescent state.
+ * Exit idle, so that we are no longer in an extended quiescent state.
  */
-void rcu_exit_nohz(void)
+void rcu_idle_exit(void)
 {
 	rcu_dynticks_nesting++;
 }
 
-#endif /* #ifdef CONFIG_NO_HZ */
+#ifdef CONFIG_PROVE_RCU
+
+/*
+ * Test whether the current CPU is idle.
+ */
+int rcu_is_cpu_idle(void)
+{
+	return !rcu_dynticks_nesting;
+}
+
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
+/*
+ * Test whether the current CPU was interrupted from idle.  Nested
+ * interrupts don't count, we must be running at the first interrupt
+ * level.
+ */
+int rcu_is_cpu_rrupt_from_idle(void)
+{
+	return rcu_dynticks_nesting <= 0;
+}
 
 /*
  * Helper function for rcu_sched_qs() and rcu_bh_qs().
@@ -131,10 +147,7 @@ void rcu_bh_qs(int cpu)
  */
 void rcu_check_callbacks(int cpu, int user)
 {
-	if (user ||
-	    (idle_cpu(cpu) &&
-	     !in_softirq() &&
-	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
+	if (user || rcu_is_cpu_rrupt_from_idle())
 		rcu_sched_qs(cpu);
 	else if (!in_softirq())
 		rcu_bh_qs(cpu);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cb7c46e..56cc18f 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -195,12 +195,10 @@ void rcu_note_context_switch(int cpu)
 }
 EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
-#ifdef CONFIG_NO_HZ
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = 1,
 	.dynticks = ATOMIC_INIT(1),
 };
-#endif /* #ifdef CONFIG_NO_HZ */
 
 static int blimit = 10;		/* Maximum callbacks per rcu_do_batch. */
 static int qhimark = 10000;	/* If this many pending, ignore blimit. */
@@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
 		return 1;
 	}
 
-	/* If preemptible RCU, no point in sending reschedule IPI. */
-	if (rdp->preemptible)
-		return 0;
-
-	/* The CPU is online, so send it a reschedule IPI. */
+	/*
+	 * The CPU is online, so send it a reschedule IPI.  This forces
+	 * it through the scheduler, and (inefficiently) also handles cases
+	 * where idle loops fail to inform RCU about the CPU being idle.
+	 */
 	if (rdp->cpu != smp_processor_id())
 		smp_send_reschedule(rdp->cpu);
 	else
@@ -343,17 +341,15 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
 
 #endif /* #ifdef CONFIG_SMP */
 
-#ifdef CONFIG_NO_HZ
-
 /**
- * rcu_enter_nohz - inform RCU that current CPU is entering nohz
+ * rcu_idle_enter - inform RCU that current CPU is entering idle
  *
- * Enter nohz mode, in other words, -leave- the mode in which RCU
+ * Enter idle mode, in other words, -leave- the mode in which RCU
  * read-side critical sections can occur.  (Though RCU read-side
- * critical sections can occur in irq handlers in nohz mode, a possibility
+ * critical sections can occur in irq handlers in idle, a possibility
  * handled by rcu_irq_enter() and rcu_irq_exit()).
  */
-void rcu_enter_nohz(void)
+void rcu_idle_enter(void)
 {
 	unsigned long flags;
 	struct rcu_dynticks *rdtp;
@@ -374,12 +370,12 @@ void rcu_enter_nohz(void)
 }
 
 /*
- * rcu_exit_nohz - inform RCU that current CPU is leaving nohz
+ * rcu_idle_exit - inform RCU that current CPU is leaving idle
  *
- * Exit nohz mode, in other words, -enter- the mode in which RCU
+ * Exit idle, in other words, -enter- the mode in which RCU
  * read-side critical sections normally occur.
  */
-void rcu_exit_nohz(void)
+void rcu_idle_exit(void)
 {
 	unsigned long flags;
 	struct rcu_dynticks *rdtp;
@@ -442,27 +438,32 @@ void rcu_nmi_exit(void)
 	WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
 }
 
+#ifdef CONFIG_PROVE_RCU
+
 /**
- * rcu_irq_enter - inform RCU of entry to hard irq context
+ * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
  *
- * If the CPU was idle with dynamic ticks active, this updates the
- * rdtp->dynticks to let the RCU handling know that the CPU is active.
+ * If the current CPU is in its idle loop and is neither in an interrupt
+ * or NMI handler, return true.  The caller must have at least disabled
+ * preemption.
  */
-void rcu_irq_enter(void)
+int rcu_is_cpu_idle(void)
 {
-	rcu_exit_nohz();
+	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
 }
 
+#endif /* #ifdef CONFIG_PROVE_RCU */
+
 /**
- * rcu_irq_exit - inform RCU of exit from hard irq context
+ * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
  *
- * If the CPU was idle with dynamic ticks active, update the rdp->dynticks
- * to put let the RCU handling be aware that the CPU is going back to idle
- * with no ticks.
+ * If the current CPU is idle or running at a first-level (not nested)
+ * interrupt from idle, return true.  The caller must have at least
+ * disabled preemption.
  */
-void rcu_irq_exit(void)
+int rcu_is_cpu_rrupt_from_idle(void)
 {
-	rcu_enter_nohz();
+	return (__get_cpu_var(rcu_dynticks).dynticks_nesting & 0x1) <= 1;
 }
 
 #ifdef CONFIG_SMP
@@ -512,24 +513,6 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 
 #endif /* #ifdef CONFIG_SMP */
 
-#else /* #ifdef CONFIG_NO_HZ */
-
-#ifdef CONFIG_SMP
-
-static int dyntick_save_progress_counter(struct rcu_data *rdp)
-{
-	return 0;
-}
-
-static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
-{
-	return rcu_implicit_offline_qs(rdp);
-}
-
-#endif /* #ifdef CONFIG_SMP */
-
-#endif /* #else #ifdef CONFIG_NO_HZ */
-
 int rcu_cpu_stall_suppress __read_mostly;
 
 static void record_gp_stall_check_time(struct rcu_state *rsp)
@@ -1341,9 +1324,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 void rcu_check_callbacks(int cpu, int user)
 {
 	trace_rcu_utilization("Start scheduler-tick");
-	if (user ||
-	    (idle_cpu(cpu) && rcu_scheduler_active &&
-	     !in_softirq() && hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
+	if (user || rcu_is_cpu_rrupt_from_idle()) {
 
 		/*
 		 * Get here if this CPU took its interrupt from user
@@ -1913,9 +1894,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	for (i = 0; i < RCU_NEXT_SIZE; i++)
 		rdp->nxttail[i] = &rdp->nxtlist;
 	rdp->qlen = 0;
-#ifdef CONFIG_NO_HZ
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
-#endif /* #ifdef CONFIG_NO_HZ */
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 517f2f8..1f0221f 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -274,16 +274,12 @@ struct rcu_data {
 					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
 
-#ifdef CONFIG_NO_HZ
 	/* 3) dynticks interface. */
 	struct rcu_dynticks *dynticks;	/* Shared per-CPU dynticks state. */
 	int dynticks_snap;		/* Per-GP tracking for dynticks. */
-#endif /* #ifdef CONFIG_NO_HZ */
 
 	/* 4) reasons this CPU needed to be kicked by force_quiescent_state */
-#ifdef CONFIG_NO_HZ
 	unsigned long dynticks_fqs;	/* Kicked due to dynticks idle. */
-#endif /* #ifdef CONFIG_NO_HZ */
 	unsigned long offline_fqs;	/* Kicked due to being offline. */
 	unsigned long resched_ipi;	/* Sent a resched IPI. */
 
@@ -307,11 +303,7 @@ struct rcu_data {
 #define RCU_GP_INIT		1	/* Grace period being initialized. */
 #define RCU_SAVE_DYNTICK	2	/* Need to scan dyntick state. */
 #define RCU_FORCE_QS		3	/* Need to force quiescent state. */
-#ifdef CONFIG_NO_HZ
 #define RCU_SIGNAL_INIT		RCU_SAVE_DYNTICK
-#else /* #ifdef CONFIG_NO_HZ */
-#define RCU_SIGNAL_INIT		RCU_FORCE_QS
-#endif /* #else #ifdef CONFIG_NO_HZ */
 
 #define RCU_JIFFIES_TILL_FORCE_QS	 3	/* for rsp->jiffies_force_qs */
 
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 59c7bee..3b6a0bc 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -67,13 +67,11 @@ static void print_one_rcu_data(struct seq_file *m, struct rcu_data *rdp)
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesce, rdp->passed_quiesce_gpnum,
 		   rdp->qs_pending);
-#ifdef CONFIG_NO_HZ
 	seq_printf(m, " dt=%d/%d/%d df=%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi_nesting,
 		   rdp->dynticks_fqs);
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, " ql=%ld qs=%c%c%c%c",
 		   rdp->qlen,
@@ -141,13 +139,11 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp)
 		   rdp->completed, rdp->gpnum,
 		   rdp->passed_quiesce, rdp->passed_quiesce_gpnum,
 		   rdp->qs_pending);
-#ifdef CONFIG_NO_HZ
 	seq_printf(m, ",%d,%d,%d,%lu",
 		   atomic_read(&rdp->dynticks->dynticks),
 		   rdp->dynticks->dynticks_nesting,
 		   rdp->dynticks->dynticks_nmi_nesting,
 		   rdp->dynticks_fqs);
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi);
 	seq_printf(m, ",%ld,\"%c%c%c%c\"", rdp->qlen,
 		   ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] !=
@@ -171,9 +167,7 @@ static void print_one_rcu_data_csv(struct seq_file *m, struct rcu_data *rdp)
 static int show_rcudata_csv(struct seq_file *m, void *unused)
 {
 	seq_puts(m, "\"CPU\",\"Online?\",\"c\",\"g\",\"pq\",\"pgp\",\"pq\",");
-#ifdef CONFIG_NO_HZ
 	seq_puts(m, "\"dt\",\"dt nesting\",\"dt NMI nesting\",\"df\",");
-#endif /* #ifdef CONFIG_NO_HZ */
 	seq_puts(m, "\"of\",\"ri\",\"ql\",\"qs\"");
 #ifdef CONFIG_RCU_BOOST
 	seq_puts(m, "\"kt\",\"ktl\"");
diff --git a/kernel/softirq.c b/kernel/softirq.c
index fca82c3..c0120d5 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -296,7 +296,7 @@ void irq_enter(void)
 {
 	int cpu = smp_processor_id();
 
-	rcu_irq_enter();
+	rcu_idle_exit();
 	if (idle_cpu(cpu) && !in_interrupt()) {
 		/*
 		 * Prevent raise_softirq from needlessly waking up ksoftirqd
@@ -347,7 +347,7 @@ void irq_exit(void)
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
-	rcu_irq_exit();
+	rcu_idle_enter();
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index eb98e55..d61b908 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -405,7 +405,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_enter_nohz();
+			rcu_idle_enter();
 		}
 
 		ts->idle_sleeps++;
@@ -514,7 +514,7 @@ void tick_nohz_restart_sched_tick(void)
 
 	ts->inidle = 0;
 
-	rcu_exit_nohz();
+	rcu_idle_exit();
 
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 19:24                                       ` Paul E. McKenney
@ 2011-10-01  4:34                                         ` Paul E. McKenney
  2011-10-01 12:24                                         ` Frederic Weisbecker
                                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-01  4:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> On Fri, Sep 30, 2011 at 08:29:46AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 30, 2011 at 03:11:09PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Sep 29, 2011 at 10:12:05AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 29, 2011 at 02:30:44PM +0200, Frederic Weisbecker wrote:
> > > > > I was thinking about the fact that idle is a caller of rcu_enter_nohz().
> > > > > And there may be more callers of it in the future. So I thought it may
> > > > > be better to keep rcu_enter_nohz() idle-agnostic.
> > > > > 
> > > > > But it's fine, there are other ways to call rcu_idle_enter()/rcu_idle_exit()
> > > > > from the right places other than from rcu_enter/exit_nohz().
> > > > > We have tick_check_idle() on irq entry and tick_nohz_irq_exit(), both are called
> > > > > on the first interrupt level in idle.
> > > > > 
> > > > > So I can change that easily for the nohz cpusets.
> > > > 
> > > > Heh!  From what I can see, we were both wrong!
> > > > 
> > > > My thought at this point is to make it so that rcu_enter_nohz() and
> > > > rcu_exit_nohz() are renamed to rcu_enter_idle() and rcu_exit_idle()
> > > > respectively.  I drop the per-CPU variable and the added functions
> > > > from one of my patches.  These functions, along with rcu_irq_enter(),
> > > > rcu_irq_exit(), rcu_nmi_enter(), and rcu_nmi_exit(), are moved out from
> > > > under CONFIG_NO_HZ.  This allows these functions to track idle state
> > > > regardless of the setting of CONFIG_NO_HZ.  It also separates the state
> > > > of the scheduling-clock tick from RCU's view of CPU idleness, which
> > > > simplifies things.
> > > > 
> > > > I will put something together along these lines.
> > > 
> > > Should I wait for your updated patch before rebasing?
> > 
> > Gah!!!  I knew I was forgetting something!  I will get that out.
> > 
> > > > > > > > The problem I have with this is that it is rcu_enter_nohz() that tracks
> > > > > > > > the irq nesting required to correctly decide whether or not we are going
> > > > > > > > to really go to idle state.  Furthermore, there are cases where we
> > > > > > > > do enter idle but do not enter nohz, and that has to be handled correctly
> > > > > > > > as well.
> > > > > > > > 
> > > > > > > > Now, it is quite possible that I am suffering a senior moment and just
> > > > > > > > failing to see how to structure this in the design where rcu_idle_enter()
> > > > > > > > invokes rcu_enter_nohz(), but regardless, I am failing to see how to
> > > > > > > > structure this so that it works correctly.
> > > > > > > > 
> > > > > > > > Please feel free to enlighten me!
> > > > > > > 
> > > > > > > Ah I realize that you want to call rcu_idle_exit() when we enter
> > > > > > > the first level interrupt and rcu_idle_enter() when we exit it
> > > > > > > to return to idle loop.
> > > > > > > 
> > > > > > > But we use that check:
> > > > > > > 
> > > > > > > 	if (user ||
> > > > > > > 	    (rcu_is_cpu_idle() &&
> > > > > > >  	     !in_softirq() &&
> > > > > > >  	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > > > > >  		rcu_sched_qs(cpu);
> > > > > > > 
> > > > > > > So we ensure that by the time we call rcu_check_callbacks(), we are not nesting
> > > > > > > in another interrupt.
> > > > > > 
> > > > > > But I would like to enable checks for entering/exiting idle while
> > > > > > within an RCU read-side critical section. The idea is to move
> > > > > > the checks from their currently somewhat problematic location in
> > > > > > rcu_needs_cpu_quick_check() to somewhere more sensible.  My current
> > > > > > thought is to move them rcu_enter_nohz() and rcu_exit_nohz() near the
> > > > > > calls to rcu_idle_enter() and rcu_idle_exit(), respectively.
> > > > > 
> > > > > So, checking if we are calling rcu_idle_enter() while in an RCU
> > > > > read side critical section?
> > > > > 
> > > > > But we already have checks that RCU read side API are not called in
> > > > > extended quiescent state.
> > > > 
> > > > Both checks are good.  The existing checks catch this kind of error:
> > > > 
> > > > 1.	CPU 0 goes idle, entering an RCU extended quiescent state.
> > > > 2.	CPU 0 illegally enters an RCU read-side critical section.
> > > > 
> > > > The new check catches this kind of error:
> > > > 
> > > > 1.	CPU 0 enters an RCU read-side critical section.
> > > > 2.	CPU 0 goes idle, entering an RCU extended quiescent state,
> > > > 	but illegally so because it is still in an RCU read-side
> > > > 	critical section.
> > > 
> > > Right.
> > > 
> > > > 
> > > > > > This would mean that they operated only in NO_HZ kernels with lockdep
> > > > > > enabled, but I am good with that because to do otherwise would require
> > > > > > adding nesting-level counters to the non-NO_HZ case, which I would like
> > > > > > to avoid, expecially for TINY_RCU.
> > > > 
> > > > And my reworking of RCU's NO_HZ code to instead be idle code removes
> > > > the NO_HZ-only restriction.  Getting rid of the additional per-CPU
> > > > variable reduces the TINY_RCU overhead to acceptable levels.
> > > > 
> > > > > There can be a secondary check in rcu_read_lock_held() and friends to
> > > > > ensures that rcu_is_idle_cpu(). In the non-NO_HZ case it's useful to
> > > > > find similar issues.
> > > > > 
> > > > > In fact we could remove the check for rcu_extended_qs() in read side
> > > > > APIs and check instead rcu_is_idle_cpu(). That would work in any
> > > > > config and not only NO_HZ.
> > > > > 
> > > > > But I hope we can actually keep the check for RCU extended quiescent
> > > > > state so that when rcu_enter_nohz() is called from other places than
> > > > > idle, we are ready for it.
> > > > > 
> > > > > I believe it's fine to have both checks in PROVE_RCU.
> > > > 
> > > > Agreed, I have not yet revisited rcu_extended_qs(), but some change
> > > > might be useful.
> > > 
> > > Yep.
> > > 
> > > > > > OK, my current plans are to start forward-porting to -rc8, and I would
> > > > > > like to have this pair of delta patches or something like them pulled
> > > > > > into your stack.
> > > > > 
> > > > > Sure I can take your patches (I'm going to merge the delta into the first).
> > > > > But if you want a rebase against -rc8, it's going to be easier if you
> > > > > do that rebase on the branch you want me to work on. Then I work on top
> > > > > of it.
> > > > > 
> > > > > For example we can take your rcu/dynticks, rewind to
> > > > > "rcu: Make synchronize_sched_expedited() better at work sharing"
> > > > > 771c326f20029a9f30b9a58237c9a5d5ddc1763d, rebase on top of -rc8
> > > > > and I rebase my patches (yours included) on top of it and I repost.
> > > > > 
> > > > > Right?
> > > > 
> > > > Yep!  Your earlier three patches look to need some extended-quiescent-state
> > > > rework as well:
> > > > 
> > > > b5566f3d: Detect illegal rcu dereference in extended quiescent state
> > > > ee05e5a4: Inform the user about dynticks-idle mode on PROVE_RCU warning
> > > > fa5d22cf: Warn when rcu_read_lock() is used in extended quiescent state
> > > > 
> > > > So I will leave these out and let you rebase them.
> > > 
> > > Fine. Just need to know if they need an update against a patch from you
> > > that is to come or something.
> > 
> > I am on it, apologies for the delay!
> 
> And here is a first cut, probably totally broken, but a start.
> 
> With this change, I am wondering about tick_nohz_stop_sched_tick()'s
> invocation of rcu_idle_enter() -- this now needs to be called regardless
> of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
> Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
> it looks like we should -not- call rcu_idle_enter().
> 
> I eventually just left the rcu_idle_enter() calls in their current
> places due to paranoia about messing up and ending up with unbalanced
> rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
> make this work better?

Well, rcutorture didn't like this one much.  Turns out that I messed
up the count balances in the NO_HZ=n case, and perhaps more besides.
I am now trying the following patch on top of my previous one.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 35d2ffc..ca40838 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -129,7 +129,8 @@ extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
 static inline void tick_nohz_stop_sched_tick(int inidle)
 {
-	rcu_idle_enter();
+	if (inidle)
+		rcu_idle_enter();
 }
 static inline void tick_nohz_restart_sched_tick(void)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d61b908..4692907 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -405,7 +405,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_idle_enter();
 		}
 
 		ts->idle_sleeps++;
@@ -444,6 +443,8 @@ out:
 	ts->last_jiffies = last_jiffies;
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 end:
+	if (inidle)
+		rcu_idle_enter();
 	local_irq_restore(flags);
 }
 
@@ -500,6 +501,7 @@ void tick_nohz_restart_sched_tick(void)
 	ktime_t now;
 
 	local_irq_disable();
+	rcu_idle_exit();
 	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
 		now = ktime_get();
 
@@ -514,8 +516,6 @@ void tick_nohz_restart_sched_tick(void)
 
 	ts->inidle = 0;
 
-	rcu_idle_exit();
-
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 19:24                                       ` Paul E. McKenney
  2011-10-01  4:34                                         ` Paul E. McKenney
@ 2011-10-01 12:24                                         ` Frederic Weisbecker
  2011-10-01 12:28                                           ` Frederic Weisbecker
  2011-10-01 17:07                                           ` Paul E. McKenney
  2011-10-02 22:50                                         ` Frederic Weisbecker
  2011-10-02 23:07                                         ` Frederic Weisbecker
  3 siblings, 2 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-01 12:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> And here is a first cut, probably totally broken, but a start.
> 
> With this change, I am wondering about tick_nohz_stop_sched_tick()'s
> invocation of rcu_idle_enter() -- this now needs to be called regardless
> of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
> Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
> it looks like we should -not- call rcu_idle_enter().

Because of the new check in rcu_check_callbacks()? Yeah.

If you think it's fine to call rcu_enter_nohz() unconditionally
every time we enter the idle loop, then yeah. I just don't know
how much overhead it adds, since it is an unconditional extra piece
of code to run before we can finally save power.

Either entering idle implies the extended quiescent state, as in this
patch, or you separate the two, and then rcu_enter_nohz() is only
called when the tick is stopped.

If you choose to merge both, you indeed need to call rcu_idle_enter()
and rcu_idle_exit() whether the tick is stopped or not.

> I eventually just left the rcu_idle_enter() calls in their current
> places due to paranoia about messing up and ending up with unbalanced
> rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
> make this work better?

Yeah something like this (untested):

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d5097c4..ad3ecad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
 	 * updated. Thus, it must not be called in the event we are called from
 	 * irq_exit() with the prior state different than idle.
 	 */
-	if (!inidle && !ts->inidle)
+	if (inidle)
+		rcu_idle_enter();
+	else if (!ts->inidle)
 		goto end;
 
+
 	/*
 	 * Set ts->inidle unconditionally. Even if the system did not
 	 * switch to NOHZ mode the cpu frequency governers rely on the
@@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
-			rcu_enter_nohz();
 		}
 
 		ts->idle_sleeps++;
@@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
 	ktime_t now;
 
 	local_irq_disable();
+
+	rcu_idle_exit();
+
 	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
 		now = ktime_get();
 
@@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
 
 	ts->inidle = 0;
 
-	rcu_exit_nohz();
-
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);




More things about your patch below:

> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -54,31 +54,47 @@ static void __call_rcu(struct rcu_head *head,
>  
>  #include "rcutiny_plugin.h"
>  
> -#ifdef CONFIG_NO_HZ
> -
>  static long rcu_dynticks_nesting = 1;
>  
>  /*
> - * Enter dynticks-idle mode, which is an extended quiescent state
> - * if we have fully entered that mode (i.e., if the new value of
> - * dynticks_nesting is zero).
> + * Enter idle, which is an extended quiescent state if we have fully
> + * entered that mode (i.e., if the new value of dynticks_nesting is zero).
>   */
> -void rcu_enter_nohz(void)
> +void rcu_idle_enter(void)
>  {
>  	if (--rcu_dynticks_nesting == 0)
>  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
>  }
>  
>  /*
> - * Exit dynticks-idle mode, so that we are no longer in an extended
> - * quiescent state.
> + * Exit idle, so that we are no longer in an extended quiescent state.
>   */
> -void rcu_exit_nohz(void)
> +void rcu_idle_exit(void)
>  {
>  	rcu_dynticks_nesting++;
>  }
>  
> -#endif /* #ifdef CONFIG_NO_HZ */
> +#ifdef CONFIG_PROVE_RCU
> +
> +/*
> + * Test whether the current CPU is idle.
> + */

Is idle from an RCU point of view, yeah.

> +int rcu_is_cpu_idle(void)
> +{
> +	return !rcu_dynticks_nesting;
> +}
> +
> +#endif /* #ifdef CONFIG_PROVE_RCU */
> +
> +/*
> + * Test whether the current CPU was interrupted from idle.  Nested
> + * interrupts don't count, we must be running at the first interrupt
> + * level.
> + */
> +int rcu_is_cpu_rrupt_from_idle(void)
> +{
> +	return rcu_dynticks_nesting <= 0;
> +}
>  
>  /*
>   * Helper function for rcu_sched_qs() and rcu_bh_qs().
> @@ -131,10 +147,7 @@ void rcu_bh_qs(int cpu)
>   */
>  void rcu_check_callbacks(int cpu, int user)
>  {
> -	if (user ||
> -	    (idle_cpu(cpu) &&
> -	     !in_softirq() &&
> -	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> +	if (user || rcu_is_cpu_rrupt_from_idle())
>  		rcu_sched_qs(cpu);

It wasn't obvious to me on first read. This might need a comment
explaining that rcu_check_callbacks() is called from an interrupt,
so the check needs to account for that first interrupt level.

Other than that, looks good overall.

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-01 12:24                                         ` Frederic Weisbecker
@ 2011-10-01 12:28                                           ` Frederic Weisbecker
  2011-10-01 16:35                                             ` Paul E. McKenney
  2011-10-01 17:07                                           ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-01 12:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

2011/10/1 Frederic Weisbecker <fweisbec@gmail.com>:
> Yeah something like this (untested):
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d5097c4..ad3ecad 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
>         * updated. Thus, it must not be called in the event we are called from
>         * irq_exit() with the prior state different than idle.
>         */
> -       if (!inidle && !ts->inidle)
> +       if (inidle)
> +               rcu_idle_enter();
> +       else if (!ts->inidle)
>                goto end;
>
> +
>        /*
>         * Set ts->inidle unconditionally. Even if the system did not
>         * switch to NOHZ mode the cpu frequency governers rely on the
> @@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
>                        ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
>                        ts->tick_stopped = 1;
>                        ts->idle_jiffies = last_jiffies;
> -                       rcu_enter_nohz();
>                }
>
>                ts->idle_sleeps++;
> @@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
>        ktime_t now;
>
>        local_irq_disable();
> +
> +       rcu_idle_exit();
> +
>        if (ts->idle_active || (ts->inidle && ts->tick_stopped))
>                now = ktime_get();
>
> @@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
>
>        ts->inidle = 0;
>
> -       rcu_exit_nohz();
> -
>        /* Update jiffies first */
>        select_nohz_load_balancer(0);
>        tick_do_update_jiffies64(now);
>

Ah I see you fixed it in the delta. Ok :)


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-01 12:28                                           ` Frederic Weisbecker
@ 2011-10-01 16:35                                             ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-01 16:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sat, Oct 01, 2011 at 02:28:10PM +0200, Frederic Weisbecker wrote:
> 2011/10/1 Frederic Weisbecker <fweisbec@gmail.com>:
> > Yeah something like this (untested):
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index d5097c4..ad3ecad 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
> >         * updated. Thus, it must not be called in the event we are called from
> >         * irq_exit() with the prior state different than idle.
> >         */
> > -       if (!inidle && !ts->inidle)
> > +       if (inidle)
> > +               rcu_idle_enter();
> > +       else if (!ts->inidle)
> >                goto end;
> >
> > +
> >        /*
> >         * Set ts->inidle unconditionally. Even if the system did not
> >         * switch to NOHZ mode the cpu frequency governers rely on the
> > @@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
> >                        ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> >                        ts->tick_stopped = 1;
> >                        ts->idle_jiffies = last_jiffies;
> > -                       rcu_enter_nohz();
> >                }
> >
> >                ts->idle_sleeps++;
> > @@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
> >        ktime_t now;
> >
> >        local_irq_disable();
> > +
> > +       rcu_idle_exit();
> > +
> >        if (ts->idle_active || (ts->inidle && ts->tick_stopped))
> >                now = ktime_get();
> >
> > @@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
> >
> >        ts->inidle = 0;
> >
> > -       rcu_exit_nohz();
> > -
> >        /* Update jiffies first */
> >        select_nohz_load_balancer(0);
> >        tick_do_update_jiffies64(now);
> >
> 
> Ah I see you fixed it in the delta. Ok :)

But rcutorture still hates it.  Possibly because of this boneheaded bug:

int rcu_is_cpu_rrupt_from_idle(void)
{
	return (__get_cpu_var(rcu_dynticks).dynticks_nesting & 0x1) <= 1;
}

Do you think that it might work better if I drop off the "& 0x1" that
I copy-pasted from rcu_is_cpu_idle()?  ;-)

Sigh!!!  Giving it a try now.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-01 12:24                                         ` Frederic Weisbecker
  2011-10-01 12:28                                           ` Frederic Weisbecker
@ 2011-10-01 17:07                                           ` Paul E. McKenney
  2011-10-02  3:23                                             ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-01 17:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sat, Oct 01, 2011 at 02:24:45PM +0200, Frederic Weisbecker wrote:
> On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > And here is a first cut, probably totally broken, but a start.
> > 
> > With this change, I am wondering about tick_nohz_stop_sched_tick()'s
> > invocation of rcu_idle_enter() -- this now needs to be called regardless
> > of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
> > Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
> > it looks like we should -not- call rcu_idle_enter().
> 
> Because of the new check in rcu_check_callbacks()? Yeah.
> 
> If you think it's fine to call rcu_enter_nohz() unconditionally
> everytime we enter the idle loop then yeah. I just don't know
> the overhead it adds, as it adds an unconditional tiny piece of
> code before we can finally save the power.
> 
> Either entering idle involves extended quiescent state as in this
> patch, or you separate both and then rcu_enter_nohz() is only
> called when the tick is stopped.
> 
> If you choose to merge both, you indeed need to call rcu_idle_enter()
> and rcu_idle_exit() whether the tick is stopped or not.
> 
> > I eventually just left the rcu_idle_enter() calls in their current
> > places due to paranoia about messing up and ending up with unbalanced
> > rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
> > make this work better?
> 
> Yeah something like this (untested):
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d5097c4..ad3ecad 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
>  	 * updated. Thus, it must not be called in the event we are called from
>  	 * irq_exit() with the prior state different than idle.
>  	 */
> -	if (!inidle && !ts->inidle)
> +	if (inidle)
> +		rcu_idle_enter();
> +	else if (!ts->inidle)
>  		goto end;
> 
> +
>  	/*
>  	 * Set ts->inidle unconditionally. Even if the system did not
>  	 * switch to NOHZ mode the cpu frequency governers rely on the
> @@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
>  			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
>  			ts->tick_stopped = 1;
>  			ts->idle_jiffies = last_jiffies;
> -			rcu_enter_nohz();
>  		}
> 
>  		ts->idle_sleeps++;
> @@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
>  	ktime_t now;
> 
>  	local_irq_disable();
> +
> +	rcu_idle_exit();
> +
>  	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
>  		now = ktime_get();
> 
> @@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
> 
>  	ts->inidle = 0;
> 
> -	rcu_exit_nohz();
> -
>  	/* Update jiffies first */
>  	select_nohz_load_balancer(0);
>  	tick_do_update_jiffies64(now);
> 
> 
> 
> 
> More things about your patch below:
> 
> > --- a/kernel/rcutiny.c
> > +++ b/kernel/rcutiny.c
> > @@ -54,31 +54,47 @@ static void __call_rcu(struct rcu_head *head,
> >  
> >  #include "rcutiny_plugin.h"
> >  
> > -#ifdef CONFIG_NO_HZ
> > -
> >  static long rcu_dynticks_nesting = 1;
> >  
> >  /*
> > - * Enter dynticks-idle mode, which is an extended quiescent state
> > - * if we have fully entered that mode (i.e., if the new value of
> > - * dynticks_nesting is zero).
> > + * Enter idle, which is an extended quiescent state if we have fully
> > + * entered that mode (i.e., if the new value of dynticks_nesting is zero).
> >   */
> > -void rcu_enter_nohz(void)
> > +void rcu_idle_enter(void)
> >  {
> >  	if (--rcu_dynticks_nesting == 0)
> >  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> >  }
> >  
> >  /*
> > - * Exit dynticks-idle mode, so that we are no longer in an extended
> > - * quiescent state.
> > + * Exit idle, so that we are no longer in an extended quiescent state.
> >   */
> > -void rcu_exit_nohz(void)
> > +void rcu_idle_exit(void)
> >  {
> >  	rcu_dynticks_nesting++;
> >  }
> >  
> > -#endif /* #ifdef CONFIG_NO_HZ */
> > +#ifdef CONFIG_PROVE_RCU
> > +
> > +/*
> > + * Test whether the current CPU is idle.
> > + */
> 
> Is idle from an RCU point of view yeah.

Good point -- I now say "Test whether RCU thinks that the current CPU is idle."

> > +int rcu_is_cpu_idle(void)
> > +{
> > +	return !rcu_dynticks_nesting;
> > +}
> > +
> > +#endif /* #ifdef CONFIG_PROVE_RCU */
> > +
> > +/*
> > + * Test whether the current CPU was interrupted from idle.  Nested
> > + * interrupts don't count, we must be running at the first interrupt
> > + * level.
> > + */
> > +int rcu_is_cpu_rrupt_from_idle(void)
> > +{
> > +	return rcu_dynticks_nesting <= 0;
> > +}
> >  
> >  /*
> >   * Helper function for rcu_sched_qs() and rcu_bh_qs().
> > @@ -131,10 +147,7 @@ void rcu_bh_qs(int cpu)
> >   */
> >  void rcu_check_callbacks(int cpu, int user)
> >  {
> > -	if (user ||
> > -	    (idle_cpu(cpu) &&
> > -	     !in_softirq() &&
> > -	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > +	if (user || rcu_is_cpu_rrupt_from_idle())
> >  		rcu_sched_qs(cpu);
> 
> It wasn't obvious to me in the first shot. This might need a comment
> that tells rcu_check_callbacks() is called from an interrupt
> and thus need to handle that first level in the check.

OK, I added "This function must be called from hardirq context".

> Other than that, looks good overall.

Keeping fingers firmly crossed for the testing...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-01 17:07                                           ` Paul E. McKenney
@ 2011-10-02  3:23                                             ` Paul E. McKenney
  2011-10-02 11:45                                               ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-02  3:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sat, Oct 01, 2011 at 10:07:14AM -0700, Paul E. McKenney wrote:
> On Sat, Oct 01, 2011 at 02:24:45PM +0200, Frederic Weisbecker wrote:
> > On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > > And here is a first cut, probably totally broken, but a start.
> > > 
> > > With this change, I am wondering about tick_nohz_stop_sched_tick()'s
> > > invocation of rcu_idle_enter() -- this now needs to be called regardless
> > > of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
> > > Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
> > > it looks like we should -not- call rcu_idle_enter().
> > 
> > Because of the new check in rcu_check_callbacks()? Yeah.
> > 
> > If you think it's fine to call rcu_enter_nohz() unconditionally
> > everytime we enter the idle loop then yeah. I just don't know
> > the overhead it adds, as it adds an unconditional tiny piece of
> > code before we can finally save the power.
> > 
> > Either entering idle involves extended quiescent state as in this
> > patch, or you separate both and then rcu_enter_nohz() is only
> > called when the tick is stopped.
> > 
> > If you choose to merge both, you indeed need to call rcu_idle_enter()
> > and rcu_idle_exit() whether the tick is stopped or not.
> > 
> > > I eventually just left the rcu_idle_enter() calls in their current
> > > places due to paranoia about messing up and ending up with unbalanced
> > > rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
> > > make this work better?
> > 
> > Yeah something like this (untested):
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index d5097c4..ad3ecad 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
> >  	 * updated. Thus, it must not be called in the event we are called from
> >  	 * irq_exit() with the prior state different than idle.
> >  	 */
> > -	if (!inidle && !ts->inidle)
> > +	if (inidle)
> > +		rcu_idle_enter();
> > +	else if (!ts->inidle)
> >  		goto end;
> > 
> > +
> >  	/*
> >  	 * Set ts->inidle unconditionally. Even if the system did not
> >  	 * switch to NOHZ mode the cpu frequency governers rely on the
> > @@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
> >  			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> >  			ts->tick_stopped = 1;
> >  			ts->idle_jiffies = last_jiffies;
> > -			rcu_enter_nohz();
> >  		}
> > 
> >  		ts->idle_sleeps++;
> > @@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
> >  	ktime_t now;
> > 
> >  	local_irq_disable();
> > +
> > +	rcu_idle_exit();
> > +
> >  	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
> >  		now = ktime_get();
> > 
> > @@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
> > 
> >  	ts->inidle = 0;
> > 
> > -	rcu_exit_nohz();
> > -
> >  	/* Update jiffies first */
> >  	select_nohz_load_balancer(0);
> >  	tick_do_update_jiffies64(now);
> > 
> > 
> > 
> > 
> > More things about your patch below:
> > 
> > > --- a/kernel/rcutiny.c
> > > +++ b/kernel/rcutiny.c
> > > @@ -54,31 +54,47 @@ static void __call_rcu(struct rcu_head *head,
> > >  
> > >  #include "rcutiny_plugin.h"
> > >  
> > > -#ifdef CONFIG_NO_HZ
> > > -
> > >  static long rcu_dynticks_nesting = 1;
> > >  
> > >  /*
> > > - * Enter dynticks-idle mode, which is an extended quiescent state
> > > - * if we have fully entered that mode (i.e., if the new value of
> > > - * dynticks_nesting is zero).
> > > + * Enter idle, which is an extended quiescent state if we have fully
> > > + * entered that mode (i.e., if the new value of dynticks_nesting is zero).
> > >   */
> > > -void rcu_enter_nohz(void)
> > > +void rcu_idle_enter(void)
> > >  {
> > >  	if (--rcu_dynticks_nesting == 0)
> > >  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> > >  }
> > >  
> > >  /*
> > > - * Exit dynticks-idle mode, so that we are no longer in an extended
> > > - * quiescent state.
> > > + * Exit idle, so that we are no longer in an extended quiescent state.
> > >   */
> > > -void rcu_exit_nohz(void)
> > > +void rcu_idle_exit(void)
> > >  {
> > >  	rcu_dynticks_nesting++;
> > >  }
> > >  
> > > -#endif /* #ifdef CONFIG_NO_HZ */
> > > +#ifdef CONFIG_PROVE_RCU
> > > +
> > > +/*
> > > + * Test whether the current CPU is idle.
> > > + */
> > 
> > Is idle from an RCU point of view yeah.
> 
> Good point -- I now say "Test whether RCU thinks that the current CPU is idle."
> 
> > > +int rcu_is_cpu_idle(void)
> > > +{
> > > +	return !rcu_dynticks_nesting;
> > > +}
> > > +
> > > +#endif /* #ifdef CONFIG_PROVE_RCU */
> > > +
> > > +/*
> > > + * Test whether the current CPU was interrupted from idle.  Nested
> > > + * interrupts don't count, we must be running at the first interrupt
> > > + * level.
> > > + */
> > > +int rcu_is_cpu_rrupt_from_idle(void)
> > > +{
> > > +	return rcu_dynticks_nesting <= 0;
> > > +}
> > >  
> > >  /*
> > >   * Helper function for rcu_sched_qs() and rcu_bh_qs().
> > > @@ -131,10 +147,7 @@ void rcu_bh_qs(int cpu)
> > >   */
> > >  void rcu_check_callbacks(int cpu, int user)
> > >  {
> > > -	if (user ||
> > > -	    (idle_cpu(cpu) &&
> > > -	     !in_softirq() &&
> > > -	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > +	if (user || rcu_is_cpu_rrupt_from_idle())
> > >  		rcu_sched_qs(cpu);
> > 
> > It wasn't obvious to me in the first shot. This might need a comment
> > that tells rcu_check_callbacks() is called from an interrupt
> > and thus need to handle that first level in the check.
> 
> OK, I added "This function must be called from hardirq context".
> 
> > Other than that, looks good overall.
> 
> Keeping fingers firmly crossed for the testing...

And it appears sane in testing thus far.  I have consolidated to one
patch and pushed to https://github.com/paulmckrcu/linux branch rcu/dev.

Testing continues.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-02  3:23                                             ` Paul E. McKenney
@ 2011-10-02 11:45                                               ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-02 11:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sat, Oct 01, 2011 at 08:23:16PM -0700, Paul E. McKenney wrote:
> On Sat, Oct 01, 2011 at 10:07:14AM -0700, Paul E. McKenney wrote:
> > On Sat, Oct 01, 2011 at 02:24:45PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > > > And here is a first cut, probably totally broken, but a start.
> > > > 
> > > > With this change, I am wondering about tick_nohz_stop_sched_tick()'s
> > > > invocation of rcu_idle_enter() -- this now needs to be called regardless
> > > > of whether or not tick_nohz_stop_sched_tick() actually stops the tick.
> > > > Except that if tick_nohz_stop_sched_tick() is invoked with inidle==0,
> > > > it looks like we should -not- call rcu_idle_enter().
> > > 
> > > Because of the new check in rcu_check_callbacks()? Yeah.
> > > 
> > > If you think it's fine to call rcu_enter_nohz() unconditionally
> > > everytime we enter the idle loop then yeah. I just don't know
> > > the overhead it adds, as it adds an unconditional tiny piece of
> > > code before we can finally save the power.
> > > 
> > > Either entering idle involves extended quiescent state as in this
> > > patch, or you separate both and then rcu_enter_nohz() is only
> > > called when the tick is stopped.
> > > 
> > > If you choose to merge both, you indeed need to call rcu_idle_enter()
> > > and rcu_idle_exit() whether the tick is stopped or not.
> > > 
> > > > I eventually just left the rcu_idle_enter() calls in their current
> > > > places due to paranoia about messing up and ending up with unbalanced
> > > > rcu_idle_enter() and rcu_idle_exit() calls.  Any thoughts on how to
> > > > make this work better?
> > > 
> > > Yeah something like this (untested):
> > > 
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index d5097c4..ad3ecad 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -273,9 +273,12 @@ void tick_nohz_stop_sched_tick(int inidle)
> > >  	 * updated. Thus, it must not be called in the event we are called from
> > >  	 * irq_exit() with the prior state different than idle.
> > >  	 */
> > > -	if (!inidle && !ts->inidle)
> > > +	if (inidle)
> > > +		rcu_idle_enter();
> > > +	else if (!ts->inidle)
> > >  		goto end;
> > > 
> > > +
> > >  	/*
> > >  	 * Set ts->inidle unconditionally. Even if the system did not
> > >  	 * switch to NOHZ mode the cpu frequency governers rely on the
> > > @@ -409,7 +412,6 @@ void tick_nohz_stop_sched_tick(int inidle)
> > >  			ts->idle_tick = hrtimer_get_expires(&ts->sched_timer);
> > >  			ts->tick_stopped = 1;
> > >  			ts->idle_jiffies = last_jiffies;
> > > -			rcu_enter_nohz();
> > >  		}
> > > 
> > >  		ts->idle_sleeps++;
> > > @@ -505,6 +507,9 @@ void tick_nohz_restart_sched_tick(void)
> > >  	ktime_t now;
> > > 
> > >  	local_irq_disable();
> > > +
> > > +	rcu_idle_exit();
> > > +
> > >  	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
> > >  		now = ktime_get();
> > > 
> > > @@ -519,8 +524,6 @@ void tick_nohz_restart_sched_tick(void)
> > > 
> > >  	ts->inidle = 0;
> > > 
> > > -	rcu_exit_nohz();
> > > -
> > >  	/* Update jiffies first */
> > >  	select_nohz_load_balancer(0);
> > >  	tick_do_update_jiffies64(now);
> > > 
> > > 
> > > 
> > > 
> > > More things about your patch below:
> > > 
> > > > --- a/kernel/rcutiny.c
> > > > +++ b/kernel/rcutiny.c
> > > > @@ -54,31 +54,47 @@ static void __call_rcu(struct rcu_head *head,
> > > >  
> > > >  #include "rcutiny_plugin.h"
> > > >  
> > > > -#ifdef CONFIG_NO_HZ
> > > > -
> > > >  static long rcu_dynticks_nesting = 1;
> > > >  
> > > >  /*
> > > > - * Enter dynticks-idle mode, which is an extended quiescent state
> > > > - * if we have fully entered that mode (i.e., if the new value of
> > > > - * dynticks_nesting is zero).
> > > > + * Enter idle, which is an extended quiescent state if we have fully
> > > > + * entered that mode (i.e., if the new value of dynticks_nesting is zero).
> > > >   */
> > > > -void rcu_enter_nohz(void)
> > > > +void rcu_idle_enter(void)
> > > >  {
> > > >  	if (--rcu_dynticks_nesting == 0)
> > > >  		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
> > > >  }
> > > >  
> > > >  /*
> > > > - * Exit dynticks-idle mode, so that we are no longer in an extended
> > > > - * quiescent state.
> > > > + * Exit idle, so that we are no longer in an extended quiescent state.
> > > >   */
> > > > -void rcu_exit_nohz(void)
> > > > +void rcu_idle_exit(void)
> > > >  {
> > > >  	rcu_dynticks_nesting++;
> > > >  }
> > > >  
> > > > -#endif /* #ifdef CONFIG_NO_HZ */
> > > > +#ifdef CONFIG_PROVE_RCU
> > > > +
> > > > +/*
> > > > + * Test whether the current CPU is idle.
> > > > + */
> > > 
> > > Is idle from an RCU point of view yeah.
> > 
> > Good point -- I now say "Test whether RCU thinks that the current CPU is idle."
> > 
> > > > +int rcu_is_cpu_idle(void)
> > > > +{
> > > > +	return !rcu_dynticks_nesting;
> > > > +}
> > > > +
> > > > +#endif /* #ifdef CONFIG_PROVE_RCU */
> > > > +
> > > > +/*
> > > > + * Test whether the current CPU was interrupted from idle.  Nested
> > > > + * interrupts don't count, we must be running at the first interrupt
> > > > + * level.
> > > > + */
> > > > +int rcu_is_cpu_rrupt_from_idle(void)
> > > > +{
> > > > +	return rcu_dynticks_nesting <= 0;
> > > > +}
> > > >  
> > > >  /*
> > > >   * Helper function for rcu_sched_qs() and rcu_bh_qs().
> > > > @@ -131,10 +147,7 @@ void rcu_bh_qs(int cpu)
> > > >   */
> > > >  void rcu_check_callbacks(int cpu, int user)
> > > >  {
> > > > -	if (user ||
> > > > -	    (idle_cpu(cpu) &&
> > > > -	     !in_softirq() &&
> > > > -	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
> > > > +	if (user || rcu_is_cpu_rrupt_from_idle())
> > > >  		rcu_sched_qs(cpu);
> > > 
> > > It wasn't obvious to me in the first shot. This might need a comment
> > > that tells rcu_check_callbacks() is called from an interrupt
> > > and thus need to handle that first level in the check.
> > 
> > OK, I added "This function must be called from hardirq context".
> > 
> > > Other than that, looks good overall.
> > 
> > Keeping fingers firmly crossed for the testing...
> 
> And it appears sane in testing thus far.  I have consolidated to one
> patch and pushed to https://github.com/paulmckrcu/linux branch rcu/dev.
> 
> Testing continues.

Great. I'll start the rebase then.

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 19:24                                       ` Paul E. McKenney
  2011-10-01  4:34                                         ` Paul E. McKenney
  2011-10-01 12:24                                         ` Frederic Weisbecker
@ 2011-10-02 22:50                                         ` Frederic Weisbecker
  2011-10-03  0:28                                           ` Paul E. McKenney
  2011-10-02 23:07                                         ` Frederic Weisbecker
  3 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-02 22:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> @@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
>  		return 1;
>  	}
>  
> -	/* If preemptible RCU, no point in sending reschedule IPI. */
> -	if (rdp->preemptible)
> -		return 0;
> -
> -	/* The CPU is online, so send it a reschedule IPI. */
> +	/*
> +	 * The CPU is online, so send it a reschedule IPI.  This forces
> +	 * it through the scheduler, and (inefficiently) also handles cases
> +	 * where idle loops fail to inform RCU about the CPU being idle.
> +	 */

If the idle loop forgets to call rcu_idle_enter() before going to
sleep, I don't know if it's a good idea to try to cure that situation
by forcing a quiescent state remotely. It may make the thing worse
because we actually won't notice the lack of call to rcu_idle_enter()
that the rcu stall detector would otherwise report to us.

Also I don't think that works. If the task doesn't have
TIF_RESCHED, it won't go through the scheduler on irq exit.
smp_send_reschedule() doesn't set the flag. And also scheduler_ipi()
returns right away if no wake up is pending.

So, other than resuming the idle loop to sleep again, nothing may happen.

Or am I missing something?



>  	if (rdp->cpu != smp_processor_id())
>  		smp_send_reschedule(rdp->cpu);
>  	else

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-09-30 19:24                                       ` Paul E. McKenney
                                                           ` (2 preceding siblings ...)
  2011-10-02 22:50                                         ` Frederic Weisbecker
@ 2011-10-02 23:07                                         ` Frederic Weisbecker
  2011-10-03  0:32                                           ` Paul E. McKenney
  3 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-02 23:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> +#ifdef CONFIG_PROVE_RCU
> +
> +/*
> + * Test whether the current CPU is idle.
> + */
> +int rcu_is_cpu_idle(void)
> +{
> +	return !rcu_dynticks_nesting;
> +}

Seems that's not used in the patch.

> +
> +#endif /* #ifdef CONFIG_PROVE_RCU */
<snip>
> +#ifdef CONFIG_PROVE_RCU
> +
>  /**
> - * rcu_irq_enter - inform RCU of entry to hard irq context
> + * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
>   *
> - * If the CPU was idle with dynamic ticks active, this updates the
> - * rdtp->dynticks to let the RCU handling know that the CPU is active.
> + * If the current CPU is in its idle loop and is neither in an interrupt
> + * or NMI handler, return true.  The caller must have at least disabled
> + * preemption.
>   */
> -void rcu_irq_enter(void)
> +int rcu_is_cpu_idle(void)
>  {
> -	rcu_exit_nohz();
> +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
>  }

So that's not used in this patch but it's interesting for me
to backport "rcu: Detect illegal rcu dereference in extended quiescent state".

The above should be read from a preempt disabled section though
(remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")

Those functions should probably lie in a separate patch. But I don't mind
much keeping the things as is and use these APIs in my next patches though.
I'll just fix the preempt enabled thing above.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-02 22:50                                         ` Frederic Weisbecker
@ 2011-10-03  0:28                                           ` Paul E. McKenney
  2011-10-03 12:59                                             ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-03  0:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Oct 03, 2011 at 12:50:22AM +0200, Frederic Weisbecker wrote:
> On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > @@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
> >  		return 1;
> >  	}
> >  
> > -	/* If preemptible RCU, no point in sending reschedule IPI. */
> > -	if (rdp->preemptible)
> > -		return 0;
> > -
> > -	/* The CPU is online, so send it a reschedule IPI. */
> > +	/*
> > +	 * The CPU is online, so send it a reschedule IPI.  This forces
> > +	 * it through the scheduler, and (inefficiently) also handles cases
> > +	 * where idle loops fail to inform RCU about the CPU being idle.
> > +	 */
> 
> If the idle loop forgets to call rcu_idle_enter() before going to
> sleep, I don't know if it's a good idea to try to cure that situation
> by forcing a quiescent state remotely. It may make the thing worse
> because we actually won't notice the lack of call to rcu_idle_enter()
> that the rcu stall detector would otherwise report to us.
> 
> Also I don't think that works. If the task doesn't have
> TIF_RESCHED, it won't go through the scheduler on irq exit.
> smp_send_reschedule() doesn't set the flag. And also scheduler_ipi()
> returns right away if no wake up is pending.
> 
> So, other than resuming the idle loop to sleep again, nothing may happen.
> 
> Or am I missing something?

Hmmm...  Seems like the IPIs aren't helping in any case, then?

I suppose that I could do an smp_call_function_single(), which then
did a set_need_resched()...

But this is a separate issue that I need to deal with.  That said, any
suggestions are welcome!

							Thanx, Paul

> >  	if (rdp->cpu != smp_processor_id())
> >  		smp_send_reschedule(rdp->cpu);
> >  	else


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-02 23:07                                         ` Frederic Weisbecker
@ 2011-10-03  0:32                                           ` Paul E. McKenney
  2011-10-03 13:03                                             ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-03  0:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Oct 03, 2011 at 01:07:56AM +0200, Frederic Weisbecker wrote:
> On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > +#ifdef CONFIG_PROVE_RCU
> > +
> > +/*
> > + * Test whether the current CPU is idle.
> > + */
> > +int rcu_is_cpu_idle(void)
> > +{
> > +	return !rcu_dynticks_nesting;
> > +}
> 
> Seems that's not used in the patch.
> 
> > +
> > +#endif /* #ifdef CONFIG_PROVE_RCU */
> <snip>
> > +#ifdef CONFIG_PROVE_RCU
> > +
> >  /**
> > - * rcu_irq_enter - inform RCU of entry to hard irq context
> > + * rcu_is_cpu_idle - see if RCU thinks that the current CPU is idle
> >   *
> > - * If the CPU was idle with dynamic ticks active, this updates the
> > - * rdtp->dynticks to let the RCU handling know that the CPU is active.
> > + * If the current CPU is in its idle loop and is neither in an interrupt
> > + * or NMI handler, return true.  The caller must have at least disabled
> > + * preemption.
> >   */
> > -void rcu_irq_enter(void)
> > +int rcu_is_cpu_idle(void)
> >  {
> > -	rcu_exit_nohz();
> > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> >  }
> 
> So that's not used in this patch but it's interesting for me
> to backport "rcu: Detect illegal rcu dereference in extended quiescent state".

Yep, that is why it is there.

> The above should be read from a preempt disabled section though
> (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")

Yes, and that is why the last line of the header comment reads "The
caller must have at least disabled preemption."  Disabling preemption
is not necessary in Tiny RCU because there is no other CPU for the task
to go to.  (Right?)

> > Those functions should probably lie in a separate patch. But I don't mind
> much keeping the things as is and use these APIs in my next patches though.
> I'll just fix the preempt enabled thing above.

Or were you saying that you wish to make calls to rcu_is_cpu_idle()
that have preemption enabled?

And I can split the patch easily enough while keeping the diff the same,
so you should be able to do your porting on top of the existing code.

And thank you very much for looking this over!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03  0:28                                           ` Paul E. McKenney
@ 2011-10-03 12:59                                             ` Frederic Weisbecker
  2011-10-03 16:22                                               ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-03 12:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Oct 02, 2011 at 05:28:32PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 03, 2011 at 12:50:22AM +0200, Frederic Weisbecker wrote:
> > On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > > @@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
> > >  		return 1;
> > >  	}
> > >  
> > > -	/* If preemptible RCU, no point in sending reschedule IPI. */
> > > -	if (rdp->preemptible)
> > > -		return 0;
> > > -
> > > -	/* The CPU is online, so send it a reschedule IPI. */
> > > +	/*
> > > +	 * The CPU is online, so send it a reschedule IPI.  This forces
> > > +	 * it through the scheduler, and (inefficiently) also handles cases
> > > +	 * where idle loops fail to inform RCU about the CPU being idle.
> > > +	 */
> > 
> > If the idle loop forgets to call rcu_idle_enter() before going to
> > sleep, I don't know if it's a good idea to try to cure that situation
> > by forcing a quiescent state remotely. It may make the thing worse
> > because we actually won't notice the lack of call to rcu_idle_enter()
> > that the rcu stall detector would otherwise report to us.
> > 
> > Also I don't think that works. If the task doesn't have
> > TIF_RESCHED, it won't go through the scheduler on irq exit.
> > smp_send_reschedule() doesn't set the flag. And also scheduler_ipi()
> > returns right away if no wake up is pending.
> > 
> > So, other than resuming the idle loop to sleep again, nothing may happen.
> > 
> > Or am I missing something?
> 
> Hmmm...  Seems like the IPIs aren't helping in any case, then?

I thought it was there for !PREEMPT cases where the task has TIF_RESCHED
but takes too much time to find an opportunity to go to sleep.
 
> I suppose that I could do an smp_call_function_single(), which then
> did a set_need_resched()...
> 
> But this is a separate issue that I need to deal with.  That said, any
> suggestions are welcome!

Note you can't call smp_call_function_*() while irqs are disabled.

Perhaps you need something like kernel/sched.c:resched_cpu().
This adds some rq->lock contention, though.


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03  0:32                                           ` Paul E. McKenney
@ 2011-10-03 13:03                                             ` Frederic Weisbecker
  2011-10-03 16:30                                               ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-03 13:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > -void rcu_irq_enter(void)
> > > +int rcu_is_cpu_idle(void)
> > >  {
> > > -	rcu_exit_nohz();
> > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > >  }
> > 
> > So that's not used in this patch but it's interesting for me
> > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> 
> Yep, that is why it is there.

Ok.

> 
> > The above should be read from a preempt disabled section though
> > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> 
> Yes, and that is why the last line of the header comment reads "The
> caller must have at least disabled preemption."  Disabling preemption
> is not necessary in Tiny RCU because there is no other CPU for the task
> to go to.  (Right?)

Right.

> > Those functions should probably lay in a separate patch. But I don't mind
> > much keeping the things as is and use these APIs in my next patches though.
> > I'll just fix the preempt enabled thing above.
> 
> Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> that have preemption enabled?

Yeah. That's going to be called from places like rcu_read_lock_held()
and things like this that don't need to disable preemption themselves.

Would be better to disable preemption from that function.

> And I can split the patch easily enough while keeping the diff the same,
> so you should be able to do your porting on top of the existing code.

No I'm actually pretty fine with the current state. Whether that's defined
in this patch or a following one is actually not important.

Thanks!


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03 12:59                                             ` Frederic Weisbecker
@ 2011-10-03 16:22                                               ` Paul E. McKenney
  2011-10-03 17:11                                                 ` Frederic Weisbecker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-03 16:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Oct 03, 2011 at 02:59:03PM +0200, Frederic Weisbecker wrote:
> On Sun, Oct 02, 2011 at 05:28:32PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 03, 2011 at 12:50:22AM +0200, Frederic Weisbecker wrote:
> > > On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > > > @@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
> > > >  		return 1;
> > > >  	}
> > > >  
> > > > -	/* If preemptible RCU, no point in sending reschedule IPI. */
> > > > -	if (rdp->preemptible)
> > > > -		return 0;
> > > > -
> > > > -	/* The CPU is online, so send it a reschedule IPI. */
> > > > +	/*
> > > > +	 * The CPU is online, so send it a reschedule IPI.  This forces
> > > > +	 * it through the scheduler, and (inefficiently) also handles cases
> > > > +	 * where idle loops fail to inform RCU about the CPU being idle.
> > > > +	 */
> > > 
> > > If the idle loop forgets to call rcu_idle_enter() before going to
> > > sleep, I don't know if it's a good idea to try to cure that situation
> > > by forcing a quiescent state remotely. It may make the thing worse
> > > because we actually won't notice the lack of call to rcu_idle_enter()
> > > that the rcu stall detector would otherwise report to us.
> > > 
> > > Also I don't think that works. If the task doesn't have
> > > TIF_RESCHED, it won't go through the scheduler on irq exit.
> > > smp_send_reschedule() doesn't set the flag. And also scheduler_ipi()
> > > returns right away if no wake up is pending.
> > > 
> > > So, other than resuming the idle loop to sleep again, nothing may happen.
> > > 
> > > Or am I missing something?
> > 
> > Hmmm...  Seems like the IPIs aren't helping in any case, then?
> 
> I thought it was there for !PREEMPT cases where the task has TIF_RESCHED
> but takes too much time to find an opportunity to go to sleep.

Indeed, and it might be worth leaving in for that.

> > I suppose that I could do an smp_call_function_single(), which then
> > did a set_need_resched()...
> > 
> > But this is a separate issue that I need to deal with.  That said, any
> > suggestions are welcome!
> 
> Note you can't call smp_call_function_*() while irqs are disabled.

Sigh!  This isn't the first time this year that I have forgotten that,
is it?

> Perhaps you need something like kernel/sched.c:resched_cpu()
> This adds some rq->lock contention though.

This would happen infrequently, and could be made to be even more
infrequent.  But I wonder what happens when you do this to a CPU
that is running the idle task?  Seems like it should work normally,
but...

						Thanx, Paul


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03 13:03                                             ` Frederic Weisbecker
@ 2011-10-03 16:30                                               ` Paul E. McKenney
  2011-10-06  0:58                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-03 16:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > -void rcu_irq_enter(void)
> > > > +int rcu_is_cpu_idle(void)
> > > >  {
> > > > -	rcu_exit_nohz();
> > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > >  }
> > > 
> > > So that's not used in this patch but it's interesting for me
> > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > 
> > Yep, that is why it is there.
> 
> Ok.
> 
> > 
> > > The above should be read from a preempt disabled section though
> > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > 
> > Yes, and that is why the last line of the header comment reads "The
> > caller must have at least disabled preemption."  Disabling preemption
> > is not necessary in Tiny RCU because there is no other CPU for the task
> > to go to.  (Right?)
> 
> Right.
> 
> > > Those functions should probably lay in a separate patch. But I don't mind
> > > much keeping the things as is and use these APIs in my next patches though.
> > > I'll just fix the preempt enabled thing above.
> > 
> > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > that have preemption enabled?
> 
> Yeah. That's going to be called from places like rcu_read_lock_held()
> and things like this that don't need to disable preemption themselves.
> 
> Would be better to disable preemption from that function.

Hmmm...  This might be a good use for the "drive-by" per-CPU access
functions.

No, that doesn't work.  We could pick up the pointer, switch to another
CPU, the original CPU could run a task that blocks before we start running,
and then we could incorrectly decide that we were running in idle context,
issuing a spurious warning.  This approach would only work in environments
that (unlike the Linux kernel) mapped all the per-CPU variables to the
same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
to other problems, like being unable to reasonably access other CPUs'
variables.  Double mapping has other issues on some architectures.)

OK, agreed.  I will make this function disable preemption.

> > And I can split the patch easily enough while keeping the diff the same,
> > so you should be able to do your porting on top of the existing code.
> 
> No I'm actually pretty fine with the current state. Whether that's defined
> in this patch or a following one is actually not important.

Fair enough!

							Thanx, Paul


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03 16:22                                               ` Paul E. McKenney
@ 2011-10-03 17:11                                                 ` Frederic Weisbecker
  0 siblings, 0 replies; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-03 17:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan

On Mon, Oct 03, 2011 at 09:22:21AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 03, 2011 at 02:59:03PM +0200, Frederic Weisbecker wrote:
> > On Sun, Oct 02, 2011 at 05:28:32PM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 03, 2011 at 12:50:22AM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Sep 30, 2011 at 12:24:38PM -0700, Paul E. McKenney wrote:
> > > > > @@ -328,11 +326,11 @@ static int rcu_implicit_offline_qs(struct rcu_data *rdp)
> > > > >  		return 1;
> > > > >  	}
> > > > >  
> > > > > -	/* If preemptible RCU, no point in sending reschedule IPI. */
> > > > > -	if (rdp->preemptible)
> > > > > -		return 0;
> > > > > -
> > > > > -	/* The CPU is online, so send it a reschedule IPI. */
> > > > > +	/*
> > > > > +	 * The CPU is online, so send it a reschedule IPI.  This forces
> > > > > +	 * it through the scheduler, and (inefficiently) also handles cases
> > > > > +	 * where idle loops fail to inform RCU about the CPU being idle.
> > > > > +	 */
> > > > 
> > > > If the idle loop forgets to call rcu_idle_enter() before going to
> > > > sleep, I don't know if it's a good idea to try to cure that situation
> > > > by forcing a quiescent state remotely. It may make the thing worse
> > > > because we actually won't notice the lack of call to rcu_idle_enter()
> > > > that the rcu stall detector would otherwise report to us.
> > > > 
> > > > Also I don't think that works. If the task doesn't have
> > > > TIF_RESCHED, it won't go through the scheduler on irq exit.
> > > > smp_send_reschedule() doesn't set the flag. And also scheduler_ipi()
> > > > returns right away if no wake up is pending.
> > > > 
> > > > So, other than resuming the idle loop to sleep again, nothing may happen.
> > > > 
> > > > Or am I missing something?
> > > 
> > > Hmmm...  Seems like the IPIs aren't helping in any case, then?
> > 
> > I thought it was there for !PREEMPT cases where the task has TIF_RESCHED
> > but takes too much time to find an opportunity to go to sleep.
> 
> Indeed, and it might be worth leaving in for that.

Now I realize it's not even helpful in that case. If you're spending a long
time in the kernel without calling schedule(), an IPI won't be of much use
there.

No, the current call looks useless to me :)

> > > I suppose that I could do an smp_call_function_single(), which then
> > > did a set_need_resched()...
> > > 
> > > But this is a separate issue that I need to deal with.  That said, any
> > > suggestions are welcome!
> > 
> > Note you can't call smp_call_function_*() while irqs are disabled.
> 
> Sigh!  This isn't the first time this year that I have forgotten that,
> is it?
> 
> > Perhaps you need something like kernel/sched.c:resched_cpu()
> > This adds some rq->lock contention though.
> 
> This would happen infrequently, and could be made to be even more
> infrequent.  But I wonder what happens when you do this to a CPU
> that is running the idle task?  Seems like it should work normally,
> but...

That should work as well. But I think we shouldn't send an IPI
that sets TIF_RESCHED on a remote CPU that is running idle.

If there is a missing rcu_idle_enter() call, we should report it (RCU
stall) and fix it, not try to cure the consequences. Sending an IPI
would make such bugs harder to find.


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-03 16:30                                               ` Paul E. McKenney
@ 2011-10-06  0:58                                                 ` Paul E. McKenney
  2011-10-06  1:59                                                   ` Paul E. McKenney
  2011-10-06 12:11                                                   ` Frederic Weisbecker
  0 siblings, 2 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-06  0:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
	arjan.van.de.ven

On Mon, Oct 03, 2011 at 09:30:36AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> > On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > > -void rcu_irq_enter(void)
> > > > > +int rcu_is_cpu_idle(void)
> > > > >  {
> > > > > -	rcu_exit_nohz();
> > > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > > >  }
> > > > 
> > > > So that's not used in this patch but it's interesting for me
> > > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > > 
> > > Yep, that is why it is there.
> > 
> > Ok.
> > 
> > > 
> > > > The above should be read from a preempt disabled section though
> > > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > > 
> > > Yes, and that is why the last line of the header comment reads "The
> > > caller must have at least disabled preemption."  Disabling preemption
> > > is not necessary in Tiny RCU because there is no other CPU for the task
> > > to go to.  (Right?)
> > 
> > Right.
> > 
> > > > Those functions should probably lay in a separate patch. But I don't mind
> > > > much keeping the things as is and use these APIs in my next patches though.
> > > > I'll just fix the preempt enabled thing above.
> > > 
> > > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > > that have preemption enabled?
> > 
> > Yeah. That's going to be called from places like rcu_read_lock_held()
> > and things like this that don't need to disable preemption themselves.
> > 
> > Would be better to disable preemption from that function.
> 
> Hmmm...  This might be a good use for the "drive-by" per-CPU access
> functions.
> 
> No, that doesn't work.  We could pick up the pointer, switch to another
> CPU, the original CPU could run a task that blocks before we start running,
> and then we could incorrectly decide that we were running in idle context,
> issuing a spurious warning.  This approach would only work in environments
> that (unlike the Linux kernel) mapped all the per-CPU variables to the
> same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
> to other problems, like being unable to reasonably access other CPUs'
> variables.  Double mapping has other issues on some architectures.)
> 
> OK, agreed.  I will make this function disable preemption.
> 
> > > And I can split the patch easily enough while keeping the diff the same,
> > > so you should be able to do your porting on top of the existing code.
> > 
> > No I'm actually pretty fine with the current state. Whether that's defined
> > in this patch or a following one is actually not important.
> 
> Fair enough!

And here is an update that might handle an irq entry/exit miscounting
problem.  Thanks to Arjan van de Ven for pointing out that my earlier
approach would in fact miscount irq entries/exits in face of things like
upcalls to user-mode helpers.

This is experimental, and might well hurt more than it helps.  Testing
ongoing.  Applies on top of my "Track idleness independent of idle tasks"
commit.  Right...  And the tracing relies on a later patch, so feel free
to yank the calls to trace_rcu_dyntick() on the off-chance that you are
crazy enough to actually try this.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

Not-yet-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 06c0ed4..d4247e0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -356,6 +356,11 @@ void rcu_idle_enter(void)
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
+	if (!idle_cpu(smp_processor_id())) {
+		trace_rcu_dyntick("--|", rdtp->dynticks_nesting);
+		local_irq_restore(flags);
+		return;
+	}
 	if (--rdtp->dynticks_nesting) {
 		trace_rcu_dyntick("--=", rdtp->dynticks_nesting);
 		local_irq_restore(flags);
@@ -384,6 +389,11 @@ void rcu_idle_exit(void)
 
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
+	if (!idle_cpu(smp_processor_id())) {
+		trace_rcu_dyntick("++|", rdtp->dynticks_nesting);
+		local_irq_restore(flags);
+		return;
+	}
 	if (rdtp->dynticks_nesting++) {
 		trace_rcu_dyntick("++=", rdtp->dynticks_nesting);
 		local_irq_restore(flags);


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-06  0:58                                                 ` Paul E. McKenney
@ 2011-10-06  1:59                                                   ` Paul E. McKenney
  2011-10-06 12:11                                                   ` Frederic Weisbecker
  1 sibling, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-06  1:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
	arjan.van.de.ven

On Wed, Oct 05, 2011 at 05:58:58PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 03, 2011 at 09:30:36AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> > > On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > > > -void rcu_irq_enter(void)
> > > > > > +int rcu_is_cpu_idle(void)
> > > > > >  {
> > > > > > -	rcu_exit_nohz();
> > > > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > > > >  }
> > > > > 
> > > > > So that's not used in this patch but it's interesting for me
> > > > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > > > 
> > > > Yep, that is why it is there.
> > > 
> > > Ok.
> > > 
> > > > 
> > > > > The above should be read from a preempt disabled section though
> > > > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > > > 
> > > > Yes, and that is why the last line of the header comment reads "The
> > > > caller must have at least disabled preemption."  Disabling preemption
> > > > is not necessary in Tiny RCU because there is no other CPU for the task
> > > > to go to.  (Right?)
> > > 
> > > Right.
> > > 
> > > > > Those functions should probably lay in a separate patch. But I don't mind
> > > > > much keeping the things as is and use these APIs in my next patches though.
> > > > > I'll just fix the preempt enabled thing above.
> > > > 
> > > > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > > > that have preemption enabled?
> > > 
> > > Yeah. That's going to be called from places like rcu_read_lock_held()
> > > and things like this that don't need to disable preemption themselves.
> > > 
> > > Would be better to disable preemption from that function.
> > 
> > Hmmm...  This might be a good use for the "drive-by" per-CPU access
> > functions.
> > 
> > No, that doesn't work.  We could pick up the pointer, switch to another
> > CPU, the original CPU could run a task that blocks before we start running,
> > and then we could incorrectly decide that we were running in idle context,
> > issuing a spurious warning.  This approach would only work in environments
> > that (unlike the Linux kernel) mapped all the per-CPU variables to the
> > same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
> > to other problems, like being unable to reasonably access other CPUs'
> > variables.  Double mapping has other issues on some architectures.)
> > 
> > OK, agreed.  I will make this function disable preemption.
> > 
> > > > And I can split the patch easily enough while keeping the diff the same,
> > > > so you should be able to do your porting on top of the existing code.
> > > 
> > > No I'm actually pretty fine with the current state. Whether that's defined
> > > in this patch or a following one is actually not important.
> > 
> > Fair enough!
> 
> And here is an update that might handle an irq entry/exit miscounting
> problem.  Thanks to Arjan van de Ven for pointing out that my earlier
> approach would in fact miscount irq entries/exits in face of things like
> upcalls to user-mode helpers.
> 
> This is experimental, and might well hurt more than it helps.  Testing
> ongoing.  Applies on top of my "Track idleness independent of idle tasks"
> commit.  Right...  And the tracing relies on a later patch, so feel free
> to yank the calls to trace_rcu_dyntick() on the off-chance that you are
> crazy enough to actually try this.
> 
> Thoughts?

For the code currently in mainline, I hasten to add.  For your use,
Frederic, I need to handle the case where a user process is idle
from an RCU viewpoint.  I will be looking into this, but in the
meantime I wanted to prove/disprove that this is the source of the
failures that I have been seeing.

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> Not-yet-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 06c0ed4..d4247e0 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -356,6 +356,11 @@ void rcu_idle_enter(void)
>  
>  	local_irq_save(flags);
>  	rdtp = &__get_cpu_var(rcu_dynticks);
> +	if (!idle_cpu(smp_processor_id())) {
> +		trace_rcu_dyntick("--|", rdtp->dynticks_nesting);
> +		local_irq_restore(flags);
> +		return;
> +	}
>  	if (--rdtp->dynticks_nesting) {
>  		trace_rcu_dyntick("--=", rdtp->dynticks_nesting);
>  		local_irq_restore(flags);
> @@ -384,6 +389,11 @@ void rcu_idle_exit(void)
>  
>  	local_irq_save(flags);
>  	rdtp = &__get_cpu_var(rcu_dynticks);
> +	if (!idle_cpu(smp_processor_id())) {
> +		trace_rcu_dyntick("++|", rdtp->dynticks_nesting);
> +		local_irq_restore(flags);
> +		return;
> +	}
>  	if (rdtp->dynticks_nesting++) {
>  		trace_rcu_dyntick("++=", rdtp->dynticks_nesting);
>  		local_irq_restore(flags);


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-06  0:58                                                 ` Paul E. McKenney
  2011-10-06  1:59                                                   ` Paul E. McKenney
@ 2011-10-06 12:11                                                   ` Frederic Weisbecker
  2011-10-06 18:44                                                     ` Paul E. McKenney
  1 sibling, 1 reply; 57+ messages in thread
From: Frederic Weisbecker @ 2011-10-06 12:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
	arjan.van.de.ven

On Wed, Oct 05, 2011 at 05:58:58PM -0700, Paul E. McKenney wrote:
> On Mon, Oct 03, 2011 at 09:30:36AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> > > On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > > > -void rcu_irq_enter(void)
> > > > > > +int rcu_is_cpu_idle(void)
> > > > > >  {
> > > > > > -	rcu_exit_nohz();
> > > > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > > > >  }
> > > > > 
> > > > > So that's not used in this patch but it's interesting for me
> > > > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > > > 
> > > > Yep, that is why it is there.
> > > 
> > > Ok.
> > > 
> > > > 
> > > > > The above should be read from a preempt disabled section though
> > > > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > > > 
> > > > Yes, and that is why the last line of the header comment reads "The
> > > > caller must have at least disabled preemption."  Disabling preemption
> > > > is not necessary in Tiny RCU because there is no other CPU for the task
> > > > to go to.  (Right?)
> > > 
> > > Right.
> > > 
> > > > > Those functions should probably lay in a separate patch. But I don't mind
> > > > > much keeping the things as is and use these APIs in my next patches though.
> > > > > I'll just fix the preempt enabled thing above.
> > > > 
> > > > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > > > that have preemption enabled?
> > > 
> > > Yeah. That's going to be called from places like rcu_read_lock_held()
> > > and things like this that don't need to disable preemption themselves.
> > > 
> > > Would be better to disable preemption from that function.
> > 
> > Hmmm...  This might be a good use for the "drive-by" per-CPU access
> > functions.
> > 
> > No, that doesn't work.  We could pick up the pointer, switch to another
> > CPU, the original CPU could run a task that blocks before we start running,
> > and then we could incorrectly decide that we were running in idle context,
> > issuing a spurious warning.  This approach would only work in environments
> > that (unlike the Linux kernel) mapped all the per-CPU variables to the
> > same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
> > to other problems, like being unable to reasonably access other CPUs'
> > variables.  Double mapping has other issues on some architectures.)
> > 
> > OK, agreed.  I will make this function disable preemption.
> > 
> > > > And I can split the patch easily enough while keeping the diff the same,
> > > > so you should be able to do your porting on top of the existing code.
> > > 
> > > No I'm actually pretty fine with the current state. Whether that's defined
> > > in this patch or a following one is actually not important.
> > 
> > Fair enough!
> 
> And here is an update that might handle an irq entry/exit miscounting
> problem.  Thanks to Arjan van de Ven for pointing out that my earlier
> approach would in fact miscount irq entries/exits in face of things like
> upcalls to user-mode helpers.

I'm not sure what you mean. How could the current state miscount in user-mode?


* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-06 12:11                                                   ` Frederic Weisbecker
@ 2011-10-06 18:44                                                     ` Paul E. McKenney
  2011-10-06 23:44                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-06 18:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
	arjan.van.de.ven, andi.kleen

On Thu, Oct 06, 2011 at 02:11:28PM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 05, 2011 at 05:58:58PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 03, 2011 at 09:30:36AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> > > > On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > > > > -void rcu_irq_enter(void)
> > > > > > > +int rcu_is_cpu_idle(void)
> > > > > > >  {
> > > > > > > -	rcu_exit_nohz();
> > > > > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > > > > >  }
> > > > > > 
> > > > > > So that's not used in this patch but it's interesting for me
> > > > > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > > > > 
> > > > > Yep, that is why it is there.
> > > > 
> > > > Ok.
> > > > 
> > > > > 
> > > > > > The above should be read from a preempt disabled section though
> > > > > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > > > > 
> > > > > Yes, and that is why the last line of the header comment reads "The
> > > > > caller must have at least disabled preemption."  Disabling preemption
> > > > > is not necessary in Tiny RCU because there is no other CPU for the task
> > > > > to go to.  (Right?)
> > > > 
> > > > Right.
> > > > 
> > > > > > Those functions should probably lay in a separate patch. But I don't mind
> > > > > > much keeping the things as is and use these APIs in my next patches though.
> > > > > > I'll just fix the preempt enabled thing above.
> > > > > 
> > > > > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > > > > that have preemption enabled?
> > > > 
> > > > Yeah. That's going to be called from places like rcu_read_lock_held()
> > > > and things like this that don't need to disable preemption themselves.
> > > > 
> > > > Would be better to disable preemption from that function.
> > > 
> > > Hmmm...  This might be a good use for the "drive-by" per-CPU access
> > > functions.
> > > 
> > > No, that doesn't work.  We could pick up the pointer, switch to another
> > > CPU, the original CPU could run a task that blocks before we start running,
> > > and then we could incorrectly decide that we were running in idle context,
> > > issuing a spurious warning.  This approach would only work in environments
> > > that (unlike the Linux kernel) mapped all the per-CPU variables to the
> > > same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
> > > to other problems, like being unable to reasonably access other CPUs'
> > > variables.  Double mapping has other issues on some architectures.)
> > > 
> > > OK, agreed.  I will make this function disable preemption.
> > > 
> > > > > And I can split the patch easily enough while keeping the diff the same,
> > > > > so you should be able to do your porting on top of the existing code.
> > > > 
> > > > No, I'm actually pretty fine with the current state. Whether that's defined
> > > > in this patch or a following one is actually not important.
> > > 
> > > Fair enough!
> > 
> > And here is an update that might handle an irq entry/exit miscounting
> > problem.  Thanks to Arjan van de Ven for pointing out that my earlier
> > approach would in fact miscount irq entries/exits in the face of things like
> > upcalls to user-mode helpers.
> 
> I'm not sure what you mean. How could the current state miscount in user-mode?

It appears that some sorts of upcalls to userspace can have an irq_exit()
without a matching irq_enter(), as shown by the stack trace below.  This
splat was generated by some code in rcu_idle_enter() that complains when
a non-idle task tries to become idle.

One possibility that I am considering is to have ____call_usermodehelper()
set a task-structure flag just before the call to kernel_execve(), and
to have rcu_idle_enter() check that flag, and, if set, zero the flag
and just return without doing anything.  I don't claim to understand
the code well enough to know whether this really works, though.

							Thanx, Paul

------------------------------------------------------------------------

[    0.373084] WARNING: at kernel/rcutree.c:398
[    0.373089] Modules linked in:
[    0.373097] NIP: c0000000000d3c4c LR: c0000000000d3c34 CTR: 0000000000000000
[    0.373106] REGS: c000000042212f50 TRAP: 0700   Not tainted  (3.1.0-rc8-autokern1)
[    0.373114] MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48008022  XER: 00000000
[    0.373134] CFAR: c000000000053340
[    0.373140] TASK = c0000000421f2640[5] 'kworker/u:0' THREAD: c000000042210000 CPU: 1
[    0.373149] GPR00: 0000000000000001 c0000000422131d0 c000000000a1a7c0 0000000000000000 
[    0.373165] GPR04: 0000000000000001 c000000008123d50 0000000004000000 0000000000000000 
[    0.373182] GPR08: 0000000000000001 c000000000a8809d c0000000008f9520 c000000000a47d58 
[    0.373198] GPR12: 8000000000009032 c000000007578280 0000000002080000 c0000000007b89d8 
[    0.373214] GPR16: c0000000007b5078 0000000000000000 0000000000000000 0000000000000000 
[    0.373231] GPR20: c000000042213a00 c000000000940480 c0000000428076a0 c000000042807600 
[    0.373247] GPR24: c000000042807600 0000000000000040 c0000000009405f0 0000000000000000 
[    0.373263] GPR28: 0000000000000001 0000000000000001 c0000000009991b0 0000000000000001 
[    0.373284] NIP [c0000000000d3c4c] .rcu_idle_exit+0x1f4/0x248
[    0.373293] LR [c0000000000d3c34] .rcu_idle_exit+0x1dc/0x248
[    0.373300] Call Trace:
[    0.373306] [c0000000422131d0] [c0000000000d3c28] .rcu_idle_exit+0x1d0/0x248 (unreliable)
[    0.373319] [c000000042213270] [c00000000006f8d4] .irq_enter+0x20/0x88
[    0.373330] [c0000000422132f0] [c00000000001b264] .timer_interrupt+0x150/0x2d0
[    0.373341] [c000000042213390] [c0000000000038a4] decrementer_common+0x124/0x180
[    0.373354] --- Exception: 901 at .dup_fd+0x1a0/0x2d8
[    0.373355]     LR = .dup_fd+0x160/0x2d8
[    0.373365] [c000000042213680] [c000000000172678] .dup_fd+0xf8/0x2d8 (unreliable)
[    0.373378] [c000000042213750] [c000000000065f2c] .copy_process+0x64c/0x115c
[    0.373388] [c000000042213840] [c000000000066f4c] .do_fork+0x118/0x338
[    0.373399] [c000000042213920] [c0000000000134d8] .sys_clone+0x5c/0x74
[    0.373409] [c000000042213990] [c000000000009914] .ppc_clone+0x8/0xc
[    0.373421] --- Exception: c00 at .kernel_thread+0x28/0x70
[    0.373423]     LR = .__call_usermodehelper+0x68/0xf0
[    0.373433] [c000000042213c80] [c000000042213d10] 0xc000000042213d10 (unreliable)
[    0.373445] [c000000042213cf0] [c000000042213d80] 0xc000000042213d80
[    0.373455] [c000000042213d80] [c000000000086394] .process_one_work+0x2e8/0x4d0
[    0.373467] [c000000042213e40] [c000000000089484] .worker_thread+0x1b0/0x2f4
[    0.373477] [c000000042213ed0] [c000000000091bf8] .kthread+0xb4/0xc0
[    0.373488] [c000000042213f90] [c00000000001de90] .kernel_thread+0x54/0x70
[    0.373497] Instruction dump:
[    0.373502] 485117d9 60000000 482428bd 60000000 7c6307b4 4bf7f711 60000000 2fa30000 
[    0.373523] 40be0028 e93e8300 88090000 68000001 <0b000000> 2fa00000 41be0010 e93e8300 
[    0.373549] ---[ end trace 75d2b1226921d2ff ]---

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: linux-next-20110923: warning kernel/rcutree.c:1833
  2011-10-06 18:44                                                     ` Paul E. McKenney
@ 2011-10-06 23:44                                                       ` Paul E. McKenney
  0 siblings, 0 replies; 57+ messages in thread
From: Paul E. McKenney @ 2011-10-06 23:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kirill A. Shutemov, linux-kernel, Dipankar Sarma,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Lai Jiangshan,
	arjan.van.de.ven, andi.kleen

On Thu, Oct 06, 2011 at 11:44:55AM -0700, Paul E. McKenney wrote:
> On Thu, Oct 06, 2011 at 02:11:28PM +0200, Frederic Weisbecker wrote:
> > On Wed, Oct 05, 2011 at 05:58:58PM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 03, 2011 at 09:30:36AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 03, 2011 at 03:03:48PM +0200, Frederic Weisbecker wrote:
> > > > > On Sun, Oct 02, 2011 at 05:32:47PM -0700, Paul E. McKenney wrote:
> > > > > > > > -void rcu_irq_enter(void)
> > > > > > > > +int rcu_is_cpu_idle(void)
> > > > > > > >  {
> > > > > > > > -	rcu_exit_nohz();
> > > > > > > > +	return (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > > > > > >  }
> > > > > > > 
> > > > > > > So that's not used in this patch but it's interesting for me
> > > > > > > to backport "rcu: Detect illegal rcu dereference in extended quiescent state".
> > > > > > 
> > > > > > Yep, that is why it is there.
> > > > > 
> > > > > Ok.
> > > > > 
> > > > > > 
> > > > > > > The above should be read from a preempt disabled section though
> > > > > > > (remember "rcu: Fix preempt-unsafe debug check of rcu extended quiescent state")
> > > > > > 
> > > > > > Yes, and that is why the last line of the header comment reads "The
> > > > > > caller must have at least disabled preemption."  Disabling preemption
> > > > > > is not necessary in Tiny RCU because there is no other CPU for the task
> > > > > > to go to.  (Right?)
> > > > > 
> > > > > Right.
> > > > > 
> > > > > > > Those functions should probably lie in a separate patch. But I don't mind
> > > > > > > much keeping the things as is and use these APIs in my next patches though.
> > > > > > > I'll just fix the preempt enabled thing above.
> > > > > > 
> > > > > > Or were you saying that you wish to make calls to rcu_is_cpu_idle()
> > > > > > that have preemption enabled?
> > > > > 
> > > > > Yeah. That's going to be called from places like rcu_read_lock_held()
> > > > > and things like this that don't need to disable preemption themselves.
> > > > > 
> > > > > It would be better to disable preemption from that function.
> > > > 
> > > > Hmmm...  This might be a good use for the "drive-by" per-CPU access
> > > > functions.
> > > > 
> > > > No, that doesn't work.  We could pick up the pointer, switch to another
> > > > CPU, the original CPU could run a task that blocks before we start running,
> > > > and then we could incorrectly decide that we were running in idle context,
> > > > issuing a spurious warning.  This approach would only work in environments
> > > > that (unlike the Linux kernel) mapped all the per-CPU variables to the
> > > > same virtual address on all CPUs.  (DYNIX/ptx did this, but this leads
> > > > to other problems, like being unable to reasonably access other CPUs'
> > > > variables.  Double mapping has other issues on some architectures.)
> > > > 
> > > > OK, agreed.  I will make this function disable preemption.
> > > > 
> > > > > > And I can split the patch easily enough while keeping the diff the same,
> > > > > > so you should be able to do your porting on top of the existing code.
> > > > > 
> > > > > No, I'm actually pretty fine with the current state. Whether that's defined
> > > > > in this patch or a following one is actually not important.
> > > > 
> > > > Fair enough!
> > > 
> > > And here is an update that might handle an irq entry/exit miscounting
> > > problem.  Thanks to Arjan van de Ven for pointing out that my earlier
> > > approach would in fact miscount irq entries/exits in the face of things like
> > > upcalls to user-mode helpers.
> > 
> > I'm not sure what you mean. How could the current state miscount in user-mode?
> 
> It appears that some sorts of upcalls to userspace can have an irq_exit()
> without a matching irq_enter(), as shown by the stack trace below.  This
> splat was generated by some code in rcu_idle_enter() that complains when
> a non-idle task tries to become idle.
> 
> One possibility that I am considering is to have ____call_usermodehelper()
> set a task-structure flag just before the call to kernel_execve(), and
> to have rcu_idle_enter() check that flag, and, if set, zero the flag
> and just return without doing anything.  I don't claim to understand
> the code well enough to know whether this really works, though.

And not a chance -- too many opportunities for interrupts and preemption
at any number of points in this code.  Back to the drawing board...

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> [    0.373084] WARNING: at kernel/rcutree.c:398
> [    0.373089] Modules linked in:
> [    0.373097] NIP: c0000000000d3c4c LR: c0000000000d3c34 CTR: 0000000000000000
> [    0.373106] REGS: c000000042212f50 TRAP: 0700   Not tainted  (3.1.0-rc8-autokern1)
> [    0.373114] MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48008022  XER: 00000000
> [    0.373134] CFAR: c000000000053340
> [    0.373140] TASK = c0000000421f2640[5] 'kworker/u:0' THREAD: c000000042210000 CPU: 1
> [    0.373149] GPR00: 0000000000000001 c0000000422131d0 c000000000a1a7c0 0000000000000000 
> [    0.373165] GPR04: 0000000000000001 c000000008123d50 0000000004000000 0000000000000000 
> [    0.373182] GPR08: 0000000000000001 c000000000a8809d c0000000008f9520 c000000000a47d58 
> [    0.373198] GPR12: 8000000000009032 c000000007578280 0000000002080000 c0000000007b89d8 
> [    0.373214] GPR16: c0000000007b5078 0000000000000000 0000000000000000 0000000000000000 
> [    0.373231] GPR20: c000000042213a00 c000000000940480 c0000000428076a0 c000000042807600 
> [    0.373247] GPR24: c000000042807600 0000000000000040 c0000000009405f0 0000000000000000 
> [    0.373263] GPR28: 0000000000000001 0000000000000001 c0000000009991b0 0000000000000001 
> [    0.373284] NIP [c0000000000d3c4c] .rcu_idle_exit+0x1f4/0x248
> [    0.373293] LR [c0000000000d3c34] .rcu_idle_exit+0x1dc/0x248
> [    0.373300] Call Trace:
> [    0.373306] [c0000000422131d0] [c0000000000d3c28] .rcu_idle_exit+0x1d0/0x248 (unreliable)
> [    0.373319] [c000000042213270] [c00000000006f8d4] .irq_enter+0x20/0x88
> [    0.373330] [c0000000422132f0] [c00000000001b264] .timer_interrupt+0x150/0x2d0
> [    0.373341] [c000000042213390] [c0000000000038a4] decrementer_common+0x124/0x180
> [    0.373354] --- Exception: 901 at .dup_fd+0x1a0/0x2d8
> [    0.373355]     LR = .dup_fd+0x160/0x2d8
> [    0.373365] [c000000042213680] [c000000000172678] .dup_fd+0xf8/0x2d8 (unreliable)
> [    0.373378] [c000000042213750] [c000000000065f2c] .copy_process+0x64c/0x115c
> [    0.373388] [c000000042213840] [c000000000066f4c] .do_fork+0x118/0x338
> [    0.373399] [c000000042213920] [c0000000000134d8] .sys_clone+0x5c/0x74
> [    0.373409] [c000000042213990] [c000000000009914] .ppc_clone+0x8/0xc
> [    0.373421] --- Exception: c00 at .kernel_thread+0x28/0x70
> [    0.373423]     LR = .__call_usermodehelper+0x68/0xf0
> [    0.373433] [c000000042213c80] [c000000042213d10] 0xc000000042213d10 (unreliable)
> [    0.373445] [c000000042213cf0] [c000000042213d80] 0xc000000042213d80
> [    0.373455] [c000000042213d80] [c000000000086394] .process_one_work+0x2e8/0x4d0
> [    0.373467] [c000000042213e40] [c000000000089484] .worker_thread+0x1b0/0x2f4
> [    0.373477] [c000000042213ed0] [c000000000091bf8] .kthread+0xb4/0xc0
> [    0.373488] [c000000042213f90] [c00000000001de90] .kernel_thread+0x54/0x70
> [    0.373497] Instruction dump:
> [    0.373502] 485117d9 60000000 482428bd 60000000 7c6307b4 4bf7f711 60000000 2fa30000 
> [    0.373523] 40be0028 e93e8300 88090000 68000001 <0b000000> 2fa00000 41be0010 e93e8300 
> [    0.373549] ---[ end trace 75d2b1226921d2ff ]---


end of thread, other threads:[~2011-10-06 23:45 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-25  0:24 linux-next-20110923: warning kernel/rcutree.c:1833 Kirill A. Shutemov
2011-09-25  5:08 ` Paul E. McKenney
2011-09-25 11:26   ` Kirill A. Shutemov
2011-09-25 13:06     ` Frederic Weisbecker
2011-09-25 14:19       ` Kirill A. Shutemov
2011-09-25 16:48       ` Paul E. McKenney
2011-09-26  1:04         ` Frederic Weisbecker
2011-09-26  1:10           ` Frederic Weisbecker
2011-09-26  1:26             ` Paul E. McKenney
2011-09-26  1:41               ` Paul E. McKenney
2011-09-26  9:39                 ` Frederic Weisbecker
2011-09-26 22:34                   ` Paul E. McKenney
2011-09-27 12:07                     ` Frederic Weisbecker
2011-09-26  9:42                 ` Frederic Weisbecker
2011-09-26 22:35                   ` Paul E. McKenney
2011-09-26  9:20               ` Frederic Weisbecker
2011-09-26 22:50                 ` Paul E. McKenney
2011-09-27 12:16                   ` Frederic Weisbecker
2011-09-27 18:01                     ` Paul E. McKenney
2011-09-28 12:31                       ` Frederic Weisbecker
2011-09-28 18:40                         ` Paul E. McKenney
2011-09-28 23:46                           ` Frederic Weisbecker
2011-09-29  0:55                             ` Paul E. McKenney
2011-09-29  4:49                               ` Paul E. McKenney
2011-09-29 12:30                               ` Frederic Weisbecker
2011-09-29 17:12                                 ` Paul E. McKenney
2011-09-29 17:19                                   ` Paul E. McKenney
2011-09-29 23:18                                     ` Paul E. McKenney
2011-09-30 13:11                                   ` Frederic Weisbecker
2011-09-30 15:29                                     ` Paul E. McKenney
2011-09-30 19:24                                       ` Paul E. McKenney
2011-10-01  4:34                                         ` Paul E. McKenney
2011-10-01 12:24                                         ` Frederic Weisbecker
2011-10-01 12:28                                           ` Frederic Weisbecker
2011-10-01 16:35                                             ` Paul E. McKenney
2011-10-01 17:07                                           ` Paul E. McKenney
2011-10-02  3:23                                             ` Paul E. McKenney
2011-10-02 11:45                                               ` Frederic Weisbecker
2011-10-02 22:50                                         ` Frederic Weisbecker
2011-10-03  0:28                                           ` Paul E. McKenney
2011-10-03 12:59                                             ` Frederic Weisbecker
2011-10-03 16:22                                               ` Paul E. McKenney
2011-10-03 17:11                                                 ` Frederic Weisbecker
2011-10-02 23:07                                         ` Frederic Weisbecker
2011-10-03  0:32                                           ` Paul E. McKenney
2011-10-03 13:03                                             ` Frederic Weisbecker
2011-10-03 16:30                                               ` Paul E. McKenney
2011-10-06  0:58                                                 ` Paul E. McKenney
2011-10-06  1:59                                                   ` Paul E. McKenney
2011-10-06 12:11                                                   ` Frederic Weisbecker
2011-10-06 18:44                                                     ` Paul E. McKenney
2011-10-06 23:44                                                       ` Paul E. McKenney
2011-09-26  1:25           ` Paul E. McKenney
2011-09-26  8:48             ` Frederic Weisbecker
2011-09-26  8:49             ` Frederic Weisbecker
2011-09-26 22:30               ` Paul E. McKenney
2011-09-27 11:55                 ` Frederic Weisbecker
