* [rcu] [ INFO: suspicious RCU usage. ]
From: Fengguang Wu @ 2015-02-01  2:59 UTC
  To: Paul E. McKenney; +Cc: LKP, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7842 bytes --]

Greetings,

The 0day kernel testing robot got the dmesg below; the first bad commit is:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed Jan 28 14:42:09 2015 -0800
Commit:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Fri Jan 30 12:59:22 2015 -0800

    rcu: Handle outgoing CPUs on exit from idle loop
    
    This commit informs RCU of an outgoing CPU just before that CPU invokes
    arch_cpu_idle_dead() during its last pass through the idle loop (via a
    new CPU_DYING_IDLE notifier value).  This change means that RCU need not
    deal with outgoing CPUs passing through the scheduler after informing
    RCU that they are no longer online.  Note that removing the CPU from
    the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
    and orphaning callbacks is still done at CPU_DEAD time, the reason being
    that at CPU_DEAD time we have another CPU that can adopt them.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

+-------------------------------------+------------+------------+------------+
|                                     | 8c50d7da91 | dd2b39be8e | d586642522 |
+-------------------------------------+------------+------------+------------+
| boot_successes                      | 198        | 11         | 51         |
| boot_failures                       | 0          | 55         | 15         |
| INFO:suspicious_RCU_usage           | 0          | 55         | 15         |
| RCU_used_illegally_from_offline_CPU | 0          | 55         | 15         |
| backtrace:cpu_startup_entry         | 0          | 55         | 15         |
| BUG:kernel_test_hang                | 0          | 0          | 4          |
+-------------------------------------+------------+------------+------------+
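
To make the table easier to compare across commits, the counts can be turned into failure rates. This is only a sketch: failure_rate is a hypothetical helper, not part of the 0day tooling, and the numbers are copied by hand from the boot_successes and boot_failures rows above.

```bash
#!/bin/bash
# Hypothetical helper: print the boot failure rate for one commit.
# Arguments: commit label, boot successes, boot failures.
failure_rate() {
	local commit=$1 ok=$2 fail=$3
	awk -v c="$commit" -v ok="$ok" -v fail="$fail" 'BEGIN {
		total = ok + fail
		printf "%s: %d/%d boots failed (%.1f%%)\n", c, fail, total, 100 * fail / total
	}'
}

# Values taken from the table above.
failure_rate 8c50d7da91 198 0
failure_rate dd2b39be8e 11 55
failure_rate d586642522 51 15
```

For dd2b39be8e this works out to 55 of 66 boots, about 83%, against 0 of 198 for its parent 8c50d7da91.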

[   15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
[   15.246713] 
[   15.246917] ===============================
[   15.247424] [ INFO: suspicious RCU usage. ]
[   15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
[   15.248531] -------------------------------
[   15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
[   15.248586] 
[   15.248586] other info that might help us debug this:
[   15.248586] 
[   15.248586] 
[   15.248586] RCU used illegally from offline CPU!
[   15.248586] rcu_scheduler_active = 1, debug_locks = 0
[   15.248586] no locks held by swapper/0/0.
[   15.248586] 
[   15.248586] stack backtrace:
[   15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
[   15.248586]  0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
[   15.248586]  ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
[   15.248586]  ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
[   15.248586] Call Trace:
[   15.248586]  [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
[   15.248586]  [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
[   15.248586]  [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
[   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
[   15.248586]  [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
[   15.248586]  [<ffffffff8170eb5d>] rest_init+0x159/0x15f
[   15.248586]  [<ffffffff8237b2da>] start_kernel+0x565/0x572
[   15.248586]  [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
[   15.248586]  [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
[   15.248586]  [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
[   15.265151] CPU 0 is now offline
[   15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
[   15.266726] Write protecting the kernel read-only data: 14336k
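
When comparing splats from several boots, it can help to reduce the call trace to a bare list of symbols. The sketch below assumes the frame format shown above (timestamp, bracketed address, symbol+offset/size); trace_symbols is a hypothetical helper name. Frames marked with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are skipped.

```bash
#!/bin/bash
# Hypothetical helper: list the symbols of confirmed call-trace frames read from stdin.
# Expects lines such as:
#   [   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
# Lines whose frame starts with "?" do not match the pattern and are dropped.
trace_symbols() {
	sed -n -E 's/^\[ *[0-9]+\.[0-9]+\] +\[<[0-9a-f]+>\] ([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+\/0x[0-9a-f]+.*$/\1/p'
}

# Example (dmesg.txt is an assumed file name holding the log above):
#   trace_symbols < dmesg.txt
```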

git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
git bisect  bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302  # 13:29     23-      6  Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
git bisect  bad 2c86978183cc365003e2d6949052a30865ef8b89  # 13:33     34-     32  Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a  # 13:37     66+      0  Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
git bisect  bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f  # 13:43     32-     22  Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb  # 13:52     66+      0  Merge 'iio/testing' into devel-roam-smoke-201501311226
git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88  # 14:01     66+      0  Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e  # 14:05     66+      0  documentation: Update rcutree.kthread_prio for grace-period kthread use
git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d  # 14:14     66+      0  documentation: Update based on on-demand vmstat workers
git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f  # 14:18     66+      0  rcu: Eliminate ->onoff_mutex from rcu_node structure
git bisect  bad dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 14:29     11-     55  rcu: Handle outgoing CPUs on exit from idle loop
git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:34     66+      0  cpu: Make CPU-offline idle-loop transition point more precise
# first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:37    198+      0  cpu: Make CPU-offline idle-loop transition point more precise
# extra tests with DEBUG_INFO
git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 15:35    198+    198  rcu: Handle outgoing CPUs on exit from idle loop
# extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
git bisect  bad d58664252218cfefb19709d597ff0c5d93688203  # 15:35      0-     15  0day head guard for 'devel-roam-smoke-201501311226'
# extra tests on tree/branch rcu/rcu/next
git bisect  bad c418b8035fac0cc7d242e5de126cec1006a34bed  # 15:52     47-     21  cpu: Stop newly offlined CPU from using RCU readers
# extra tests with first bad commit reverted
# extra tests on tree/branch linus/master
git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18  # 16:04    198+      0  Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
# extra tests on tree/branch next/master
git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2  # 16:12    198+      0  Add linux-next specific files for 20150130
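
The bisect log above can be condensed to one short line per tested commit. This is a sketch under assumptions: bisect_summary is a hypothetical helper, and the two counters after the "#" timestamp are passed through as-is because this report does not define them (the second one does match the boot_failures row for dd2b39be8e and d586642522).

```bash
#!/bin/bash
# Hypothetical helper: summarize 0day-style bisect log lines read from stdin.
# Prints the verdict, a 12-character abbreviated commit id, and the two
# counters that follow the "# HH:MM" timestamp. "git bisect start" and
# comment lines are ignored.
bisect_summary() {
	awk '$1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
		printf "%-4s %s %s %s\n", $3, substr($4, 1, 12), $7, $8
	}'
}

# Example (bisect.log is an assumed file name holding the log above):
#   bisect_summary < bisect.log
```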


The following script may reproduce the error; pass the path to the kernel image as its first argument.

----------------------------------------------------------------------------
#!/bin/bash

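kernel=$1	# path to the kernel image to boot
initrd=quantal-core-x86_64.cgz

# Fetch the initrd once; --no-clobber skips the download if the file already exists.
wget --no-clobber "https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd"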

kvm=(
	qemu-system-x86_64
	-cpu kvm64
	-enable-kvm
	-kernel "$kernel"
	-initrd "$initrd"
	-m 320
	-smp 2
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------
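
Because the script sends the guest serial console to stdout (-serial stdio), its output can be captured and checked for the splat. The sketch below is one way to do that; has_rcu_splat, reproduce.sh, and console.log are assumed names, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a captured console log contains both
# signature lines of the splat shown in this report.
has_rcu_splat() {
	local log=$1
	grep -q -F 'INFO: suspicious RCU usage.' "$log" &&
		grep -q -F 'RCU used illegally from offline CPU!' "$log"
}

# Example (file names are assumptions): save the script above as reproduce.sh, then
#   ./reproduce.sh /path/to/vmlinuz | tee console.log
#   has_rcu_splat console.log && echo "reproduced"
```

Since dd2b39be8e booted cleanly in 11 of 66 runs, a single clean boot does not show the problem is gone; several runs may be needed.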

Thanks,
Fengguang

[-- Attachment #2: dmesg-quantal-ivb41-29:20150131142748:x86_64-randconfig-r0-01311150:3.19.0-rc1-gdd2b39b:10 --]
[-- Type: text/plain, Size: 30319 bytes --]

early console in setup code
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.19.0-rc1-gdd2b39b (kbuild@roam) (gcc version 4.9.1 (Debian 4.9.1-19) ) #10 SMP Sat Jan 31 14:26:30 CST 2015
[    0.000000] Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-r0-01311150/linux-devel:devel-roam-smoke-201501311226:dd2b39be8eee9d175c7842c30e405a5cbe50095a:bisect-linux-1/.vmlinuz-dd2b39be8eee9d175c7842c30e405a5cbe50095a-20150131142657-48-ivb41 branch=linux-devel/devel-roam-smoke-201501311226 BOOT_IMAGE=/kernel/x86_64-randconfig-r0-01311150/dd2b39be8eee9d175c7842c30e405a5cbe50095a/vmlinuz-3.19.0-rc1-gdd2b39b drbd.minor_count=8
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000013fdffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000013fe0000-0x0000000013ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] Hypervisor detected: KVM
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x13fe0 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 0080000000 mask FF80000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] Scan for SMP in [mem 0x00000000-0x000003ff]
[    0.000000] Scan for SMP in [mem 0x0009fc00-0x0009ffff]
[    0.000000] Scan for SMP in [mem 0x000f0000-0x000fffff]
[    0.000000] found SMP MP-table at [mem 0x000f0eb0-0x000f0ebf] mapped at [ffff8800000f0eb0]
[    0.000000]   mpc: f0ec0-f0fa4
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x03356000, 0x03356fff] PGTABLE
[    0.000000] BRK [0x03357000, 0x03357fff] PGTABLE
[    0.000000] BRK [0x03358000, 0x03358fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x12600000-0x127fffff]
[    0.000000]  [mem 0x12600000-0x127fffff] page 4k
[    0.000000] BRK [0x03359000, 0x03359fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x10000000-0x125fffff]
[    0.000000]  [mem 0x10000000-0x125fffff] page 4k
[    0.000000] BRK [0x0335a000, 0x0335afff] PGTABLE
[    0.000000] BRK [0x0335b000, 0x0335bfff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x00100000-0x0fffffff]
[    0.000000]  [mem 0x00100000-0x0fffffff] page 4k
[    0.000000] init_memory_mapping: [mem 0x12800000-0x13fdffff]
[    0.000000]  [mem 0x12800000-0x13fdffff] page 4k
[    0.000000] RAMDISK: [mem 0x12925000-0x13fd7fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F0C90 000014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 0x0000000013FE18BD 000034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 0x0000000013FE0B37 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 0x0000000013FE0040 000AF7 (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACS 0x0000000013FE0000 000040
[    0.000000] ACPI: SSDT 0x0000000013FE0BAB 000C5A (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 0x0000000013FE1805 000080 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 0x0000000013FE1885 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] mapped APIC to ffffffffff57d000 (        fee00000)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000013fdffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x12908000-0x12924fff]
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:12888001, primary cpu clock
[    0.000000]  [ffffea0000000000-ffffea00005fffff] PMD -> [ffff880011800000-ffff880011dfffff] on node 0
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x00001000-0x13fdffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0x13fdffff]
[    0.000000] Initmem setup node 0 [mem 0x00001000-0x13fdffff]
[    0.000000] On node 0 totalpages: 81790
[    0.000000]   DMA32 zone: 1280 pages used for memmap
[    0.000000]   DMA32 zone: 21 pages reserved
[    0.000000]   DMA32 zone: 81790 pages, LIFO batch:15
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] mapped APIC to ffffffffff57d000 (        fee00000)
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 00, APIC ID 0, APIC INT 02
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 05, APIC ID 0, APIC INT 05
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 09, APIC ID 0, APIC INT 09
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 0a, APIC ID 0, APIC INT 0a
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 0b, APIC ID 0, APIC INT 0b
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 01, APIC ID 0, APIC INT 01
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 03, APIC ID 0, APIC INT 03
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 04
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 06, APIC ID 0, APIC INT 06
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 07, APIC ID 0, APIC INT 07
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 08, APIC ID 0, APIC INT 08
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0c
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0d, APIC ID 0, APIC INT 0d
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0e, APIC ID 0, APIC INT 0e
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0f, APIC ID 0, APIC INT 0f
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] mapped IOAPIC to ffffffffff57c000 (fec00000)
[    0.000000] e820: [mem 0x14000000-0xfeffbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff880012400000 s83688 r8192 d31000 u1048576
[    0.000000] pcpu-alloc: s83688 r8192 d31000 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 
[    0.000000] KVM setup async PF for cpu 0
[    0.000000] kvm-stealtime: cpu 0, msr 1240d200
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 80489
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-r0-01311150/linux-devel:devel-roam-smoke-201501311226:dd2b39be8eee9d175c7842c30e405a5cbe50095a:bisect-linux-1/.vmlinuz-dd2b39be8eee9d175c7842c30e405a5cbe50095a-20150131142657-48-ivb41 branch=linux-devel/devel-roam-smoke-201501311226 BOOT_IMAGE=/kernel/x86_64-randconfig-r0-01311150/dd2b39be8eee9d175c7842c30e405a5cbe50095a/vmlinuz-3.19.0-rc1-gdd2b39b drbd.minor_count=8
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Memory: 259652K/327160K available (7334K kernel code, 5518K rwdata, 5292K rodata, 1876K init, 14436K bss, 67508K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Running RCU self tests
[    0.000000] Hierarchical RCU implementation.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS:524544 nr_irqs:440 16
[    0.000000] console [ttyS0] enabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
[    0.000000] ... CHAINHASH_SIZE:          32768
[    0.000000]  memory used by lock dependency info: 8159 kB
[    0.000000]  per task-struct memory footprint: 1920 bytes
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Detected 2693.508 MHz processor
[    0.008000] Calibrating delay loop (skipped) preset value.. 5387.01 BogoMIPS (lpj=10774032)
[    0.008000] pid_max: default: 4096 minimum: 301
[    0.008051] ACPI: Core revision 20141107
[    0.015007] ACPI: All ACPI Tables successfully acquired
[    0.016354] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.017723] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.018879] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.019967] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.021012] Initializing cgroup subsys memory
[    0.021630] Initializing cgroup subsys freezer
[    0.022270] Initializing cgroup subsys net_cls
[    0.022904] Initializing cgroup subsys net_prio
[    0.024018] Initializing cgroup subsys debug
[    0.024712] mce: CPU supports 10 MCE banks
[    0.025319] numa_add_cpu cpu 0 node 0: mask now 0
[    0.025957] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.025957] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.028128] debug: unmapping init [mem 0xffffffff8253a000-0xffffffff8253cfff]
[    0.030871] Getting VERSION: 1050014
[    0.031396] Getting VERSION: 1050014
[    0.032011] Getting ID: 0
[    0.032451] Getting ID: ff000000
[    0.032980] Getting LVT0: 8700
[    0.033396] Getting LVT1: 8400
[    0.033828] enabled ExtINT on CPU#0
[    0.034828] ENABLING IO-APIC IRQs
[    0.035164] init IO_APIC IRQs
[    0.035526]  apic 0 pin 0 not connected
[    0.036061] IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:1)
[    0.036842] IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1)
[    0.037598] IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1)
[    0.038344] IOAPIC[0]: Set routing entry (0-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:1)
[    0.039099] IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:1)
[    0.040027] IOAPIC[0]: Set routing entry (0-5 -> 0x35 -> IRQ 5 Mode:1 Active:0 Dest:1)
[    0.040788] IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:1)
[    0.041535] IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:1)
[    0.042288] IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:1)
[    0.043046] IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:1)
[    0.043799] IOAPIC[0]: Set routing entry (0-10 -> 0x3a -> IRQ 10 Mode:1 Active:0 Dest:1)
[    0.044026] IOAPIC[0]: Set routing entry (0-11 -> 0x3b -> IRQ 11 Mode:1 Active:0 Dest:1)
[    0.044798] IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:1)
[    0.045569] IOAPIC[0]: Set routing entry (0-13 -> 0x3d -> IRQ 13 Mode:0 Active:0 Dest:1)
[    0.046342] IOAPIC[0]: Set routing entry (0-14 -> 0x3e -> IRQ 14 Mode:0 Active:0 Dest:1)
[    0.047112] IOAPIC[0]: Set routing entry (0-15 -> 0x3f -> IRQ 15 Mode:0 Active:0 Dest:1)
[    0.048035]  apic 0 pin 16 not connected
[    0.048757]  apic 0 pin 17 not connected
[    0.049455]  apic 0 pin 18 not connected
[    0.050168]  apic 0 pin 19 not connected
[    0.052008]  apic 0 pin 20 not connected
[    0.052725]  apic 0 pin 21 not connected
[    0.053429]  apic 0 pin 22 not connected
[    0.054142]  apic 0 pin 23 not connected
[    0.055022] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.056009] smpboot: CPU0: Intel Common KVM processor (fam: 0f, model: 06, stepping: 01)
[    0.058187] Using local APIC timer interrupts.
[    0.058187] calibrating APIC timer ...
[    0.064000] ... lapic delta = 6249605
[    0.064000] ... PM-Timer delta = 357912
[    0.064000] ... PM-Timer result ok
[    0.064000] ..... delta 6249605
[    0.064000] ..... mult: 268418490
[    0.064000] ..... calibration result: 3999747
[    0.064000] ..... CPU clock speed is 2693.0801 MHz.
[    0.064000] ..... host bus clock speed is 999.3747 MHz.
[    0.064096] Performance Events: unsupported Netburst CPU model 6 no PMU driver, software events only.
[    0.066874] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.068619] x86: Booting SMP configuration:
[    0.069550] .... node  #0, CPUs:      #1
[    0.008000] kvm-clock: cpu 1, msr 0:12888041, secondary cpu clock
[    0.008000] masked ExtINT on CPU#1
[    0.008000] numa_add_cpu cpu 1 node 0: mask now 0-1
[    0.084241] x86: Booted up 1 node, 2 CPUs
[    0.084191] KVM setup async PF for cpu 1
[    0.084191] kvm-stealtime: cpu 1, msr 1250d200
[    0.088769] smpboot: Total of 2 processors activated (10774.03 BogoMIPS)
[    0.092091] devtmpfs: initialized
[    0.103741] prandom: seed boundary self test passed
[    0.104794] prandom: 100 self tests passed
[    0.106142] regulator-dummy: no parameters
[    0.107419] NET: Registered protocol family 16
[    0.112106] cpuidle: using governor ladder
[    0.116050] cpuidle: using governor menu
[    0.117118] ACPI: bus type PCI registered
[    0.117867] PCI: Using configuration type 1 for base access
[    0.120097] Running resizable hashtable tests...
[    0.120853]   Adding 2048 keys
[    0.217339]   Traversal complete: counted=2048, nelems=2048, entries=2048
[    0.219246]   Table expansion iteration 0...
[    0.232139]   Verifying lookups...
[    0.233457]   Table expansion iteration 1...
[    0.244212]   Verifying lookups...
[    0.245566]   Table expansion iteration 2...
[    0.256359]   Verifying lookups...
[    0.257696]   Table expansion iteration 3...
[    0.264550]   Verifying lookups...
[    0.265771]   Table shrinkage iteration 0...
[    0.268063]   Verifying lookups...
[    0.269277]   Table shrinkage iteration 1...
[    0.272053]   Verifying lookups...
[    0.273276]   Table shrinkage iteration 2...
[    0.276048]   Verifying lookups...
[    0.277294]   Table shrinkage iteration 3...
[    0.280047]   Verifying lookups...
[    0.282118]   Traversal complete: counted=2048, nelems=2048, entries=2048
[    0.282981]   Deleting 2048 keys
[    0.324175] ACPI: Added _OSI(Module Device)
[    0.324734] ACPI: Added _OSI(Processor Device)
[    0.325313] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.325912] ACPI: Added _OSI(Processor Aggregator Device)
[    0.329359] IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:3)
[    0.335760] ACPI: Interpreter enabled
[    0.336011] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20141107/hwxface-580)
[    0.337214] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20141107/hwxface-580)
[    0.338453] ACPI: (supports S0 S3 S5)
[    0.338932] ACPI: Using IOAPIC for interrupt routing
[    0.339587] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.365344] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.368018] acpi PNP0A03:00: _OSC: OS supports [Segments]
[    0.369088] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.371024] PCI host bridge to bus 0000:00
[    0.372012] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.372993] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.373953] pci_bus 0000:00: root bus resource [io  0x0d00-0xadff]
[    0.374727] pci_bus 0000:00: root bus resource [io  0xae0f-0xaeff]
[    0.375507] pci_bus 0000:00: root bus resource [io  0xaf20-0xafdf]
[    0.376008] pci_bus 0000:00: root bus resource [io  0xafe4-0xffff]
[    0.376796] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.377667] pci_bus 0000:00: root bus resource [mem 0x14000000-0xfebfffff]
[    0.378632] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.380575] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.382181] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.386282] pci 0000:00:01.1: reg 0x20: [io  0xc040-0xc04f]
[    0.388613] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.389884] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.391027] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.392009] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.396212] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.397893] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.400026] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    0.401835] pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
[    0.404664] pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
[    0.407223] pci 0000:00:02.0: reg 0x14: [mem 0xfebf0000-0xfebf0fff]
[    0.414670] pci 0000:00:02.0: reg 0x30: [mem 0xfebe0000-0xfebeffff pref]
[    0.416563] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000
[    0.418708] pci 0000:00:03.0: reg 0x10: [mem 0xfebc0000-0xfebdffff]
[    0.421033] pci 0000:00:03.0: reg 0x14: [io  0xc000-0xc03f]
[    0.426913] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
[    0.428443] pci 0000:00:04.0: [8086:25ab] type 00 class 0x088000
[    0.429926] pci 0000:00:04.0: reg 0x10: [mem 0xfebf1000-0xfebf100f]
[    0.436309] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.437658] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.438974] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.440281] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.441316] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.443295] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.444517] vgaarb: setting as boot device: PCI:0000:00:02.0
[    0.444517] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.444517] vgaarb: loaded
[    0.444517] vgaarb: bridge control possible 0000:00:02.0
[    0.445397] pps_core: LinuxPPS API ver. 1 registered
[    0.446290] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.448096] PCI: Using ACPI for IRQ routing
[    0.448868] PCI: pci_cache_line_size set to 64 bytes
[    0.450026] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.451070] e820: reserve RAM buffer [mem 0x13fe0000-0x13ffffff]
[    0.460045] Switched to clocksource kvm-clock
[    0.461646] Warning: could not register all branches stats
[    0.462501] Warning: could not register annotated branches stats
[    0.564347] pnp: PnP ACPI init
[    0.565076] IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:3)
[    0.566600] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.567841] IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:3)
[    0.569254] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.570501] IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:3)
[    0.571937] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.573072] IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:3)
[    0.574194] pnp 00:03: [dma 2]
[    0.574774] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.576029] IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:3)
[    0.577458] pnp 00:04: Plug and Play ACPI device, IDs PNP0400 (active)
[    0.578714] IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:3)
[    0.580124] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.582349] pnp: PnP ACPI: found 6 devices
[    0.588689] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    0.589671] pci_bus 0000:00: resource 5 [io  0x0d00-0xadff]
[    0.590638] pci_bus 0000:00: resource 6 [io  0xae0f-0xaeff]
[    0.591577] pci_bus 0000:00: resource 7 [io  0xaf20-0xafdf]
[    0.592528] pci_bus 0000:00: resource 8 [io  0xafe4-0xffff]
[    0.593492] pci_bus 0000:00: resource 9 [mem 0x000a0000-0x000bffff]
[    0.594561] pci_bus 0000:00: resource 10 [mem 0x14000000-0xfebfffff]
[    0.595737] NET: Registered protocol family 2
[    0.597382] TCP established hash table entries: 4096 (order: 3, 32768 bytes)
[    0.598795] TCP bind hash table entries: 4096 (order: 6, 262144 bytes)
[    0.600822] TCP: Hash tables configured (established 4096 bind 4096)
[    0.601984] TCP: reno registered
[    0.602597] UDP hash table entries: 256 (order: 3, 40960 bytes)
[    0.603661] UDP-Lite hash table entries: 256 (order: 3, 40960 bytes)
[    0.605144] NET: Registered protocol family 1
[    0.605923] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.606897] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.607844] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.608881] pci 0000:00:02.0: Video device with shadowed ROM
[    0.609857] PCI: CLS 0 bytes, default 64
[    0.610924] Unpacking initramfs...
[    1.687420] debug: unmapping init [mem 0xffff880012925000-0xffff880013fd7fff]
[    1.690572] Machine check injector initialized
[    1.692324] twofish-x86_64-3way: performance on this CPU would be suboptimal: disabling twofish-x86_64-3way.
[    1.693789] sha512_ssse3: Neither AVX nor SSSE3 is available/usable.
[    1.694759] AVX instructions are not detected.
[    1.695434] AVX instructions are not detected.
[    1.696127] AVX2 instructions are not detected.
[    1.696937] rcu-torture:--- Start of test: nreaders=1 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0
[    1.701632] rcu-torture: Creating rcu_torture_writer task
[    1.702549] rcu-torture: rcu_torture_writer task started
[    1.703409] rcu-torture: Creating rcu_torture_fakewriter task
[    1.704396] rcu-torture: rcu_torture_fakewriter task started
[    1.716467] rcu-torture: Creating rcu_torture_fakewriter task
[    1.717663] rcu-torture: Creating rcu_torture_fakewriter task
[    1.718574] rcu-torture: rcu_torture_fakewriter task started
[    1.719510] rcu-torture: Creating rcu_torture_fakewriter task
[    1.720291] rcu-torture: rcu_torture_fakewriter task started
[    1.721165] rcu-torture: Creating rcu_torture_reader task
[    1.722106] rcu-torture: rcu_torture_reader task started
[    1.734524] rcu-torture: rcu_torture_fakewriter task started
[    1.735378] rcu-torture: Creating rcu_torture_stats task
[    1.748763] rcu-torture: rcu_torture_stats task started
[    1.750152] rcu-torture: Creating torture_shuffle task
[    1.751021] rcu-torture: torture_shuffle task started
[    1.751753] rcu-torture: Creating torture_stutter task
[    1.753081] rcu-torture: torture_stutter task started
[    1.753876] rcu-torture: Creating rcu_torture_cbflood task
[    1.767895] futex hash table entries: 16 (order: -1, 2048 bytes)
[    1.768791] Initialise system trusted keyring
[    1.769753] audit: initializing netlink subsys (disabled)
[    1.770842] audit: type=2000 audit(1422685591.829:1): initialized
[    1.772722] rcu-torture: rcu_torture_cbflood task started
[    1.774017] zbud: loaded
[    1.788126] fuse init (API version 7.23)
[    1.806254] NET: Registered protocol family 38
[    1.806912] Key type asymmetric registered
[    1.807469] Asymmetric key parser 'x509' registered
[    1.810243] crc32: CRC_LE_BITS = 8, CRC_BE BITS = 8
[    1.810913] crc32: self tests passed, processed 225944 bytes in 649394 nsec
[    1.812523] crc32c: CRC_LE_BITS = 8
[    1.813030] crc32c: self tests passed, processed 225944 bytes in 324877 nsec
[    1.850266] crc32_combine: 8373 self tests passed
[    1.887457] crc32c_combine: 8373 self tests passed
[    1.888249] rbtree testing -> 25922 cycles
[    2.960534] augmented rbtree testing -> 37812 cycles
[    4.506730] tsc: Refined TSC clocksource calibration: 2693.507 MHz
[    4.506799] ipmi message handler version 39.2
[    4.506826] IPMI System Interface driver.
[    4.506970] ipmi_si: Adding default-specified kcs state machine
[    4.506973] ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
[    4.507526] ipmi_si: Interface detection failed
[    4.551449] ipmi_si: Adding default-specified smic state machine
[    4.552392] ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
[    4.553958] ipmi_si: Interface detection failed
[    4.560988] ipmi_si: Adding default-specified bt state machine
[    4.561779] ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
[    4.563019] ipmi_si: Interface detection failed
[    4.569085] ipmi_si: Unable to find any System Interface(s)
[    4.569825] IPMI SSIF Interface driver
[    4.570390] IPMI Watchdog: driver initialized
[    4.570955] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot.
[    4.571915] IPMI poweroff: Unable to register powercycle sysctl
[    4.572865] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    4.573861] ACPI: Power Button [PWRF]
[    4.602960] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    4.652261] serial 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    4.655912] lp: driver loaded but no devices found
[    4.656553] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[    4.657768] [drm] Initialized drm 1.1.0 20060810
[    4.658716] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    4.660763] serio: i8042 KBD port at 0x60,0x64 irq 1
[    4.661470] serio: i8042 AUX port at 0x60,0x64 irq 12
[    4.662754] mousedev: PS/2 mouse device common for all mice
[    4.664150] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    4.669630] advantechwdt: WDT driver for Advantech single board computer initialising
[    4.679729] advantechwdt: initialized. timeout=60 sec (nowayout=0)
[    4.680812] sbc60xxwdt: I/O address 0x0443 already in use
[    4.681649] wbsd: Winbond W83L51xD SD/MMC card interface driver
[    4.682413] wbsd: Copyright(c) Pierre Ossman
[    4.685640] panel: driver version 0.9.5 not yet registered
[    4.687029] GACT probability NOT on
[    4.687508] Mirror/redirect action on
[    4.688001] netem: version 1.3
[    4.688499] Netfilter messages via NETLINK v0.30.
[    4.689185] nf_conntrack version 0.5.0 (2028 buckets, 8112 max)
[    4.690422] IPVS: Registered protocols (UDP, SCTP)
[    4.691091] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[    4.692001] IPVS: Each connection entry needs 400 bytes at least
[    4.692815] IPVS: Creating netns size=2032 id=0
[    4.769936] IPVS: ipvs loaded.
[    4.770607] IPVS: [lc] scheduler registered.
[    4.779748] IPVS: [lblc] scheduler registered.
[    4.780659] gre: GRE over IPv4 demultiplexor driver
[    4.781949] TCP: cubic registered
[    4.789127] Initializing XFRM netlink socket
[    4.790110] NET: Registered protocol family 10
[    4.792333] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    4.793544] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this

[-- Attachment #3: Type: text/plain, Size: 85 bytes --]

_______________________________________________
LKP mailing list
LKP@linux.intel.com

* [rcu] [ INFO: suspicious RCU usage. ]
@ 2015-02-01  2:59 ` Fengguang Wu
  0 siblings, 0 replies; 45+ messages in thread
From: Fengguang Wu @ 2015-02-01  2:59 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 8087 bytes --]

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed Jan 28 14:42:09 2015 -0800
Commit:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Fri Jan 30 12:59:22 2015 -0800

    rcu: Handle outgoing CPUs on exit from idle loop
    
    This commit informs RCU of an outgoing CPU just before that CPU invokes
    arch_cpu_idle_dead() during its last pass through the idle loop (via a
    new CPU_DYING_IDLE notifier value).  This change means that RCU need not
    deal with outgoing CPUs passing through the scheduler after informing
    RCU that they are no longer online.  Note that removing the CPU from
    the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
    and orphaning callbacks is still done at CPU_DEAD time, the reason being
    that at CPU_DEAD time we have another CPU that can adopt them.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

+-------------------------------------+------------+------------+------------+
|                                     | 8c50d7da91 | dd2b39be8e | d586642522 |
+-------------------------------------+------------+------------+------------+
| boot_successes                      | 198        | 11         | 51         |
| boot_failures                       | 0          | 55         | 15         |
| INFO:suspicious_RCU_usage           | 0          | 55         | 15         |
| RCU_used_illegally_from_offline_CPU | 0          | 55         | 15         |
| backtrace:cpu_startup_entry         | 0          | 55         | 15         |
| BUG:kernel_test_hang                | 0          | 0          | 4          |
+-------------------------------------+------------+------------+------------+
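
The failure rate implied by the boot_successes and boot_failures rows can be worked out per commit. The following is a minimal sketch, assuming the counts are copied by hand from the table above; the helper name failure_rate is illustrative and is not part of the 0day tooling.

```shell
#!/bin/bash
# failure_rate SUCCESSES FAILURES
# Prints failures as a percentage of all boots, to one decimal place.
# Hypothetical helper for illustration; not part of the 0day robot.
failure_rate() {
	awk -v ok="$1" -v bad="$2" 'BEGIN {
		total = ok + bad
		if (total == 0) { print "n/a"; exit }
		printf "%.1f%%\n", 100 * bad / total
	}'
}

failure_rate 198 0    # 8c50d7da91 (parent commit)
failure_rate 11 55    # dd2b39be8e (first bad commit)
failure_rate 51 15    # d586642522 (branch head)
```

With these inputs the three calls print 0.0%, 83.3% and 22.7%. The failure is frequent but not deterministic on the first bad commit, so several boots may be needed to see it.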

[   15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
[   15.246713] 
[   15.246917] ===============================
[   15.247424] [ INFO: suspicious RCU usage. ]
[   15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
[   15.248531] -------------------------------
[   15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
[   15.248586] 
[   15.248586] other info that might help us debug this:
[   15.248586] 
[   15.248586] 
[   15.248586] RCU used illegally from offline CPU!
[   15.248586] rcu_scheduler_active = 1, debug_locks = 0
[   15.248586] no locks held by swapper/0/0.
[   15.248586] 
[   15.248586] stack backtrace:
[   15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
[   15.248586]  0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
[   15.248586]  ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
[   15.248586]  ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
[   15.248586] Call Trace:
[   15.248586]  [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
[   15.248586]  [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
[   15.248586]  [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
[   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
[   15.248586]  [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
[   15.248586]  [<ffffffff8170eb5d>] rest_init+0x159/0x15f
[   15.248586]  [<ffffffff8237b2da>] start_kernel+0x565/0x572
[   15.248586]  [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
[   15.248586]  [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
[   15.248586]  [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
[   15.265151] CPU 0 is now offline
[   15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
[   15.266726] Write protecting the kernel read-only data: 14336k
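
To compare backtraces between runs it can help to reduce the Call Trace lines to bare function names. The following is a rough sketch, assuming the dmesg line layout shown above (timestamp, bracketed address, optional "? " marker, then symbol+offset/size); the helper name trace_functions is hypothetical.

```shell
#!/bin/bash
# trace_functions: read dmesg text on stdin and print one function name
# per backtrace frame. Frames marked "?" (ones the unwinder could not
# confirm) are printed as well. Hypothetical helper for illustration.
trace_functions() {
	sed -nE 's#.*\[<[0-9a-f]+>\] (\? )?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+.*#\2#p'
}

# Example input: three frames copied from the trace above.
trace_functions <<'EOF'
[   15.248586]  [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
[   15.248586]  [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
[   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
EOF
```

For the three sample frames this prints lockdep_rcu_suspicious, trace_rcu_utilization and rcu_cpu_notify, one per line. Lines that are not backtrace frames are skipped.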

git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
git bisect  bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302  # 13:29     23-      6  Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
git bisect  bad 2c86978183cc365003e2d6949052a30865ef8b89  # 13:33     34-     32  Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a  # 13:37     66+      0  Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
git bisect  bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f  # 13:43     32-     22  Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb  # 13:52     66+      0  Merge 'iio/testing' into devel-roam-smoke-201501311226
git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88  # 14:01     66+      0  Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e  # 14:05     66+      0  documentation: Update rcutree.kthread_prio for grace-period kthread use
git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d  # 14:14     66+      0  documentation: Update based on on-demand vmstat workers
git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f  # 14:18     66+      0  rcu: Eliminate ->onoff_mutex from rcu_node structure
git bisect  bad dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 14:29     11-     55  rcu: Handle outgoing CPUs on exit from idle loop
git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:34     66+      0  cpu: Make CPU-offline idle-loop transition point more precise
# first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:37    198+      0  cpu: Make CPU-offline idle-loop transition point more precise
# extra tests with DEBUG_INFO
git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 15:35    198+    198  rcu: Handle outgoing CPUs on exit from idle loop
# extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
git bisect  bad d58664252218cfefb19709d597ff0c5d93688203  # 15:35      0-     15  0day head guard for 'devel-roam-smoke-201501311226'
# extra tests on tree/branch rcu/rcu/next
git bisect  bad c418b8035fac0cc7d242e5de126cec1006a34bed  # 15:52     47-     21  cpu: Stop newly offlined CPU from using RCU readers
# extra tests with first bad commit reverted
# extra tests on tree/branch linus/master
git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18  # 16:04    198+      0  Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
# extra tests on tree/branch next/master
git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2  # 16:12    198+      0  Add linux-next specific files for 20150130
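
The result of the bisection can be pulled out of a log like the one above mechanically. The following is a small sketch, assuming the log keeps the "# first bad commit: [<sha>] <subject>" comment line exactly as shown; the helper name first_bad_commit is hypothetical.

```shell
#!/bin/bash
# first_bad_commit: read a bisect log on stdin and print the full SHA-1
# taken from its "# first bad commit: [<sha>] <subject>" line, if any.
# Hypothetical helper for illustration.
first_bad_commit() {
	sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)\].*/\1/p'
}

# Example input: two lines copied from the log above.
first_bad_commit <<'EOF'
git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:34     66+      0  cpu: Make CPU-offline idle-loop transition point more precise
# first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
EOF
```

For the sample input this prints dd2b39be8eee9d175c7842c30e405a5cbe50095a. It prints nothing when the log has no such line.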


This script may reproduce the error.

----------------------------------------------------------------------------
#!/bin/bash

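# Path to the kernel image under test; abort with a usage hint if it is missing.
kernel=${1:?"usage: $0 <kernel-image>"}
initrd=quantal-core-x86_64.cgz

# Fetch the initrd once; --no-clobber skips the download if the file already exists.
wget --no-clobber "https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd"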

kvm=(
	qemu-system-x86_64
	-cpu kvm64
	-enable-kvm
	-kernel "$kernel"
	-initrd "$initrd"
	-m 320
	-smp 2
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------
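
Because the script passes -serial stdio, the guest console appears on standard output and can be saved to a file (for example with tee) and checked afterwards. The following is a minimal sketch of such a check, assuming the console output has been captured to a log file; the helper name has_rcu_splat is hypothetical, and the demonstration uses a temporary file in place of a real capture.

```shell
#!/bin/bash
# has_rcu_splat LOGFILE
# Succeeds (exit status 0) if the captured console log contains the
# lockdep-RCU banner shown in the dmesg excerpt of this report.
# Hypothetical helper for illustration.
has_rcu_splat() {
	grep -qF 'INFO: suspicious RCU usage.' "$1"
}

# Self-contained demonstration: a temporary file stands in for a real
# console capture.
log=$(mktemp)
printf '%s\n' '[   15.247424] [ INFO: suspicious RCU usage. ]' > "$log"
if has_rcu_splat "$log"; then
	echo "reproduced"
else
	echo "not reproduced"
fi
rm -f "$log"
```

Since the first bad commit still booted cleanly in 11 of 66 runs, a single clean boot does not show that the problem is absent.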

Thanks,
Fengguang

[-- Attachment #2: 3.19.0-rc1-gdd2b39b10 --]
[-- Type: text/plain, Size: 29863 bytes --]

early console in setup code
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.19.0-rc1-gdd2b39b (kbuild@roam) (gcc version 4.9.1 (Debian 4.9.1-19) ) #10 SMP Sat Jan 31 14:26:30 CST 2015
[    0.000000] Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-r0-01311150/linux-devel:devel-roam-smoke-201501311226:dd2b39be8eee9d175c7842c30e405a5cbe50095a:bisect-linux-1/.vmlinuz-dd2b39be8eee9d175c7842c30e405a5cbe50095a-20150131142657-48-ivb41 branch=linux-devel/devel-roam-smoke-201501311226 BOOT_IMAGE=/kernel/x86_64-randconfig-r0-01311150/dd2b39be8eee9d175c7842c30e405a5cbe50095a/vmlinuz-3.19.0-rc1-gdd2b39b drbd.minor_count=8
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000013fdffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000013fe0000-0x0000000013ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] Hypervisor detected: KVM
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x13fe0 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 0080000000 mask FF80000000 uncachable
[    0.000000]   1 disabled
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] Scan for SMP in [mem 0x00000000-0x000003ff]
[    0.000000] Scan for SMP in [mem 0x0009fc00-0x0009ffff]
[    0.000000] Scan for SMP in [mem 0x000f0000-0x000fffff]
[    0.000000] found SMP MP-table at [mem 0x000f0eb0-0x000f0ebf] mapped at [ffff8800000f0eb0]
[    0.000000]   mpc: f0ec0-f0fa4
[    0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x03356000, 0x03356fff] PGTABLE
[    0.000000] BRK [0x03357000, 0x03357fff] PGTABLE
[    0.000000] BRK [0x03358000, 0x03358fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x12600000-0x127fffff]
[    0.000000]  [mem 0x12600000-0x127fffff] page 4k
[    0.000000] BRK [0x03359000, 0x03359fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x10000000-0x125fffff]
[    0.000000]  [mem 0x10000000-0x125fffff] page 4k
[    0.000000] BRK [0x0335a000, 0x0335afff] PGTABLE
[    0.000000] BRK [0x0335b000, 0x0335bfff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x00100000-0x0fffffff]
[    0.000000]  [mem 0x00100000-0x0fffffff] page 4k
[    0.000000] init_memory_mapping: [mem 0x12800000-0x13fdffff]
[    0.000000]  [mem 0x12800000-0x13fdffff] page 4k
[    0.000000] RAMDISK: [mem 0x12925000-0x13fd7fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F0C90 000014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 0x0000000013FE18BD 000034 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 0x0000000013FE0B37 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 0x0000000013FE0040 000AF7 (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACS 0x0000000013FE0000 000040
[    0.000000] ACPI: SSDT 0x0000000013FE0BAB 000C5A (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 0x0000000013FE1805 000080 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] ACPI: HPET 0x0000000013FE1885 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] mapped APIC to ffffffffff57d000 (        fee00000)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000013fdffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x12908000-0x12924fff]
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:12888001, primary cpu clock
[    0.000000]  [ffffea0000000000-ffffea00005fffff] PMD -> [ffff880011800000-ffff880011dfffff] on node 0
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x00001000-0x13fdffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x00100000-0x13fdffff]
[    0.000000] Initmem setup node 0 [mem 0x00001000-0x13fdffff]
[    0.000000] On node 0 totalpages: 81790
[    0.000000]   DMA32 zone: 1280 pages used for memmap
[    0.000000]   DMA32 zone: 21 pages reserved
[    0.000000]   DMA32 zone: 81790 pages, LIFO batch:15
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] mapped APIC to ffffffffff57d000 (        fee00000)
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 00, APIC ID 0, APIC INT 02
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 05, APIC ID 0, APIC INT 05
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 09, APIC ID 0, APIC INT 09
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 0a, APIC ID 0, APIC INT 0a
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Int: type 0, pol 1, trig 3, bus 00, IRQ 0b, APIC ID 0, APIC INT 0b
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 01, APIC ID 0, APIC INT 01
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 03, APIC ID 0, APIC INT 03
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 04
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 06, APIC ID 0, APIC INT 06
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 07, APIC ID 0, APIC INT 07
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 08, APIC ID 0, APIC INT 08
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0c, APIC ID 0, APIC INT 0c
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0d, APIC ID 0, APIC INT 0d
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0e, APIC ID 0, APIC INT 0e
[    0.000000] Int: type 0, pol 0, trig 0, bus 00, IRQ 0f, APIC ID 0, APIC INT 0f
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] mapped IOAPIC to ffffffffff57c000 (fec00000)
[    0.000000] e820: [mem 0x14000000-0xfeffbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff880012400000 s83688 r8192 d31000 u1048576
[    0.000000] pcpu-alloc: s83688 r8192 d31000 u1048576 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 1 
[    0.000000] KVM setup async PF for cpu 0
[    0.000000] kvm-stealtime: cpu 0, msr 1240d200
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 80489
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal  root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-r0-01311150/linux-devel:devel-roam-smoke-201501311226:dd2b39be8eee9d175c7842c30e405a5cbe50095a:bisect-linux-1/.vmlinuz-dd2b39be8eee9d175c7842c30e405a5cbe50095a-20150131142657-48-ivb41 branch=linux-devel/devel-roam-smoke-201501311226 BOOT_IMAGE=/kernel/x86_64-randconfig-r0-01311150/dd2b39be8eee9d175c7842c30e405a5cbe50095a/vmlinuz-3.19.0-rc1-gdd2b39b drbd.minor_count=8
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Memory: 259652K/327160K available (7334K kernel code, 5518K rwdata, 5292K rodata, 1876K init, 14436K bss, 67508K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Running RCU self tests
[    0.000000] Hierarchical RCU implementation.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS:524544 nr_irqs:440 16
[    0.000000] console [ttyS0] enabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
[    0.000000] ... CHAINHASH_SIZE:          32768
[    0.000000]  memory used by lock dependency info: 8159 kB
[    0.000000]  per task-struct memory footprint: 1920 bytes
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Detected 2693.508 MHz processor
[    0.008000] Calibrating delay loop (skipped) preset value.. 5387.01 BogoMIPS (lpj=10774032)
[    0.008000] pid_max: default: 4096 minimum: 301
[    0.008051] ACPI: Core revision 20141107
[    0.015007] ACPI: All ACPI Tables successfully acquired
[    0.016354] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.017723] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.018879] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.019967] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.021012] Initializing cgroup subsys memory
[    0.021630] Initializing cgroup subsys freezer
[    0.022270] Initializing cgroup subsys net_cls
[    0.022904] Initializing cgroup subsys net_prio
[    0.024018] Initializing cgroup subsys debug
[    0.024712] mce: CPU supports 10 MCE banks
[    0.025319] numa_add_cpu cpu 0 node 0: mask now 0
[    0.025957] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.025957] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.028128] debug: unmapping init [mem 0xffffffff8253a000-0xffffffff8253cfff]
[    0.030871] Getting VERSION: 1050014
[    0.031396] Getting VERSION: 1050014
[    0.032011] Getting ID: 0
[    0.032451] Getting ID: ff000000
[    0.032980] Getting LVT0: 8700
[    0.033396] Getting LVT1: 8400
[    0.033828] enabled ExtINT on CPU#0
[    0.034828] ENABLING IO-APIC IRQs
[    0.035164] init IO_APIC IRQs
[    0.035526]  apic 0 pin 0 not connected
[    0.036061] IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:1)
[    0.036842] IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1)
[    0.037598] IOAPIC[0]: Set routing entry (0-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1)
[    0.038344] IOAPIC[0]: Set routing entry (0-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:1)
[    0.039099] IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:1)
[    0.040027] IOAPIC[0]: Set routing entry (0-5 -> 0x35 -> IRQ 5 Mode:1 Active:0 Dest:1)
[    0.040788] IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:1)
[    0.041535] IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:1)
[    0.042288] IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:1)
[    0.043046] IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:1)
[    0.043799] IOAPIC[0]: Set routing entry (0-10 -> 0x3a -> IRQ 10 Mode:1 Active:0 Dest:1)
[    0.044026] IOAPIC[0]: Set routing entry (0-11 -> 0x3b -> IRQ 11 Mode:1 Active:0 Dest:1)
[    0.044798] IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:1)
[    0.045569] IOAPIC[0]: Set routing entry (0-13 -> 0x3d -> IRQ 13 Mode:0 Active:0 Dest:1)
[    0.046342] IOAPIC[0]: Set routing entry (0-14 -> 0x3e -> IRQ 14 Mode:0 Active:0 Dest:1)
[    0.047112] IOAPIC[0]: Set routing entry (0-15 -> 0x3f -> IRQ 15 Mode:0 Active:0 Dest:1)
[    0.048035]  apic 0 pin 16 not connected
[    0.048757]  apic 0 pin 17 not connected
[    0.049455]  apic 0 pin 18 not connected
[    0.050168]  apic 0 pin 19 not connected
[    0.052008]  apic 0 pin 20 not connected
[    0.052725]  apic 0 pin 21 not connected
[    0.053429]  apic 0 pin 22 not connected
[    0.054142]  apic 0 pin 23 not connected
[    0.055022] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.056009] smpboot: CPU0: Intel Common KVM processor (fam: 0f, model: 06, stepping: 01)
[    0.058187] Using local APIC timer interrupts.
[    0.058187] calibrating APIC timer ...
[    0.064000] ... lapic delta = 6249605
[    0.064000] ... PM-Timer delta = 357912
[    0.064000] ... PM-Timer result ok
[    0.064000] ..... delta 6249605
[    0.064000] ..... mult: 268418490
[    0.064000] ..... calibration result: 3999747
[    0.064000] ..... CPU clock speed is 2693.0801 MHz.
[    0.064000] ..... host bus clock speed is 999.3747 MHz.
[    0.064096] Performance Events: unsupported Netburst CPU model 6 no PMU driver, software events only.
[    0.066874] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.068619] x86: Booting SMP configuration:
[    0.069550] .... node  #0, CPUs:      #1
[    0.008000] kvm-clock: cpu 1, msr 0:12888041, secondary cpu clock
[    0.008000] masked ExtINT on CPU#1
[    0.008000] numa_add_cpu cpu 1 node 0: mask now 0-1
[    0.084241] x86: Booted up 1 node, 2 CPUs
[    0.084191] KVM setup async PF for cpu 1
[    0.084191] kvm-stealtime: cpu 1, msr 1250d200
[    0.088769] smpboot: Total of 2 processors activated (10774.03 BogoMIPS)
[    0.092091] devtmpfs: initialized
[    0.103741] prandom: seed boundary self test passed
[    0.104794] prandom: 100 self tests passed
[    0.106142] regulator-dummy: no parameters
[    0.107419] NET: Registered protocol family 16
[    0.112106] cpuidle: using governor ladder
[    0.116050] cpuidle: using governor menu
[    0.117118] ACPI: bus type PCI registered
[    0.117867] PCI: Using configuration type 1 for base access
[    0.120097] Running resizable hashtable tests...
[    0.120853]   Adding 2048 keys
[    0.217339]   Traversal complete: counted=2048, nelems=2048, entries=2048
[    0.219246]   Table expansion iteration 0...
[    0.232139]   Verifying lookups...
[    0.233457]   Table expansion iteration 1...
[    0.244212]   Verifying lookups...
[    0.245566]   Table expansion iteration 2...
[    0.256359]   Verifying lookups...
[    0.257696]   Table expansion iteration 3...
[    0.264550]   Verifying lookups...
[    0.265771]   Table shrinkage iteration 0...
[    0.268063]   Verifying lookups...
[    0.269277]   Table shrinkage iteration 1...
[    0.272053]   Verifying lookups...
[    0.273276]   Table shrinkage iteration 2...
[    0.276048]   Verifying lookups...
[    0.277294]   Table shrinkage iteration 3...
[    0.280047]   Verifying lookups...
[    0.282118]   Traversal complete: counted=2048, nelems=2048, entries=2048
[    0.282981]   Deleting 2048 keys
[    0.324175] ACPI: Added _OSI(Module Device)
[    0.324734] ACPI: Added _OSI(Processor Device)
[    0.325313] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.325912] ACPI: Added _OSI(Processor Aggregator Device)
[    0.329359] IOAPIC[0]: Set routing entry (0-9 -> 0x39 -> IRQ 9 Mode:1 Active:0 Dest:3)
[    0.335760] ACPI: Interpreter enabled
[    0.336011] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20141107/hwxface-580)
[    0.337214] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20141107/hwxface-580)
[    0.338453] ACPI: (supports S0 S3 S5)
[    0.338932] ACPI: Using IOAPIC for interrupt routing
[    0.339587] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.365344] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.368018] acpi PNP0A03:00: _OSC: OS supports [Segments]
[    0.369088] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.371024] PCI host bridge to bus 0000:00
[    0.372012] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.372993] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.373953] pci_bus 0000:00: root bus resource [io  0x0d00-0xadff]
[    0.374727] pci_bus 0000:00: root bus resource [io  0xae0f-0xaeff]
[    0.375507] pci_bus 0000:00: root bus resource [io  0xaf20-0xafdf]
[    0.376008] pci_bus 0000:00: root bus resource [io  0xafe4-0xffff]
[    0.376796] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.377667] pci_bus 0000:00: root bus resource [mem 0x14000000-0xfebfffff]
[    0.378632] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.380575] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.382181] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.386282] pci 0000:00:01.1: reg 0x20: [io  0xc040-0xc04f]
[    0.388613] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.389884] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.391027] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.392009] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.396212] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.397893] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.400026] pci 0000:00:01.3: quirk: [io  0x0700-0x070f] claimed by PIIX4 SMB
[    0.401835] pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
[    0.404664] pci 0000:00:02.0: reg 0x10: [mem 0xfc000000-0xfdffffff pref]
[    0.407223] pci 0000:00:02.0: reg 0x14: [mem 0xfebf0000-0xfebf0fff]
[    0.414670] pci 0000:00:02.0: reg 0x30: [mem 0xfebe0000-0xfebeffff pref]
[    0.416563] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000
[    0.418708] pci 0000:00:03.0: reg 0x10: [mem 0xfebc0000-0xfebdffff]
[    0.421033] pci 0000:00:03.0: reg 0x14: [io  0xc000-0xc03f]
[    0.426913] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref]
[    0.428443] pci 0000:00:04.0: [8086:25ab] type 00 class 0x088000
[    0.429926] pci 0000:00:04.0: reg 0x10: [mem 0xfebf1000-0xfebf100f]
[    0.436309] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.437658] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.438974] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.440281] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.441316] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.443295] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.444517] vgaarb: setting as boot device: PCI:0000:00:02.0
[    0.444517] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.444517] vgaarb: loaded
[    0.444517] vgaarb: bridge control possible 0000:00:02.0
[    0.445397] pps_core: LinuxPPS API ver. 1 registered
[    0.446290] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.448096] PCI: Using ACPI for IRQ routing
[    0.448868] PCI: pci_cache_line_size set to 64 bytes
[    0.450026] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff]
[    0.451070] e820: reserve RAM buffer [mem 0x13fe0000-0x13ffffff]
[    0.460045] Switched to clocksource kvm-clock
[    0.461646] Warning: could not register all branches stats
[    0.462501] Warning: could not register annotated branches stats
[    0.564347] pnp: PnP ACPI init
[    0.565076] IOAPIC[0]: Set routing entry (0-8 -> 0x38 -> IRQ 8 Mode:0 Active:0 Dest:3)
[    0.566600] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.567841] IOAPIC[0]: Set routing entry (0-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:3)
[    0.569254] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.570501] IOAPIC[0]: Set routing entry (0-12 -> 0x3c -> IRQ 12 Mode:0 Active:0 Dest:3)
[    0.571937] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.573072] IOAPIC[0]: Set routing entry (0-6 -> 0x36 -> IRQ 6 Mode:0 Active:0 Dest:3)
[    0.574194] pnp 00:03: [dma 2]
[    0.574774] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.576029] IOAPIC[0]: Set routing entry (0-7 -> 0x37 -> IRQ 7 Mode:0 Active:0 Dest:3)
[    0.577458] pnp 00:04: Plug and Play ACPI device, IDs PNP0400 (active)
[    0.578714] IOAPIC[0]: Set routing entry (0-4 -> 0x34 -> IRQ 4 Mode:0 Active:0 Dest:3)
[    0.580124] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.582349] pnp: PnP ACPI: found 6 devices
[    0.588689] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    0.589671] pci_bus 0000:00: resource 5 [io  0x0d00-0xadff]
[    0.590638] pci_bus 0000:00: resource 6 [io  0xae0f-0xaeff]
[    0.591577] pci_bus 0000:00: resource 7 [io  0xaf20-0xafdf]
[    0.592528] pci_bus 0000:00: resource 8 [io  0xafe4-0xffff]
[    0.593492] pci_bus 0000:00: resource 9 [mem 0x000a0000-0x000bffff]
[    0.594561] pci_bus 0000:00: resource 10 [mem 0x14000000-0xfebfffff]
[    0.595737] NET: Registered protocol family 2
[    0.597382] TCP established hash table entries: 4096 (order: 3, 32768 bytes)
[    0.598795] TCP bind hash table entries: 4096 (order: 6, 262144 bytes)
[    0.600822] TCP: Hash tables configured (established 4096 bind 4096)
[    0.601984] TCP: reno registered
[    0.602597] UDP hash table entries: 256 (order: 3, 40960 bytes)
[    0.603661] UDP-Lite hash table entries: 256 (order: 3, 40960 bytes)
[    0.605144] NET: Registered protocol family 1
[    0.605923] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.606897] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.607844] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.608881] pci 0000:00:02.0: Video device with shadowed ROM
[    0.609857] PCI: CLS 0 bytes, default 64
[    0.610924] Unpacking initramfs...
[    1.687420] debug: unmapping init [mem 0xffff880012925000-0xffff880013fd7fff]
[    1.690572] Machine check injector initialized
[    1.692324] twofish-x86_64-3way: performance on this CPU would be suboptimal: disabling twofish-x86_64-3way.
[    1.693789] sha512_ssse3: Neither AVX nor SSSE3 is available/usable.
[    1.694759] AVX instructions are not detected.
[    1.695434] AVX instructions are not detected.
[    1.696127] AVX2 instructions are not detected.
[    1.696937] rcu-torture:--- Start of test: nreaders=1 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0
[    1.701632] rcu-torture: Creating rcu_torture_writer task
[    1.702549] rcu-torture: rcu_torture_writer task started
[    1.703409] rcu-torture: Creating rcu_torture_fakewriter task
[    1.704396] rcu-torture: rcu_torture_fakewriter task started
[    1.716467] rcu-torture: Creating rcu_torture_fakewriter task
[    1.717663] rcu-torture: Creating rcu_torture_fakewriter task
[    1.718574] rcu-torture: rcu_torture_fakewriter task started
[    1.719510] rcu-torture: Creating rcu_torture_fakewriter task
[    1.720291] rcu-torture: rcu_torture_fakewriter task started
[    1.721165] rcu-torture: Creating rcu_torture_reader task
[    1.722106] rcu-torture: rcu_torture_reader task started
[    1.734524] rcu-torture: rcu_torture_fakewriter task started
[    1.735378] rcu-torture: Creating rcu_torture_stats task
[    1.748763] rcu-torture: rcu_torture_stats task started
[    1.750152] rcu-torture: Creating torture_shuffle task
[    1.751021] rcu-torture: torture_shuffle task started
[    1.751753] rcu-torture: Creating torture_stutter task
[    1.753081] rcu-torture: torture_stutter task started
[    1.753876] rcu-torture: Creating rcu_torture_cbflood task
[    1.767895] futex hash table entries: 16 (order: -1, 2048 bytes)
[    1.768791] Initialise system trusted keyring
[    1.769753] audit: initializing netlink subsys (disabled)
[    1.770842] audit: type=2000 audit(1422685591.829:1): initialized
[    1.772722] rcu-torture: rcu_torture_cbflood task started
[    1.774017] zbud: loaded
[    1.788126] fuse init (API version 7.23)
[    1.806254] NET: Registered protocol family 38
[    1.806912] Key type asymmetric registered
[    1.807469] Asymmetric key parser 'x509' registered
[    1.810243] crc32: CRC_LE_BITS = 8, CRC_BE BITS = 8
[    1.810913] crc32: self tests passed, processed 225944 bytes in 649394 nsec
[    1.812523] crc32c: CRC_LE_BITS = 8
[    1.813030] crc32c: self tests passed, processed 225944 bytes in 324877 nsec
[    1.850266] crc32_combine: 8373 self tests passed
[    1.887457] crc32c_combine: 8373 self tests passed
[    1.888249] rbtree testing -> 25922 cycles
[    2.960534] augmented rbtree testing -> 37812 cycles
[    4.506730] tsc: Refined TSC clocksource calibration: 2693.507 MHz
[    4.506799] ipmi message handler version 39.2
[    4.506826] IPMI System Interface driver.
[    4.506970] ipmi_si: Adding default-specified kcs state machine
[    4.506973] ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
[    4.507526] ipmi_si: Interface detection failed
[    4.551449] ipmi_si: Adding default-specified smic state machine
[    4.552392] ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
[    4.553958] ipmi_si: Interface detection failed
[    4.560988] ipmi_si: Adding default-specified bt state machine
[    4.561779] ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
[    4.563019] ipmi_si: Interface detection failed
[    4.569085] ipmi_si: Unable to find any System Interface(s)
[    4.569825] IPMI SSIF Interface driver
[    4.570390] IPMI Watchdog: driver initialized
[    4.570955] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot.
[    4.571915] IPMI poweroff: Unable to register powercycle sysctl
[    4.572865] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    4.573861] ACPI: Power Button [PWRF]
[    4.602960] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    4.652261] serial 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    4.655912] lp: driver loaded but no devices found
[    4.656553] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[    4.657768] [drm] Initialized drm 1.1.0 20060810
[    4.658716] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    4.660763] serio: i8042 KBD port at 0x60,0x64 irq 1
[    4.661470] serio: i8042 AUX port at 0x60,0x64 irq 12
[    4.662754] mousedev: PS/2 mouse device common for all mice
[    4.664150] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    4.669630] advantechwdt: WDT driver for Advantech single board computer initialising
[    4.679729] advantechwdt: initialized. timeout=60 sec (nowayout=0)
[    4.680812] sbc60xxwdt: I/O address 0x0443 already in use
[    4.681649] wbsd: Winbond W83L51xD SD/MMC card interface driver
[    4.682413] wbsd: Copyright(c) Pierre Ossman
[    4.685640] panel: driver version 0.9.5 not yet registered
[    4.687029] GACT probability NOT on
[    4.687508] Mirror/redirect action on
[    4.688001] netem: version 1.3
[    4.688499] Netfilter messages via NETLINK v0.30.
[    4.689185] nf_conntrack version 0.5.0 (2028 buckets, 8112 max)
[    4.690422] IPVS: Registered protocols (UDP, SCTP)
[    4.691091] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[    4.692001] IPVS: Each connection entry needs 400 bytes at least
[    4.692815] IPVS: Creating netns size=2032 id=0
[    4.769936] IPVS: ipvs loaded.
[    4.770607] IPVS: [lc] scheduler registered.
[    4.779748] IPVS: [lblc] scheduler registered.
[    4.780659] gre: GRE over IPv4 demultiplexor driver
[    4.781949] TCP: cubic registered
[    4.789127] Initializing XFRM netlink socket
[    4.790110] NET: Registered protocol family 10
[    4.792333] ip6_tables: (C) 2000-2006 Netfilter Core Team
[    4.793544] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this


* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-01  2:59 ` Fengguang Wu
@ 2015-02-03 10:01   ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-03 10:01 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Paul E. McKenney, LKP, linux-kernel

On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
while suspending to RAM:

[   30.986262] PM: Syncing filesystems ... done.
[   30.994661] PM: Preparing system for mem sleep
[   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   31.016325] PM: Entering mem sleep
[   31.016338] Suspending console(s) (use no_console_suspend to debug)
[   31.051009] random: nonblocking pool is initialized
[   31.085811] wake enabled for irq 102
[   31.086964] wake enabled for irq 123
[   31.086972] wake enabled for irq 124
[   31.090409] PM: suspend of devices complete after 59.684 msecs
[   31.090524] CAM_ISP_CORE_1.2V: No configuration
[   31.090534] VMEM_VDDF_3.0V: No configuration
[   31.090543] VCC_SUB_2.0V: No configuration
[   31.090552] VCC_SUB_1.35V: No configuration
[   31.090562] VMEM_1.2V_AP: No configuration
[   31.090587] MOTOR_VCC_3.0V: No configuration
[   31.090596] LCD_VCC_3.3V: No configuration
[   31.090605] TSP_VDD_1.8V: No configuration
[   31.090614] TSP_AVDD_3.3V: No configuration
[   31.090623] VMEM_VDD_2.8V: No configuration
[   31.090631] VTF_2.8V: No configuration
[   31.090640] VDDQ_PRE_1.8V: No configuration
[   31.090649] VT_CAM_1.8V: No configuration
[   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
[   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
[   31.090677] VHSIC_1.8V: No configuration
[   31.090685] VHSIC_1.0V: No configuration
[   31.090694] VABB2_1.95V: No configuration
[   31.090703] NFC_AVDD_1.8V: No configuration
[   31.090712] VUOTG_3.0V: No configuration
[   31.090721] VABB1_1.95V: No configuration
[   31.090730] VMIPI_1.8V: No configuration
[   31.090739] CAM_ISP_MIPI_1.2V: No configuration
[   31.090747] VMIPI_1.0V: No configuration
[   31.090756] VPLL_1.0V_AP: No configuration
[   31.090765] VMPLL_1.0V_AP: No configuration
[   31.090773] VCC_1.8V_IO: No configuration
[   31.090782] VCC_2.8V_AP: No configuration
[   31.090791] VCC_1.8V_AP: No configuration
[   31.090800] VM1M2_1.2V_AP: No configuration
[   31.090809] VALIVE_1.0V_AP: No configuration
[   31.100297] PM: late suspend of devices complete after 9.445 msecs
[   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
[   31.109052] Disabling non-boot CPUs ...
[   31.113921]
[   31.113925] ===============================
[   31.113928] [ INFO: suspicious RCU usage. ]
[   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
[   31.113938] -------------------------------
[   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
[   31.113946]
[   31.113946] other info that might help us debug this:
[   31.113946]
[   31.113952]
[   31.113952] RCU used illegally from offline CPU!
[   31.113952] rcu_scheduler_active = 1, debug_locks = 0
[   31.113957] 3 locks held by swapper/1/0:
[   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
[   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
[   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
[   31.114038]
[   31.114038] stack backtrace:
[   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
[   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
[   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
[   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
[   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
[   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
[   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
[   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
[   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
[   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
[   31.114189] [<c005a508>] (cpu_startup_entry) from [<40008784>] (0x40008784)
[   31.114226] CPU1: shutdown
[   31.132479] CPU2: shutdown
[   31.146815] CPU3: shutdown
[   31.160767] Enabling non-boot CPUs ...
[   31.175645] CPU1 is up
[   31.191120] CPU2 is up
[   31.206650] CPU3 is up
[   31.206922] s3c-i2c 13860000.i2c: slave address 0x10
[   31.206935] s3c-i2c 13860000.i2c: bus frequency set to 390 KHz
[   31.206952] s3c-i2c 13890000.i2c: slave address 0x10
[   31.206962] s3c-i2c 13890000.i2c: bus frequency set to 390 KHz
[   31.206978] s3c-i2c 138d0000.i2c: slave address 0x10
[   31.206987] s3c-i2c 138d0000.i2c: bus frequency set to 97 KHz
[   31.209201] PM: noirq resume of devices complete after 2.539 msecs
[   31.212202] PM: early resume of devices complete after 2.812 msecs
[   31.229844] Failed to resume regulators from suspend (-22)
[   31.230915] wake disabled for irq 123
[   31.230923] wake disabled for irq 124
[   31.232003] wake disabled for irq 102
[   31.259950] max77686_rtc_tm_to_data: MAX77686 RTC cannot handle the year 1970.Assume it's 2000.
[   31.298929] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
[   31.526729] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
[   31.526976] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 52000000Hz, actual 50000000HZ div = 1)
[   31.527207] PM: resume of devices complete after 297.352 msecs
[   31.985665] PM: Finishing wakeup.
[   31.988959] Restarting tasks ... done.
root@target:~#
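
To compare the splat above with the x86 one in the original report, it
helps to pull out just the source location that lockdep-RCU complains
about. A minimal sketch, assuming the dmesg (or this mail) is saved to a
file; the function name rcu_splat_locations is hypothetical and not part
of any kernel tooling:

```shell
#!/bin/bash
# Print the "file:line" reported by each lockdep-RCU splat in a dmesg capture.
# Hypothetical helper for illustration; reads the named files, or stdin if none.
rcu_splat_locations() {
	# Strip optional "> " e-mail quote markers and the "[ timestamp]" prefix,
	# keep only the "suspicious rcu_dereference_check() usage!" lines,
	# then print their first field (the source location).
	sed -E 's/^(> )*\[ *[0-9]+\.[0-9]+\] ?//' "$@" |
		grep -E 'suspicious rcu_dereference_check\(\) usage!$' |
		awk '{ print $1 }'
}
```

Run as "rcu_splat_locations dmesg.txt"; on the log above it should report
kernel/sched/fair.c:4740, while the quoted x86 report points at
include/trace/events/rcu.h:35.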

Best regards,
Krzysztof

> 
> commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
> Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> AuthorDate: Wed Jan 28 14:42:09 2015 -0800
> Commit:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CommitDate: Fri Jan 30 12:59:22 2015 -0800
> 
>     rcu: Handle outgoing CPUs on exit from idle loop
>     
>     This commit informs RCU of an outgoing CPU just before that CPU invokes
>     arch_cpu_idle_dead() during its last pass through the idle loop (via a
>     new CPU_DYING_IDLE notifier value).  This change means that RCU need not
>     deal with outgoing CPUs passing through the scheduler after informing
>     RCU that they are no longer online.  Note that removing the CPU from
>     the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
>     and orphaning callbacks is still done at CPU_DEAD time, the reason being
>     that at CPU_DEAD time we have another CPU that can adopt them.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> +-------------------------------------+------------+------------+------------+
> |                                     | 8c50d7da91 | dd2b39be8e | d586642522 |
> +-------------------------------------+------------+------------+------------+
> | boot_successes                      | 198        | 11         | 51         |
> | boot_failures                       | 0          | 55         | 15         |
> | INFO:suspicious_RCU_usage           | 0          | 55         | 15         |
> | RCU_used_illegally_from_offline_CPU | 0          | 55         | 15         |
> | backtrace:cpu_startup_entry         | 0          | 55         | 15         |
> | BUG:kernel_test_hang                | 0          | 0          | 4          |
> +-------------------------------------+------------+------------+------------+
> 
> [   15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
> [   15.246713] 
> [   15.246917] ===============================
> [   15.247424] [ INFO: suspicious RCU usage. ]
> [   15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
> [   15.248531] -------------------------------
> [   15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
> [   15.248586] 
> [   15.248586] other info that might help us debug this:
> [   15.248586] 
> [   15.248586] 
> [   15.248586] RCU used illegally from offline CPU!
> [   15.248586] rcu_scheduler_active = 1, debug_locks = 0
> [   15.248586] no locks held by swapper/0/0.
> [   15.248586] 
> [   15.248586] stack backtrace:
> [   15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
> [   15.248586]  0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
> [   15.248586]  ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
> [   15.248586]  ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
> [   15.248586] Call Trace:
> [   15.248586]  [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
> [   15.248586]  [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
> [   15.248586]  [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
> [   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
> [   15.248586]  [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
> [   15.248586]  [<ffffffff8170eb5d>] rest_init+0x159/0x15f
> [   15.248586]  [<ffffffff8237b2da>] start_kernel+0x565/0x572
> [   15.248586]  [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
> [   15.248586]  [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
> [   15.248586]  [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
> [   15.265151] CPU 0 is now offline
> [   15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
> [   15.266726] Write protecting the kernel read-only data: 14336k
> 
> git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
> git bisect  bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302  # 13:29     23-      6  Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
> git bisect  bad 2c86978183cc365003e2d6949052a30865ef8b89  # 13:33     34-     32  Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
> git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a  # 13:37     66+      0  Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
> git bisect  bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f  # 13:43     32-     22  Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
> git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb  # 13:52     66+      0  Merge 'iio/testing' into devel-roam-smoke-201501311226
> git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88  # 14:01     66+      0  Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
> git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e  # 14:05     66+      0  documentation: Update rcutree.kthread_prio for grace-period kthread use
> git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d  # 14:14     66+      0  documentation: Update based on on-demand vmstat workers
> git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f  # 14:18     66+      0  rcu: Eliminate ->onoff_mutex from rcu_node structure
> git bisect  bad dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 14:29     11-     55  rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:34     66+      0  cpu: Make CPU-offline idle-loop transition point more precise
> # first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:37    198+      0  cpu: Make CPU-offline idle-loop transition point more precise
> # extra tests with DEBUG_INFO
> git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 15:35    198+    198  rcu: Handle outgoing CPUs on exit from idle loop
> # extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
> git bisect  bad d58664252218cfefb19709d597ff0c5d93688203  # 15:35      0-     15  0day head guard for 'devel-roam-smoke-201501311226'
> # extra tests on tree/branch rcu/rcu/next
> git bisect  bad c418b8035fac0cc7d242e5de126cec1006a34bed  # 15:52     47-     21  cpu: Stop newly offlined CPU from using RCU readers
> # extra tests with first bad commit reverted
> # extra tests on tree/branch linus/master
> git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18  # 16:04    198+      0  Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
> # extra tests on tree/branch next/master
> git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2  # 16:12    198+      0  Add linux-next specific files for 20150130
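
The bisect log above carries e-mail quote markers and trailing timing/count
annotations, and the "extra tests" after the first-bad-commit line are not
part of the bisection proper. A minimal sketch for reducing it to plain
"git bisect <cmd> <rev>" lines; the function name clean_bisect_log is
hypothetical, and the result may still need adjusting before use:

```shell
#!/bin/bash
# Reduce a quoted 0day bisect log to plain "git bisect ..." lines.
# Hypothetical helper for illustration; reads the named files, or stdin if none.
clean_bisect_log() {
	awk '
		{ sub(/^(> ?)+/, "") }           # drop e-mail quote markers
		/^# first bad commit/ { exit }   # ignore the extra tests that follow
		/^git bisect / {
			sub(/[ \t]+#.*$/, "")    # drop the trailing annotation
			$1 = $1                  # squeeze repeated spaces
			print
		}
	' "$@"
}
```

The output can then be saved to a file and tried with
"git bisect replay <file>" in a tree that contains these commits.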
> 
> 
> This script may reproduce the error.
> 
> ----------------------------------------------------------------------------
> #!/bin/bash
> 
> kernel=$1
> initrd=quantal-core-x86_64.cgz
> 
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
> 
> kvm=(
> 	qemu-system-x86_64
> 	-cpu kvm64
> 	-enable-kvm
> 	-kernel $kernel
> 	-initrd $initrd
> 	-m 320
> 	-smp 2
> 	-net nic,vlan=1,model=e1000
> 	-net user,vlan=1
> 	-boot order=nc
> 	-no-reboot
> 	-watchdog i6300esb
> 	-rtc base=localtime
> 	-serial stdio
> 	-display none
> 	-monitor null 
> )
> 
> append=(
> 	hung_task_panic=1
> 	earlyprintk=ttyS0,115200
> 	debug
> 	apic=debug
> 	sysrq_always_enabled
> 	rcupdate.rcu_cpu_stall_timeout=100
> 	panic=-1
> 	softlockup_panic=1
> 	nmi_watchdog=panic
> 	oops=panic
> 	load_ramdisk=2
> 	prompt_ramdisk=0
> 	console=ttyS0,115200
> 	console=tty0
> 	vga=normal
> 	root=/dev/ram0
> 	rw
> 	drbd.minor_count=8
> )
> 
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
> 
> Thanks,
> Fengguang
> _______________________________________________
> LKP mailing list
> LKP@linux.intel.com



* Re: [rcu] [ INFO: suspicious RCU usage. ]
@ 2015-02-03 10:01   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-03 10:01 UTC (permalink / raw)
  To: lkp

On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
while suspending to RAM:

[   30.986262] PM: Syncing filesystems ... done.
[   30.994661] PM: Preparing system for mem sleep
[   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   31.016325] PM: Entering mem sleep
[   31.016338] Suspending console(s) (use no_console_suspend to debug)
[   31.051009] random: nonblocking pool is initialized
[   31.085811] wake enabled for irq 102
[   31.086964] wake enabled for irq 123
[   31.086972] wake enabled for irq 124
[   31.090409] PM: suspend of devices complete after 59.684 msecs
[   31.090524] CAM_ISP_CORE_1.2V: No configuration
[   31.090534] VMEM_VDDF_3.0V: No configuration
[   31.090543] VCC_SUB_2.0V: No configuration
[   31.090552] VCC_SUB_1.35V: No configuration
[   31.090562] VMEM_1.2V_AP: No configuration
[   31.090587] MOTOR_VCC_3.0V: No configuration
[   31.090596] LCD_VCC_3.3V: No configuration
[   31.090605] TSP_VDD_1.8V: No configuration
[   31.090614] TSP_AVDD_3.3V: No configuration
[   31.090623] VMEM_VDD_2.8V: No configuration
[   31.090631] VTF_2.8V: No configuration
[   31.090640] VDDQ_PRE_1.8V: No configuration
[   31.090649] VT_CAM_1.8V: No configuration
[   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
[   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
[   31.090677] VHSIC_1.8V: No configuration
[   31.090685] VHSIC_1.0V: No configuration
[   31.090694] VABB2_1.95V: No configuration
[   31.090703] NFC_AVDD_1.8V: No configuration
[   31.090712] VUOTG_3.0V: No configuration
[   31.090721] VABB1_1.95V: No configuration
[   31.090730] VMIPI_1.8V: No configuration
[   31.090739] CAM_ISP_MIPI_1.2V: No configuration
[   31.090747] VMIPI_1.0V: No configuration
[   31.090756] VPLL_1.0V_AP: No configuration
[   31.090765] VMPLL_1.0V_AP: No configuration
[   31.090773] VCC_1.8V_IO: No configuration
[   31.090782] VCC_2.8V_AP: No configuration
[   31.090791] VCC_1.8V_AP: No configuration
[   31.090800] VM1M2_1.2V_AP: No configuration
[   31.090809] VALIVE_1.0V_AP: No configuration
[   31.100297] PM: late suspend of devices complete after 9.445 msecs
[   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
[   31.109052] Disabling non-boot CPUs ...
[   31.113921]
[   31.113925] ===============================
[   31.113928] [ INFO: suspicious RCU usage. ]
[   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
[   31.113938] -------------------------------
[   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
[   31.113946]
[   31.113946] other info that might help us debug this:
[   31.113946]
[   31.113952]
[   31.113952] RCU used illegally from offline CPU!
[   31.113952] rcu_scheduler_active = 1, debug_locks = 0
[   31.113957] 3 locks held by swapper/1/0:
[   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
[   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
[   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
[   31.114038]
[   31.114038] stack backtrace:
[   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
[   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
[   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
[   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
[   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
[   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
[   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
[   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
[   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
[   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
[   31.114189] [<c005a508>] (cpu_startup_entry) from [<40008784>] (0x40008784)
[   31.114226] CPU1: shutdown
[   31.132479] CPU2: shutdown
[   31.146815] CPU3: shutdown
[   31.160767] Enabling non-boot CPUs ...
[   31.175645] CPU1 is up
[   31.191120] CPU2 is up
[   31.206650] CPU3 is up
[   31.206922] s3c-i2c 13860000.i2c: slave address 0x10
[   31.206935] s3c-i2c 13860000.i2c: bus frequency set to 390 KHz
[   31.206952] s3c-i2c 13890000.i2c: slave address 0x10
[   31.206962] s3c-i2c 13890000.i2c: bus frequency set to 390 KHz
[   31.206978] s3c-i2c 138d0000.i2c: slave address 0x10
[   31.206987] s3c-i2c 138d0000.i2c: bus frequency set to 97 KHz
[   31.209201] PM: noirq resume of devices complete after 2.539 msecs
[   31.212202] PM: early resume of devices complete after 2.812 msecs
[   31.229844] Failed to resume regulators from suspend (-22)
[   31.230915] wake disabled for irq 123
[   31.230923] wake disabled for irq 124
[   31.232003] wake disabled for irq 102
[   31.259950] max77686_rtc_tm_to_data: MAX77686 RTC cannot handle the year 1970.Assume it's 2000.
[   31.298929] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
[   31.526729] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
[   31.526976] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 52000000Hz, actual 50000000HZ div = 1)
[   31.527207] PM: resume of devices complete after 297.352 msecs
[   31.985665] PM: Finishing wakeup.
[   31.988959] Restarting tasks ... done.
root@target:~#

Best regards,
Krzysztof

> 
> commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
> Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> AuthorDate: Wed Jan 28 14:42:09 2015 -0800
> Commit:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CommitDate: Fri Jan 30 12:59:22 2015 -0800
> 
>     rcu: Handle outgoing CPUs on exit from idle loop
>     
>     This commit informs RCU of an outgoing CPU just before that CPU invokes
>     arch_cpu_idle_dead() during its last pass through the idle loop (via a
>     new CPU_DYING_IDLE notifier value).  This change means that RCU need not
>     deal with outgoing CPUs passing through the scheduler after informing
>     RCU that they are no longer online.  Note that removing the CPU from
>     the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
>     and orphaning callbacks is still done at CPU_DEAD time, the reason being
>     that at CPU_DEAD time we have another CPU that can adopt them.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> +-------------------------------------+------------+------------+------------+
> |                                     | 8c50d7da91 | dd2b39be8e | d586642522 |
> +-------------------------------------+------------+------------+------------+
> | boot_successes                      | 198        | 11         | 51         |
> | boot_failures                       | 0          | 55         | 15         |
> | INFO:suspicious_RCU_usage           | 0          | 55         | 15         |
> | RCU_used_illegally_from_offline_CPU | 0          | 55         | 15         |
> | backtrace:cpu_startup_entry         | 0          | 55         | 15         |
> | BUG:kernel_test_hang                | 0          | 0          | 4          |
> +-------------------------------------+------------+------------+------------+
> 
> [   15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
> [   15.246713] 
> [   15.246917] ===============================
> [   15.247424] [ INFO: suspicious RCU usage. ]
> [   15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
> [   15.248531] -------------------------------
> [   15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
> [   15.248586] 
> [   15.248586] other info that might help us debug this:
> [   15.248586] 
> [   15.248586] 
> [   15.248586] RCU used illegally from offline CPU!
> [   15.248586] rcu_scheduler_active = 1, debug_locks = 0
> [   15.248586] no locks held by swapper/0/0.
> [   15.248586] 
> [   15.248586] stack backtrace:
> [   15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
> [   15.248586]  0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
> [   15.248586]  ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
> [   15.248586]  ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
> [   15.248586] Call Trace:
> [   15.248586]  [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
> [   15.248586]  [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
> [   15.248586]  [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
> [   15.248586]  [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
> [   15.248586]  [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
> [   15.248586]  [<ffffffff8170eb5d>] rest_init+0x159/0x15f
> [   15.248586]  [<ffffffff8237b2da>] start_kernel+0x565/0x572
> [   15.248586]  [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
> [   15.248586]  [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
> [   15.248586]  [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
> [   15.265151] CPU 0 is now offline
> [   15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
> [   15.266726] Write protecting the kernel read-only data: 14336k
> 
> git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
> git bisect  bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302  # 13:29     23-      6  Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
> git bisect  bad 2c86978183cc365003e2d6949052a30865ef8b89  # 13:33     34-     32  Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
> git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a  # 13:37     66+      0  Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
> git bisect  bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f  # 13:43     32-     22  Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
> git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb  # 13:52     66+      0  Merge 'iio/testing' into devel-roam-smoke-201501311226
> git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88  # 14:01     66+      0  Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
> git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e  # 14:05     66+      0  documentation: Update rcutree.kthread_prio for grace-period kthread use
> git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d  # 14:14     66+      0  documentation: Update based on on-demand vmstat workers
> git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f  # 14:18     66+      0  rcu: Eliminate ->onoff_mutex from rcu_node structure
> git bisect  bad dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 14:29     11-     55  rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:34     66+      0  cpu: Make CPU-offline idle-loop transition point more precise
> # first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d  # 14:37    198+      0  cpu: Make CPU-offline idle-loop transition point more precise
> # extra tests with DEBUG_INFO
> git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a  # 15:35    198+    198  rcu: Handle outgoing CPUs on exit from idle loop
> # extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
> git bisect  bad d58664252218cfefb19709d597ff0c5d93688203  # 15:35      0-     15  0day head guard for 'devel-roam-smoke-201501311226'
> # extra tests on tree/branch rcu/rcu/next
> git bisect  bad c418b8035fac0cc7d242e5de126cec1006a34bed  # 15:52     47-     21  cpu: Stop newly offlined CPU from using RCU readers
> # extra tests with first bad commit reverted
> # extra tests on tree/branch linus/master
> git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18  # 16:04    198+      0  Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
> # extra tests on tree/branch next/master
> git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2  # 16:12    198+      0  Add linux-next specific files for 20150130
> 
> 
> This script may reproduce the error.
> 
> ----------------------------------------------------------------------------
> #!/bin/bash
> 
> kernel=$1
> initrd=quantal-core-x86_64.cgz
> 
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
> 
> kvm=(
> 	qemu-system-x86_64
> 	-cpu kvm64
> 	-enable-kvm
> 	-kernel $kernel
> 	-initrd $initrd
> 	-m 320
> 	-smp 2
> 	-net nic,vlan=1,model=e1000
> 	-net user,vlan=1
> 	-boot order=nc
> 	-no-reboot
> 	-watchdog i6300esb
> 	-rtc base=localtime
> 	-serial stdio
> 	-display none
> 	-monitor null 
> )
> 
> append=(
> 	hung_task_panic=1
> 	earlyprintk=ttyS0,115200
> 	debug
> 	apic=debug
> 	sysrq_always_enabled
> 	rcupdate.rcu_cpu_stall_timeout=100
> 	panic=-1
> 	softlockup_panic=1
> 	nmi_watchdog=panic
> 	oops=panic
> 	load_ramdisk=2
> 	prompt_ramdisk=0
> 	console=ttyS0,115200
> 	console=tty0
> 	vga=normal
> 	root=/dev/ram0
> 	rw
> 	drbd.minor_count=8
> )
> 
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
> 
> Thanks,
> Fengguang
> _______________________________________________
> LKP mailing list
> LKP@linux.intel.com



* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-03 10:01   ` Krzysztof Kozlowski
@ 2015-02-03 16:27     ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-03 16:27 UTC (permalink / raw)
  To: Krzysztof Kozlowski; +Cc: Fengguang Wu, LKP, linux-kernel

On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > Greetings,
> > 
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> 
> On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> while suspending to RAM:

Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
got more picky about that.  This could cause failures in any environment
where CPUs could get delayed by more than one jiffy, which includes pretty
much all virtualized environments.

> [   30.986262] PM: Syncing filesystems ... done.
> [   30.994661] PM: Preparing system for mem sleep
> [   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   31.016325] PM: Entering mem sleep
> [   31.016338] Suspending console(s) (use no_console_suspend to debug)
> [   31.051009] random: nonblocking pool is initialized
> [   31.085811] wake enabled for irq 102
> [   31.086964] wake enabled for irq 123
> [   31.086972] wake enabled for irq 124
> [   31.090409] PM: suspend of devices complete after 59.684 msecs
> [   31.090524] CAM_ISP_CORE_1.2V: No configuration
> [   31.090534] VMEM_VDDF_3.0V: No configuration
> [   31.090543] VCC_SUB_2.0V: No configuration
> [   31.090552] VCC_SUB_1.35V: No configuration
> [   31.090562] VMEM_1.2V_AP: No configuration
> [   31.090587] MOTOR_VCC_3.0V: No configuration
> [   31.090596] LCD_VCC_3.3V: No configuration
> [   31.090605] TSP_VDD_1.8V: No configuration
> [   31.090614] TSP_AVDD_3.3V: No configuration
> [   31.090623] VMEM_VDD_2.8V: No configuration
> [   31.090631] VTF_2.8V: No configuration
> [   31.090640] VDDQ_PRE_1.8V: No configuration
> [   31.090649] VT_CAM_1.8V: No configuration
> [   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
> [   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
> [   31.090677] VHSIC_1.8V: No configuration
> [   31.090685] VHSIC_1.0V: No configuration
> [   31.090694] VABB2_1.95V: No configuration
> [   31.090703] NFC_AVDD_1.8V: No configuration
> [   31.090712] VUOTG_3.0V: No configuration
> [   31.090721] VABB1_1.95V: No configuration
> [   31.090730] VMIPI_1.8V: No configuration
> [   31.090739] CAM_ISP_MIPI_1.2V: No configuration
> [   31.090747] VMIPI_1.0V: No configuration
> [   31.090756] VPLL_1.0V_AP: No configuration
> [   31.090765] VMPLL_1.0V_AP: No configuration
> [   31.090773] VCC_1.8V_IO: No configuration
> [   31.090782] VCC_2.8V_AP: No configuration
> [   31.090791] VCC_1.8V_AP: No configuration
> [   31.090800] VM1M2_1.2V_AP: No configuration
> [   31.090809] VALIVE_1.0V_AP: No configuration
> [   31.100297] PM: late suspend of devices complete after 9.445 msecs
> [   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
> [   31.109052] Disabling non-boot CPUs ...
> [   31.113921]
> [   31.113925] ===============================
> [   31.113928] [ INFO: suspicious RCU usage. ]
> [   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> [   31.113938] -------------------------------
> [   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> [   31.113946]
> [   31.113946] other info that might help us debug this:
> [   31.113946]
> [   31.113952]
> [   31.113952] RCU used illegally from offline CPU!
> [   31.113952] rcu_scheduler_active = 1, debug_locks = 0
> [   31.113957] 3 locks held by swapper/1/0:
> [   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> [   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> [   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> [   31.114038]
> [   31.114038] stack backtrace:
> [   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> [   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> [   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> [   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> [   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> [   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> [   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> [   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> [   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> [   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)

And so you no longer get to invoke complete() from the CPU going offline
out of the idle loop.

How would you like to handle this?  One approach would be to make __cpu_die()
poll with an appropriate duty cycle.  Or is there some ARM-specific approach
that could work here?
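
As a rough illustration of the polling idea, here is a minimal user-space
sketch, not kernel code.  The fake_cpu structure, fake_cpu_is_dead(), and
poll_cpu_dead() are hypothetical names made up for this example, and the
sleep interval merely stands in for whatever duty cycle an architecture
would pick.  The point is only that the surviving CPU does all the waiting
by repeatedly reading a state the outgoing CPU sets, so the outgoing CPU
never has to wake anyone up:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of an outgoing CPU: reads as "dead" after some polls. */
struct fake_cpu {
	int polls_until_dead;	/* remaining polls before the state reads true */
	int polls_seen;		/* how many times the waiter has looked */
};

/* Stand-in for reading a state word that the dying CPU would write. */
static bool fake_cpu_is_dead(struct fake_cpu *c)
{
	c->polls_seen++;
	if (c->polls_until_dead > 0) {
		c->polls_until_dead--;
		return false;
	}
	return true;
}

/*
 * Poll until the CPU reads as dead or max_polls checks have been made.
 * Sleeps sleep_ns between checks so the waiter does not spin flat out.
 */
static bool poll_cpu_dead(struct fake_cpu *c, int max_polls, long sleep_ns)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };

	assert(sleep_ns >= 0 && sleep_ns < 1000000000L);
	for (int i = 0; i < max_polls; i++) {
		if (fake_cpu_is_dead(c))
			return true;
		nanosleep(&ts, NULL);
	}
	return false;	/* timed out without seeing the CPU die */
}
```

A false return means the wait timed out, which a caller would presumably
report as an error.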

Another thing I could do would be to have an arch-specific Kconfig
variable that made ARM responsible for informing RCU that the CPU
was departing, which would allow a call like the following to be
placed immediately after the complete():

rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());

Note:  This absolutely requires that the rcu_cpu_notify() -always-
be allowed to execute!!!  This will not work if there is -any- possibility
of __cpu_die() powering off the outgoing CPU before the call to
rcu_cpu_notify() returns.
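
To make the ordering hazard concrete, here is a toy single-threaded model,
again user-space C rather than kernel code.  All names in it (out_state,
outgoing_cpu, outgoing_step(), try_power_off(), and so on) are invented for
illustration.  It replays one unlucky interleaving in which the surviving
CPU cuts power as soon as it is allowed to:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical progress markers for the outgoing CPU; not kernel names. */
enum out_state {
	OUT_RUNNING,		/* still in the idle loop */
	OUT_SIGNALLED_DEAD,	/* has done the equivalent of complete() */
	OUT_RCU_INFORMED,	/* has finished telling RCU it is leaving */
};

struct outgoing_cpu {
	enum out_state state;
	bool powered;
};

/* Outgoing side: advance one step, but only while power is still on. */
static void outgoing_step(struct outgoing_cpu *c)
{
	if (c->powered && c->state != OUT_RCU_INFORMED)
		c->state = (enum out_state)(c->state + 1);
}

/*
 * Surviving side: cut power once the outgoing CPU has reached `required`.
 * Returns true if power was cut on this call.
 */
static bool try_power_off(struct outgoing_cpu *c, enum out_state required)
{
	if (c->state < required)
		return false;
	c->powered = false;
	return true;
}

/* Replays the handshake; true if RCU was informed before power-off. */
static bool rcu_informed_at_power_off(enum out_state required)
{
	struct outgoing_cpu c = { .state = OUT_RUNNING, .powered = true };

	while (c.powered) {
		outgoing_step(&c);
		try_power_off(&c, required);
	}
	return c.state == OUT_RCU_INFORMED;
}
```

In this model, gating power-off on OUT_SIGNALLED_DEAD loses the RCU
notification, while gating it on OUT_RCU_INFORMED does not.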

Thoughts?

							Thanx, Paul




* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-03 16:27     ` Paul E. McKenney
  (?)
@ 2015-02-04 11:39       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 11:39 UTC (permalink / raw)
  To: paulmck
  Cc: Fengguang Wu, LKP, linux-kernel, Russell King,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

+Cc some ARM people


On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > Greetings,
> > > 
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> > 
> > On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> > while suspending to RAM:
> 
> Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> got more picky about that.  This could cause failures in any environment
> where CPUs could get delayed by more than one jiffy, which includes pretty
> much all virtualized environments.
> 
> > [   30.986262] PM: Syncing filesystems ... done.
> > [   30.994661] PM: Preparing system for mem sleep
> > [   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > [   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > [   31.016325] PM: Entering mem sleep
> > [   31.016338] Suspending console(s) (use no_console_suspend to debug)
> > [   31.051009] random: nonblocking pool is initialized
> > [   31.085811] wake enabled for irq 102
> > [   31.086964] wake enabled for irq 123
> > [   31.086972] wake enabled for irq 124
> > [   31.090409] PM: suspend of devices complete after 59.684 msecs
> > [   31.090524] CAM_ISP_CORE_1.2V: No configuration
> > [   31.090534] VMEM_VDDF_3.0V: No configuration
> > [   31.090543] VCC_SUB_2.0V: No configuration
> > [   31.090552] VCC_SUB_1.35V: No configuration
> > [   31.090562] VMEM_1.2V_AP: No configuration
> > [   31.090587] MOTOR_VCC_3.0V: No configuration
> > [   31.090596] LCD_VCC_3.3V: No configuration
> > [   31.090605] TSP_VDD_1.8V: No configuration
> > [   31.090614] TSP_AVDD_3.3V: No configuration
> > [   31.090623] VMEM_VDD_2.8V: No configuration
> > [   31.090631] VTF_2.8V: No configuration
> > [   31.090640] VDDQ_PRE_1.8V: No configuration
> > [   31.090649] VT_CAM_1.8V: No configuration
> > [   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
> > [   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
> > [   31.090677] VHSIC_1.8V: No configuration
> > [   31.090685] VHSIC_1.0V: No configuration
> > [   31.090694] VABB2_1.95V: No configuration
> > [   31.090703] NFC_AVDD_1.8V: No configuration
> > [   31.090712] VUOTG_3.0V: No configuration
> > [   31.090721] VABB1_1.95V: No configuration
> > [   31.090730] VMIPI_1.8V: No configuration
> > [   31.090739] CAM_ISP_MIPI_1.2V: No configuration
> > [   31.090747] VMIPI_1.0V: No configuration
> > [   31.090756] VPLL_1.0V_AP: No configuration
> > [   31.090765] VMPLL_1.0V_AP: No configuration
> > [   31.090773] VCC_1.8V_IO: No configuration
> > [   31.090782] VCC_2.8V_AP: No configuration
> > [   31.090791] VCC_1.8V_AP: No configuration
> > [   31.090800] VM1M2_1.2V_AP: No configuration
> > [   31.090809] VALIVE_1.0V_AP: No configuration
> > [   31.100297] PM: late suspend of devices complete after 9.445 msecs
> > [   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
> > [   31.109052] Disabling non-boot CPUs ...
> > [   31.113921]
> > [   31.113925] ===============================
> > [   31.113928] [ INFO: suspicious RCU usage. ]
> > [   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> > [   31.113938] -------------------------------
> > [   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> > [   31.113946]
> > [   31.113946] other info that might help us debug this:
> > [   31.113946]
> > [   31.113952]
> > [   31.113952] RCU used illegally from offline CPU!
> > [   31.113952] rcu_scheduler_active = 1, debug_locks = 0
> > [   31.113957] 3 locks held by swapper/1/0:
> > [   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> > [   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> > [   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> > [   31.114038]
> > [   31.114038] stack backtrace:
> > [   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> > [   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > [   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> > [   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> > [   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> > [   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> > [   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> > [   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> > [   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> > [   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> > [   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
> 
> And so you no longer get to invoke complete() from the CPU going offline
> out of the idle loop.
> 
> How would you like to handle this?  One approach would be to make __cpu_die()
> poll with an appropriate duty cycle.

Polling could work, but that would be somewhat like reinventing
wait/complete.

> Or is there some ARM-specific approach
> that could work here?

I am not aware of such. Anyone?

> 
> Another thing I could do would be to have an arch-specific Kconfig
> variable that made ARM responsible for informing RCU that the CPU
> was departing, which would allow a call like the following to be
> placed immediately after the complete():
> 
> rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> 
> Note:  This absolutely requires that the rcu_cpu_notify() -always-
> be allowed to execute!!!  This will not work if there is -any- possibility
> of __cpu_die() powering off the outgoing CPU before the call to
> rcu_cpu_notify() returns.

The problem is that __cpu_die() (which waits for the completion signal) may
cut the power of the dying CPU.

It could, however, wait for all RCU callbacks before powering down.
Would rcu_barrier() do the trick?

	rcu_barrier();
	if (!platform_cpu_kill(cpu))
		pr_err("CPU%u: unable to kill\n", cpu);

Best regards,
Krzysztof




* [rcu] [ INFO: suspicious RCU usage. ]
@ 2015-02-04 11:39       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 11:39 UTC (permalink / raw)
  To: linux-arm-kernel

+Cc some ARM people


On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > Greetings,
> > > 
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> > 
> > On next-20150203 I hit a similar error on ARM/Exynos4412 (Trats2 board)
> > while suspending to RAM:
> 
> Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> got more picky about that.  This could cause failures in any environment
> where CPUs could get delayed by more than one jiffy, which includes pretty
> much all virtualized environments.
> 
> > [   30.986262] PM: Syncing filesystems ... done.
> > [   30.994661] PM: Preparing system for mem sleep
> > [   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > [   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > [   31.016325] PM: Entering mem sleep
> > [   31.016338] Suspending console(s) (use no_console_suspend to debug)
> > [   31.051009] random: nonblocking pool is initialized
> > [   31.085811] wake enabled for irq 102
> > [   31.086964] wake enabled for irq 123
> > [   31.086972] wake enabled for irq 124
> > [   31.090409] PM: suspend of devices complete after 59.684 msecs
> > [   31.090524] CAM_ISP_CORE_1.2V: No configuration
> > [   31.090534] VMEM_VDDF_3.0V: No configuration
> > [   31.090543] VCC_SUB_2.0V: No configuration
> > [   31.090552] VCC_SUB_1.35V: No configuration
> > [   31.090562] VMEM_1.2V_AP: No configuration
> > [   31.090587] MOTOR_VCC_3.0V: No configuration
> > [   31.090596] LCD_VCC_3.3V: No configuration
> > [   31.090605] TSP_VDD_1.8V: No configuration
> > [   31.090614] TSP_AVDD_3.3V: No configuration
> > [   31.090623] VMEM_VDD_2.8V: No configuration
> > [   31.090631] VTF_2.8V: No configuration
> > [   31.090640] VDDQ_PRE_1.8V: No configuration
> > [   31.090649] VT_CAM_1.8V: No configuration
> > [   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
> > [   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
> > [   31.090677] VHSIC_1.8V: No configuration
> > [   31.090685] VHSIC_1.0V: No configuration
> > [   31.090694] VABB2_1.95V: No configuration
> > [   31.090703] NFC_AVDD_1.8V: No configuration
> > [   31.090712] VUOTG_3.0V: No configuration
> > [   31.090721] VABB1_1.95V: No configuration
> > [   31.090730] VMIPI_1.8V: No configuration
> > [   31.090739] CAM_ISP_MIPI_1.2V: No configuration
> > [   31.090747] VMIPI_1.0V: No configuration
> > [   31.090756] VPLL_1.0V_AP: No configuration
> > [   31.090765] VMPLL_1.0V_AP: No configuration
> > [   31.090773] VCC_1.8V_IO: No configuration
> > [   31.090782] VCC_2.8V_AP: No configuration
> > [   31.090791] VCC_1.8V_AP: No configuration
> > [   31.090800] VM1M2_1.2V_AP: No configuration
> > [   31.090809] VALIVE_1.0V_AP: No configuration
> > [   31.100297] PM: late suspend of devices complete after 9.445 msecs
> > [   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
> > [   31.109052] Disabling non-boot CPUs ...
> > [   31.113921]
> > [   31.113925] ===============================
> > [   31.113928] [ INFO: suspicious RCU usage. ]
> > [   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> > [   31.113938] -------------------------------
> > [   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> > [   31.113946]
> > [   31.113946] other info that might help us debug this:
> > [   31.113946]
> > [   31.113952]
> > [   31.113952] RCU used illegally from offline CPU!
> > [   31.113952] rcu_scheduler_active = 1, debug_locks = 0
> > [   31.113957] 3 locks held by swapper/1/0:
> > [   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> > [   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> > [   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> > [   31.114038]
> > [   31.114038] stack backtrace:
> > [   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> > [   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > [   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> > [   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> > [   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> > [   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> > [   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> > [   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> > [   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> > [   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> > [   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
> 
> And so you no longer get to invoke complete() from the CPU going offline
> out of the idle loop.
> 
> How would you like to handle this?  One approach would be to make __cpu_die()
> poll with appropriate duty cycle.

Polling could work, but it would amount to reinventing
wait/complete.

> Or is there some ARM-specific approach
> that could work here?

I am not aware of such. Anyone?

> 
> Another thing I could do would be to have an arch-specific Kconfig
> variable that made ARM responsible for informing RCU that the CPU
> was departing, which would allow a call such as the following to be placed
> immediately after the complete():
> 
> rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> 
> Note:  This absolutely requires that the rcu_cpu_notify() -always-
> be allowed to execute!!!  This will not work if there is -any- possibility
> of __cpu_die() powering off the outgoing CPU before the call to
> rcu_cpu_notify() returns.

The problem is that __cpu_die() (which waits for the completion signal) may
cut power to the dying CPU.

It could, however, wait for all RCU callbacks before powering down.
Would rcu_barrier() do the trick?

	rcu_barrier();
	if (!platform_cpu_kill(cpu))
		pr_err("CPU%u: unable to kill\n", cpu);

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 11:39       ` Krzysztof Kozlowski
  (?)
@ 2015-02-04 13:00         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 45+ messages in thread
From: Russell King - ARM Linux @ 2015-02-04 13:00 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: paulmck, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> +Cc some ARM people

I wish that people would CC this list with problems seen on ARM.  I'm
minded to just ignore this message because of this in the hope that by
doing so, people will learn something...

> > Another thing I could do would be to have an arch-specific Kconfig
> > variable that made ARM responsible for informing RCU that the CPU
> > was departing, which would allow a call such as the following to be placed
> > immediately after the complete():
> > 
> > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > 
> > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > be allowed to execute!!!  This will not work if there is -any- possibility
> > of __cpu_die() powering off the outgoing CPU before the call to
> > rcu_cpu_notify() returns.

Exactly, so that's not going to be possible.  The completion at that
point marks the point at which power _could_ be removed from the CPU
going down.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 11:39       ` Krzysztof Kozlowski
  (?)
@ 2015-02-04 13:13         ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 13:13 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Fengguang Wu, LKP, linux-kernel, Russell King,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> +Cc some ARM people
> 
> 
> On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> > On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > > Greetings,
> > > > 
> > > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> > > 
> > > On next-20150203 I hit similar error on ARM/Exynos4412 (Trats2 board)
> > > while suspending to RAM:
> > 
> > Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> > got more picky about that.  This could cause failures in any environment
> > where CPUs could get delayed by more than one jiffy, which includes pretty
> > much all virtualized environments.
> > 
> > > [   30.986262] PM: Syncing filesystems ... done.
> > > [   30.994661] PM: Preparing system for mem sleep
> > > [   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > > [   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > > [   31.016325] PM: Entering mem sleep
> > > [   31.016338] Suspending console(s) (use no_console_suspend to debug)
> > > [   31.051009] random: nonblocking pool is initialized
> > > [   31.085811] wake enabled for irq 102
> > > [   31.086964] wake enabled for irq 123
> > > [   31.086972] wake enabled for irq 124
> > > [   31.090409] PM: suspend of devices complete after 59.684 msecs
> > > [   31.090524] CAM_ISP_CORE_1.2V: No configuration
> > > [   31.090534] VMEM_VDDF_3.0V: No configuration
> > > [   31.090543] VCC_SUB_2.0V: No configuration
> > > [   31.090552] VCC_SUB_1.35V: No configuration
> > > [   31.090562] VMEM_1.2V_AP: No configuration
> > > [   31.090587] MOTOR_VCC_3.0V: No configuration
> > > [   31.090596] LCD_VCC_3.3V: No configuration
> > > [   31.090605] TSP_VDD_1.8V: No configuration
> > > [   31.090614] TSP_AVDD_3.3V: No configuration
> > > [   31.090623] VMEM_VDD_2.8V: No configuration
> > > [   31.090631] VTF_2.8V: No configuration
> > > [   31.090640] VDDQ_PRE_1.8V: No configuration
> > > [   31.090649] VT_CAM_1.8V: No configuration
> > > [   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
> > > [   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
> > > [   31.090677] VHSIC_1.8V: No configuration
> > > [   31.090685] VHSIC_1.0V: No configuration
> > > [   31.090694] VABB2_1.95V: No configuration
> > > [   31.090703] NFC_AVDD_1.8V: No configuration
> > > [   31.090712] VUOTG_3.0V: No configuration
> > > [   31.090721] VABB1_1.95V: No configuration
> > > [   31.090730] VMIPI_1.8V: No configuration
> > > [   31.090739] CAM_ISP_MIPI_1.2V: No configuration
> > > [   31.090747] VMIPI_1.0V: No configuration
> > > [   31.090756] VPLL_1.0V_AP: No configuration
> > > [   31.090765] VMPLL_1.0V_AP: No configuration
> > > [   31.090773] VCC_1.8V_IO: No configuration
> > > [   31.090782] VCC_2.8V_AP: No configuration
> > > [   31.090791] VCC_1.8V_AP: No configuration
> > > [   31.090800] VM1M2_1.2V_AP: No configuration
> > > [   31.090809] VALIVE_1.0V_AP: No configuration
> > > [   31.100297] PM: late suspend of devices complete after 9.445 msecs
> > > [   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
> > > [   31.109052] Disabling non-boot CPUs ...
> > > [   31.113921]
> > > [   31.113925] ===============================
> > > [   31.113928] [ INFO: suspicious RCU usage. ]
> > > [   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> > > [   31.113938] -------------------------------
> > > [   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> > > [   31.113946]
> > > [   31.113946] other info that might help us debug this:
> > > [   31.113946]
> > > [   31.113952]
> > > [   31.113952] RCU used illegally from offline CPU!
> > > [   31.113952] rcu_scheduler_active = 1, debug_locks = 0
> > > [   31.113957] 3 locks held by swapper/1/0:
> > > [   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> > > [   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> > > [   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> > > [   31.114038]
> > > [   31.114038] stack backtrace:
> > > [   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> > > [   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > > [   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> > > [   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> > > [   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> > > [   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> > > [   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> > > [   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> > > [   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> > > [   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> > > [   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
> > 
> > And so you no longer get to invoke complete() from the CPU going offline
> > out of the idle loop.
> > 
> > How would you like to handle this?  One approach would be to make __cpu_die()
> > poll with appropriate duty cycle.
> 
> Polling could work, but it would amount to reinventing
> wait/complete.

Yeah, well, the CPU has reached a point in the offline process where it
cannot use the scheduler, so...

> > Or is there some ARM-specific approach
> > that could work here?
> 
> I am not aware of such. Anyone?
> 
> > Another thing I could do would be to have an arch-specific Kconfig
> > variable that made ARM responsible for informing RCU that the CPU
> > was departing, which would allow a call such as the following to be placed
> > immediately after the complete():
> > 
> > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > 
> > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > be allowed to execute!!!  This will not work if there is -any- possibility
> > of __cpu_die() powering off the outgoing CPU before the call to
> > rcu_cpu_notify() returns.
> 
> The problem is that __cpu_die() (which waits for the completion signal) may
> cut power to the dying CPU.

I was afraid of that...

> It could, however, wait for all RCU callbacks before powering down.
> Would rcu_barrier() do the trick?
> 
> 	rcu_barrier();
> 	if (!platform_cpu_kill(cpu))
> 		pr_err("CPU%u: unable to kill\n", cpu);

Unfortunately, no.  The rcu_barrier() function can block, which is
not permitted when preemption is disabled, as it is at this point
in the idle loop.

So polling loop, then?
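
A minimal sketch of what such a polling scheme might look like, written
as a single-threaded userspace simulation rather than kernel code.  The
names struct fake_cpu, report_death(), pause_between_polls() and
wait_for_death() are hypothetical and do not exist in the kernel; a real
__cpu_die() would presumably sleep for a short interval between polls
instead of advancing a simulated clock.  The only point illustrated is
that the outgoing CPU performs a single store and wakes nobody, so it
needs neither the scheduler nor RCU:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU offline state; not a kernel structure. */
struct fake_cpu {
	atomic_bool dead;	/* set by the outgoing CPU as its last act */
	int ticks_until_dead;	/* simulation only: pauses left before it "dies" */
};

/* Outgoing side: one release store.  No wakeup, no scheduler, no RCU. */
static void report_death(struct fake_cpu *cpu)
{
	atomic_store_explicit(&cpu->dead, true, memory_order_release);
}

/*
 * Simulation only: stands in for the short sleep a real poller would take
 * between polls, and lets the fake outgoing CPU make progress meanwhile.
 */
static void pause_between_polls(struct fake_cpu *cpu)
{
	if (cpu->ticks_until_dead > 0 && --cpu->ticks_until_dead == 0)
		report_death(cpu);
}

/*
 * Surviving side: check the flag at most max_polls times, pausing after
 * each unsuccessful check, then check once more.  Returns true if the
 * death was observed and false on timeout.
 */
static bool wait_for_death(struct fake_cpu *cpu, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load_explicit(&cpu->dead, memory_order_acquire))
			return true;
		pause_between_polls(cpu);
	}
	return atomic_load_explicit(&cpu->dead, memory_order_acquire);
}
```

With this shape the outgoing side never calls complete(), so the
try_to_wake_up() path shown in the backtrace above is not reached from
the offline CPU.  The cost is that the waiter notices the death only at
its next poll, so latency is bounded by the polling interval.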

							Thanx, Paul


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
@ 2015-02-04 13:13         ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 13:13 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 7293 bytes --]

On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> +Cc some ARM people
> 
> 
> On Tue, 2015-02-03 at 08:27 -0800, Paul E. McKenney wrote:
> > On Tue, Feb 03, 2015 at 11:01:42AM +0100, Krzysztof Kozlowski wrote:
> > > On Sat, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> > > > Greetings,
> > > > 
> > > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a
> > > 
> > > On next-20150203 I hit similar error on ARM/Exynos4412 (Trats2 board)
> > > while suspending to RAM:
> > 
> > Yep, you are not supposed to be using RCU on offline CPUs, and RCU recently
> > got more picky about that.  This could cause failures in any environment
> > where CPUs could get delayed by more than one jiffy, which includes pretty
> > much all virtualized environments.
> > 
> > > [   30.986262] PM: Syncing filesystems ... done.
> > > [   30.994661] PM: Preparing system for mem sleep
> > > [   31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > > [   31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > > [   31.016325] PM: Entering mem sleep
> > > [   31.016338] Suspending console(s) (use no_console_suspend to debug)
> > > [   31.051009] random: nonblocking pool is initialized
> > > [   31.085811] wake enabled for irq 102
> > > [   31.086964] wake enabled for irq 123
> > > [   31.086972] wake enabled for irq 124
> > > [   31.090409] PM: suspend of devices complete after 59.684 msecs
> > > [   31.090524] CAM_ISP_CORE_1.2V: No configuration
> > > [   31.090534] VMEM_VDDF_3.0V: No configuration
> > > [   31.090543] VCC_SUB_2.0V: No configuration
> > > [   31.090552] VCC_SUB_1.35V: No configuration
> > > [   31.090562] VMEM_1.2V_AP: No configuration
> > > [   31.090587] MOTOR_VCC_3.0V: No configuration
> > > [   31.090596] LCD_VCC_3.3V: No configuration
> > > [   31.090605] TSP_VDD_1.8V: No configuration
> > > [   31.090614] TSP_AVDD_3.3V: No configuration
> > > [   31.090623] VMEM_VDD_2.8V: No configuration
> > > [   31.090631] VTF_2.8V: No configuration
> > > [   31.090640] VDDQ_PRE_1.8V: No configuration
> > > [   31.090649] VT_CAM_1.8V: No configuration
> > > [   31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
> > > [   31.090667] CAM_SENSOR_CORE_1.2V: No configuration
> > > [   31.090677] VHSIC_1.8V: No configuration
> > > [   31.090685] VHSIC_1.0V: No configuration
> > > [   31.090694] VABB2_1.95V: No configuration
> > > [   31.090703] NFC_AVDD_1.8V: No configuration
> > > [   31.090712] VUOTG_3.0V: No configuration
> > > [   31.090721] VABB1_1.95V: No configuration
> > > [   31.090730] VMIPI_1.8V: No configuration
> > > [   31.090739] CAM_ISP_MIPI_1.2V: No configuration
> > > [   31.090747] VMIPI_1.0V: No configuration
> > > [   31.090756] VPLL_1.0V_AP: No configuration
> > > [   31.090765] VMPLL_1.0V_AP: No configuration
> > > [   31.090773] VCC_1.8V_IO: No configuration
> > > [   31.090782] VCC_2.8V_AP: No configuration
> > > [   31.090791] VCC_1.8V_AP: No configuration
> > > [   31.090800] VM1M2_1.2V_AP: No configuration
> > > [   31.090809] VALIVE_1.0V_AP: No configuration
> > > [   31.100297] PM: late suspend of devices complete after 9.445 msecs
> > > [   31.108891] PM: noirq suspend of devices complete after 8.577 msecs
> > > [   31.109052] Disabling non-boot CPUs ...
> > > [   31.113921]
> > > [   31.113925] ===============================
> > > [   31.113928] [ INFO: suspicious RCU usage. ]
> > > [   31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
> > > [   31.113938] -------------------------------
> > > [   31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
> > > [   31.113946]
> > > [   31.113946] other info that might help us debug this:
> > > [   31.113946]
> > > [   31.113952]
> > > [   31.113952] RCU used illegally from offline CPU!
> > > [   31.113952] rcu_scheduler_active = 1, debug_locks = 0
> > > [   31.113957] 3 locks held by swapper/1/0:
> > > [   31.113988]  #0:  ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
> > > [   31.114012]  #1:  (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
> > > [   31.114035]  #2:  (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
> > > [   31.114038]
> > > [   31.114038] stack backtrace:
> > > [   31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
> > > [   31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > > [   31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
> > > [   31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
> > > [   31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
> > > [   31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
> > > [   31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
> > > [   31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
> > > [   31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
> > > [   31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
> > > [   31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
> > 
> > And so you no longer get to invoke complete() from the CPU going offline
> > out of the idle loop.
> > 
> > How would you like to handle this?  One approach would be to make __cpu_die()
> > poll with appropriate duty cycle.
> 
> Polling could work, but that would amount to reinventing
> wait/complete.

Yeah, well, the CPU has reached a point in the offline process where it
cannot use the scheduler, so...

> > Or is there some ARM-specific approach
> > that could work here?
> 
> I am not aware of such. Anyone?
> 
> > Another thing I could do would be to have an arch-specific Kconfig
> > variable that made ARM responsible for informing RCU that the CPU
> > was departing, which would allow a call such as the following to be placed
> > immediately after the complete():
> > 
> > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > 
> > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > be allowed to execute!!!  This will not work if there is -any- possibility
> > of __cpu_die() powering off the outgoing CPU before the call to
> > rcu_cpu_notify() returns.
> 
> The problem is that __cpu_die() (waiting for the completion signal) may cut
> power to the dying CPU.

I was afraid of that...

> It could, however, wait for all RCU callbacks before powering down.
> Would rcu_barrier() do the trick?
> 
> 	rcu_barrier();
> 	if (!platform_cpu_kill(cpu))
> 		pr_err("CPU%u: unable to kill\n", cpu);

Unfortunately, no.  The rcu_barrier() function can block, which is
not permitted when preemption is disabled, as it is at this point
in the idle loop.

So polling loop, then?

							Thanx, Paul
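
For readers following the thread, the polling approach suggested above can
be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel implementation: cpu_dead_flag,
outgoing_cpu_report_dead(), wait_for_cpu_dead() and demo() are hypothetical
names, and C11 atomics plus nanosleep() stand in for whatever primitives the
real __cpu_die()/cpu_die() pair would use. The point it shows is that the
outgoing side performs nothing but a store, while all sleeping happens on
the side that remains online.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for "this CPU is dead" state; not a kernel symbol. */
static atomic_bool cpu_dead_flag;

/*
 * Outgoing side: a single release store. No lock is taken and nobody is
 * woken, so nothing here depends on a scheduler being usable.
 */
static void outgoing_cpu_report_dead(void)
{
	atomic_store_explicit(&cpu_dead_flag, true, memory_order_release);
}

/*
 * Surviving side: check the flag, sleep for poll_ms, repeat.
 * Returns 0 once the flag is seen, -1 if timeout_ms elapses first.
 */
static int wait_for_cpu_dead(unsigned int timeout_ms, unsigned int poll_ms)
{
	struct timespec nap;
	unsigned int waited = 0;

	if (poll_ms == 0)
		poll_ms = 1;	/* avoid a loop that never advances */
	nap.tv_sec = poll_ms / 1000;
	nap.tv_nsec = (long)(poll_ms % 1000) * 1000000L;

	for (;;) {
		if (atomic_load_explicit(&cpu_dead_flag, memory_order_acquire))
			return 0;
		if (waited >= timeout_ms)
			return -1;
		nanosleep(&nap, NULL);
		waited += poll_ms;
	}
}

/* Example use: the wait times out first, then succeeds once the flag is set. */
static void demo(void)
{
	assert(wait_for_cpu_dead(20, 5) == -1);
	outgoing_cpu_report_dead();
	assert(wait_for_cpu_dead(20, 5) == 0);
}
```

The poll interval is the "duty cycle" knob: a shorter interval notices the
dead CPU sooner at the cost of more wakeups on the waiting side. In kernel
code the waiting side would presumably sleep with something like msleep()
or schedule_timeout_uninterruptible() between checks rather than
nanosleep().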


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 13:00         ` Russell King - ARM Linux
  (?)
@ 2015-02-04 13:14           ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 13:14 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Krzysztof Kozlowski, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	MarkRutland

On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > +Cc some ARM people
> 
> I wish that people would CC this list with problems seen on ARM.  I'm
> minded to just ignore this message because of this in the hope that by
> doing so, people will learn something...
> 
> > > Another thing I could do would be to have an arch-specific Kconfig
> > > variable that made ARM responsible for informing RCU that the CPU
> > > was departing, which would allow a call such as the following to be placed
> > > immediately after the complete():
> > > 
> > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > 
> > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > of __cpu_die() powering off the outgoing CPU before the call to
> > > rcu_cpu_notify() returns.
> 
> Exactly, so that's not going to be possible.  The completion at that
> point marks the point at which power _could_ be removed from the CPU
> going down.

OK, sounds like a polling loop is required.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 13:14           ` Paul E. McKenney
  (?)
@ 2015-02-04 14:16             ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 14:16 UTC (permalink / raw)
  To: paulmck
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	MarkRutland

On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > +Cc some ARM people
> > 
> > I wish that people would CC this list with problems seen on ARM.  I'm
> > minded to just ignore this message because of this in the hope that by
> > doing so, people will learn something...
> > 
> > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > variable that made ARM responsible for informing RCU that the CPU
> > > > was departing, which would allow a call such as the following to be placed
> > > > immediately after the complete():
> > > > 
> > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > 
> > > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > rcu_cpu_notify() returns.
> > 
> > Exactly, so that's not going to be possible.  The completion at that
> > point marks the point at which power _could_ be removed from the CPU
> > going down.
> 
> OK, sounds like a polling loop is required.

I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
and clearing the bit on the CPU being powered down. What do you think of
this idea?

Best regards,
Krzysztof



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 14:16             ` Krzysztof Kozlowski
  (?)
@ 2015-02-04 15:10               ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 15:10 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	MarkRutland

On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > +Cc some ARM people
> > > 
> > > I wish that people would CC this list with problems seen on ARM.  I'm
> > > minded to just ignore this message because of this in the hope that by
> > > doing so, people will learn something...
> > > 
> > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > was departing, which would allow a call such as the following to be placed
> > > > > immediately after the complete():
> > > > > 
> > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > 
> > > > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > rcu_cpu_notify() returns.
> > > 
> > > Exactly, so that's not going to be possible.  The completion at that
> > > point marks the point at which power _could_ be removed from the CPU
> > > going down.
> > 
> > OK, sounds like a polling loop is required.
> 
> > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > and clearing the bit on the CPU being powered down. What do you think of
> > this idea?

Hmmm...  It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
finish_wait().  These are in the scheduler, but this is being called from
the CPU that remains online, so that should be OK.

But what do you invoke on the outgoing CPU?  Can you get away with
simply clearing the bit, or do you also have to do a wakeup?  It looks
to me like a wakeup is required, which would be illegal on the outgoing
CPU, which is at a point where it cannot legally invoke the scheduler.
Or am I missing something?

You know, this situation is giving me a bad case of nostalgia for the
old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
outgoing CPU could turn itself off, and thus didn't need to tell some
other CPU when it was ready to be turned off.  Seems to me that this
self-turn-off capability would be a great feature for future systems!

							Thanx, Paul
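
The concern about the wakeup can be illustrated with a deliberately tiny,
single-threaded model. It is not kernel code and makes no claim about how
the wait-bit machinery is implemented internally: the structures and the
model_* and demo_missing_wakeup() names are hypothetical. It only shows the
contract in question, namely that a waiter which has gone to sleep on a bit
does not become runnable just because the bit changed; something has to
wake it. In the kernel the usual counterpart of wait_on_bit() is
wake_up_bit(), and that is the call that would involve the scheduler on the
outgoing CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one flag word and at most one sleeping waiter. */
struct bit_waiter {
	bool asleep;		/* true while the waiter is not runnable */
};

struct bit_waitqueue {
	unsigned long word;		/* bit 0 plays the role of the wait bit */
	struct bit_waiter *sleeper;	/* waiter parked on the bit, if any */
};

/* Waiter side: if the bit is still set, park and go to sleep. */
static void model_wait_on_bit(struct bit_waitqueue *q, struct bit_waiter *w)
{
	if (q->word & 1UL) {
		q->sleeper = w;
		w->asleep = true;
	}
}

/* What the outgoing CPU may safely do: clear the bit, nothing else. */
static void model_clear_bit(struct bit_waitqueue *q)
{
	q->word &= ~1UL;
}

/* What a sleeping waiter actually needs: somebody to make it runnable. */
static void model_wake_up_bit(struct bit_waitqueue *q)
{
	if (!(q->word & 1UL) && q->sleeper != NULL) {
		q->sleeper->asleep = false;
		q->sleeper = NULL;
	}
}

/* The waiter stays asleep after a bare clear and runs only after a wake. */
static void demo_missing_wakeup(void)
{
	struct bit_waiter w = { .asleep = false };
	struct bit_waitqueue q = { .word = 1UL, .sleeper = NULL };

	model_wait_on_bit(&q, &w);
	model_clear_bit(&q);
	assert(w.asleep);	/* bit is clear, but nobody told the waiter */

	model_wake_up_bit(&q);
	assert(!w.asleep);	/* only the explicit wakeup makes it runnable */
}
```

The model leaves out timeouts. A waiter that sleeps with a timeout would
also resume when the timeout expires, so a bare clear would eventually be
noticed, only later than with an explicit wakeup.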


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 15:10               ` Paul E. McKenney
  (?)
@ 2015-02-04 15:16                 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 45+ messages in thread
From: Russell King - ARM Linux @ 2015-02-04 15:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Krzysztof Kozlowski, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	MarkRutland

On Wed, Feb 04, 2015 at 07:10:28AM -0800, Paul E. McKenney wrote:
> You know, this situation is giving me a bad case of nostalgia for the
> old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> outgoing CPU could turn itself off, and thus didn't need to tell some
> other CPU when it was ready to be turned off.  Seems to me that this
> self-turn-off capability would be a great feature for future systems!

Unfortunately, some brilliant people decided that secure firmware on
their platforms (which is sometimes needed to turn the secondary CPUs
off) can only be called by CPU0...

Other people decide that they can power down the secondary CPU when it
hits a WFI (wait for interrupt) instruction after arming that state
change, which is far saner - but we still need to know on the requesting
CPU when the dying CPU has completed the time-expensive parts of the
offlining process.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 15:10               ` Paul E. McKenney
  (?)
@ 2015-02-04 15:22                 ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 15:22 UTC (permalink / raw)
  To: paulmck
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	MarkRutland

[-- Attachment #1: Type: text/plain, Size: 3139 bytes --]

On śro, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > +Cc some ARM people
> > > > 
> > > > I wish that people would CC this list with problems seen on ARM.  I'm
> > > > minded to just ignore this message because of this in the hope that by
> > > > doing so, people will learn something...
> > > > 
> > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > was departing, which would allow a call such as the following to be placed
> > > > > > immediately after the complete():
> > > > > > 
> > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > > 
> > > > > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > rcu_cpu_notify() returns.
> > > > 
> > > > Exactly, so that's not going to be possible.  The completion at that
> > > > point marks the point at which power _could_ be removed from the CPU
> > > > going down.
> > > 
> > > OK, sounds like a polling loop is required.
> > 
> > > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > > and clearing the bit on the CPU being powered down. What do you think of
> > > this idea?
> 
> Hmmm...  It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> finish_wait().  These are in the scheduler, but this is being called from
> the CPU that remains online, so that should be OK.
> 
> But what do you invoke on the outgoing CPU?  Can you get away with
> simply clearing the bit, or do you also have to do a wakeup?  It looks
> to me like a wakeup is required, which would be illegal on the outgoing
> CPU, which is at a point where it cannot legally invoke the scheduler.
> Or am I missing something?

Actually I use the timeout version, but I think that doesn't matter.
wait_on_bit() loops, testing the bit. Inside the loop it calls the
'action', which in my case is bit_wait_timeout(); that in turn calls
schedule_timeout().

See the proof of concept in the attachment. One observed issue: hot unplug
from the command line takes a lot longer, about 7 seconds instead of ~0.5.
I probably did something wrong.

> 
> You know, this situation is giving me a bad case of nostalgia for the
> old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> outgoing CPU could turn itself off, and thus didn't need to tell some
> other CPU when it was ready to be turned off.  Seems to me that this
> self-turn-off capability would be a great feature for future systems!

There are a lot more issues with hotplug on ARM...

Patch/RFC attached.


[-- Attachment #2: 0001-ARM-Don-t-use-complete-during-__cpu_die.patch --]
[-- Type: text/x-patch, Size: 2311 bytes --]

From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Date: Wed, 4 Feb 2015 16:14:41 +0100
Subject: [RFC] ARM: Don't use complete() during __cpu_die

The complete() should not be used on offlined CPU.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
 arch/arm/kernel/smp.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 86ef244c5a24..f3a5ad80a253 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -26,6 +26,7 @@
 #include <linux/completion.h>
 #include <linux/cpufreq.h>
 #include <linux/irq_work.h>
+#include <linux/wait.h>
 
 #include <linux/atomic.h>
 #include <asm/smp.h>
@@ -76,6 +77,9 @@ enum ipi_msg_type {
 
 static DECLARE_COMPLETION(cpu_running);
 
+#define CPU_DIE_WAIT_BIT		0
+static unsigned long wait_cpu_die;
+
 static struct smp_operations smp_ops;
 
 void __init smp_set_ops(struct smp_operations *ops)
@@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
 	}
 
-
+	set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
 	memset(&secondary_data, 0, sizeof(secondary_data));
 	return ret;
 }
@@ -213,7 +217,17 @@ int __cpu_disable(void)
 	return 0;
 }
 
-static DECLARE_COMPLETION(cpu_died);
+static int wait_for_cpu_die(void)
+{
+	might_sleep();
+
+	if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die))
+		return 0;
+
+	return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT,
+					bit_wait_timeout, TASK_UNINTERRUPTIBLE,
+					msecs_to_jiffies(5000));
+}
 
 /*
  * called on the thread which is asking for a CPU to be shutdown -
@@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died);
  */
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (wait_for_cpu_die()) {
 		pr_err("CPU%u: cpu didn't die\n", cpu);
 		return;
 	}
@@ -267,7 +281,7 @@ void __ref cpu_die(void)
 	 * this returns, power and/or clocks can be removed at any point
 	 * from this CPU and its cache by platform_cpu_kill().
 	 */
-	complete(&cpu_died);
+	clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
 
 	/*
 	 * Ensure that the cache lines associated with that completion are
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 15:16                 ` Russell King - ARM Linux
  (?)
@ 2015-02-04 15:46                   ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 15:46 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Krzysztof Kozlowski, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On Wed, Feb 04, 2015 at 03:16:24PM +0000, Russell King - ARM Linux wrote:
> On Wed, Feb 04, 2015 at 07:10:28AM -0800, Paul E. McKenney wrote:
> > You know, this situation is giving me a bad case of nostalgia for the
> > old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> > outgoing CPU could turn itself off, and thus didn't need to tell some
> > other CPU when it was ready to be turned off.  Seems to me that this
> > self-turn-off capability would be a great feature for future systems!
> 
> Unfortunately, some brilliant people decided that secure firmware on
> their platforms (which is sometimes needed to turn the secondary CPUs
> off) can only be called by CPU0...
> 
> Other people decide that they can power down the secondary CPU when it
> hits a WFI (wait for interrupt) instruction after arming that state
> change, which is far saner - but we still need to know on the requesting
> CPU when the dying CPU has completed the time-expensive parts of the
> offlining process.

I suppose that you could grant the outgoing CPU the ability to arm
that state, but easy for me to say...

Anyway, still looks like a pure polling loop is required, with short
timed waits running on the surviving CPU.
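
A minimal sketch of such a polling loop, written as a userspace C model
with a simulated clock rather than as kernel code.  The names
struct dying_cpu, cpu_is_dead() and poll_for_cpu_death() are hypothetical,
and the "now_ms += step_ms" line stands in for a short timed sleep on the
surviving CPU:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: all times are simulated ms. */
struct dying_cpu {
	unsigned int dies_at_ms;	/* when the outgoing CPU clears its flag */
};

/* True once the outgoing CPU has (in this model) cleared its flag. */
bool cpu_is_dead(const struct dying_cpu *cpu, unsigned int now_ms)
{
	return now_ms >= cpu->dies_at_ms;
}

/*
 * Poll from the surviving CPU using short fixed waits.  The outgoing
 * CPU never has to issue a wakeup.  Returns the simulated time at which
 * the death was observed, or -1 if timeout_ms passed first.
 */
int poll_for_cpu_death(const struct dying_cpu *cpu,
		       unsigned int step_ms, unsigned int timeout_ms)
{
	unsigned int now_ms = 0;

	assert(step_ms > 0);
	while (!cpu_is_dead(cpu, now_ms)) {
		if (now_ms >= timeout_ms)
			return -1;
		now_ms += step_ms;	/* stands in for a short timed sleep */
	}
	return (int)now_ms;
}
```

In this model a CPU that dies at 503 ms, polled every 10 ms with a
5000 ms timeout, is noticed at 510 ms: the worst-case extra latency is
one polling step.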

							Thanx, Paul


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 15:22                 ` Krzysztof Kozlowski
  (?)
@ 2015-02-04 15:56                   ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 15:56 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> On śro, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > > On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > > +Cc some ARM people
> > > > > 
> > > > > I wish that people would CC this list with problems seen on ARM.  I'm
> > > > > minded to just ignore this message because of this in the hope that by
> > > > > doing so, people will learn something...
> > > > > 
> > > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > > > was departing, which would allow a call as follows to be placed
> > > > > > > immediately after the complete():
> > > > > > > 
> > > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > > > 
> > > > > > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > > rcu_cpu_notify() returns.
> > > > > 
> > > > > Exactly, so that's not going to be possible.  The completion at that
> > > > > point marks the point at which power _could_ be removed from the CPU
> > > > > going down.
> > > > 
> > > > OK, sounds like a polling loop is required.
> > > 
> > > > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > > > and clearing the bit on the CPU being powered down. What do you think
> > > > about such an idea?
> > 
> > Hmmm...  It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> > which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> > finish_wait().  These are in the scheduler, but this is being called from
> > the CPU that remains online, so that should be OK.
> > 
> > But what do you invoke on the outgoing CPU?  Can you get away with
> > simply clearing the bit, or do you also have to do a wakeup?  It looks
> > to me like a wakeup is required, which would be illegal on the outgoing
> > CPU, which is at a point where it cannot legally invoke the scheduler.
> > Or am I missing something?
> 
> Actually I use the timeout versions, but I think that doesn't matter.
> The wait_on_bit() will loop, testing the bit. Inside the loop it calls
> the 'action', which in my case is bit_wait_timeout() (see the attached
> patch). This calls schedule_timeout().

Ah, good point.

> See the proof of concept in the attachment. One observed issue: hot unplug
> from the command line takes a lot longer, about 7 seconds instead of ~0.5.
> Probably I did something wrong.

Well, you do set the timeout to five seconds, and so if the condition
does not get set before the surviving CPU finds its way to the
out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
least five seconds.
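
To put numbers on that, here is a toy model (userspace C with simulated
milliseconds, not kernel code).  It assumes, as in the RFC patch, that
the outgoing CPU only clears the bit and issues no wakeup;
waiter_returns_at_ms() and its parameters are made-up names used purely
for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code.  Times are simulated milliseconds.
 * Returns the time at which a single timed wait on the bit finishes.
 */
unsigned int waiter_returns_at_ms(unsigned int waiter_enters_ms,
				  unsigned int bit_cleared_ms,
				  unsigned int timeout_ms,
				  bool clearer_does_wakeup)
{
	unsigned int deadline_ms = waiter_enters_ms + timeout_ms;

	assert(timeout_ms > 0);

	/* Bit already clear on entry: the up-front test succeeds at once. */
	if (bit_cleared_ms <= waiter_enters_ms)
		return waiter_enters_ms;

	/* Bit not cleared before the deadline: the wait times out. */
	if (bit_cleared_ms >= deadline_ms)
		return deadline_ms;

	/*
	 * Bit cleared while the waiter sleeps.  With a wakeup the waiter
	 * resumes right away; without one it sleeps until its deadline
	 * and only then re-tests the bit.
	 */
	return clearer_does_wakeup ? bit_cleared_ms : deadline_ms;
}
```

With the waiter entering at 0 ms, the bit cleared at 500 ms and a
5000 ms timeout, the model returns 5000 without a wakeup and 500 with
one.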

One alternative approach would be to have a loop around a series of
shorter waits.  Other thoughts?

> > You know, this situation is giving me a bad case of nostalgia for the
> > old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> > outgoing CPU could turn itself off, and thus didn't need to tell some
> > other CPU when it was ready to be turned off.  Seems to me that this
> > self-turn-off capability would be a great feature for future systems!
> 
> There are a lot more issues with hotplug on ARM...

Just trying to clean up this particular corner at the moment.  ;-)

> Patch/RFC attached.

Again, I believe that you will need to loop over a shorter timeout
in order to get reasonable latencies.  If waiting a millisecond at
a time is an energy-efficiency concern (don't know why it would be
in this rare case, but...), then one approach would be to start
with very short waits and then increase the wait time.  For example,
doubling the wait time on each pass through the loop would result
in a smallish number of wakeups, but would mean that you waited
no more than twice as long as necessary.
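
The doubling scheme could be sketched roughly as follows.  This is a
userspace C model with a simulated clock, not a proposed kernel patch;
struct backoff_result and poll_with_doubling() are hypothetical names,
and the "waited_ms += wait_ms" line stands in for a timed sleep on the
surviving CPU:

```c
#include <assert.h>

/* Hypothetical result type for this model; times are simulated ms. */
struct backoff_result {
	unsigned int waited_ms;	/* total simulated time spent waiting */
	unsigned int wakeups;	/* number of timed sleeps taken */
	int timed_out;		/* nonzero if timeout_ms passed first */
};

/*
 * Poll for a CPU that dies at dies_at_ms, doubling the sleep on each
 * pass through the loop and never sleeping past the overall timeout.
 */
struct backoff_result poll_with_doubling(unsigned int dies_at_ms,
					 unsigned int first_wait_ms,
					 unsigned int timeout_ms)
{
	struct backoff_result r = { 0, 0, 0 };
	unsigned int wait_ms = first_wait_ms;

	assert(first_wait_ms > 0);
	while (r.waited_ms < dies_at_ms) {	/* flag still set */
		if (r.waited_ms >= timeout_ms) {
			r.timed_out = 1;
			break;
		}
		/* Clamp the last sleep to the remaining time. */
		if (wait_ms > timeout_ms - r.waited_ms)
			wait_ms = timeout_ms - r.waited_ms;
		r.waited_ms += wait_ms;	/* stands in for a timed sleep */
		r.wakeups++;
		wait_ms *= 2;
	}
	return r;
}
```

For a CPU that dies 500 ms in, starting from a 1 ms wait, this model
takes 9 sleeps and returns after 511 simulated ms, compared with 500
sleeps for a fixed 1 ms step.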

Thoughts?

							Thanx, Paul

> From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001
> From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Date: Wed, 4 Feb 2015 16:14:41 +0100
> Subject: [RFC] ARM: Don't use complete() during __cpu_die
> 
> The complete() should not be used on offlined CPU.
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> ---
>  arch/arm/kernel/smp.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 86ef244c5a24..f3a5ad80a253 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -26,6 +26,7 @@
>  #include <linux/completion.h>
>  #include <linux/cpufreq.h>
>  #include <linux/irq_work.h>
> +#include <linux/wait.h>
> 
>  #include <linux/atomic.h>
>  #include <asm/smp.h>
> @@ -76,6 +77,9 @@ enum ipi_msg_type {
> 
>  static DECLARE_COMPLETION(cpu_running);
> 
> +#define CPU_DIE_WAIT_BIT		0
> +static unsigned long wait_cpu_die;
> +
>  static struct smp_operations smp_ops;
> 
>  void __init smp_set_ops(struct smp_operations *ops)
> @@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  	}
> 
> -
> +	set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
>  	memset(&secondary_data, 0, sizeof(secondary_data));
>  	return ret;
>  }
> @@ -213,7 +217,17 @@ int __cpu_disable(void)
>  	return 0;
>  }
> 
> -static DECLARE_COMPLETION(cpu_died);
> +static int wait_for_cpu_die(void)
> +{
> +	might_sleep();
> +
> +	if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die))
> +		return 0;
> +
> +	return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT,
> +					bit_wait_timeout, TASK_UNINTERRUPTIBLE,
> +					msecs_to_jiffies(5000));
> +}
> 
>  /*
>   * called on the thread which is asking for a CPU to be shutdown -
> @@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died);
>   */
>  void __cpu_die(unsigned int cpu)
>  {
> -	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
> +	if (wait_for_cpu_die()) {
>  		pr_err("CPU%u: cpu didn't die\n", cpu);
>  		return;
>  	}
> @@ -267,7 +281,7 @@ void __ref cpu_die(void)
>  	 * this returns, power and/or clocks can be removed at any point
>  	 * from this CPU and its cache by platform_cpu_kill().
>  	 */
> -	complete(&cpu_died);
> +	clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
> 
>  	/*
>  	 * Ensure that the cache lines associated with that completion are
> -- 
> 1.9.1
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [rcu] [ INFO: suspicious RCU usage. ]
@ 2015-02-04 15:56                   ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 15:56 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 6908 bytes --]

On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> On śro, 2015-02-04 at 07:10 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 03:16:27PM +0100, Krzysztof Kozlowski wrote:
> > > On śro, 2015-02-04 at 05:14 -0800, Paul E. McKenney wrote:
> > > > On Wed, Feb 04, 2015 at 01:00:18PM +0000, Russell King - ARM Linux wrote:
> > > > > On Wed, Feb 04, 2015 at 12:39:07PM +0100, Krzysztof Kozlowski wrote:
> > > > > > +Cc some ARM people
> > > > > 
> > > > > I wish that people would CC this list with problems seen on ARM.  I'm
> > > > > minded to just ignore this message because of this in the hope that by
> > > > > doing so, people will learn something...
> > > > > 
> > > > > > > Another thing I could do would be to have an arch-specific Kconfig
> > > > > > > variable that made ARM responsible for informing RCU that the CPU
> > > > > > > was departing, which would allow a call to as follows to be placed
> > > > > > > immediately after the complete():
> > > > > > > 
> > > > > > > rcu_cpu_notify(NULL, CPU_DYING_IDLE, (void *)(long)smp_processor_id());
> > > > > > > 
> > > > > > > Note:  This absolutely requires that the rcu_cpu_notify() -always-
> > > > > > > be allowed to execute!!!  This will not work if there is -any- possibility
> > > > > > > of __cpu_die() powering off the outgoing CPU before the call to
> > > > > > > rcu_cpu_notify() returns.
> > > > > 
> > > > > Exactly, so that's not going to be possible.  The completion at that
> > > > > point marks the point at which power _could_ be removed from the CPU
> > > > > going down.
> > > > 
> > > > OK, sounds like a polling loop is required.
> > > 
> > > I thought about using wait_on_bit() in __cpu_die() (the waiting thread)
> > > and clearing the bit on CPU being powered down. What do you think about
> > > such idea?
> > 
> > Hmmm...  It looks to me that wait_on_bit() calls out_of_line_wait_on_bit(),
> > which in turn calls __wait_on_bit(), which calls prepare_to_wait() and
> > finish_wait().  These are in the scheduler, but this is being called from
> > the CPU that remains online, so that should be OK.
> > 
> > But what do you invoke on the outgoing CPU?  Can you get away with
> > simply clearing the bit, or do you also have to do a wakeup?  It looks
> > to me like a wakeup is required, which would be illegal on the outgoing
> > CPU, which is at a point where it cannot legally invoke the scheduler.
> > Or am I missing something?
> 
> Actually the timeout versions but I think that doesn't matter.
> The wait_on_bit will busy-loop with testing for the bit. Inside the loop
> it calls the 'action' which in my case will be bit_wait_io_timeout().
> This calls schedule_timeout().

Ah, good point.

> See proof of concept in attachment. One observed issue: hot unplug from
> commandline takes a lot more time. About 7 seconds instead of ~0.5.
> Probably I did something wrong.

Well, you do set the timeout to five seconds, and so if the bit has
not been cleared by the time the surviving CPU finds its way to the
out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
least five seconds, because nothing wakes the waiter early.

One alternative approach would be to have a loop around a series of
shorter waits.  Other thoughts?
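
A loop around shorter waits could look roughly like the following
user-space sketch.  It is not kernel code: struct fake_cpu,
cpu_is_dead() and fake_sleep_ms() are hypothetical stand-ins for the
bit test and for schedule_timeout(), and the clock is simulated so the
behaviour can be checked without real sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the outgoing CPU and a simulated clock. */
struct fake_cpu {
	unsigned int now_ms;	/* simulated current time */
	unsigned int die_at_ms;	/* time at which the bit gets cleared */
};

/* Stand-in for !test_bit(CPU_DIE_WAIT_BIT, ...). */
static bool cpu_is_dead(const struct fake_cpu *c)
{
	return c->now_ms >= c->die_at_ms;
}

/* Stand-in for a timed sleep: only advances the simulated clock. */
static void fake_sleep_ms(struct fake_cpu *c, unsigned int ms)
{
	c->now_ms += ms;
}

/*
 * Poll in fixed slices until the CPU is dead or the budget is spent.
 * Returns 0 on success and -1 on timeout.
 */
static int wait_for_cpu_die_sliced(struct fake_cpu *c, unsigned int slice_ms,
				   unsigned int budget_ms)
{
	unsigned int waited = 0;

	assert(slice_ms > 0);

	while (!cpu_is_dead(c)) {
		if (waited >= budget_ms)
			return -1;
		fake_sleep_ms(c, slice_ms);
		waited += slice_ms;
	}
	return 0;
}
```

With a 1 ms slice, the waiter notices the death at most one slice
late, at the cost of one wakeup per slice.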

> > You know, this situation is giving me a bad case of nostalgia for the
> > old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> > outgoing CPU could turn itself off, and thus didn't need to tell some
> > other CPU when it was ready to be turned off.  Seems to me that this
> > self-turn-off capability would be a great feature for future systems!
> 
> There are a lot more issues with hotplug on ARM...

Just trying to clean up this particular corner at the moment.  ;-)

> Patch/RFC attached.

Again, I believe that you will need to loop over a shorter timeout
in order to get reasonable latencies.  If waiting a millisecond at
a time is an energy-efficiency concern (don't know why it would be
in this rare case, but...), then one approach would be to start
with very short waits and then increase the wait time.  For example,
doubling the wait time on each pass through the loop would result
in a smallish number of wakeups, but would mean that you waited
no more than twice as long as necessary.

Thoughts?

							Thanx, Paul
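
The doubling scheme described above can be sketched in user space as
follows.  This is an illustration under stated assumptions, not the
eventual patch: struct sim_cpu and its fields are hypothetical, and
sleeping is modelled by advancing a simulated clock so that the number
of wakeups can be counted.

```c
#include <assert.h>

/* Hypothetical model: simulated clock plus a wakeup counter. */
struct sim_cpu {
	unsigned int now_ms;	/* simulated current time */
	unsigned int die_at_ms;	/* time at which the bit gets cleared */
	unsigned int wakeups;	/* how many times the waiter woke up */
};

/*
 * Wait with exponentially growing delays, starting at first_ms and
 * never exceeding budget_ms in total.
 * Returns 0 on success and -1 on timeout.
 */
static int wait_for_cpu_die_backoff(struct sim_cpu *c, unsigned int first_ms,
				    unsigned int budget_ms)
{
	unsigned int waited = 0;
	unsigned int delay = first_ms;

	assert(first_ms > 0);

	while (c->now_ms < c->die_at_ms) {
		if (waited >= budget_ms)
			return -1;
		/* Clamp the last wait so the total stays within budget. */
		if (delay > budget_ms - waited)
			delay = budget_ms - waited;
		c->now_ms += delay;	/* stand-in for a timed sleep */
		c->wakeups++;
		waited += delay;
		delay *= 2;		/* double on each pass */
	}
	return 0;
}
```

For a CPU that dies after 500 ms, starting from a 1 ms delay, the
delays are 1, 2, 4, ..., 256 ms, so the waiter returns at 511 ms after
9 wakeups, which stays under twice the necessary wait.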

> >From feaad18a483871747170fa797f80b49592489ad1 Mon Sep 17 00:00:00 2001
> From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> Date: Wed, 4 Feb 2015 16:14:41 +0100
> Subject: [RFC] ARM: Don't use complete() during __cpu_die
> 
> The complete() should not be used on offlined CPU.
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> ---
>  arch/arm/kernel/smp.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 86ef244c5a24..f3a5ad80a253 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -26,6 +26,7 @@
>  #include <linux/completion.h>
>  #include <linux/cpufreq.h>
>  #include <linux/irq_work.h>
> +#include <linux/wait.h>
> 
>  #include <linux/atomic.h>
>  #include <asm/smp.h>
> @@ -76,6 +77,9 @@ enum ipi_msg_type {
> 
>  static DECLARE_COMPLETION(cpu_running);
> 
> +#define CPU_DIE_WAIT_BIT		0
> +static unsigned long wait_cpu_die;
> +
>  static struct smp_operations smp_ops;
> 
>  void __init smp_set_ops(struct smp_operations *ops)
> @@ -133,7 +137,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  	}
> 
> -
> +	set_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
>  	memset(&secondary_data, 0, sizeof(secondary_data));
>  	return ret;
>  }
> @@ -213,7 +217,17 @@ int __cpu_disable(void)
>  	return 0;
>  }
> 
> -static DECLARE_COMPLETION(cpu_died);
> +static int wait_for_cpu_die(void)
> +{
> +	might_sleep();
> +
> +	if (!test_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die))
> +		return 0;
> +
> +	return out_of_line_wait_on_bit_timeout(&wait_cpu_die, CPU_DIE_WAIT_BIT,
> +					bit_wait_timeout, TASK_UNINTERRUPTIBLE,
> +					msecs_to_jiffies(5000));
> +}
> 
>  /*
>   * called on the thread which is asking for a CPU to be shutdown -
> @@ -221,7 +235,7 @@ static DECLARE_COMPLETION(cpu_died);
>   */
>  void __cpu_die(unsigned int cpu)
>  {
> -	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
> +	if (wait_for_cpu_die()) {
>  		pr_err("CPU%u: cpu didn't die\n", cpu);
>  		return;
>  	}
> @@ -267,7 +281,7 @@ void __ref cpu_die(void)
>  	 * this returns, power and/or clocks can be removed at any point
>  	 * from this CPU and its cache by platform_cpu_kill().
>  	 */
> -	complete(&cpu_died);
> +	clear_bit(CPU_DIE_WAIT_BIT, &wait_cpu_die);
> 
>  	/*
>  	 * Ensure that the cache lines associated with that completion are
> -- 
> 1.9.1
> 
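
The handshake in the RFC above has three steps: __cpu_up() sets the
bit, cpu_die() clears it on the outgoing CPU, and __cpu_die() tests
it on the surviving CPU.  A minimal user-space sketch of that
lifecycle with C11 atomics follows.  The sketch_* helpers and
wait_cpu_die_sketch are hypothetical names, and the release/acquire
ordering is an assumption of this sketch rather than a statement about
the kernel's set_bit()/clear_bit()/test_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CPU_DIE_WAIT_BIT	0

/* Hypothetical user-space counterpart of wait_cpu_die. */
static atomic_ulong wait_cpu_die_sketch;

/* Set bit nr: done when the CPU is brought up. */
static void sketch_set_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_or_explicit(addr, 1UL << nr, memory_order_release);
}

/* Clear bit nr: the only thing the outgoing CPU does, no wakeup. */
static void sketch_clear_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Test bit nr: what the surviving CPU polls between timed sleeps. */
static bool sketch_test_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	return (atomic_load_explicit(addr, memory_order_acquire) >> nr) & 1UL;
}
```

Because the outgoing CPU only stores to the word, it never has to
enter the scheduler; the cost is that the waiter must poll.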



* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 15:56                   ` Paul E. McKenney
  (?)
@ 2015-02-04 16:10                     ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 16:10 UTC (permalink / raw)
  To: paulmck
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On śro, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> > 
> > Actually the timeout versions but I think that doesn't matter.
> > The wait_on_bit will busy-loop with testing for the bit. Inside the loop
> > it calls the 'action' which in my case will be bit_wait_io_timeout().
> > This calls schedule_timeout().
> 
> Ah, good point.
> 
> > See proof of concept in attachment. One observed issue: hot unplug from
> > commandline takes a lot more time. About 7 seconds instead of ~0.5.
> > Probably I did something wrong.
> 
> Well, you do set the timeout to five seconds, and so if the condition
> does not get set before the surviving CPU finds its way to the
> out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> least five seconds.
>
> One alternative approach would be to have a loop around a series of
> shorter waits.  Other thoughts?

Right! That was the issue. It seems to work now. I'll also think about a
self-adapting interval, as you suggested below. I'll test it more and send a
patch.

Best regards,
Krzysztof

> 
> > > You know, this situation is giving me a bad case of nostalgia for the
> > > old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> > > outgoing CPU could turn itself off, and thus didn't need to tell some
> > > other CPU when it was ready to be turned off.  Seems to me that this
> > > self-turn-off capability would be a great feature for future systems!
> > 
> > There are a lot more issues with hotplug on ARM...
> 
> Just trying to clean up this particular corner at the moment.  ;-)
> 
> > Patch/RFC attached.
> 
> Again, I believe that you will need to loop over a shorter timeout
> in order to get reasonable latencies.  If waiting a millisecond at
> a time is an energy-efficiency concern (don't know why it would be
> in this rare case, but...), then one approach would be to start
> with very short waits, then increase the wait time, for example,
> doubling the wait time on each pass through the loop would result
> in a smallish number of wakeups, but would mean that you waited
> no more than twice as long as necessary.
> 
> Thoughts?




* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 16:10                     ` Krzysztof Kozlowski
  (?)
@ 2015-02-04 16:28                       ` Paul E. McKenney
  -1 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2015-02-04 16:28 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On Wed, Feb 04, 2015 at 05:10:56PM +0100, Krzysztof Kozlowski wrote:
> On śro, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> > On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> > > 
> > > Actually the timeout versions but I think that doesn't matter.
> > > The wait_on_bit will busy-loop with testing for the bit. Inside the loop
> > > it calls the 'action' which in my case will be bit_wait_io_timeout().
> > > This calls schedule_timeout().
> > 
> > Ah, good point.
> > 
> > > See proof of concept in attachment. One observed issue: hot unplug from
> > > commandline takes a lot more time. About 7 seconds instead of ~0.5.
> > > Probably I did something wrong.
> > 
> > Well, you do set the timeout to five seconds, and so if the condition
> > does not get set before the surviving CPU finds its way to the
> > out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> > least five seconds.
> >
> > One alternative approach would be to have a loop around a series of
> > shorter waits.  Other thoughts?
> 
> Right! That was the issue. It seems it works. I'll think also on
> self-adapting interval as you said below. I'll test it more and send a
> patch.

Sounds good!

Are you doing ARM, ARM64, or both?  I of course vote for both.  ;-)

							Thanx, Paul

> Best regards,
> Krzysztof
> 
> > 
> > > > You know, this situation is giving me a bad case of nostalgia for the
> > > > old Sequent Symmetry and NUMA-Q hardware.  On those platforms, the
> > > > outgoing CPU could turn itself off, and thus didn't need to tell some
> > > > other CPU when it was ready to be turned off.  Seems to me that this
> > > > self-turn-off capability would be a great feature for future systems!
> > > 
> > > There are a lot more issues with hotplug on ARM...
> > 
> > Just trying to clean up this particular corner at the moment.  ;-)
> > 
> > > Patch/RFC attached.
> > 
> > Again, I believe that you will need to loop over a shorter timeout
> > in order to get reasonable latencies.  If waiting a millisecond at
> > a time is an energy-efficiency concern (don't know why it would be
> > in this rare case, but...), then one approach would be to start
> > with very short waits, then increase the wait time, for example,
> > doubling the wait time on each pass through the loop would result
> > in a smallish number of wakeups, but would mean that you waited
> > no more than twice as long as necessary.
> > 
> > Thoughts?
> 
> 



* Re: [rcu] [ INFO: suspicious RCU usage. ]
  2015-02-04 16:28                       ` Paul E. McKenney
  (?)
@ 2015-02-04 16:43                         ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Krzysztof Kozlowski @ 2015-02-04 16:43 UTC (permalink / raw)
  To: paulmck
  Cc: Russell King - ARM Linux, Fengguang Wu, LKP, linux-kernel,
	Bartlomiej Zolnierkiewicz, linux-arm-kernel, Arnd Bergmann,
	Mark Rutland

On śro, 2015-02-04 at 08:28 -0800, Paul E. McKenney wrote:
> On Wed, Feb 04, 2015 at 05:10:56PM +0100, Krzysztof Kozlowski wrote:
> > On śro, 2015-02-04 at 07:56 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 04, 2015 at 04:22:28PM +0100, Krzysztof Kozlowski wrote:
> > > > 
> > > > Actually the timeout versions but I think that doesn't matter.
> > > > The wait_on_bit will busy-loop with testing for the bit. Inside the loop
> > > > it calls the 'action' which in my case will be bit_wait_io_timeout().
> > > > This calls schedule_timeout().
> > > 
> > > Ah, good point.
> > > 
> > > > See proof of concept in attachment. One observed issue: hot unplug from
> > > > commandline takes a lot more time. About 7 seconds instead of ~0.5.
> > > > Probably I did something wrong.
> > > 
> > > Well, you do set the timeout to five seconds, and so if the condition
> > > does not get set before the surviving CPU finds its way to the
> > > out_of_line_wait_on_bit_timeout(), you are guaranteed to wait for at
> > > least five seconds.
> > >
> > > One alternative approach would be to have a loop around a series of
> > > shorter waits.  Other thoughts?
> > 
> > Right! That was the issue. It seems it works. I'll think also on
> > self-adapting interval as you said below. I'll test it more and send a
> > patch.
> 
> Sounds good!
> 
> Are you doing ARM, ARM64, or both?  I of course vote for both.  ;-)

I'll do both, but first I need to find who on my team has an ARM64 board.

Best regards,
Krzysztof





end of thread, other threads:[~2015-02-04 16:43 UTC | newest]

Thread overview: 45+ messages
2015-02-01  2:59 [rcu] [ INFO: suspicious RCU usage. ] Fengguang Wu
2015-02-01  2:59 ` Fengguang Wu
2015-02-03 10:01 ` Krzysztof Kozlowski
2015-02-03 10:01   ` Krzysztof Kozlowski
2015-02-03 16:27   ` Paul E. McKenney
2015-02-03 16:27     ` Paul E. McKenney
2015-02-04 11:39     ` Krzysztof Kozlowski
2015-02-04 11:39       ` Krzysztof Kozlowski
2015-02-04 11:39       ` Krzysztof Kozlowski
2015-02-04 13:00       ` Russell King - ARM Linux
2015-02-04 13:00         ` Russell King - ARM Linux
2015-02-04 13:00         ` Russell King - ARM Linux
2015-02-04 13:14         ` Paul E. McKenney
2015-02-04 13:14           ` Paul E. McKenney
2015-02-04 13:14           ` Paul E. McKenney
2015-02-04 14:16           ` Krzysztof Kozlowski
2015-02-04 14:16             ` Krzysztof Kozlowski
2015-02-04 14:16             ` Krzysztof Kozlowski
2015-02-04 15:10             ` Paul E. McKenney
2015-02-04 15:10               ` Paul E. McKenney
2015-02-04 15:10               ` Paul E. McKenney
2015-02-04 15:16               ` Russell King - ARM Linux
2015-02-04 15:16                 ` Russell King - ARM Linux
2015-02-04 15:16                 ` Russell King - ARM Linux
2015-02-04 15:46                 ` Paul E. McKenney
2015-02-04 15:46                   ` Paul E. McKenney
2015-02-04 15:46                   ` Paul E. McKenney
2015-02-04 15:22               ` Krzysztof Kozlowski
2015-02-04 15:22                 ` Krzysztof Kozlowski
2015-02-04 15:22                 ` Krzysztof Kozlowski
2015-02-04 15:56                 ` Paul E. McKenney
2015-02-04 15:56                   ` Paul E. McKenney
2015-02-04 15:56                   ` Paul E. McKenney
2015-02-04 16:10                   ` Krzysztof Kozlowski
2015-02-04 16:10                     ` Krzysztof Kozlowski
2015-02-04 16:10                     ` Krzysztof Kozlowski
2015-02-04 16:28                     ` Paul E. McKenney
2015-02-04 16:28                       ` Paul E. McKenney
2015-02-04 16:28                       ` Paul E. McKenney
2015-02-04 16:43                       ` Krzysztof Kozlowski
2015-02-04 16:43                         ` Krzysztof Kozlowski
2015-02-04 16:43                         ` Krzysztof Kozlowski
2015-02-04 13:13       ` Paul E. McKenney
2015-02-04 13:13         ` Paul E. McKenney
2015-02-04 13:13         ` Paul E. McKenney
