* PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor
@ 2014-07-16 14:53 Robert Schöne
  2014-07-24  7:11 ` PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend) Robert Schöne
  0 siblings, 1 reply; 13+ messages in thread
From: Robert Schöne @ 2014-07-16 14:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar; +Cc: linux-pm

1. Summary:
When two or more processes concurrently activate the ondemand governor, Linux might crash.

2. Problem:
When I write concurrently to the cpufreq sysfs scaling_governor files, the kernel first emits a warning, followed by multiple oopses. Afterwards several kernel subsystems fail and I have to reboot the system. Although this bug report is based on the current Ubuntu kernel (3.13.0-27-generic), the problem also appears on the latest mainline (3.16-rc5). However, I only managed to capture the kernel log from the older Ubuntu kernel.


3. Keywords: cpufreq, governor, policy, ondemand

4. Kernel Version
Linux version 3.13.0-27-generic (buildd@akateko) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014

5. Warning message, followed by OOPS messages:

Jul 16 09:47:39 basti kernel: [  398.441455] ------------[ cut here ]------------
Jul 16 09:47:39 basti kernel: [  398.441462] WARNING: CPU: 5 PID: 4263 at /build/buildd/linux-3.13.0/drivers/cpufreq/cpufreq_governor.c:203 cpufreq_governor_dbs+0x682/0x6f0()
Jul 16 09:47:39 basti kernel: [  398.441494] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.441496] CPU: 5 PID: 4263 Comm: tee Tainted: GF          O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.441497] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.441500]  0000000000000009 ffff8800b3081bc0 ffffffff817199c4 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441502]  ffff8800b3081bf8 ffffffff810676bd 0000000000000000 ffff88022ebb4e00
Jul 16 09:47:39 basti kernel: [  398.441504]  0000000000000004 0000000000000002 ffffffff81cd3ae0 ffff8800b3081c08
Jul 16 09:47:39 basti kernel: [  398.441504] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.441509]  [<ffffffff817199c4>] dump_stack+0x45/0x56
Jul 16 09:47:39 basti kernel: [  398.441512]  [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
Jul 16 09:47:39 basti kernel: [  398.441513]  [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
Jul 16 09:47:39 basti kernel: [  398.441515]  [<ffffffff815c7142>] cpufreq_governor_dbs+0x682/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441518]  [<ffffffff81725ebc>] ? notifier_call_chain+0x4c/0x70
Jul 16 09:47:39 basti kernel: [  398.441520]  [<ffffffff815c4fc7>] od_cpufreq_governor_dbs+0x17/0x20
Jul 16 09:47:39 basti kernel: [  398.441522]  [<ffffffff815c10cd>] __cpufreq_governor+0xfd/0x230
Jul 16 09:47:39 basti kernel: [  398.441524]  [<ffffffff815c1349>] cpufreq_set_policy+0x149/0x2e0
Jul 16 09:47:39 basti kernel: [  398.441526]  [<ffffffff815c28bd>] store_scaling_governor+0xad/0xf0
Jul 16 09:47:39 basti kernel: [  398.441527]  [<ffffffff815c2260>] ? cpufreq_update_policy+0x170/0x170
Jul 16 09:47:39 basti kernel: [  398.441529]  [<ffffffff815c1a19>] store+0x79/0xc0
Jul 16 09:47:39 basti kernel: [  398.441532]  [<ffffffff812325b8>] sysfs_write_file+0x128/0x1c0
Jul 16 09:47:39 basti kernel: [  398.441534]  [<ffffffff811bc664>] vfs_write+0xb4/0x1f0
Jul 16 09:47:39 basti kernel: [  398.441536]  [<ffffffff811bd099>] SyS_write+0x49/0xa0
Jul 16 09:47:39 basti kernel: [  398.441539]  [<ffffffff8172a5bf>] tracesys+0xe1/0xe6
Jul 16 09:47:39 basti kernel: [  398.441540] ---[ end trace 9a9b0afb92b8c41f ]---
Jul 16 09:47:39 basti kernel: [  398.441545] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 16 09:47:39 basti kernel: [  398.441547] IP: [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441549] PGD b39cc067 PUD b30cb067 PMD 0 
Jul 16 09:47:39 basti kernel: [  398.441550] Oops: 0000 [#1] SMP 
Jul 16 09:47:39 basti kernel: [  398.441571] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.441573] CPU: 5 PID: 4263 Comm: tee Tainted: GF       W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.441573] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.441574] task: ffff88022e5f17f0 ti: ffff8800b3080000 task.ti: ffff8800b3080000
Jul 16 09:47:39 basti kernel: [  398.441576] RIP: 0010:[<ffffffff815c6b12>]  [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441577] RSP: 0018:ffff8800b3081c18  EFLAGS: 00010293
Jul 16 09:47:39 basti kernel: [  398.441577] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 00000000000096aa
Jul 16 09:47:39 basti kernel: [  398.441578] RDX: 0000000096aa96aa RSI: 0000000000000000 RDI: 0000000000000009
Jul 16 09:47:39 basti kernel: [  398.441579] RBP: ffff8800b3081c88 R08: 0000000000000082 R09: ffffffff81ecdd30
Jul 16 09:47:39 basti kernel: [  398.441579] R10: 000000000002f8a0 R11: 0000000000040000 R12: ffff88022ebb4e00
Jul 16 09:47:39 basti kernel: [  398.441580] R13: 0000000000000004 R14: 0000000000000002 R15: ffffffff81cd3ae0
Jul 16 09:47:39 basti kernel: [  398.441580] FS:  00002b53b77acb80(0000) GS:ffff88023e340000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.441582] CR2: 0000000000000000 CR3: 00000000b395c000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.441582] Stack:
Jul 16 09:47:39 basti kernel: [  398.441583]  0000000000000000 0000000000000002 ffff8800b3081d00 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441584]  ffff8800b3081c70 ffffffff81725ebc ffffffff81cd3520 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441586]  0000000000000002 ffff88022ebb4e00 0000000000000002 ffffffff81cd3b40
Jul 16 09:47:39 basti kernel: [  398.441586] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.441587]  [<ffffffff81725ebc>] ? notifier_call_chain+0x4c/0x70
Jul 16 09:47:39 basti kernel: [  398.441589]  [<ffffffff815c4fc7>] od_cpufreq_governor_dbs+0x17/0x20
Jul 16 09:47:39 basti kernel: [  398.441590]  [<ffffffff815c10cd>] __cpufreq_governor+0xfd/0x230
Jul 16 09:47:39 basti kernel: [  398.441591]  [<ffffffff815c1349>] cpufreq_set_policy+0x149/0x2e0
Jul 16 09:47:39 basti kernel: [  398.441592]  [<ffffffff815c28bd>] store_scaling_governor+0xad/0xf0
Jul 16 09:47:39 basti kernel: [  398.441593]  [<ffffffff815c2260>] ? cpufreq_update_policy+0x170/0x170
Jul 16 09:47:39 basti kernel: [  398.441594]  [<ffffffff815c1a19>] store+0x79/0xc0
Jul 16 09:47:39 basti kernel: [  398.441595]  [<ffffffff812325b8>] sysfs_write_file+0x128/0x1c0
Jul 16 09:47:39 basti kernel: [  398.441597]  [<ffffffff811bc664>] vfs_write+0xb4/0x1f0
Jul 16 09:47:39 basti kernel: [  398.441600]  [<ffffffff811bd099>] SyS_write+0x49/0xa0
Jul 16 09:47:39 basti kernel: [  398.441601]  [<ffffffff8172a5bf>] tracesys+0xe1/0xe6
Jul 16 09:47:39 basti kernel: [  398.441611] Code: ff 84 c0 0f 84 40 02 00 00 49 8b 5c 24 70 48 85 db 0f 84 29 06 00 00 41 83 fe 04 0f 84 60 02 00 00 41 83 fe 05 0f 84 2e 02 00 00 <48> 8b 03 44 89 ef ff 50 20 48 89 45 c0 48 8b 03 83 38 01 0f 84 
Jul 16 09:47:39 basti kernel: [  398.441612] RIP  [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441613]  RSP <ffff8800b3081c18>
Jul 16 09:47:39 basti kernel: [  398.441613] CR2: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441615] ---[ end trace 9a9b0afb92b8c420 ]---
Jul 16 09:47:39 basti kernel: [  398.444572] general protection fault: 0000 [#2] SMP 
Jul 16 09:47:39 basti kernel: [  398.444628] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.444629] CPU: 4 PID: 126 Comm: kworker/4:1 Tainted: GF     D W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.444630] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.444633] Workqueue: events od_dbs_timer
Jul 16 09:47:39 basti kernel: [  398.444634] task: ffff88022e5a97f0 ti: ffff88022e69e000 task.ti: ffff88022e69e000
Jul 16 09:47:39 basti kernel: [  398.444636] RIP: 0010:[<ffffffff815c5957>]  [<ffffffff815c5957>] od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444636] RSP: 0000:ffff88022e69fde8  EFLAGS: 00010246
Jul 16 09:47:39 basti kernel: [  398.444637] RAX: ffff88022e5a97f0 RBX: ffff88023e310e20 RCX: 0000000000000004
Jul 16 09:47:39 basti kernel: [  398.444637] RDX: 0000000000000004 RSI: 00000000170e170c RDI: ffff88023e310ec8
Jul 16 09:47:39 basti kernel: [  398.444638] RBP: ffff88022e69fe20 R08: 2008f8c439200000 R09: 7240000000000000
Jul 16 09:47:39 basti kernel: [  398.444638] R10: dff68f3e05110e48 R11: 0000000000000004 R12: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444639] R13: ffff880231144b80 R14: ffff88023e310e48 R15: dead000000100100
Jul 16 09:47:39 basti kernel: [  398.444640] FS:  0000000000000000(0000) GS:ffff88023e300000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.444641] CR2: 00002b8959f3f1f0 CR3: 000000022e68f000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.444641] Stack:
Jul 16 09:47:39 basti kernel: [  398.444643]  000000042e61f700 ffff88023e310ec8 ffff88022eb3c800 ffff88023e313cc0
Jul 16 09:47:39 basti kernel: [  398.444644]  ffff88023e310e48 0000000000000000 0000000000000100 ffff88022e69fe68
Jul 16 09:47:39 basti kernel: [  398.444645]  ffffffff810838a2 000000003e313cd8 ffff88023e317e00 ffff88023e313cd8
Jul 16 09:47:39 basti kernel: [  398.444645] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.444649]  [<ffffffff810838a2>] process_one_work+0x182/0x450
Jul 16 09:47:39 basti kernel: [  398.444651]  [<ffffffff81084641>] worker_thread+0x121/0x410
Jul 16 09:47:39 basti kernel: [  398.444652]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Jul 16 09:47:39 basti kernel: [  398.444654]  [<ffffffff8108b312>] kthread+0xd2/0xf0
Jul 16 09:47:39 basti kernel: [  398.444655]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444658]  [<ffffffff8172a2fc>] ret_from_fork+0x7c/0xb0
Jul 16 09:47:39 basti kernel: [  398.444659]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444668] Code: d1 4d 8b 7d 10 89 55 cc 48 03 1c cd 60 28 d1 81 48 8d 83 a8 00 00 00 44 0f b6 a3 f0 00 00 00 48 89 c7 48 89 45 d0 e8 99 a8 15 00 <41> 8b 77 04 48 89 df 41 83 e4 01 e8 19 10 00 00 84 c0 8b 55 cc 
Jul 16 09:47:39 basti kernel: [  398.444670] RIP  [<ffffffff815c5957>] od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444670]  RSP <ffff88022e69fde8>
Jul 16 09:47:39 basti kernel: [  398.444671] ---[ end trace 9a9b0afb92b8c421 ]---
Jul 16 09:47:39 basti kernel: [  398.444703] BUG: unable to handle kernel paging request at ffffffffffffffd8
Jul 16 09:47:39 basti kernel: [  398.444706] IP: [<ffffffff8108b9b0>] kthread_data+0x10/0x20
Jul 16 09:47:39 basti kernel: [  398.444710] PGD 1c11067 PUD 1c13067 PMD 0 
Jul 16 09:47:39 basti kernel: [  398.444713] Oops: 0000 [#3] SMP 
Jul 16 09:47:39 basti kernel: [  398.444725] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.444726] CPU: 4 PID: 126 Comm: kworker/4:1 Tainted: GF     D W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.444727] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.444732] task: ffff88022e5a97f0 ti: ffff88022e69e000 task.ti: ffff88022e69e000
Jul 16 09:47:39 basti kernel: [  398.444733] RIP: 0010:[<ffffffff8108b9b0>]  [<ffffffff8108b9b0>] kthread_data+0x10/0x20
Jul 16 09:47:39 basti kernel: [  398.444734] RSP: 0000:ffff88022e69fba0  EFLAGS: 00010002
Jul 16 09:47:39 basti kernel: [  398.444734] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000000000d
Jul 16 09:47:39 basti kernel: [  398.444735] RDX: 0000000000000005 RSI: 0000000000000004 RDI: ffff88022e5a97f0
Jul 16 09:47:39 basti kernel: [  398.444735] RBP: ffff88022e69fba0 R08: 0000000000000000 R09: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444736] R10: ffffffff8106518c R11: ffffea0008c99800 R12: ffff88023e314440
Jul 16 09:47:39 basti kernel: [  398.444736] R13: 0000000000000004 R14: ffff88022e5a97e0 R15: ffff88022e5a97f0
Jul 16 09:47:39 basti kernel: [  398.444738] FS:  0000000000000000(0000) GS:ffff88023e300000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.444741] CR2: 0000000000000028 CR3: 000000022e68f000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.444742] Stack:
Jul 16 09:47:39 basti kernel: [  398.444748]  ffff88022e69fbb8 ffffffff81084d51 ffff88022e5a97f0 ffff88022e69fc18
Jul 16 09:47:39 basti kernel: [  398.444753]  ffffffff8171db79 ffff88022e5a97f0 ffff88022e69ffd8 0000000000014440
Jul 16 09:47:39 basti kernel: [  398.444757]  0000000000014440 ffff88022e5a97f0 ffff88022e5a9e28 ffff88022e5a97e0
Jul 16 09:47:39 basti kernel: [  398.444757] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.444759]  [<ffffffff81084d51>] wq_worker_sleeping+0x11/0x90
Jul 16 09:47:39 basti kernel: [  398.444761]  [<ffffffff8171db79>] __schedule+0x589/0x7d0
Jul 16 09:47:39 basti kernel: [  398.444762]  [<ffffffff8171dde9>] schedule+0x29/0x70
Jul 16 09:47:39 basti kernel: [  398.444764]  [<ffffffff8106a02f>] do_exit+0x6df/0xa50
Jul 16 09:47:39 basti kernel: [  398.444765]  [<ffffffff81722e79>] oops_end+0xa9/0x150
Jul 16 09:47:39 basti kernel: [  398.444767]  [<ffffffff810171cb>] die+0x4b/0x70
Jul 16 09:47:39 basti kernel: [  398.444768]  [<ffffffff8172280e>] do_general_protection+0x11e/0x1b0
Jul 16 09:47:39 basti kernel: [  398.444770]  [<ffffffff81722128>] general_protection+0x28/0x30
Jul 16 09:47:39 basti kernel: [  398.444773]  [<ffffffff815c5957>] ? od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444776]  [<ffffffff815c5957>] ? od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444779]  [<ffffffff810838a2>] process_one_work+0x182/0x450
Jul 16 09:47:39 basti kernel: [  398.444781]  [<ffffffff81084641>] worker_thread+0x121/0x410
Jul 16 09:47:39 basti kernel: [  398.444784]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Jul 16 09:47:39 basti kernel: [  398.444787]  [<ffffffff8108b312>] kthread+0xd2/0xf0
Jul 16 09:47:39 basti kernel: [  398.444790]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444793]  [<ffffffff8172a2fc>] ret_from_fork+0x7c/0xb0
Jul 16 09:47:39 basti kernel: [  398.444795]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444806] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 a8 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
Jul 16 09:47:39 basti kernel: [  398.444807]  RSP <ffff88022e69fba0>
Jul 16 09:47:39 basti kernel: [  398.444808] CR2: ffffffffffffffd8
Jul 16 09:47:39 basti kernel: [  398.444808] ---[ end trace 9a9b0afb92b8c422 ]---
Jul 16 09:47:39 basti kernel: [  398.444808] Fixing recursive fault but reboot is needed!
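
The log above contains one warning followed by three oopses (the first two events in the tee process, the later two in a kworker thread). To see only that sequence, the headline lines can be filtered out of a saved copy of the log. This is a minimal sketch, assuming the log has been saved to a file; fault_summary is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch only: print the headline lines of each warning/oops in a saved kernel log.
# fault_summary is a hypothetical helper; $1 is the path of the saved log file.
fault_summary() {
	# Keep WARNING/BUG/Oops/GPF headlines and the "end trace" markers that close each event.
	grep -E 'WARNING: |BUG: |Oops: |general protection fault|end trace' "$1"
}
```

For example, "fault_summary /var/log/kern.log" (the path is an assumption and depends on the syslog configuration) prints the events in the order in which they occurred.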

6. Two small shell scripts to trigger the bug (on an 8-CPU machine)

crash_governor.sh:
#!/bin/sh
# this is called concurrently via runme.sh
for I in `seq 1000`
do
	echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
	echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
done

runme.sh:
#!/bin/sh
# run 8 concurrent instances
for I in `seq 8`
do
	./crash_governor.sh &
done

Just run runme.sh and crash your system :)
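
The two scripts above can also be folded into a single file. The following is a sketch under stated assumptions, not the script that produced the log: CPU_DIR, set_all, toggle_governors and run_stress are hypothetical names, the script has to be run as root (it writes directly instead of going through sudo tee), and it waits for all workers before returning. Like the original, it may crash the machine.

```shell
#!/bin/sh
# Sketch only: single-file variant of crash_governor.sh + runme.sh.
# All names below are illustrative and not taken from the original report.
# WARNING: on an affected kernel this can crash the system. Run as root.

# Root of the per-CPU sysfs tree; overridable so the logic can be tried on a dummy tree.
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

# Write governor $1 to every scaling_governor file under $CPU_DIR.
set_all() {
	for f in "$CPU_DIR"/cpu*/cpufreq/scaling_governor
	do
		# Skip the literal pattern if the glob matched nothing.
		if [ -e "$f" ]
		then
			echo "$1" > "$f"
		fi
	done
}

# Switch all CPUs to ondemand and back to userspace, $1 times.
toggle_governors() {
	i=0
	while [ "$i" -lt "$1" ]
	do
		set_all ondemand
		set_all userspace
		i=$((i + 1))
	done
}

# Start $1 concurrent workers with $2 iterations each, then wait for all of them.
run_stress() {
	n=0
	while [ "$n" -lt "$1" ]
	do
		toggle_governors "$2" &
		n=$((n + 1))
	done
	wait
}
```

Calling "run_stress 8 1000" at the end of the file corresponds to the eight instances and 1000 iterations used above. Pointing CPU_DIR at a scratch directory that mimics the cpuN/cpufreq layout allows a dry run without touching the real governors.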

7. Environment
7.1. ver_linux

Linux basti 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 
Gnu C                  4.8
Gnu make               3.81
binutils               2.24
util-linux             2.20.1
mount                  support
module-init-tools      15
e2fsprogs              1.42.9
Linux C Library        2.19
Dynamic linker (ldd)   2.19
Procps                 3.3.9
Net-tools              1.60
Kbd                    1.15.5
Sh-utils               8.21
wireless-tools         30
Modules Loaded         sep3_15 pax apwr3_1 nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ppdev kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc snd_hda_intel snd_hda_codec serio_raw i915 snd_hwdep snd_pcm video snd_page_alloc tpm_infineon drm_kms_helper snd_timer drm snd lpc_ich soundcore mei_me mac_hid mei i2c_algo_bit lp parport e1000e psmouse ahci ptp libahci pps_core

7.2. /proc/cpuinfo (first out of 8 CPUs (4 cores plus hyper threading))
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
microcode	: 0x18
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6782.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

7.3. /proc/modules
sep3_15 517400 0 - Live 0x0000000000000000 (OF)
pax 13181 0 - Live 0x0000000000000000 (OF)
apwr3_1 56811 0 - Live 0x0000000000000000 (OF)
nfsv3 39326 1 - Live 0x0000000000000000
rfcomm 69160 0 - Live 0x0000000000000000
bnep 19624 2 - Live 0x0000000000000000
bluetooth 395423 10 rfcomm,bnep, Live 0x0000000000000000
binfmt_misc 17468 1 - Live 0x0000000000000000
nfsd 280297 2 - Live 0x0000000000000000
auth_rpcgss 59338 1 nfsd, Live 0x0000000000000000
nfs_acl 12837 2 nfsv3,nfsd, Live 0x0000000000000000
nfs 236636 2 nfsv3, Live 0x0000000000000000
lockd 93977 3 nfsv3,nfsd,nfs, Live 0x0000000000000000
sunrpc 284404 21 nfsv3,nfsd,auth_rpcgss,nfs_acl,nfs,lockd, Live 0x0000000000000000
fscache 63988 1 nfs, Live 0x0000000000000000
snd_hda_codec_hdmi 46207 1 - Live 0x0000000000000000
snd_hda_codec_conexant 57441 1 - Live 0x0000000000000000
gpio_ich 13476 0 - Live 0x0000000000000000
intel_rapl 18773 0 - Live 0x0000000000000000
x86_pkg_temp_thermal 14205 0 - Live 0x0000000000000000
intel_powerclamp 14705 0 - Live 0x0000000000000000
coretemp 13435 0 - Live 0x0000000000000000
ppdev 17671 0 - Live 0x0000000000000000
kvm_intel 143060 0 - Live 0x0000000000000000
kvm 451511 1 kvm_intel, Live 0x0000000000000000
crct10dif_pclmul 14289 0 - Live 0x0000000000000000
crc32_pclmul 13113 0 - Live 0x0000000000000000
ghash_clmulni_intel 13216 0 - Live 0x0000000000000000
aesni_intel 55624 0 - Live 0x0000000000000000
aes_x86_64 17131 1 aesni_intel, Live 0x0000000000000000
lrw 13286 1 aesni_intel, Live 0x0000000000000000
gf128mul 14951 1 lrw, Live 0x0000000000000000
glue_helper 13990 1 aesni_intel, Live 0x0000000000000000
ablk_helper 13597 1 aesni_intel, Live 0x0000000000000000
cryptd 20359 3 ghash_clmulni_intel,aesni_intel,ablk_helper, Live 0x0000000000000000
parport_pc 32701 1 - Live 0x0000000000000000
snd_hda_intel 52355 0 - Live 0x0000000000000000
snd_hda_codec 192906 3 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel, Live 0x0000000000000000
serio_raw 13462 0 - Live 0x0000000000000000
i915 783485 1 - Live 0x0000000000000000
snd_hwdep 13602 1 snd_hda_codec, Live 0x0000000000000000
snd_pcm 102099 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live 0x0000000000000000
video 19476 1 i915, Live 0x0000000000000000
snd_page_alloc 18710 2 snd_hda_intel,snd_pcm, Live 0x0000000000000000
tpm_infineon 17372 0 - Live 0x0000000000000000
drm_kms_helper 52758 1 i915, Live 0x0000000000000000
snd_timer 29482 1 snd_pcm, Live 0x0000000000000000
drm 302817 2 i915,drm_kms_helper, Live 0x0000000000000000
snd 69238 7 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer, Live 0x0000000000000000
lpc_ich 21080 0 - Live 0x0000000000000000
soundcore 12680 1 snd, Live 0x0000000000000000
mei_me 18627 0 - Live 0x0000000000000000
mac_hid 13205 0 - Live 0x0000000000000000
mei 82274 1 mei_me, Live 0x0000000000000000
i2c_algo_bit 13413 1 i915, Live 0x0000000000000000
lp 17759 0 - Live 0x0000000000000000
parport 42348 3 ppdev,parport_pc,lp, Live 0x0000000000000000
e1000e 254433 0 - Live 0x0000000000000000
psmouse 102222 0 - Live 0x0000000000000000
ahci 25819 2 - Live 0x0000000000000000
ptp 18933 1 e1000e, Live 0x0000000000000000
libahci 32168 1 ahci, Live 0x0000000000000000
pps_core 19382 1 ptp, Live 0x0000000000000000


-- 

Dipl.-Inf. Robert Schoene
Computer Scientist - R&D Energy Efficient Computing

Technische Universitaet Dresden
Center for Information Services and High Performance Computing
Distributed and Data Intensive Computing
01062 Dresden
Tel.: +49 (351) 463-42483
Fax : +49 (351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de



* PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-16 14:53 PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor Robert Schöne
@ 2014-07-24  7:11 ` Robert Schöne
  2014-07-24  9:42   ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Robert Schöne @ 2014-07-24  7:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, linux-pm

(Resend, because there hasn't been a reply within the last week)

1. Summary:
When two or more processes concurrently activate the ondemand governor, Linux might crash.

2. Problem:
When I write concurrently to the cpufreq sysfs scaling_governor files, the kernel first emits a warning, followed by multiple oopses. Afterwards several kernel subsystems fail and I have to reboot the system. Although this bug report is based on the current Ubuntu kernel (3.13.0-27-generic), the problem also appears on the latest mainline (3.16-rc5). However, I only managed to capture the kernel log from the older Ubuntu kernel.


3. Keywords: cpufreq, governor, policy, ondemand

4. Kernel Version
Linux version 3.13.0-27-generic (buildd@akateko) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014

5. Warning message, followed by OOPS messages:

Jul 16 09:47:39 basti kernel: [  398.441455] ------------[ cut here ]------------
Jul 16 09:47:39 basti kernel: [  398.441462] WARNING: CPU: 5 PID: 4263 at /build/buildd/linux-3.13.0/drivers/cpufreq/cpufreq_governor.c:203 cpufreq_governor_dbs+0x682/0x6f0()
Jul 16 09:47:39 basti kernel: [  398.441494] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.441496] CPU: 5 PID: 4263 Comm: tee Tainted: GF          O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.441497] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.441500]  0000000000000009 ffff8800b3081bc0 ffffffff817199c4 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441502]  ffff8800b3081bf8 ffffffff810676bd 0000000000000000 ffff88022ebb4e00
Jul 16 09:47:39 basti kernel: [  398.441504]  0000000000000004 0000000000000002 ffffffff81cd3ae0 ffff8800b3081c08
Jul 16 09:47:39 basti kernel: [  398.441504] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.441509]  [<ffffffff817199c4>] dump_stack+0x45/0x56
Jul 16 09:47:39 basti kernel: [  398.441512]  [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
Jul 16 09:47:39 basti kernel: [  398.441513]  [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
Jul 16 09:47:39 basti kernel: [  398.441515]  [<ffffffff815c7142>] cpufreq_governor_dbs+0x682/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441518]  [<ffffffff81725ebc>] ? notifier_call_chain+0x4c/0x70
Jul 16 09:47:39 basti kernel: [  398.441520]  [<ffffffff815c4fc7>] od_cpufreq_governor_dbs+0x17/0x20
Jul 16 09:47:39 basti kernel: [  398.441522]  [<ffffffff815c10cd>] __cpufreq_governor+0xfd/0x230
Jul 16 09:47:39 basti kernel: [  398.441524]  [<ffffffff815c1349>] cpufreq_set_policy+0x149/0x2e0
Jul 16 09:47:39 basti kernel: [  398.441526]  [<ffffffff815c28bd>] store_scaling_governor+0xad/0xf0
Jul 16 09:47:39 basti kernel: [  398.441527]  [<ffffffff815c2260>] ? cpufreq_update_policy+0x170/0x170
Jul 16 09:47:39 basti kernel: [  398.441529]  [<ffffffff815c1a19>] store+0x79/0xc0
Jul 16 09:47:39 basti kernel: [  398.441532]  [<ffffffff812325b8>] sysfs_write_file+0x128/0x1c0
Jul 16 09:47:39 basti kernel: [  398.441534]  [<ffffffff811bc664>] vfs_write+0xb4/0x1f0
Jul 16 09:47:39 basti kernel: [  398.441536]  [<ffffffff811bd099>] SyS_write+0x49/0xa0
Jul 16 09:47:39 basti kernel: [  398.441539]  [<ffffffff8172a5bf>] tracesys+0xe1/0xe6
Jul 16 09:47:39 basti kernel: [  398.441540] ---[ end trace 9a9b0afb92b8c41f ]---
Jul 16 09:47:39 basti kernel: [  398.441545] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 16 09:47:39 basti kernel: [  398.441547] IP: [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441549] PGD b39cc067 PUD b30cb067 PMD 0 
Jul 16 09:47:39 basti kernel: [  398.441550] Oops: 0000 [#1] SMP 
Jul 16 09:47:39 basti kernel: [  398.441571] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.441573] CPU: 5 PID: 4263 Comm: tee Tainted: GF       W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.441573] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.441574] task: ffff88022e5f17f0 ti: ffff8800b3080000 task.ti: ffff8800b3080000
Jul 16 09:47:39 basti kernel: [  398.441576] RIP: 0010:[<ffffffff815c6b12>]  [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441577] RSP: 0018:ffff8800b3081c18  EFLAGS: 00010293
Jul 16 09:47:39 basti kernel: [  398.441577] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 00000000000096aa
Jul 16 09:47:39 basti kernel: [  398.441578] RDX: 0000000096aa96aa RSI: 0000000000000000 RDI: 0000000000000009
Jul 16 09:47:39 basti kernel: [  398.441579] RBP: ffff8800b3081c88 R08: 0000000000000082 R09: ffffffff81ecdd30
Jul 16 09:47:39 basti kernel: [  398.441579] R10: 000000000002f8a0 R11: 0000000000040000 R12: ffff88022ebb4e00
Jul 16 09:47:39 basti kernel: [  398.441580] R13: 0000000000000004 R14: 0000000000000002 R15: ffffffff81cd3ae0
Jul 16 09:47:39 basti kernel: [  398.441580] FS:  00002b53b77acb80(0000) GS:ffff88023e340000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.441582] CR2: 0000000000000000 CR3: 00000000b395c000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.441582] Stack:
Jul 16 09:47:39 basti kernel: [  398.441583]  0000000000000000 0000000000000002 ffff8800b3081d00 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441584]  ffff8800b3081c70 ffffffff81725ebc ffffffff81cd3520 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441586]  0000000000000002 ffff88022ebb4e00 0000000000000002 ffffffff81cd3b40
Jul 16 09:47:39 basti kernel: [  398.441586] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.441587]  [<ffffffff81725ebc>] ? notifier_call_chain+0x4c/0x70
Jul 16 09:47:39 basti kernel: [  398.441589]  [<ffffffff815c4fc7>] od_cpufreq_governor_dbs+0x17/0x20
Jul 16 09:47:39 basti kernel: [  398.441590]  [<ffffffff815c10cd>] __cpufreq_governor+0xfd/0x230
Jul 16 09:47:39 basti kernel: [  398.441591]  [<ffffffff815c1349>] cpufreq_set_policy+0x149/0x2e0
Jul 16 09:47:39 basti kernel: [  398.441592]  [<ffffffff815c28bd>] store_scaling_governor+0xad/0xf0
Jul 16 09:47:39 basti kernel: [  398.441593]  [<ffffffff815c2260>] ? cpufreq_update_policy+0x170/0x170
Jul 16 09:47:39 basti kernel: [  398.441594]  [<ffffffff815c1a19>] store+0x79/0xc0
Jul 16 09:47:39 basti kernel: [  398.441595]  [<ffffffff812325b8>] sysfs_write_file+0x128/0x1c0
Jul 16 09:47:39 basti kernel: [  398.441597]  [<ffffffff811bc664>] vfs_write+0xb4/0x1f0
Jul 16 09:47:39 basti kernel: [  398.441600]  [<ffffffff811bd099>] SyS_write+0x49/0xa0
Jul 16 09:47:39 basti kernel: [  398.441601]  [<ffffffff8172a5bf>] tracesys+0xe1/0xe6
Jul 16 09:47:39 basti kernel: [  398.441611] Code: ff 84 c0 0f 84 40 02 00 00 49 8b 5c 24 70 48 85 db 0f 84 29 06 00 00 41 83 fe 04 0f 84 60 02 00 00 41 83 fe 05 0f 84 2e 02 00 00 <48> 8b 03 44 89 ef ff 50 20 48 89 45 c0 48 8b 03 83 38 01 0f 84 
Jul 16 09:47:39 basti kernel: [  398.441612] RIP  [<ffffffff815c6b12>] cpufreq_governor_dbs+0x52/0x6f0
Jul 16 09:47:39 basti kernel: [  398.441613]  RSP <ffff8800b3081c18>
Jul 16 09:47:39 basti kernel: [  398.441613] CR2: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.441615] ---[ end trace 9a9b0afb92b8c420 ]---
Jul 16 09:47:39 basti kernel: [  398.444572] general protection fault: 0000 [#2] SMP 
Jul 16 09:47:39 basti kernel: [  398.444628] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.444629] CPU: 4 PID: 126 Comm: kworker/4:1 Tainted: GF     D W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.444630] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.444633] Workqueue: events od_dbs_timer
Jul 16 09:47:39 basti kernel: [  398.444634] task: ffff88022e5a97f0 ti: ffff88022e69e000 task.ti: ffff88022e69e000
Jul 16 09:47:39 basti kernel: [  398.444636] RIP: 0010:[<ffffffff815c5957>]  [<ffffffff815c5957>] od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444636] RSP: 0000:ffff88022e69fde8  EFLAGS: 00010246
Jul 16 09:47:39 basti kernel: [  398.444637] RAX: ffff88022e5a97f0 RBX: ffff88023e310e20 RCX: 0000000000000004
Jul 16 09:47:39 basti kernel: [  398.444637] RDX: 0000000000000004 RSI: 00000000170e170c RDI: ffff88023e310ec8
Jul 16 09:47:39 basti kernel: [  398.444638] RBP: ffff88022e69fe20 R08: 2008f8c439200000 R09: 7240000000000000
Jul 16 09:47:39 basti kernel: [  398.444638] R10: dff68f3e05110e48 R11: 0000000000000004 R12: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444639] R13: ffff880231144b80 R14: ffff88023e310e48 R15: dead000000100100
Jul 16 09:47:39 basti kernel: [  398.444640] FS:  0000000000000000(0000) GS:ffff88023e300000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.444641] CR2: 00002b8959f3f1f0 CR3: 000000022e68f000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.444641] Stack:
Jul 16 09:47:39 basti kernel: [  398.444643]  000000042e61f700 ffff88023e310ec8 ffff88022eb3c800 ffff88023e313cc0
Jul 16 09:47:39 basti kernel: [  398.444644]  ffff88023e310e48 0000000000000000 0000000000000100 ffff88022e69fe68
Jul 16 09:47:39 basti kernel: [  398.444645]  ffffffff810838a2 000000003e313cd8 ffff88023e317e00 ffff88023e313cd8
Jul 16 09:47:39 basti kernel: [  398.444645] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.444649]  [<ffffffff810838a2>] process_one_work+0x182/0x450
Jul 16 09:47:39 basti kernel: [  398.444651]  [<ffffffff81084641>] worker_thread+0x121/0x410
Jul 16 09:47:39 basti kernel: [  398.444652]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Jul 16 09:47:39 basti kernel: [  398.444654]  [<ffffffff8108b312>] kthread+0xd2/0xf0
Jul 16 09:47:39 basti kernel: [  398.444655]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444658]  [<ffffffff8172a2fc>] ret_from_fork+0x7c/0xb0
Jul 16 09:47:39 basti kernel: [  398.444659]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444668] Code: d1 4d 8b 7d 10 89 55 cc 48 03 1c cd 60 28 d1 81 48 8d 83 a8 00 00 00 44 0f b6 a3 f0 00 00 00 48 89 c7 48 89 45 d0 e8 99 a8 15 00 <41> 8b 77 04 48 89 df 41 83 e4 01 e8 19 10 00 00 84 c0 8b 55 cc 
Jul 16 09:47:39 basti kernel: [  398.444670] RIP  [<ffffffff815c5957>] od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444670]  RSP <ffff88022e69fde8>
Jul 16 09:47:39 basti kernel: [  398.444671] ---[ end trace 9a9b0afb92b8c421 ]---
Jul 16 09:47:39 basti kernel: [  398.444703] BUG: unable to handle kernel paging request at ffffffffffffffd8
Jul 16 09:47:39 basti kernel: [  398.444706] IP: [<ffffffff8108b9b0>] kthread_data+0x10/0x20
Jul 16 09:47:39 basti kernel: [  398.444710] PGD 1c11067 PUD 1c13067 PMD 0 
Jul 16 09:47:39 basti kernel: [  398.444713] Oops: 0000 [#3] SMP 
Jul 16 09:47:39 basti kernel: [  398.444725] Modules linked in: vtsspp(OF) sep3_15(OF) pax(OF) apwr3_1(OF) nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant ppdev gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc serio_raw snd_hda_intel snd_hda_codec i915 snd_hwdep lpc_ich snd_pcm video drm_kms_helper tpm_infineon drm snd_page_alloc mei_me snd_timer mei snd i2c_algo_bit soundcore mac_hid lp parport e1000e ahci psmouse libahci ptp pps_core
Jul 16 09:47:39 basti kernel: [  398.444726] CPU: 4 PID: 126 Comm: kworker/4:1 Tainted: GF     D W  O 3.13.0-27-generic #50-Ubuntu
Jul 16 09:47:39 basti kernel: [  398.444727] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 16 09:47:39 basti kernel: [  398.444732] task: ffff88022e5a97f0 ti: ffff88022e69e000 task.ti: ffff88022e69e000
Jul 16 09:47:39 basti kernel: [  398.444733] RIP: 0010:[<ffffffff8108b9b0>]  [<ffffffff8108b9b0>] kthread_data+0x10/0x20
Jul 16 09:47:39 basti kernel: [  398.444734] RSP: 0000:ffff88022e69fba0  EFLAGS: 00010002
Jul 16 09:47:39 basti kernel: [  398.444734] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000000000d
Jul 16 09:47:39 basti kernel: [  398.444735] RDX: 0000000000000005 RSI: 0000000000000004 RDI: ffff88022e5a97f0
Jul 16 09:47:39 basti kernel: [  398.444735] RBP: ffff88022e69fba0 R08: 0000000000000000 R09: 0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444736] R10: ffffffff8106518c R11: ffffea0008c99800 R12: ffff88023e314440
Jul 16 09:47:39 basti kernel: [  398.444736] R13: 0000000000000004 R14: ffff88022e5a97e0 R15: ffff88022e5a97f0
Jul 16 09:47:39 basti kernel: [  398.444738] FS:  0000000000000000(0000) GS:ffff88023e300000(0000) knlGS:0000000000000000
Jul 16 09:47:39 basti kernel: [  398.444740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 09:47:39 basti kernel: [  398.444741] CR2: 0000000000000028 CR3: 000000022e68f000 CR4: 00000000000407e0
Jul 16 09:47:39 basti kernel: [  398.444742] Stack:
Jul 16 09:47:39 basti kernel: [  398.444748]  ffff88022e69fbb8 ffffffff81084d51 ffff88022e5a97f0 ffff88022e69fc18
Jul 16 09:47:39 basti kernel: [  398.444753]  ffffffff8171db79 ffff88022e5a97f0 ffff88022e69ffd8 0000000000014440
Jul 16 09:47:39 basti kernel: [  398.444757]  0000000000014440 ffff88022e5a97f0 ffff88022e5a9e28 ffff88022e5a97e0
Jul 16 09:47:39 basti kernel: [  398.444757] Call Trace:
Jul 16 09:47:39 basti kernel: [  398.444759]  [<ffffffff81084d51>] wq_worker_sleeping+0x11/0x90
Jul 16 09:47:39 basti kernel: [  398.444761]  [<ffffffff8171db79>] __schedule+0x589/0x7d0
Jul 16 09:47:39 basti kernel: [  398.444762]  [<ffffffff8171dde9>] schedule+0x29/0x70
Jul 16 09:47:39 basti kernel: [  398.444764]  [<ffffffff8106a02f>] do_exit+0x6df/0xa50
Jul 16 09:47:39 basti kernel: [  398.444765]  [<ffffffff81722e79>] oops_end+0xa9/0x150
Jul 16 09:47:39 basti kernel: [  398.444767]  [<ffffffff810171cb>] die+0x4b/0x70
Jul 16 09:47:39 basti kernel: [  398.444768]  [<ffffffff8172280e>] do_general_protection+0x11e/0x1b0
Jul 16 09:47:39 basti kernel: [  398.444770]  [<ffffffff81722128>] general_protection+0x28/0x30
Jul 16 09:47:39 basti kernel: [  398.444773]  [<ffffffff815c5957>] ? od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444776]  [<ffffffff815c5957>] ? od_dbs_timer+0x57/0x160
Jul 16 09:47:39 basti kernel: [  398.444779]  [<ffffffff810838a2>] process_one_work+0x182/0x450
Jul 16 09:47:39 basti kernel: [  398.444781]  [<ffffffff81084641>] worker_thread+0x121/0x410
Jul 16 09:47:39 basti kernel: [  398.444784]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
Jul 16 09:47:39 basti kernel: [  398.444787]  [<ffffffff8108b312>] kthread+0xd2/0xf0
Jul 16 09:47:39 basti kernel: [  398.444790]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444793]  [<ffffffff8172a2fc>] ret_from_fork+0x7c/0xb0
Jul 16 09:47:39 basti kernel: [  398.444795]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Jul 16 09:47:39 basti kernel: [  398.444806] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 a8 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
Jul 16 09:47:39 basti kernel: [  398.444807]  RSP <ffff88022e69fba0>
Jul 16 09:47:39 basti kernel: [  398.444808] CR2: ffffffffffffffd8
Jul 16 09:47:39 basti kernel: [  398.444808] ---[ end trace 9a9b0afb92b8c422 ]---
Jul 16 09:47:39 basti kernel: [  398.444808] Fixing recursive fault but reboot is needed!
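
To skim a saved log like the one above, the headline of each warning or oops plus the closing RIP line is usually enough to see which function faulted. The following is a minimal sketch, assuming the log was saved to a file with the syslog prefix shown here. oops_summary is a made-up helper name, not an existing tool, and the patterns only cover the message formats that appear in this report.

```sh
#!/bin/sh
# Sketch only: oops_summary is an illustrative helper, not an existing tool.
# It prints the headline lines (WARNING/BUG/Oops/general protection fault)
# and the closing "RIP  [<addr>] symbol+offset" line of each report found
# in the log file given as $1, with the syslog prefix and timestamp removed.
oops_summary() {
	grep -E 'WARNING: |BUG: |Oops: |general protection fault|RIP +\[<' "$1" |
		sed -E 's/^.*kernel: \[ *[0-9.]+\] *//'
}
```

Called as "oops_summary kern.log" (the file name is a placeholder), it skips the register dumps, stack words and call-trace entries.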

6. Two small shell scripts to trigger the bug (on an 8-CPU machine)

crash_governor.sh:
#!/bin/sh
# this is called concurrently via runme.sh
for I in `seq 1000`
do
	echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
	echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
done

runme.sh:
#!/bin/sh
# run 8 concurrent instances
for I in `seq 8`
do
	./crash_governor.sh &
done

Just run runme.sh and crash your system :)
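
The two scripts can also be folded into a single file. The following is a minimal sketch, assuming it is run as root (so no sudo). CPUFREQ_ROOT, toggle_governors and stress_governors are names made up for this example. Pointing CPUFREQ_ROOT at a scratch directory with the same layout gives a harmless dry run.

```sh
#!/bin/sh
# Sketch only: CPUFREQ_ROOT, toggle_governors and stress_governors are
# illustrative names, not part of the original scripts.
# Assumes root privileges when pointed at the real sysfs tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu}"

# Switch every policy to ondemand and back to userspace, $1 times.
toggle_governors() {
	rounds="$1"
	i=0
	while [ "$i" -lt "$rounds" ]; do
		for gov in ondemand userspace; do
			echo "$gov" | tee "$CPUFREQ_ROOT"/cpu*/cpufreq/scaling_governor >/dev/null
		done
		i=$((i + 1))
	done
}

# Start $1 concurrent writers doing $2 rounds each, then wait for all of them.
stress_governors() {
	writers="$1"
	rounds="$2"
	n=0
	while [ "$n" -lt "$writers" ]; do
		toggle_governors "$rounds" &
		n=$((n + 1))
	done
	wait
}
```

"stress_governors 8 1000" corresponds to runme.sh above, so on an affected kernel it can be expected to take the machine down in the same way.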

7. Environment
7.1. ver_linux

Linux basti 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 
Gnu C                  4.8
Gnu make               3.81
binutils               2.24
util-linux             2.20.1
mount                  support
module-init-tools      15
e2fsprogs              1.42.9
Linux C Library        2.19
Dynamic linker (ldd)   2.19
Procps                 3.3.9
Net-tools              1.60
Kbd                    1.15.5
Sh-utils               8.21
wireless-tools         30
Modules Loaded         sep3_15 pax apwr3_1 nfsv3 rfcomm bnep bluetooth binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache snd_hda_codec_hdmi snd_hda_codec_conexant gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ppdev kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc snd_hda_intel snd_hda_codec serio_raw i915 snd_hwdep snd_pcm video snd_page_alloc tpm_infineon drm_kms_helper snd_timer drm snd lpc_ich soundcore mei_me mac_hid mei i2c_algo_bit lp parport e1000e psmouse ahci ptp libahci pps_core

7.2. /proc/cpuinfo (first out of 8 CPUs (4 cores plus hyper threading))
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping	: 7
microcode	: 0x18
cpu MHz		: 1600.000
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6782.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

7.3. /proc/modules
sep3_15 517400 0 - Live 0x0000000000000000 (OF)
pax 13181 0 - Live 0x0000000000000000 (OF)
apwr3_1 56811 0 - Live 0x0000000000000000 (OF)
nfsv3 39326 1 - Live 0x0000000000000000
rfcomm 69160 0 - Live 0x0000000000000000
bnep 19624 2 - Live 0x0000000000000000
bluetooth 395423 10 rfcomm,bnep, Live 0x0000000000000000
binfmt_misc 17468 1 - Live 0x0000000000000000
nfsd 280297 2 - Live 0x0000000000000000
auth_rpcgss 59338 1 nfsd, Live 0x0000000000000000
nfs_acl 12837 2 nfsv3,nfsd, Live 0x0000000000000000
nfs 236636 2 nfsv3, Live 0x0000000000000000
lockd 93977 3 nfsv3,nfsd,nfs, Live 0x0000000000000000
sunrpc 284404 21 nfsv3,nfsd,auth_rpcgss,nfs_acl,nfs,lockd, Live 0x0000000000000000
fscache 63988 1 nfs, Live 0x0000000000000000
snd_hda_codec_hdmi 46207 1 - Live 0x0000000000000000
snd_hda_codec_conexant 57441 1 - Live 0x0000000000000000
gpio_ich 13476 0 - Live 0x0000000000000000
intel_rapl 18773 0 - Live 0x0000000000000000
x86_pkg_temp_thermal 14205 0 - Live 0x0000000000000000
intel_powerclamp 14705 0 - Live 0x0000000000000000
coretemp 13435 0 - Live 0x0000000000000000
ppdev 17671 0 - Live 0x0000000000000000
kvm_intel 143060 0 - Live 0x0000000000000000
kvm 451511 1 kvm_intel, Live 0x0000000000000000
crct10dif_pclmul 14289 0 - Live 0x0000000000000000
crc32_pclmul 13113 0 - Live 0x0000000000000000
ghash_clmulni_intel 13216 0 - Live 0x0000000000000000
aesni_intel 55624 0 - Live 0x0000000000000000
aes_x86_64 17131 1 aesni_intel, Live 0x0000000000000000
lrw 13286 1 aesni_intel, Live 0x0000000000000000
gf128mul 14951 1 lrw, Live 0x0000000000000000
glue_helper 13990 1 aesni_intel, Live 0x0000000000000000
ablk_helper 13597 1 aesni_intel, Live 0x0000000000000000
cryptd 20359 3 ghash_clmulni_intel,aesni_intel,ablk_helper, Live 0x0000000000000000
parport_pc 32701 1 - Live 0x0000000000000000
snd_hda_intel 52355 0 - Live 0x0000000000000000
snd_hda_codec 192906 3 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel, Live 0x0000000000000000
serio_raw 13462 0 - Live 0x0000000000000000
i915 783485 1 - Live 0x0000000000000000
snd_hwdep 13602 1 snd_hda_codec, Live 0x0000000000000000
snd_pcm 102099 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live 0x0000000000000000
video 19476 1 i915, Live 0x0000000000000000
snd_page_alloc 18710 2 snd_hda_intel,snd_pcm, Live 0x0000000000000000
tpm_infineon 17372 0 - Live 0x0000000000000000
drm_kms_helper 52758 1 i915, Live 0x0000000000000000
snd_timer 29482 1 snd_pcm, Live 0x0000000000000000
drm 302817 2 i915,drm_kms_helper, Live 0x0000000000000000
snd 69238 7 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer, Live 0x0000000000000000
lpc_ich 21080 0 - Live 0x0000000000000000
soundcore 12680 1 snd, Live 0x0000000000000000
mei_me 18627 0 - Live 0x0000000000000000
mac_hid 13205 0 - Live 0x0000000000000000
mei 82274 1 mei_me, Live 0x0000000000000000
i2c_algo_bit 13413 1 i915, Live 0x0000000000000000
lp 17759 0 - Live 0x0000000000000000
parport 42348 3 ppdev,parport_pc,lp, Live 0x0000000000000000
e1000e 254433 0 - Live 0x0000000000000000
psmouse 102222 0 - Live 0x0000000000000000
ahci 25819 2 - Live 0x0000000000000000
ptp 18933 1 e1000e, Live 0x0000000000000000
libahci 32168 1 ahci, Live 0x0000000000000000
pps_core 19382 1 ptp, Live 0x0000000000000000


-- 

Dipl.-Inf. Robert Schoene
Computer Scientist - R&D Energy Efficient Computing

Technische Universitaet Dresden
Center for Information Services and High Performance Computing
Distributed and Data Intensive Computing
01062 Dresden
Tel.: +49 (351) 463-42483
Fax : +49 (351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de





* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-24  7:11 ` PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend) Robert Schöne
@ 2014-07-24  9:42   ` Viresh Kumar
  2014-07-25  8:42     ` Robert Schöne
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2014-07-24  9:42 UTC (permalink / raw)
  To: Robert Schöne, Srivatsa S. Bhat; +Cc: Rafael J. Wysocki, linux-pm

On 24 July 2014 12:41, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> (Resend, because there hasn't been a reply within the last week)

How did I miss it? Yes, it's in my inbox. Sorry buddy..

> crash_governor.sh:
> #!/bin/sh
> # this is called concurrently via runme.sh
> for I in `seq 1000`
> do
>         echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>         echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> done
>
> runme.sh:
> #!/bin/sh
> # run 8 concurrent instances
> for I in `seq 8`
> do
>         ./crash_governor.sh &
> done
>
> Just run runme.sh and crash your system :)

Oh, yes. Quite easily I could see that happening :)

I pushed a fix for similar issues earlier:
19c7630 cpufreq: serialize calls to __cpufreq_governor()

but it was later reverted by Srivatsa:
56d07db cpufreq: Remove temporary fix for race between CPU hotplug and
sysfs-writes

because we didn't think about this use case.

I propose we get that back. I have tested a revert of 56d07db on
my setup and didn't see any crash..

Please see if that works for you as well..
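
Until the calls are serialized in the kernel again, one possible userspace stopgap is to serialize the writers themselves. The following is a minimal sketch, assuming flock(1) from util-linux is installed and the caller is root. CPUFREQ_ROOT, GOVERNOR_LOCK and set_governor_serialized are names made up for this example, and the approach is untested against this bug.

```sh
#!/bin/sh
# Sketch only: the variable and function names are illustrative.
# Assumes flock(1) from util-linux and root privileges for the real sysfs tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu}"
GOVERNOR_LOCK="${GOVERNOR_LOCK:-/tmp/scaling_governor.lock}"

# Write governor $1 to every policy while holding an exclusive lock, so
# cooperating callers of this function never write at the same time.
set_governor_serialized() {
	(
		# Blocks until the lock on file descriptor 9 is free.
		flock 9
		echo "$1" | tee "$CPUFREQ_ROOT"/cpu*/cpufreq/scaling_governor >/dev/null
	) 9>"$GOVERNOR_LOCK"
}
```

This only covers writers that go through the helper. Any other process writing to scaling_governor directly can still race with them, so it does not replace a fix in the kernel.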

--
viresh


* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-24  9:42   ` Viresh Kumar
@ 2014-07-25  8:42     ` Robert Schöne
  2014-07-25  9:03       ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Robert Schöne @ 2014-07-25  8:42 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm

The bug is still there. Here's my bash history, so that you can be sure I checked it correctly:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git
cd linux-git/
git show 56d07db 
git branch --contains 19c7630
git branch --contains 56d07db
git revert -n 56d07db
make oldconfig
make clean
make -j9 deb-pkg
cd ..
sudo dpkg -i linux-libc-dev_3.16.0-rc6+-2_amd64.deb
sudo dpkg -i linux-image-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
sudo dpkg -i linux-headers-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
sudo dpkg -i linux-firmware-image-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
sudo update-grub2
sudo reboot

Again, I didn't get a backtrace for the error, but only this:

...
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
...


Robert


On Thursday, 24.07.2014, at 15:12 +0530, Viresh Kumar wrote:
> On 24 July 2014 12:41, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> > (Resend, because there hasn't been a reply within the last week)
> 
> How did I miss it? Yes, it's in my inbox. Sorry buddy..
> 
> > crash_governor.sh:
> > #!/bin/sh
> > # this is called concurrently via runme.sh
> > for I in `seq 1000`
> > do
> >         echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> >         echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > done
> >
> > runme.sh:
> > #!/bin/sh
> > # run 8 concurrent instances
> > for I in `seq 8`
> > do
> >         ./crash_governor.sh &
> > done
> >
> > Just run runme.sh and crash your system :)
> 
> Oh, yes. Quite easily I could see that happening :)
> 
> I pushed a fix for similar issues earlier:
> 19c7630 cpufreq: serialize calls to __cpufreq_governor()
> 
> but it was later reverted by Srivatsa:
> 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and
> sysfs-writes
> 
> because we didn't think about this use case.
> 
> I propose we get that back. I have tested a revert of 56d07db on
> my setup and didn't see any crash..
> 
> Please see if that works for you as well..
> 
> --
> viresh




* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-25  8:42     ` Robert Schöne
@ 2014-07-25  9:03       ` Viresh Kumar
  2014-07-25 13:19         ` Robert Schöne
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-07-25  9:03 UTC (permalink / raw)
  To: Robert Schöne; +Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm

On 25 July 2014 14:12, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> The bug is still there. Here's my bash history, so that you can be sure I checked it correctly:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git
> cd linux-git/
> git show 56d07db
> git branch --contains 19c7630
> git branch --contains 56d07db
> git revert -n 56d07db
> make oldconfig
> make clean
> make -j9 deb-pkg
> cd ..
> sudo dpkg -i linux-libc-dev_3.16.0-rc6+-2_amd64.deb
> sudo dpkg -i linux-image-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
> sudo dpkg -i linux-headers-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
> sudo dpkg -i linux-firmware-image-3.16.0-rc6+_3.16.0-rc6+-2_amd64.deb
> sudo update-grub2
> sudo reboot
>
> Again, I didn't get a backtrace for the error, but only this:

Not sure if you did exactly what I asked for:

Please run this in your repository:

git revert 56d07db

save-quit the editor window and try again..


* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-25  9:03       ` Viresh Kumar
@ 2014-07-25 13:19         ` Robert Schöne
  2014-09-08  8:13         ` Robert Schöne
  2014-09-08  8:16         ` Robert Schöne
  2 siblings, 0 replies; 13+ messages in thread
From: Robert Schöne @ 2014-07-25 13:19 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm

[-- Attachment #1: Type: text/plain, Size: 8796 bytes --]


> Please run this in your repository:
> 
> git revert 56d07db
> 
> save-quit the editor window and try again..

I just did so, but I still get a kernel OOPS.
However, I've been able to get the call trace this time :)


Jul 25 14:52:55 basti kernel: [ 2073.816176] ------------[ cut here ]------------
Jul 25 14:52:55 basti kernel: [ 2073.816184] WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740()
Jul 25 14:52:55 basti kernel: [ 2073.816186] Modules linked in: nfsv3(E) nfsd(E) bnep(E) auth_rpcgss(E) rfcomm(E) bluetooth(E) nfs_acl(E) nfs(E) lockd(E) binfmt_misc(E) sunrpc(E) fscache(E) intel_rapl(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) snd_hda_codec_hdmi(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) kvm_intel(E) snd_hda_intel(E) kvm(E) video(E) snd_hda_controller(E) drm_kms_helper(E) snd_hda_codec(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) drm(E) snd(E) aesni_intel(E) aes_x86_64(E) ppdev(E) mei_me(E) gpio_ich(E) parport_pc(E) serio_raw(E) mac_hid(E) lpc_ich(E) lp(E) i2c_algo_bit(E) lrw(E) soundcore(E) mei(E) parport(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) tpm_infineon(E) psmouse(E) ahci(E) e1000e(E) libahci(E) ptp(E) pps_core(E)
Jul 25 14:52:55 basti kernel: [ 2073.816224] CPU: 1 PID: 2458 Comm: tee Tainted: G           OE 3.16.0-rc6+ #1
Jul 25 14:52:55 basti kernel: [ 2073.816225] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 25 14:52:55 basti kernel: [ 2073.816226]  0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000
Jul 25 14:52:55 basti kernel: [ 2073.816229]  ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000
Jul 25 14:52:55 basti kernel: [ 2073.816231]  0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0
Jul 25 14:52:55 basti kernel: [ 2073.816234] Call Trace:
Jul 25 14:52:55 basti kernel: [ 2073.816239]  [<ffffffff8173b0bf>] dump_stack+0x45/0x56
Jul 25 14:52:55 basti kernel: [ 2073.816243]  [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0
Jul 25 14:52:55 basti kernel: [ 2073.816246]  [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20
Jul 25 14:52:55 basti kernel: [ 2073.816256]  [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740
Jul 25 14:52:55 basti kernel: [ 2073.816266]  [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70
Jul 25 14:52:55 basti kernel: [ 2073.816269]  [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20
Jul 25 14:52:55 basti kernel: [ 2073.816272]  [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0
Jul 25 14:52:55 basti kernel: [ 2073.816275]  [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0
Jul 25 14:52:55 basti kernel: [ 2073.816277]  [<ffffffff815df796>] store_scaling_governor+0x96/0xf0
Jul 25 14:52:55 basti kernel: [ 2073.816280]  [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0
Jul 25 14:52:55 basti kernel: [ 2073.816284]  [<ffffffff815de3c9>] store+0x79/0xc0
Jul 25 14:52:55 basti kernel: [ 2073.816288]  [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50
Jul 25 14:52:55 basti kernel: [ 2073.816290]  [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160
Jul 25 14:52:55 basti kernel: [ 2073.816292]  [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0
Jul 25 14:52:55 basti kernel: [ 2073.816294]  [<ffffffff811d0c76>] SyS_write+0x46/0xb0
Jul 25 14:52:55 basti kernel: [ 2073.816296]  [<ffffffff817439ff>] tracesys+0xe1/0xe6
Jul 25 14:52:55 basti kernel: [ 2073.816297] ---[ end trace a2dad7e42b22c796 ]---
Jul 25 14:52:55 basti kernel: [ 2073.816702] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 25 14:52:55 basti kernel: [ 2073.816731] IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
Jul 25 14:52:55 basti kernel: [ 2073.816751] PGD 36a05067 PUD b47df067 PMD 0 
Jul 25 14:52:55 basti kernel: [ 2073.816766] Oops: 0000 [#1] SMP 
Jul 25 14:52:55 basti kernel: [ 2073.816779] Modules linked in: nfsv3(E) nfsd(E) bnep(E) auth_rpcgss(E) rfcomm(E) bluetooth(E) nfs_acl(E) nfs(E) lockd(E) binfmt_misc(E) sunrpc(E) fscache(E) intel_rapl(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) snd_hda_codec_hdmi(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) kvm_intel(E) snd_hda_intel(E) kvm(E) video(E) snd_hda_controller(E) drm_kms_helper(E) snd_hda_codec(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) drm(E) snd(E) aesni_intel(E) aes_x86_64(E) ppdev(E) mei_me(E) gpio_ich(E) parport_pc(E) serio_raw(E) mac_hid(E) lpc_ich(E) lp(E) i2c_algo_bit(E) lrw(E) soundcore(E) mei(E) parport(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) tpm_infineon(E) psmouse(E) ahci(E) e1000e(E) libahci(E) ptp(E) pps_core(E)
Jul 25 14:52:55 basti kernel: [ 2073.817023] CPU: 1 PID: 2458 Comm: tee Tainted: G        W  OE 3.16.0-rc6+ #1
Jul 25 14:52:55 basti kernel: [ 2073.817041] Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
Jul 25 14:52:55 basti kernel: [ 2073.817065] task: ffff8800b53db240 ti: ffff8800ae400000 task.ti: ffff8800ae400000
Jul 25 14:52:55 basti kernel: [ 2073.817085] RIP: 0010:[<ffffffff815e4395>]  [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
Jul 25 14:52:55 basti kernel: [ 2073.817110] RSP: 0018:ffff8800ae403bd0  EFLAGS: 00010293
Jul 25 14:52:55 basti kernel: [ 2073.817124] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000006
Jul 25 14:52:55 basti kernel: [ 2073.817142] RDX: 0000000000000007 RSI: 0000000000000000 RDI: 0000000000000009
Jul 25 14:52:55 basti kernel: [ 2073.817160] RBP: ffff8800ae403c40 R08: 0000000000000086 R09: 000000000000036e
Jul 25 14:52:55 basti kernel: [ 2073.817178] R10: 0000000000000000 R11: ffff8800ae4038a6 R12: ffff88022fa27000
Jul 25 14:52:55 basti kernel: [ 2073.817196] R13: 0000000000000005 R14: 0000000000000002 R15: ffffffff81cd5d00
Jul 25 14:52:55 basti kernel: [ 2073.817214] FS:  00002b63742dab80(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
Jul 25 14:52:55 basti kernel: [ 2073.817235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 25 14:52:55 basti kernel: [ 2073.817250] CR2: 00002ba7f5d7f300 CR3: 00000000b4ce8000 CR4: 00000000000407e0
Jul 25 14:52:55 basti kernel: [ 2073.817268] Stack:
Jul 25 14:52:55 basti kernel: [ 2073.817274]  ffff8800ae403cc8 00000000fffffffd 0000000000000000 0000000000000002
Jul 25 14:52:55 basti kernel: [ 2073.817297]  ffff8800ae403c28 ffffffff810941fc ffffffff81cd5720 0000000000000000
Jul 25 14:52:55 basti kernel: [ 2073.817319]  0000000000000002 ffff88022fa27000 0000000000000002 ffffffff81cd5d60
Jul 25 14:52:55 basti kernel: [ 2073.817342] Call Trace:
Jul 25 14:52:55 basti kernel: [ 2073.817351]  [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70
Jul 25 14:52:55 basti kernel: [ 2073.817367]  [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20
Jul 25 14:52:55 basti kernel: [ 2073.817384]  [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0
Jul 25 14:52:55 basti kernel: [ 2073.817401]  [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0
Jul 25 14:52:55 basti kernel: [ 2073.817417]  [<ffffffff815df796>] store_scaling_governor+0x96/0xf0
Jul 25 14:52:55 basti kernel: [ 2073.817434]  [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0
Jul 25 14:52:55 basti kernel: [ 2073.817452]  [<ffffffff815de3c9>] store+0x79/0xc0
Jul 25 14:52:55 basti kernel: [ 2073.817467]  [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50
Jul 25 14:52:55 basti kernel: [ 2073.817482]  [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160
Jul 25 14:52:55 basti kernel: [ 2073.817498]  [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0
Jul 25 14:52:55 basti kernel: [ 2073.817513]  [<ffffffff811d0c76>] SyS_write+0x46/0xb0
Jul 25 14:52:55 basti kernel: [ 2073.817527]  [<ffffffff817439ff>] tracesys+0xe1/0xe6
Jul 25 14:52:55 basti kernel: [ 2073.817540] Code: 0f 84 60 02 00 00 49 8b 9c 24 88 00 00 00 48 85 db 0f 84 76 06 00 00 41 83 fe 04 0f 84 85 02 00 00 41 83 fe 05 0f 84 4b 02 00 00 <48> 8b 03 44 89 ef ff 50 20 48 89 45 c0 48 8b 03 83 38 01 0f 84 
Jul 25 14:52:55 basti kernel: [ 2073.817666] RIP  [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
Jul 25 14:52:55 basti kernel: [ 2073.817685]  RSP <ffff8800ae403bd0>
Jul 25 14:52:55 basti kernel: [ 2073.818467] CR2: 0000000000000000
Jul 25 14:52:55 basti kernel: [ 2073.825744] ---[ end trace a2dad7e42b22c797 ]---







* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-25  9:03       ` Viresh Kumar
  2014-07-25 13:19         ` Robert Schöne
@ 2014-09-08  8:13         ` Robert Schöne
  2014-09-08  8:16         ` Robert Schöne
  2 siblings, 0 replies; 13+ messages in thread
From: Robert Schöne @ 2014-09-08  8:13 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]

The patch you suggested did not work, so I introduced a new mutex in the
patch below.

I am not happy with adding just another mutex, but it fixes my problem
of changing governors concurrently.

Robert



This patch fixes a race condition when concurrently writing to cpufreq/scaling_governor


diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d9fdedd..2ad6b03 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -572,8 +572,15 @@ static ssize_t show_scaling_governor(struct cpufreq_policy *policy, char *buf)
 }
 
 /**
+ * This mutex guarantees that concurrent writes to cpuX/scaling_governor
+ * do not run into an OOPS.
+ */
+static DEFINE_MUTEX(cpufreq_store_governor_lock);
+
+/**
  * store_scaling_governor - store policy for the specified CPU
  */
+
 static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
                                        const char *buf, size_t count)
 {
@@ -593,11 +600,15 @@ static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
                                                &new_policy.governor))
                return -EINVAL;
 
+       if (!mutex_trylock(&cpufreq_store_governor_lock))
+               return -EBUSY;
+
        ret = cpufreq_set_policy(policy, &new_policy);
 
        policy->user_policy.policy = policy->policy;
        policy->user_policy.governor = policy->governor;
 
+       mutex_unlock(&cpufreq_store_governor_lock);
        if (ret)
                return ret;
        else
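
For anyone who wants to look at the try-lock pattern in isolation, here is a
minimal user-space sketch of the idea, assuming POSIX threads. It is not kernel
code: store_governor_lock, current_governor and store_governor() are
hypothetical stand-ins for cpufreq_store_governor_lock, the policy's governor
and store_scaling_governor().

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for cpufreq_store_governor_lock. */
static pthread_mutex_t store_governor_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the governor currently set on a policy. */
static int current_governor;

/*
 * Mirrors the shape of the patched store_scaling_governor(): a writer
 * that finds the lock already held fails with -EBUSY instead of waiting.
 */
static int store_governor(int new_governor)
{
	if (pthread_mutex_trylock(&store_governor_lock) != 0)
		return -EBUSY;

	/* Stands in for the call to cpufreq_set_policy(). */
	current_governor = new_governor;

	pthread_mutex_unlock(&store_governor_lock);
	return 0;
}
```

The trade-off of trylock over a plain lock is that a second writer gets an
error back immediately and has to retry on its own, rather than sleeping until
the first writer has finished.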





* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-07-25  9:03       ` Viresh Kumar
  2014-07-25 13:19         ` Robert Schöne
  2014-09-08  8:13         ` Robert Schöne
@ 2014-09-08  8:16         ` Robert Schöne
  2014-09-08 10:56           ` Viresh Kumar
  2 siblings, 1 reply; 13+ messages in thread
From: Robert Schöne @ 2014-09-08  8:16 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Srivatsa S. Bhat, Rafael J. Wysocki, linux-pm

(Sorry for the resend, I forgot to disable my S/MIME signature)




* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-09-08  8:16         ` Robert Schöne
@ 2014-09-08 10:56           ` Viresh Kumar
  2014-09-08 12:28             ` Robert Schöne
  2014-09-08 21:14             ` Rafael J. Wysocki
  0 siblings, 2 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-09-08 10:56 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Rafael J. Wysocki, linux-pm, Srivatsa S. Bhat, Prarit Bhargava,
	Saravana Kannan, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 932 bytes --]

On 8 September 2014 13:46, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> (Sorry for the resend, I forgot to disable my S/MIME signature)
>
> The patch you suggested did not work, so I introduced a new mutex in the
> patch below.

Okay, let me apologize first. This thread has taken much longer than I
expected, partly because of me. I changed jobs recently and have been
busy with new assignments.

> I am not happy with adding just another mutex, but it fixes my problem
> of changing governors concurrently.

So, yeah, back to the problem.

Can you please try the attached patch together with the revert suggested
earlier? To make it clear again: cherry-pick 19c7630 and then apply this
patch.

Let me know if it still doesn't work for you.

I don't have a local setup to test this, so it's only compile-tested.

Cc'd a few more people who also reported similar issues.

--
viresh

[-- Attachment #2: 0001-cpufreq-Track-governor-state-with-policy-governor_st.patch --]
[-- Type: text/x-patch, Size: 4860 bytes --]

From 250686bfdbb183a9d798b071d32972b65d35c915 Mon Sep 17 00:00:00 2001
Message-Id: <250686bfdbb183a9d798b071d32972b65d35c915.1410173112.git.viresh.kumar@linaro.org>
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon, 8 Sep 2014 16:01:15 +0530
Subject: [PATCH] cpufreq: Track governor-state with 'policy->governor_state'

Even after serializing calls to __cpufreq_governor(), some races remain. They
arise when an operation is requested that is invalid in the governor's current
state. For example, while the governor is in the CPUFREQ_GOV_POLICY_EXIT state,
we can't do CPUFREQ_GOV_START without first doing CPUFREQ_GOV_POLICY_INIT.

__cpufreq_governor() did not handle all of these cases, so things could go
wrong when governors were changed from multiple threads.

This patch renames the existing field 'policy->governor_enabled' to
'policy->governor_state', which can now hold more values than 0 and 1.

We now track the current governor state for each policy and reject any
invalid operation.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c          | 26 +++++++++++++-------------
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 include/linux/cpufreq.h            |  2 +-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a7ceae3..c597361 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -935,6 +935,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy)
 	struct cpufreq_policy new_policy;
 	int ret = 0;
 
+	policy->governor_state = CPUFREQ_GOV_POLICY_EXIT;
 	memcpy(&new_policy, policy, sizeof(*policy));
 
 	/* Update governor of new_policy to the governor used before hotplug */
@@ -1976,7 +1977,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_target);
 static int __cpufreq_governor(struct cpufreq_policy *policy,
 					unsigned int event)
 {
-	int ret;
+	int ret, state;
 
 	/* Only must be defined when default governor is known to have latency
 	   restrictions, like e.g. conservative or ondemand.
@@ -2012,19 +2013,21 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 		 policy->cpu, event);
 
 	mutex_lock(&cpufreq_governor_lock);
+	state = policy->governor_state;
+
+	/* Check if operation is permitted or not */
 	if (policy->governor_busy
-	    || (policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
+	    || (state == CPUFREQ_GOV_START && event != CPUFREQ_GOV_LIMITS && event != CPUFREQ_GOV_STOP)
+	    || (state == CPUFREQ_GOV_STOP && event != CPUFREQ_GOV_START && event != CPUFREQ_GOV_POLICY_EXIT)
+	    || (state == CPUFREQ_GOV_POLICY_INIT && event != CPUFREQ_GOV_START && event != CPUFREQ_GOV_POLICY_EXIT)
+	    || (state == CPUFREQ_GOV_POLICY_EXIT && event != CPUFREQ_GOV_POLICY_INIT)) {
 		mutex_unlock(&cpufreq_governor_lock);
 		return -EBUSY;
 	}
 
 	policy->governor_busy = true;
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
+	if (event != CPUFREQ_GOV_LIMITS)
+		policy->governor_state = event;
 
 	mutex_unlock(&cpufreq_governor_lock);
 
@@ -2035,13 +2038,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
+	} else if (event != CPUFREQ_GOV_LIMITS) {
 		/* Restore original values */
 		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
+		policy->governor_state = state;
 		mutex_unlock(&cpufreq_governor_lock);
 	}
 
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 1b44496..d173181 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -174,7 +174,7 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 	int i;
 
 	mutex_lock(&cpufreq_governor_lock);
-	if (!policy->governor_enabled)
+	if (policy->governor_state != CPUFREQ_GOV_START)
 		goto out_unlock;
 
 	if (!all_cpus) {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index c7aa96b..39ddd3e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -81,7 +81,7 @@ struct cpufreq_policy {
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
+	unsigned int		governor_state; /* Governor's current state */
 	bool			governor_busy;
 
 	struct work_struct	update; /* if update_policy() needs to be
-- 
2.0.3.693.g996b0fd
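
The permission check in the __cpufreq_governor() hunk above amounts to a small
state machine. The following user-space sketch restates it as a transition
function, under a few assumptions: the enum names and their numeric values are
placeholders (only their distinctness matters), and gov_event_allowed() and
gov_apply() are hypothetical helpers, not kernel functions. The state is kept in
an unsigned int because it has to distinguish more than two values.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder event codes; the numeric values are assumed for this sketch. */
enum gov_event {
	GOV_START = 1,
	GOV_STOP,
	GOV_LIMITS,
	GOV_POLICY_INIT,
	GOV_POLICY_EXIT,
};

/* Returns true if 'event' may be requested while the governor is in 'state'. */
static bool gov_event_allowed(unsigned int state, unsigned int event)
{
	switch (state) {
	case GOV_START:
		return event == GOV_LIMITS || event == GOV_STOP;
	case GOV_STOP:
	case GOV_POLICY_INIT:
		return event == GOV_START || event == GOV_POLICY_EXIT;
	case GOV_POLICY_EXIT:
		return event == GOV_POLICY_INIT;
	default:
		/* Unknown states are rejected here; the patch has no such check. */
		return false;
	}
}

/* Applies 'event' to '*state'; GOV_LIMITS leaves the state unchanged. */
static int gov_apply(unsigned int *state, unsigned int event)
{
	if (!gov_event_allowed(*state, event))
		return -EBUSY;
	if (event != GOV_LIMITS)
		*state = event;
	return 0;
}
```

Starting from GOV_POLICY_EXIT, a GOV_START request is rejected with -EBUSY
until GOV_POLICY_INIT has been applied, which is the example given in the
commit message.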



* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-09-08 10:56           ` Viresh Kumar
@ 2014-09-08 12:28             ` Robert Schöne
  2014-09-08 12:57               ` Viresh Kumar
  2014-09-08 21:14             ` Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Robert Schöne @ 2014-09-08 12:28 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, Srivatsa S. Bhat, Prarit Bhargava,
	Saravana Kannan, Stephen Boyd

Am Montag, den 08.09.2014, 16:26 +0530 schrieb Viresh Kumar:
> Can you please try the attached patch together with the revert suggested
> earlier? To make it clear again: cherry-pick 19c7630 and then apply this
> patch.

The patch works for me.

Thank you!

PS: With this patch I get an EINVAL when writing fails (instead of an
EBUSY). Is this intended? 
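
A user-space writer that wants to tell these two error codes apart has to look
at errno after the failed write. A minimal sketch of such a writer follows;
write_governor() is a hypothetical helper, and the path is passed in by the
caller rather than hard-coded, so nothing here assumes a particular sysfs
layout.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Writes 'governor' to the file at 'path' (for example a
 * cpuX/cpufreq/scaling_governor file). Returns 0 on success or a
 * negative errno value such as -EBUSY or -EINVAL on failure.
 */
static int write_governor(const char *path, const char *governor)
{
	size_t len = strlen(governor);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, governor, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write; treated as an error in this sketch */

	close(fd);
	return err;
}
```

A caller could then retry on -EBUSY and give up on -EINVAL, which only works if
the kernel reports the busy case as EBUSY.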




* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-09-08 12:28             ` Robert Schöne
@ 2014-09-08 12:57               ` Viresh Kumar
  0 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-09-08 12:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Rafael J. Wysocki, linux-pm, Srivatsa S. Bhat, Prarit Bhargava,
	Saravana Kannan, Stephen Boyd

On 8 September 2014 17:58, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> The patch works for me.
>
> Thank you!
>
> PS: With this patch I get an EINVAL when writing fails (instead of an
> EBUSY). Is this intended?

That's because of what cpufreq_set_policy() returns. We can fix it separately
if we really want.


* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-09-08 10:56           ` Viresh Kumar
  2014-09-08 12:28             ` Robert Schöne
@ 2014-09-08 21:14             ` Rafael J. Wysocki
  2014-09-09  4:18               ` Viresh Kumar
  1 sibling, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2014-09-08 21:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Robert Schöne, linux-pm, Srivatsa S. Bhat, Prarit Bhargava,
	Saravana Kannan, Stephen Boyd

On Monday, September 08, 2014 04:26:42 PM Viresh Kumar wrote:
> 
> On 8 September 2014 13:46, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> > (Sorry for the resend, I forgot to disable my S/MIME signature)
> >
> > The patch you suggested did not work, so I introduced a new mutex in the
> > patch below.
> 
> Okay, let me apologize first. This thread has taken much longer than I
> expected, partly because of me. I changed jobs recently and have been
> busy with new assignments.
> 
> > I am not happy with adding just another mutex, but it fixes my problem
> > of changing governors concurrently.
> 
> So, yeah, back to the problem.
> 
> Can you please try the attached patch together with the revert suggested
> earlier? To make it clear again: cherry-pick 19c7630 and then apply this
> patch.
> 
> Let me know if it still doesn't work for you.
> 
> I don't have a local setup to test this, so it's only compile-tested.
> 
> Cc'd a few more people who also reported similar issues.

Some mailer (mine or yours) mangled the attachment.  I can see it in patchwork,
though.

It looks reasonable to me.

Am I supposed to pick it up?  If so, do we need it in -stable?  If so, which
-stable (or what commit does it fix)?

Rafael



* Re: PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend)
  2014-09-08 21:14             ` Rafael J. Wysocki
@ 2014-09-09  4:18               ` Viresh Kumar
  0 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-09-09  4:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Robert Schöne, linux-pm, Srivatsa S. Bhat, Prarit Bhargava,
	Saravana Kannan, Stephen Boyd

On 9 September 2014 02:44, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> Some mailer (mine or yours) mangled the attachment.  I can see it in patchwork,
> though.

I didn't know that they play with attachments as well :)

> It looks reasonable to me.
>
> Am I supposed to pick it up?  If so, do we need it in -stable?  If so, which
> -stable (or what commit does it fix)?

I wanted to send this separately, so I only attached it here. I have now sent
you two patches, with details about stable as well.


end of thread, other threads:[~2014-09-09  4:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2014-07-16 14:53 PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor Robert Schöne
2014-07-24  7:11 ` PROBLEM: Kernel OOPS and possible system freeze after concurrent writing to cpufreq/scaling_governor (Resend) Robert Schöne
2014-07-24  9:42   ` Viresh Kumar
2014-07-25  8:42     ` Robert Schöne
2014-07-25  9:03       ` Viresh Kumar
2014-07-25 13:19         ` Robert Schöne
2014-09-08  8:13         ` Robert Schöne
2014-09-08  8:16         ` Robert Schöne
2014-09-08 10:56           ` Viresh Kumar
2014-09-08 12:28             ` Robert Schöne
2014-09-08 12:57               ` Viresh Kumar
2014-09-08 21:14             ` Rafael J. Wysocki
2014-09-09  4:18               ` Viresh Kumar
