linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
@ 2012-01-24 15:05 Jiri Slaby
  2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-24 15:05 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

Hi,

this is a freshly booted system. When I do s2dsk, I see:
...
Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
------------[ cut here ]------------
kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in:

Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
Bochs Bochs
RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
freeze_workqueues_begin+0x195/0x1a0
RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
ffff880047251980)
Stack:
 ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
 ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
Call Trace:
 [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
 [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
 [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
 [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
 [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
 [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
 [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
 RSP <ffff880046f01d68>
---[ end trace 632574abdc098963 ]---

I wanted to reproduce a different crash which happens on 3.2.1. When
there is not enough free swap, the kernel crashes (when s2disk is used).
I will post more info after the one above is resolved and I can
reproduce the latter :).

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 15:05 PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 Jiri Slaby
@ 2012-01-24 16:18 ` Srivatsa S. Bhat
  2012-01-24 16:24   ` Jiri Slaby
  0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-24 16:18 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

Hi Jiri,

On 01/24/2012 08:35 PM, Jiri Slaby wrote:

> Hi,
> 
> this is a freshly booted system. When I do s2dsk, I see:
> ...
> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> ------------[ cut here ]------------
> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> invalid opcode: 0000 [#1] SMP
> CPU 0
> Modules linked in:
> 
> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
> Bochs Bochs
> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
> freeze_workqueues_begin+0x195/0x1a0
> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
> ffff880047251980)
> Stack:
>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
> Call Trace:
>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>  RSP <ffff880046f01d68>
> ---[ end trace 632574abdc098963 ]---
> 


I couldn't find any obvious root-cause from a quick check. Is this completely
reproducible upon a fresh boot?
 
Regards,
Srivatsa S. Bhat
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat
@ 2012-01-24 16:24   ` Jiri Slaby
  2012-01-24 22:36     ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-24 16:24 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> Hi Jiri,
> 
> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
> 
>> Hi,
>>
>> this is a freshly booted system. When I do s2dsk, I see:
>> ...
>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>> ------------[ cut here ]------------
>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>> invalid opcode: 0000 [#1] SMP
>> CPU 0
>> Modules linked in:
>>
>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
>> Bochs Bochs
>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
>> freeze_workqueues_begin+0x195/0x1a0
>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
>> ffff880047251980)
>> Stack:
>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
>> Call Trace:
>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>>  RSP <ffff880046f01d68>
>> ---[ end trace 632574abdc098963 ]---
>>
> 
> 
> I couldn't find any obvious root-cause from a quick check. Is this completely
> reproducible upon a fresh boot?

True.

The cause is that the function is called twice:
# s2disk
PM: Marking nosave pages: 000000000009f000 - 0000000000100000
PM: Basic memory bitmaps created
Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
PM: Preallocating image memory... done (allocated 264170 pages)
PM: Allocated 1056680 kbytes in 0.45 seconds (2348.17 MB/s)
Freezing remaining freezable tasks ... Pid: 2650, comm: s2disk Not
tainted 3.3.0-rc1-next-20120124_64+ #1628
Call Trace:
 [<ffffffff8109edd2>] ? getnstimeofday+0x52/0xf0
 [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
 [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
 [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
 [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
 [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
 [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
 [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
 [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
(elapsed 0.03 seconds) done.
PM: freeze of devices complete after 57.216 msecs
PM: late freeze of devices complete after 0.720 msecs
ACPI: Preparing to enter system sleep state S4
PM: Saving platform NVS memory
Disabling non-boot CPUs ...
CPU 1 is now offline
PM: Creating hibernation image:
PM: Need to copy 24112 pages
PM: Normal pages needed: 24112 + 1024, available pages: 277076
PM: Hibernation image created (24112 pages copied)
Enabling non-boot CPUs ...
Booting Node 0 Processor 1 APIC 0x1
smpboot cpu 1: start_ip = 9a000
Calibrating delay loop (skipped) already calibrated this CPU
NMI watchdog disabled (cpu1): hardware events not enabled
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: early thaw of devices complete after 0.093 msecs
ata_piix 0000:00:01.1: setting latency timer to 64
e1000 0000:00:03.0: setting latency timer to 64
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
ata2.01: NODEV after polling detection
ata1.00: configured for MWDMA2
ata2.00: configured for MWDMA2
ata1.01: configured for MWDMA2
sd 0:0:1:0: [sdb] Starting disk
sd 0:0:0:0: [sda] Starting disk
PM: thaw of devices complete after 192.317 msecs
PM: Preallocating image memory... done (allocated 264295 pages)
PM: Allocated 1057180 kbytes in 0.12 seconds (8809.83 MB/s)
Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
------------[ cut here ]------------
kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in:

Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628
Bochs Bochs
RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>]
freeze_workqueues_begin+0x1a1/0x1b0
RSP: 0018:ffff8800456d7d68  EFLAGS: 00010292
RAX: 0000000000000023 RBX: 0000000000000001 RCX: 0000000000000073
RDX: 00000000000000ea RSI: 0000000000000046 RDI: ffffffff81b51f7c
RBP: ffff8800456d7d98 R08: ffffffff81a9d760 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007fff11d8709c R14: ffffffffffffffff R15: 0000000000000004
FS:  00007f4744397700(0000) GS:ffff880049700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001f46bc0 CR3: 000000004483b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process s2disk (pid: 2650, threadinfo ffff8800456d6000, task
ffff880046c93300)
Stack:
 ffff8800456d7d98 0000000000000001 0000000000000000 00007fff11d8709c
 ffffffffffffffff 0000000000000004 ffff8800456d7e18 ffffffff81096cc9
 00000000ffff008f 0000000000000004 ffff8800456d7e18 000000004f1edaaf
Call Trace:
 [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
 [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
 [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
 [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
 [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
 [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
 [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 1d 94 5a 00 0f 0b
48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 06 94 5a 00 <0f> 0b
66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b
RIP  [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0
 RSP <ffff8800456d7d68>
---[ end trace 696a23a038efce41 ]---



>  
> Regards,
> Srivatsa S. Bhat
> IBM Linux Technology Center
> 


-- 
js
suse labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 16:24   ` Jiri Slaby
@ 2012-01-24 22:36     ` Rafael J. Wysocki
  2012-01-24 22:47       ` Jiri Slaby
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-24 22:36 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On Tuesday, January 24, 2012, Jiri Slaby wrote:
> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> > Hi Jiri,
> > 
> > On 01/24/2012 08:35 PM, Jiri Slaby wrote:
> > 
> >> Hi,
> >>
> >> this is a freshly booted system. When I do s2dsk, I see:
> >> ...
> >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> >> ------------[ cut here ]------------
> >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> >> invalid opcode: 0000 [#1] SMP
> >> CPU 0
> >> Modules linked in:
> >>
> >> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
> >> Bochs Bochs
> >> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
> >> freeze_workqueues_begin+0x195/0x1a0
> >> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
> >> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
> >> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
> >> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
> >> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
> >> ffff880047251980)
> >> Stack:
> >>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
> >>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
> >>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
> >> Call Trace:
> >>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
> >>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
> >>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
> >>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
> >>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
> >>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
> >>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> >> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
> >> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
> >> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
> >> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
> >>  RSP <ffff880046f01d68>
> >> ---[ end trace 632574abdc098963 ]---
> >>
> > 
> > 
> > I couldn't find any obvious root-cause from a quick check. Is this completely
> > reproducible upon a fresh boot?
> 
> True.
> 
> The cause is that the function is called twice:

Which function?

Rafael


> # s2disk
> PM: Marking nosave pages: 000000000009f000 - 0000000000100000
> PM: Basic memory bitmaps created
> Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> PM: Preallocating image memory... done (allocated 264170 pages)
> PM: Allocated 1056680 kbytes in 0.45 seconds (2348.17 MB/s)
> Freezing remaining freezable tasks ... Pid: 2650, comm: s2disk Not
> tainted 3.3.0-rc1-next-20120124_64+ #1628
> Call Trace:
>  [<ffffffff8109edd2>] ? getnstimeofday+0x52/0xf0
>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> (elapsed 0.03 seconds) done.
> PM: freeze of devices complete after 57.216 msecs
> PM: late freeze of devices complete after 0.720 msecs
> ACPI: Preparing to enter system sleep state S4
> PM: Saving platform NVS memory
> Disabling non-boot CPUs ...
> CPU 1 is now offline
> PM: Creating hibernation image:
> PM: Need to copy 24112 pages
> PM: Normal pages needed: 24112 + 1024, available pages: 277076
> PM: Hibernation image created (24112 pages copied)
> Enabling non-boot CPUs ...
> Booting Node 0 Processor 1 APIC 0x1
> smpboot cpu 1: start_ip = 9a000
> Calibrating delay loop (skipped) already calibrated this CPU
> NMI watchdog disabled (cpu1): hardware events not enabled
> CPU1 is up
> ACPI: Waking up from system sleep state S4
> PM: early thaw of devices complete after 0.093 msecs
> ata_piix 0000:00:01.1: setting latency timer to 64
> e1000 0000:00:03.0: setting latency timer to 64
> e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ata2.01: NODEV after polling detection
> ata1.00: configured for MWDMA2
> ata2.00: configured for MWDMA2
> ata1.01: configured for MWDMA2
> sd 0:0:1:0: [sdb] Starting disk
> sd 0:0:0:0: [sda] Starting disk
> PM: thaw of devices complete after 192.317 msecs
> PM: Preallocating image memory... done (allocated 264295 pages)
> PM: Allocated 1057180 kbytes in 0.12 seconds (8809.83 MB/s)
> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> ------------[ cut here ]------------
> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> invalid opcode: 0000 [#1] SMP
> CPU 1
> Modules linked in:
> 
> Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628
> Bochs Bochs
> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>]
> freeze_workqueues_begin+0x1a1/0x1b0
> RSP: 0018:ffff8800456d7d68  EFLAGS: 00010292
> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 0000000000000073
> RDX: 00000000000000ea RSI: 0000000000000046 RDI: ffffffff81b51f7c
> RBP: ffff8800456d7d98 R08: ffffffff81a9d760 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 00007fff11d8709c R14: ffffffffffffffff R15: 0000000000000004
> FS:  00007f4744397700(0000) GS:ffff880049700000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000001f46bc0 CR3: 000000004483b000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process s2disk (pid: 2650, threadinfo ffff8800456d6000, task
> ffff880046c93300)
> Stack:
>  ffff8800456d7d98 0000000000000001 0000000000000000 00007fff11d8709c
>  ffffffffffffffff 0000000000000004 ffff8800456d7e18 ffffffff81096cc9
>  00000000ffff008f 0000000000000004 ffff8800456d7e18 000000004f1edaaf
> Call Trace:
>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 1d 94 5a 00 0f 0b
> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 06 94 5a 00 <0f> 0b
> 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b
> RIP  [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0
>  RSP <ffff8800456d7d68>
> ---[ end trace 696a23a038efce41 ]---
> 
> 
> 
> >  
> > Regards,
> > Srivatsa S. Bhat
> > IBM Linux Technology Center
> > 
> 
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 22:36     ` Rafael J. Wysocki
@ 2012-01-24 22:47       ` Jiri Slaby
  2012-01-24 23:02         ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-24 22:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
>>> Hi Jiri,
>>>
>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
>>>
>>>> Hi,
>>>>
>>>> this is a freshly booted system. When I do s2dsk, I see:
>>>> ...
>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>> ------------[ cut here ]------------
>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>> invalid opcode: 0000 [#1] SMP
>>>> CPU 0
>>>> Modules linked in:
>>>>
>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
>>>> Bochs Bochs
>>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
>>>> freeze_workqueues_begin+0x195/0x1a0
>>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
>>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
>>>> ffff880047251980)
>>>> Stack:
>>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
>>>> Call Trace:
>>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
>>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>>>>  RSP <ffff880046f01d68>
>>>> ---[ end trace 632574abdc098963 ]---
>>>>
>>>
>>>
>>> I couldn't find any obvious root-cause from a quick check. Is this completely
>>> reproducible upon a fresh boot?
>>
>> True.
>>
>> The cause is that the function is called twice:
> 
> Which function?

The one where the BUG is. Maybe the functions which should clear the
flag is not called in between? See:

>>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
                         ^^^^^^^^^^^^^^^^^^^^^^^
>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>> (elapsed 0.03 seconds) done.
...
>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>> ------------[ cut here ]------------
>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
...
>> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
>> freeze_workqueues_begin+0x1a1/0x1b0
   ^^^^^^^^^^^^^^^^^^^^^^^
>> Call Trace:
>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 22:47       ` Jiri Slaby
@ 2012-01-24 23:02         ` Rafael J. Wysocki
  2012-01-25  0:04           ` Jiri Slaby
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-24 23:02 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On Tuesday, January 24, 2012, Jiri Slaby wrote:
> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
> > On Tuesday, January 24, 2012, Jiri Slaby wrote:
> >> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> >>> Hi Jiri,
> >>>
> >>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> this is a freshly booted system. When I do s2dsk, I see:
> >>>> ...
> >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> >>>> ------------[ cut here ]------------
> >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> >>>> invalid opcode: 0000 [#1] SMP
> >>>> CPU 0
> >>>> Modules linked in:
> >>>>
> >>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
> >>>> Bochs Bochs
> >>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
> >>>> freeze_workqueues_begin+0x195/0x1a0
> >>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
> >>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
> >>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
> >>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
> >>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
> >>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
> >>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
> >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
> >>>> ffff880047251980)
> >>>> Stack:
> >>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
> >>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
> >>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
> >>>> Call Trace:
> >>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
> >>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
> >>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
> >>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
> >>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
> >>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
> >>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> >>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
> >>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
> >>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
> >>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
> >>>>  RSP <ffff880046f01d68>
> >>>> ---[ end trace 632574abdc098963 ]---
> >>>>
> >>>
> >>>
> >>> I couldn't find any obvious root-cause from a quick check. Is this completely
> >>> reproducible upon a fresh boot?
> >>
> >> True.
> >>
> >> The cause is that the function is called twice:
> > 
> > Which function?
> 
> The one where the BUG is. Maybe the functions which should clear the
> flag is not called in between? See:
> 
> >>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>                          ^^^^^^^^^^^^^^^^^^^^^^^
> >>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
> >>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
> >>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
> >>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
> >>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
> >>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
> >>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> >> (elapsed 0.03 seconds) done.
> ...
> >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> >> ------------[ cut here ]------------
> >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> ...
> >> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
> >> freeze_workqueues_begin+0x1a1/0x1b0
>    ^^^^^^^^^^^^^^^^^^^^^^^
> >> Call Trace:
> >>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
> >>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
> >>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
> >>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
> >>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
> >>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
> >>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b

Ah.  So this is linux-next, right?

Can you please test the linux-next branch of the linux-pm tree and see if
the problem is reproducible in there?

Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-24 23:02         ` Rafael J. Wysocki
@ 2012-01-25  0:04           ` Jiri Slaby
  2012-01-25  0:10             ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-25  0:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML

On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote:
> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
>>>>> Hi Jiri,
>>>>>
>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> this is a freshly booted system. When I do s2dsk, I see:
>>>>>> ...
>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>>>> ------------[ cut here ]------------
>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>>>> invalid opcode: 0000 [#1] SMP
>>>>>> CPU 0
>>>>>> Modules linked in:
>>>>>>
>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
>>>>>> Bochs Bochs
>>>>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
>>>>>> freeze_workqueues_begin+0x195/0x1a0
>>>>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
>>>>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
>>>>>> ffff880047251980)
>>>>>> Stack:
>>>>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>>>>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>>>>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
>>>>>> Call Trace:
>>>>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>>>>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>>>>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>>>>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>>>>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>>>>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
>>>>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>>>>>>  RSP <ffff880046f01d68>
>>>>>> ---[ end trace 632574abdc098963 ]---
>>>>>>
>>>>>
>>>>>
>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely
>>>>> reproducible upon a fresh boot?
>>>>
>>>> True.
>>>>
>>>> The cause is that the function is called twice:
>>>
>>> Which function?
>>
>> The one where the BUG is. Maybe the functions which should clear the
>> flag is not called in between? See:
>>
>>>>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>>                          ^^^^^^^^^^^^^^^^^^^^^^^
>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>> (elapsed 0.03 seconds) done.
>> ...
>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>> ------------[ cut here ]------------
>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>> ...
>>>> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
>>>> freeze_workqueues_begin+0x1a1/0x1b0
>>    ^^^^^^^^^^^^^^^^^^^^^^^
>>>> Call Trace:
>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> 
> Ah.  So this is linux-next, right?

Right.

> Can you please test the linux-next branch of the linux-pm tree and see if
> the problem is reproducible in there?

Yeah, 100%. Just try it with a small enough swap.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25  0:04           ` Jiri Slaby
@ 2012-01-25  0:10             ` Rafael J. Wysocki
  2012-01-25 14:25               ` Jiri Slaby
  2012-01-25 15:31               ` Srivatsa S. Bhat
  0 siblings, 2 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-25  0:10 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML

On Wednesday, January 25, 2012, Jiri Slaby wrote:
> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote:
> > On Tuesday, January 24, 2012, Jiri Slaby wrote:
> >> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
> >>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
> >>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> >>>>> Hi Jiri,
> >>>>>
> >>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> this is a freshly booted system. When I do s2dsk, I see:
> >>>>>> ...
> >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> >>>>>> ------------[ cut here ]------------
> >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> >>>>>> invalid opcode: 0000 [#1] SMP
> >>>>>> CPU 0
> >>>>>> Modules linked in:
> >>>>>>
> >>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
> >>>>>> Bochs Bochs
> >>>>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
> >>>>>> freeze_workqueues_begin+0x195/0x1a0
> >>>>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
> >>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
> >>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
> >>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
> >>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
> >>>>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
> >>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
> >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
> >>>>>> ffff880047251980)
> >>>>>> Stack:
> >>>>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
> >>>>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
> >>>>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
> >>>>>> Call Trace:
> >>>>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>>>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
> >>>>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
> >>>>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
> >>>>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
> >>>>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
> >>>>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
> >>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> >>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
> >>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
> >>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
> >>>>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
> >>>>>>  RSP <ffff880046f01d68>
> >>>>>> ---[ end trace 632574abdc098963 ]---
> >>>>>>
> >>>>>
> >>>>>
> >>>>> I couldn't find any obvious root-cause from a quick check. Is this completely
> >>>>> reproducible upon a fresh boot?
> >>>>
> >>>> True.
> >>>>
> >>>> The cause is that the function is called twice:
> >>>
> >>> Which function?
> >>
> >> The one where the BUG is. Maybe the functions which should clear the
> >> flag is not called in between? See:
> >>
> >>>>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
> >>                          ^^^^^^^^^^^^^^^^^^^^^^^
> >>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
> >>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
> >>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
> >>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
> >>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
> >>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
> >>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> >>>> (elapsed 0.03 seconds) done.
> >> ...
> >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> >>>> ------------[ cut here ]------------
> >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> >> ...
> >>>> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
> >>>> freeze_workqueues_begin+0x1a1/0x1b0
> >>    ^^^^^^^^^^^^^^^^^^^^^^^
> >>>> Call Trace:
> >>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
> >>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
> >>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
> >>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
> >>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
> >>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
> >>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
> >>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
> > 
> > Ah.  So this is linux-next, right?
> 
> Right.
> 
> > Can you please test the linux-next branch of the linux-pm tree and see if
> > the problem is reproducible in there?
> 
> Yeah, 100%. Just try it with a small enough swap.

Ah, thanks, so that's an error code path problem and most likely in the Linus'
tree.

Srivatsa, any ideas?

Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25  0:10             ` Rafael J. Wysocki
@ 2012-01-25 14:25               ` Jiri Slaby
  2012-01-25 15:31               ` Srivatsa S. Bhat
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Slaby @ 2012-01-25 14:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML

On 01/25/2012 01:10 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 25, 2012, Jiri Slaby wrote:
>> Yeah, 100%. Just try it with a small enough swap.
> 
> Ah, thanks, so that's an error code path problem and most likely in the Linus'
> tree.

Just to emphasize what I wrote earlier. I would apprectiate this fixed
in 3.2 stable too.

-- 
js

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25  0:10             ` Rafael J. Wysocki
  2012-01-25 14:25               ` Jiri Slaby
@ 2012-01-25 15:31               ` Srivatsa S. Bhat
  2012-01-25 16:00                 ` Srivatsa S. Bhat
  2012-01-26 19:39                 ` Srivatsa S. Bhat
  1 sibling, 2 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-25 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel

On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote:

> On Wednesday, January 25, 2012, Jiri Slaby wrote:
>> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
>>>>>>> Hi Jiri,
>>>>>>>
>>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> this is a freshly booted system. When I do s2dsk, I see:
>>>>>>>> ...
>>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>>>>>> ------------[ cut here ]------------
>>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>>>>>> invalid opcode: 0000 [#1] SMP
>>>>>>>> CPU 0
>>>>>>>> Modules linked in:
>>>>>>>>
>>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
>>>>>>>> Bochs Bochs
>>>>>>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
>>>>>>>> freeze_workqueues_begin+0x195/0x1a0
>>>>>>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
>>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
>>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
>>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
>>>>>>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
>>>>>>>> ffff880047251980)
>>>>>>>> Stack:
>>>>>>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>>>>>>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>>>>>>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
>>>>>>>> Call Trace:
>>>>>>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>>>>>>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>>>>>>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>>>>>>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>>>>>>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>>>>>>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>>>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
>>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
>>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
>>>>>>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>>>>>>>>  RSP <ffff880046f01d68>
>>>>>>>> ---[ end trace 632574abdc098963 ]---
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely
>>>>>>> reproducible upon a fresh boot?
>>>>>>
>>>>>> True.
>>>>>>
>>>>>> The cause is that the function is called twice:
>>>>>
>>>>> Which function?
>>>>
>>>> The one where the BUG is. Maybe the functions which should clear the
>>>> flag is not called in between? See:
>>>>
>>>>>>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>>>>                          ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>>> (elapsed 0.03 seconds) done.
>>>> ...
>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>>>> ------------[ cut here ]------------
>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>> ...
>>>>>> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
>>>>>> freeze_workqueues_begin+0x1a1/0x1b0
>>>>    ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> Call Trace:
>>>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>
>>> Ah.  So this is linux-next, right?
>>
>> Right.
>>
>>> Can you please test the linux-next branch of the linux-pm tree and see if
>>> the problem is reproducible in there?
>>
>> Yeah, 100%. Just try it with a small enough swap.
> 
> Ah, thanks, so that's an error code path problem and most likely in the Linus'
> tree.
> 
> Srivatsa, any ideas?
> 


Ok, I will need to quote a part of the userspace utility to explain the
problem.

In suspend.c inside the suspend-utils userspace package, I see a loop such
as:

        error = freeze(snapshot_fd);
	...
        attempts = 2;
        do {
                if (set_image_size(snapshot_fd, image_size)) {
                        error = errno;
                        break;
                }
                if (atomic_snapshot(snapshot_fd, &in_suspend)) {
                        error = errno;
                        break;
                }
                if (!in_suspend) {
                        /* first unblank the console, see console_codes(4) */
                        printf("\e[13]");
                        printf("%s: returned to userspace\n", my_name);
                        free_snapshot(snapshot_fd);
                        break;
                }

                error = write_image(snapshot_fd, resume_fd, -1);
                if (error) {
                        free_swap_pages(snapshot_fd);
                        free_snapshot(snapshot_fd);
                        image_size = 0;
                        error = -error;
                        if (error != ENOSPC)
                                break;
                } else {
                        splash.progress(100);
#ifdef CONFIG_BOTH
                        if (s2ram_kms || s2ram) {
                                /* If we die (and allow system to continue)
                                 * between now and reset_signature(), very bad
                                 * things will happen. */
                                error = suspend_to_ram(snapshot_fd);
                                if (error)
                                        goto Shutdown;
                                reset_signature(resume_fd);
                                free_swap_pages(snapshot_fd);
                                free_snapshot(snapshot_fd);
                                if (!s2ram_kms)
                                        s2ram_resume();
                                goto Unfreeze;
                        }
Shutdown:
#endif
                        close(resume_fd);
                        suspend_shutdown(snapshot_fd);
                }
        } while (--attempts);

...
Unfreeze:
        unfreeze(snapshot_fd);


Let me reply to this thread so that I can comment on the above code.

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 15:31               ` Srivatsa S. Bhat
@ 2012-01-25 16:00                 ` Srivatsa S. Bhat
  2012-01-25 18:44                   ` Srivatsa S. Bhat
  2012-01-25 22:00                   ` Jiri Slaby
  2012-01-26 19:39                 ` Srivatsa S. Bhat
  1 sibling, 2 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-25 16:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel

On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote:

> On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote:
> 
>> On Wednesday, January 25, 2012, Jiri Slaby wrote:
>>> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote:
>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>>>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
>>>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>>>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
>>>>>>>> Hi Jiri,
>>>>>>>>
>>>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is a freshly booted system. When I do s2dsk, I see:
>>>>>>>>> ...
>>>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>>>>>>> ------------[ cut here ]------------
>>>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>>>>>>> invalid opcode: 0000 [#1] SMP
>>>>>>>>> CPU 0
>>>>>>>>> Modules linked in:
>>>>>>>>>
>>>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627
>>>>>>>>> Bochs Bochs
>>>>>>>>> RIP: 0010:[<ffffffff8107e365>]  [<ffffffff8107e365>]
>>>>>>>>> freeze_workqueues_begin+0x195/0x1a0
>>>>>>>>> RSP: 0018:ffff880046f01d68  EFLAGS: 00010292
>>>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
>>>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
>>>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
>>>>>>>>> FS:  00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
>>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task
>>>>>>>>> ffff880047251980)
>>>>>>>>> Stack:
>>>>>>>>>  ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
>>>>>>>>>  ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
>>>>>>>>>  00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
>>>>>>>>> Call Trace:
>>>>>>>>>  [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>>>>  [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
>>>>>>>>>  [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
>>>>>>>>>  [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
>>>>>>>>>  [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
>>>>>>>>>  [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
>>>>>>>>>  [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
>>>>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b
>>>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b
>>>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
>>>>>>>>> RIP  [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
>>>>>>>>>  RSP <ffff880046f01d68>
>>>>>>>>> ---[ end trace 632574abdc098963 ]---
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely
>>>>>>>> reproducible upon a fresh boot?
>>>>>>>
>>>>>>> True.
>>>>>>>
>>>>>>> The cause is that the function is called twice:
>>>>>>
>>>>>> Which function?
>>>>>
>>>>> The one where the BUG is. Maybe the functions which should clear the
>>>>> flag is not called in between? See:
>>>>>
>>>>>>>  [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>>>>>                          ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>>>> (elapsed 0.03 seconds) done.
>>>>> ...
>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>>>>>>> ------------[ cut here ]------------
>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>>>>> ...
>>>>>>> RIP: 0010:[<ffffffff8107e371>]  [<ffffffff8107e371>
>>>>>>> freeze_workqueues_begin+0x1a1/0x1b0
>>>>>    ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> Call Trace:
>>>>>>>  [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>>>>>>>  [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>>>>>>>  [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>>>>>>>  [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>>>>>>>  [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>>>>>>>  [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>>>>>>>  [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>>>>>>>  [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>>>>
>>>> Ah.  So this is linux-next, right?
>>>
>>> Right.
>>>
>>>> Can you please test the linux-next branch of the linux-pm tree and see if
>>>> the problem is reproducible in there?
>>>
>>> Yeah, 100%. Just try it with a small enough swap.
>>
>> Ah, thanks, so that's an error code path problem and most likely in the Linus'
>> tree.
>>
>> Srivatsa, any ideas?
>>
> 
> 
> Ok, I will need to quote a part of the userspace utility to explain the
> problem.
> 




Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating
memory) split up the freezing phase and moved the freezing of kernel threads
to hibernation_snapshot() function.
(This is a pretty old commit btw, I think it came in sometime around 3.2).

And there is a BUG_ON() in freeze_workqueues_begin() which complains if
this function is called when the kernel threads are already frozen.

Now, coming to the userspace utility:

> In suspend.c inside the suspend-utils userspace package, I see a loop such
> as:
> 
>         error = freeze(snapshot_fd);
> 	...
>         attempts = 2;
>         do {
>                 if (set_image_size(snapshot_fd, image_size)) {
>                         error = errno;
>                         break;
>                 }
>                 if (atomic_snapshot(snapshot_fd, &in_suspend)) {



                        ^^^ 
        This ends up in the call to hibernation_snapshot().

>                         error = errno;
>                         break;
>                 }
>                 if (!in_suspend) {
>                         /* first unblank the console, see console_codes(4) */
>                         printf("\e[13]");
>                         printf("%s: returned to userspace\n", my_name);
>                         free_snapshot(snapshot_fd);
>                         break;
>                 }
> 
>                 error = write_image(snapshot_fd, resume_fd, -1);
>                 if (error) {




        Since swap space is limited, we come here.

>                         free_swap_pages(snapshot_fd);
>                         free_snapshot(snapshot_fd);
>                         image_size = 0;
>                         error = -error;
>                         if (error != ENOSPC)
>                                 break;




        And because of the above 'if' condition, we don't break here.
        IOW, we will run the loop a second time.

>                 } else {
>                         splash.progress(100);
> #ifdef CONFIG_BOTH
>                         if (s2ram_kms || s2ram) {
>                                 /* If we die (and allow system to continue)
>                                  * between now and reset_signature(), very bad
>                                  * things will happen. */
>                                 error = suspend_to_ram(snapshot_fd);
>                                 if (error)
>                                         goto Shutdown;
>                                 reset_signature(resume_fd);
>                                 free_swap_pages(snapshot_fd);
>                                 free_snapshot(snapshot_fd);
>                                 if (!s2ram_kms)
>                                         s2ram_resume();
>                                 goto Unfreeze;
>                         }
> Shutdown:
> #endif
>                         close(resume_fd);
>                         suspend_shutdown(snapshot_fd);
>                 }
>         } while (--attempts);
> 
> ...
> Unfreeze:
>         unfreeze(snapshot_fd);
> 
> 


So, since the loop is run a second time if not enough space was available,
we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a
second time - and this is what makes the BUG_ON() to fire!

To solve this, I feel we can simply omit the BUG_ON() inside
freeze_workqueues_begin() and replace it with a simple check (just like
what we do in thaw_workqueues())...

And moreover, after the change that moved the freezing of kernel threads
to hibernation_snapshot(), it is quite natural to hit a scenario such
as this, because the entire freezing phase is not in one single place.
IOW, the existing BUG_ON() is not qualified to be there anymore!

(Also, exposing a straight-forward method like this to userspace, to
trigger a BUG_ON sounds ridiculous to start with!)

Of course, the other method would be to refactor the kernel code and stuff
like that, but that not only would be messy but it would also involve
breaking existing userspace applications, AFAICS.



So, Jiri, can you please try the following patch and see if it works for
you as expected? I'll be happy to provide a formal patch with a changelog
if this works.


---
 kernel/workqueue.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bec7b5b..cb26c5d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void)
 
 	spin_lock(&workqueue_lock);
 
-	BUG_ON(workqueue_freezing);
+	if (workqueue_freezing)
+		goto out_unlock;
+
 	workqueue_freezing = true;
 
 	for_each_gcwq_cpu(cpu) {
@@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void)
 		spin_unlock_irq(&gcwq->lock);
 	}
 
+out_unlock:
 	spin_unlock(&workqueue_lock);
 }
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 16:00                 ` Srivatsa S. Bhat
@ 2012-01-25 18:44                   ` Srivatsa S. Bhat
  2012-01-25 23:51                     ` Rafael J. Wysocki
  2012-01-25 22:00                   ` Jiri Slaby
  1 sibling, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-25 18:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel, Linux PM mailing list

> 

> Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating
> memory) split up the freezing phase and moved the freezing of kernel threads
> to hibernation_snapshot() function.
> (This is a pretty old commit btw, I think it came in sometime around 3.2).
> 
> And there is a BUG_ON() in freeze_workqueues_begin() which complains if
> this function is called when the kernel threads are already frozen.
> 
> Now, coming to the userspace utility:
> 
>> In suspend.c inside the suspend-utils userspace package, I see a loop such
>> as:
>>
>>         error = freeze(snapshot_fd);
>> 	...
>>         attempts = 2;
>>         do {
>>                 if (set_image_size(snapshot_fd, image_size)) {
>>                         error = errno;
>>                         break;
>>                 }
>>                 if (atomic_snapshot(snapshot_fd, &in_suspend)) {
>                         ^^^ 
>         This ends up in the call to hibernation_snapshot().
> 
>>                         error = errno;
>>                         break;
>>                 }
>>                 if (!in_suspend) {
>>                         /* first unblank the console, see console_codes(4) */
>>                         printf("\e[13]");
>>                         printf("%s: returned to userspace\n", my_name);
>>                         free_snapshot(snapshot_fd);
>>                         break;
>>                 }
>>
>>                 error = write_image(snapshot_fd, resume_fd, -1);
>>                 if (error) {
> 
>         Since swap space is limited, we come here.
> 
>>                         free_swap_pages(snapshot_fd);
>>                         free_snapshot(snapshot_fd);

				^^^
	This function makes the ioctl call SNAPSHOT_FREE.

>>                         image_size = 0;
>>                         error = -error;
>>                         if (error != ENOSPC)
>>                                 break;
> 
> 
> 
> 
>         And because of the above 'if' condition, we don't break here.
>         IOW, we will run the loop a second time.
> 
>>                 } else {
>>                         splash.progress(100);
>> #ifdef CONFIG_BOTH
>>                         if (s2ram_kms || s2ram) {
>>                                 /* If we die (and allow system to continue)
>>                                  * between now and reset_signature(), very bad
>>                                  * things will happen. */
>>                                 error = suspend_to_ram(snapshot_fd);
>>                                 if (error)
>>                                         goto Shutdown;
>>                                 reset_signature(resume_fd);
>>                                 free_swap_pages(snapshot_fd);
>>                                 free_snapshot(snapshot_fd);
>>                                 if (!s2ram_kms)
>>                                         s2ram_resume();
>>                                 goto Unfreeze;
>>                         }
>> Shutdown:
>> #endif
>>                         close(resume_fd);
>>                         suspend_shutdown(snapshot_fd);
>>                 }
>>         } while (--attempts);
>>
>> ...
>> Unfreeze:
>>         unfreeze(snapshot_fd);
>>
>>
> 
> 
> So, since the loop is run a second time if not enough space was available,
> we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a
> second time - and this is what makes the BUG_ON() to fire!
> 


SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:

        if (data->mode != O_RDONLY || !data->frozen  || data->ready) {
                error = -EPERM;
                break;
        }

data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
This makes it possible for hibernation_snapshot() and hence
freeze_workqueues_begin() to be called a second time, which is unfortunate.

And actually, the patch I posted in my previous mail is not really the right
long-term fix, though it might fix the particular issue that Jiri is facing..

Because, allowing hibernation_snapshot() to get called a second time while
kernel threads are still frozen brings us to the same situation that commit
2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
done inside hibernation_snapshot(), when kernel threads are frozen.. which
is known to break XFS, to give one example as mentioned in the changelog
of the above commit.

So, the right way to fix this IMHO, would be to split up thaw_processes()
just like freezing phase:

/* freezes or thaws user space processes */
freeze_processes() - thaw_processes()

/* freezes or thaws kernel threads */
freeze_kernel_threads() - thaw_kernel_threads()

We have to insert this thaw_kernel_threads() at appropriate places in such a
way as to not require another ioctl if possible... Then things would be
more symmetric (and hence more easy to understand) and we can avoid getting
into strange situations as discussed here.

But before we venture into that, it would be good to know if the patch posted
in the previous mail fixes the particular problem reported in this thread,
atleast just to see if there are other problems lurking that we aren't aware
of yet..

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 16:00                 ` Srivatsa S. Bhat
  2012-01-25 18:44                   ` Srivatsa S. Bhat
@ 2012-01-25 22:00                   ` Jiri Slaby
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Slaby @ 2012-01-25 22:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Linux-pm mailing list, Jiri Slaby, LKML,
	Baohua.Song, Tejun Heo, pavel

On 01/25/2012 05:00 PM, Srivatsa S. Bhat wrote:
> So, Jiri, can you please try the following patch and see if it works for
> you as expected? I'll be happy to provide a formal patch with a changelog
> if this works.

FWIW it works, thanks. If you want me test a proper fix, jsut let me know.

> ---
>  kernel/workqueue.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bec7b5b..cb26c5d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void)
>  
>  	spin_lock(&workqueue_lock);
>  
> -	BUG_ON(workqueue_freezing);
> +	if (workqueue_freezing)
> +		goto out_unlock;
> +
>  	workqueue_freezing = true;
>  
>  	for_each_gcwq_cpu(cpu) {
> @@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void)
>  		spin_unlock_irq(&gcwq->lock);
>  	}
>  
> +out_unlock:
>  	spin_unlock(&workqueue_lock);
>  }
>  


-- 
js

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG  at kernel/workqueue.c:3659
  2012-01-25 18:44                   ` Srivatsa S. Bhat
@ 2012-01-25 23:51                     ` Rafael J. Wysocki
  2012-01-26 11:05                       ` Jiri Slaby
                                         ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-25 23:51 UTC (permalink / raw)
  To: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo
  Cc: Jiri Slaby, LKML, Baohua.Song, pavel, Linux PM mailing list

Hi,

On Wednesday, January 25, 2012, Srivatsa S. Bhat wrote:
> > 
> 
> > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating
> > memory) split up the freezing phase and moved the freezing of kernel threads
> > to hibernation_snapshot() function.
> > (This is a pretty old commit btw, I think it came in sometime around 3.2).
> > 
> > And there is a BUG_ON() in freeze_workqueues_begin() which complains if
> > this function is called when the kernel threads are already frozen.
> > 
> > Now, coming to the userspace utility:
> > 
> >> In suspend.c inside the suspend-utils userspace package, I see a loop such
> >> as:
> >>
> >>         error = freeze(snapshot_fd);
> >> 	...
> >>         attempts = 2;
> >>         do {
> >>                 if (set_image_size(snapshot_fd, image_size)) {
> >>                         error = errno;
> >>                         break;
> >>                 }
> >>                 if (atomic_snapshot(snapshot_fd, &in_suspend)) {
> >                         ^^^ 
> >         This ends up in the call to hibernation_snapshot().
> > 
> >>                         error = errno;
> >>                         break;
> >>                 }
> >>                 if (!in_suspend) {
> >>                         /* first unblank the console, see console_codes(4) */
> >>                         printf("\e[13]");
> >>                         printf("%s: returned to userspace\n", my_name);
> >>                         free_snapshot(snapshot_fd);
> >>                         break;
> >>                 }
> >>
> >>                 error = write_image(snapshot_fd, resume_fd, -1);
> >>                 if (error) {
> > 
> >         Since swap space is limited, we come here.
> > 
> >>                         free_swap_pages(snapshot_fd);
> >>                         free_snapshot(snapshot_fd);
> 
> 				^^^
> 	This function makes the ioctl call SNAPSHOT_FREE.
> 
> >>                         image_size = 0;
> >>                         error = -error;
> >>                         if (error != ENOSPC)
> >>                                 break;
> > 
> > 
> > 
> > 
> >         And because of the above 'if' condition, we don't break here.
> >         IOW, we will run the loop a second time.
> > 
> >>                 } else {
> >>                         splash.progress(100);
> >> #ifdef CONFIG_BOTH
> >>                         if (s2ram_kms || s2ram) {
> >>                                 /* If we die (and allow system to continue)
> >>                                  * between now and reset_signature(), very bad
> >>                                  * things will happen. */
> >>                                 error = suspend_to_ram(snapshot_fd);
> >>                                 if (error)
> >>                                         goto Shutdown;
> >>                                 reset_signature(resume_fd);
> >>                                 free_swap_pages(snapshot_fd);
> >>                                 free_snapshot(snapshot_fd);
> >>                                 if (!s2ram_kms)
> >>                                         s2ram_resume();
> >>                                 goto Unfreeze;
> >>                         }
> >> Shutdown:
> >> #endif
> >>                         close(resume_fd);
> >>                         suspend_shutdown(snapshot_fd);
> >>                 }
> >>         } while (--attempts);
> >>
> >> ...
> >> Unfreeze:
> >>         unfreeze(snapshot_fd);
> >>
> >>
> > 
> > 
> > So, since the loop is run a second time if not enough space was available,
> > we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a
> > second time - and this is what makes the BUG_ON() to fire!
> > 
> 
> 
> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
> 
>         if (data->mode != O_RDONLY || !data->frozen  || data->ready) {
>                 error = -EPERM;
>                 break;
>         }
> 
> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
> This makes it possible for hibernation_snapshot() and hence
> freeze_workqueues_begin() to be called a second time, which is unfortunate.

Yes, I obviously forgot about that code path when I was working on the commit
that introduced the problem. :-(

Thanks a lot for the great analysis, it's really helpful!

> And actually, the patch I posted in my previous mail is not really the right
> long-term fix, though it might fix the particular issue that Jiri is facing..
> 
> Because, allowing hibernation_snapshot() to get called a second time while
> kernel threads are still frozen brings us to the same situation that commit
> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
> done inside hibernation_snapshot(), when kernel threads are frozen.. which
> is known to break XFS, to give one example as mentioned in the changelog
> of the above commit.

That's exactly right.

> So, the right way to fix this IMHO, would be to split up thaw_processes()
> just like freezing phase:
> 
> /* freezes or thaws user space processes */
> freeze_processes() - thaw_processes()
> 
> /* freezes or thaws kernel threads */
> freeze_kernel_threads() - thaw_kernel_threads()
> 
> We have to insert this thaw_kernel_threads() at appropriate places in such a
> way as to not require another ioctl if possible... Then things would be
> more symmetric (and hence more easy to understand) and we can avoid getting
> into strange situations as discussed here.
> 
> But before we venture into that, it would be good to know if the patch posted
> in the previous mail fixes the particular problem reported in this thread,
> atleast just to see if there are other problems lurking that we aren't aware
> of yet..

Jiri has already said that the patch works.

I think we could avoid the issue entirely by introducing thaw_kernel_threads
and making SNAPSHOT_FREE call it.  No other changes should be necessary.

IOW, Jiri, does the patch below help?

[BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]

---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   19 +++++++++++++++++++
 kernel/power/user.c     |    1 +
 3 files changed, 22 insertions(+)

Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
 extern int freeze_processes(void);
 extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
+extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
@@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
 static inline int freeze_processes(void) { return -ENOSYS; }
 static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
+static inline void thaw_kernel_threads(void) {}
 
 static inline bool try_to_freeze(void) { return false; }
 
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -188,3 +188,22 @@ void thaw_processes(void)
 	printk("done.\n");
 }
 
+void thaw_kernel_threads(void)
+{
+	struct task_struct *g, *p;
+
+	pm_nosig_freezing = false;
+	printk("Restarting kernel threads ... ");
+
+	thaw_workqueues();
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, p) {
+		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+			__thaw_task(p);
+	} while_each_thread(g, p);
+	read_unlock(&tasklist_lock);
+
+	schedule();
+	printk("done.\n");
+}
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
 		swsusp_free();
 		memset(&data->handle, 0, sizeof(struct snapshot_handle));
 		data->ready = 0;
+		thaw_kernel_threads();
 		break;
 
 	case SNAPSHOT_PREF_IMAGE_SIZE:

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 23:51                     ` Rafael J. Wysocki
@ 2012-01-26 11:05                       ` Jiri Slaby
  2012-01-27  1:01                         ` Rafael J. Wysocki
  2012-01-26 19:22                       ` Srivatsa S. Bhat
  2012-01-27 10:04                       ` Srivatsa S. Bhat
  2 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-26 11:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo, LKML, Baohua.Song,
	pavel, Linux PM mailing list

On 01/26/2012 12:51 AM, Rafael J. Wysocki wrote:
> IOW, Jiri, does the patch below help?

Yeah, this fixes the issue as well. Thanks.

> [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> 
> ---
>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   19 +++++++++++++++++++
>  kernel/power/user.c     |    1 +
>  3 files changed, 22 insertions(+)
> 
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
>  extern int freeze_processes(void);
>  extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
>  
>  static inline bool try_to_freeze(void)
>  {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
>  static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
>  
>  static inline bool try_to_freeze(void) { return false; }
>  
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
>  	printk("done.\n");
>  }
>  
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
>  		swsusp_free();
>  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  		data->ready = 0;
> +		thaw_kernel_threads();
>  		break;
>  
>  	case SNAPSHOT_PREF_IMAGE_SIZE:


-- 
js
suse labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 23:51                     ` Rafael J. Wysocki
  2012-01-26 11:05                       ` Jiri Slaby
@ 2012-01-26 19:22                       ` Srivatsa S. Bhat
  2012-01-27  1:01                         ` Rafael J. Wysocki
  2012-01-27 10:04                       ` Srivatsa S. Bhat
  2 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-26 19:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:

> Hi,
>>
>> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
>>
>>         if (data->mode != O_RDONLY || !data->frozen  || data->ready) {
>>                 error = -EPERM;
>>                 break;
>>         }
>>
>> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
>> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
>> This makes it possible for hibernation_snapshot() and hence
>> freeze_workqueues_begin() to be called a second time, which is unfortunate.
> 
> Yes, I obviously forgot about that code path when I was working on the commit
> that introduced the problem. :-(
> 
> Thanks a lot for the great analysis, it's really helpful!
>


Welcome :-) It was fun!

 
>> And actually, the patch I posted in my previous mail is not really the right
>> long-term fix, though it might fix the particular issue that Jiri is facing..
>>
>> Because, allowing hibernation_snapshot() to get called a second time while
>> kernel threads are still frozen brings us to the same situation that commit
>> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
>> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
>> done inside hibernation_snapshot(), when kernel threads are frozen.. which
>> is known to break XFS, to give one example as mentioned in the changelog
>> of the above commit.
> 
> That's exactly right.
> 
>> So, the right way to fix this IMHO, would be to split up thaw_processes()
>> just like freezing phase:
>>
>> /* freezes or thaws user space processes */
>> freeze_processes() - thaw_processes()
>>
>> /* freezes or thaws kernel threads */
>> freeze_kernel_threads() - thaw_kernel_threads()
>>
>> We have to insert this thaw_kernel_threads() at appropriate places in such a
>> way as to not require another ioctl if possible... Then things would be
>> more symmetric (and hence more easy to understand) and we can avoid getting
>> into strange situations as discussed here.
>>
>> But before we venture into that, it would be good to know if the patch posted
>> in the previous mail fixes the particular problem reported in this thread,
>> atleast just to see if there are other problems lurking that we aren't aware
>> of yet..
> 
> Jiri has already said that the patch works.
> 
> I think we could avoid the issue entirely by introducing thaw_kernel_threads
> and making SNAPSHOT_FREE call it.  No other changes should be necessary.
> 
> IOW, Jiri, does the patch below help?
> 
> [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> 
> ---


This is exactly the kind of fix I was suggesting.. Thanks Rafael!

I have a small request for a comment. Please see below.
I have a question too, but for that I'll have to reply to my earlier
thread so that I can comment on the userspace code.

>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   19 +++++++++++++++++++
>  kernel/power/user.c     |    1 +
>  3 files changed, 22 insertions(+)
> 
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
>  extern int freeze_processes(void);
>  extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
> 
>  static inline bool try_to_freeze(void)
>  {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
>  static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
> 
>  static inline bool try_to_freeze(void) { return false; }
> 
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
>  	printk("done.\n");
>  }
> 
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
>  		swsusp_free();
>  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  		data->ready = 0;


It would be nice to have a comment here explaining why we call
thaw_kernel_threads() here. (Such a comment would avoid confusion when people
look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is
thawing involved, while the corresponding freezing is nowhere in sight..
Of course the freezing is hidden inside hibernation_snapshot(), but that
might not be immediately apparent to everyone.)

> +		thaw_kernel_threads();

>  		break;

> 
>  	case SNAPSHOT_PREF_IMAGE_SIZE:


Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 15:31               ` Srivatsa S. Bhat
  2012-01-25 16:00                 ` Srivatsa S. Bhat
@ 2012-01-26 19:39                 ` Srivatsa S. Bhat
  2012-01-27  1:10                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-26 19:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel


Hi Rafael,

On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote:

> 
> Ok, I will need to quote a part of the userspace utility to explain the
> problem.
> 
> In suspend.c inside the suspend-utils userspace package, I see a loop such
> as:
> 
>         error = freeze(snapshot_fd);
> 	...
>         attempts = 2;
>         do {
>                 if (set_image_size(snapshot_fd, image_size)) {
>                         error = errno;
>                         break;
>                 }
>                 if (atomic_snapshot(snapshot_fd, &in_suspend)) {
>                         error = errno;
>                         break;
>                 }
>                 if (!in_suspend) {
>                         /* first unblank the console, see console_codes(4) */
>                         printf("\e[13]");
>                         printf("%s: returned to userspace\n", my_name);
>                         free_snapshot(snapshot_fd);
>                         break;
>                 }
> 
>                 error = write_image(snapshot_fd, resume_fd, -1);
>                 if (error) {
>                         free_swap_pages(snapshot_fd);
>                         free_snapshot(snapshot_fd);
>                         image_size = 0;
>                         error = -error;
>                         if (error != ENOSPC)
>                                 break;
>                 } else {
>                         splash.progress(100);
> #ifdef CONFIG_BOTH
>                         if (s2ram_kms || s2ram) {
>                                 /* If we die (and allow system to continue)
>                                  * between now and reset_signature(), very bad
>                                  * things will happen. */
>                                 error = suspend_to_ram(snapshot_fd);
>                                 if (error)
>                                         goto Shutdown;
>                                 reset_signature(resume_fd);
>                                 free_swap_pages(snapshot_fd);
>                                 free_snapshot(snapshot_fd);
>                                 if (!s2ram_kms)
>                                         s2ram_resume();


Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is
handled. So, I was trying to see if there are any points of concern...

In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that
pose any problems because kernel threads would have been thawed at that point,
after applying your patch?

And other than that, do you foresee any problems arising from the change caused
to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are
not the only userspace utilities after all... so I just wanted to ensure that
we don't over-fit our solution to this particular utility and end up breaking
others...

Regards,
Srivatsa S. Bhat

>                                 goto Unfreeze;
>                         }
> Shutdown:
> #endif
>                         close(resume_fd);
>                         suspend_shutdown(snapshot_fd);
>                 }
>         } while (--attempts);
> 
> ...
> Unfreeze:
>         unfreeze(snapshot_fd);
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-26 19:22                       ` Srivatsa S. Bhat
@ 2012-01-27  1:01                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27  1:01 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On Thursday, January 26, 2012, Srivatsa S. Bhat wrote:
> On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:
> 
> > Hi,
> >>
> >> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
> >>
> >>         if (data->mode != O_RDONLY || !data->frozen  || data->ready) {
> >>                 error = -EPERM;
> >>                 break;
> >>         }
> >>
> >> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
> >> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
> >> This makes it possible for hibernation_snapshot() and hence
> >> freeze_workqueues_begin() to be called a second time, which is unfortunate.
> > 
> > Yes, I obviously forgot about that code path when I was working on the commit
> > that introduced the problem. :-(
> > 
> > Thanks a lot for the great analysis, it's really helpful!
> >
> 
> 
> Welcome :-) It was fun!
> 
>  
> >> And actually, the patch I posted in my previous mail is not really the right
> >> long-term fix, though it might fix the particular issue that Jiri is facing..
> >>
> >> Because, allowing hibernation_snapshot() to get called a second time while
> >> kernel threads are still frozen brings us to the same situation that commit
> >> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
> >> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
> >> done inside hibernation_snapshot(), when kernel threads are frozen.. which
> >> is known to break XFS, to give one example as mentioned in the changelog
> >> of the above commit.
> > 
> > That's exactly right.
> > 
> >> So, the right way to fix this IMHO, would be to split up thaw_processes()
> >> just like freezing phase:
> >>
> >> /* freezes or thaws user space processes */
> >> freeze_processes() - thaw_processes()
> >>
> >> /* freezes or thaws kernel threads */
> >> freeze_kernel_threads() - thaw_kernel_threads()
> >>
> >> We have to insert this thaw_kernel_threads() at appropriate places in such a
> >> way as to not require another ioctl if possible... Then things would be
> >> more symmetric (and hence more easy to understand) and we can avoid getting
> >> into strange situations as discussed here.
> >>
> >> But before we venture into that, it would be good to know if the patch posted
> >> in the previous mail fixes the particular problem reported in this thread,
> >> atleast just to see if there are other problems lurking that we aren't aware
> >> of yet..
> > 
> > Jiri has already said that the patch works.
> > 
> > I think we could avoid the issue entirely by introducing thaw_kernel_threads
> > and making SNAPSHOT_FREE call it.  No other changes should be necessary.
> > 
> > IOW, Jiri, does the patch below help?
> > 
> > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> > 
> > ---
> 
> 
> This is exactly the kind of fix I was suggesting.. Thanks Rafael!
> 
> I have a small request for a comment. Please see below.
> I have a question too, but for that I'll have to reply to my earlier
> thread so that I can comment on the userspace code.
> 
> >  include/linux/freezer.h |    2 ++
> >  kernel/power/process.c  |   19 +++++++++++++++++++
> >  kernel/power/user.c     |    1 +
> >  3 files changed, 22 insertions(+)
> > 
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> >  extern int freeze_processes(void);
> >  extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> > +extern void thaw_kernel_threads(void);
> > 
> >  static inline bool try_to_freeze(void)
> >  {
> > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> >  static inline int freeze_processes(void) { return -ENOSYS; }
> >  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> > +static inline void thaw_kernel_threads(void) {}
> > 
> >  static inline bool try_to_freeze(void) { return false; }
> > 
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -188,3 +188,22 @@ void thaw_processes(void)
> >  	printk("done.\n");
> >  }
> > 
> > +void thaw_kernel_threads(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	pm_nosig_freezing = false;
> > +	printk("Restarting kernel threads ... ");
> > +
> > +	thaw_workqueues();
> > +
> > +	read_lock(&tasklist_lock);
> > +	do_each_thread(g, p) {
> > +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> > +			__thaw_task(p);
> > +	} while_each_thread(g, p);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	schedule();
> > +	printk("done.\n");
> > +}
> > Index: linux/kernel/power/user.c
> > ===================================================================
> > --- linux.orig/kernel/power/user.c
> > +++ linux/kernel/power/user.c
> > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
> >  		swsusp_free();
> >  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
> >  		data->ready = 0;
> 
> 
> It would be nice to have a comment here explaining why we call
> thaw_kernel_threads() here.

I agree and I'll add one in the final version.

> (Such a comment would avoid confusion when people
> look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is
> thawing involved, while the corresponding freezing is nowhere in sight..
> Of course the freezing is hidden inside hibernation_snapshot(), but that
> might not be immediately apparent to everyone.)

Sure.

> > +		thaw_kernel_threads();
> 
> >  		break;
> 
> > 
> >  	case SNAPSHOT_PREF_IMAGE_SIZE:

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-26 11:05                       ` Jiri Slaby
@ 2012-01-27  1:01                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27  1:01 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo, LKML, Baohua.Song,
	pavel, Linux PM mailing list

On Thursday, January 26, 2012, Jiri Slaby wrote:
> On 01/26/2012 12:51 AM, Rafael J. Wysocki wrote:
> > IOW, Jiri, does the patch below help?
> 
> Yeah, this fixes the issue as well. Thanks.

Cool, thanks for the confirmation!

> > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> > 
> > ---
> >  include/linux/freezer.h |    2 ++
> >  kernel/power/process.c  |   19 +++++++++++++++++++
> >  kernel/power/user.c     |    1 +
> >  3 files changed, 22 insertions(+)
> > 
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> >  extern int freeze_processes(void);
> >  extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> > +extern void thaw_kernel_threads(void);
> >  
> >  static inline bool try_to_freeze(void)
> >  {
> > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> >  static inline int freeze_processes(void) { return -ENOSYS; }
> >  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> > +static inline void thaw_kernel_threads(void) {}
> >  
> >  static inline bool try_to_freeze(void) { return false; }
> >  
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -188,3 +188,22 @@ void thaw_processes(void)
> >  	printk("done.\n");
> >  }
> >  
> > +void thaw_kernel_threads(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	pm_nosig_freezing = false;
> > +	printk("Restarting kernel threads ... ");
> > +
> > +	thaw_workqueues();
> > +
> > +	read_lock(&tasklist_lock);
> > +	do_each_thread(g, p) {
> > +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> > +			__thaw_task(p);
> > +	} while_each_thread(g, p);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	schedule();
> > +	printk("done.\n");
> > +}
> > Index: linux/kernel/power/user.c
> > ===================================================================
> > --- linux.orig/kernel/power/user.c
> > +++ linux/kernel/power/user.c
> > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
> >  		swsusp_free();
> >  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
> >  		data->ready = 0;
> > +		thaw_kernel_threads();
> >  		break;
> >  
> >  	case SNAPSHOT_PREF_IMAGE_SIZE:
> 
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG  at kernel/workqueue.c:3659
  2012-01-26 19:39                 ` Srivatsa S. Bhat
@ 2012-01-27  1:10                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27  1:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel

On Thursday, January 26, 2012, Srivatsa S. Bhat wrote:
> 
> Hi Rafael,
> 
> On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote:
> 
> > 
> > Ok, I will need to quote a part of the userspace utility to explain the
> > problem.
> > 
> > In suspend.c inside the suspend-utils userspace package, I see a loop such
> > as:
> > 
> >         error = freeze(snapshot_fd);
> > 	...
> >         attempts = 2;
> >         do {
> >                 if (set_image_size(snapshot_fd, image_size)) {
> >                         error = errno;
> >                         break;
> >                 }
> >                 if (atomic_snapshot(snapshot_fd, &in_suspend)) {
> >                         error = errno;
> >                         break;
> >                 }
> >                 if (!in_suspend) {
> >                         /* first unblank the console, see console_codes(4) */
> >                         printf("\e[13]");
> >                         printf("%s: returned to userspace\n", my_name);
> >                         free_snapshot(snapshot_fd);
> >                         break;
> >                 }
> > 
> >                 error = write_image(snapshot_fd, resume_fd, -1);
> >                 if (error) {
> >                         free_swap_pages(snapshot_fd);
> >                         free_snapshot(snapshot_fd);
> >                         image_size = 0;
> >                         error = -error;
> >                         if (error != ENOSPC)
> >                                 break;
> >                 } else {
> >                         splash.progress(100);
> > #ifdef CONFIG_BOTH
> >                         if (s2ram_kms || s2ram) {
> >                                 /* If we die (and allow system to continue)
> >                                  * between now and reset_signature(), very bad
> >                                  * things will happen. */
> >                                 error = suspend_to_ram(snapshot_fd);
> >                                 if (error)
> >                                         goto Shutdown;
> >                                 reset_signature(resume_fd);
> >                                 free_swap_pages(snapshot_fd);
> >                                 free_snapshot(snapshot_fd);
> >                                 if (!s2ram_kms)
> >                                         s2ram_resume();
> 
> 
> Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is
> handled. So, I was trying to see if there are any points of concern...
> 
> In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that
> pose any problems because kernel threads would have been thawed at that point,
> after applying your patch?

No, it shouldn't.  s2ram_resume() only executes quirks needed to restore the
state of graphics if KMS is not being used.  That shouldn't interfere with
any kernel threads.

> And other than that, do you foresee any problems arising from the change caused
> to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are
> not the only userspace utilities after all... so I just wanted to ensure that
> we don't over-fit our solution to this particular utility and end up breaking
> others...

I'm quite sure they are the only package using the interface in
kernel/power/user.c.  At least, I'm not aware of any other users. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 23:51                     ` Rafael J. Wysocki
  2012-01-26 11:05                       ` Jiri Slaby
  2012-01-26 19:22                       ` Srivatsa S. Bhat
@ 2012-01-27 10:04                       ` Srivatsa S. Bhat
  2012-01-27 22:44                         ` Rafael J. Wysocki
  2 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-27 10:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:

> Jiri has already said that the patch works.
> 
> I think we could avoid the issue entirely by introducing thaw_kernel_threads
> and making SNAPSHOT_FREE call it.  No other changes should be necessary.
> 
> IOW, Jiri, does the patch below help?
> 
> [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> 
> ---


Rafael, thanks for the clarification in the other mail.

Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

I think we need to get this into 3.2-stable as well, as Jiri mentioned.

Regards,
Srivatsa S. Bhat

>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   19 +++++++++++++++++++
>  kernel/power/user.c     |    1 +
>  3 files changed, 22 insertions(+)
> 
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
>  extern int freeze_processes(void);
>  extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
> 
>  static inline bool try_to_freeze(void)
>  {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
>  static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
> 
>  static inline bool try_to_freeze(void) { return false; }
> 
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
>  	printk("done.\n");
>  }
> 
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
>  		swsusp_free();
>  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  		data->ready = 0;
> +		thaw_kernel_threads();
>  		break;
> 
>  	case SNAPSHOT_PREF_IMAGE_SIZE:
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-27 10:04                       ` Srivatsa S. Bhat
@ 2012-01-27 22:44                         ` Rafael J. Wysocki
  2012-01-28 15:41                           ` Srivatsa S. Bhat
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27 22:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On Friday, January 27, 2012, Srivatsa S. Bhat wrote:
> On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:
> 
> > Jiri has already said that the patch works.
> > 
> > I think we could avoid the issue entirely by introducing thaw_kernel_threads
> > and making SNAPSHOT_FREE call it.  No other changes should be necessary.
> > 
> > IOW, Jiri, does the patch below help?
> > 
> > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> > 
> > ---
> 
> 
> Rafael, thanks for the clarification in the other mail.
> 
> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> 
> I think we need to get this into 3.2-stable as well, as Jiri mentioned.

Yes, we do.

Below it goes again with the missing comment in user.c and a changelog.

Please let me know if there's anything wrong with it, otherwise I'm going
to push it to Linus in a couple of days.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues

Commit 2aede851ddf08666f68ffc17be446420e9d2a056

  PM / Hibernate: Freeze kernel threads after preallocating memory

introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that worqueues had been already frozen.

Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
attempted to avoid into that particular code path.  For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.

Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.

Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   19 +++++++++++++++++++
 kernel/power/user.c     |    9 +++++++++
 3 files changed, 30 insertions(+)

Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
 extern int freeze_processes(void);
 extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
+extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
@@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
 static inline int freeze_processes(void) { return -ENOSYS; }
 static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
+static inline void thaw_kernel_threads(void) {}
 
 static inline bool try_to_freeze(void) { return false; }
 
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -188,3 +188,22 @@ void thaw_processes(void)
 	printk("done.\n");
 }
 
+void thaw_kernel_threads(void)
+{
+	struct task_struct *g, *p;
+
+	pm_nosig_freezing = false;
+	printk("Restarting kernel threads ... ");
+
+	thaw_workqueues();
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, p) {
+		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+			__thaw_task(p);
+	} while_each_thread(g, p);
+	read_unlock(&tasklist_lock);
+
+	schedule();
+	printk("done.\n");
+}
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
 		swsusp_free();
 		memset(&data->handle, 0, sizeof(struct snapshot_handle));
 		data->ready = 0;
+		/*
+		 * It is necessary to thaw kernel threads here, because
+		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
+		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
+		 * thawed, the preallocation of memory carried out by
+		 * hibernation_snapshot() might run into problems (i.e. it
+		 * might fail or even deadlock).
+		 */
+		thaw_kernel_threads();
 		break;
 
 	case SNAPSHOT_PREF_IMAGE_SIZE:

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-27 22:44                         ` Rafael J. Wysocki
@ 2012-01-28 15:41                           ` Srivatsa S. Bhat
  2012-01-29  0:14                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-28 15:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On 01/28/2012 04:14 AM, Rafael J. Wysocki wrote:

>> Rafael, thanks for the clarification in the other mail.
>>
>> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> I think we need to get this into 3.2-stable as well, as Jiri mentioned.
> 
> Yes, we do.
> 
> Below it goes again with the missing comment in user.c and a changelog.
> 
> Please let me know if there's anything wrong with it, otherwise I'm going
> to push it to Linus in a couple of days.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues
> 
> Commit 2aede851ddf08666f68ffc17be446420e9d2a056
> 
>   PM / Hibernate: Freeze kernel threads after preallocating memory
> 
> introduced a mechanism by which kernel threads were frozen after
> the preallocation of hibernate image memory to avoid problems with
> frozen kernel threads not responding to memory freeing requests.
> However, it overlooked the s2disk code path in which the
> SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
> which caused freeze_workqueues_begin() to BUG(), because it saw
> that worqueues had been already frozen.
> 
> Although in principle this issue might be addressed by removing
> the relevant BUG_ON() from freeze_workqueues_begin(), that would
> reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
> attempted to avoid into that particular code path.  For this reason,
> to fix the issue at hand, introduce thaw_kernel_threads() and make
> the SNAPSHOT_FREE ioctl execute it.
> 
> Special thanks to Srivatsa S. Bhat for detailed analysis of the
> problem.
> 
> Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>


Cc: stable@vger.kernel.org ?

Other than that, the patch is perfect!

Regards,
Srivatsa S. Bhat

> ---
>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   19 +++++++++++++++++++
>  kernel/power/user.c     |    9 +++++++++
>  3 files changed, 30 insertions(+)
> 
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
>  extern int freeze_processes(void);
>  extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
> 
>  static inline bool try_to_freeze(void)
>  {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
>  static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
> 
>  static inline bool try_to_freeze(void) { return false; }
> 
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
>  	printk("done.\n");
>  }
> 
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
>  		swsusp_free();
>  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  		data->ready = 0;
> +		/*
> +		 * It is necessary to thaw kernel threads here, because
> +		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
> +		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
> +		 * thawed, the preallocation of memory carried out by
> +		 * hibernation_snapshot() might run into problems (i.e. it
> +		 * might fail or even deadlock).
> +		 */
> +		thaw_kernel_threads();
>  		break;
> 
>  	case SNAPSHOT_PREF_IMAGE_SIZE:



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-28 15:41                           ` Srivatsa S. Bhat
@ 2012-01-29  0:14                             ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-29  0:14 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On Saturday, January 28, 2012, Srivatsa S. Bhat wrote:
> On 01/28/2012 04:14 AM, Rafael J. Wysocki wrote:
> 
> >> Rafael, thanks for the clarification in the other mail.
> >>
> >> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >>
> >> I think we need to get this into 3.2-stable as well, as Jiri mentioned.
> > 
> > Yes, we do.
> > 
> > Below it goes again with the missing comment in user.c and a changelog.
> > 
> > Please let me know if there's anything wrong with it, otherwise I'm going
> > to push it to Linus in a couple of days.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues
> > 
> > Commit 2aede851ddf08666f68ffc17be446420e9d2a056
> > 
> >   PM / Hibernate: Freeze kernel threads after preallocating memory
> > 
> > introduced a mechanism by which kernel threads were frozen after
> > the preallocation of hibernate image memory to avoid problems with
> > frozen kernel threads not responding to memory freeing requests.
> > However, it overlooked the s2disk code path in which the
> > SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
> > which caused freeze_workqueues_begin() to BUG(), because it saw
> > that worqueues had been already frozen.
> > 
> > Although in principle this issue might be addressed by removing
> > the relevant BUG_ON() from freeze_workqueues_begin(), that would
> > reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
> > attempted to avoid into that particular code path.  For this reason,
> > to fix the issue at hand, introduce thaw_kernel_threads() and make
> > the SNAPSHOT_FREE ioctl execute it.
> > 
> > Special thanks to Srivatsa S. Bhat for detailed analysis of the
> > problem.
> > 
> > Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> 
> 
> Cc: stable@vger.kernel.org ?

That's going to show up in the actual commit.

> Other than that, the patch is perfect!

Cool, thanks!

Rafael


> > ---
> >  include/linux/freezer.h |    2 ++
> >  kernel/power/process.c  |   19 +++++++++++++++++++
> >  kernel/power/user.c     |    9 +++++++++
> >  3 files changed, 30 insertions(+)
> > 
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> >  extern int freeze_processes(void);
> >  extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> > +extern void thaw_kernel_threads(void);
> > 
> >  static inline bool try_to_freeze(void)
> >  {
> > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> >  static inline int freeze_processes(void) { return -ENOSYS; }
> >  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> > +static inline void thaw_kernel_threads(void) {}
> > 
> >  static inline bool try_to_freeze(void) { return false; }
> > 
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -188,3 +188,22 @@ void thaw_processes(void)
> >  	printk("done.\n");
> >  }
> > 
> > +void thaw_kernel_threads(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	pm_nosig_freezing = false;
> > +	printk("Restarting kernel threads ... ");
> > +
> > +	thaw_workqueues();
> > +
> > +	read_lock(&tasklist_lock);
> > +	do_each_thread(g, p) {
> > +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> > +			__thaw_task(p);
> > +	} while_each_thread(g, p);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	schedule();
> > +	printk("done.\n");
> > +}
> > Index: linux/kernel/power/user.c
> > ===================================================================
> > --- linux.orig/kernel/power/user.c
> > +++ linux/kernel/power/user.c
> > @@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
> >  		swsusp_free();
> >  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
> >  		data->ready = 0;
> > +		/*
> > +		 * It is necessary to thaw kernel threads here, because
> > +		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
> > +		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
> > +		 * thawed, the preallocation of memory carried out by
> > +		 * hibernation_snapshot() might run into problems (i.e. it
> > +		 * might fail or even deadlock).
> > +		 */
> > +		thaw_kernel_threads();
> >  		break;
> > 
> >  	case SNAPSHOT_PREF_IMAGE_SIZE:
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-01-29  0:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-24 15:05 PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 Jiri Slaby
2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat
2012-01-24 16:24   ` Jiri Slaby
2012-01-24 22:36     ` Rafael J. Wysocki
2012-01-24 22:47       ` Jiri Slaby
2012-01-24 23:02         ` Rafael J. Wysocki
2012-01-25  0:04           ` Jiri Slaby
2012-01-25  0:10             ` Rafael J. Wysocki
2012-01-25 14:25               ` Jiri Slaby
2012-01-25 15:31               ` Srivatsa S. Bhat
2012-01-25 16:00                 ` Srivatsa S. Bhat
2012-01-25 18:44                   ` Srivatsa S. Bhat
2012-01-25 23:51                     ` Rafael J. Wysocki
2012-01-26 11:05                       ` Jiri Slaby
2012-01-27  1:01                         ` Rafael J. Wysocki
2012-01-26 19:22                       ` Srivatsa S. Bhat
2012-01-27  1:01                         ` Rafael J. Wysocki
2012-01-27 10:04                       ` Srivatsa S. Bhat
2012-01-27 22:44                         ` Rafael J. Wysocki
2012-01-28 15:41                           ` Srivatsa S. Bhat
2012-01-29  0:14                             ` Rafael J. Wysocki
2012-01-25 22:00                   ` Jiri Slaby
2012-01-26 19:39                 ` Srivatsa S. Bhat
2012-01-27  1:10                   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).