* PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
@ 2012-01-24 15:05 Jiri Slaby
  2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Slaby @ 2012-01-24 15:05 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

Hi,

this is a freshly booted system. When I do s2disk, I see:
...
Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
------------[ cut here ]------------
kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in:

Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 Bochs Bochs
RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
RSP: 0018:ffff880046f01d68 EFLAGS: 00010292
RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9
RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c
RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004
FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process s2disk (pid: 2669, threadinfo ffff880046f00000, task ffff880047251980)
Stack:
ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc
ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9
00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1
Call Trace:
[<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0
[<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90
[<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0
[<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0
[<ffffffff81130856>] do_vfs_ioctl+0x96/0x550
[<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180
[<ffffffff81130d5a>] sys_ioctl+0x4a/0x80
[<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89
RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
RSP <ffff880046f01d68>
---[ end trace 632574abdc098963 ]---

I wanted to reproduce a different crash which happens on 3.2.1. When there is
not enough free swap, the kernel crashes (when s2disk is used). I will post
more info after the one above is resolved and I can reproduce the latter :).

thanks,
-- 
js
suse labs

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Srivatsa S. Bhat @ 2012-01-24 16:18 UTC
  To: Jiri Slaby; +Cc: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

Hi Jiri,

On 01/24/2012 08:35 PM, Jiri Slaby wrote:
> Hi,
>
> this is a freshly booted system. When I do s2disk, I see:
> ...
> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> ------------[ cut here ]------------
> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
> [...]
> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0
> RSP <ffff880046f01d68>
> ---[ end trace 632574abdc098963 ]---

I couldn't find any obvious root cause from a quick check. Is this completely
reproducible upon a fresh boot?

Regards,
Srivatsa S. Bhat
IBM Linux Technology Center
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Jiri Slaby @ 2012-01-24 16:24 UTC
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby

On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> Hi Jiri,
>
> On 01/24/2012 08:35 PM, Jiri Slaby wrote:
>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>> ------------[ cut here ]------------
>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
>> [...]
>
> I couldn't find any obvious root-cause from a quick check. Is this completely
> reproducible upon a fresh boot?

True.

The cause is that the function is called twice:

# s2disk
PM: Marking nosave pages: 000000000009f000 - 0000000000100000
PM: Basic memory bitmaps created
Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
PM: Preallocating image memory... done (allocated 264170 pages)
PM: Allocated 1056680 kbytes in 0.45 seconds (2348.17 MB/s)
Freezing remaining freezable tasks ... Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628
Call Trace:
[<ffffffff8109edd2>] ? getnstimeofday+0x52/0xf0
[<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
[<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
[<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
[<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
[<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
[<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
[<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
[<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
[<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
(elapsed 0.03 seconds) done.
PM: freeze of devices complete after 57.216 msecs
PM: late freeze of devices complete after 0.720 msecs
ACPI: Preparing to enter system sleep state S4
PM: Saving platform NVS memory
Disabling non-boot CPUs ...
CPU 1 is now offline
PM: Creating hibernation image:
PM: Need to copy 24112 pages
PM: Normal pages needed: 24112 + 1024, available pages: 277076
PM: Hibernation image created (24112 pages copied)
Enabling non-boot CPUs ...
Booting Node 0 Processor 1 APIC 0x1
smpboot cpu 1: start_ip = 9a000
Calibrating delay loop (skipped) already calibrated this CPU
NMI watchdog disabled (cpu1): hardware events not enabled
CPU1 is up
ACPI: Waking up from system sleep state S4
PM: early thaw of devices complete after 0.093 msecs
ata_piix 0000:00:01.1: setting latency timer to 64
e1000 0000:00:03.0: setting latency timer to 64
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
ata2.01: NODEV after polling detection
ata1.00: configured for MWDMA2
ata2.00: configured for MWDMA2
ata1.01: configured for MWDMA2
sd 0:0:1:0: [sdb] Starting disk
sd 0:0:0:0: [sda] Starting disk
PM: thaw of devices complete after 192.317 msecs
PM: Preallocating image memory... done (allocated 264295 pages)
PM: Allocated 1057180 kbytes in 0.12 seconds (8809.83 MB/s)
Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
------------[ cut here ]------------
kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in:

Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628 Bochs Bochs
RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0
RSP: 0018:ffff8800456d7d68 EFLAGS: 00010292
RAX: 0000000000000023 RBX: 0000000000000001 RCX: 0000000000000073
RDX: 00000000000000ea RSI: 0000000000000046 RDI: ffffffff81b51f7c
RBP: ffff8800456d7d98 R08: ffffffff81a9d760 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007fff11d8709c R14: ffffffffffffffff R15: 0000000000000004
FS: 00007f4744397700(0000) GS:ffff880049700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001f46bc0 CR3: 000000004483b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process s2disk (pid: 2650, threadinfo ffff8800456d6000, task ffff880046c93300)
Stack:
ffff8800456d7d98 0000000000000001 0000000000000000 00007fff11d8709c
ffffffffffffffff 0000000000000004 ffff8800456d7e18 ffffffff81096cc9
00000000ffff008f 0000000000000004 ffff8800456d7e18 000000004f1edaaf
Call Trace:
[<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
[<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
[<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
[<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
[<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
[<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
[<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
[<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 1d 94 5a 00 0f 0b 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 06 94 5a 00 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b
RIP [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0
RSP <ffff8800456d7d68>
---[ end trace 696a23a038efce41 ]---

-- 
js
suse labs
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Rafael J. Wysocki @ 2012-01-24 22:36 UTC
  To: Jiri Slaby; +Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On Tuesday, January 24, 2012, Jiri Slaby wrote:
> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote:
> > I couldn't find any obvious root-cause from a quick check. Is this completely
> > reproducible upon a fresh boot?
>
> True.
>
> The cause is that the function is called twice:

Which function?

Rafael
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Jiri Slaby @ 2012-01-24 22:47 UTC
  To: Rafael J. Wysocki
  Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
> On Tuesday, January 24, 2012, Jiri Slaby wrote:
>> The cause is that the function is called twice:
>
> Which function?

The one where the BUG is. Maybe the function which should clear the
flag is not called in between? See:

>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
                        ^^^^^^^^^^^^^^^^^^^^^^^
>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b
>> (elapsed 0.03 seconds) done.
...
>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
>> ------------[ cut here ]------------
>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659!
...
>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371>]
>> freeze_workqueues_begin+0x1a1/0x1b0
   ^^^^^^^^^^^^^^^^^^^^^^^
>> Call Trace:
>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0
>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90
>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0
>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0
>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550
>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180
>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80
>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b

-- 
js
suse labs
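The hypothesis in the message above (a second freeze_workqueues_begin() call arriving while 'workqueue_freezing' is still set) can be illustrated with a small user-space model. This is only a sketch under assumptions: the model_* names are hypothetical and are not kernel API, the real check in kernel/workqueue.c is a BUG rather than an error return, and nothing here confirms which kernel path actually fails to clear the flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; these names are illustrative, not kernel API. */
static bool model_freezing; /* stands in for the 'workqueue_freezing' flag */

/* Returns 0 on success, -1 at the point where the kernel would hit the BUG. */
static int model_freeze_begin(void)
{
	if (model_freezing) {
		fprintf(stderr, "BUG: 'workqueue_freezing' is true!\n");
		return -1;
	}
	model_freezing = true;
	return 0;
}

/* Models whatever step is supposed to clear the flag between two freezes. */
static void model_thaw(void)
{
	model_freezing = false;
}

/*
 * Models the sequence visible in the log: kernel threads are frozen once
 * for the first snapshot attempt and then again for the second one.
 * thaw_between selects whether the flag is cleared in between.
 */
static int model_two_snapshots(bool thaw_between)
{
	model_freezing = false; /* fresh boot */
	if (model_freeze_begin() != 0)
		return -1;
	if (thaw_between)
		model_thaw();
	return model_freeze_begin();
}
```

In this model, model_two_snapshots(true) returns 0, while model_two_snapshots(false) prints the BUG message and returns -1, which matches the shape of the second "Freezing remaining freezable tasks" attempt in the log.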
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Rafael J. Wysocki @ 2012-01-24 23:02 UTC
  To: Jiri Slaby; +Cc: Srivatsa S. Bhat, Linux-pm mailing list, LKML, Jiri Slaby

On Tuesday, January 24, 2012, Jiri Slaby wrote:
> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote:
> > Which function?
>
> The one where the BUG is. Maybe the function which should clear the
> flag is not called in between? See:
>
> >> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0
>                         ^^^^^^^^^^^^^^^^^^^^^^^
> [...]
> >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true!
> [...]

Ah. So this is linux-next, right?

Can you please test the linux-next branch of the linux-pm tree and see if
the problem is reproducible in there?

Rafael
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
From: Jiri Slaby @ 2012-01-25 0:04 UTC
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML

On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote:
> Ah. So this is linux-next, right?

Right.

> Can you please test the linux-next branch of the linux-pm tree and see if
> the problem is reproducible in there?

Yeah, 100%. Just try it with a small enough swap.

thanks,
-- 
js
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 0:04 ` Jiri Slaby @ 2012-01-25 0:10 ` Rafael J. Wysocki 2012-01-25 14:25 ` Jiri Slaby 2012-01-25 15:31 ` Srivatsa S. Bhat 0 siblings, 2 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2012-01-25 0:10 UTC (permalink / raw) To: Jiri Slaby; +Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML On Wednesday, January 25, 2012, Jiri Slaby wrote: > On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: > > On Tuesday, January 24, 2012, Jiri Slaby wrote: > >> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: > >>> On Tuesday, January 24, 2012, Jiri Slaby wrote: > >>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: > >>>>> Hi Jiri, > >>>>> > >>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> this is a freshly booted system. When I do s2dsk, I see: > >>>>>> ... > >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >>>>>> ------------[ cut here ]------------ > >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! 
> >>>>>> invalid opcode: 0000 [#1] SMP > >>>>>> CPU 0 > >>>>>> Modules linked in: > >>>>>> > >>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 > >>>>>> Bochs Bochs > >>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] > >>>>>> freeze_workqueues_begin+0x195/0x1a0 > >>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 > >>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 > >>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c > >>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 > >>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 > >>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task > >>>>>> ffff880047251980) > >>>>>> Stack: > >>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc > >>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 > >>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 > >>>>>> Call Trace: > >>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 > >>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 > >>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 > >>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 > >>>>>> [<ffffffff8111ff7b>] ? 
vfs_write+0x10b/0x180 > >>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 > >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b > >>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b > >>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 > >>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 > >>>>>> RSP <ffff880046f01d68> > >>>>>> ---[ end trace 632574abdc098963 ]--- > >>>>>> > >>>>> > >>>>> > >>>>> I couldn't find any obvious root-cause from a quick check. Is this completely > >>>>> reproducible upon a fresh boot? > >>>> > >>>> True. > >>>> > >>>> The cause is that the function is called twice: > >>> > >>> Which function? > >> > >> The one where the BUG is. Maybe the functions which should clear the > >> flag is not called in between? See: > >> > >>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 > >> ^^^^^^^^^^^^^^^^^^^^^^^ > >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >>>> (elapsed 0.03 seconds) done. > >> ... > >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >>>> ------------[ cut here ]------------ > >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > >> ... 
> >>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> > >>>> freeze_workqueues_begin+0x1a1/0x1b0 > >> ^^^^^^^^^^^^^^^^^^^^^^^ > >>>> Call Trace: > >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > > > > Ah. So this is linux-next, right? > > Right. > > > Can you please test the linux-next branch of the linux-pm tree and see if > > the problem is reproducible in there? > > Yeah, 100%. Just try it with a small enough swap. Ah, thanks, so that's an error code path problem and most likely in the Linus' tree. Srivatsa, any ideas? Rafael ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25  0:10 ` Rafael J. Wysocki
@ 2012-01-25 14:25 ` Jiri Slaby
  2012-01-25 15:31 ` Srivatsa S. Bhat
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Slaby @ 2012-01-25 14:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Srivatsa S. Bhat, Linux-pm mailing list, LKML

On 01/25/2012 01:10 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 25, 2012, Jiri Slaby wrote:
>> Yeah, 100%. Just try it with a small enough swap.
>
> Ah, thanks, so that's an error code path problem and most likely in the Linus'
> tree.

Just to emphasize what I wrote earlier: I would appreciate having this fixed
in 3.2 stable too.

--
js

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 0:10 ` Rafael J. Wysocki 2012-01-25 14:25 ` Jiri Slaby @ 2012-01-25 15:31 ` Srivatsa S. Bhat 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-26 19:39 ` Srivatsa S. Bhat 1 sibling, 2 replies; 24+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 15:31 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song, Tejun Heo, pavel On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote: > On Wednesday, January 25, 2012, Jiri Slaby wrote: >> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: >>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: >>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>>>>>> Hi Jiri, >>>>>>> >>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> this is a freshly booted system. When I do s2dsk, I see: >>>>>>>> ... >>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>>> ------------[ cut here ]------------ >>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! 
>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>> CPU 0 >>>>>>>> Modules linked in: >>>>>>>> >>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>>>>>> Bochs Bochs >>>>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>>>>>> freeze_workqueues_begin+0x195/0x1a0 >>>>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>>>>>> ffff880047251980) >>>>>>>> Stack: >>>>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>>>>>> Call Trace: >>>>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>>>>>> [<ffffffff8111ff7b>] ? 
vfs_write+0x10b/0x180 >>>>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>>>>>> RSP <ffff880046f01d68> >>>>>>>> ---[ end trace 632574abdc098963 ]--- >>>>>>>> >>>>>>> >>>>>>> >>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely >>>>>>> reproducible upon a fresh boot? >>>>>> >>>>>> True. >>>>>> >>>>>> The cause is that the function is called twice: >>>>> >>>>> Which function? >>>> >>>> The one where the BUG is. Maybe the functions which should clear the >>>> flag is not called in between? See: >>>> >>>>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 >>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>> (elapsed 0.03 seconds) done. >>>> ... >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>> ------------[ cut here ]------------ >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>> ... 
>>>>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >>>>>> freeze_workqueues_begin+0x1a1/0x1b0 >>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>> Call Trace: >>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>> >>> Ah. So this is linux-next, right? >> >> Right. >> >>> Can you please test the linux-next branch of the linux-pm tree and see if >>> the problem is reproducible in there? >> >> Yeah, 100%. Just try it with a small enough swap. > > Ah, thanks, so that's an error code path problem and most likely in the Linus' > tree. > > Srivatsa, any ideas? > Ok, I will need to quote a part of the userspace utility to explain the problem. In suspend.c inside the suspend-utils userspace package, I see a loop such as: error = freeze(snapshot_fd); ... attempts = 2; do { if (set_image_size(snapshot_fd, image_size)) { error = errno; break; } if (atomic_snapshot(snapshot_fd, &in_suspend)) { error = errno; break; } if (!in_suspend) { /* first unblank the console, see console_codes(4) */ printf("\e[13]"); printf("%s: returned to userspace\n", my_name); free_snapshot(snapshot_fd); break; } error = write_image(snapshot_fd, resume_fd, -1); if (error) { free_swap_pages(snapshot_fd); free_snapshot(snapshot_fd); image_size = 0; error = -error; if (error != ENOSPC) break; } else { splash.progress(100); #ifdef CONFIG_BOTH if (s2ram_kms || s2ram) { /* If we die (and allow system to continue) * between now and reset_signature(), very bad * things will happen. 
*/ error = suspend_to_ram(snapshot_fd); if (error) goto Shutdown; reset_signature(resume_fd); free_swap_pages(snapshot_fd); free_snapshot(snapshot_fd); if (!s2ram_kms) s2ram_resume(); goto Unfreeze; } Shutdown: #endif close(resume_fd); suspend_shutdown(snapshot_fd); } } while (--attempts); ... Unfreeze: unfreeze(snapshot_fd); Let me reply to this thread so that I can comment on the above code. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 15:31 ` Srivatsa S. Bhat @ 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-25 18:44 ` Srivatsa S. Bhat 2012-01-25 22:00 ` Jiri Slaby 2012-01-26 19:39 ` Srivatsa S. Bhat 1 sibling, 2 replies; 24+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 16:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song, Tejun Heo, pavel On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote: > On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote: > >> On Wednesday, January 25, 2012, Jiri Slaby wrote: >>> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: >>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: >>>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>>>>>>> Hi Jiri, >>>>>>>> >>>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> this is a freshly booted system. When I do s2dsk, I see: >>>>>>>>> ... >>>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>>>> ------------[ cut here ]------------ >>>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! 
>>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>> CPU 0 >>>>>>>>> Modules linked in: >>>>>>>>> >>>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>>>>>>> Bochs Bochs >>>>>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>>>>>>> freeze_workqueues_begin+0x195/0x1a0 >>>>>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>>>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>>>>>>> ffff880047251980) >>>>>>>>> Stack: >>>>>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>>>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>>>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>>>>>>> Call Trace: >>>>>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>>>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>>>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>>>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>>>>>>> [<ffffffff8111ff7b>] ? 
vfs_write+0x10b/0x180 >>>>>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>>>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>>>>>>> RSP <ffff880046f01d68> >>>>>>>>> ---[ end trace 632574abdc098963 ]--- >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely >>>>>>>> reproducible upon a fresh boot? >>>>>>> >>>>>>> True. >>>>>>> >>>>>>> The cause is that the function is called twice: >>>>>> >>>>>> Which function? >>>>> >>>>> The one where the BUG is. Maybe the functions which should clear the >>>>> flag is not called in between? See: >>>>> >>>>>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 >>>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>> (elapsed 0.03 seconds) done. >>>>> ... >>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>> ------------[ cut here ]------------ >>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>>> ... 
>>>>>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >>>>>>> freeze_workqueues_begin+0x1a1/0x1b0 >>>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>>> Call Trace: >>>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>> >>>> Ah. So this is linux-next, right? >>> >>> Right. >>> >>>> Can you please test the linux-next branch of the linux-pm tree and see if >>>> the problem is reproducible in there? >>> >>> Yeah, 100%. Just try it with a small enough swap. >> >> Ah, thanks, so that's an error code path problem and most likely in the Linus' >> tree. >> >> Srivatsa, any ideas? >> > > > Ok, I will need to quote a part of the userspace utility to explain the > problem. > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) split up the freezing phase and moved the freezing of kernel threads to hibernation_snapshot() function. (This is a pretty old commit btw, I think it came in sometime around 3.2). And there is a BUG_ON() in freeze_workqueues_begin() which complains if this function is called when the kernel threads are already frozen. Now, coming to the userspace utility: > In suspend.c inside the suspend-utils userspace package, I see a loop such > as: > > error = freeze(snapshot_fd); > ... > attempts = 2; > do { > if (set_image_size(snapshot_fd, image_size)) { > error = errno; > break; > } > if (atomic_snapshot(snapshot_fd, &in_suspend)) { ^^^ This ends up in the call to hibernation_snapshot(). 
> error = errno; > break; > } > if (!in_suspend) { > /* first unblank the console, see console_codes(4) */ > printf("\e[13]"); > printf("%s: returned to userspace\n", my_name); > free_snapshot(snapshot_fd); > break; > } > > error = write_image(snapshot_fd, resume_fd, -1); > if (error) { Since swap space is limited, we come here. > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > image_size = 0; > error = -error; > if (error != ENOSPC) > break; And because of the above 'if' condition, we don't break here. IOW, we will run the loop a second time. > } else { > splash.progress(100); > #ifdef CONFIG_BOTH > if (s2ram_kms || s2ram) { > /* If we die (and allow system to continue) > * between now and reset_signature(), very bad > * things will happen. */ > error = suspend_to_ram(snapshot_fd); > if (error) > goto Shutdown; > reset_signature(resume_fd); > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > if (!s2ram_kms) > s2ram_resume(); > goto Unfreeze; > } > Shutdown: > #endif > close(resume_fd); > suspend_shutdown(snapshot_fd); > } > } while (--attempts); > > ... > Unfreeze: > unfreeze(snapshot_fd); > > So, since the loop is run a second time if not enough space was available, we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a second time - and this is what makes the BUG_ON() to fire! To solve this, I feel we can simply omit the BUG_ON() inside freeze_workqueues_begin() and replace it with a simple check (just like what we do in thaw_workqueues())... And moreover, after the change that moved the freezing of kernel threads to hibernation_snapshot(), it is quite natural to hit a scenario such as this, because the entire freezing phase is not in one single place. IOW, the existing BUG_ON() is not qualified to be there anymore! (Also, exposing a straight-forward method like this to userspace, to trigger a BUG_ON sounds ridiculous to start with!) 
Of course, the other method would be to refactor the kernel code and stuff
like that, but that would not only be messy, it would also involve breaking
existing userspace applications, AFAICS.

So, Jiri, can you please try the following patch and see if it works for you
as expected? I'll be happy to provide a formal patch with a changelog if this
works.

---
 kernel/workqueue.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bec7b5b..cb26c5d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void)
 
 	spin_lock(&workqueue_lock);
 
-	BUG_ON(workqueue_freezing);
+	if (workqueue_freezing)
+		goto out_unlock;
+
 	workqueue_freezing = true;
 
 	for_each_gcwq_cpu(cpu) {
@@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void)
 		spin_unlock_irq(&gcwq->lock);
 	}
 
+out_unlock:
 	spin_unlock(&workqueue_lock);
 }
 
--
1.7.6

^ permalink raw reply related	[flat|nested] 24+ messages in thread
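The difference between the original BUG_ON() and the check proposed in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct wq_model, freeze_begin_strict(), freeze_begin_guarded() and run_attempts() are hypothetical names made up for illustration, there is no locking, and the "BUG" is merely counted instead of crashing. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the names echo the kernel ones but are hypothetical. */
struct wq_model {
	bool freezing;      /* stands in for workqueue_freezing */
	int bug_hits;       /* times the BUG_ON() condition was met */
	int freeze_passes;  /* times the freeze body actually ran */
};

/* Original behaviour: a second call while the flag is set trips BUG_ON(). */
static void freeze_begin_strict(struct wq_model *m)
{
	if (m->freezing) {
		m->bug_hits++;  /* the real kernel would oops here */
		return;
	}
	m->freezing = true;
	m->freeze_passes++;
}

/* Behaviour with the proposed check: a repeated call is a no-op. */
static void freeze_begin_guarded(struct wq_model *m)
{
	if (m->freezing)
		return;         /* corresponds to "goto out_unlock" */
	m->freezing = true;
	m->freeze_passes++;
}

/*
 * Model of the s2disk retry loop: every attempt reaches the freeze
 * function and nothing clears the flag between attempts.
 */
static struct wq_model run_attempts(int attempts, bool guarded)
{
	struct wq_model m = { false, 0, 0 };

	assert(attempts >= 0);
	for (int i = 0; i < attempts; i++) {
		if (guarded)
			freeze_begin_guarded(&m);
		else
			freeze_begin_strict(&m);
	}
	return m;
}
```

With two attempts, as in the suspend-utils loop, the strict variant records one BUG hit on the second pass, while the guarded variant records none and still runs the freeze body only once.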
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 16:00 ` Srivatsa S. Bhat @ 2012-01-25 18:44 ` Srivatsa S. Bhat 2012-01-25 23:51 ` Rafael J. Wysocki 2012-01-25 22:00 ` Jiri Slaby 1 sibling, 1 reply; 24+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 18:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song, Tejun Heo, pavel, Linux PM mailing list > > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating > memory) split up the freezing phase and moved the freezing of kernel threads > to hibernation_snapshot() function. > (This is a pretty old commit btw, I think it came in sometime around 3.2). > > And there is a BUG_ON() in freeze_workqueues_begin() which complains if > this function is called when the kernel threads are already frozen. > > Now, coming to the userspace utility: > >> In suspend.c inside the suspend-utils userspace package, I see a loop such >> as: >> >> error = freeze(snapshot_fd); >> ... >> attempts = 2; >> do { >> if (set_image_size(snapshot_fd, image_size)) { >> error = errno; >> break; >> } >> if (atomic_snapshot(snapshot_fd, &in_suspend)) { > ^^^ > This ends up in the call to hibernation_snapshot(). > >> error = errno; >> break; >> } >> if (!in_suspend) { >> /* first unblank the console, see console_codes(4) */ >> printf("\e[13]"); >> printf("%s: returned to userspace\n", my_name); >> free_snapshot(snapshot_fd); >> break; >> } >> >> error = write_image(snapshot_fd, resume_fd, -1); >> if (error) { > > Since swap space is limited, we come here. > >> free_swap_pages(snapshot_fd); >> free_snapshot(snapshot_fd); ^^^ This function makes the ioctl call SNAPSHOT_FREE. >> image_size = 0; >> error = -error; >> if (error != ENOSPC) >> break; > > > > > And because of the above 'if' condition, we don't break here. > IOW, we will run the loop a second time. 
> >> } else { >> splash.progress(100); >> #ifdef CONFIG_BOTH >> if (s2ram_kms || s2ram) { >> /* If we die (and allow system to continue) >> * between now and reset_signature(), very bad >> * things will happen. */ >> error = suspend_to_ram(snapshot_fd); >> if (error) >> goto Shutdown; >> reset_signature(resume_fd); >> free_swap_pages(snapshot_fd); >> free_snapshot(snapshot_fd); >> if (!s2ram_kms) >> s2ram_resume(); >> goto Unfreeze; >> } >> Shutdown: >> #endif >> close(resume_fd); >> suspend_shutdown(snapshot_fd); >> } >> } while (--attempts); >> >> ... >> Unfreeze: >> unfreeze(snapshot_fd); >> >> > > > So, since the loop is run a second time if not enough space was available, > we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a > second time - and this is what makes the BUG_ON() to fire! > SNAPSHOT_CREATE_IMAGE has a check for data->ready such as: if (data->mode != O_RDONLY || !data->frozen || data->ready) { error = -EPERM; break; } data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However, SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0. This makes it possible for hibernation_snapshot() and hence freeze_workqueues_begin() to be called a second time, which is unfortunate. And actually, the patch I posted in my previous mail is not really the right long-term fix, though it might fix the particular issue that Jiri is facing.. Because, allowing hibernation_snapshot() to get called a second time while kernel threads are still frozen brings us to the same situation that commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) tried to prevent! IOW, a call to hibernate_preallocate_memory() would be done inside hibernation_snapshot(), when kernel threads are frozen.. which is known to break XFS, to give one example as mentioned in the changelog of the above commit. 
So, the right way to fix this, IMHO, would be to split up thaw_processes()
just like the freezing phase:

/* freezes or thaws user space processes */
freeze_processes() - thaw_processes()

/* freezes or thaws kernel threads */
freeze_kernel_threads() - thaw_kernel_threads()

We have to insert this thaw_kernel_threads() at appropriate places in such a
way as to not require another ioctl if possible... Then things would be
more symmetric (and hence easier to understand) and we can avoid getting
into strange situations as discussed here.

But before we venture into that, it would be good to know if the patch posted
in the previous mail fixes the particular problem reported in this thread,
at least to see if there are other problems lurking that we aren't aware
of yet.

Regards,
Srivatsa S. Bhat

^ permalink raw reply	[flat|nested] 24+ messages in thread
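The interaction between the data->ready check, SNAPSHOT_FREE and the proposed thaw_kernel_threads() described in the message above can be sketched as a small userspace state model. Everything here is an assumption made for illustration: struct snapshot_model and the model_*() functions are hypothetical, the data->mode check is left out, and calling the thaw from the SNAPSHOT_FREE path is just one candidate place, not something the message above settles.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; fields loosely follow the snapshot ioctl state. */
struct snapshot_model {
	bool frozen;           /* user space frozen (SNAPSHOT_FREEZE done) */
	bool ready;            /* image created and not yet freed */
	bool kthreads_frozen;  /* kernel threads frozen by the snapshot step */
	int double_freezes;    /* freezes attempted while already frozen */
};

/* Models SNAPSHOT_CREATE_IMAGE: refused unless frozen and not ready. */
static int model_create_image(struct snapshot_model *s)
{
	if (!s->frozen || s->ready)
		return -EPERM;
	if (s->kthreads_frozen)
		s->double_freezes++;  /* the case that trips the BUG_ON() */
	s->kthreads_frozen = true;
	s->ready = true;
	return 0;
}

/*
 * Models SNAPSHOT_FREE: resets ready, which re-enables image creation.
 * thaw_kthreads selects the hypothetical variant that also thaws
 * kernel threads at this point.
 */
static void model_snapshot_free(struct snapshot_model *s, bool thaw_kthreads)
{
	s->ready = false;
	if (thaw_kthreads)
		s->kthreads_frozen = false;
}

/* Two attempts, as in the userspace retry loop after ENOSPC. */
static int model_retry(bool thaw_kthreads)
{
	struct snapshot_model s = { .frozen = true };

	for (int attempt = 0; attempt < 2; attempt++) {
		if (model_create_image(&s) != 0)
			break;
		model_snapshot_free(&s, thaw_kthreads);
	}
	assert(!s.ready);  /* every successful create was followed by a free */
	return s.double_freezes;
}
```

In this model, model_retry(false) reports one double freeze, matching the reported BUG, while model_retry(true) reports none, because the second image creation starts again with kernel threads thawed.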
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 18:44       ` Srivatsa S. Bhat
@ 2012-01-25 23:51         ` Rafael J. Wysocki
  2012-01-26 11:05           ` Jiri Slaby
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-25 23:51 UTC (permalink / raw)
  To: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo
  Cc: Jiri Slaby, LKML, Baohua.Song, pavel, Linux PM mailing list

Hi,

On Wednesday, January 25, 2012, Srivatsa S. Bhat wrote:
>
> >
> > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating
> > memory) split up the freezing phase and moved the freezing of kernel threads
> > to hibernation_snapshot() function.
> > (This is a pretty old commit btw, I think it came in sometime around 3.2).
> >
> > And there is a BUG_ON() in freeze_workqueues_begin() which complains if
> > this function is called when the kernel threads are already frozen.
> >
> > Now, coming to the userspace utility:
> >
> >> In suspend.c inside the suspend-utils userspace package, I see a loop such
> >> as:
> >>
> >> 	error = freeze(snapshot_fd);
> >> ...
> >> 	attempts = 2;
> >> 	do {
> >> 		if (set_image_size(snapshot_fd, image_size)) {
> >> 			error = errno;
> >> 			break;
> >> 		}
> >> 		if (atomic_snapshot(snapshot_fd, &in_suspend)) {
> > 		     ^^^
> > This ends up in the call to hibernation_snapshot().
> >
> >> 			error = errno;
> >> 			break;
> >> 		}
> >> 		if (!in_suspend) {
> >> 			/* first unblank the console, see console_codes(4) */
> >> 			printf("\e[13]");
> >> 			printf("%s: returned to userspace\n", my_name);
> >> 			free_snapshot(snapshot_fd);
> >> 			break;
> >> 		}
> >>
> >> 		error = write_image(snapshot_fd, resume_fd, -1);
> >> 		if (error) {
> >
> > Since swap space is limited, we come here.
> >
> >> 			free_swap_pages(snapshot_fd);
> >> 			free_snapshot(snapshot_fd);
> > 			^^^
> This function makes the ioctl call SNAPSHOT_FREE.
>
> >> 			image_size = 0;
> >> 			error = -error;
> >> 			if (error != ENOSPC)
> >> 				break;
> >
> > And because of the above 'if' condition, we don't break here.
> > IOW, we will run the loop a second time.
> >
> >> 		} else {
> >> 			splash.progress(100);
> >> #ifdef CONFIG_BOTH
> >> 			if (s2ram_kms || s2ram) {
> >> 				/* If we die (and allow system to continue)
> >> 				 * between now and reset_signature(), very bad
> >> 				 * things will happen. */
> >> 				error = suspend_to_ram(snapshot_fd);
> >> 				if (error)
> >> 					goto Shutdown;
> >> 				reset_signature(resume_fd);
> >> 				free_swap_pages(snapshot_fd);
> >> 				free_snapshot(snapshot_fd);
> >> 				if (!s2ram_kms)
> >> 					s2ram_resume();
> >> 				goto Unfreeze;
> >> 			}
> >> Shutdown:
> >> #endif
> >> 			close(resume_fd);
> >> 			suspend_shutdown(snapshot_fd);
> >> 		}
> >> 	} while (--attempts);
> >>
> >> ...
> >> Unfreeze:
> >> 	unfreeze(snapshot_fd);
> >>
> >
> > So, since the loop is run a second time if not enough space was available,
> > we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a
> > second time - and this is what makes the BUG_ON() to fire!
> >
>
> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as:
>
> 	if (data->mode != O_RDONLY || !data->frozen || data->ready) {
> 		error = -EPERM;
> 		break;
> 	}
>
> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However,
> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0.
> This makes it possible for hibernation_snapshot() and hence
> freeze_workqueues_begin() to be called a second time, which is unfortunate.

Yes, I obviously forgot about that code path when I was working on the commit
that introduced the problem. :-(

Thanks a lot for the great analysis, it's really helpful!

> And actually, the patch I posted in my previous mail is not really the right
> long-term fix, though it might fix the particular issue that Jiri is facing..
>
> Because, allowing hibernation_snapshot() to get called a second time while
> kernel threads are still frozen brings us to the same situation that commit
> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory)
> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be
> done inside hibernation_snapshot(), when kernel threads are frozen.. which
> is known to break XFS, to give one example as mentioned in the changelog
> of the above commit.

That's exactly right.

> So, the right way to fix this IMHO, would be to split up thaw_processes()
> just like freezing phase:
>
> /* freezes or thaws user space processes */
> freeze_processes() - thaw_processes()
>
> /* freezes or thaws kernel threads */
> freeze_kernel_threads() - thaw_kernel_threads()
>
> We have to insert this thaw_kernel_threads() at appropriate places in such a
> way as to not require another ioctl if possible... Then things would be
> more symmetric (and hence more easy to understand) and we can avoid getting
> into strange situations as discussed here.
>
> But before we venture into that, it would be good to know if the patch posted
> in the previous mail fixes the particular problem reported in this thread,
> at least just to see if there are other problems lurking that we aren't aware
> of yet..

Jiri has already said that the patch works.

I think we could avoid the issue entirely by introducing thaw_kernel_threads
and making SNAPSHOT_FREE call it.  No other changes should be necessary.

IOW, Jiri, does the patch below help?

[BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]

---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   19 +++++++++++++++++++
 kernel/power/user.c     |    1 +
 3 files changed, 22 insertions(+)

Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
 extern int freeze_processes(void);
 extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
+extern void thaw_kernel_threads(void);

 static inline bool try_to_freeze(void)
 {
@@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
 static inline int freeze_processes(void) { return -ENOSYS; }
 static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
+static inline void thaw_kernel_threads(void) {}

 static inline bool try_to_freeze(void) { return false; }

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -188,3 +188,22 @@ void thaw_processes(void)
 	printk("done.\n");
 }

+void thaw_kernel_threads(void)
+{
+	struct task_struct *g, *p;
+
+	pm_nosig_freezing = false;
+	printk("Restarting kernel threads ... ");
+
+	thaw_workqueues();
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, p) {
+		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+			__thaw_task(p);
+	} while_each_thread(g, p);
+	read_unlock(&tasklist_lock);
+
+	schedule();
+	printk("done.\n");
+}
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file *
 		swsusp_free();
 		memset(&data->handle, 0, sizeof(struct snapshot_handle));
 		data->ready = 0;
+		thaw_kernel_threads();
 		break;

 	case SNAPSHOT_PREF_IMAGE_SIZE:

^ permalink raw reply	[flat|nested] 24+ messages in thread
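As a rough illustration of why thawing kernel threads in SNAPSHOT_FREE closes the hole, the state involved can be modelled in plain user-space C. This is a sketch under assumptions, not kernel code: `struct snapshot_model`, the `model_*` functions and the `thaw_on_free` switch are hypothetical names, the `data->mode` check is omitted, and `-EBUSY` merely stands in for the point where the real kernel would hit the BUG_ON() in freeze_workqueues_begin().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct snapshot_model {
	bool frozen;          /* user space frozen, mirrors data->frozen */
	bool ready;           /* image created, mirrors data->ready */
	bool kthreads_frozen; /* stands in for the workqueue_freezing state */
	bool thaw_on_free;    /* true models SNAPSHOT_FREE with the patch */
};

/* Models SNAPSHOT_CREATE_IMAGE. */
static int model_create_image(struct snapshot_model *m)
{
	if (!m->frozen || m->ready)
		return -EPERM;
	if (m->kthreads_frozen)
		return -EBUSY; /* the real kernel would BUG() here */
	m->kthreads_frozen = true; /* freeze_kernel_threads() */
	m->ready = true;
	return 0;
}

/* Models SNAPSHOT_FREE. */
static void model_free(struct snapshot_model *m)
{
	m->ready = false;
	if (m->thaw_on_free)
		m->kthreads_frozen = false; /* thaw_kernel_threads() */
}

/* Replays the s2disk retry: create, free, create again. */
static int model_retry(bool thaw_on_free)
{
	struct snapshot_model m = {
		.frozen = true,
		.thaw_on_free = thaw_on_free,
	};
	int err = model_create_image(&m);

	assert(err == 0); /* the first attempt always succeeds in this model */
	model_free(&m);
	return model_create_image(&m);
}
```

In this model `model_retry(false)` reports the stand-in error for the second attempt, while `model_retry(true)` succeeds, because the free step leaves no kernel threads frozen behind it.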
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 23:51 ` Rafael J. Wysocki @ 2012-01-26 11:05 ` Jiri Slaby 2012-01-27 1:01 ` Rafael J. Wysocki 2012-01-26 19:22 ` Srivatsa S. Bhat 2012-01-27 10:04 ` Srivatsa S. Bhat 2 siblings, 1 reply; 24+ messages in thread From: Jiri Slaby @ 2012-01-26 11:05 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo, LKML, Baohua.Song, pavel, Linux PM mailing list On 01/26/2012 12:51 AM, Rafael J. Wysocki wrote: > IOW, Jiri, does the patch below help? Yeah, this fixes the issue as well. Thanks. > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?] > > --- > include/linux/freezer.h | 2 ++ > kernel/power/process.c | 19 +++++++++++++++++++ > kernel/power/user.c | 1 + > 3 files changed, 22 insertions(+) > > Index: linux/include/linux/freezer.h > =================================================================== > --- linux.orig/include/linux/freezer.h > +++ linux/include/linux/freezer.h > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt > extern int freeze_processes(void); > extern int freeze_kernel_threads(void); > extern void thaw_processes(void); > +extern void thaw_kernel_threads(void); > > static inline bool try_to_freeze(void) > { > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c > static inline int freeze_processes(void) { return -ENOSYS; } > static inline int freeze_kernel_threads(void) { return -ENOSYS; } > static inline void thaw_processes(void) {} > +static inline void thaw_kernel_threads(void) {} > > static inline bool try_to_freeze(void) { return false; } > > Index: linux/kernel/power/process.c > =================================================================== > --- linux.orig/kernel/power/process.c > +++ linux/kernel/power/process.c > @@ -188,3 +188,22 @@ void thaw_processes(void) > printk("done.\n"); > } > > +void thaw_kernel_threads(void) > +{ > + struct task_struct *g, *p; > + > + pm_nosig_freezing = false; > + 
printk("Restarting kernel threads ... "); > + > + thaw_workqueues(); > + > + read_lock(&tasklist_lock); > + do_each_thread(g, p) { > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > + __thaw_task(p); > + } while_each_thread(g, p); > + read_unlock(&tasklist_lock); > + > + schedule(); > + printk("done.\n"); > +} > Index: linux/kernel/power/user.c > =================================================================== > --- linux.orig/kernel/power/user.c > +++ linux/kernel/power/user.c > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file * > swsusp_free(); > memset(&data->handle, 0, sizeof(struct snapshot_handle)); > data->ready = 0; > + thaw_kernel_threads(); > break; > > case SNAPSHOT_PREF_IMAGE_SIZE: -- js suse labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-26 11:05 ` Jiri Slaby @ 2012-01-27 1:01 ` Rafael J. Wysocki 0 siblings, 0 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2012-01-27 1:01 UTC (permalink / raw) To: Jiri Slaby Cc: Srivatsa S. Bhat, Jiri Slaby, Tejun Heo, LKML, Baohua.Song, pavel, Linux PM mailing list On Thursday, January 26, 2012, Jiri Slaby wrote: > On 01/26/2012 12:51 AM, Rafael J. Wysocki wrote: > > IOW, Jiri, does the patch below help? > > Yeah, this fixes the issue as well. Thanks. Cool, thanks for the confirmation! > > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?] > > > > --- > > include/linux/freezer.h | 2 ++ > > kernel/power/process.c | 19 +++++++++++++++++++ > > kernel/power/user.c | 1 + > > 3 files changed, 22 insertions(+) > > > > Index: linux/include/linux/freezer.h > > =================================================================== > > --- linux.orig/include/linux/freezer.h > > +++ linux/include/linux/freezer.h > > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt > > extern int freeze_processes(void); > > extern int freeze_kernel_threads(void); > > extern void thaw_processes(void); > > +extern void thaw_kernel_threads(void); > > > > static inline bool try_to_freeze(void) > > { > > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c > > static inline int freeze_processes(void) { return -ENOSYS; } > > static inline int freeze_kernel_threads(void) { return -ENOSYS; } > > static inline void thaw_processes(void) {} > > +static inline void thaw_kernel_threads(void) {} > > > > static inline bool try_to_freeze(void) { return false; } > > > > Index: linux/kernel/power/process.c > > =================================================================== > > --- linux.orig/kernel/power/process.c > > +++ linux/kernel/power/process.c > > @@ -188,3 +188,22 @@ void thaw_processes(void) > > printk("done.\n"); > > } > > > > +void thaw_kernel_threads(void) > > +{ > > + struct 
task_struct *g, *p; > > + > > + pm_nosig_freezing = false; > > + printk("Restarting kernel threads ... "); > > + > > + thaw_workqueues(); > > + > > + read_lock(&tasklist_lock); > > + do_each_thread(g, p) { > > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > > + __thaw_task(p); > > + } while_each_thread(g, p); > > + read_unlock(&tasklist_lock); > > + > > + schedule(); > > + printk("done.\n"); > > +} > > Index: linux/kernel/power/user.c > > =================================================================== > > --- linux.orig/kernel/power/user.c > > +++ linux/kernel/power/user.c > > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file * > > swsusp_free(); > > memset(&data->handle, 0, sizeof(struct snapshot_handle)); > > data->ready = 0; > > + thaw_kernel_threads(); > > break; > > > > case SNAPSHOT_PREF_IMAGE_SIZE: > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 23:51 ` Rafael J. Wysocki 2012-01-26 11:05 ` Jiri Slaby @ 2012-01-26 19:22 ` Srivatsa S. Bhat 2012-01-27 1:01 ` Rafael J. Wysocki 2012-01-27 10:04 ` Srivatsa S. Bhat 2 siblings, 1 reply; 24+ messages in thread From: Srivatsa S. Bhat @ 2012-01-26 19:22 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel, Linux PM mailing list On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote: > Hi, >> >> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as: >> >> if (data->mode != O_RDONLY || !data->frozen || data->ready) { >> error = -EPERM; >> break; >> } >> >> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However, >> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0. >> This makes it possible for hibernation_snapshot() and hence >> freeze_workqueues_begin() to be called a second time, which is unfortunate. > > Yes, I obviously forgot about that code path when I was working on the commit > that introduced the problem. :-( > > Thanks a lot for the great analysis, it's really helpful! > Welcome :-) It was fun! >> And actually, the patch I posted in my previous mail is not really the right >> long-term fix, though it might fix the particular issue that Jiri is facing.. >> >> Because, allowing hibernation_snapshot() to get called a second time while >> kernel threads are still frozen brings us to the same situation that commit >> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) >> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be >> done inside hibernation_snapshot(), when kernel threads are frozen.. which >> is known to break XFS, to give one example as mentioned in the changelog >> of the above commit. > > That's exactly right. 
> >> So, the right way to fix this IMHO, would be to split up thaw_processes() >> just like freezing phase: >> >> /* freezes or thaws user space processes */ >> freeze_processes() - thaw_processes() >> >> /* freezes or thaws kernel threads */ >> freeze_kernel_threads() - thaw_kernel_threads() >> >> We have to insert this thaw_kernel_threads() at appropriate places in such a >> way as to not require another ioctl if possible... Then things would be >> more symmetric (and hence more easy to understand) and we can avoid getting >> into strange situations as discussed here. >> >> But before we venture into that, it would be good to know if the patch posted >> in the previous mail fixes the particular problem reported in this thread, >> atleast just to see if there are other problems lurking that we aren't aware >> of yet.. > > Jiri has already said that the patch works. > > I think we could avoid the issue entirely by introducing thaw_kernel_threads > and making SNAPSHOT_FREE call it. No other changes should be necessary. > > IOW, Jiri, does the patch below help? > > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?] > > --- This is exactly the kind of fix I was suggesting.. Thanks Rafael! I have a small request for a comment. Please see below. I have a question too, but for that I'll have to reply to my earlier thread so that I can comment on the userspace code. 
> include/linux/freezer.h | 2 ++ > kernel/power/process.c | 19 +++++++++++++++++++ > kernel/power/user.c | 1 + > 3 files changed, 22 insertions(+) > > Index: linux/include/linux/freezer.h > =================================================================== > --- linux.orig/include/linux/freezer.h > +++ linux/include/linux/freezer.h > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt > extern int freeze_processes(void); > extern int freeze_kernel_threads(void); > extern void thaw_processes(void); > +extern void thaw_kernel_threads(void); > > static inline bool try_to_freeze(void) > { > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c > static inline int freeze_processes(void) { return -ENOSYS; } > static inline int freeze_kernel_threads(void) { return -ENOSYS; } > static inline void thaw_processes(void) {} > +static inline void thaw_kernel_threads(void) {} > > static inline bool try_to_freeze(void) { return false; } > > Index: linux/kernel/power/process.c > =================================================================== > --- linux.orig/kernel/power/process.c > +++ linux/kernel/power/process.c > @@ -188,3 +188,22 @@ void thaw_processes(void) > printk("done.\n"); > } > > +void thaw_kernel_threads(void) > +{ > + struct task_struct *g, *p; > + > + pm_nosig_freezing = false; > + printk("Restarting kernel threads ... 
"); > + > + thaw_workqueues(); > + > + read_lock(&tasklist_lock); > + do_each_thread(g, p) { > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > + __thaw_task(p); > + } while_each_thread(g, p); > + read_unlock(&tasklist_lock); > + > + schedule(); > + printk("done.\n"); > +} > Index: linux/kernel/power/user.c > =================================================================== > --- linux.orig/kernel/power/user.c > +++ linux/kernel/power/user.c > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file * > swsusp_free(); > memset(&data->handle, 0, sizeof(struct snapshot_handle)); > data->ready = 0; It would be nice to have a comment here explaining why we call thaw_kernel_threads() here. (Such a comment would avoid confusion when people look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is thawing involved, while the corresponding freezing is nowhere in sight.. Of course the freezing is hidden inside hibernation_snapshot(), but that might not be immediately apparent to everyone.) > + thaw_kernel_threads(); > break; > > case SNAPSHOT_PREF_IMAGE_SIZE: Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-26 19:22 ` Srivatsa S. Bhat @ 2012-01-27 1:01 ` Rafael J. Wysocki 0 siblings, 0 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2012-01-27 1:01 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel, Linux PM mailing list On Thursday, January 26, 2012, Srivatsa S. Bhat wrote: > On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote: > > > Hi, > >> > >> SNAPSHOT_CREATE_IMAGE has a check for data->ready such as: > >> > >> if (data->mode != O_RDONLY || !data->frozen || data->ready) { > >> error = -EPERM; > >> break; > >> } > >> > >> data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However, > >> SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0. > >> This makes it possible for hibernation_snapshot() and hence > >> freeze_workqueues_begin() to be called a second time, which is unfortunate. > > > > Yes, I obviously forgot about that code path when I was working on the commit > > that introduced the problem. :-( > > > > Thanks a lot for the great analysis, it's really helpful! > > > > > Welcome :-) It was fun! > > > >> And actually, the patch I posted in my previous mail is not really the right > >> long-term fix, though it might fix the particular issue that Jiri is facing.. > >> > >> Because, allowing hibernation_snapshot() to get called a second time while > >> kernel threads are still frozen brings us to the same situation that commit > >> 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) > >> tried to prevent! IOW, a call to hibernate_preallocate_memory() would be > >> done inside hibernation_snapshot(), when kernel threads are frozen.. which > >> is known to break XFS, to give one example as mentioned in the changelog > >> of the above commit. > > > > That's exactly right. 
> > > >> So, the right way to fix this IMHO, would be to split up thaw_processes() > >> just like freezing phase: > >> > >> /* freezes or thaws user space processes */ > >> freeze_processes() - thaw_processes() > >> > >> /* freezes or thaws kernel threads */ > >> freeze_kernel_threads() - thaw_kernel_threads() > >> > >> We have to insert this thaw_kernel_threads() at appropriate places in such a > >> way as to not require another ioctl if possible... Then things would be > >> more symmetric (and hence more easy to understand) and we can avoid getting > >> into strange situations as discussed here. > >> > >> But before we venture into that, it would be good to know if the patch posted > >> in the previous mail fixes the particular problem reported in this thread, > >> atleast just to see if there are other problems lurking that we aren't aware > >> of yet.. > > > > Jiri has already said that the patch works. > > > > I think we could avoid the issue entirely by introducing thaw_kernel_threads > > and making SNAPSHOT_FREE call it. No other changes should be necessary. > > > > IOW, Jiri, does the patch below help? > > > > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?] > > > > --- > > > This is exactly the kind of fix I was suggesting.. Thanks Rafael! > > I have a small request for a comment. Please see below. > I have a question too, but for that I'll have to reply to my earlier > thread so that I can comment on the userspace code. 
> > > include/linux/freezer.h | 2 ++ > > kernel/power/process.c | 19 +++++++++++++++++++ > > kernel/power/user.c | 1 + > > 3 files changed, 22 insertions(+) > > > > Index: linux/include/linux/freezer.h > > =================================================================== > > --- linux.orig/include/linux/freezer.h > > +++ linux/include/linux/freezer.h > > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt > > extern int freeze_processes(void); > > extern int freeze_kernel_threads(void); > > extern void thaw_processes(void); > > +extern void thaw_kernel_threads(void); > > > > static inline bool try_to_freeze(void) > > { > > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c > > static inline int freeze_processes(void) { return -ENOSYS; } > > static inline int freeze_kernel_threads(void) { return -ENOSYS; } > > static inline void thaw_processes(void) {} > > +static inline void thaw_kernel_threads(void) {} > > > > static inline bool try_to_freeze(void) { return false; } > > > > Index: linux/kernel/power/process.c > > =================================================================== > > --- linux.orig/kernel/power/process.c > > +++ linux/kernel/power/process.c > > @@ -188,3 +188,22 @@ void thaw_processes(void) > > printk("done.\n"); > > } > > > > +void thaw_kernel_threads(void) > > +{ > > + struct task_struct *g, *p; > > + > > + pm_nosig_freezing = false; > > + printk("Restarting kernel threads ... 
"); > > + > > + thaw_workqueues(); > > + > > + read_lock(&tasklist_lock); > > + do_each_thread(g, p) { > > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > > + __thaw_task(p); > > + } while_each_thread(g, p); > > + read_unlock(&tasklist_lock); > > + > > + schedule(); > > + printk("done.\n"); > > +} > > Index: linux/kernel/power/user.c > > =================================================================== > > --- linux.orig/kernel/power/user.c > > +++ linux/kernel/power/user.c > > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file * > > swsusp_free(); > > memset(&data->handle, 0, sizeof(struct snapshot_handle)); > > data->ready = 0; > > > It would be nice to have a comment here explaining why we call > thaw_kernel_threads() here. I agree and I'll add one in the final version. > (Such a comment would avoid confusion when people > look at SNAPSHOT_CREATE_IMAGE and SNAPSHOT_FREE and wonder why there is > thawing involved, while the corresponding freezing is nowhere in sight.. > Of course the freezing is hidden inside hibernation_snapshot(), but that > might not be immediately apparent to everyone.) Sure. > > + thaw_kernel_threads(); > > > break; > > > > > case SNAPSHOT_PREF_IMAGE_SIZE: Thanks, Rafael ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 23:51 ` Rafael J. Wysocki 2012-01-26 11:05 ` Jiri Slaby 2012-01-26 19:22 ` Srivatsa S. Bhat @ 2012-01-27 10:04 ` Srivatsa S. Bhat 2012-01-27 22:44 ` Rafael J. Wysocki 2 siblings, 1 reply; 24+ messages in thread From: Srivatsa S. Bhat @ 2012-01-27 10:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel, Linux PM mailing list On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote: > Jiri has already said that the patch works. > > I think we could avoid the issue entirely by introducing thaw_kernel_threads > and making SNAPSHOT_FREE call it. No other changes should be necessary. > > IOW, Jiri, does the patch below help? > > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated. Tejun?] > > --- Rafael, thanks for the clarification in the other mail. Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> I think we need to get this into 3.2-stable as well, as Jiri mentioned. Regards, Srivatsa S. 
Bhat > include/linux/freezer.h | 2 ++ > kernel/power/process.c | 19 +++++++++++++++++++ > kernel/power/user.c | 1 + > 3 files changed, 22 insertions(+) > > Index: linux/include/linux/freezer.h > =================================================================== > --- linux.orig/include/linux/freezer.h > +++ linux/include/linux/freezer.h > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt > extern int freeze_processes(void); > extern int freeze_kernel_threads(void); > extern void thaw_processes(void); > +extern void thaw_kernel_threads(void); > > static inline bool try_to_freeze(void) > { > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c > static inline int freeze_processes(void) { return -ENOSYS; } > static inline int freeze_kernel_threads(void) { return -ENOSYS; } > static inline void thaw_processes(void) {} > +static inline void thaw_kernel_threads(void) {} > > static inline bool try_to_freeze(void) { return false; } > > Index: linux/kernel/power/process.c > =================================================================== > --- linux.orig/kernel/power/process.c > +++ linux/kernel/power/process.c > @@ -188,3 +188,22 @@ void thaw_processes(void) > printk("done.\n"); > } > > +void thaw_kernel_threads(void) > +{ > + struct task_struct *g, *p; > + > + pm_nosig_freezing = false; > + printk("Restarting kernel threads ... 
"); > + > + thaw_workqueues(); > + > + read_lock(&tasklist_lock); > + do_each_thread(g, p) { > + if (p->flags & (PF_KTHREAD | PF_WQ_WORKER)) > + __thaw_task(p); > + } while_each_thread(g, p); > + read_unlock(&tasklist_lock); > + > + schedule(); > + printk("done.\n"); > +} > Index: linux/kernel/power/user.c > =================================================================== > --- linux.orig/kernel/power/user.c > +++ linux/kernel/power/user.c > @@ -274,6 +274,7 @@ static long snapshot_ioctl(struct file * > swsusp_free(); > memset(&data->handle, 0, sizeof(struct snapshot_handle)); > data->ready = 0; > + thaw_kernel_threads(); > break; > > case SNAPSHOT_PREF_IMAGE_SIZE: > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-27 10:04           ` Srivatsa S. Bhat
@ 2012-01-27 22:44             ` Rafael J. Wysocki
  2012-01-28 15:41               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27 22:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On Friday, January 27, 2012, Srivatsa S. Bhat wrote:
> On 01/26/2012 05:21 AM, Rafael J. Wysocki wrote:
>
> > Jiri has already said that the patch works.
> >
> > I think we could avoid the issue entirely by introducing thaw_kernel_threads
> > and making SNAPSHOT_FREE call it.  No other changes should be necessary.
> >
> > IOW, Jiri, does the patch below help?
> >
> > [BTW, the freeze_tasks()'s kerneldoc seems to be outdated.  Tejun?]
> >
> > ---
>
> Rafael, thanks for the clarification in the other mail.
>
> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> I think we need to get this into 3.2-stable as well, as Jiri mentioned.

Yes, we do.

Below it goes again with the missing comment in user.c and a changelog.

Please let me know if there's anything wrong with it, otherwise I'm going
to push it to Linus in a couple of days.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues

Commit 2aede851ddf08666f68ffc17be446420e9d2a056

  PM / Hibernate: Freeze kernel threads after preallocating memory

introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that workqueues had been already frozen.

Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
attempted to avoid into that particular code path.  For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.

Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.

Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
 include/linux/freezer.h |    2 ++
 kernel/power/process.c  |   19 +++++++++++++++++++
 kernel/power/user.c     |    9 +++++++++
 3 files changed, 30 insertions(+)

Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
 extern int freeze_processes(void);
 extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
+extern void thaw_kernel_threads(void);

 static inline bool try_to_freeze(void)
 {
@@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
 static inline int freeze_processes(void) { return -ENOSYS; }
 static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
+static inline void thaw_kernel_threads(void) {}

 static inline bool try_to_freeze(void) { return false; }

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -188,3 +188,22 @@ void thaw_processes(void)
 	printk("done.\n");
 }

+void thaw_kernel_threads(void)
+{
+	struct task_struct *g, *p;
+
+	pm_nosig_freezing = false;
+	printk("Restarting kernel threads ... ");
+
+	thaw_workqueues();
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, p) {
+		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
+			__thaw_task(p);
+	} while_each_thread(g, p);
+	read_unlock(&tasklist_lock);
+
+	schedule();
+	printk("done.\n");
+}
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
 		swsusp_free();
 		memset(&data->handle, 0, sizeof(struct snapshot_handle));
 		data->ready = 0;
+		/*
+		 * It is necessary to thaw kernel threads here, because
+		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
+		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
+		 * thawed, the preallocation of memory carried out by
+		 * hibernation_snapshot() might run into problems (i.e. it
+		 * might fail or even deadlock).
+		 */
+		thaw_kernel_threads();
 		break;

 	case SNAPSHOT_PREF_IMAGE_SIZE:

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-27 22:44 ` Rafael J. Wysocki
@ 2012-01-28 15:41 ` Srivatsa S. Bhat
  2012-01-29 0:14 ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-28 15:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On 01/28/2012 04:14 AM, Rafael J. Wysocki wrote:

>> Rafael, thanks for the clarification in the other mail.
>>
>> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> I think we need to get this into 3.2-stable as well, as Jiri mentioned.
>
> Yes, we do.
>
> Below it goes again with the missing comment in user.c and a changelog.
>
> Please let me know if there's anything wrong with it, otherwise I'm going
> to push it to Linus in a couple of days.
>
> Thanks,
> Rafael
>
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues
>
> Commit 2aede851ddf08666f68ffc17be446420e9d2a056
>
>   PM / Hibernate: Freeze kernel threads after preallocating memory
>
> introduced a mechanism by which kernel threads were frozen after
> the preallocation of hibernate image memory to avoid problems with
> frozen kernel threads not responding to memory freeing requests.
> However, it overlooked the s2disk code path in which the
> SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
> which caused freeze_workqueues_begin() to BUG(), because it saw
> that workqueues had been already frozen.
>
> Although in principle this issue might be addressed by removing
> the relevant BUG_ON() from freeze_workqueues_begin(), that would
> reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
> attempted to avoid into that particular code path.  For this reason,
> to fix the issue at hand, introduce thaw_kernel_threads() and make
> the SNAPSHOT_FREE ioctl execute it.
>
> Special thanks to Srivatsa S. Bhat for detailed analysis of the
> problem.
>
> Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Cc: stable@vger.kernel.org ?

Other than that, the patch is perfect!

Regards,
Srivatsa S. Bhat

> ---
>  include/linux/freezer.h |    2 ++
>  kernel/power/process.c  |   19 +++++++++++++++++++
>  kernel/power/user.c     |    9 +++++++++
>  3 files changed, 30 insertions(+)
>
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
>  extern int freeze_processes(void);
>  extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
> +extern void thaw_kernel_threads(void);
>
>  static inline bool try_to_freeze(void)
>  {
> @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
>  static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
> +static inline void thaw_kernel_threads(void) {}
>
>  static inline bool try_to_freeze(void) { return false; }
>
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -188,3 +188,22 @@ void thaw_processes(void)
>  	printk("done.\n");
>  }
>
> +void thaw_kernel_threads(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	pm_nosig_freezing = false;
> +	printk("Restarting kernel threads ... ");
> +
> +	thaw_workqueues();
> +
> +	read_lock(&tasklist_lock);
> +	do_each_thread(g, p) {
> +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> +			__thaw_task(p);
> +	} while_each_thread(g, p);
> +	read_unlock(&tasklist_lock);
> +
> +	schedule();
> +	printk("done.\n");
> +}
> Index: linux/kernel/power/user.c
> ===================================================================
> --- linux.orig/kernel/power/user.c
> +++ linux/kernel/power/user.c
> @@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
>  		swsusp_free();
>  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
>  		data->ready = 0;
> +		/*
> +		 * It is necessary to thaw kernel threads here, because
> +		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
> +		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
> +		 * thawed, the preallocation of memory carried out by
> +		 * hibernation_snapshot() might run into problems (i.e. it
> +		 * might fail or even deadlock).
> +		 */
> +		thaw_kernel_threads();
>  		break;
>
>  	case SNAPSHOT_PREF_IMAGE_SIZE:

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-28 15:41 ` Srivatsa S. Bhat
@ 2012-01-29 0:14 ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-29 0:14 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Tejun Heo, Jiri Slaby, LKML, Baohua.Song, pavel,
	Linux PM mailing list

On Saturday, January 28, 2012, Srivatsa S. Bhat wrote:
> On 01/28/2012 04:14 AM, Rafael J. Wysocki wrote:
>
> >> Rafael, thanks for the clarification in the other mail.
> >>
> >> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >>
> >> I think we need to get this into 3.2-stable as well, as Jiri mentioned.
> >
> > Yes, we do.
> >
> > Below it goes again with the missing comment in user.c and a changelog.
> >
> > Please let me know if there's anything wrong with it, otherwise I'm going
> > to push it to Linus in a couple of days.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Fix s2disk regression related to freezing workqueues
> >
> > Commit 2aede851ddf08666f68ffc17be446420e9d2a056
> >
> >   PM / Hibernate: Freeze kernel threads after preallocating memory
> >
> > introduced a mechanism by which kernel threads were frozen after
> > the preallocation of hibernate image memory to avoid problems with
> > frozen kernel threads not responding to memory freeing requests.
> > However, it overlooked the s2disk code path in which the
> > SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
> > which caused freeze_workqueues_begin() to BUG(), because it saw
> > that workqueues had been already frozen.
> >
> > Although in principle this issue might be addressed by removing
> > the relevant BUG_ON() from freeze_workqueues_begin(), that would
> > reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4
> > attempted to avoid into that particular code path.  For this reason,
> > to fix the issue at hand, introduce thaw_kernel_threads() and make
> > the SNAPSHOT_FREE ioctl execute it.
> >
> > Special thanks to Srivatsa S. Bhat for detailed analysis of the
> > problem.
> >
> > Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> Cc: stable@vger.kernel.org ?

That's going to show up in the actual commit.

> Other than that, the patch is perfect!

Cool, thanks!

Rafael

> > ---
> >  include/linux/freezer.h |    2 ++
> >  kernel/power/process.c  |   19 +++++++++++++++++++
> >  kernel/power/user.c     |    9 +++++++++
> >  3 files changed, 30 insertions(+)
> >
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -39,6 +39,7 @@ extern bool __refrigerator(bool check_kt
> >  extern int freeze_processes(void);
> >  extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> > +extern void thaw_kernel_threads(void);
> >
> >  static inline bool try_to_freeze(void)
> >  {
> > @@ -174,6 +175,7 @@ static inline bool __refrigerator(bool c
> >  static inline int freeze_processes(void) { return -ENOSYS; }
> >  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> > +static inline void thaw_kernel_threads(void) {}
> >
> >  static inline bool try_to_freeze(void) { return false; }
> >
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -188,3 +188,22 @@ void thaw_processes(void)
> >  	printk("done.\n");
> >  }
> >
> > +void thaw_kernel_threads(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	pm_nosig_freezing = false;
> > +	printk("Restarting kernel threads ... ");
> > +
> > +	thaw_workqueues();
> > +
> > +	read_lock(&tasklist_lock);
> > +	do_each_thread(g, p) {
> > +		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> > +			__thaw_task(p);
> > +	} while_each_thread(g, p);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	schedule();
> > +	printk("done.\n");
> > +}
> > Index: linux/kernel/power/user.c
> > ===================================================================
> > --- linux.orig/kernel/power/user.c
> > +++ linux/kernel/power/user.c
> > @@ -274,6 +274,15 @@ static long snapshot_ioctl(struct file *
> >  		swsusp_free();
> >  		memset(&data->handle, 0, sizeof(struct snapshot_handle));
> >  		data->ready = 0;
> > +		/*
> > +		 * It is necessary to thaw kernel threads here, because
> > +		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
> > +		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
> > +		 * thawed, the preallocation of memory carried out by
> > +		 * hibernation_snapshot() might run into problems (i.e. it
> > +		 * might fail or even deadlock).
> > +		 */
> > +		thaw_kernel_threads();
> >  		break;
> >
> >  	case SNAPSHOT_PREF_IMAGE_SIZE:
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 16:00 ` Srivatsa S. Bhat
  2012-01-25 18:44 ` Srivatsa S. Bhat
@ 2012-01-25 22:00 ` Jiri Slaby
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Slaby @ 2012-01-25 22:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Rafael J. Wysocki, Linux-pm mailing list, Jiri Slaby, LKML,
	Baohua.Song, Tejun Heo, pavel

On 01/25/2012 05:00 PM, Srivatsa S. Bhat wrote:
> So, Jiri, can you please try the following patch and see if it works for
> you as expected? I'll be happy to provide a formal patch with a changelog
> if this works.

FWIW it works, thanks. If you want me to test a proper fix, just let me know.

> ---
>  kernel/workqueue.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bec7b5b..cb26c5d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void)
>
>  	spin_lock(&workqueue_lock);
>
> -	BUG_ON(workqueue_freezing);
> +	if (workqueue_freezing)
> +		goto out_unlock;
> +
>  	workqueue_freezing = true;
>
>  	for_each_gcwq_cpu(cpu) {
> @@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void)
>  		spin_unlock_irq(&gcwq->lock);
>  	}
>
> +out_unlock:
>  	spin_unlock(&workqueue_lock);
>  }
>

--
js

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-25 15:31 ` Srivatsa S. Bhat
  2012-01-25 16:00 ` Srivatsa S. Bhat
@ 2012-01-26 19:39 ` Srivatsa S. Bhat
  2012-01-27 1:10 ` Rafael J. Wysocki
  1 sibling, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-26 19:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel

Hi Rafael,

On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote:

>
> Ok, I will need to quote a part of the userspace utility to explain the
> problem.
>
> In suspend.c inside the suspend-utils userspace package, I see a loop such
> as:
>
> 	error = freeze(snapshot_fd);
> 	...
> 	attempts = 2;
> 	do {
> 		if (set_image_size(snapshot_fd, image_size)) {
> 			error = errno;
> 			break;
> 		}
> 		if (atomic_snapshot(snapshot_fd, &in_suspend)) {
> 			error = errno;
> 			break;
> 		}
> 		if (!in_suspend) {
> 			/* first unblank the console, see console_codes(4) */
> 			printf("\e[13]");
> 			printf("%s: returned to userspace\n", my_name);
> 			free_snapshot(snapshot_fd);
> 			break;
> 		}
>
> 		error = write_image(snapshot_fd, resume_fd, -1);
> 		if (error) {
> 			free_swap_pages(snapshot_fd);
> 			free_snapshot(snapshot_fd);
> 			image_size = 0;
> 			error = -error;
> 			if (error != ENOSPC)
> 				break;
> 		} else {
> 			splash.progress(100);
> #ifdef CONFIG_BOTH
> 			if (s2ram_kms || s2ram) {
> 				/* If we die (and allow system to continue)
> 				 * between now and reset_signature(), very bad
> 				 * things will happen. */
> 				error = suspend_to_ram(snapshot_fd);
> 				if (error)
> 					goto Shutdown;
> 				reset_signature(resume_fd);
> 				free_swap_pages(snapshot_fd);
> 				free_snapshot(snapshot_fd);
> 				if (!s2ram_kms)
> 					s2ram_resume();

Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is
handled. So, I was trying to see if there are any points of concern...

In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that
pose any problems because kernel threads would have been thawed at that point,
after applying your patch?

And other than that, do you foresee any problems arising from the change caused
to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are
not the only userspace utilities after all... so I just wanted to ensure that
we don't over-fit our solution to this particular utility and end up breaking
others...

Regards,
Srivatsa S. Bhat

> 				goto Unfreeze;
> 			}
> Shutdown:
> #endif
> 			close(resume_fd);
> 			suspend_shutdown(snapshot_fd);
> 		}
> 	} while (--attempts);
>
> 	...
> Unfreeze:
> 	unfreeze(snapshot_fd);
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659
  2012-01-26 19:39 ` Srivatsa S. Bhat
@ 2012-01-27 1:10 ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2012-01-27 1:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song,
	Tejun Heo, pavel

On Thursday, January 26, 2012, Srivatsa S. Bhat wrote:
>
> Hi Rafael,
>
> On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote:
>
> >
> > Ok, I will need to quote a part of the userspace utility to explain the
> > problem.
> >
> > In suspend.c inside the suspend-utils userspace package, I see a loop such
> > as:
> >
> > 	error = freeze(snapshot_fd);
> > 	...
> > 	attempts = 2;
> > 	do {
> > 		if (set_image_size(snapshot_fd, image_size)) {
> > 			error = errno;
> > 			break;
> > 		}
> > 		if (atomic_snapshot(snapshot_fd, &in_suspend)) {
> > 			error = errno;
> > 			break;
> > 		}
> > 		if (!in_suspend) {
> > 			/* first unblank the console, see console_codes(4) */
> > 			printf("\e[13]");
> > 			printf("%s: returned to userspace\n", my_name);
> > 			free_snapshot(snapshot_fd);
> > 			break;
> > 		}
> >
> > 		error = write_image(snapshot_fd, resume_fd, -1);
> > 		if (error) {
> > 			free_swap_pages(snapshot_fd);
> > 			free_snapshot(snapshot_fd);
> > 			image_size = 0;
> > 			error = -error;
> > 			if (error != ENOSPC)
> > 				break;
> > 		} else {
> > 			splash.progress(100);
> > #ifdef CONFIG_BOTH
> > 			if (s2ram_kms || s2ram) {
> > 				/* If we die (and allow system to continue)
> > 				 * between now and reset_signature(), very bad
> > 				 * things will happen. */
> > 				error = suspend_to_ram(snapshot_fd);
> > 				if (error)
> > 					goto Shutdown;
> > 				reset_signature(resume_fd);
> > 				free_swap_pages(snapshot_fd);
> > 				free_snapshot(snapshot_fd);
> > 				if (!s2ram_kms)
> > 					s2ram_resume();
>
>
> Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is
> handled. So, I was trying to see if there are any points of concern...
>
> In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that
> pose any problems because kernel threads would have been thawed at that point,
> after applying your patch?

No, it shouldn't.  s2ram_resume() only executes quirks needed to restore
the state of graphics if KMS is not being used.  That shouldn't interfere
with any kernel threads.

> And other than that, do you foresee any problems arising from the change caused
> to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are
> not the only userspace utilities after all... so I just wanted to ensure that
> we don't over-fit our solution to this particular utility and end up breaking
> others...

I'm quite sure they are the only package using the interface in
kernel/power/user.c.  At least, I'm not aware of any other users. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread