* [2.6.21.1] soft lockup when removing netconsole module @ 2007-05-26 15:40 Folkert van Heusden 2007-05-26 15:53 ` Parag Warudkar 2007-05-29 7:56 ` Andrew Morton 0 siblings, 2 replies; 18+ messages in thread From: Folkert van Heusden @ 2007-05-26 15:40 UTC (permalink / raw) To: linux-kernel Hi, When trying to remove the netconsole module, I got the following kernel output after a while (couple of minutes iirc): [525720.117293] BUG: soft lockup detected on CPU#1! [525720.117353] [<c1004d53>] show_trace_log_lvl+0x1a/0x30 [525720.117439] [<c1004d7b>] show_trace+0x12/0x14 [525720.117526] [<c1004e75>] dump_stack+0x16/0x18 [525720.117613] [<c104dd5b>] softlockup_tick+0xa6/0xc2 [525720.117694] [<c1026855>] run_local_timers+0x12/0x14 [525720.117738] [<c1026669>] update_process_times+0x72/0xa1 [525720.117744] [<c1038673>] tick_sched_timer+0x53/0xb6 [525720.117748] [<c1033d62>] hrtimer_interrupt+0x189/0x1e3 [525720.117753] [<c100e9e2>] local_apic_timer_interrupt+0x55/0x5b [525720.117761] [<c100ea12>] smp_apic_timer_interrupt+0x2a/0x39 [525720.117766] [<c1004a3f>] apic_timer_interrupt+0x33/0x38 [525720.117770] [<c120f4b1>] mutex_lock+0x8/0xa [525720.117775] [<c102d2f0>] flush_workqueue+0x2f/0x8f [525720.117780] [<c102d7a0>] cancel_rearming_delayed_workqueue+0x29/0x2b [525720.117785] [<c102d7b1>] cancel_rearming_delayed_work+0xf/0x11 [525720.117790] [<c11be143>] netpoll_cleanup+0x75/0xa5 [525720.117794] [<f893712d>] cleanup_netconsole+0x17/0x1a [netconsole] [525720.117804] [<c1041f11>] sys_delete_module+0x12f/0x14f [525720.117809] [<c1003f74>] syscall_call+0x7/0xb [525720.117812] ======================= Also the rmmod hangs and would not exit even with kill -9. It also sucks up 100% cpu. Folkert van Heusden -- MultiTail es una herramienta flexibele para consiguir archivos de log, y para ejecutar ordenes. Filtrar, añadir colores, merger y vista de las differencias. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 15:40 [2.6.21.1] soft lockup when removing netconsole module Folkert van Heusden @ 2007-05-26 15:53 ` Parag Warudkar 2007-05-26 16:12 ` Thomas Gleixner 2007-05-29 7:56 ` Andrew Morton 1 sibling, 1 reply; 18+ messages in thread From: Parag Warudkar @ 2007-05-26 15:53 UTC (permalink / raw) To: Folkert van Heusden; +Cc: linux-kernel, tglx Seems to be hrtimers related - CC'ed Thomas. Parag Folkert van Heusden wrote: > Hi, > > When trying to remove the netconsole module, I got the following kernel > output after a while (couple of minutes iirc): > > [525720.117293] BUG: soft lockup detected on CPU#1! > [525720.117353] [<c1004d53>] show_trace_log_lvl+0x1a/0x30 > [525720.117439] [<c1004d7b>] show_trace+0x12/0x14 > [525720.117526] [<c1004e75>] dump_stack+0x16/0x18 > [525720.117613] [<c104dd5b>] softlockup_tick+0xa6/0xc2 > [525720.117694] [<c1026855>] run_local_timers+0x12/0x14 > [525720.117738] [<c1026669>] update_process_times+0x72/0xa1 > [525720.117744] [<c1038673>] tick_sched_timer+0x53/0xb6 > [525720.117748] [<c1033d62>] hrtimer_interrupt+0x189/0x1e3 > [525720.117753] [<c100e9e2>] local_apic_timer_interrupt+0x55/0x5b > [525720.117761] [<c100ea12>] smp_apic_timer_interrupt+0x2a/0x39 > [525720.117766] [<c1004a3f>] apic_timer_interrupt+0x33/0x38 > [525720.117770] [<c120f4b1>] mutex_lock+0x8/0xa > [525720.117775] [<c102d2f0>] flush_workqueue+0x2f/0x8f > [525720.117780] [<c102d7a0>] cancel_rearming_delayed_workqueue+0x29/0x2b > [525720.117785] [<c102d7b1>] cancel_rearming_delayed_work+0xf/0x11 > [525720.117790] [<c11be143>] netpoll_cleanup+0x75/0xa5 > [525720.117794] [<f893712d>] cleanup_netconsole+0x17/0x1a [netconsole] > [525720.117804] [<c1041f11>] sys_delete_module+0x12f/0x14f > [525720.117809] [<c1003f74>] syscall_call+0x7/0xb > [525720.117812] ======================= > > Also the rmmod hangs and would not exit even with kill -9. It also > sucks up 100% cpu. > > > Folkert van Heusden > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 15:53 ` Parag Warudkar @ 2007-05-26 16:12 ` Thomas Gleixner 2007-05-26 16:17 ` Folkert van Heusden 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2007-05-26 16:12 UTC (permalink / raw) To: Parag Warudkar; +Cc: Folkert van Heusden, linux-kernel On Sat, 2007-05-26 at 11:53 -0400, Parag Warudkar wrote: > Seems to be hrtimers related - CC'ed Thomas. I doubt that. The tick interrupt just finds out, that the machine is stuck in rmmod. > > Also the rmmod hangs and would not exit even with kill -9. It also > > sucks up 100% cpu. Can you please enable CONFIG_PROVE_LOCKING ? Thanks, tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 16:12 ` Thomas Gleixner @ 2007-05-26 16:17 ` Folkert van Heusden 2007-05-26 16:35 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Folkert van Heusden @ 2007-05-26 16:17 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Parag Warudkar, linux-kernel > > Seems to be hrtimers related - CC'ed Thomas. > I doubt that. The tick interrupt just finds out, that the machine is > stuck in rmmod. > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > sucks up 100% cpu. > Can you please enable CONFIG_PROVE_LOCKING ? Luckily I already had :-) Here is its output: [ 44.500182] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 44.500228] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 44.500266] ... MAX_LOCK_DEPTH: 30 [ 44.500305] ... MAX_LOCKDEP_KEYS: 2048 [ 44.500343] ... CLASSHASH_SIZE: 1024 [ 44.500383] ... MAX_LOCKDEP_ENTRIES: 8192 [ 44.500421] ... MAX_LOCKDEP_CHAINS: 16384 [ 44.500459] ... CHAINHASH_SIZE: 8192 [ 44.500498] memory used by lock dependency info: 1096 kB [ 44.500537] per task-struct memory footprint: 1200 bytes [ 44.500576] ------------------------ [ 44.500614] | Locking API testsuite: [ 44.500651] ---------------------------------------------------------------------------- [ 44.500697] | spin |wlock |rlock |mutex | wsem | rsem | [ 44.500743] -------------------------------------------------------------------------- [ 44.500794] A-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.501739] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.502675] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.503617] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | [ 44.504557] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.505509] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.506469] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | [ 44.507424] double unlock: ok | ok | ok | ok | ok | ok | [ 44.508327] initialize held: ok | ok | ok | ok | ok | ok | [ 44.509245] bad unlock order: ok | ok | ok | ok | ok | ok | [ 44.510197] -------------------------------------------------------------------------- [ 44.510243] recursive read-lock: | ok | | ok | [ 44.510656] recursive read-lock #2: | ok | | ok | [ 44.511069] mixed read-write-lock: | ok | | ok | [ 44.511507] mixed write-read-lock: | ok | | ok | [ 44.511968] -------------------------------------------------------------------------- [ 44.512015] hard-irqs-on + irq-safe-A/12: ok | ok | ok | [ 44.512501] soft-irqs-on + irq-safe-A/12: ok | ok | ok | [ 44.512998] hard-irqs-on + irq-safe-A/21: ok | ok | ok | [ 44.513484] soft-irqs-on + irq-safe-A/21: ok | ok | ok | [ 44.513970] sirq-safe-A => hirqs-on/12: ok | ok | ok | [ 44.514456] sirq-safe-A => hirqs-on/21: ok | ok | ok | [ 44.514941] hard-safe-A + irqs-on/12: ok | ok | ok | [ 44.515425] soft-safe-A + irqs-on/12: ok | ok | ok | [ 44.515910] hard-safe-A + irqs-on/21: ok | ok | ok | [ 44.516396] soft-safe-A + irqs-on/21: ok | ok | ok | [ 44.516882] hard-safe-A + unsafe-B #1/123: ok | ok | ok | [ 44.517380] soft-safe-A + unsafe-B #1/123: ok | ok | ok | [ 44.517924] hard-safe-A + unsafe-B #1/132: ok | ok | ok | [ 44.518417] soft-safe-A + unsafe-B #1/132: ok | ok | ok | [ 44.518909] hard-safe-A + unsafe-B #1/213: ok | ok | ok | [ 44.519402] soft-safe-A + unsafe-B #1/213: ok | ok | ok | [ 44.519896] hard-safe-A + unsafe-B #1/231: ok | ok | ok | [ 44.520388] soft-safe-A + unsafe-B #1/231: ok | ok | ok | [ 44.520881] hard-safe-A + unsafe-B #1/312: ok | ok | ok | [ 44.521388] soft-safe-A + unsafe-B #1/312: ok | ok | ok | [ 44.521886] hard-safe-A + unsafe-B #1/321: ok | ok | ok | [ 44.522389] soft-safe-A + unsafe-B #1/321: ok | ok | ok | [ 44.522907] hard-safe-A + unsafe-B #2/123: ok | ok | ok | [ 44.523402] soft-safe-A + unsafe-B #2/123: ok | ok | ok | [ 44.523894] hard-safe-A + unsafe-B #2/132: ok | ok | ok | [ 44.524387] soft-safe-A + unsafe-B #2/132: ok | ok | ok | [ 44.524880] hard-safe-A + unsafe-B #2/213: ok | ok | ok | [ 44.525373] soft-safe-A + unsafe-B #2/213: ok | ok | ok | [ 44.525866] hard-safe-A + unsafe-B #2/231: ok | ok | ok | [ 44.526359] soft-safe-A + unsafe-B #2/231: ok | ok | ok | [ 44.526850] hard-safe-A + unsafe-B #2/312: ok | ok | ok | [ 44.527342] soft-safe-A + unsafe-B #2/312: ok | ok | ok | [ 44.527842] hard-safe-A + unsafe-B #2/321: ok | ok | ok | [ 44.528382] soft-safe-A + unsafe-B #2/321: ok | ok | ok | [ 44.529025] hard-irq lock-inversion/123: ok | ok | ok | [ 44.529516] soft-irq lock-inversion/123: ok | ok | ok | [ 44.530010] hard-irq lock-inversion/132: ok | ok | ok | [ 44.530501] soft-irq lock-inversion/132: ok | ok | ok | [ 44.530995] hard-irq lock-inversion/213: ok | ok | ok | [ 44.531488] soft-irq lock-inversion/213: ok | ok | ok | [ 44.531980] hard-irq lock-inversion/231: ok | ok | ok | [ 44.532470] soft-irq lock-inversion/231: ok | ok | ok | [ 44.532962] hard-irq lock-inversion/312: ok | ok | ok | [ 44.533478] soft-irq lock-inversion/312: ok | ok | ok | [ 44.533998] hard-irq lock-inversion/321: ok | ok | ok | [ 44.534515] soft-irq lock-inversion/321: ok | ok | ok | [ 44.535017] hard-irq read-recursion/123: ok | [ 44.535227] soft-irq read-recursion/123: ok | [ 44.535440] hard-irq read-recursion/132: ok | [ 44.535652] soft-irq read-recursion/132: ok | [ 44.535865] hard-irq read-recursion/213: ok | [ 44.536079] soft-irq read-recursion/213: ok | [ 44.536291] hard-irq read-recursion/231: ok | [ 44.536504] soft-irq read-recursion/231: ok | [ 44.536742] hard-irq read-recursion/312: ok | [ 44.536979] soft-irq read-recursion/312: ok | [ 44.537216] hard-irq read-recursion/321: ok | [ 44.537454] soft-irq read-recursion/321: ok | [ 44.537693] ------------------------------------------------------- [ 44.537733] Good, all 218 testcases passed! | [ 44.537774] --------------------------------- Folkert van Heusden -- MultiTail ist eine flexible Applikation um Logfiles und Kommando Eingaben zu überprüfen. Inkl. Filter, Farben, Zusammenführen, Ansichten etc. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 16:17 ` Folkert van Heusden @ 2007-05-26 16:35 ` Thomas Gleixner 2007-05-26 16:49 ` Folkert van Heusden 2007-05-27 20:32 ` Matt Mackall 0 siblings, 2 replies; 18+ messages in thread From: Thomas Gleixner @ 2007-05-26 16:35 UTC (permalink / raw) To: Folkert van Heusden; +Cc: Parag Warudkar, linux-kernel On Sat, 2007-05-26 at 18:17 +0200, Folkert van Heusden wrote: > > > Seems to be hrtimers related - CC'ed Thomas. > > I doubt that. The tick interrupt just finds out, that the machine is > > stuck in rmmod. > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > > sucks up 100% cpu. > > Can you please enable CONFIG_PROVE_LOCKING ? > > Luckily I already had :-) > Here is its output: So there is no output from lockdep, during the rmmod hang ? The rmmod sucks 100% cpu, so it busy loops at some place. Looking at the stack trace this seems to be in: void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq, struct delayed_work *dwork) { while (!cancel_delayed_work(dwork)) flush_workqueue(wq); } which is not really related to timers :) tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 16:35 ` Thomas Gleixner @ 2007-05-26 16:49 ` Folkert van Heusden 2007-05-26 17:06 ` Thomas Gleixner 2007-05-27 20:32 ` Matt Mackall 1 sibling, 1 reply; 18+ messages in thread From: Folkert van Heusden @ 2007-05-26 16:49 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Parag Warudkar, linux-kernel > > > > Seems to be hrtimers related - CC'ed Thomas. > > > I doubt that. The tick interrupt just finds out, that the machine is > > > stuck in rmmod. > > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > > > sucks up 100% cpu. > > > Can you please enable CONFIG_PROVE_LOCKING ? > > Luckily I already had :-) > > Here is its output: > So there is no output from lockdep, during the rmmod hang ? The rmmod Nope, just other subsystems (nfs, usb) minutes before that. > sucks 100% cpu, so it busy loops at some place. Looking at the stack > trace this seems to be in: > void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq, > struct delayed_work *dwork) > { > while (!cancel_delayed_work(dwork)) > flush_workqueue(wq); > } > which is not really related to timers :) Ok. Folkert van Heusden -- Multitail - gibkaja utilita po sledovaniju log-fajlov i vyvoda kommand. Fil'trovanie, raskraivanie, slijanie, vizual'noe sravnenie, i t.d. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 16:49 ` Folkert van Heusden @ 2007-05-26 17:06 ` Thomas Gleixner 2007-05-26 17:12 ` Folkert van Heusden 0 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2007-05-26 17:06 UTC (permalink / raw) To: Folkert van Heusden; +Cc: Parag Warudkar, linux-kernel On Sat, 2007-05-26 at 18:49 +0200, Folkert van Heusden wrote: > > > > > Seems to be hrtimers related - CC'ed Thomas. > > > > I doubt that. The tick interrupt just finds out, that the machine is > > > > stuck in rmmod. > > > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > > > > sucks up 100% cpu. > > > > Can you please enable CONFIG_PROVE_LOCKING ? > > > Luckily I already had :-) > > > Here is its output: > > So there is no output from lockdep, during the rmmod hang ? The rmmod > > Nope, just other subsystems (nfs, usb) minutes before that. Hmm. Can you provide the output of that ? Might be some stale stuff leading to the hang later. tglx ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 17:06 ` Thomas Gleixner @ 2007-05-26 17:12 ` Folkert van Heusden 0 siblings, 0 replies; 18+ messages in thread From: Folkert van Heusden @ 2007-05-26 17:12 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Parag Warudkar, linux-kernel > > > > > > Seems to be hrtimers related - CC'ed Thomas. > > > > > I doubt that. The tick interrupt just finds out, that the machine is > > > > > stuck in rmmod. > > > > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > > > > > sucks up 100% cpu. > > > > > Can you please enable CONFIG_PROVE_LOCKING ? > > > > Luckily I already had :-) > > > > Here is its output: > > > So there is no output from lockdep, during the rmmod hang ? The rmmod > > Nope, just other subsystems (nfs, usb) minutes before that. > Hmm. Can you provide the output of that ? Might be some stale stuff > leading to the hang later. It is output like: [ 843.122782] usb 7-2: usbfs: USBDEVFS_CONTROL failed cmd newhidups rqt 161 rq 1 len 10 ret -110 [ 1292.121131] usb 7-2: usbfs: USBDEVFS_CONTROL failed cmd newhidups rqt 161 rq 1 len 10 ret -110 and about the nfs server temporarily not available. Please note that I did not do the rmmod from a nfs mountpoint. Folkert van Heusden -- Multitail - gibkaja utilita po sledovaniju log-fajlov i vyvoda kommand. Fil'trovanie, raskraivanie, slijanie, vizual'noe sravnenie, i t.d. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 16:35 ` Thomas Gleixner 2007-05-26 16:49 ` Folkert van Heusden @ 2007-05-27 20:32 ` Matt Mackall 1 sibling, 0 replies; 18+ messages in thread From: Matt Mackall @ 2007-05-27 20:32 UTC (permalink / raw) To: Thomas Gleixner Cc: Folkert van Heusden, Parag Warudkar, linux-kernel, Stephen Hemminger On Sat, May 26, 2007 at 06:35:24PM +0200, Thomas Gleixner wrote: > On Sat, 2007-05-26 at 18:17 +0200, Folkert van Heusden wrote: > > > > Seems to be hrtimers related - CC'ed Thomas. > > > I doubt that. The tick interrupt just finds out, that the machine is > > > stuck in rmmod. > > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > > > > sucks up 100% cpu. > > > Can you please enable CONFIG_PROVE_LOCKING ? > > > > Luckily I already had :-) > > Here is its output: > > So there is no output from lockdep, during the rmmod hang ? The rmmod > sucks 100% cpu, so it busy loops at some place. Looking at the stack > trace this seems to be in: > > > void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq, > struct delayed_work *dwork) > { > while (!cancel_delayed_work(dwork)) > flush_workqueue(wq); > } > > which is not really related to timers :) Looks like Steve added the delayed work bits, though I never saw any of his patches. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-26 15:40 [2.6.21.1] soft lockup when removing netconsole module Folkert van Heusden 2007-05-26 15:53 ` Parag Warudkar @ 2007-05-29 7:56 ` Andrew Morton 2007-05-30 13:28 ` [PATCH] " Jason Wessel 2007-06-12 11:02 ` Jarek Poplawski 1 sibling, 2 replies; 18+ messages in thread From: Andrew Morton @ 2007-05-29 7:56 UTC (permalink / raw) To: Folkert van Heusden Cc: linux-kernel, Jarek Poplawski, Jason Wessel, Thomas Gleixner, stable On Sat, 26 May 2007 17:40:12 +0200 Folkert van Heusden <folkert@vanheusden.com> wrote: > When trying to remove the netconsole module, I got the following kernel > output after a while (couple of minutes iirc): > > [525720.117293] BUG: soft lockup detected on CPU#1! > [525720.117353] [<c1004d53>] show_trace_log_lvl+0x1a/0x30 > [525720.117439] [<c1004d7b>] show_trace+0x12/0x14 > [525720.117526] [<c1004e75>] dump_stack+0x16/0x18 > [525720.117613] [<c104dd5b>] softlockup_tick+0xa6/0xc2 > [525720.117694] [<c1026855>] run_local_timers+0x12/0x14 > [525720.117738] [<c1026669>] update_process_times+0x72/0xa1 > [525720.117744] [<c1038673>] tick_sched_timer+0x53/0xb6 > [525720.117748] [<c1033d62>] hrtimer_interrupt+0x189/0x1e3 > [525720.117753] [<c100e9e2>] local_apic_timer_interrupt+0x55/0x5b > [525720.117761] [<c100ea12>] smp_apic_timer_interrupt+0x2a/0x39 > [525720.117766] [<c1004a3f>] apic_timer_interrupt+0x33/0x38 > [525720.117770] [<c120f4b1>] mutex_lock+0x8/0xa > [525720.117775] [<c102d2f0>] flush_workqueue+0x2f/0x8f > [525720.117780] [<c102d7a0>] cancel_rearming_delayed_workqueue+0x29/0x2b > [525720.117785] [<c102d7b1>] cancel_rearming_delayed_work+0xf/0x11 > [525720.117790] [<c11be143>] netpoll_cleanup+0x75/0xa5 > [525720.117794] [<f893712d>] cleanup_netconsole+0x17/0x1a [netconsole] > [525720.117804] [<c1041f11>] sys_delete_module+0x12f/0x14f > [525720.117809] [<c1003f74>] syscall_call+0x7/0xb > [525720.117812] ======================= > > Also the rmmod hangs and would not exit even with kill -9. It also > sucks up 100% cpu. Jason recently posted a mystery patch without telling us what problem it fixed. It looks like you just found it: cancel_rearming_delayed_work() will hang if the work isn't actually pending. Please test this: From: Jason Wessel <jason.wessel@windriver.com> Do not call cancel_rearming_delayed_work() if there is no pending work. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- net/core/netpoll.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff -puN net/core/netpoll.c~a net/core/netpoll.c --- a/net/core/netpoll.c~a +++ a/net/core/netpoll.c @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np) if (atomic_dec_and_test(&npinfo->refcnt)) { skb_queue_purge(&npinfo->arp_tx); skb_queue_purge(&npinfo->txq); - cancel_rearming_delayed_work(&npinfo->tx_work); - flush_scheduled_work(); + if (delayed_work_pending(&npinfo->tx_work)) { + cancel_rearming_delayed_work(&npinfo->tx_work); + flush_scheduled_work(); + } kfree(npinfo); } _ ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-29 7:56 ` Andrew Morton @ 2007-05-30 13:28 ` Jason Wessel 2007-05-30 20:38 ` Folkert van Heusden 2007-06-12 11:02 ` Jarek Poplawski 1 sibling, 1 reply; 18+ messages in thread From: Jason Wessel @ 2007-05-30 13:28 UTC (permalink / raw) To: Andrew Morton Cc: Folkert van Heusden, linux-kernel, Jarek Poplawski, Thomas Gleixner, stable [-- Attachment #1: Type: text/plain, Size: 341 bytes --] Attached is a patch to fix the soft lockup problem when removing the netconsole module. It looks different than the original patch I posted because the context had to change to maintain 80 column code. For reference the original e-mail + patch was "[BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue()" sent on 5/25/07. Jason. [-- Attachment #2: netpoll_cleanup_fix.patch --] [-- Type: text/plain, Size: 2109 bytes --] >From 102a018d601e954f658d272d4daeef139ae40e0e Mon Sep 17 00:00:00 2001 From: Jason Wessel <jason.wessel@windriver.com> Date: Wed, 30 May 2007 08:03:52 -0500 Subject: [PATCH] The netpoll_cleanup handler can hang the kernel if there is no work in the work queue because a call to cancel_rearming_delayed_work() with no work goes into an infinite loop. The typical case where this is a problem is on removing a kernel module such as the netconsole driver or kgdboe. To maintain 80 column code, the function had to have one level of braces dropped. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> --- net/core/netpoll.c | 36 +++++++++++++++++++----------------- 1 files changed, 19 insertions(+), 17 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 4581ece..28fa50e 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -766,30 +766,32 @@ void netpoll_cleanup(struct netpoll *np) struct netpoll_info *npinfo; unsigned long flags; - if (np->dev) { - npinfo = np->dev->npinfo; - if (npinfo) { - if (npinfo->rx_np == np) { - spin_lock_irqsave(&npinfo->rx_lock, flags); - npinfo->rx_np = NULL; - npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; - spin_unlock_irqrestore(&npinfo->rx_lock, flags); - } + if (!np->dev) + return; + + npinfo = np->dev->npinfo; + if (npinfo) { + if (npinfo->rx_np == np) { + spin_lock_irqsave(&npinfo->rx_lock, flags); + npinfo->rx_np = NULL; + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + } - np->dev->npinfo = NULL; - if (atomic_dec_and_test(&npinfo->refcnt)) { - skb_queue_purge(&npinfo->arp_tx); - skb_queue_purge(&npinfo->txq); + np->dev->npinfo = NULL; + if (atomic_dec_and_test(&npinfo->refcnt)) { + skb_queue_purge(&npinfo->arp_tx); + skb_queue_purge(&npinfo->txq); + if (delayed_work_pending(&npinfo->tx_work)) { cancel_rearming_delayed_work(&npinfo->tx_work); flush_scheduled_work(); - - kfree(npinfo); } - } - dev_put(np->dev); + kfree(npinfo); + } } + dev_put(np->dev); np->dev = NULL; } -- 1.5.0.6 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-30 13:28 ` [PATCH] " Jason Wessel @ 2007-05-30 20:38 ` Folkert van Heusden 0 siblings, 0 replies; 18+ messages in thread From: Folkert van Heusden @ 2007-05-30 20:38 UTC (permalink / raw) To: Jason Wessel Cc: Andrew Morton, linux-kernel, Jarek Poplawski, Thomas Gleixner, stable This patch seems to solve the problem. I removed (and inserted) the module twice without any oopses or panics or hangs in general. On Wed, May 30, 2007 at 08:28:09AM -0500, Jason Wessel wrote: > > Attached is a patch to fix the soft lockup problem when removing the > netconsole module. > > It looks different than the original patch I posted because the context had > to change to maintain 80 column code. For reference the original e-mail + > patch was "[BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue()" sent > on 5/25/07. > > Jason. > > > >From 102a018d601e954f658d272d4daeef139ae40e0e Mon Sep 17 00:00:00 2001 > From: Jason Wessel <jason.wessel@windriver.com> > Date: Wed, 30 May 2007 08:03:52 -0500 > Subject: [PATCH] The netpoll_cleanup handler can hang the kernel if there is no work in > the work queue because a call to cancel_rearming_delayed_work() with > no work goes into an infinite loop. > > The typical case where this is a problem is on removing a kernel > module such as the netconsole driver or kgdboe. > > To maintain 80 column code, the function had to have one level of braces > dropped. > > Signed-off-by: Jason Wessel <jason.wessel@windriver.com> > --- > net/core/netpoll.c | 36 +++++++++++++++++++----------------- > 1 files changed, 19 insertions(+), 17 deletions(-) > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c > index 4581ece..28fa50e 100644 > --- a/net/core/netpoll.c > +++ b/net/core/netpoll.c > @@ -766,30 +766,32 @@ void netpoll_cleanup(struct netpoll *np) > struct netpoll_info *npinfo; > unsigned long flags; > > - if (np->dev) { > - npinfo = np->dev->npinfo; > - if (npinfo) { > - if (npinfo->rx_np == np) { > - spin_lock_irqsave(&npinfo->rx_lock, flags); > - npinfo->rx_np = NULL; > - npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; > - spin_unlock_irqrestore(&npinfo->rx_lock, flags); > - } > + if (!np->dev) > + return; > + > + npinfo = np->dev->npinfo; > + if (npinfo) { > + if (npinfo->rx_np == np) { > + spin_lock_irqsave(&npinfo->rx_lock, flags); > + npinfo->rx_np = NULL; > + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; > + spin_unlock_irqrestore(&npinfo->rx_lock, flags); > + } > > - np->dev->npinfo = NULL; > - if (atomic_dec_and_test(&npinfo->refcnt)) { > - skb_queue_purge(&npinfo->arp_tx); > - skb_queue_purge(&npinfo->txq); > + np->dev->npinfo = NULL; > + if (atomic_dec_and_test(&npinfo->refcnt)) { > + skb_queue_purge(&npinfo->arp_tx); > + skb_queue_purge(&npinfo->txq); > + if (delayed_work_pending(&npinfo->tx_work)) { > cancel_rearming_delayed_work(&npinfo->tx_work); > flush_scheduled_work(); > - > - kfree(npinfo); > } > - } > > - dev_put(np->dev); > + kfree(npinfo); > + } > } > > + dev_put(np->dev); > np->dev = NULL; > } > > -- > 1.5.0.6 > Folkert van Heusden -- Multitail - gibkaja utilita po sledovaniju log-fajlov i vyvoda kommand. Fil'trovanie, raskraivanie, slijanie, vizual'noe sravnenie, i t.d. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [2.6.21.1] soft lockup when removing netconsole module 2007-05-29 7:56 ` Andrew Morton 2007-05-30 13:28 ` [PATCH] " Jason Wessel @ 2007-06-12 11:02 ` Jarek Poplawski 2007-06-13 9:25 ` [PATCH] " Jarek Poplawski 1 sibling, 1 reply; 18+ messages in thread From: Jarek Poplawski @ 2007-06-12 11:02 UTC (permalink / raw) To: Andrew Morton Cc: Folkert van Heusden, linux-kernel, Jason Wessel, Thomas Gleixner, stable, netdev On Tue, May 29, 2007 at 12:56:28AM -0700, Andrew Morton wrote: > On Sat, 26 May 2007 17:40:12 +0200 Folkert van Heusden <folkert@vanheusden.com> wrote: > > > When trying to remove the netconsole module, I got the following kernel > > output after a while (couple of minutes iirc): > > > > [525720.117293] BUG: soft lockup detected on CPU#1! > > [525720.117353] [<c1004d53>] show_trace_log_lvl+0x1a/0x30 > > [525720.117439] [<c1004d7b>] show_trace+0x12/0x14 > > [525720.117526] [<c1004e75>] dump_stack+0x16/0x18 > > [525720.117613] [<c104dd5b>] softlockup_tick+0xa6/0xc2 > > [525720.117694] [<c1026855>] run_local_timers+0x12/0x14 > > [525720.117738] [<c1026669>] update_process_times+0x72/0xa1 > > [525720.117744] [<c1038673>] tick_sched_timer+0x53/0xb6 > > [525720.117748] [<c1033d62>] hrtimer_interrupt+0x189/0x1e3 > > [525720.117753] [<c100e9e2>] local_apic_timer_interrupt+0x55/0x5b > > [525720.117761] [<c100ea12>] smp_apic_timer_interrupt+0x2a/0x39 > > [525720.117766] [<c1004a3f>] apic_timer_interrupt+0x33/0x38 > > [525720.117770] [<c120f4b1>] mutex_lock+0x8/0xa > > [525720.117775] [<c102d2f0>] flush_workqueue+0x2f/0x8f > > [525720.117780] [<c102d7a0>] cancel_rearming_delayed_workqueue+0x29/0x2b > > [525720.117785] [<c102d7b1>] cancel_rearming_delayed_work+0xf/0x11 > > [525720.117790] [<c11be143>] netpoll_cleanup+0x75/0xa5 > > [525720.117794] [<f893712d>] cleanup_netconsole+0x17/0x1a [netconsole] > > [525720.117804] [<c1041f11>] sys_delete_module+0x12f/0x14f > > [525720.117809] [<c1003f74>] syscall_call+0x7/0xb > > [525720.117812] ======================= > > > > Also the rmmod hangs and would not exit even with kill -9. It also > > sucks up 100% cpu. > > Jason recently posted a mystery patch without telling us what problem it > fixed. > To be fair the problem should be known: http://marc.info/?l=linux-kernel&m=117700287817801&w=2 List: linux-kernel Subject: Re: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work From: Chuck Ebbert <cebbert () redhat ! com> Date: 2007-04-19 17:07:11 Message-ID: 4627A1BF.8080406 () redhat ! com > Okay, an easy test for it: insmod netconsole ; rmmod netconsole > > In 2.6.20.x it loops forever and cancel_rearming_delayed_work() > is part of the trace... I hoped the discussion about cancel_rearming_delayed_work would reach more people (there was also a patch proposal to add a warning to the usage comment). But it seem it was not enough... Of course such a problem should preferably be fixed by somebody who knows the code (alas I don't know netconsole), to be sure all needed cancels are still done after this change. I hope Jason's patch is right but I'm a little surprised I can't see netdev in cc (I'll try to fix this). Cheers, Jarek P. PS: I'm very sorry for such late response (holidays). > It looks like you just found it: cancel_rearming_delayed_work() will hang > if the work isn't actually pending. Please test this: > > > From: Jason Wessel <jason.wessel@windriver.com> > > Do not call cancel_rearming_delayed_work() if there is no > pending work. > > Signed-off-by: Jason Wessel <jason.wessel@windriver.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > net/core/netpoll.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff -puN net/core/netpoll.c~a net/core/netpoll.c > --- a/net/core/netpoll.c~a > +++ a/net/core/netpoll.c > @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np) > if (atomic_dec_and_test(&npinfo->refcnt)) { > skb_queue_purge(&npinfo->arp_tx); > skb_queue_purge(&npinfo->txq); > - cancel_rearming_delayed_work(&npinfo->tx_work); > - flush_scheduled_work(); > + if (delayed_work_pending(&npinfo->tx_work)) { > + cancel_rearming_delayed_work(&npinfo->tx_work); > + flush_scheduled_work(); > + } > > kfree(npinfo); > } > _ > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-06-12 11:02 ` Jarek Poplawski @ 2007-06-13 9:25 ` Jarek Poplawski 2007-06-26 23:07 ` Andrew Morton 0 siblings, 1 reply; 18+ messages in thread From: Jarek Poplawski @ 2007-06-13 9:25 UTC (permalink / raw) To: Andrew Morton Cc: Folkert van Heusden, linux-kernel, Jason Wessel, Thomas Gleixner, stable, netdev On Tue, Jun 12, 2007 at 01:02:33PM +0200, Jarek Poplawski wrote: ... > Of course such a problem should preferably be fixed by somebody who > knows the code (alas I don't know netconsole), to be sure all needed > cancels are still done after this change. I hope Jason's patch is > right but I'm a little surprised I can't see netdev in cc (I'll try > to fix this). So, I've had a look into netpoll and, unfortunately, I don't think this patch is right... > > From: Jason Wessel <jason.wessel@windriver.com> > > > > Do not call cancel_rearming_delayed_work() if there is no > > pending work. > > > > Signed-off-by: Jason Wessel <jason.wessel@windriver.com> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > --- > > > > net/core/netpoll.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff -puN net/core/netpoll.c~a net/core/netpoll.c > > --- a/net/core/netpoll.c~a > > +++ a/net/core/netpoll.c > > @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np) > > if (atomic_dec_and_test(&npinfo->refcnt)) { > > skb_queue_purge(&npinfo->arp_tx); > > skb_queue_purge(&npinfo->txq); > > - cancel_rearming_delayed_work(&npinfo->tx_work); > > - flush_scheduled_work(); > > + if (delayed_work_pending(&npinfo->tx_work)) { > > + cancel_rearming_delayed_work(&npinfo->tx_work); > > + flush_scheduled_work(); > > + } > > > > kfree(npinfo); > > } > > _ There are such possibilities: 1. After positive delayed_work_pending(&npinfo->tx_work) test some work is queued, but there is no guarantee that when running it'll rearm again, so cancel_rearming_delayed_work can loop again; 2. After negative delayed_work_pending(&npinfo->tx_work) test a work is just running, eg. waiting on netif_tx_lock, while kfree(npinfo) is done here (oops?!). I've found an additional problem here with or without this patch: after deleting a timer in cancel_rearming_delayed_work() there could stay a last skb queued in npinfo->txq, and after kfree(npinfo) we have small memory leak. If I'm right here similar fix is needed in the current netpoll code: additional npinfo->txq purging only or maybe the whole cancel_rearming_ changed like this. I've tried to eliminate these problems in attached below patch proposal. I'm not sure it's all right: as I've written earlier I don't know netconsole enough, but it's probably a little better than above solution. I've some doubts yet (I didn't have time to check this all): 1. I hope this other schedule_delayed_work() from netpoll_send_skb() is not possible when netpoll_cleanup() runs - if I'm wrong additional check of npinfo->refcnt should be done there; 2. I also hope npinfo->refcnt before scheduling should be enough here - if not - another possibility is adding some locking eg.: netif_tx_lock before cancel for synchronization. Of course it would be very nice if somebody could test or verify this patch more. Regards, Jarek P. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> --- diff -Nurp 2.6.21-/net/core/netpoll.c 2.6.21/net/core/netpoll.c --- 2.6.21-/net/core/netpoll.c 2007-04-26 15:08:32.000000000 +0200 +++ 2.6.21/net/core/netpoll.c 2007-06-12 21:05:23.000000000 +0200 @@ -73,7 +73,8 @@ static void queue_process(struct work_st netif_tx_unlock(dev); local_irq_restore(flags); - schedule_delayed_work(&npinfo->tx_work, HZ/10); + if (atomic_read(&npinfo->refcnt)) + schedule_delayed_work(&npinfo->tx_work, HZ/10); return; } netif_tx_unlock(dev); @@ -780,9 +781,15 @@ void netpoll_cleanup(struct netpoll *np) if (atomic_dec_and_test(&npinfo->refcnt)) { skb_queue_purge(&npinfo->arp_tx); skb_queue_purge(&npinfo->txq); - cancel_rearming_delayed_work(&npinfo->tx_work); + cancel_delayed_work(&npinfo->tx_work); flush_scheduled_work(); + /* clean after last, unfinished work */ + if (!skb_queue_empty(&npinfo->txq)) { + struct sk_buff *skb; + skb = __skb_dequeue(&npinfo->txq); + kfree_skb(skb); + } kfree(npinfo); } } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-06-13 9:25 ` [PATCH] " Jarek Poplawski @ 2007-06-26 23:07 ` Andrew Morton 2007-06-27 0:46 ` Wessel, Jason 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2007-06-26 23:07 UTC (permalink / raw) To: Jarek Poplawski Cc: Folkert van Heusden, linux-kernel, Jason Wessel, Thomas Gleixner, stable, netdev On Wed, 13 Jun 2007 11:25:37 +0200 Jarek Poplawski <jarkao2@o2.pl> wrote: > On Tue, Jun 12, 2007 at 01:02:33PM +0200, Jarek Poplawski wrote: > ... > > Of course such a problem should preferably be fixed by somebody who > > knows the code (alas I don't know netconsole), to be sure all needed > > cancels are still done after this change. I hope Jason's patch is > > right but I'm a little surprised I can't see netdev in cc (I'll try > > to fix this). > > So, I've had a look into netpoll and, unfortunately, I don't > think this patch is right... > > > > From: Jason Wessel <jason.wessel@windriver.com> > > > > > > Do not call cancel_rearming_delayed_work() if there is no > > > pending work. > > > > > > Signed-off-by: Jason Wessel <jason.wessel@windriver.com> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > > --- > > > > > > net/core/netpoll.c | 6 ++++-- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff -puN net/core/netpoll.c~a net/core/netpoll.c > > > --- a/net/core/netpoll.c~a > > > +++ a/net/core/netpoll.c > > > @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np) > > > if (atomic_dec_and_test(&npinfo->refcnt)) { > > > skb_queue_purge(&npinfo->arp_tx); > > > skb_queue_purge(&npinfo->txq); > > > - cancel_rearming_delayed_work(&npinfo->tx_work); > > > - flush_scheduled_work(); > > > + if (delayed_work_pending(&npinfo->tx_work)) { > > > + cancel_rearming_delayed_work(&npinfo->tx_work); > > > + flush_scheduled_work(); > > > + } > > > > > > kfree(npinfo); > > > } > > > _ > > There are such possibilities: > > 1. After positive delayed_work_pending(&npinfo->tx_work) test > some work is queued, but there is no guarantee that when running > it'll rearm again, so cancel_rearming_delayed_work can loop again; > > 2. After negative delayed_work_pending(&npinfo->tx_work) test > a work is just running, eg. waiting on netif_tx_lock, while > kfree(npinfo) is done here (oops?!). > > I've found an additional problem here with or without this patch: > after deleting a timer in cancel_rearming_delayed_work() there could > stay a last skb queued in npinfo->txq, and after kfree(npinfo) > we have small memory leak. If I'm right here similar fix is needed > in the current netpoll code: additional npinfo->txq purging only > or maybe the whole cancel_rearming_ changed like this. > > I've tried to eliminate these problems in attached below patch > proposal. I'm not sure it's all right: as I've written earlier I > don't know netconsole enough, but it's probably a little better > than above solution. > > I've some doubts yet (I didn't have time to check this all): > > 1. I hope this other schedule_delayed_work() from netpoll_send_skb() > is not possible when netpoll_cleanup() runs - if I'm wrong additional > check of npinfo->refcnt should be done there; > 2. I also hope npinfo->refcnt before scheduling should be enough here > - if not - another possibility is adding some locking eg.: > netif_tx_lock before cancel for synchronization. > > Of course it would be very nice if somebody could test or verify > this patch more. > > Regards, > Jarek P. > > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > --- > > diff -Nurp 2.6.21-/net/core/netpoll.c 2.6.21/net/core/netpoll.c > --- 2.6.21-/net/core/netpoll.c 2007-04-26 15:08:32.000000000 +0200 > +++ 2.6.21/net/core/netpoll.c 2007-06-12 21:05:23.000000000 +0200 > @@ -73,7 +73,8 @@ static void queue_process(struct work_st > netif_tx_unlock(dev); > local_irq_restore(flags); > > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > + if (atomic_read(&npinfo->refcnt)) > + schedule_delayed_work(&npinfo->tx_work, HZ/10); > return; > } > netif_tx_unlock(dev); > @@ -780,9 +781,15 @@ void netpoll_cleanup(struct netpoll *np) > if (atomic_dec_and_test(&npinfo->refcnt)) { > skb_queue_purge(&npinfo->arp_tx); > skb_queue_purge(&npinfo->txq); > - cancel_rearming_delayed_work(&npinfo->tx_work); > + cancel_delayed_work(&npinfo->tx_work); > flush_scheduled_work(); > > + /* clean after last, unfinished work */ > + if (!skb_queue_empty(&npinfo->txq)) { > + struct sk_buff *skb; > + skb = __skb_dequeue(&npinfo->txq); > + kfree_skb(skb); > + } > kfree(npinfo); > } > } Everything went quiet? If this patch has been tested and fixes the bug, can you please send a version which is ready for merging? (ie: add a suitable description of what it does). Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-06-26 23:07 ` Andrew Morton @ 2007-06-27 0:46 ` Wessel, Jason 2007-06-27 1:00 ` Andrew Morton 0 siblings, 1 reply; 18+ messages in thread From: Wessel, Jason @ 2007-06-27 0:46 UTC (permalink / raw) To: Andrew Morton, Jarek Poplawski Cc: Folkert van Heusden, linux-kernel, Thomas Gleixner, stable, netdev > -----Original Message----- > From: Andrew Morton [mailto:akpm@linux-foundation.org] > > > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > > > --- > > > > diff -Nurp 2.6.21-/net/core/netpoll.c 2.6.21/net/core/netpoll.c > > --- 2.6.21-/net/core/netpoll.c 2007-04-26 > 15:08:32.000000000 +0200 > > +++ 2.6.21/net/core/netpoll.c 2007-06-12 > 21:05:23.000000000 +0200 > > @@ -73,7 +73,8 @@ static void queue_process(struct work_st > > netif_tx_unlock(dev); > > local_irq_restore(flags); > > > > - schedule_delayed_work(&npinfo->tx_work, HZ/10); > > + if (atomic_read(&npinfo->refcnt)) > > + > schedule_delayed_work(&npinfo->tx_work, HZ/10); > > return; > > } > > netif_tx_unlock(dev); > > @@ -780,9 +781,15 @@ void netpoll_cleanup(struct netpoll *np) > > if (atomic_dec_and_test(&npinfo->refcnt)) { > > skb_queue_purge(&npinfo->arp_tx); > > skb_queue_purge(&npinfo->txq); > > - > cancel_rearming_delayed_work(&npinfo->tx_work); > > + cancel_delayed_work(&npinfo->tx_work); > > flush_scheduled_work(); > > > > + /* clean after last, unfinished work */ > > + if (!skb_queue_empty(&npinfo->txq)) { > > + struct sk_buff *skb; > > + skb = > __skb_dequeue(&npinfo->txq); > > + kfree_skb(skb); > > + } > > kfree(npinfo); > > } > > } > > Everything went quiet? > > If this patch has been tested and fixes the bug, can you > please send a version which is ready for merging? (ie: add a > suitable description of what it does). > > I mailed Jarek separately. I had tested the patch with netconsole and kgdb and it does in fact fix the problem that was reported. Jason. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-06-27 0:46 ` Wessel, Jason @ 2007-06-27 1:00 ` Andrew Morton 2007-06-27 7:24 ` Jarek Poplawski 0 siblings, 1 reply; 18+ messages in thread From: Andrew Morton @ 2007-06-27 1:00 UTC (permalink / raw) To: Wessel, Jason Cc: Jarek Poplawski, Folkert van Heusden, linux-kernel, Thomas Gleixner, stable, netdev On Tue, 26 Jun 2007 17:46:13 -0700 "Wessel, Jason" <jason.wessel@windriver.com> wrote: > > > } > > > } > > > > Everything went quiet? > > > > If this patch has been tested and fixes the bug, can you > > please send a version which is ready for merging? (ie: add a > > suitable description of what it does). > > > > > > I mailed Jarek separately. > > I had tested the patch with netconsole and kgdb and it does in fact fix > the problem that was reported. OK, thanks. Please don't mail people separately! I queued this up with a null changelog for now. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module 2007-06-27 1:00 ` Andrew Morton @ 2007-06-27 7:24 ` Jarek Poplawski 0 siblings, 0 replies; 18+ messages in thread From: Jarek Poplawski @ 2007-06-27 7:24 UTC (permalink / raw) To: Andrew Morton Cc: Wessel, Jason, Folkert van Heusden, linux-kernel, Thomas Gleixner, stable, netdev On Tue, Jun 26, 2007 at 06:00:00PM -0700, Andrew Morton wrote: > On Tue, 26 Jun 2007 17:46:13 -0700 "Wessel, Jason" <jason.wessel@windriver.com> wrote: ... > > > Everything went quiet? > > > > > > If this patch has been tested and fixes the bug, can you > > > please send a version which is ready for merging? (ie: add a > > > suitable description of what it does). > > > > > > > > > > I mailed Jarek separately. > > > > I had tested the patch with netconsole and kgdb and it does in fact fix > > the problem that was reported. > > OK, thanks. Please don't mail people separately! > > I queued this up with a null changelog for now. > I pasted here this queued version - only the changelog is added. Regards, Jarek P. ------------------------------------------------------ Subject: netconsole: fix soft lockup (2.6.21) and memory leak when removing module From: Jarek Poplawski <jarkao2@o2.pl> #1 Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work() required a work function should always (unconditionally) rearm with delay > 0 - otherwise it would endlessly loop. This patch replaces this function with cancel_delayed_work(). Later kernel versions don't require this, so here it's only for uniformity. #2 After deleting a timer in cancel_[rearming_]delayed_work() there could stay a last skb queued in npinfo->txq causing a memory leak after kfree(npinfo). Initial patch & testing by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- net/core/netpoll.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff -puN net/core/netpoll.c~netconsole-fix-soft-lockup-when-removing-module net/core/netpoll.c --- a/net/core/netpoll.c~netconsole-fix-soft-lockup-when-removing-module +++ a/net/core/netpoll.c @@ -72,7 +72,8 @@ static void queue_process(struct work_st netif_tx_unlock(dev); local_irq_restore(flags); - schedule_delayed_work(&npinfo->tx_work, HZ/10); + if (atomic_read(&npinfo->refcnt)) + schedule_delayed_work(&npinfo->tx_work, HZ/10); return; } netif_tx_unlock(dev); @@ -785,9 +786,15 @@ void netpoll_cleanup(struct netpoll *np) if (atomic_dec_and_test(&npinfo->refcnt)) { skb_queue_purge(&npinfo->arp_tx); skb_queue_purge(&npinfo->txq); - cancel_rearming_delayed_work(&npinfo->tx_work); + cancel_delayed_work(&npinfo->tx_work); flush_scheduled_work(); + /* clean after last, unfinished work */ + if (!skb_queue_empty(&npinfo->txq)) { + struct sk_buff *skb; + skb = __skb_dequeue(&npinfo->txq); + kfree_skb(skb); + } kfree(npinfo); } } ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2007-06-27 7:16 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-05-26 15:40 [2.6.21.1] soft lockup when removing netconsole module Folkert van Heusden 2007-05-26 15:53 ` Parag Warudkar 2007-05-26 16:12 ` Thomas Gleixner 2007-05-26 16:17 ` Folkert van Heusden 2007-05-26 16:35 ` Thomas Gleixner 2007-05-26 16:49 ` Folkert van Heusden 2007-05-26 17:06 ` Thomas Gleixner 2007-05-26 17:12 ` Folkert van Heusden 2007-05-27 20:32 ` Matt Mackall 2007-05-29 7:56 ` Andrew Morton 2007-05-30 13:28 ` [PATCH] " Jason Wessel 2007-05-30 20:38 ` Folkert van Heusden 2007-06-12 11:02 ` Jarek Poplawski 2007-06-13 9:25 ` [PATCH] " Jarek Poplawski 2007-06-26 23:07 ` Andrew Morton 2007-06-27 0:46 ` Wessel, Jason 2007-06-27 1:00 ` Andrew Morton 2007-06-27 7:24 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).