linux-kernel.vger.kernel.org archive mirror
* 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
@ 2003-04-15 22:05 Philippe Gramoullé
  2003-04-15 23:05 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-15 22:05 UTC (permalink / raw)
  To: linux-kernel


Hi,

2.5.67-mm3 froze up after a few hours of uptime.

The log captured through the serial console with nmi_watchdog=1 can be found here:

http://www.philou.org/2.5.67-mm3/2.5.67-mm3.log

The decoded oops can be found here (i hope it gives someone a hint, as it looks rather
odd to me):

http://www.philou.org/2.5.67-mm3/ksymoops-2.5.67-mm3

Below is my .config. The system is a Dell 530 MT workstation with SMP 1.5 GHz Xeons and 512 MB of RAM,
running Debian unstable. Booted with elevator=as. Plain 2.5.67-mm3, no other patches applied.
module-init-tools: 0.9.10.

Grub boot parameters are:

kernel /boot/vmlinuz-2.5.67-mm3 root=/dev/sda2 console=tty1 console=ttyS1,9600n8 elevator=as nmi_watchdog=1

Unlike several previous hard freezes since at least 2.5.65, i was still able to ping the machine,
but apart from that it was completely stuck. SysRq didn't work.

I have to say that i first tried the IEEE1394 code shipped with the kernel and, as it didn't work with my DV camcorder,
i then tried the latest SVN checkout, still without success (which corresponds to the two batches of
IEEE1394 messages).
(IEEE1394 was built as modules.)

Hard freeze occurred almost half an hour after i rmmod'ed the IEEE1394 modules.

Thanks,

Philippe

/proc/modules output

emu10k1 58560 0 - Live 0xe0ad6000
ac97_codec 13600 1 emu10k1, Live 0xe0abc000
soundcore 7200 1 emu10k1, Live 0xe0a85000
hid 22816 0 - Live 0xe0a94000
uhci_hcd 28744 0 - Live 0xe0a8b000
usbcore 95252 4 hid,uhci_hcd, Live 0xe0a9d000


CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y

CONFIG_EXPERIMENTAL=y

CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15

CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y

CONFIG_X86_BIGSMP=y
CONFIG_MPENTIUMIII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_PREFETCH=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_NR_CPUS=2
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_1GB=y
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y

CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y

CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y


CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_PC_CML1=y
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PARPORT_1284=y


CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y

CONFIG_IDE=y

CONFIG_BLK_DEV_IDE=y

CONFIG_BLK_DEV_IDECD=y


CONFIG_SCSI=y

CONFIG_BLK_DEV_SD=y

CONFIG_SCSI_REPORT_LUNS=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y

CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
CONFIG_AIC7XXX_DEBUG_MASK=0

CONFIG_IEEE1394=m

CONFIG_IEEE1394_OUI_DB=y


CONFIG_IEEE1394_OHCI1394=m

CONFIG_IEEE1394_VIDEO1394=m
CONFIG_IEEE1394_ETH1394=m
CONFIG_IEEE1394_DV1394=m
CONFIG_IEEE1394_RAWIO=m
CONFIG_IEEE1394_CMP=m
CONFIG_IEEE1394_AMDTP=m


CONFIG_NET=y

CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y

CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IRC=m
CONFIG_IP_NF_TFTP=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_PKTTYPE=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_AH_ESP=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_HELPER=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_MATCH_CONNTRACK=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_IRC=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_NAT_TFTP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_DSCP=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m

CONFIG_IPV6_SCTP__=y

CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_CSZ=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_QOS=y
CONFIG_NET_ESTIMATOR=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_POLICE=y

CONFIG_NETDEVICES=y

CONFIG_NET_ETHERNET=y
CONFIG_MII=y

CONFIG_NET_PCI=y
CONFIG_E100=y

CONFIG_INPUT=y

CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1600
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1200

CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y

CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y

CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y

CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y

CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_PRINTER=y

CONFIG_RTC=y

CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
CONFIG_REISERFS_PROC_INFO=y
CONFIG_FS_POSIX_ACL=y

CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y

CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_NTFS_FS=y

CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y

CONFIG_CRAMFS=y

CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_SMB_FS=y

CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y

CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y

CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

CONFIG_SOUND=m

CONFIG_SOUND_PRIME=m
CONFIG_SOUND_EMU10K1=m

CONFIG_USB=m

CONFIG_USB_DEVICEFS=y

CONFIG_USB_EHCI_HCD=m
CONFIG_USB_UHCI_HCD=m


CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y


CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_KALLSYMS=y
CONFIG_DEBUG_INFO=y
CONFIG_FRAME_POINTER=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y

CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y

CONFIG_ZLIB_INFLATE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y

$ lspci -v

00:00.0 Host bridge: Intel Corp. 82860 860 (Wombat) Chipset Host Bridge (MCH) (rev 04)
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, fast devsel, latency 0
        Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Capabilities: [a0] AGP version 2.0

00:01.0 PCI bridge: Intel Corp. 82850 850 (Tehama) Chipset AGP Bridge (rev 04) (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, fast devsel, latency 64
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        Memory behind bridge: fc000000-fdffffff
        Prefetchable memory behind bridge: e0000000-e7ffffff

00:02.0 PCI bridge: Intel Corp. 82860 860 (Wombat) Chipset AGP Bridge (rev 04) (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, fast devsel, latency 64
        Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fe100000-fe4fffff

00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 04) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=64
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fa000000-fbffffff
        Prefetchable memory behind bridge: e8000000-e9ffffff
                                
00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 04)
        Flags: bus master, medium devsel, latency 0
                                
00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 04) (prog-if 80 [Master])
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, medium devsel, latency 0
        I/O ports at ffa0 [size=16]
                                
00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #1) (rev 04) (prog-if 00 [UHCI])
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, medium devsel, latency 0, IRQ 19
        I/O ports at ff80 [size=32]
                                
00:1f.3 SMBus: Intel Corp. 82801BA/BAM SMBus (rev 04)
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: medium devsel, IRQ 17
        I/O ports at ccd0 [size=16]
                                
00:1f.4 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #2) (rev 04) (prog-if 00 [UHCI])
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, medium devsel, latency 0, IRQ 23
        I/O ports at ff60 [size=32]
                                
00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio (rev 04)
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, medium devsel, latency 0, IRQ 17
        I/O ports at c800 [size=256]
        I/O ports at cc40 [size=64]
                                
01:00.0 VGA compatible controller: nVidia Corporation NV10DDR [GeForce 256 DDR] (rev 10) (prog-if 00 [VGA])
        Subsystem: nVidia Corporation: Unknown device 0014
        Flags: 66Mhz, medium devsel, IRQ 16
        Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at c1000000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 1
        Capabilities: [44] AGP version 2.0
                                
02:1f.0 PCI bridge: Intel Corp. 82806AA PCI64 Hub PCI Bridge (rev 03) (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, fast devsel, latency 0
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=64
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fe200000-fe4fffff
                                
03:00.0 PIC: Intel Corp. 82806AA PCI64 Hub Advanced Programmable Interrupt Controller (rev 01) (prog-if 20 [IO(X)-APIC])
        Subsystem: Intel Corp. 82806AA PCI64 Hub APIC
        Flags: fast devsel      
        Memory at fe3ff000 (32-bit, non-prefetchable) [disabled] [size=4K]

03:0c.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
        Flags: bus master, medium devsel, latency 64, IRQ 20
        Memory at fe3fe000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at ecc0 [size=64]
        Memory at fe200000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fe400000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2

03:0e.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 22
        BIST result: 00
        I/O ports at e800 [disabled] [size=256]
        Memory at fe3fd000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at fe400000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

04:0c.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394 Controller (Link) (prog-if 10 [OHCI])
        Subsystem: Dell Computer Corporation: Unknown device 00d8
        Flags: bus master, medium devsel, latency 64, IRQ 16
        Memory at fbeff800 (32-bit, non-prefetchable) [size=2K]
        Memory at fbef8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 1

04:0e.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
        Subsystem: Creative Labs CT4780 SBLive! Value
        Flags: bus master, medium devsel, latency 64, IRQ 18
        I/O ports at dce0 [size=32]
        Capabilities: [dc] Power Management version 1

04:0e.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
        Subsystem: Creative Labs Gameport Joystick
        Flags: bus master, medium devsel, latency 64
        I/O ports at dcd8 [size=8]
        Capabilities: [dc] Power Management version 1

04:0f.0 VGA compatible controller: nVidia Corporation NV6 [Vanta/Vanta LT] (rev 15) (prog-if 00 [VGA])
        Subsystem: Creative Labs: Unknown device 1039
        Flags: 66Mhz, medium devsel, IRQ 19
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e8000000 (32-bit, prefetchable) [size=32M]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [60] Power Management version 1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 22:05 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0) Philippe Gramoullé
@ 2003-04-15 23:05 ` Andrew Morton
  2003-04-15 23:17   ` Philippe Gramoullé
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andrew Morton @ 2003-04-15 23:05 UTC (permalink / raw)
  To: Philippe Gramoullé; +Cc: linux-kernel, Greg KH

Philippe Gramoullé  <philippe.gramoulle@mmania.com> wrote:
>
> 
> http://www.philou.org/2.5.67-mm3/2.5.67-mm3.log

This is a great bug report.  Thanks.

The 1394 warnings are known about and I think Ben is working on it.

The NMI watchdog hit is nasty:

NMI Watchdog detected LOCKUP on CPU0, eip c011eb82, registers:
CPU:    0
EIP:    0060:[<c011eb82>]    Tainted: GF  VLI
EFLAGS: 00200086
EIP is at .text.lock.sched+0x10c/0x12a
eax: d79c8000   ebx: d8c578fc   ecx: 00000000   edx: d8c57800
esi: c03a9d20   edi: d774a0c0   ebp: d79c9d94   esp: d79c9d88
ds: 007b   es: 007b   ss: 0068
Process gkrellm (pid: 458, threadinfo=d79c8000 task=dd7152a0)
Stack: d8c578fc d7eaa400 d774a0c0 d79c9da4 c0235e80 c03a9d20 d77491a0 d79c9db0 
       c0265b88 d8c578fc d79c9dbc e0a9d76c d8c578d0 d79c9de0 e0aa1c61 d8c57800 
       e0a97b62 d7d2f894 00200286 00000008 00000004 e0ab38bc d79c9e08 e0aa25f5 
Call Trace:
 [<c0235e80>] kobject_get+0x70/0x80
 [<c0265b88>] get_device+0x18/0x30
 [<e0a9d76c>] usb_get_dev+0x1c/0x30 [usbcore]
 [<e0aa1c61>] hcd_submit_urb+0x71/0x180 [usbcore]
 [<e0a97b62>] hidinput_report_event+0x32/0x50 [hid]
 [<e0ab38bc>] usb_hcd_operations+0x0/0x24 [usbcore]
 [<e0aa25f5>] usb_submit_urb+0x1d5/0x250 [usbcore]
 [<e0a95274>] hid_irq_in+0x34/0xb0 [hid]
 [<e0aa2104>] usb_hcd_giveback_urb+0x24/0x40 [usbcore]
 [<e0a8f23f>] uhci_finish_completion+0x8f/0xf0 [uhci_hcd]
 [<e0aa214c>] usb_hcd_irq+0x2c/0x60 [usbcore]
 [<c010d7f8>] handle_IRQ_event+0x38/0x60
 [<c010da74>] do_IRQ+0xc4/0x190
 [<c010be0c>] common_interrupt+0x18/0x20
 [<c016007b>] unregister_chrdev_region+0x2b/0x100
 [<c0235e2e>] kobject_get+0x1e/0x80
 [<c018b2a0>] check_perm+0x20/0x120
 [<c0157aa7>] get_empty_filp+0x77/0x100
 [<c0155f5f>] dentry_open+0x21f/0x250
 [<c0155d36>] filp_open+0x66/0x70
 [<c0164423>] getname+0x93/0xd0
 [<c01562c5>] sys_open+0x55/0x90
 [<c010b49f>] syscall_call+0x7/0xb

What has happened here is that you were in the middle of a kobject_get(),
holding spin_lock(&kobj_lock) when an interrupt came in.  The USB interrupt
handler comes in and ends up calling kobject_get() again.  This CPU already
holds the lock and blamyouredead.

Turning kobj_lock into an IRQ-safe lock would appear to be a sufficient fix.
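
The failure mode described above can be modelled in a few lines of user-space C. This is only a toy sketch under stated assumptions: a single CPU, one non-recursive lock, and one interrupt that fires inside the critical section. Every name below (toy_cpu, toy_raise_irq, toy_kobject_get, toy_scenario_deadlocks) is hypothetical and none of it is kernel code; masking the simulated interrupt flag merely stands in for spin_lock_irqsave().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU; hypothetical names, not kernel APIs. */
struct toy_cpu {
	bool irqs_enabled;   /* may an interrupt handler run right now? */
	bool lock_held;      /* state of the single non-recursive lock */
	bool irq_pending;    /* an interrupt arrived while IRQs were masked */
	bool deadlocked;     /* set when the handler finds the lock held */
	int refcount;
};

/* Interrupt handler: like the USB path, it takes the same lock. */
static void toy_irq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/* A real spinlock would spin forever here: the holder is the
		 * code we interrupted, and it cannot run until we return. */
		cpu->deadlocked = true;
		return;
	}
	cpu->lock_held = true;
	cpu->refcount++;
	cpu->lock_held = false;
}

/* Deliver an interrupt now, or remember it if IRQs are masked. */
static void toy_raise_irq(struct toy_cpu *cpu)
{
	if (cpu->irqs_enabled)
		toy_irq_handler(cpu);
	else
		cpu->irq_pending = true;
}

/* Process-context "get"; an interrupt fires inside the critical section. */
static void toy_kobject_get(struct toy_cpu *cpu, bool irqsave)
{
	bool saved = cpu->irqs_enabled;

	if (irqsave)
		cpu->irqs_enabled = false;   /* models spin_lock_irqsave() */
	cpu->lock_held = true;
	toy_raise_irq(cpu);                  /* interrupt arrives mid-section */
	cpu->refcount++;
	cpu->lock_held = false;
	if (irqsave) {
		cpu->irqs_enabled = saved;   /* models spin_unlock_irqrestore() */
		if (cpu->irqs_enabled && cpu->irq_pending) {
			cpu->irq_pending = false;
			toy_irq_handler(cpu); /* deferred interrupt runs now */
		}
	}
}

/* Returns true if the scenario self-deadlocks. */
static bool toy_scenario_deadlocks(bool irqsave, int *final_refcount)
{
	struct toy_cpu cpu = { .irqs_enabled = true };

	toy_kobject_get(&cpu, irqsave);
	if (final_refcount != NULL)
		*final_refcount = cpu.refcount;
	return cpu.deadlocked;
}
```

With irqsave set to false the handler finds the lock already held, which is the lockup the NMI watchdog caught. With irqsave set to true the interrupt is held off until the lock has been dropped, and both increments are applied.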



* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 23:05 ` Andrew Morton
@ 2003-04-15 23:17   ` Philippe Gramoullé
  2003-04-15 23:34     ` Andrew Morton
  2003-04-16  0:49   ` Ben Collins
  2003-04-18 18:51   ` Florin Iucha
  2 siblings, 1 reply; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-15 23:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Greg KH

Hello,

On Tue, 15 Apr 2003 16:05:30 -0700
Andrew Morton <akpm@digeo.com> wrote:

  | Philippe Gramoullé  <philippe.gramoulle@mmania.com> wrote:
  | >
  | > 
  | > http://www.philou.org/2.5.67-mm3/2.5.67-mm3.log
  | 
  | This is a great bug report.  Thanks.

Well, i finally managed to get some output when i learned about the
nmi_watchdog boot option ( reading another thread about debugging hard hangs)
so i'm pleased if this report helped :)
  | 
  | The 1394 warnings are known about and I think Ben is working on it.

Ok, great. I think this is one of the last things preventing me from
using 2.5.x almost full time.

  | 
  | The NMI watchdog hit is nasty:
  | 
[snip]
  | 
  | What has happened here is that you were in the middle of a kobject_get(),
  | holding spin_lock(&kobj_lock) when an interrupt came in.  The USB interrupt
  | handler comes in and ends up calling kobject_get() again.  This CPU already
  | holds the lock and blamyouredead.
  | 
  | Turning kobj_lock into an IRQ-safe lock would appear to be a sufficient fix.

I'll wait for the fix and will happily try it once it's available.

Thanks,

Philippe


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 23:17   ` Philippe Gramoullé
@ 2003-04-15 23:34     ` Andrew Morton
  2003-04-16  5:54       ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2003-04-15 23:34 UTC (permalink / raw)
  To: Philippe Gramoullé; +Cc: linux-kernel, greg

Philippe Gramoullé <philippe.gramoulle@mmania.com> wrote:
>
> I'll wait for the fix and will happily try it once it's available.

Something like this...

diff -puN lib/kobject.c~kobj_lock-fix lib/kobject.c
--- 25/lib/kobject.c~kobj_lock-fix	Tue Apr 15 16:31:28 2003
+++ 25-akpm/lib/kobject.c	Tue Apr 15 16:34:33 2003
@@ -336,12 +336,14 @@ void kobject_unregister(struct kobject *
 struct kobject * kobject_get(struct kobject * kobj)
 {
 	struct kobject * ret = kobj;
-	spin_lock(&kobj_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&kobj_lock, flags);
 	if (kobj && atomic_read(&kobj->refcount) > 0)
 		atomic_inc(&kobj->refcount);
 	else
 		ret = NULL;
-	spin_unlock(&kobj_lock);
+	spin_unlock_irqrestore(&kobj_lock, flags);
 	return ret;
 }
 
@@ -371,10 +373,15 @@ void kobject_cleanup(struct kobject * ko
 
 void kobject_put(struct kobject * kobj)
 {
-	if (!atomic_dec_and_lock(&kobj->refcount, &kobj_lock))
-		return;
-	spin_unlock(&kobj_lock);
-	kobject_cleanup(kobj);
+	unsigned long flags;
+
+	local_irq_save(flags);
+	if (atomic_dec_and_lock(&kobj->refcount, &kobj_lock)) {
+		spin_unlock_irqrestore(&kobj_lock, flags);
+		kobject_cleanup(kobj);
+	} else {
+		local_irq_restore(flags);
+	}
 }
 
 

_



* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 23:05 ` Andrew Morton
  2003-04-15 23:17   ` Philippe Gramoullé
@ 2003-04-16  0:49   ` Ben Collins
  2003-04-16 16:45     ` Philippe Gramoullé
  2003-04-18 18:51   ` Florin Iucha
  2 siblings, 1 reply; 17+ messages in thread
From: Ben Collins @ 2003-04-16  0:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Philippe Gramoullé, linux-kernel, Greg KH

On Tue, Apr 15, 2003 at 04:05:30PM -0700, Andrew Morton wrote:
> Philippe Gramoullé  <philippe.gramoulle@mmania.com> wrote:
> >
> > 
> > http://www.philou.org/2.5.67-mm3/2.5.67-mm3.log
> 
> This is a great bug report.  Thanks.
> 
> The 1394 warnings are known about and I think Ben is working on it.

Yeah, they are fixed in the linux1394 tree. I'm getting ready to push
them to Linus.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
Deqo       - http://www.deqo.com/


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 23:34     ` Andrew Morton
@ 2003-04-16  5:54       ` Greg KH
  2003-04-16 16:58         ` Patrick Mochel
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2003-04-16  5:54 UTC (permalink / raw)
  To: Andrew Morton, Patrick Mochel; +Cc: Philippe Gramoullé, linux-kernel

On Tue, Apr 15, 2003 at 04:34:56PM -0700, Andrew Morton wrote:
> Philippe Gramoullé <philippe.gramoulle@mmania.com> wrote:
> >
> > I'll wait for the fix and will happily try it once it's available.
> 
> Something like this...
> 
> diff -puN lib/kobject.c~kobj_lock-fix lib/kobject.c
> --- 25/lib/kobject.c~kobj_lock-fix	Tue Apr 15 16:31:28 2003
> +++ 25-akpm/lib/kobject.c	Tue Apr 15 16:34:33 2003
> @@ -336,12 +336,14 @@ void kobject_unregister(struct kobject *
>  struct kobject * kobject_get(struct kobject * kobj)
>  {
>  	struct kobject * ret = kobj;
> -	spin_lock(&kobj_lock);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&kobj_lock, flags);
>  	if (kobj && atomic_read(&kobj->refcount) > 0)
>  		atomic_inc(&kobj->refcount);
>  	else
>  		ret = NULL;
> -	spin_unlock(&kobj_lock);
> +	spin_unlock_irqrestore(&kobj_lock, flags);
>  	return ret;
>  }
>  
> @@ -371,10 +373,15 @@ void kobject_cleanup(struct kobject * ko
>  
>  void kobject_put(struct kobject * kobj)
>  {
> -	if (!atomic_dec_and_lock(&kobj->refcount, &kobj_lock))
> -		return;
> -	spin_unlock(&kobj_lock);
> -	kobject_cleanup(kobj);
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	if (atomic_dec_and_lock(&kobj->refcount, &kobj_lock)) {
> +		spin_unlock_irqrestore(&kobj_lock, flags);
> +		kobject_cleanup(kobj);
> +	} else {
> +		local_irq_restore(flags);
> +	}
>  }

CCed Pat, as this is his territory.

Hm yeah, this will fix the problem.  But is there anyway we can do this
without a lock at all?  I think we wouldn't need the lock, if we didn't
test the refcount for > 0, right?  Pat, that just keeps us from getting
a reference count on a kobject that hasn't been initialized, right?
That is a good idea to do, but is it really necessary?

If only atomic_inc_return() was defined for all platforms we might be
able to do the following, dropping the lock entirely:

struct kobject * kobject_get(struct kobject * kobj)
{
	struct kobject * ret = kobj;
	if (kobj) {
		/* A result of 1 or less means the count was not positive: back out. */
		if (atomic_inc_return(&kobj->refcount) <= 1) {
			atomic_dec(&kobj->refcount);
			ret = NULL;
		}
	}
	return ret;
}

void kobject_put(struct kobject * kobj)
{
	/* Only the caller that drops the last reference cleans up. */
	if (!atomic_dec_and_test(&kobj->refcount))
		return;
	kobject_cleanup(kobj);
}


Or am I missing something?

Anyone know how to whip up a atomic_inc_return() for the platforms
missing it?
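
As a rough illustration of the question, an increment-and-return primitive can be built from fetch-and-add where the platform provides one. The sketch below uses C11 <stdatomic.h> in user space; it is not kernel code, it says nothing about what any particular architecture offered at the time, and all names (toy_obj, toy_atomic_inc_return, toy_get_inc_return, toy_get_not_zero) are hypothetical. It also shows a compare-and-swap variant as an alternative technique, since the increment-then-back-out form briefly publishes a nonzero count on a dead object, and a second caller racing in that window could see a count above 1 and wrongly succeed.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a refcounted object. */
struct toy_obj {
	atomic_int refcount;
};

/* Increment and return the NEW value, built from fetch-and-add. */
static int toy_atomic_inc_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}

/* Same shape as the proposal: bump first, back out if the object was dead. */
static struct toy_obj *toy_get_inc_return(struct toy_obj *obj)
{
	if (obj == NULL)
		return NULL;
	if (toy_atomic_inc_return(&obj->refcount) <= 1) {
		atomic_fetch_sub(&obj->refcount, 1);
		return NULL;
	}
	return obj;
}

/* Alternative: a compare-and-swap loop that never raises a zero count. */
static struct toy_obj *toy_get_not_zero(struct toy_obj *obj)
{
	int old;

	if (obj == NULL)
		return NULL;
	old = atomic_load(&obj->refcount);
	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
			return obj;
	}
	return NULL;
}
```

Both functions return the object when a reference was taken and NULL when the count was already zero; only the second one leaves a dead object's count untouched throughout.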

thanks,

greg k-h


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16  0:49   ` Ben Collins
@ 2003-04-16 16:45     ` Philippe Gramoullé
  2003-04-16 17:32       ` Steve Kinneberg
  2003-04-16 18:09       ` Ben Collins
  0 siblings, 2 replies; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-16 16:45 UTC (permalink / raw)
  To: Ben Collins; +Cc: Andrew Morton, linux-kernel, Greg KH, linux1394-devel

Hello,

On Tue, 15 Apr 2003 20:49:33 -0400
Ben Collins <bcollins@debian.org> wrote:

  | > The 1394 warnings are known about and I think Ben is working on it.
  | 
  | Yeah, they are fixed in the linux1394 tree. I'm getting ready to push
  | them to Linus.

You mean the tree available with :

svn checkout svn://svn.linux1394.org/ieee1394/trunk/ ieee1394 ?

Because i tried revision 867, checked out a few minutes ago, and i still get the "bad: scheduling
while atomic!" message when i rmmod the modules (module-init-tools was upgraded to 0.9.11a).

The DV camcorder still doesn't seem to work (with dvgrab, for example).

# dmesg
oot is not IRM capable, resetting...
ieee1394: Remote root is not IRM capable, resetting...
ieee1394: Remote root is not IRM capable, resetting...
ieee1394: Remote root is not IRM capable, resetting...
[message repeated 178 times, for as long as the DV camcorder is turned on]
ieee1394: Remote root is not IRM capable, resetting...
ieee1394: Remote root is not IRM capable, resetting...

Starting to rmmod the 1394 modules:

dv1394: shutdown...
dv1394: stop_dma: already stopped.
dv1394: shutdown complete
dv1394: shutdown...
dv1394: stop_dma: already stopped.
dv1394: shutdown complete
dv1394: shutdown...
dv1394: stop_dma: already stopped.
dv1394: shutdown complete
dv1394: shutdown...
dv1394: stop_dma: already stopped.
dv1394: shutdown complete
bad: scheduling while atomic!
Call Trace:
 [<c011cccb>] schedule+0x53b/0x540
 [<c011d06d>] wait_for_completion+0x9d/0xf0
 [<c011cd20>] default_wake_function+0x0/0x20
 [<c011cd20>] default_wake_function+0x0/0x20
 [<c012ba9f>] kill_proc_info+0x4f/0x80
 [<e0b84f4b>] nodemgr_remove_host+0x8b/0x100 [ieee1394]
 [<e0b80916>] highlevel_remove_host+0x66/0x70 [ieee1394]
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<e0b80269>] hpsb_remove_host+0x29/0x80 [ieee1394]
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<e0b4b84e>] ohci1394_pci_remove+0x3e/0x160 [ohci1394]
 [<c023c846>] pci_device_remove+0x36/0x40
 [<c0266866>] device_release_driver+0x66/0x70
 [<e0b4e6dc>] ohci1394_pci_driver+0x7c/0xa0 [ohci1394]
 [<e0b4e6dc>] ohci1394_pci_driver+0x7c/0xa0 [ohci1394]
 [<c026689b>] driver_detach+0x2b/0x40
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<c0266b2c>] bus_remove_driver+0x3c/0x80
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<c0266f74>] driver_unregister+0x14/0x2a
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<e0b4e700>] +0x0/0x100 [ohci1394]
 [<e0b4bcf2>] +0x12/0x20 [ohci1394]
 [<e0b4e688>] ohci1394_pci_driver+0x28/0xa0 [ohci1394]
 [<c0136f94>] sys_delete_module+0x1a4/0x1e0
 [<e0b4e700>] +0x0/0x100 [ohci1394]
 [<c014a134>] sys_munmap+0x44/0x70
 [<c010b49f>] syscall_call+0x7/0xb


Thanks,

Philippe


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16  5:54       ` Greg KH
@ 2003-04-16 16:58         ` Patrick Mochel
  2003-04-16 23:40           ` Philippe Gramoullé
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick Mochel @ 2003-04-16 16:58 UTC (permalink / raw)
  To: Greg KH; +Cc: Andrew Morton, Philippe Gramoullé, linux-kernel


> Hm yeah, this will fix the problem.  But is there anyway we can do this
> without a lock at all?  I think we wouldn't need the lock, if we didn't
> test the refcount for > 0, right?  Pat, that just keeps us from getting
> a reference count on a kobject that hasn't been initialized, right?
> That is a good idea to do, but is it really necessary?

Well, it also prevents us from getting a reference to an object that is 
being deleted. The chance of that happening is slim, and it indicates a 
need for synchronization at a higher level (if the kobject_get() on the 
object being deleted was to be delayed for some reason, they would likely 
be referencing freed memory anyway). 

And, in theory, an uninitialized object shouldn't be accessible, and any 
code doing that is buggy, and will have other problems. 

In short, I think we can remove the locks entirely. We can at least see 
what happens.. 
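
The lock-free scheme argued for above can be sketched in user-space C11 to make the reasoning concrete. This is only an analogue under stated assumptions: ref_obj, ref_get and ref_put are hypothetical names, the cleaned_up flag stands in for the cleanup call, and callers are assumed to already hold a valid reference (or otherwise synchronize at a higher level) whenever they call ref_get.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a refcounted object. */
struct ref_obj {
	atomic_int refcount;
	bool cleaned_up;   /* stands in for the cleanup routine having run */
};

/* Take a reference: no lock and no "> 0" check, only a NULL check. */
static struct ref_obj *ref_get(struct ref_obj *obj)
{
	if (obj != NULL)
		atomic_fetch_add(&obj->refcount, 1);
	return obj;
}

/* Drop a reference: exactly one caller observes the count reach zero. */
static void ref_put(struct ref_obj *obj)
{
	/* fetch_sub returns the OLD value; 1 means this call hit zero. */
	if (atomic_fetch_sub(&obj->refcount, 1) == 1)
		obj->cleaned_up = true;
}
```

Because the decrement and the zero test are one atomic operation, two concurrent ref_put() calls cannot both run the cleanup, which is the property the lock provided on the put side.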


	-pat

===== lib/kobject.c 1.19 vs edited =====
--- 1.19/lib/kobject.c	Sat Apr 12 16:20:38 2003
+++ edited/lib/kobject.c	Wed Apr 16 09:57:15 2003
@@ -9,8 +9,6 @@
 #include <linux/module.h>
 #include <linux/stat.h>
 
-static spinlock_t kobj_lock = SPIN_LOCK_UNLOCKED;
-
 /**
  *	populate_dir - populate directory with attributes.
  *	@kobj:	object we're working on.
@@ -336,12 +334,10 @@
 struct kobject * kobject_get(struct kobject * kobj)
 {
 	struct kobject * ret = kobj;
-	spin_lock(&kobj_lock);
-	if (kobj && atomic_read(&kobj->refcount) > 0)
+	if (kobj)
 		atomic_inc(&kobj->refcount);
 	else
 		ret = NULL;
-	spin_unlock(&kobj_lock);
 	return ret;
 }
 
@@ -371,10 +367,8 @@
 
 void kobject_put(struct kobject * kobj)
 {
-	if (!atomic_dec_and_lock(&kobj->refcount, &kobj_lock))
-		return;
-	spin_unlock(&kobj_lock);
-	kobject_cleanup(kobj);
+	if (atomic_dec_and_test(&kobj->refcount))
+		kobject_cleanup(kobj);
 }
 
 



* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 16:45     ` Philippe Gramoullé
@ 2003-04-16 17:32       ` Steve Kinneberg
  2003-04-16 22:30         ` Philippe Gramoullé
  2003-04-16 18:09       ` Ben Collins
  1 sibling, 1 reply; 17+ messages in thread
From: Steve Kinneberg @ 2003-04-16 17:32 UTC (permalink / raw)
  To: Philippe Gramoullé
  Cc: Ben Collins, Andrew Morton, linux-kernel, Greg KH, Linux1394dev

On Wed, 2003-04-16 at 09:45, Philippe Gramoullé wrote:
> 
> # dmesg
> oot is not IRM capable, resetting...
> ieee1394: Remote root is not IRM capable, resetting...
> ieee1394: Remote root is not IRM capable, resetting...
> ieee1394: Remote root is not IRM capable, resetting...
> [message repeated 178 times, for as long as the DV camcorder is turned on]

I realize this isn't the problem you're really concerned about, but the
above may happen if you are using a version of the 1394 code off the
linux-2.4 branch prior to the patch I sent to the list Monday that Ben
recently applied.  (You should be able to get around this without
downloading the latest code and recompiling by setting attempt_root=1
when insmodding ohci1394.)

-- 
Steve Kinneberg
ACM Systems
3034 Gold Canal Drive
Rancho Cordova, CA  95670
Phone: (916) 463-7987
Email: kinnebergsteve@acmsystems.com



* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 16:45     ` Philippe Gramoullé
  2003-04-16 17:32       ` Steve Kinneberg
@ 2003-04-16 18:09       ` Ben Collins
  1 sibling, 0 replies; 17+ messages in thread
From: Ben Collins @ 2003-04-16 18:09 UTC (permalink / raw)
  To: Philippe Gramoullé; +Cc: Andrew Morton, linux-kernel, Greg KH, linux1394-devel

On Wed, Apr 16, 2003 at 06:45:28PM +0200, Philippe Gramoullé wrote:
> Hello,
> 
> On Tue, 15 Apr 2003 20:49:33 -0400
> Ben Collins <bcollins@debian.org> wrote:
> 
>   | > The 1394 warnings are known about and I think Ben is working on it.
>   | 
>   | Yeah, they are fixed in the linux1394 tree. I'm getting ready to push
>   | them to Linus.
> 
> You mean the tree available with :
> 
> svn checkout svn://svn.linux1394.org/ieee1394/trunk/ ieee1394 ?
> 
> Because i tried revision 867, checked out a few minutes ago, and i still get the "bad: scheduling
> while atomic!" message when i rmmod the modules (module-init-tools was upgraded to 0.9.11a).
> 
> The DV camcorder still doesn't seem to work (with dvgrab, for example).

Thanks, this was one I wasn't aware of.

-- 
Debian     - http://www.debian.org/
Linux 1394 - http://www.linux1394.org/
Subversion - http://subversion.tigris.org/
Deqo       - http://www.deqo.com/


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 17:32       ` Steve Kinneberg
@ 2003-04-16 22:30         ` Philippe Gramoullé
  2003-04-16 23:35           ` Steve Kinneberg
  2003-04-17  2:48           ` Dan Maas
  0 siblings, 2 replies; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-16 22:30 UTC (permalink / raw)
  To: Steve Kinneberg
  Cc: Ben Collins, Andrew Morton, linux-kernel, Greg KH, Linux1394dev


Hello,

On 16 Apr 2003 10:32:54 -0700
Steve Kinneberg <kinnebergsteve@acmsystems.com> wrote:

  | On Wed, 2003-04-16 at 09:45, Philippe Gramoullé wrote:
  | > 
  | > # dmesg
  | > oot is not IRM capable, resetting...
  | > ieee1394: Remote root is not IRM capable, resetting...
  | > ieee1394: Remote root is not IRM capable, resetting...
  | > ieee1394: Remote root is not IRM capable, resetting...
  | > [message repeated 178 times and as long as the DV Camcorder is turned on]
  | 
  | I realize this isn't the problem you're really concerned about, but the
  | above may happen if you are using a version of the 1394 code off the
  | linux-2.4 branch prior to the patch I sent to the list Monday that Ben
  | recently applied.  (You should be able to get around this without
  | downloading the latest code and recompiling by setting attempt_root=1
  | when insmodding ohci1394.)

Thanks for the tip. Anyway, for me 2.4 is no problem. Looking through the archives i saw that checking out
something other than the latest code worked for me(tm), but this was some time ago and things may have
changed ( i.e. the latest 2.4 code works as expected)

For 2.5, the thing is i remember being able to successfully use my DV Camcorder 
Canon Optura 200MC ( MVX2i in Europe) with IEEE1394 , around 2.5.59 IIRC.

Since then, i only got these "reset storms", version after version. Not that i'm complaining,
but i hope that reporting bugs will be helpful to the IEEE1394 developers, because
if it worked once, then i don't see why it wouldn't work with newer versions too 8)

The goal for me is to switch asap to 2.5 as i see many improvements in my day to day
desktop use.

Thanks,

Philippe


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 22:30         ` Philippe Gramoullé
@ 2003-04-16 23:35           ` Steve Kinneberg
  2003-04-16 23:52             ` Philippe Gramoullé
  2003-04-17  2:48           ` Dan Maas
  1 sibling, 1 reply; 17+ messages in thread
From: Steve Kinneberg @ 2003-04-16 23:35 UTC (permalink / raw)
  To: Philippe Gramoullé
  Cc: Ben Collins, Andrew Morton, linux-kernel, Greg KH, Linux1394dev

On Wed, 2003-04-16 at 15:30, Philippe Gramoullé wrote:
> 
> Hello,
> 
> On 16 Apr 2003 10:32:54 -0700
> Steve Kinneberg <kinnebergsteve@acmsystems.com> wrote:
> 
>   | On Wed, 2003-04-16 at 09:45, Philippe Gramoullé wrote:
>   | > 
>   | > # dmesg
>   | > oot is not IRM capable, resetting...
>   | > ieee1394: Remote root is not IRM capable, resetting...
>   | > ieee1394: Remote root is not IRM capable, resetting...
>   | > ieee1394: Remote root is not IRM capable, resetting...
>   | > [message repeated 178 times and as long as the DV Camcorder is turned on]
>   | 
>   | I realize this isn't the problem you're really concerned about, but the
>   | above may happen if you are using a version of the 1394 code off the
>   | linux-2.4 branch prior to the patch I sent to the list Monday that Ben
>   | recently applied.  (You should be able to get around this without
>   | downloading the latest code and recompiling by setting attempt_root=1
>   | when insmodding ohci1394.)
> 
> Thanks for the tip. Anyway, for me 2.4 is no problem. Looking through the archives i saw that checking out
> something other than the latest code worked for me(tm), but this was some time ago and things may have
> changed ( i.e. the latest 2.4 code works as expected)

My bad for not reading the subject line more closely.

> 
> For 2.5, the thing is i remember being able to successfully use my DV Camcorder 
> Canon Optura 200MC ( MVX2i in Europe) with IEEE1394 , around 2.5.59 IIRC.
> 
> Since then, i only got these "reset storms", version after version. Not that i'm complaining,
> but i hope that reporting bugs will be helpful to the IEEE1394 developers, because
> if it worked once, then i don't see why it wouldn't work with newer versions too 8)

The code that prints "ieee1394: Remote root is not IRM capable,
resetting..." was added almost 2 months ago to the 1394 SVN trunk, so
it's still fairly recent and probably after 2.5.59.  If this message
repeats rapidly under recent 2.5.*, then there is a problem with
initiating a bus reset and forcing the local node to be root.  My
recollection of the 1394 spec is that a PHY packet needs to be sent out to
all nodes to clear their root hold-off bit and the local node sets
its own root hold-off bit.  The OHCI 1394 code doesn't appear to do
anything special to send a PHY packet (it does set the root hold-off bit
in the local PHY chip) and I wonder if that might not be the source of
this problem.  Does anyone, who understands the PHY chip, know if it
automatically sends the appropriate PHY packet when this bit is set?  If
not, we may need to add code to send it.
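
For anyone following the question above, here is a minimal sketch of what
such a PHY packet would look like, assuming the IEEE 1394a PHY
configuration packet layout as I understand it: identifier 00 in bits
31:30, root_ID in bits 29:24, the R (force root) bit at 23, the T (set gap
count) bit at 22 and gap_cnt in bits 21:16, followed by a second quadlet
that is the bitwise inverse of the first. The struct and the helper name
build_phy_config() are hypothetical and are not taken from the ohci1394
driver; the code only builds the two quadlets and does not transmit
anything.

```c
#include <assert.h>
#include <stdint.h>

/* First quadlet of a PHY configuration packet (assumed 1394a layout). */
#define PHY_CONFIG_ROOT_ID(id)   ((((uint32_t)(id)) & 0x3f) << 24)
#define PHY_CONFIG_FORCE_ROOT    (UINT32_C(1) << 23)  /* R bit */
#define PHY_CONFIG_SET_GAP       (UINT32_C(1) << 22)  /* T bit */
#define PHY_CONFIG_GAP_COUNT(g)  ((((uint32_t)(g)) & 0x3f) << 16)

/* Hypothetical container: the data quadlet plus its inverse. */
struct phy_config_packet {
	uint32_t data;
	uint32_t check;
};

/*
 * Hypothetical helper. Pass root_id >= 0 to ask every PHY on the bus to
 * clear its root hold-off bit unless its physical ID equals root_id.
 * Pass gap_count >= 0 to set the gap count. A negative value leaves the
 * corresponding field and enable bit cleared.
 */
static struct phy_config_packet build_phy_config(int root_id, int gap_count)
{
	uint32_t q = 0;  /* bits 31:30 stay 00: PHY configuration packet */

	assert(root_id < 64 && gap_count < 64);

	if (root_id >= 0)
		q |= PHY_CONFIG_FORCE_ROOT | PHY_CONFIG_ROOT_ID(root_id);
	if (gap_count >= 0)
		q |= PHY_CONFIG_SET_GAP | PHY_CONFIG_GAP_COUNT(gap_count);

	struct phy_config_packet p = { q, (uint32_t)~q };
	return p;
}
```

With these assumptions, build_phy_config(1, -1) gives a data quadlet of
0x01800000 and a check quadlet of 0xfe7fffff. Whether the PHY sends such a
packet by itself when the local root hold-off bit is set is exactly the
open question above; the sketch does not answer it.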

If anyone can answer the one question I posed above, I'd greatly
appreciate it.

Thanks,
-- 
Steve Kinneberg
ACM Systems
3034 Gold Canal Drive
Rancho Cordova, CA  95670
Phone: (916) 463-7987
Email: kinnebergsteve@acmsystems.com



* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 16:58         ` Patrick Mochel
@ 2003-04-16 23:40           ` Philippe Gramoullé
  2003-04-17  3:54             ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-16 23:40 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Greg KH, Andrew Morton, linux-kernel


Hello,

On Wed, 16 Apr 2003 09:58:36 -0700 (PDT)
Patrick Mochel <mochel@osdl.org> wrote:

  | In short, I think we can remove the locks entirely. We can at least see 
  | what happens.. 

Reverting Andrew's patch and applying yours resulted in not being able to boot:

Captured through the serial console:

Linux version 2.5.67-mm3 (root@test) (gcc version 3.2.3 20030407 (Debian prerelease)) #6 SMP Thu Apr 17 01:26:22 CEST 2003
Video mode to be used for restore is ffff
BIOS-provided physical RAM map: 
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001ff77000 (usable)
 BIOS-e820: 000000001ff77000 - 000000001ff79000 (ACPI NVS)
 BIOS-e820: 000000001ff79000 - 0000000020000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
511MB LOWMEM available.         
found SMP MP-table at 000fe710  
hm, page 000fe000 reserved twice.
hm, page 000ff000 reserved twice.
hm, page 000f0000 reserved twice.
On node 0 totalpages: 130935    
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 126839 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: DELL     Product ID: WS 530       APIC at: 0xFEE00000
Processor #0 15:1 APIC version 20
Processor #1 15:1 APIC version 20
I/O APIC #2 Version 32 at 0xFEC00000.
Enabling APIC mode:  Cluster.  Using 1 I/O APICs
Processors: 2                   
Building zonelist for node : 0  
Kernel command line: root=/dev/sda2 console=tty1 console=ttyS1,9600n8 elevator=as nmi_watchdog=1
Initializing CPU#0              
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 1495.331 MHz processor.
Console: colour VGA+ 80x25      
Calibrating delay loop... 2949.12 BogoMIPS
Memory: 513740k/523740k available (2622k kernel code, 9232k reserved, 768k data, 348k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
bad: scheduling while atomic!
Call Trace:
 [<c011cccb>] schedule+0x53b/0x540
 [<c02535f6>] poke_blanked_console+0x56/0x70
 [<c02528c4>] vt_console_print+0x214/0x300
 [<c02369ac>] rwsem_down_write_failed+0xac/0x160
 [<c0105000>] _stext+0x0/0x70
 [<c023611a>] .text.lock.kobject+0x26/0x6c
 [<c0235ea7>] kset_init+0x17/0x30
 [<c023600c>] subsystem_register+0x1c/0x30
 [<c01737ae>] register_filesystem+0xae/0xd0
 [<c0462b35>] sysfs_init+0x15/0x60
 [<c04621bb>] fs_subsys_init+0xb/0x40
 [<c0105000>] _stext+0x0/0x70
 [<c0462440>] mnt_init+0xe0/0x110
 [<c0461fb6>] vfs_caches_init+0xa6/0xd0
 [<c01579a0>] filp_ctor+0x0/0x50
 [<c01579f0>] filp_dtor+0x0/0x40
 [<c0452918>] start_kernel+0x148/0x1d0
 [<c0452500>] unknown_bootoption+0x0/0x110

bad: scheduling while atomic!
Call Trace:
 [<c011cccb>] schedule+0x53b/0x540
 [<c02528c4>] vt_console_print+0x214/0x300
 [<c02369ac>] rwsem_down_write_failed+0xac/0x160
 [<c0105000>] _stext+0x0/0x70
 [<c023611a>] .text.lock.kobject+0x26/0x6c
 [<c0235ea7>] kset_init+0x17/0x30
 [<c023600c>] subsystem_register+0x1c/0x30
 [<c01737ae>] register_filesystem+0xae/0xd0
 [<c0462b35>] sysfs_init+0x15/0x60
 [<c04621bb>] fs_subsys_init+0xb/0x40
 [<c0105000>] _stext+0x0/0x70
 [<c0462440>] mnt_init+0xe0/0x110
 [<c0461fb6>] vfs_caches_init+0xa6/0xd0
 [<c01579a0>] filp_ctor+0x0/0x50
 [<c01579f0>] filp_dtor+0x0/0x40
 [<c0452918>] start_kernel+0x148/0x1d0
 [<c0452500>] unknown_bootoption+0x0/0x110

Unable to handle kernel paging request at virtual address f000ad96
 printing eip:
c011c944
*pde = 00000000
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c011c944>]    Not tainted VLI
EFLAGS: 00010046
EIP is at schedule+0x1b4/0x540
eax: 00000000   ebx: 00000001   ecx: 00000000   edx: f000ad1e
esi: ffffffe0   edi: c044df40   ebp: c0451efc   esp: c0451ec4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0450000 task=c03f2c60)
Stack: c03a2aa0 00000001 00000001 c04d0fc0 c03f2ae0 c0451f08 c02528c4 00000000 
       c00bb5c0 0000003a c03f2c60 00000040 c03f2c60 00000002 c0451f2c c02369ac 
       0000003b c0451f28 00000048 00000048 d0001174 c03f2c60 00000002 c03fa7b4 
Call Trace:
 [<c02528c4>] vt_console_print+0x214/0x300
 [<c02369ac>] rwsem_down_write_failed+0xac/0x160
 [<c0105000>] _stext+0x0/0x70
 [<c023611a>] .text.lock.kobject+0x26/0x6c
 [<c0235ea7>] kset_init+0x17/0x30
 [<c023600c>] subsystem_register+0x1c/0x30
 [<c01737ae>] register_filesystem+0xae/0xd0
 [<c0462b35>] sysfs_init+0x15/0x60
 [<c04621bb>] fs_subsys_init+0xb/0x40
 [<c0105000>] _stext+0x0/0x70
 [<c0462440>] mnt_init+0xe0/0x110
 [<c0461fb6>] vfs_caches_init+0xa6/0xd0
 [<c01579a0>] filp_ctor+0x0/0x50
 [<c01579f0>] filp_dtor+0x0/0x40
 [<c0452918>] start_kernel+0x148/0x1d0
 [<c0452500>] unknown_bootoption+0x0/0x110

Code: e0 39 55 d8 8b 48 10 0f 84 05 02 00 00 8b 45 d8 f0 0f b3 48 78 bb 01 00 00 00 89 c8 c1 e0 05 89 98 04 da 44 c0 89 90 00 da 44 c0 <f0> 0f ab 4a 78 8b 42 10 05 00 00 00 40 0f 22 d8 8b 5d d8 8b 82 

I'm back to 2.5.67-mm3 with original patch for now.

Thanks,

Philippe

--

Philippe Gramoullé
philippe.gramoulle@mmania.com
Lycos Europe - NOC France




* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 23:35           ` Steve Kinneberg
@ 2003-04-16 23:52             ` Philippe Gramoullé
  0 siblings, 0 replies; 17+ messages in thread
From: Philippe Gramoullé @ 2003-04-16 23:52 UTC (permalink / raw)
  To: Steve Kinneberg
  Cc: Ben Collins, Andrew Morton, linux-kernel, Greg KH, Linux1394dev

On 16 Apr 2003 16:35:10 -0700
Steve Kinneberg <kinnebergsteve@acmsystems.com> wrote:

  | > Since then, i only got these "reset storms", version after version. Not that i'm complaining,
  | > but i hope that reporting bugs will be helpful to the IEEE1394 developers, because
  | > if it worked once, then i don't see why it wouldn't work with newer versions too 8)
  | 
  | The code that prints "ieee1394: Remote root is not IRM capable,
  | resetting..." was added almost 2 months ago to the 1394 SVN trunk, so
  | it's still fairly recent and probably after 2.5.59.

I'm rather sure it was around 2.5.59 but i couldn't swear about the exact time frame
as it might be a BK snapshot at the time.

  | If this message repeats rapidly under recent 2.5.*, then there is a
  | problem with initiating a bus reset and forcing the local node to be
  | root.


Well with 2.5.67-mm3 + latest SVN checkout, i get about 1 such line per second
in my logs as soon as i turn on my DV Camcorder.

  
  | My recollection of the 1394 spec is that a PHY packet needs to be sent out to
  | all nodes to clear their root hold-off bit and the local node sets
  | its own root hold-off bit.  The OHCI 1394 code doesn't appear to do
  | anything special to send a PHY packet (it does set the root hold-off bit
  | in the local PHY chip) and I wonder if that might not be the source of
  | this problem.  Does anyone, who understands the PHY chip, know if it
  | automatically sends the appropriate PHY packet when this bit is set?  If
  | not, we may need to add code to send it.
  | 
  | If anyone can answer the one question I posed above, I'd greatly
  | appreciate it.

I'll happily let someone else answer :)

Thanks,

Philippe


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 22:30         ` Philippe Gramoullé
  2003-04-16 23:35           ` Steve Kinneberg
@ 2003-04-17  2:48           ` Dan Maas
  1 sibling, 0 replies; 17+ messages in thread
From: Dan Maas @ 2003-04-17  2:48 UTC (permalink / raw)
  To: Philippe Gramoullé
  Cc: Steve Kinneberg, Ben Collins, Andrew Morton, linux-kernel,
	Greg KH, Linux1394dev

> Since then, i only got these "reset storms" versions over versions.

Jim Radford's nodemgr back-off patch fixes the reset storms, for me at
least. (check the list archives, he posted it a few weeks ago).

It should definitely be applied, but I would hold off until we fix the
nodemgr crash bug, since Jim's patch masks (but does not eliminate) it.
I'd rather force people to deal with the nodemgr crash :)

(if nobody picks up the torch on nodemgr, I'll try myself in a few
days - I'm too busy now but boy I want it FIXED! :)

Regards,
Dan


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-16 23:40           ` Philippe Gramoullé
@ 2003-04-17  3:54             ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2003-04-17  3:54 UTC (permalink / raw)
  To: Philippe Gramoullé; +Cc: Patrick Mochel, Andrew Morton, linux-kernel

On Thu, Apr 17, 2003 at 01:40:51AM +0200, Philippe Gramoullé wrote:
> 
> Hello,
> 
> On Wed, 16 Apr 2003 09:58:36 -0700 (PDT)
> Patrick Mochel <mochel@osdl.org> wrote:
> 
>   | In short, I think we can remove the locks entirely. We can at least see 
>   | what happens.. 
> 
> Reverting Andrew's patch and applying yours resulted in not being able to boot:

Same for me, looks like people are grabbing this lock before it's
initialized :(

greg k-h


* Re: 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0)
  2003-04-15 23:05 ` Andrew Morton
  2003-04-15 23:17   ` Philippe Gramoullé
  2003-04-16  0:49   ` Ben Collins
@ 2003-04-18 18:51   ` Florin Iucha
  2 siblings, 0 replies; 17+ messages in thread
From: Florin Iucha @ 2003-04-18 18:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Greg KH


On Tue, Apr 15, 2003 at 04:05:30PM -0700, Andrew Morton wrote:
> The NMI watchdog hit is nasty:
> 
> NMI Watchdog detected LOCKUP on CPU0, eip c011eb82, registers:
> CPU:    0
> EIP:    0060:[<c011eb82>]    Tainted: GF  VLI
> EFLAGS: 00200086
> EIP is at .text.lock.sched+0x10c/0x12a
> eax: d79c8000   ebx: d8c578fc   ecx: 00000000   edx: d8c57800
> esi: c03a9d20   edi: d774a0c0   ebp: d79c9d94   esp: d79c9d88
> ds: 007b   es: 007b   ss: 0068
> Process gkrellm (pid: 458, threadinfo=d79c8000 task=dd7152a0)
> Stack: d8c578fc d7eaa400 d774a0c0 d79c9da4 c0235e80 c03a9d20 d77491a0 d79c9db0 
>        c0265b88 d8c578fc d79c9dbc e0a9d76c d8c578d0 d79c9de0 e0aa1c61 d8c57800 
>        e0a97b62 d7d2f894 00200286 00000008 00000004 e0ab38bc d79c9e08 e0aa25f5 
> Call Trace:
>  [<c0235e80>] kobject_get+0x70/0x80
>  [<c0265b88>] get_device+0x18/0x30
>  [<e0a9d76c>] usb_get_dev+0x1c/0x30 [usbcore]
>  [<e0aa1c61>] hcd_submit_urb+0x71/0x180 [usbcore]
>  [<e0a97b62>] hidinput_report_event+0x32/0x50 [hid]
>  [<e0ab38bc>] usb_hcd_operations+0x0/0x24 [usbcore]
>  [<e0aa25f5>] usb_submit_urb+0x1d5/0x250 [usbcore]
>  [<e0a95274>] hid_irq_in+0x34/0xb0 [hid]
>  [<e0aa2104>] usb_hcd_giveback_urb+0x24/0x40 [usbcore]
>  [<e0a8f23f>] uhci_finish_completion+0x8f/0xf0 [uhci_hcd]
>  [<e0aa214c>] usb_hcd_irq+0x2c/0x60 [usbcore]
>  [<c010d7f8>] handle_IRQ_event+0x38/0x60
>  [<c010da74>] do_IRQ+0xc4/0x190
>  [<c010be0c>] common_interrupt+0x18/0x20
>  [<c016007b>] unregister_chrdev_region+0x2b/0x100
>  [<c0235e2e>] kobject_get+0x1e/0x80
>  [<c018b2a0>] check_perm+0x20/0x120
>  [<c0157aa7>] get_empty_filp+0x77/0x100
>  [<c0155f5f>] dentry_open+0x21f/0x250
>  [<c0155d36>] filp_open+0x66/0x70
>  [<c0164423>] getname+0x93/0xd0
>  [<c01562c5>] sys_open+0x55/0x90
>  [<c010b49f>] syscall_call+0x7/0xb

I've got a similar trace, with 2.5.67-bk8:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<00000000>]    Not tainted
EFLAGS: 00210087
EIP is at 0x0
eax: c0470894   ebx: efc54140   ecx: 00010001   edx: 00000001
esi: 00000000   edi: 00000001   ebp: e7b2de08   esp: e7b2ddec
ds: 007b   es: 007b   ss: 0068
Process phoenix-bin (pid: 873, threadinfo=e7b2c000 task=e883f980)
Stack: c011a3f1 c0470894 00000001 00000000 e7b2c000 00200082 00000000 e7b2de28 
       c011a441 c04709a0 00000001 00000001 00000000 c04709a0 eef36c88 00000000 
       c02e31ca edcf8900 0000001d 00020001 efc8b380 efc8b3dc 00000001 e7b2de60 
Call Trace:
 [<c011a3f1>] __wake_up_common+0x31/0x50
 [<c011a441>] __wake_up+0x31/0x60
 [<c02e31ca>] mousedev_event+0xca/0x2b0
 [<c02e1a5d>] input_event+0xdd/0x360
 [<c02e16c1>] hidinput_report_event+0x31/0x50
 [<c02dede2>] hid_input_report+0xa2/0xe0
 [<c02deec1>] hid_irq_in+0xa1/0xb0
 [<c02d2bd5>] usb_hcd_giveback_urb+0x25/0x40
 [<c02dca6a>] dl_done_list+0xea/0x100
 [<c02dd43b>] ohci_irq+0xeb/0x160
 [<c02d2c1d>] usb_hcd_irq+0x2d/0x60
 [<c010cda8>] handle_IRQ_event+0x38/0x60
 [<c010cfa7>] do_IRQ+0x97/0x120
 [<c010b4e4>] common_interrupt+0x18/0x20

Code:  Bad EIP value.
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

-- 

"NT is to UNIX what a doughnut is to a particle accelerator."



Thread overview: 17+ messages
2003-04-15 22:05 2.5.67-mm3: Bad: scheduling while atomic with IEEE1394 then hard freeze ( lockup on CPU0) Philippe Gramoullé
2003-04-15 23:05 ` Andrew Morton
2003-04-15 23:17   ` Philippe Gramoullé
2003-04-15 23:34     ` Andrew Morton
2003-04-16  5:54       ` Greg KH
2003-04-16 16:58         ` Patrick Mochel
2003-04-16 23:40           ` Philippe Gramoullé
2003-04-17  3:54             ` Greg KH
2003-04-16  0:49   ` Ben Collins
2003-04-16 16:45     ` Philippe Gramoullé
2003-04-16 17:32       ` Steve Kinneberg
2003-04-16 22:30         ` Philippe Gramoullé
2003-04-16 23:35           ` Steve Kinneberg
2003-04-16 23:52             ` Philippe Gramoullé
2003-04-17  2:48           ` Dan Maas
2003-04-16 18:09       ` Ben Collins
2003-04-18 18:51   ` Florin Iucha
