linux-kernel.vger.kernel.org archive mirror
* [GIT PATCH] PCI patches for 2.6.15 - retry
@ 2006-01-09 20:37 Greg KH
  2006-01-10  0:00 ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2006-01-09 20:37 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton; +Cc: linux-kernel, linux-pci


Here are some PCI patches against your latest git tree.  They have all
been in the -mm tree for a while with no problems.  From the last series
I sent you, I've pulled out all of the patches that people objected to,
as well as the ones that crashed older machines.

The change that touches so many different files is the switch from
pci_module_init() to pci_register_driver(), done by Richard Knutsson.
The other big item is the addition of the PCI error recovery framework,
after many revisions and reworks.  There are also some PCI hotplug
fixes and new quirks.

Please pull from:
	rsync://rsync.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6.git/
or if master.kernel.org hasn't synced up yet:
	master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6.git/

The full patches will be sent to the linux-pci mailing list, if anyone
wants to see them.

thanks,

greg k-h

 Documentation/filesystems/sysfs-pci.txt      |   21 +-
 Documentation/pci-error-recovery.txt         |  246 +++++++++++++++++++++++++++
 MAINTAINERS                                  |    7 
 arch/alpha/kernel/sys_alcor.c                |    3 
 arch/alpha/kernel/sys_sio.c                  |    6 
 arch/frv/mb93090-mb00/pci-frv.c              |    8 
 arch/frv/mb93090-mb00/pci-irq.c              |    4 
 arch/i386/kernel/scx200.c                    |    2 
 arch/i386/pci/acpi.c                         |    2 
 arch/i386/pci/fixup.c                        |    7 
 arch/i386/pci/irq.c                          |   42 ++--
 arch/mips/vr41xx/common/vrc4173.c            |    2 
 arch/ppc/kernel/pci.c                        |   21 +-
 arch/ppc/platforms/85xx/mpc85xx_cds_common.c |   11 -
 arch/sparc64/kernel/ebus.c                   |   15 -
 drivers/acpi/pci_irq.c                       |    7 
 drivers/block/DAC960.c                       |    2 
 drivers/block/cciss.c                        |    2 
 drivers/block/sx8.c                          |    2 
 drivers/block/umem.c                         |    2 
 drivers/hwmon/vt8231.c                       |    2 
 drivers/media/radio/radio-gemtek-pci.c       |    2 
 drivers/media/radio/radio-maxiradio.c        |    2 
 drivers/media/video/bttv-driver.c            |    2 
 drivers/media/video/saa7134/saa7134-core.c   |    2 
 drivers/parport/parport_serial.c             |    2 
 drivers/pci/hotplug/acpiphp_glue.c           |    6 
 drivers/pci/hotplug/cpqphp.h                 |    8 
 drivers/pci/hotplug/cpqphp_core.c            |  127 +++++++------
 drivers/pci/hotplug/cpqphp_ctrl.c            |   28 ---
 drivers/pci/hotplug/cpqphp_sysfs.c           |  138 ++++++++++++---
 drivers/pci/hotplug/ibmphp_pci.c             |    2 
 drivers/pci/hotplug/pciehp_core.c            |   92 +++++-----
 drivers/pci/hotplug/pciehp_hpc.c             |   19 +-
 drivers/pci/hotplug/pciehp_pci.c             |   52 +++--
 drivers/pci/hotplug/pciehprm_acpi.c          |   13 -
 drivers/pci/hotplug/rpadlpar_core.c          |   27 --
 drivers/pci/hotplug/rpaphp_pci.c             |   47 -----
 drivers/pci/hotplug/shpchp.h                 |    4 
 drivers/pci/hotplug/shpchp_core.c            |   16 +
 drivers/pci/hotplug/shpchp_ctrl.c            |   37 ----
 drivers/pci/hotplug/shpchp_hpc.c             |  138 +++++++++------
 drivers/pci/hotplug/shpchp_pci.c             |   19 +-
 drivers/pci/pci.c                            |    7 
 drivers/pci/pci.h                            |    5 
 drivers/pci/pcie/portdrv_core.c              |    4 
 drivers/pci/probe.c                          |   49 ++++-
 drivers/pci/proc.c                           |    3 
 drivers/pci/quirks.c                         |   26 ++
 drivers/pci/remove.c                         |    3 
 drivers/pcmcia/vrc4173_cardu.c               |    2 
 drivers/serial/serial_txx9.c                 |    2 
 drivers/video/cyblafb.c                      |    1 
 include/linux/pci.h                          |   69 +++++++
 sound/oss/ad1889.c                           |    2 
 sound/oss/btaudio.c                          |    2 
 sound/oss/cmpci.c                            |    2 
 sound/oss/cs4281/cs4281m.c                   |    2 
 sound/oss/cs46xx.c                           |    2 
 sound/oss/emu10k1/main.c                     |    2 
 sound/oss/es1370.c                           |    2 
 sound/oss/es1371.c                           |    2 
 sound/oss/ite8172.c                          |    2 
 sound/oss/kahlua.c                           |    2 
 sound/oss/maestro.c                          |    2 
 sound/oss/nec_vrc5477.c                      |    2 
 sound/oss/nm256_audio.c                      |    2 
 sound/oss/rme96xx.c                          |    2 
 sound/oss/sonicvibes.c                       |    2 
 sound/oss/ymfpci.c                           |    2 
 70 files changed, 956 insertions(+), 444 deletions(-)

Adrian Bunk:
      PCI Hotplug: cpqphp_ctrl.c: remove dead code
      PCI: drivers/pci: some cleanups

Benjamin Herrenschmidt:
      PCI: Export pci_cfg_space_size

Daniel Marjamäki:
      PCI: irq.c: trivial printk and DBG updates

Daniel Yeisley:
      PCI Quirk: 1K I/O space granularity on Intel P64H2

Dominik Brodowski:
      PCI: use bus numbers sparsely, if necessary

Greg Kroah-Hartman:
      PCI Hotplug: fix up the sysfs file in the compaq pci hotplug driver
      drivers/sound/oss: Replace pci_module_init() with pci_register_driver()

Hanna Linder:
      PCI: arch/i386/pci/acpi.c: use for_each_pci_dev

Jesper Juhl:
      PCI: Reduce nr of ptr derefs in drivers/pci/hotplug/cpqphp_core.c
      PCI: Reduce nr of ptr derefs in drivers/pci/hotplug/rpaphp_pci.c
      PCI: Reduce nr of ptr derefs in drivers/pci/hotplug/pciehp_core.c
      PCI: Reduce nr of ptr derefs in drivers/pci/hotplug/pciehprm_acpi.c

Jesse Barnes:
      PCI: document sysfs rom file interface
      PCI: update Toshiba ohci quirk DMI table

Jiri Slaby:
      PCI: pci_find_device remove (ppc/platforms/85xx/mpc85xx_cds_common.c)
      PCI: pci_find_device remove (alpha/kernel/sys_sio.c)
      PCI: pci_find_device remove (alpha/kernel/sys_alcor.c)
      PCI: pci_find_device remove (ppc/kernel/pci.c)
      PCI: arch: pci_find_device remove (frv/mb93090-mb00/pci-irq.c)
      PCI: pci_find_device remove (frv/mb93090-mb00/pci-frv.c)
      PCI: pci_find_device remove (sparc64/kernel/ebus.c)

Jordan, William P:
      PCI Hotplug: ibmphp_pci.c copy-n-paste fix

Kenji Kaneshige:
      shpchp: fix improper reference to Slot Avail Regsister
      shpchp: fix improper reference to Mode 1 ECC Capability" bit
      shpchp: replace pci_find_slot() with pci_get_slot()
      shpchp: fix improper mmio mapping
      shpchp: fix improper wait for command completion
      shpchp: fix improper write to Command Completion Detect bit
      shpchp: Implement get_address callback

Kristen Accardi:
      pci: use pin stored in pci_dev
      apci: use pin stored in pci_dev
      pci: store PCI_INTERRUPT_PIN in pci_dev
      pci: call pci_read_irq for bridges
      acpiphp: only size new bus

linas:
      PCI Error Recovery: header file patch

linas@austin.ibm.com:
      PCI Hotplug/powerpc: remove duplicated code
      PCI Hotplug/powerpc: more removal of duplicated code
      PCI Error Recovery: documentation

Rajesh Shah:
      pciehp: allow bridged card hotplug

Richard Knutsson:
      arch: Replace pci_module_init() with pci_register_driver()
      drivers/block: Replace pci_module_init() with pci_register_driver()
      drivers/*rest*: Replace pci_module_init() with pci_register_driver()

Sergey Vlasov:
      PCIE: make bus_id for PCI Express devices unique

Thomas Schaefer:
      pciehp: handle sticky power-fault status


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-09 20:37 [GIT PATCH] PCI patches for 2.6.15 - retry Greg KH
@ 2006-01-10  0:00 ` Linus Torvalds
  2006-01-10  0:44   ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-01-10  0:00 UTC (permalink / raw)
  To: Greg KH; +Cc: Andrew Morton, linux-kernel, linux-pci



On Mon, 9 Jan 2006, Greg KH wrote:
>
> Here are some PCI patches against your latest git tree.  They have all
> been in the -mm tree for a while with no problems.  I've pulled out all
> of the offending patches that people objected to, or ones that crashed
> older machines from the last series I sent you.

Before I pull this, I'd like to get some confirmation that some of the 
other problems that seem to be PCI-related in the -mm tree are also 
understood, or at least known to be part of the stuff that you're _not_ 
sending me..

[ There's at least a pci_call_probe() NULL ptr dereference report by 
  Martin Bligh, I think Andrew has a few others he's tracked.. ]

		Linus


* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  0:00 ` Linus Torvalds
@ 2006-01-10  0:44   ` Andrew Morton
  2006-01-10  1:49     ` Alan Cox
  2006-01-10  2:28     ` Greg KH
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2006-01-10  0:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gregkh, linux-kernel, linux-pci

Linus Torvalds <torvalds@osdl.org> wrote:
>
> 
> 
> On Mon, 9 Jan 2006, Greg KH wrote:
> >
> > Here are some PCI patches against your latest git tree.  They have all
> > been in the -mm tree for a while with no problems.  I've pulled out all
> > of the offending patches that people objected to, or ones that crashed
> > older machines from the last series I sent you.
> 
> Before I pull this, I'd like to get some confirmation that some of the 
> other problems that seem to be PCI-related in the -mm tree are also 
> understood, or at least known to be part of the stuff that you're _not_ 
> sending me..

It's really hard to keep track of all this, so it's likely that some things
will still sneak through.

- Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
  or driver core.

- A few problems with ehci.  For example Grant Coady went oops loading
  the module.  Probably USB, maybe solved now, but there are
  interactions...

- gregkh-pci-x86-pci-domain-support-the-meat.patch is a problem, but
  wasn't in this tree.

> [ There's at least a pci_call_probe() NULL ptr dereference report by 
>   Martin Bligh, I think Andrew has a few others he's tracked.. ]

Yes, Martin is reporting failures on a few machines.  Hopefully he's
working out whether gregkh-pci-x86-pci-domain-support-the-meat.patch was
the culprit here.  If so, I'd say we're good to go.  If that's _not_ the
source then we just don't know where the failure is coming from.

All very vague, sorry.



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  0:44   ` Andrew Morton
@ 2006-01-10  1:49     ` Alan Cox
  2006-01-10  1:49       ` Andrew Morton
  2006-01-12 20:55       ` Jeff Garzik
  2006-01-10  2:28     ` Greg KH
  1 sibling, 2 replies; 15+ messages in thread
From: Alan Cox @ 2006-01-10  1:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, gregkh, linux-kernel, linux-pci

On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>   or driver core.

libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
pata driver.




* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  1:49     ` Alan Cox
@ 2006-01-10  1:49       ` Andrew Morton
  2006-01-10 10:03         ` Reuben Farrelly
  2006-01-12  3:55         ` Reuben Farrelly
  2006-01-12 20:55       ` Jeff Garzik
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2006-01-10  1:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: torvalds, gregkh, linux-kernel, linux-pci, Reuben Farrelly

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
> > - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
> >   or driver core.
> 
> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
> pata driver.

Well that's all merged up now.  Reuben, could you please test 2.6.15git6
tomorrow?



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  0:44   ` Andrew Morton
  2006-01-10  1:49     ` Alan Cox
@ 2006-01-10  2:28     ` Greg KH
  1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2006-01-10  2:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, linux-pci

On Mon, Jan 09, 2006 at 04:44:10PM -0800, Andrew Morton wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > On Mon, 9 Jan 2006, Greg KH wrote:
> > >
> > > Here are some PCI patches against your latest git tree.  They have all
> > > been in the -mm tree for a while with no problems.  I've pulled out all
> > > of the offending patches that people objected to, or ones that crashed
> > > older machines from the last series I sent you.
> > 
> > Before I pull this, I'd like to get some confirmation that some of the 
> > other problems that seem to be PCI-related in the -mm tree are also 
> > understood, or at least known to be part of the stuff that you're _not_ 
> > sending me..
> 
> It's really hard to keep track of all this, so it's likely that some things
> will still sneak through.
> 
> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>   or driver core.

Haven't heard of this one before, but it shouldn't be a pci issue.

> - A few problems with ehci.  For example Grant Coady went oops loading
>   the module.  Probably USB, maybe solved now, but there are
>   interactions...

People are still working on this one, but it shouldn't be a pci issue.

> - gregkh-pci-x86-pci-domain-support-the-meat.patch is a problem, but
>   wasn't in this tree.
> 
> > [ There's at least a pci_call_probe() NULL ptr dereference report by 
> >   Martin Bligh, I think Andrew has a few others he's tracked.. ]

Yes, those are the x86-* patches in my tree, from Jeff, and I'm not going
to send them to you until all of the breakage is fixed up (he created
them for machines that aren't public yet, so I don't think there's any
rush to get them in...)

> Yes, Martin is reporting failures on a few machines.  Hopefully he's
> working out whether gregkh-pci-x86-pci-domain-support-the-meat.patch was
> the culprit here.  If so, I'd say we're good to go.  If that's _not_ the
> source then we just don't know where the failure is coming from.

It sure looks like it's the reason why, as we are suddenly failing with
a NULL pointer problem after we change an integer field into a pointer :)

Linus, it should all be safe for you to pull this tree, as I took
everything that people objected to out of my last attempt :)

thanks,

greg k-h


* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  1:49       ` Andrew Morton
@ 2006-01-10 10:03         ` Reuben Farrelly
  2006-01-12  3:55         ` Reuben Farrelly
  1 sibling, 0 replies; 15+ messages in thread
From: Reuben Farrelly @ 2006-01-10 10:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, torvalds, gregkh, linux-kernel, linux-pci



On 10/01/2006 2:49 p.m., Andrew Morton wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
>>> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>>>   or driver core.
>> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
>> pata driver.
> 
> Well that's all merged up now.  Reuben, could you please test 2.6.15git6
> tomorrow?

A couple of reboots later with git6 and at this stage it seems all OK, no oopses.

I'm still having 100% repeatable "soft" hangs when booting up though, both with 
-mm2 (-mm1 seems OK in this regard) and git6.  It's enough to make git6 and mm2 
unusable because the machine never finishes booting userspace.  I'll put more 
details of that in another email following up to the original -mm2 thread, as 
it's unrelated to the oops above (but probably equally as nasty).

But it means I can't test the git6 fixes much more because every time I boot it 
I have to alt-sysrq S+U+B or uncleanly kill the box by hitting the reset button.

reuben


* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  1:49       ` Andrew Morton
  2006-01-10 10:03         ` Reuben Farrelly
@ 2006-01-12  3:55         ` Reuben Farrelly
  2006-01-12  4:29           ` Andrew Morton
  2006-01-12 11:42           ` Alan Cox
  1 sibling, 2 replies; 15+ messages in thread
From: Reuben Farrelly @ 2006-01-12  3:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, torvalds, gregkh, linux-kernel, linux-pci



On 10/01/2006 2:49 p.m., Andrew Morton wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
>>> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>>>   or driver core.
>> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
>> pata driver.
> 
> Well that's all merged up now.  Reuben, could you please test 2.6.15git6
> tomorrow?

Seemingly not fixed after all.  I've been doing many reboots lately getting to 
the bottom of the barrier/md bug, and along the way I hit this with -mm3 
(linus.patch -git7), which I believe is the same bug (the call trace looks very 
similar).


Real Time Clock Driver v1.12ac
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ahci: probe of 0000:00:1f.2 failed with error -12
ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000
  printing eip:
c023c873
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU:    0
EIP:    0060:[<c023c873>]    Not tainted VLI
EFLAGS: 00010206   (2.6.15-mm3)
EIP is at make_class_name+0x28/0x8d
eax: 00000000   ebx: ffffffff   ecx: ffffffff   edx: c1a12224
esi: 00000009   edi: 00000000   ebp: c1921d2c   esp: c1921d1c
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 1, threadinfo=c1921000 task=c1920a90)
Stack: <0>c1a12224 c03913f8 c1a12224 c03913f8 c1921d54 c023cabd c1921d58 c0391380
        00000000 c1af39c0 c0391400 c1a12224 c1a12000 c1a12030 c1921d60 c023cb7b
        c1a120e4 c1921d74 c0255dbf c1a122c0 c1a43a40 00000000 c1921d80 c025e393
Call Trace:
  [<c0103c5d>] show_stack+0x9b/0xc0
  [<c0103de4>] show_registers+0x162/0x1e7
  [<c0103f8f>] die+0x126/0x231
  [<c01140db>] do_page_fault+0x271/0x5b9
  [<c01037df>] error_code+0x4f/0x54
  [<c023cabd>] class_device_del+0xa3/0x156
  [<c023cb7b>] class_device_unregister+0xb/0x15
  [<c0255dbf>] scsi_remove_host+0xb4/0xef
  [<c025e393>] ata_host_remove+0x11/0x1c
  [<c0260ec6>] ata_device_add+0x2e4/0xb7b
  [<c0261cd6>] ata_pci_init_one+0x322/0x387
  [<c0265b34>] piix_init_one+0x18c/0x338
  [<c01f4f4f>] pci_device_probe+0x44/0x5f
  [<c023bf62>] driver_probe_device+0x3e/0xb0
  [<c023c0df>] __driver_attach+0x8e/0x90
  [<c023b9f3>] bus_for_each_dev+0x44/0x62
  [<c023bece>] driver_attach+0x19/0x1b
  [<c023b687>] bus_add_driver+0x6d/0x126
  [<c023c350>] driver_register+0x6b/0x9b
  [<c01f50fb>] __pci_register_driver+0x6a/0x95
  [<c03e8ea8>] piix_init+0xf/0x22
  [<c01003cc>] init+0xff/0x325
  [<c0100d25>] kernel_thread_helper+0x5/0xb
Code: c8 5d c3 55 89 e5 57 56 53 83 ec 04 89 45 f0 89 c2 8b 40 48 8b 38 bb ff ff 
ff ff 89 d9 31 c0 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d 
4e 02 8d 04 0a ba d0 00 00 00 e8 3b 72
  <0>Kernel panic - not syncing: Attempted to kill init!

reuben






* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12  3:55         ` Reuben Farrelly
@ 2006-01-12  4:29           ` Andrew Morton
  2006-01-12  4:55             ` Reuben Farrelly
  2006-01-12 11:42           ` Alan Cox
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2006-01-12  4:29 UTC (permalink / raw)
  To: Reuben Farrelly
  Cc: alan, torvalds, gregkh, linux-kernel, linux-pci, Jeff Garzik

Reuben Farrelly <reuben-lkml@reub.net> wrote:
>
> 
> 
> On 10/01/2006 2:49 p.m., Andrew Morton wrote:
> > Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> >> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
> >>> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
> >>>   or driver core.
> >> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
> >> pata driver.
> > 
> > Well that's all merged up now.  Reuben, could you please test 2.6.15git6
> > tomorrow?
> 
> Seemingly not fixed afterall.  I've been doing many reboots lately getting to 
> the bottom of the barrier/md bug and just before I hit this with -mm3 
> (linus.patch -git7) which I believe is the same bug (the call trace looks very 
> similar).
> 
> ...

I'm getting my bugs confused now - there are so many.  Were you the person
who reported this before?

> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
> 0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
> 0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> ahci: probe of 0000:00:1f.2 failed with error -12
> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>   printing eip:
> c023c873
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c023c873>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.15-mm3)
> EIP is at make_class_name+0x28/0x8d
> eax: 00000000   ebx: ffffffff   ecx: ffffffff   edx: c1a12224
> esi: 00000009   edi: 00000000   ebp: c1921d2c   esp: c1921d1c
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 1, threadinfo=c1921000 task=c1920a90)
> Stack: <0>c1a12224 c03913f8 c1a12224 c03913f8 c1921d54 c023cabd c1921d58 c0391380
>         00000000 c1af39c0 c0391400 c1a12224 c1a12000 c1a12030 c1921d60 c023cb7b
>         c1a120e4 c1921d74 c0255dbf c1a122c0 c1a43a40 00000000 c1921d80 c025e393
> Call Trace:
>   [<c0103c5d>] show_stack+0x9b/0xc0
>   [<c0103de4>] show_registers+0x162/0x1e7
>   [<c0103f8f>] die+0x126/0x231
>   [<c01140db>] do_page_fault+0x271/0x5b9
>   [<c01037df>] error_code+0x4f/0x54
>   [<c023cabd>] class_device_del+0xa3/0x156
>   [<c023cb7b>] class_device_unregister+0xb/0x15
>   [<c0255dbf>] scsi_remove_host+0xb4/0xef
>   [<c025e393>] ata_host_remove+0x11/0x1c
>   [<c0260ec6>] ata_device_add+0x2e4/0xb7b
>   [<c0261cd6>] ata_pci_init_one+0x322/0x387
>   [<c0265b34>] piix_init_one+0x18c/0x338
>   [<c01f4f4f>] pci_device_probe+0x44/0x5f
>   [<c023bf62>] driver_probe_device+0x3e/0xb0
>   [<c023c0df>] __driver_attach+0x8e/0x90
>   [<c023b9f3>] bus_for_each_dev+0x44/0x62
>   [<c023bece>] driver_attach+0x19/0x1b
>   [<c023b687>] bus_add_driver+0x6d/0x126
>   [<c023c350>] driver_register+0x6b/0x9b
>   [<c01f50fb>] __pci_register_driver+0x6a/0x95
>   [<c03e8ea8>] piix_init+0xf/0x22
>   [<c01003cc>] init+0xff/0x325
>   [<c0100d25>] kernel_thread_helper+0x5/0xb
> Code: c8 5d c3 55 89 e5 57 56 53 83 ec 04 89 45 f0 89 c2 8b 40 48 8b 38 bb ff ff 
> ff ff 89 d9 31 c0 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d 
> 4e 02 8d 04 0a ba d0 00 00 00 e8 3b 72

Jeff, I believe this is a SATA bug.  ata_device_add() called
ata_host_remove(), and something under there is not yet sufficiently
initialised.



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12  4:29           ` Andrew Morton
@ 2006-01-12  4:55             ` Reuben Farrelly
  0 siblings, 0 replies; 15+ messages in thread
From: Reuben Farrelly @ 2006-01-12  4:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: alan, torvalds, gregkh, linux-kernel, linux-pci, Jeff Garzik



On 12/01/2006 5:29 p.m., Andrew Morton wrote:
> Reuben Farrelly <reuben-lkml@reub.net> wrote:
>>
>>
>> On 10/01/2006 2:49 p.m., Andrew Morton wrote:
>>> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>>>> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
>>>>> - Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>>>>>   or driver core.
>>>> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
>>>> pata driver.
>>> Well that's all merged up now.  Reuben, could you please test 2.6.15git6
>>> tomorrow?
>> Seemingly not fixed afterall.  I've been doing many reboots lately getting to 
>> the bottom of the barrier/md bug and just before I hit this with -mm3 
>> (linus.patch -git7) which I believe is the same bug (the call trace looks very 
>> similar).
>>
>> ...
> 
> I'm getting my bugs confused now - there are so many.  Were you the person
> who reported this before?

Yes.  It was suggested I try -git6.  I reported that it seemed to be OK, but 
clearly it isn't.  Then again, I've done a hell of a lot of reboots in the last 
couple of days.

I've updated my list at

http://www.reub.net/files/kernel/outstanding-kernel-bugs.txt
and
http://www.reub.net/files/kernel/

This is a very basic text file which outlines the details of the various bugs 
that I have on the go at any given point in time and where they're at.  There 
are various postings on LKML reporting almost all of them.

The thread starts at http://www.ussg.iu.edu/hypermail/linux/kernel/0601.1/0619.html 
(Greg KH, Mon Jan 09 2006 - 15:36:39 EST).


>> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
>> 0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
>> 0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
>> Floppy drive(s): fd0 is 1.44M
>> FDC 0 is a post-1991 82077
>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>> ahci: probe of 0000:00:1f.2 failed with error -12
>> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
>> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>   printing eip:
>> c023c873
>> *pde = 00000000
>> Oops: 0000 [#1]
>> SMP
>> last sysfs file:
>> Modules linked in:
>> CPU:    0
>> EIP:    0060:[<c023c873>]    Not tainted VLI
>> EFLAGS: 00010206   (2.6.15-mm3)
>> EIP is at make_class_name+0x28/0x8d
>> eax: 00000000   ebx: ffffffff   ecx: ffffffff   edx: c1a12224
>> esi: 00000009   edi: 00000000   ebp: c1921d2c   esp: c1921d1c
>> ds: 007b   es: 007b   ss: 0068
>> Process swapper (pid: 1, threadinfo=c1921000 task=c1920a90)
>> Stack: <0>c1a12224 c03913f8 c1a12224 c03913f8 c1921d54 c023cabd c1921d58 c0391380
>>         00000000 c1af39c0 c0391400 c1a12224 c1a12000 c1a12030 c1921d60 c023cb7b
>>         c1a120e4 c1921d74 c0255dbf c1a122c0 c1a43a40 00000000 c1921d80 c025e393
>> Call Trace:
>>   [<c0103c5d>] show_stack+0x9b/0xc0
>>   [<c0103de4>] show_registers+0x162/0x1e7
>>   [<c0103f8f>] die+0x126/0x231
>>   [<c01140db>] do_page_fault+0x271/0x5b9
>>   [<c01037df>] error_code+0x4f/0x54
>>   [<c023cabd>] class_device_del+0xa3/0x156
>>   [<c023cb7b>] class_device_unregister+0xb/0x15
>>   [<c0255dbf>] scsi_remove_host+0xb4/0xef
>>   [<c025e393>] ata_host_remove+0x11/0x1c
>>   [<c0260ec6>] ata_device_add+0x2e4/0xb7b
>>   [<c0261cd6>] ata_pci_init_one+0x322/0x387
>>   [<c0265b34>] piix_init_one+0x18c/0x338
>>   [<c01f4f4f>] pci_device_probe+0x44/0x5f
>>   [<c023bf62>] driver_probe_device+0x3e/0xb0
>>   [<c023c0df>] __driver_attach+0x8e/0x90
>>   [<c023b9f3>] bus_for_each_dev+0x44/0x62
>>   [<c023bece>] driver_attach+0x19/0x1b
>>   [<c023b687>] bus_add_driver+0x6d/0x126
>>   [<c023c350>] driver_register+0x6b/0x9b
>>   [<c01f50fb>] __pci_register_driver+0x6a/0x95
>>   [<c03e8ea8>] piix_init+0xf/0x22
>>   [<c01003cc>] init+0xff/0x325
>>   [<c0100d25>] kernel_thread_helper+0x5/0xb
>> Code: c8 5d c3 55 89 e5 57 56 53 83 ec 04 89 45 f0 89 c2 8b 40 48 8b 38 bb ff ff 
>> ff ff 89 d9 31 c0 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d 
>> 4e 02 8d 04 0a ba d0 00 00 00 e8 3b 72
> 
> Jeff, I beleive this is a sata bug.  ata_device_add() called
> ata_host_remove() and something under there isnot yet sufficiently
> initialised.

reuben



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12  3:55         ` Reuben Farrelly
  2006-01-12  4:29           ` Andrew Morton
@ 2006-01-12 11:42           ` Alan Cox
  2006-01-12 20:59             ` Jeff Garzik
  1 sibling, 1 reply; 15+ messages in thread
From: Alan Cox @ 2006-01-12 11:42 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: Andrew Morton, torvalds, gregkh, linux-kernel, linux-pci

On Iau, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:
> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
> Unable to handle kernel NULL pointer dereference at virtual address 00000000

That is the critical bit. The SATA ports have no PCI resources assigned
for bus mastering (BAR 4). libata should have driven the device in PIO
mode in this case, but the resource should have been assigned anyway.



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-10  1:49     ` Alan Cox
  2006-01-10  1:49       ` Andrew Morton
@ 2006-01-12 20:55       ` Jeff Garzik
  2006-01-13  0:16         ` Alan Cox
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Garzik @ 2006-01-12 20:55 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Linus Torvalds, gregkh, linux-kernel, linux-pci

Alan Cox wrote:
> On Llu, 2006-01-09 at 16:44 -0800, Andrew Morton wrote:
> 
>>- Reuben Farrelly's oops in make_class_name().  Could be libata, or scsi
>>  or driver core.
> 
> 
> libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
> pata driver.


Any additional info?  How can I reproduce?

	Jeff



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12 11:42           ` Alan Cox
@ 2006-01-12 20:59             ` Jeff Garzik
  2006-01-16 13:11               ` Reuben Farrelly
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Garzik @ 2006-01-12 20:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Reuben Farrelly, Andrew Morton, torvalds, gregkh, linux-kernel,
	linux-pci

Alan Cox wrote:
> On Iau, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:
> 
>>ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
>>ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
>>Unable to handle kernel NULL pointer dereference at virtual address 00000000
> 
> 
> That is the critical bit. The SATA ports have no PCI resources assigned
> for bus mastering (BAR 4). libata should have driven the device PIO in
> this case but the resource should have been assigned.

Agreed.  This appears to be the BIOS assigning bad values to the SATA hardware.

However, in that case libata should recognize the problem and refrain from
iomapping or driving the hardware, rather than oopsing.

	Jeff



* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12 20:55       ` Jeff Garzik
@ 2006-01-13  0:16         ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2006-01-13  0:16 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Linus Torvalds, gregkh, linux-kernel, linux-pci

On Thu, 2006-01-12 at 15:55 -0500, Jeff Garzik wrote:
> > libata I think. I reproduced it on 2.6.14-mm2 by accident with a buggy
> > pata driver.
> 
> 
> Any additional info?  How can I reproduce?

In my case I'm fairly sure (waves arms frantically) that it was
registering a controller which then failed to add any drives, so it got
cleaned back up early.


* Re: [GIT PATCH] PCI patches for 2.6.15 - retry
  2006-01-12 20:59             ` Jeff Garzik
@ 2006-01-16 13:11               ` Reuben Farrelly
  0 siblings, 0 replies; 15+ messages in thread
From: Reuben Farrelly @ 2006-01-16 13:11 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alan Cox, Andrew Morton, torvalds, gregkh, linux-kernel, linux-acpi



On 13/01/2006 9:59 a.m., Jeff Garzik wrote:
> Alan Cox wrote:
>> On Thu, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:
>>
>>> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
>>> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>
>>
>> That is the critical bit. The SATA ports have no PCI resources assigned
>> for bus mastering (BAR 4). libata should have driven the device PIO in
>> this case but the resource should have been assigned.
> 
> Agreed.  This appears to be BIOS assigning bad values to SATA hardware.
> 
> However, libata should recognize this and not attempt to iomap or drive 
> the hardware, in that case, rather than oops.
> 
>     Jeff

Some testing tonight has turned up a bit more about where this regression crept in.

In the table below, the kernel release is on the left-hand side and the result
of each reboot is on the right-hand side, after the dash.  I had to do many
reboots to be sure whether the bug was present or not, because, as the -mm1
results show, it doesn't always show up.

2.6.15 - OK OK OK OK OK
2.6.15-git1 - OK OK OK OK OK OK OK OK
2.6.15-git2 - OK
2.6.15-git6 - OK OK OK OK OK OK OK OK
2.6.15-git12 - OK OK OK OK OK OK OK

2.6.15-rc5-mm3 - OK OK OK(but oopsed in usb) OK OK(but oopsed in usb)
    Those USB oopses were seen only in this release, so whatever was
    causing them was likely fixed soon after.
2.6.15-mm1 - OK OK OOPSED OOPSED OOPSED, all in ATA
2.6.15-mm2 + mm3 - [known to OOPS on this bug frequently, but not tested in this round]
2.6.15-mm4 - OOPSED OK OOPSED TIMEOUT TIMEOUT OOPSED OK
2.6.15-mm1 with no git-acpi.patch - TIMEOUT TIMEOUT OOPSED TIMEOUT OK

OK means the system booted to single-user mode without issue; TIMEOUT means
the controllers were assigned IRQ 50 but then failed to find any ATA disks;
and OOPSED means the SATA ports were not assigned IRQs at all, so the
system oopsed like this:

ahci: probe of 0000:00:1f.2 failed with error -12
ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000
  printing eip:
c023c873
*pde = 00000000
Oops: 0000 [#1]
<plus a trace and a whole lot more>

Full output is at http://www.reub.net/files/kernel/outstanding-kernel-bugs.txt
(as usual).

In summary, the good news is that 2.6.15-git12 (the latest Linus tree) is
GOOD and does not seem to exhibit this problem.  Not a single -git release
crapped out.  It seems a regression was introduced in 2.6.15-mm1 and has been
carried forward through -mm4 so far, but was never pushed to Linus.
I guess the sheer number of successful reboots in a row also suggests that
it's not a hardware or BIOS issue, and since 2.6.15-mm1 still fails with
git-acpi.patch reverted, I think ACPI is not the cause, at least in this
instance.  But that's about as far as I have gotten.

45 reboots later I'm finishing for tonight.  Before I go back and hit it with
git bisect to narrow it down further: Andrew/Jeff, does this make it any
easier to pinpoint, and/or do you have any preliminary patches to test or
ideas as to what other subsystems could be involved?

Thanks,
Reuben




Thread overview: 15+ messages
2006-01-09 20:37 [GIT PATCH] PCI patches for 2.6.15 - retry Greg KH
2006-01-10  0:00 ` Linus Torvalds
2006-01-10  0:44   ` Andrew Morton
2006-01-10  1:49     ` Alan Cox
2006-01-10  1:49       ` Andrew Morton
2006-01-10 10:03         ` Reuben Farrelly
2006-01-12  3:55         ` Reuben Farrelly
2006-01-12  4:29           ` Andrew Morton
2006-01-12  4:55             ` Reuben Farrelly
2006-01-12 11:42           ` Alan Cox
2006-01-12 20:59             ` Jeff Garzik
2006-01-16 13:11               ` Reuben Farrelly
2006-01-12 20:55       ` Jeff Garzik
2006-01-13  0:16         ` Alan Cox
2006-01-10  2:28     ` Greg KH
