* nForce SATA lockup - problem location tracked down
@ 2004-12-01 20:37 Milan Holzäpfel
  2004-12-02  9:30 ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: Milan Holzäpfel @ 2004-12-01 20:37 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I finally did some more tests on the system with the nForce3-250Gb SATA
controller, whose driver locks up the system at boot time when running
inside Xen.

The following was used:
 - mainboard with nForce3-250Gb chipset, which has an S-ATA controller
 - sata_nv from 2.6.10-rc2-bk11 pushed into vanilla 2.6.9, patched with
   Xen and reiser4
 - a Xen stable bk snapshot from a few days ago

Files: <URL:http://mjh.name/misc-files/xen-caps-20041130/>

 - cap.*: captures from serial console
 - cap.linux.*: native linux bootup
 - cap.xenolinux.*: linux inside xen bootup
 - cap.*.dbg.*: kernel command line option "debug" passed
 - cap.*.nd-dbg: libata debug enabled (ATA_DEBUG, ATA_VERBOSE_DEBUG, ATA_IRQ_TRAP)
 - there are also two kernel .config files, and lspci output
   (from native linux)
 - cap.xenolinux.extra-dbg:  log output with some extra debug options
   added by me
 - drivers/scsi/libata-core.c, include/linux/libata.h, kernel/sched.c:
   files with my extra debug output added

The lockup takes place in drivers/scsi/libata-core.c, in function
ata_dev_set_xfermode().  The call which does not return is
wait_for_completion(&wait) (line 1837).

Inside wait_for_completion(), which is defined in kernel/sched.c, the
last call made is schedule() (line 2862).  I added some extra debug
output to wait_for_completion() and schedule(), which shows that
schedule() keeps getting called, checking some state every few moments.
I'd assume that schedule() checks whether the thread blocked in
wait_for_completion() should be woken up, but this condition never seems
to be fulfilled, maybe because of some address gibberish or whatever.
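
To make the suspected mechanism concrete, here is a minimal user-space
sketch in Python (not kernel code).  It assumes that the waiter is woken
by a matching complete() call, and that this call is made from the
interrupt path once the device has answered; I have not verified the
latter against libata.  The names Completion, set_xfermode and
irq_delivered are hypothetical, and the timeout exists only so that the
demo terminates instead of hanging like the real boot does.

```python
import threading


class Completion:
    """User-space model of a completion: sleep until complete() is called."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = 0  # complete() calls that have not been consumed yet

    def complete(self):
        # Stands in for the "interrupt handler" signalling the waiter.
        with self._cond:
            self._done += 1
            self._cond.notify()

    def wait_for_completion(self, timeout=None):
        # Block until complete() has been called.  The timeout is an
        # addition of this sketch, used only to keep the demo finite.
        with self._cond:
            ok = self._cond.wait_for(lambda: self._done > 0, timeout)
            if ok:
                self._done -= 1
            return ok


def set_xfermode(irq_delivered, timeout=0.5):
    """Issue a command, then wait for its completion to be signalled."""
    wait = Completion()
    if irq_delivered:
        # Model the device interrupt arriving shortly after the command.
        threading.Timer(0.01, wait.complete).start()
    return wait.wait_for_completion(timeout)


if __name__ == "__main__":
    print("IRQ delivered:", set_xfermode(True))
    print("IRQ lost:     ", set_xfermode(False))
```

If nothing ever calls complete(), the waiter stays asleep no matter how
often the scheduler runs, which would match the behaviour described
above.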

At the site mentioned above you can also find my modified version of
libata-core.c, libata.h and sched.c, and the boot output when using
these versions (cap.xenolinux.extra-dbg).

I hope this information helps you get a closer view of the problem, and
maybe even an idea for a solution, since the deeper I dig into all this
code, the more other code I have to read to understand what's actually
going on.  (And hey, I am by no means anything like an experienced C
programmer ;) )

I'd be happy to provide whatever other information might be useful,
however.

TIA
Milan

- -- 

                   Milan Holzäpfel alias jagdfalke alias jag

Antworten direkt an mich                             Answers directly to me
gehen bitte an eine Addresse,                        go to an address one
die man hier finden kann:                            can find here, please:

Kontaktinfos sowie                                   Contact infos as well as
Öff GnuPG-Schlüssel    <URL:http://con.mjh.name/>    GnuPG Public Key
GnuPG Fingerabdruck     4C8A 5FAF 5D32 6125 89D1     GnuPG Fingerprint
                        0CE5 DB0C AF4F 6583 7966



                    http://www.deppenleerzeichen.de/                        


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFBritx2wyvT2WDeWYRAuT6AKDIuhEDQBiy/Bm0dUkitZeN2JNw1wCg1HPH
d+k0NBqFFZcxvK1RnyUsPo8=
=uSd/
-----END PGP SIGNATURE-----


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/


* Re: nForce SATA lockup - problem location tracked down
  2004-12-01 20:37 nForce SATA lockup - problem location tracked down Milan Holzäpfel
@ 2004-12-02  9:30 ` Keir Fraser
  2004-12-02 15:13   ` Milan Holzäpfel
  0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2004-12-02  9:30 UTC (permalink / raw)
  To: Milan Holzäpfel; +Cc: xen-devel


> I finally did some more tests on the system with the nForce3-250Gb SATA
> controller, whose driver locks up the system at boot time when running
> inside Xen.

Looks like an interrupt problem. We plan to start using more of the
Linux DOM0 platform code in our next release which should avoid these
problems. It also may be that you have some large-numbered IRQs and
we can simply extend Xen to support those. Can you post the output of
'cat /proc/interrupts' from your working Linux installation?

 -- Keir



* Re: nForce SATA lockup - problem location tracked down
  2004-12-02  9:30 ` Keir Fraser
@ 2004-12-02 15:13   ` Milan Holzäpfel
  0 siblings, 0 replies; 3+ messages in thread
From: Milan Holzäpfel @ 2004-12-02 15:13 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 02 Dec 2004 09:30:52 +0000
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> 
> > I finally did some more tests on the system with the nForce3-250Gb SATA
> > controller, whose driver locks up the system at boot time when running
> > inside Xen.
> 
> Looks like an interrupt problem. We plan to start using more of the
> Linux DOM0 platform code in our next release which should avoid these
> problems. It also may be that you have some large-numbered IRQs and
> we can simply extend Xen to support those. Can you post the output of
> 'cat /proc/interrupts' from your working Linux installation?

/proc/interrupts on 2.6.9:
|            CPU0       
|   0:    2002603          XT-PIC  timer
|   1:       5087    IO-APIC-edge  i8042
|   8:          2    IO-APIC-edge  rtc
|   9:          0   IO-APIC-level  acpi
|  12:         67    IO-APIC-edge  i8042
|  14:        467    IO-APIC-edge  ide0
|  15:       2331    IO-APIC-edge  ide1
|  17:         80   IO-APIC-level  EMU10K1
|  19:     261861   IO-APIC-level  fcdsl
|  20:          2   IO-APIC-level  ehci_hcd
|  21:      88475   IO-APIC-level  libata, ohci_hcd
|  22:          0   IO-APIC-level  ohci_hcd, NVidia CK8S
|  23:     198342   IO-APIC-level  eth0
| NMI:          0 
| LOC:    2002471 
| ERR:          1
| MIS:          0
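
For anyone wanting to check such output mechanically, here is a small
Python sketch that parses the numbered lines and reports the highest IRQ
in use and the IRQ(s) a given driver sits on.  It assumes the
single-CPU layout shown above (one count column) and input without the
leading "| " quoting; the function names parse_interrupts and irqs_for
are made up for this example.

```python
import re

# Excerpt of the /proc/interrupts output quoted above (single CPU column).
SAMPLE = """\
           CPU0
  0:    2002603          XT-PIC  timer
 14:        467    IO-APIC-edge  ide0
 21:      88475   IO-APIC-level  libata, ohci_hcd
 23:     198342   IO-APIC-level  eth0
NMI:          0
"""

# IRQ number, count, controller type, comma-separated device names.
LINE = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\S+)\s+(.+?)\s*$")


def parse_interrupts(text):
    """Map IRQ number -> (count, controller, [devices]) for numbered lines."""
    table = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:  # header and NMI/LOC/ERR/MIS summary lines
            continue
        irq, count, controller, devices = m.groups()
        table[int(irq)] = (int(count), controller, devices.split(", "))
    return table


def irqs_for(table, device):
    """Return the sorted IRQ numbers on which the named device is registered."""
    return sorted(irq for irq, (_, _, devs) in table.items() if device in devs)


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print("highest IRQ:", max(table))
    print("libata IRQs:", irqs_for(table, "libata"))
```

In the full table above, the highest IRQ is 23, and libata shares IRQ 21
with ohci_hcd.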

My reply to a private mail follows, since I guess it may be useful for
other people too...

On Thu, 02 Dec 2004 09:33:44 +0000
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> 
> Further to my previous mail, I actually suspect that your setup is
> doomed until we start using the ACPI code in DOM0 Linux. It looks like
> you need a pretty complete ACPI configuration in order to set up IRQ
> routing correctly. That is not getting done under Xen/XenLinux and so
> your SATA interrupts are going nowhere. :-(
> 
> Does your system work with 2.4 kernels? Does your system work if you
> compile a non-ACPI kernel?

When booting 2.6.9 without ACPI, it hangs at the same point as Linux
does inside Xen (at least judging by the messages usually displayed, but
I think they are a good enough indication...).

Since I haven't run any 2.4 kernel on the installation I normally use, I
built and tried to boot 2.4.28 on a smaller "rescue" installation, which
doesn't have up-to-date GCCs.  I first tried with 3.4.1, then with 3.3.3
(the most current GCCs in portage are 3.3.4/3.4.3), but the result was
the same:  the last line I get is from grub, saying "file ok, booting the
kernel" or something like that, and then the system resets.  (Passing
panic=10 didn't change anything.)

I could try compiling a 2.4 kernel with GCC 3.3.4 and an updated bin86,
but I'm not sure this would change anything (?)

Also, I'm using "unofficial" Gentoo profiles, which use
gcc-kernel-headers from 2.6 instead of from 2.4, but well, the problem
occurs long before glibc is even touched :))

I'd be happy to try and report back once the changes you mentioned are
in.  (I wouldn't mind using unstable bk or whatever...)

Once this is working, can you say whether it will be possible to give a
non-privileged 2.4 kernel functional access to a PCI device?  (The
reason I'm asking is that I have a freakin' mostly-binary-only driver
for my ADSL hardware, but some version of that driver for 2.4 is said to
be stable, so my idea was to run a driver domain using 2.4 for this
crappy piece of hardware...)

>  -- Keir

Regards,
Milan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFBrzEv2wyvT2WDeWYRApzEAKDqXYB1qy7V63ib2sJlMBqu56T2WwCg1tyF
Kd5K2NM38QLc58YhXYnQcYo=
=2aZV
-----END PGP SIGNATURE-----

