linux-kernel.vger.kernel.org archive mirror
* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
       [not found] ` <201109062222.p86MMlK9023363@demeter2.kernel.org>
@ 2012-01-07  1:31   ` Rogério Brito
  2012-01-07  1:50     ` Linus Torvalds
  2012-01-08  0:55     ` Márcia Brito
  0 siblings, 2 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-07  1:31 UTC (permalink / raw)
  To: bugzilla-daemon
  Cc: Edward Donovan, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Linus Torvalds

Hi, Bjorn and others,

Now that bugzilla is up, I think that I can resume testing things on
that problematic notebook that I mentioned in earlier e-mails, so that
mom (in CC) can use her notebook with Linux.

For the record, *some* description of the problem is at:

* https://bugzilla.kernel.org/show_bug.cgi?id=41132
* https://bugzilla.kernel.org/show_bug.cgi?id=41722

The first one is a regression that I bisected, while the second one
seems to be a bit harder to understand for a layman like me.

Of course, I can summarize what I found and what I didn't.

Sorry for the top-posting, but it is here to keep the context of the
previous message.

Thanks,
Rogério Brito.

On Tue, Sep 6, 2011 at 19:22,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> --- Comment #15 from Rogério Brito <rbrito@ime.usp.br>  2011-09-06 22:22:44 ---
> Dear Bjorn,
>
> Sorry for not responding promptly but I had some family members with health
> problems. Things are settling right now and I think that I can be faster now.
>
> (In reply to comment #14)
>> So it looks like 2.4.27 doesn't even try to enable ACPI mode, and 2.6.8 hangs
>> the same way as 3.1.0-rc3.
>
> Yes, exactly.
>
>> Can you try 2.6.28?  If that doesn't boot, try the Ubuntu 9.04 kernel, which
>> was reported to boot on your machine/BIOS.
>
> Unfortunately, I tried to boot with that version of the Ubuntu live CD and it
> crashed the same way. I may try it once again, paying more attention if it
> behaves differently in any kind of way, though.
>
> Oh, talking about PCI and ACPI failing, could you take a look at issue #41132
> to see if you can help with another machine that stopped working after a commit
> that I bisected?
>
>
> Thanks,
>
> Rogério Brito.
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.



-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-07  1:31   ` [Bug 41722] Clevo M5X0JE hangs in ACPI init Rogério Brito
@ 2012-01-07  1:50     ` Linus Torvalds
  2012-01-07  4:19       ` Edward Donovan
  2012-01-08 22:13       ` Rogério Brito
  2012-01-08  0:55     ` Márcia Brito
  1 sibling, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2012-01-07  1:50 UTC (permalink / raw)
  To: Rogério Brito
  Cc: bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito

2012/1/6 Rogério Brito <rbrito@ime.usp.br>:
>
> For the record, *some* description of the problem is at:
>
> * https://bugzilla.kernel.org/show_bug.cgi?id=41132

This one looks very much like the thing that Edward Donovan fixed
fairly recently in commit 52553ddffad7 ("genirq: fix regression in
irqfixup, irqpoll")

So that should be fixed in 3.2 (and it's marked for stable, so I think
it's in the latest stable kernels too)

> * https://bugzilla.kernel.org/show_bug.cgi?id=41722

That, however, looks totally insane. I don't even have a clue where to
begin. I take it that it works with "acpi=off", and don't have a clue
about what might go wrong with ACPI enabled.

                       Linus


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-07  1:50     ` Linus Torvalds
@ 2012-01-07  4:19       ` Edward Donovan
  2012-01-08 22:27         ` Rogério Brito
  2012-01-08 22:13       ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Edward Donovan @ 2012-01-07  4:19 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, bugzilla-daemon, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito

On Fri, Jan 6, 2012 at 8:50 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> 2012/1/6 Rogério Brito <rbrito@ime.usp.br>:
>>
>> For the record, *some* description of the problem is at:
>>
>> * https://bugzilla.kernel.org/show_bug.cgi?id=41132
>
> This one looks very much like the thing that Edward Donovan fixed
> fairly recently in commit 52553ddffad7 ("genirq: fix regression in
> irqfixup, irqpoll")
>
> So that should be fixed in 3.2 (and it's marked for stable, so I think
> it's in the latest stable kernels too)

Yessir, as of 3.0.13 and 3.1.5.  Rogério - for all the users I've
heard from, the 2.6.39 IRQ regressions are fixed now.  I'll be
interested to know if any IRQ handled by 2.6.38 isn't handled now, but
I expect it will be.  I'm sorry if that isn't enough for this laptop,
though -

Ed


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-07  1:31   ` [Bug 41722] Clevo M5X0JE hangs in ACPI init Rogério Brito
  2012-01-07  1:50     ` Linus Torvalds
@ 2012-01-08  0:55     ` Márcia Brito
  1 sibling, 0 replies; 53+ messages in thread
From: Márcia Brito @ 2012-01-08  0:55 UTC (permalink / raw)
  To: Rogério Brito
  Cc: bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Linus Torvalds

Dear sirs,
I greatly appreciate your interest in helping me get my notebook running Linux.
I was surprised by your commitment to helping.
Forever grateful,
Marcia

On January 6, 2012 at 23:31, Rogério Brito <rbrito@ime.usp.br> wrote:
> Hi, Bjorn and others,
>
> Now that bugzilla is up, I think that I can resume testing things on
> that problematic notebook that I mentioned in earlier e-mails, so that
> mom (in CC) can use her notebook with Linux.
>
> For the record, *some* description of the problem is at:
>
> * https://bugzilla.kernel.org/show_bug.cgi?id=41132
> * https://bugzilla.kernel.org/show_bug.cgi?id=41722
>
> The first one is a regression that I bisected, while the second one
> seems to be a bit harder to understand for a layman like me.
>
> Of course, I can summarize what I found and what I didn't.
>
> Sorry for the top-posting, but it is here to keep the context of the
> previous message.
>
> Thanks,
> Rogério Brito.
>
> On Tue, Sep 6, 2011 at 19:22,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>>
>> --- Comment #15 from Rogério Brito <rbrito@ime.usp.br>  2011-09-06 22:22:44 ---
>> Dear Bjorn,
>>
>> Sorry for not responding promptly but I had some family members with health
>> problems. Things are settling right now and I think that I can be faster now.
>>
>> (In reply to comment #14)
>>> So it looks like 2.4.27 doesn't even try to enable ACPI mode, and 2.6.8 hangs
>>> the same way as 3.1.0-rc3.
>>
>> Yes, exactly.
>>
>>> Can you try 2.6.28?  If that doesn't boot, try the Ubuntu 9.04 kernel, which
>>> was reported to boot on your machine/BIOS.
>>
>> Unfortunately, I tried to boot with that version of the Ubuntu live CD and it
>> crashed the same way. I may try it once again, paying more attention if it
>> behaves differently in any kind of way, though.
>>
>> Oh, talking about PCI and ACPI failing, could you take a look at issue #41132
>> to see if you can help with another machine that stopped working after a commit
>> that I bisected?
>>
>>
>> Thanks,
>>
>> Rogério Brito.
>>
>> --
>> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
>> ------- You are receiving this mail because: -------
>> You are on the CC list for the bug.
>
>
>
> --
> Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
> http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
> DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-07  1:50     ` Linus Torvalds
  2012-01-07  4:19       ` Edward Donovan
@ 2012-01-08 22:13       ` Rogério Brito
  2012-01-08 22:23         ` Linus Torvalds
  2012-01-10  9:25         ` Edward Donovan
  1 sibling, 2 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-08 22:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito

Hi there, Linus.

On Jan 06 2012, Linus Torvalds wrote:
> 2012/1/6 Rogério Brito <rbrito@ime.usp.br>:
> >
> > For the record, *some* description of the problem is at:
> >
> > * https://bugzilla.kernel.org/show_bug.cgi?id=41132
> 
> This one looks very much like the thing that Edward Donovan fixed
> fairly recently in commit 52553ddffad7 ("genirq: fix regression in
> irqfixup, irqpoll")

It does look similar, but it is not quite the same. And, yes, my main desktop
(the one that I'm using right now) was affected by the regression that Edward's
commit fixes; that commit, BTW, has my Reported-and-tested-by.

This one is different. Here is a brief summary of the situation:

# Computer description

* Rebranded Clevo M5X0JE
* nForce2 chipset
* AMD Sempron CPU
* NIC with driver forcedeth
* wifi rtl8187

# Booting options

The current working options passed to the kernel are: `acpi=off pnpbios=off noapic`

It doesn't boot with a vanilla kernel. It only boots when I pass the options
above *and* compile the kernel with the following patch applied:

,----[ do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch ]
| diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
| index 86b69f85..84543f5 100644
| --- a/drivers/pci/setup-bus.c
| +++ b/drivers/pci/setup-bus.c
| @@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
|  		break;
|  
|  	case PCI_CLASS_BRIDGE_PCI:
| +		/* don't size subtractive decoding (transparent)
| +		 * PCI-to-PCI bridges */
| +		if (bus->self->transparent)
| +			break;
|  		pci_bridge_check_ranges(bus);
|  		if (bus->self->is_hotplug_bridge) {
|  			additional_io_size  = pci_hotplug_io_size;
`----

Otherwise, it hangs right when trying to configure PCI, which, according to
the dmesg log from a successful boot, happens approximately 0.12 seconds after
the kernel prints its first message.

> So that should be fixed in 3.2 (and it's marked for stable, so I think
> it's in the latest stable kernels too)
> 
> > * https://bugzilla.kernel.org/show_bug.cgi?id=41722
> 
> That, however, looks totally insane. I don't even have a clue where to
> begin. I take it that it works with "acpi=off", and don't have a clue
> about what might go wrong with ACPI enabled.

Yes, with acpi=off it "works" (well, it seems that without ACPI many things
don't get enabled).  At least mom was able to visit a site like Gmail to
check her email.

She would like to say that she is positively impressed with the commitment of
the community to the case of a layperson, and she told me to send you all a
big thank you.


Thanks from me also,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-08 22:13       ` Rogério Brito
@ 2012-01-08 22:23         ` Linus Torvalds
  2012-01-09 19:22           ` Jesse Barnes
  2012-01-10  1:57           ` Rogério Brito
  2012-01-10  9:25         ` Edward Donovan
  1 sibling, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2012-01-08 22:23 UTC (permalink / raw)
  To: Rogério Brito
  Cc: bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito, Jesse Barnes,
	Yinghai Lu, Ram Pai

On Sun, Jan 8, 2012 at 2:13 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>
> It doesn't boot with a vanilla kernel. It only boots when I pass the options
> above *and* compile the kernel with the following patch applied:
>
> ,----[ do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch ]
> | diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> | index 86b69f85..84543f5 100644
> | --- a/drivers/pci/setup-bus.c
> | +++ b/drivers/pci/setup-bus.c
> | @@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> |               break;
> |
> |       case PCI_CLASS_BRIDGE_PCI:
> | +             /* don't size subtractive decoding (transparent)
> | +              * PCI-to-PCI bridges */
> | +             if (bus->self->transparent)
> | +                     break;
> |               pci_bridge_check_ranges(bus);
> |               if (bus->self->is_hotplug_bridge) {
> |                       additional_io_size  = pci_hotplug_io_size;
> `----

Ahh. I'd forgotten about that particular PCI patch.

That is definitely the right thing to do, and commit 8fa5913d54f3
("PCI: remove transparent bridge sizing") did exactly that, but then
we reverted it in commit 12c22d6ef299 because it caused some odd
problems for some people.

I think we should try to re-do that "avoid sizing transparent bridges"
commit, because it really should make it much easier to do PCI
allocations under some very common situations (there's a *lot* of
common intel transparent PCI bridges).

Jesse, Yinghai, Ram - should we try to just re-do that commit in this
merge window, and see how that goes? We can always revert it again if
it causes problems..
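
The effect of the "avoid sizing transparent bridges" change can be
sketched in plain C. This is only a toy model, not kernel code: the
struct and function names below (toy_bridge, toy_size_bridges) are
hypothetical, and the real logic lives in __pci_bus_size_bridges() as
shown in the quoted diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PCI-to-PCI bridge. */
struct toy_bridge {
	const char *name;
	bool transparent;	/* subtractive-decode (transparent) bridge */
	bool sized;		/* set once its windows have been sized */
};

/*
 * Walk the bridges and size each one, optionally skipping the
 * transparent ones the way the quoted patch does with its early
 * "break". Returns how many bridges were actually sized.
 */
size_t toy_size_bridges(struct toy_bridge *b, size_t n, bool skip_transparent)
{
	size_t sized = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_transparent && b[i].transparent)
			continue;	/* leave its window registers untouched */
		b[i].sized = true;
		sized++;
	}
	return sized;
}
```

With skip_transparent set, a transparent bridge is never touched by the
sizing pass, which matches the behaviour that lets the laptop boot.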

                         Linus


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-07  4:19       ` Edward Donovan
@ 2012-01-08 22:27         ` Rogério Brito
  2012-01-09  4:20           ` Bjorn Helgaas
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-08 22:27 UTC (permalink / raw)
  To: Edward Donovan
  Cc: Linus Torvalds, bugzilla-daemon, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

Hi, Edward.

On Jan 06 2012, Edward Donovan wrote:
> I'll be interested to know if any IRQ handled by 2.6.38 isn't handled now,
> but I expect it will be.  I'm sorry if that isn't enough for this laptop,
> though -

I bisected the PCI stuff and ended up with a bad commit 12c22d6ef299ccf0955,
which was made by Linus:

,----[ git show 12c22d6ef299ccf0955e5756eb57d90d7577ac68 ]
| commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
| Author: Linus Torvalds <torvalds@linux-foundation.org>
| Date:   Wed Mar 26 11:22:40 2008 -0700
| 
|     Revert "PCI: remove transparent bridge sizing"
| (...)    
`----

The patch that I am using to make this laptop boot at least is essentially a
revert of that commit.

Just for the sake of documentation, I am attaching here the patch that makes
things work (at least partially).

Any directions to debug this properly are welcome.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

[-- Attachment #2: do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch --]
[-- Type: text/x-diff, Size: 512 bytes --]

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 86b69f85..84543f5 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
 		break;
 
 	case PCI_CLASS_BRIDGE_PCI:
+		/* don't size subtractive decoding (transparent)
+		 * PCI-to-PCI bridges */
+		if (bus->self->transparent)
+			break;
 		pci_bridge_check_ranges(bus);
 		if (bus->self->is_hotplug_bridge) {
 			additional_io_size  = pci_hotplug_io_size;


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-08 22:27         ` Rogério Brito
@ 2012-01-09  4:20           ` Bjorn Helgaas
  2012-01-10  2:12             ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Bjorn Helgaas @ 2012-01-09  4:20 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Edward Donovan, Linus Torvalds, bugzilla-daemon, Thomas Gleixner,
	linux-kernel, Márcia Coutinho de Brito

On Sun, Jan 8, 2012 at 3:27 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> I bisected the PCI stuff and ended up with a bad commit 12c22d6ef299ccf0955,
> which was made by Linus:
>
> ,----[ git show 12c22d6ef299ccf0955e5756eb57d90d7577ac68 ]
> | commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
> | Author: Linus Torvalds <torvalds@linux-foundation.org>
> | Date:   Wed Mar 26 11:22:40 2008 -0700
> |
> |     Revert "PCI: remove transparent bridge sizing"
> | (...)
> `----
>
> The patch that I am using to make this laptop boot at least is essentially a
> revert of that commit.
>
> Just for the sake of documentation, I am attaching here the patch that makes
> things work (at least partially).

Your patch avoids sizing transparent bridges.  I don't have an opinion
on whether that's a good idea in general.

However, I am curious about what breaks on your system when we do size
the transparent bridge.  It seems like sizing it should *work*, even
if it's not strictly necessary, so I wonder if this hang is telling us
about some other PCI allocation issue we should fix.  Would you mind
opening a bugzilla for this, since the current tree is broken for you,
and attaching a log or digital photo of the hang?

Bjorn


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-08 22:23         ` Linus Torvalds
@ 2012-01-09 19:22           ` Jesse Barnes
  2012-01-09 19:41             ` Linus Torvalds
  2012-01-10  1:57           ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Jesse Barnes @ 2012-01-09 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rogério Brito, bugzilla-daemon, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Yinghai Lu, Ram Pai

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

On Sun, 8 Jan 2012 14:23:29 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Jan 8, 2012 at 2:13 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> >
> > It doesn't boot with a vanilla kernel. It only boots when I pass the options
> > above *and* compile the kernel with the following patch applied:
> >
> > ,----[ do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch ]
> > | diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > | index 86b69f85..84543f5 100644
> > | --- a/drivers/pci/setup-bus.c
> > | +++ b/drivers/pci/setup-bus.c
> > | @@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> > |               break;
> > |
> > |       case PCI_CLASS_BRIDGE_PCI:
> > | +             /* don't size subtractive decoding (transparent)
> > | +              * PCI-to-PCI bridges */
> > | +             if (bus->self->transparent)
> > | +                     break;
> > |               pci_bridge_check_ranges(bus);
> > |               if (bus->self->is_hotplug_bridge) {
> > |                       additional_io_size  = pci_hotplug_io_size;
> > `----
> 
> Ahh. I'd forgotten about that particular PCI patch.
> 
> That is definitely the right thing to do, and commit 8fa5913d54f3
> ("PCI: remove transparent bridge sizing") did exactly that, but then
> we reverted it in commit 12c22d6ef299 because it caused some odd
> problems for some people.
> 
> I think we should try to re-do that "avoid sizing transparent bridges"
> commit, because it really should make it much easier to do PCI
> allocations under some very common situations (there's a *lot* of
> common intel transparent PCI bridges).
> 
> Jesse, Yinghai, Ram - should we try to just re-do that commit in this
> merge window, and see how that goes? We can always revert it again if
> it causes problems..

I don't remember what problems we hit, but if Ram and Yinghai are
willing to take a look at them we should go ahead and try again.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-09 19:22           ` Jesse Barnes
@ 2012-01-09 19:41             ` Linus Torvalds
  2012-01-10  1:32               ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2012-01-09 19:41 UTC (permalink / raw)
  To: Jesse Barnes, Ivan Kokshaysky
  Cc: Rogério Brito, bugzilla-daemon, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Yinghai Lu, Ram Pai

On Mon, Jan 9, 2012 at 11:22 AM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>
> I don't remember what problems we hit, but if Ram and Yinghai are
> willing to take a look at them we should go ahead and try again.

So the two bugzilla entries quoted in the revert are

  http://bugzilla.kernel.org/show_bug.cgi?id=10080
  http://bugzilla.kernel.org/show_bug.cgi?id=9961

 with at least one real smoking gun in this one:

  https://lkml.org/lkml/2008/3/26/94

however at least *part* of the problem for some people was not the
transparency itself, as much as the fact that we did bad things with
64-bit resources etc.

So what I think mostly happened was that not sizing up transparent
bridges ended up showing up other bugs. We've fixed at least some of
those other bugs in the meantime (things like using "unsigned long"
for physical addresses in ioremap etc, which broke when 64-bit
resources were used on 32-bit architectures).

But it's entirely possible that it will still trigger similar issues.
For example, even if a bridge is transparent, maybe it is still
limited to 32-bit addresses? If we look at the actual IO windows, we'd
get that 32-bit limit right automatically. If we just say "it's
transparent", we might allocate child devices with 64-bit resources
above the 4GB area, and be screwed. That seems to have been one
problem above - even if we now get it right on a software level, there
may actually be hardware issues in the same area.

(It's not clear whether the 64-bit resource issue back in 2008 was due
to hardware or due to our resource bugs).
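
The class of bug mentioned above (a physical address kept in a type
that is only 32 bits wide) can be illustrated with a small C sketch.
The helper names are hypothetical and uint32_t merely stands in for
"unsigned long" on a 32-bit architecture; this is not the actual
ioremap code.

```c
#include <stdint.h>

/*
 * Models storing a 64-bit physical address in a 32-bit variable:
 * everything above bit 31 is silently dropped.
 */
uint32_t toy_store_phys_32(uint64_t phys)
{
	return (uint32_t)phys;
}

/*
 * Returns 1 if the range [start, start + size - 1] lies entirely
 * below 4GB, i.e. if it survives a 32-bit-limited path unchanged.
 */
int toy_fits_below_4g(uint64_t start, uint64_t size)
{
	return size != 0 &&
	       start <= UINT32_MAX &&
	       size - 1 <= UINT32_MAX - start;
}
```

An address such as 0x100001000 comes back as 0x1000 from
toy_store_phys_32(), which is why a resource placed above the 4GB area
can break a device or a code path that only handles 32 bits.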

Adding Ivan to the cc due to historical issues - although I haven't
seen him in email for a year, so he may be gone.

                           Linus


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-09 19:41             ` Linus Torvalds
@ 2012-01-10  1:32               ` Yinghai Lu
  2012-01-10  2:41                 ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-10  1:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jesse Barnes, Ivan Kokshaysky, Rogério Brito,
	bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito, Ram Pai

On Mon, Jan 9, 2012 at 11:41 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 9, 2012 at 11:22 AM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>>
>> I don't remember what problems we hit, but if Ram and Yinghai are
>> willing to take a look at them we should go ahead and try again.
>
> So the two bugzilla entries quoted in the revert are
>
>  http://bugzilla.kernel.org/show_bug.cgi?id=10080
>  http://bugzilla.kernel.org/show_bug.cgi?id=9961
>
>  with at least one real smoking gun in this one:
>
>  https://lkml.org/lkml/2008/3/26/94
>
> however at least *part* of the problem for some people was not the
> transparency itself, as much as the fact that we did bad things with
> 64-bit resources etc.

There are two cases:
a. the transparent bridge has resources allocated by the BIOS.
b. the transparent bridge does not have resources allocated by the BIOS.

Skipping the transparent bridge size check only affects case b.

When the skip code is there, the bridge registers will not be probed
(with 0xfff0fff0), and the bridge BARs will not get allocated. For
child devices that did not get an allocation from the BIOS, the kernel
will allocate from the parent bus resources... up to the peer root bus.

Rogério's laptop does not like the bus resize, which could be because
the bridge does not like the kernel using 0xfff0fff0 to probe it...

We could add a quirk or similar to skip this kind of probe.
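
A rough model of the write/read-back probe described above, as a
sketch only: the register is simulated in memory, the names (toy_cfg,
toy_has_pref_window) are hypothetical, and the assumption is that the
probe runs only when the register reads back as zero (case b), writes
0xfff0fff0, checks which bits stick, and then clears the register
again.

```c
#include <stdint.h>

/* Simulated prefetchable base/limit dword of a bridge. */
struct toy_cfg {
	uint32_t value;		/* current register contents */
	uint32_t writable_mask;	/* bits the hardware implements; 0 = no window */
	unsigned int writes;	/* number of config writes performed */
};

void toy_write(struct toy_cfg *r, uint32_t v)
{
	r->value = v & r->writable_mask;	/* unimplemented bits stay 0 */
	r->writes++;
}

uint32_t toy_read(const struct toy_cfg *r)
{
	return r->value;
}

/* Returns 1 if the bridge appears to implement a prefetchable window. */
int toy_has_pref_window(struct toy_cfg *r)
{
	uint32_t v = toy_read(r);

	if (!v) {			/* case b: nothing set up by the BIOS */
		toy_write(r, 0xfff0fff0);
		v = toy_read(r);
		toy_write(r, 0);	/* restore the empty state */
	}
	return v != 0;
}
```

In this model a register already programmed by the BIOS (case a) is
never written, while an empty one sees two writes. The suspicion raised
above is that exactly those writes may upset this laptop's bridge.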

>
> So what I think mostly happened was that not sizing up transparent
> bridges ended up showing up other bugs. We've fixed at least some of
> those other bugs in the meantime (things like using "unsigned long"
> for physical addresses in ioremap etc, which broke when 64-bit
> resources were used on 32-bit architectures).
>
> But it's entirely possible that it will still trigger similar issues.
> For example, even if a bridge is transparent, maybe it is still
> limited to 32-bit addresses? If we look at the actual IO windows, we'd
> get that 32-bit limit right automatically. If we just say "it's
> transparent", we might allocate child devices with 64-bit resources
> above the 4GB area, and be screwed. That seems to have been one
> problem above - even if we now get it right on a software level, there
> may actually be hardware issues in the same area.

in pci_bus_alloc_resource(),  we already have

        /* don't allocate too high if the pref mem doesn't support 64bit*/
        if (!(res->flags & IORESOURCE_MEM_64))
                max = PCIBIOS_MAX_MEM_32;

So it should allocate ranges below 4G to unassigned child devices.

But we need to make sure the (peer) root bus has a valid resource range first.
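
The cap quoted above can be tried in isolation with a self-contained
sketch. TOY_MEM_64 and TOY_MAX_MEM_32 are hypothetical stand-ins for
IORESOURCE_MEM_64 and PCIBIOS_MAX_MEM_32; their values here are chosen
for illustration and are not taken from the kernel headers.

```c
#include <stdint.h>

#define TOY_MEM_64	0x1u		/* stand-in for IORESOURCE_MEM_64 */
#define TOY_MAX_MEM_32	0xffffffffULL	/* stand-in for PCIBIOS_MAX_MEM_32 */

/*
 * Highest address the allocator may hand out for a resource with the
 * given flags: unlimited for 64-bit capable resources, capped just
 * below 4G otherwise.
 */
uint64_t toy_alloc_max(unsigned int flags)
{
	uint64_t max = UINT64_MAX;

	/* don't allocate too high if the resource doesn't support 64bit */
	if (!(flags & TOY_MEM_64))
		max = TOY_MAX_MEM_32;
	return max;
}
```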

Rogério,

Do you have a boot log with "debug ignore_loglevel pci=earlydump"?

Thanks

Yinghai


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-08 22:23         ` Linus Torvalds
  2012-01-09 19:22           ` Jesse Barnes
@ 2012-01-10  1:57           ` Rogério Brito
  1 sibling, 0 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-10  1:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: bugzilla-daemon, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito, Jesse Barnes,
	Yinghai Lu, Ram Pai

Hi, Linus.

On Sun, Jan 8, 2012 at 20:23, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Jan 8, 2012 at 2:13 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>>
>> It doesn't boot with a vanilla kernel. It only boots when I pass the options
>> above *and* compile the kernel with the following patch applied:
>>
>> ,----[ do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch ]
>> | diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> | index 86b69f85..84543f5 100644
>> | --- a/drivers/pci/setup-bus.c
>> | +++ b/drivers/pci/setup-bus.c
>> | @@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
>> |               break;
>> |
>> |       case PCI_CLASS_BRIDGE_PCI:
>> | +             /* don't size subtractive decoding (transparent)
>> | +              * PCI-to-PCI bridges */
>> | +             if (bus->self->transparent)
>> | +                     break;
>> |               pci_bridge_check_ranges(bus);
>> |               if (bus->self->is_hotplug_bridge) {
>> |                       additional_io_size  = pci_hotplug_io_size;
>> `----
>
> Ahh. I'd forgotten about that particular PCI patch.

Thanks for acknowledging that issue.

> I think we should try to re-do that "avoid sizing transparent bridges"
> commit, because it really should make it much easier to do PCI
> allocations under some very common situations (there's a *lot* of
> common intel transparent PCI bridges).

All this PCI stuff is alien to me, but I am learning some by osmosis
here, which is not bad at all. :)

My only fear is that if the patch above is applied, proves to have ill
effects, and is then reverted, this computer (and, potentially, some
others) will be left unable to boot. As I always say, I can test
anything that you guys want me to, given some guidance.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-09  4:20           ` Bjorn Helgaas
@ 2012-01-10  2:12             ` Rogério Brito
  0 siblings, 0 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-10  2:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Edward Donovan, Linus Torvalds, bugzilla-daemon, Thomas Gleixner,
	linux-kernel, Márcia Coutinho de Brito, Jesse Barnes,
	Ivan Kokshaysky, Ram Pai, Yinghai Lu

Hi, Bjorn and other people.

2012/1/9 Bjorn Helgaas <bhelgaas@google.com>:
> However, I am curious about what breaks on your system when we do size
> the transparent bridge.  It seems like sizing it should *work*, even
> if it's not strictly necessary, so I wonder if this hang is telling us
> about some other PCI allocation issue we should fix.

Makes sense.

> Would you mind opening a bugzilla for this, since the current tree is broken for you,
> and attaching a log or digital photo of the hang?

I have already opened a bugzilla entry for this at:

    https://bugzilla.kernel.org/show_bug.cgi?id=41622

It contains some information already (see the attachments), but I can
always post more, of course. Just ask and I will try my best. (I will
try to post some pictures as soon as I can get back my bugzilla
password to change that bug).


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  1:32               ` Yinghai Lu
@ 2012-01-10  2:41                 ` Rogério Brito
  2012-01-10  5:07                   ` Yinghai Lu
  2012-01-10  5:24                   ` Bjorn Helgaas
  0 siblings, 2 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-10  2:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, bugzilla-daemon,
	Edward Donovan, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Dear Yinghai,

On Mon, Jan 9, 2012 at 23:32, Yinghai Lu <yinghai@kernel.org> wrote:
> in pci_bus_alloc_resource(),  we already have
>
>        /* don't allocate too high if the pref mem doesn't support 64bit*/
>        if (!(res->flags & IORESOURCE_MEM_64))
>                max = PCIBIOS_MAX_MEM_32;
>
> So it should allocate to range below 4G to unassigned children devices.
>
> but need to make sure (peer) root bus does have valid resource range at first.
>
> Rogério,
>
> Do you have bootlog with "debug ignore_loglevel pci=earlydump" ?

I have posted some information to bugzilla (see [0]), but not with
pci=earlydump. It seems that bugzilla doesn't want to mail me a reset
e-mail, but while I am waiting, I posted this on my homepage (see
[1]).

Please note that this boot log was taken with the patch that I
e-mailed earlier (i.e., the one that makes things work). If you want
me to send a log with a vanilla kernel, just let me know and I will
grab more information.


Thanks,
Rogério.

[0]: https://bugzilla.kernel.org/show_bug.cgi?id=41622
[1]: http://www.ime.usp.br/~rbrito/linux/clevo/clevo-dmesg-with-earlydump-2012-01-10.txt

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  2:41                 ` Rogério Brito
@ 2012-01-10  5:07                   ` Yinghai Lu
  2012-01-11  7:04                     ` Rogério Brito
  2012-01-10  5:24                   ` Bjorn Helgaas
  1 sibling, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-10  5:07 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, bugzilla-daemon,
	Edward Donovan, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

On Mon, Jan 9, 2012 at 6:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>
> I have posted some information to bugzilla (see [0]), but not with
> pci=earlydump. It seems that bugzilla doesn't want to mail me a reset
> e-mail, but while I am waiting, I posted this on my homepage (see
> [1]).

Cool.

Your system has bridge 00:10.0, and under it there is 05:07.0, which
is a CardBus bridge.

For the PCI bridge, the BIOS only allocates a small range to mmio and
does not allocate any mmio_pref to it.

It looks like bridge 00:10.0 does not like the kernel probing its mmio
pref register.

With the transparent bridge skip patch, the kernel will not touch the
mmio pref register, and the CardBus bridge will get its allocation from
the root bus resource range:

[    0.162791] pci 0000:05:07.0: BAR 10: assigned [mem 0x84000000-0x87ffffff]
[    0.162863] pci 0000:05:07.0: BAR 9: assigned [mem 0x88000000-0x8bffffff pref]
[    0.162946] pci 0000:05:07.0: BAR 8: assigned [io  0x1000-0x10ff]
[    0.163016] pci 0000:05:07.0: BAR 7: assigned [io  0x1400-0x14ff]
[    0.163083] pci 0000:05:07.0: CardBus bridge to [bus 06-09]
[    0.163150] pci 0000:05:07.0:   bridge window [io  0x1400-0x14ff]
[    0.163220] pci 0000:05:07.0:   bridge window [io  0x1000-0x10ff]
[    0.163290] pci 0000:05:07.0:   bridge window [mem 0x88000000-0x8bffffff pref]
[    0.163331] pci 0000:05:07.0:   bridge window [mem 0x84000000-0x87ffffff]
[    0.163402] pci 0000:00:10.0: PCI bridge to [bus 05-06]
[    0.163471] pci 0000:00:10.0:   bridge window [mem 0xb3200000-0xb32fffff]

I have some pending patches that may fix bridge resource resizing.

Can you try them from the for-pci2 branch of

      git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git

Also, please boot with "debug ignore_loglevel".

BTW, did you try booting with "pci=use_crs" and without "acpi=off"?

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  2:41                 ` Rogério Brito
  2012-01-10  5:07                   ` Yinghai Lu
@ 2012-01-10  5:24                   ` Bjorn Helgaas
  2012-01-11  7:05                     ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Bjorn Helgaas @ 2012-01-10  5:24 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	bugzilla-daemon, Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

On Mon, Jan 9, 2012 at 7:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:

> Please, note that this boot log was taken with the patch that I
> e-mailed earlier (i.e., the one that makes things work). If you want
> me to send a log with a vanilla kernel, just let me know and I will
> grab more information.

I'd definitely be interested in a log with a vanilla kernel.

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-08 22:13       ` Rogério Brito
  2012-01-08 22:23         ` Linus Torvalds
@ 2012-01-10  9:25         ` Edward Donovan
  2012-01-11  7:15           ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Edward Donovan @ 2012-01-10  9:25 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Yinghai Lu, Jesse Barnes, Ram Pai

Hi Rogério -

I'm glad to see you're getting help from a lot of skilled people, here.

On Sun, Jan 8, 2012 at 5:13 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> On Jan 06 2012, Linus Torvalds wrote:
>> 2012/1/6 Rogério Brito <rbrito@ime.usp.br>:
>> >
>> > For the record, *some* description of the problem is at:
>> >
>> > * https://bugzilla.kernel.org/show_bug.cgi?id=41132
>>
>> This one looks very much like the thing that Edward Donovan fixed
>> fairly recently in commit 52553ddffad7 ("genirq: fix regression in
>> irqfixup, irqpoll")
>
> It does look similar, but not quite the same. And, yes, my main desktop (the
> one that I'm using right now) was affected by Edward's commit, which, BTW,
> has my Reported-and-tested-by.


The one area I have experience with is the IRQ regression you cite in
41132.  I was bitten by the regressions in spurious IRQ handling, like
you, and this has been my first trip inside the kernel.  The patches I
sent are the sum of my kernel wisdom to date, and I don't want to
impersonate an experienced developer, even by accident. :)

At any rate: I can't tell, yet, what IRQ trouble is happening on this
notebook.  I couldn't pick any out from the description below.  And so
I can't tell, either :) , how it is similar to the bug above, but not
quite the same.

Is there an IRQ disabled, "nobody cared", problem, even with the
newest kernels?

If you have info on IRQ problems, I can look at it, hopefully soon.
Right now, it might make sense to let this PCI effort roll forward,
before opening a second front, but I can't say that for sure.  I
certainly hope we can make your mom's notebook a happy Linux box.

Thanks,

Ed


> This one is different. Here is a brief summary of the situation:
>
> # Computer description
>
> * Rebranded Clevo M5X0JE
> * nForce2 chipset
> * AMD Sempron CPU
> * NIC with driver forcedeth
> * wifi rtl8187
>
> # Booting options
>
> The current working options passed to the kernel are: `acpi=off pnpbios=off noapic`
>
> It doesn't boot with a vanilla kernel. It only boots when I pass the options
> above *and* compile the kernel with the following patch applied:
>
> ,----[ do_not_size_subtractive_decoding_transparent_pci_to_pci_bridges.patch ]
> | diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> | index 86b69f85..84543f5 100644
> | --- a/drivers/pci/setup-bus.c
> | +++ b/drivers/pci/setup-bus.c
> | @@ -849,6 +849,10 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> |               break;
> |
> |       case PCI_CLASS_BRIDGE_PCI:
> | +             /* don't size subtractive decoding (transparent)
> | +              * PCI-to-PCI bridges */
> | +             if (bus->self->transparent)
> | +                     break;
> |               pci_bridge_check_ranges(bus);
> |               if (bus->self->is_hotplug_bridge) {
> |                       additional_io_size  = pci_hotplug_io_size;
> `----
>
> Otherwise, it hangs right when trying to configure PCI, which, according to
> the dmesg log when it boots occurs approximately 0.12 seconds after the
> kernel shows its first message.
>

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  5:07                   ` Yinghai Lu
@ 2012-01-11  7:04                     ` Rogério Brito
  2012-01-12  5:06                       ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-11  7:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, bugzilla-daemon,
	Edward Donovan, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Hi, Yinghai.

On Tue, Jan 10, 2012 at 03:07, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Jan 9, 2012 at 6:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> I have some pending patches that may fix bridge resource resizing.
>
> Can you try them at
>
>      git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-pci2
>
> also please boot with "debug ignore_loglevel"

Grabbing it right now. Will report it after I get some sleep (5AM here).

> BTW, did you try boot with "pci=use_crs" and not with "acpi=off" ?

I tried, but I got a hang (both with Linux---many versions---and with
FreeBSD). With Linux I get:

   http://www.ime.usp.br/~rbrito/linux/clevo/boot-without-acpi-off.jpg

Suggestions?


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  5:24                   ` Bjorn Helgaas
@ 2012-01-11  7:05                     ` Rogério Brito
  2012-01-11 10:45                       ` Ram Pai
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-11  7:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	bugzilla-daemon, Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Hi, Bjorn.

2012/1/10 Bjorn Helgaas <bhelgaas@google.com>:
> On Mon, Jan 9, 2012 at 7:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>
>> Please, note that this boot log was taken with the patch that I
>> e-mailed earlier (i.e., the one that makes things work). If you want
>> me to send a log with a vanilla kernel, just let me know and I will
>> grab more information.
>
> I'd definitely be interested in a log with a vanilla kernel.

Sure, here you go:

    http://www.ime.usp.br/~rbrito/linux/clevo/boot-with-vanilla-3.2.m4v


Thanks for any insight,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-10  9:25         ` Edward Donovan
@ 2012-01-11  7:15           ` Rogério Brito
  0 siblings, 0 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-11  7:15 UTC (permalink / raw)
  To: Edward Donovan
  Cc: Linus Torvalds, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Yinghai Lu, Jesse Barnes, Ram Pai

Hi, Edward.

2012/1/10 Edward Donovan <ed@numble.net>:
> I'm glad to see you're getting help from a lot of skilled people, here.

This is great. I'm keeping mom in CC and she is pleasantly surprised
that her case got attention from developers. She wasn't expecting one
single case to be treated seriously by important people (in her
words---and mine too).

>> It does look similar, but not quite the same. And, yes, my main desktop (the
>> one that I'm using right now) was affected by Edward's commit, which, BTW,
>> has my Reported-and-tested-by.
>
> The one area I have experience with is the IRQ regression you cite in
> 41132.  I was bitten by the regressions in spurious IRQ handling, like
> you, and this has been my first trip inside the kernel.  The patches I
> sent are the sum of my kernel wisdom to date, and I don't want to
> impersonate an experienced developer, even by accident. :)

Oops, sorry: I listed two issues from different computers (yes, I am
seeing many issues with the computers that I have at my disposal). The
ones that I should have listed as relevant to this notebook are:

* https://bugzilla.kernel.org/show_bug.cgi?id=41622
* https://bugzilla.kernel.org/show_bug.cgi?id=41722

Issue

* https://bugzilla.kernel.org/show_bug.cgi?id=41132

should not have been mentioned here. Too many issues and I got confused. Sorry.

> Is there an IRQ disabled, "nobody cared", problem, even with the
> newest kernels?
>
> If you have info on IRQ problems, I can look at it, hopefully soon.
> Right now, it might make sense to let this PCI effort roll forward,
> before opening a second front, but I can't say that for sure.  I
> certainly hope we can make your mom's notebook a happy Linux box.

Well, since you mention it, yes: I have one case of "nobody cared" on
the very desktop that I am using right now (this one I didn't file on
bugzilla), and I have to boot it with irqpoll, but I guess that this is
another issue entirely. If you could help me with this "nobody cared"
thing, then I will be glad to provide copious amounts of data. :)


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-11  7:05                     ` Rogério Brito
@ 2012-01-11 10:45                       ` Ram Pai
  0 siblings, 0 replies; 53+ messages in thread
From: Ram Pai @ 2012-01-11 10:45 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Bjorn Helgaas, Yinghai Lu, Linus Torvalds, Jesse Barnes,
	Ivan Kokshaysky, bugzilla-daemon, Edward Donovan,
	Thomas Gleixner, linux-kernel, Márcia Coutinho de Brito

On Wed, Jan 11, 2012 at 05:05:44AM -0200, Rogério Brito wrote:
> Hi, Bjorn.
> 
> 2012/1/10 Bjorn Helgaas <bhelgaas@google.com>:
> > On Mon, Jan 9, 2012 at 7:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> >
> >> Please, note that this boot log was taken with the patch that I
> >> e-mailed earlier (i.e., the one that makes things work). If you want
> >> me to send a log with a vanilla kernel, just let me know and I will
> >> grab more information.
> >
> > I'd definitely be interested in a log with a vanilla kernel.
> 
> Sure, here you go:
> 
>     http://www.ime.usp.br/~rbrito/linux/clevo/boot-with-vanilla-3.2.m4v
> 

I cannot tell exactly what the problem is, but it looks like the transparent
bridge gets assigned a positive-decoded address, which seems to create some
conflict later on.

RP


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-11  7:04                     ` Rogério Brito
@ 2012-01-12  5:06                       ` Yinghai Lu
  2012-01-13 11:59                         ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-12  5:06 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

On Tue, Jan 10, 2012 at 11:04 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> Hi, Yinghai.
>
> On Tue, Jan 10, 2012 at 03:07, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Mon, Jan 9, 2012 at 6:41 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> I have some pending patches that may fix bridge resource resizing.
>>
>> Can you try them at
>>
>>      git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> for-pci2
>>
>> also please boot with "debug ignore_loglevel"
>
> Grabbing it right now. Will report it after I get some sleep (5AM here).

Please try the attached patch; it should fix one wrong BIOS setting.

>
>> BTW, did you try boot with "pci=use_crs" and not with "acpi=off" ?
>
> I tried, but I got a hang (both with Linux---many versions---and with
> FreeBSD). With Linux I get:
>
>   http://www.ime.usp.br/~rbrito/linux/clevo/boot-without-acpi-off.jpg
>

We need to work on that later.

Yinghai

[-- Attachment #2: disable_cardbus_mem1_pref.patch --]
[-- Type: text/x-patch, Size: 1029 bytes --]

Subject: [PATCH] PCI: Disable cardbus bridge MEM1 pref CTL

Some BIOS enable both pref for MEM0 and MEM1.

but we assume MEM1 is non-pref...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -774,6 +774,14 @@ static void pci_bus_size_cardbus(struct
 	if (realloc_head)
 		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size, 0 /* dont care */);
 
+	/* MEM1 must not be pref mmio */
+	pci_read_config_word(bridge, PCI_CB_BRIDGE_CONTROL, &ctrl);
+	if (ctrl & PCI_CB_BRIDGE_CTL_PREFETCH_MEM1) {
+		ctrl &= ~PCI_CB_BRIDGE_CTL_PREFETCH_MEM1;
+		pci_write_config_word(bridge, PCI_CB_BRIDGE_CONTROL, ctrl);
+		pci_read_config_word(bridge, PCI_CB_BRIDGE_CONTROL, &ctrl);
+	}
+
 	/*
 	 * Check whether prefetchable memory is supported
 	 * by this bridge.

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-12  5:06                       ` Yinghai Lu
@ 2012-01-13 11:59                         ` Rogério Brito
  2012-01-13 17:29                           ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-13 11:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Dear Yinghai,

On Jan 11 2012, Yinghai Lu wrote:
> On Tue, Jan 10, 2012 at 11:04 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> > On Tue, Jan 10, 2012 at 03:07, Yinghai Lu <yinghai@kernel.org> wrote:
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> >> for-pci2
> >>
> >> also please boot with "debug ignore_loglevel"
> >
> > Grabbing it right now. Will report it after I get some sleep (5AM here).
> 
> please try attached patch, it will fix one bios wrong setting.

Against which tree should I apply this patch?

Right now, I have Linus's tree and yours (cited above). Up to now, I have
been compiling everything from Linus's v3.2 tag: both a vanilla kernel, as
asked by Bjorn, and one with the patch that I mentioned to Linus ("the one
that makes things boot").

If it applies (and newer kernels don't have things which your patch needs),
I will keep working with Linus's v3.2 tag.

When I compiled your for-pci2 branch from linux-yinghai.git (without
any extra patches), it *did* boot, but I noticed one weird thing: htop, in
single user mode, reported 100% of the CPU in use. Regular top, with its
more fine-grained output, revealed that no processes were using the CPU, but
that almost all of the time was spent in kernel mode.

Anyway, as you asked, I put the log of what I see at

    http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/

> >> BTW, did you try boot with "pci=use_crs" and not with "acpi=off" ?
> >
> > I tried, but I got a hang (both with Linux---many versions---and with
> > FreeBSD). With Linux I get:
> >
> >  http://www.ime.usp.br/~rbrito/linux/clevo/boot-without-acpi-off.jpg
> >
> 
> need to work on that later.

Thanks for your (and everybody else's) kindness/helpfulness with these
multiple issues.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-13 11:59                         ` Rogério Brito
@ 2012-01-13 17:29                           ` Yinghai Lu
  2012-01-13 22:24                             ` Yinghai Lu
  2012-01-14  2:01                             ` Rogério Brito
  0 siblings, 2 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-13 17:29 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

On Fri, Jan 13, 2012 at 3:59 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
> Dear Yinghai,
>
> On Jan 11 2012, Yinghai Lu wrote:
>> On Tue, Jan 10, 2012 at 11:04 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> > On Tue, Jan 10, 2012 at 03:07, Yinghai Lu <yinghai@kernel.org> wrote:
>> >>   git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> >> for-pci2
>> >>
>> >> also please boot with "debug ignore_loglevel"
>> >
>> > Grabbing it right now. Will report it after I get some sleep (5AM here).
>>
>> please try attached patch, it will fix one bios wrong setting.
>
> Should I use this patch agains which tree?
>
> Right now, I have Linus's tree, yours (cited above) and I am, up to now,
> I had been compiling everything from Linus's v3.2 tag: both a vanilla
> kernel, as asked by Bjorn and with the patch that I mentioned to Linus ("the
> one that makes things boot").
>
> If it applies (and newer kernels don't have things which your patch needs),
> I will keep working with Linus's v3.2 tag.

No problem, I will put that patch into for-pci2 for your convenience.

>
> When I compiled from your branch for-pci2 from linux-yinghai.git (without
> any extra patches), it *did* boot, but noticed one thing weird: using htop
> and in single user mode, it reported 100% of CPU in use. With regular top, a
> more fine grained output revealed that no processes were using the CPU, but
> that almost all of the time was spent in kernel mode.
>
> Anyway, as you asked, I put the log of what I see at
>
>    http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/

[    0.165615] pci 0000:05:07.0: CardBus bridge to [bus 06-09]
[    0.165682] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
[    0.165751] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
[    0.165821] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000 pref]
[    0.165904] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000]
[    0.165975] pci 0000:00:10.0: PCI bridge to [bus 05-06]
[    0.166043] pci 0000:00:10.0:   bridge window [mem 0xb3200000-0xb32fffff]
[    0.166259] pci 0000:00:10.0: setting latency timer to 64
[    0.166329] pci 0000:05:07.0: device not available (can't reserve [io  0x0001-0x0000])
[    0.166412] pci 0000:05:07.0: Error enabling bridge (-22), continuing

We have some problems with handling CardBus bridge resources as optional
resources.

I will produce some patches and update the for-pci2 branch for you to test.

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-13 17:29                           ` Yinghai Lu
@ 2012-01-13 22:24                             ` Yinghai Lu
  2012-01-14  2:01                             ` Rogério Brito
  1 sibling, 0 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-13 22:24 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

On Fri, Jan 13, 2012 at 9:29 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 13, 2012 at 3:59 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> Right now, I have Linus's tree, yours (cited above) and I am, up to now,
>> I had been compiling everything from Linus's v3.2 tag: both a vanilla
>> kernel, as asked by Bjorn and with the patch that I mentioned to Linus ("the
>> one that makes things boot").
>>
>> If it applies (and newer kernels don't have things which your patch needs),
>> I will keep working with Linus's v3.2 tag.
>
> No problem, I will put that patch into for-pci2 for your conveniences.

Please check for-pci2 again; I put two CardBus-related fixes there.

Or you can apply the two attached patches.

Thanks

Yinghai

[-- Attachment #2: disable_cardbus_mem1_pref.patch --]
[-- Type: text/x-patch, Size: 1029 bytes --]

Subject: [PATCH] PCI: Disable cardbus bridge MEM1 pref CTL

Some BIOS enable both pref for MEM0 and MEM1.

but we assume MEM1 is non-pref...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -774,6 +774,14 @@ static void pci_bus_size_cardbus(struct
 	if (realloc_head)
 		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size, 0 /* dont care */);
 
+	/* MEM1 must not be pref mmio */
+	pci_read_config_word(bridge, PCI_CB_BRIDGE_CONTROL, &ctrl);
+	if (ctrl & PCI_CB_BRIDGE_CTL_PREFETCH_MEM1) {
+		ctrl &= ~PCI_CB_BRIDGE_CTL_PREFETCH_MEM1;
+		pci_write_config_word(bridge, PCI_CB_BRIDGE_CONTROL, ctrl);
+		pci_read_config_word(bridge, PCI_CB_BRIDGE_CONTROL, &ctrl);
+	}
+
 	/*
 	 * Check whether prefetchable memory is supported
 	 * by this bridge.

[-- Attachment #3: fix_cardbus_optional_res.patch --]
[-- Type: text/x-patch, Size: 4213 bytes --]

Subject: [PATCH] PCI: Fix cardbus bridge resources as optional size handling

We should not set the requested size to -2.

that will confuse the resource list sorting with align when SIZEALIGN is used.

Change to STARTALIGN and pass align from start.

We are safe to do that just as we do that regular pci bridge.

In the long run, we should just treat cardbus like regular pci bridge.

Also fix when realloc is not passed, should keep the requested size.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |   71 +++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 31 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -891,21 +891,30 @@ static void pci_bus_size_cardbus(struct
 {
 	struct pci_dev *bridge = bus->self;
 	struct resource *b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
+	resource_size_t b_res_3_size = pci_cardbus_mem_size * 2;
 	u16 ctrl;
 
 	/*
 	 * Reserve some resources for CardBus.  We reserve
 	 * a fixed amount of bus space for CardBus bridges.
 	 */
-	b_res[0].start = 0;
-	b_res[0].flags |= IORESOURCE_IO | IORESOURCE_SIZEALIGN;
-	if (realloc_head)
-		add_to_list(realloc_head, bridge, b_res, pci_cardbus_io_size, 0 /* dont care */);
-
-	b_res[1].start = 0;
-	b_res[1].flags |= IORESOURCE_IO | IORESOURCE_SIZEALIGN;
-	if (realloc_head)
-		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size, 0 /* dont care */);
+	b_res[0].start = pci_cardbus_io_size;
+	b_res[0].end = b_res[0].start + pci_cardbus_io_size - 1;
+	b_res[0].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+	if (realloc_head) {
+		b_res[0].end -= pci_cardbus_io_size;
+		add_to_list(realloc_head, bridge, b_res, pci_cardbus_io_size,
+				pci_cardbus_io_size);
+	}
+
+	b_res[1].start = pci_cardbus_io_size;
+	b_res[1].end = b_res[1].start + pci_cardbus_io_size - 1;
+	b_res[1].flags |= IORESOURCE_IO | IORESOURCE_STARTALIGN;
+	if (realloc_head) {
+		b_res[1].end -= pci_cardbus_io_size;
+		add_to_list(realloc_head, bridge, b_res+1, pci_cardbus_io_size,
+				 pci_cardbus_io_size);
+	}
 
 	/* MEM1 must not be pref mmio */
 	pci_read_config_word(bridge, PCI_CB_BRIDGE_CONTROL, &ctrl);
@@ -932,28 +941,28 @@ static void pci_bus_size_cardbus(struct
 	 * twice the size.
 	 */
 	if (ctrl & PCI_CB_BRIDGE_CTL_PREFETCH_MEM0) {
-		b_res[2].start = 0;
-		b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH | IORESOURCE_SIZEALIGN;
-		if (realloc_head)
-			add_to_list(realloc_head, bridge, b_res+2, pci_cardbus_mem_size, 0 /* dont care */);
-
-		b_res[3].start = 0;
-		b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_SIZEALIGN;
-		if (realloc_head)
-			add_to_list(realloc_head, bridge, b_res+3, pci_cardbus_mem_size, 0 /* dont care */);
-	} else {
-		b_res[3].start = 0;
-		b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_SIZEALIGN;
-		if (realloc_head)
-			add_to_list(realloc_head, bridge, b_res+3, pci_cardbus_mem_size * 2, 0 /* dont care */);
-	}
-
-	/* set the size of the resource to zero, so that the resource does not
-	 * get assigned during required-resource allocation cycle but gets assigned
-	 * during the optional-resource allocation cycle.
- 	 */
-	b_res[0].start = b_res[1].start = b_res[2].start = b_res[3].start = 1;
-	b_res[0].end = b_res[1].end = b_res[2].end = b_res[3].end = 0;
+		b_res[2].start = pci_cardbus_mem_size;
+		b_res[2].end = b_res[2].start + pci_cardbus_mem_size - 1;
+		b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH |
+				  IORESOURCE_STARTALIGN;
+		if (realloc_head) {
+			b_res[2].end -= pci_cardbus_mem_size;
+			add_to_list(realloc_head, bridge, b_res+2,
+				 pci_cardbus_mem_size, pci_cardbus_mem_size);
+		}
+
+		/* reduce that to half */
+		b_res_3_size = pci_cardbus_mem_size;
+	}
+
+	b_res[3].start = pci_cardbus_mem_size;
+	b_res[3].end = b_res[3].start + b_res_3_size - 1;
+	b_res[3].flags |= IORESOURCE_MEM | IORESOURCE_STARTALIGN;
+	if (realloc_head) {
+		b_res[3].end -= b_res_3_size;
+		add_to_list(realloc_head, bridge, b_res+3, b_res_3_size,
+				 pci_cardbus_mem_size);
+	}
 }
 
 void __ref __pci_bus_size_bridges(struct pci_bus *bus,

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-13 17:29                           ` Yinghai Lu
  2012-01-13 22:24                             ` Yinghai Lu
@ 2012-01-14  2:01                             ` Rogério Brito
  2012-01-14  7:09                               ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-14  2:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Hi, Yinghai.

On Fri, Jan 13, 2012 at 15:29, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 13, 2012 at 3:59 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> If it applies (and newer kernels don't have things which your patch needs),
>> I will keep working with Linus's v3.2 tag.
>
> No problem, I will put that patch into for-pci2 for your conveniences.

Thanks. That's definitely easier for me. I'm grabbing it to compile
and will report in a moment.

>> When I compiled from your branch for-pci2 from linux-yinghai.git (without
>> any extra patches), it *did* boot, but noticed one thing weird: using htop
>> and in single user mode, it reported 100% of CPU in use. With regular top, a
>> more fine grained output revealed that no processes were using the CPU, but
>> that almost all of the time was spent in kernel mode.
>>
>> Anyway, as you asked, I put the log of what I see at
>>
>>    http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/
>
> [    0.165615] pci 0000:05:07.0: CardBus bridge to [bus 06-09]
> [    0.165682] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
> [    0.165751] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
> [    0.165821] pci 0000:05:07.0:   bridge window [mem
> 0x00000001-0x00000000 pref]
> [    0.165904] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000]
> [    0.165975] pci 0000:00:10.0: PCI bridge to [bus 05-06]
> [    0.166043] pci 0000:00:10.0:   bridge window [mem 0xb3200000-0xb32fffff]
> [    0.166259] pci 0000:00:10.0: setting latency timer to 64
> [    0.166329] pci 0000:05:07.0: device not available (can't reserve
> [io  0x0001-0x0000])
> [    0.166412] pci 0000:05:07.0: Error enabling bridge (-22), continuing
>
> We have some problems with handling cardbus bridge as optional resources.
>
> will produce some patches and update for-pci2 branch for your test.

OK. Will try that as soon as my download completes.


Thanks again,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-14  2:01                             ` Rogério Brito
@ 2012-01-14  7:09                               ` Rogério Brito
  2012-01-14 21:05                                 ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-14  7:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Dear Yinghai and other people,

On Sat, Jan 14, 2012 at 00:01, Rogério Brito <rbrito@ime.usp.br> wrote:
> On Fri, Jan 13, 2012 at 15:29, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, Jan 13, 2012 at 3:59 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
>>> If it applies (and newer kernels don't have things which your patch needs),
>>> I will keep working with Linus's v3.2 tag.
>>
>> No problem, I will put that patch into for-pci2 for your conveniences.
>
> Thanks. That's definitely easier for me. I'm grabbing it to compile
> and will report in a moment.

Just to clarify, my "definitely easier for me" means "I know what to
do (or, at least, this way I'm less likely to do the wrong thing)".

>>> When I compiled from your branch for-pci2 from linux-yinghai.git (without
>>> any extra patches), it *did* boot, but noticed one thing weird: using htop
>>> and in single user mode, it reported 100% of CPU in use. With regular top, a
>>> more fine grained output revealed that no processes were using the CPU, but
>>> that almost all of the time was spent in kernel mode.
>>>
>>> Anyway, as you asked, I put the log of what I see at
>>>
>>>    http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/
>>
>> [    0.165615] pci 0000:05:07.0: CardBus bridge to [bus 06-09]
>> [    0.165682] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
>> [    0.165751] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
>> [    0.165821] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000 pref]
>> [    0.165904] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000]
>> [    0.165975] pci 0000:00:10.0: PCI bridge to [bus 05-06]
>> [    0.166043] pci 0000:00:10.0:   bridge window [mem 0xb3200000-0xb32fffff]
>> [    0.166259] pci 0000:00:10.0: setting latency timer to 64
>> [    0.166329] pci 0000:05:07.0: device not available (can't reserve [io  0x0001-0x0000])
>> [    0.166412] pci 0000:05:07.0: Error enabling bridge (-22), continuing
>>
>> We have some problems with handling cardbus bridge as optional resources.
>>
>> will produce some patches and update for-pci2 branch for your test.
>
> OK. Will try that as soon as my download completes.

Just for the record, I have grabbed your for-pci2 branch with HEAD
being 5db4211 and this is what I get:

    http://www.ime.usp.br/~rbrito/linux/clevo/boot-yinghai-for-pci2-g5db4211.m4v

One warning, though: there were some options regarding PCI buses
which I did not know exactly how to answer. I can post my config here
if you are interested in that. Well, actually, I can give you ssh
access to this notebook if you want to hack on it. Just let me know
what times you (or anyone else) are online and I will try to arrange
everything.

Any guidance is welcome.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-14  7:09                               ` Rogério Brito
@ 2012-01-14 21:05                                 ` Yinghai Lu
  2012-01-16 23:08                                   ` Bjorn Helgaas
  2012-01-19  3:50                                   ` Rogério Brito
  0 siblings, 2 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-14 21:05 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

On Fri, Jan 13, 2012 at 11:09 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> Dear Yinghai and other people,
>
> On Sat, Jan 14, 2012 at 00:01, Rogério Brito <rbrito@ime.usp.br> wrote:
>> On Fri, Jan 13, 2012 at 15:29, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Fri, Jan 13, 2012 at 3:59 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
>>>> If it applies (and newer kernels don't have things which your patch needs),
>>>> I will keep working with Linus's v3.2 tag.
>>>
>>> No problem, I will put that patch into for-pci2 for your conveniences.
>>
>> Thanks. That's definitely easier for me. I'm grabbing it to compile
>> and will report in a moment.
>
> Just to clarify, my "definitely easier for me" means "I know what to
> do (or, at least, I'm closer this way not doing wrong things)".
>
>>>> When I compiled from your branch for-pci2 from linux-yinghai.git (without
>>>> any extra patches), it *did* boot, but noticed one thing weird: using htop
>>>> and in single user mode, it reported 100% of CPU in use. With regular top, a
>>>> more fine grained output revealed that no processes were using the CPU, but
>>>> that almost all of the time was spent in kernel mode.
>>>>
>>>> Anyway, as you asked, I put the log of what I see at
>>>>
>>>>    http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/
>>>
>>> [    0.165615] pci 0000:05:07.0: CardBus bridge to [bus 06-09]
>>> [    0.165682] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
>>> [    0.165751] pci 0000:05:07.0:   bridge window [io  0x0001-0x0000]
>>> [    0.165821] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000 pref]
>>> [    0.165904] pci 0000:05:07.0:   bridge window [mem 0x00000001-0x00000000]
>>> [    0.165975] pci 0000:00:10.0: PCI bridge to [bus 05-06]
>>> [    0.166043] pci 0000:00:10.0:   bridge window [mem 0xb3200000-0xb32fffff]
>>> [    0.166259] pci 0000:00:10.0: setting latency timer to 64
>>> [    0.166329] pci 0000:05:07.0: device not available (can't reserve [io  0x0001-0x0000])
>>> [    0.166412] pci 0000:05:07.0: Error enabling bridge (-22), continuing
>>>
>>> We have some problems with handling cardbus bridge as optional resources.
>>>
>>> will produce some patches and update for-pci2 branch for your test.
>>
>> OK. Will try that as soon as my download completes.
>
> Just for the record, I have grabbed your for-pci2 branch with HEAD
> being 5db4211 and this is what I get:
>
>    http://www.ime.usp.br/~rbrito/linux/clevo/boot-yinghai-for-pci2-g5db4211.m4v
>
> A warning, though is that there were some options regarding PCI buses
> which I did not know exactly how to answer. I can post my config here
> if you are interested in that. Well, actually, I can give you ssh
> access to this notebook if you want to hack on it. Just let me know
> what times you (or anyone else) are online and I will try to arrange
> everything.
>

Please enable CONFIG_BOOT_PRINTK_DELAY in your .config

and boot with "boot_delay=1000" or even "boot_delay=5000"

then video capture could help.
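
Before rebooting, it may be worth confirming that the option actually made it into the build. A minimal sketch, assuming a POSIX shell; the helper name has_boot_printk_delay is made up for illustration. Note that boot_delay= is given in milliseconds per printk, so large values make the boot very slow.

```sh
# Succeed if the given kernel config file enables CONFIG_BOOT_PRINTK_DELAY.
# has_boot_printk_delay is a hypothetical helper name, not an existing tool.
has_boot_printk_delay() {
    grep -q '^CONFIG_BOOT_PRINTK_DELAY=y$' "$1"
}

# Example, run from the top of the kernel build tree:
#   has_boot_printk_delay .config && echo "boot_delay= should take effect"
```

If the check fails, the option is presumably still unset and the boot_delay= parameter would not slow the messages down.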

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-14 21:05                                 ` Yinghai Lu
@ 2012-01-16 23:08                                   ` Bjorn Helgaas
  2012-01-19  3:50                                   ` Rogério Brito
  1 sibling, 0 replies; 53+ messages in thread
From: Bjorn Helgaas @ 2012-01-16 23:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rogério Brito, Linus Torvalds, Jesse Barnes,
	Ivan Kokshaysky, Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai

Hi Rogério,

1) Your M5X0JE is horribly broken, but I couldn't find any similar
reports on the web, so I wonder if there's a BIOS setting or something
that's unique to your system.  Can you (a) capture the current BIOS
settings (photos of SETUP screens or something) and (b) reset to
factory defaults and see if anything changes?  If things work better
with the defaults, knowing what the difference is might help us fix
Linux.

2) Can you use a Windows tool like AIDA64 (free trial version at
http://www.aida64.com/downloads) to collect information about how
Windows configures the box?  Last time I used it, there was a way to
save a full configuration report.  Maybe there's a clue in the
differences between Windows and Linux.

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-14 21:05                                 ` Yinghai Lu
  2012-01-16 23:08                                   ` Bjorn Helgaas
@ 2012-01-19  3:50                                   ` Rogério Brito
  2012-01-19  5:06                                     ` Yinghai Lu
  1 sibling, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-19  3:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi, people.

Sorry about the delay, but I have gathered a bunch of information
about the machine.

For the record, I tried to put some order on the subdirectory at:

    http://www.ime.usp.br/~rbrito/linux/clevo/

On Sat, Jan 14, 2012 at 19:05, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Jan 13, 2012 at 11:09 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> A warning, though is that there were some options regarding PCI buses
>> which I did not know exactly how to answer. I can post my config here
>> if you are interested in that. Well, actually, I can give you ssh
>> access to this notebook if you want to hack on it. Just let me know
>> what times you (or anyone else) are online and I will try to arrange
>> everything.

Regarding these options, Yinghai, the options which I enabled once I
typed "make oldconfig" were:

CONFIG_PCI_PRI
CONFIG_PCI_PASID

> Please enable CONFIG_BOOT_PRINTK_DELAY in your .config
>
> and boot with "boot_delay=1000" or even "boot_delay=5000"
>
> then video capture could help.

I did just that and uploaded a video from Linus's git tree with the
options above to:

    http://youtu.be/_dYhkWHfep0

It took a long time to boot (about 7 minutes), but all details are
there, I hope.  I can repeat the same steps for your for-pci2 branch,
if you want me to.

Bjorn wrote:
> 1) Your M5X0JE is horribly broken, but I couldn't find any similar
> reports on the web, so I wonder if there's a BIOS setting or something
> that's unique to your system.

After searching, and searching, and searching, I could find this
thread (in Portuguese) of some people with this computer, all of them
having problems with multiple distributions (which is not a surprise,
as the kernel has problems with these machines):

    http://www.forumdebian.com.br/topico-notebook-amazon-pc-amz-a-101-resolvido

Just for the record, this Clevo was rebranded by a company called
Amazon PC (the model is A101). Amazon PC has disappeared from Earth.

> Can you (a) capture the current BIOS
> settings (photos of SETUP screens or something) and (b) reset to
> factory defaults and see if anything changes?  If things work better
> with the defaults, knowing what the difference is might help us fix
> Linux.

I took pictures of the few setup screens, and I reset everything to
factory defaults, but still no luck. You can see the photos at:

    http://www.ime.usp.br/~rbrito/linux/clevo/photos/

> 2) Can you use a Windows tool like AIDA64 (free trial version at
> http://www.aida64.com/downloads) to collect information about how
> Windows configures the box?

Sure, I downloaded it and generated some reports: two of them with the
tool that you mentioned and one with lshw. They are at:

    http://www.ime.usp.br/~rbrito/linux/clevo/hardware-description/

> --- Comment #16 from Zhang Rui <rui.zhang@intel.com>  2012-01-18 05:39:52 ---
> hmm, what's the status of this bug?
> does the problem still exist in the latest upstream kernel?

Yes, Zhang. Both bugs:

    https://bugzilla.kernel.org/show_bug.cgi?id=41132
    https://bugzilla.kernel.org/show_bug.cgi?id=41722

are still reproducible with the latest kernels. Please, don't close them.

OK. I'm going to bed right now, but I will be happy to provide any
further information or test any patches/trees.


Thanks for all the help,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19  3:50                                   ` Rogério Brito
@ 2012-01-19  5:06                                     ` Yinghai Lu
  2012-01-19 13:48                                       ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-19  5:06 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/18 Rogério Brito <rbrito@ime.usp.br>:
>
> Regarding these options, Yinghai, the options which I enabled once I
> typed "make oldconfig" were:
>
> CONFIG_PCI_PRI
> CONFIG_PCI_PASID

Those should not be related.

>
>> Please enable CONFIG_BOOT_PRINTK_DELAY in your .config
>>
>> and boot with "boot_delay=1000" or even "boot_delay=5000"
>>
>> then video capture could help.
>
> I did just that and uploaded a video from Linus's git tree with the
> options above to:
>
>    http://youtu.be/_dYhkWHfep0
>
> It took a long time to boot (about 7 minutes), but all details are
> there, I hope.  I can repeat the same steps for your for-pci2 branch,
> if you want me to.

Yes, please, especially with this patch:

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e3a7a7d4ea1b6a2be82ef454096aff7856c14de5

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19  5:06                                     ` Yinghai Lu
@ 2012-01-19 13:48                                       ` Rogério Brito
  2012-01-19 16:12                                         ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-19 13:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi, Yinghai.

On Thu, Jan 19, 2012 at 03:06, Yinghai Lu <yinghai@kernel.org> wrote:
> 2012/1/18 Rogério Brito <rbrito@ime.usp.br>:
>> Regarding these options, Yinghai, the options which I enabled once I
>> typed "make oldconfig" were:
>>
>> CONFIG_PCI_PRI
>> CONFIG_PCI_PASID
>
> should be not related.

Thanks. It would be good if, once all these problems are fixed, the
help text of those options were fleshed out a little bit for lusers
(e.g., me). Heck, I can even try to contribute some patches to
document some options, once I understand them.

>> I did just that and uploaded a video from Linus's git tree with the
>> options above to:
>>
>>    http://youtu.be/_dYhkWHfep0
>>
>> It took a long time to boot (about 7 minutes), but all details are
>> there, I hope.  I can repeat the same steps for your for-pci2 branch,
>> if you want me to.
>
> yes, please esp need patch.
>
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e3a7a7d4ea1b6a2be82ef454096aff7856c14de5

With CONFIG_BOOT_PRINTK_DELAY enabled and with "boot_delay=500" on the
command line, I checked out the for-pci2 branch of your tree and this
is what I get:

    http://youtu.be/bWEme7iFD7g

If you want me to, say, recompile Linus's vanilla tree with only a few
selected patches, please let me know. I will now try to test some
other combinations of the options passed to the kernel.


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 13:48                                       ` Rogério Brito
@ 2012-01-19 16:12                                         ` Yinghai Lu
  2012-01-19 16:15                                           ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-19 16:12 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
>> yes, please esp need patch.
>>
>> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=e3a7a7d4ea1b6a2be82ef454096aff7856c14de5
>
> With CONFIG_BOOT_PRINTK_DELAY enabled and with "boot_delay=500" on the
> command line, I checked out the for-pci2 branch of your tree and this
> is what I get:
>
>    http://youtu.be/bWEme7iFD7g
>
> If you want me to, say, recompile Linus's vanilla tree with only a few
> selected patches, please let me know. I will now try to test some
> other combinations of the options passed to the kernel.

Resource allocation for transparent bridge resizing works well, as expected.

It looks like the quirk for USB handoff is causing the problem.

Can you try another branch in my tree?

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
usb_smi_disable_early

It does the USB handoff very early.

You can do:

mkdir linux-2.6 || exit 1
cd linux-2.6

git init
# Add Linus's tree as a remote
git remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

# Add my tree as a remote
git remote add yinghai git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git

# Fetch from both remotes
git remote update

Later, you will only need to do:

git remote update

git checkout -b rb_2012_01_19a linus/master
git merge yinghai/usb_smi_disable_early


Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 16:12                                         ` Yinghai Lu
@ 2012-01-19 16:15                                           ` Rogério Brito
  2012-01-19 17:20                                             ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-19 16:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi, all.

On Thu, Jan 19, 2012 at 14:12, Yinghai Lu <yinghai@kernel.org> wrote:
> 2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
>> If you want me to, say, recompile Linus's vanilla tree with only a few
>> selected patches, please let me know. I will now try to test some
>> other combinations of the options passed to the kernel.
>
> resource allocating for transparent bridge resizing works well and as expected.

OK.

> looks like quirk for usb hand off cause problem.
>
> can you try another branch in my tree?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> usb_smi_disable_early

Sure, will do that right now.

> it will do usb hand off very early.
>
> you can do:
>
> mkdir linux-2.6 || exit -1
> cd linux-2.6
>
> git init-db
> # Add Linus's tree as a remote
> git remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>
> git remote add yinghai git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>
> git remote update
>
> later will only do:
>
> git remote update
>
> git checkout -b rb_2012_01_19a linus/master
> git merge yinghai/usb_smi_disable_early

Will report ASAP. Thanks.


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 16:15                                           ` Rogério Brito
@ 2012-01-19 17:20                                             ` Rogério Brito
  2012-01-19 19:48                                               ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-19 17:20 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi, all.

On Thu, Jan 19, 2012 at 14:15, Rogério Brito <rbrito@ime.usp.br> wrote:
> On Thu, Jan 19, 2012 at 14:12, Yinghai Lu <yinghai@kernel.org> wrote:
>> looks like quirk for usb hand off cause problem.
>>
>> can you try another branch in my tree?
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> usb_smi_disable_early

OK, now it seems that we are getting farther. The system boots, even
mounts the root filesystem, udev starts and, then, when it seems that
some USB messages are going to appear on the screen, the notebook
crashes with the same distortion that happened before (some of the
last messages to appear are related to OHCI).

If needed, I can record what I see and upload it to YouTube. Do you
want me to?

So, even though this is only partial progress, I am happier.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 17:20                                             ` Rogério Brito
@ 2012-01-19 19:48                                               ` Yinghai Lu
  2012-01-21  9:26                                                 ` Ram Pai
  2012-01-24 22:18                                                 ` Rogério Brito
  0 siblings, 2 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-19 19:48 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
>
> OK, now it seems that we are getting farther. The system boots, even
> mounts the root filesystem, udev starts and, then, when it seems that
> some USB messages are going to appear on the screen, the notebook
> crashes with the same distortion that happened before (with some of
> the last messages appearing being related to OHCI).

In that case, can you try disabling OHCI in your .config?

CONFIG_USB_OHCI_HCD

OHCI controller 00:0b.0 is using
Memory at b0004000 (32-bit, non-prefetchable) [size=4K]

VGA controller 00:05.0
	Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at b1000000 (64-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at 80000000 [disabled] [size=128K]

These seem unrelated. Or could ioremap etc. have a problem?

So can you try to boot with nopat?
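
As a sanity check on the ranges above, here is a minimal sketch (POSIX shell arithmetic; regions_overlap is a hypothetical helper name) that tests whether the OHCI BAR intersects any of the three VGA memory BARs. The sizes are written out in hex: 4K = 0x1000, 16M = 0x1000000, 256M = 0x10000000.

```sh
# Succeed if [start1, start1+size1) and [start2, start2+size2) intersect.
# Arguments: start1 size1 start2 size2 (hex constants are accepted).
regions_overlap() {
    [ $(( $1 < $3 + $4 && $3 < $1 + $2 )) -eq 1 ]
}

ohci_start=0xb0004000
ohci_size=0x1000

# Each entry is "start size" for one VGA memory BAR from the listing.
for bar in "0xb2000000 0x1000000" "0xc0000000 0x10000000" "0xb1000000 0x1000000"; do
    set -- $bar    # split into $1 = start, $2 = size
    if regions_overlap "$ohci_start" "$ohci_size" "$1" "$2"; then
        echo "OHCI BAR overlaps VGA BAR at $1"
    else
        echo "no overlap with VGA BAR at $1"
    fi
done
```

With the values from the listing it reports no overlap for all three BARs, so a plain address collision between the two devices does not look like the cause.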

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 19:48                                               ` Yinghai Lu
@ 2012-01-21  9:26                                                 ` Ram Pai
  2012-01-21 10:35                                                   ` Yinghai Lu
  2012-01-24 22:18                                                 ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Ram Pai @ 2012-01-21  9:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rogério Brito, Linus Torvalds, Jesse Barnes,
	Ivan Kokshaysky, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Thu, Jan 19, 2012 at 11:48:29AM -0800, Yinghai Lu wrote:
> 2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
> >
> > OK, now it seems that we are getting farther. The system boots, even
> > mounts the root filesystem, udev starts and, then, when it seems that
> > some USB messages are going to appear on the screen, the notebook
> > crashes with the same distortion that happened before (with some of
> > the last messages appearing being related to OHCI).
> 
> in that case, Can you try to disable OHCI in .config
> 
> CONFIG_USB_OHCI_HCD
> 
> OHCI controller 00:0b.0 is using
> Memory at b0004000 (32-bit, non-prefetchable) [size=4K]
> 
> VGA controller 00:05.0
> 	Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
> 	Memory at c0000000 (64-bit, prefetchable) [size=256M]
> 	Memory at b1000000 (64-bit, non-prefetchable) [size=16M]
> 	[virtual] Expansion ROM at 80000000 [disabled] [size=128K]
> 
> it seems not related. or could be ioremap etc have problem?
> 
> So can you try to boot with nopat?

Yinghai/Bjorn,

	After careful examination of the allocations done by the vanilla upstream
kernel (the one without the 'skip transparent bridge while sizing' code),
I find that the allocation to one of the BARs, BAR[16], of the CardBus bridge at 05:07.0
does not overlap that of the parent transparent bridge at 00:10.0.

	Here is what I see in the video at  
	http://www.youtube.com/watch?v=_dYhkWHfep0&feature=youtu.be

[268.002554] pci 0000:05:07.0: Cardbus bridge to  [bus 06-09]
[269.002582] pci 0000:05:07.0:  bridge window [io  0x1400-0x14ff]
[270.009090] pci 0000:05:07.0:  bridge window [io  0x1000-0x10ff]
[271.012358] pci 0000:05:07.0:  bridge window [mem  0x84000000-0x87ffffff pref]
[272.015626] pci 0000:05:07.0:  bridge window [mem  0x88000000-0x8bffffff]
		                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ offending allocation

[273.018894] pci 0000:00:10.0: PCI bridge to [bus  05-06]
[274.022162] pci 0000:00:10.0:  bridge window [io  0x1000-0x1fff]
[275.025430] pci 0000:00:10.0:  bridge window [mem  0xb3200000-0xb32fffff]
[276.028698] pci 0000:00:10.0:  bridge window [mem  0xb8400000-0x87ffffff pref]


To me, the issue seems to be that, for some reason, the OS either fails to or does not try to
reallocate the addresses assigned to the non-pref mem BAR of the transparent bridge
at 00:10.0.

In any case, since you have deeper knowledge in this area, I will let you verify my theory.
RP
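
The comparison above can be sketched as a simple containment test. This is only an illustration in POSIX shell, with window_inside as a made-up helper name; whether containment is actually required behind a transparent (subtractive-decode) bridge is a separate question that this does not settle.

```sh
# Succeed if the child window [c_start, c_end] lies entirely inside
# the parent window [p_start, p_end].
# Arguments: c_start c_end p_start p_end (hex constants are accepted).
window_inside() {
    [ $(( $1 >= $3 && $2 <= $4 )) -eq 1 ]
}

# I/O window: CardBus bridge 0x1400-0x14ff vs. parent 0x1000-0x1fff
if window_inside 0x1400 0x14ff 0x1000 0x1fff; then
    echo "io window: inside parent"
else
    echo "io window: outside parent"
fi

# Non-pref memory window: CardBus bridge 0x88000000-0x8bffffff
# vs. parent 0xb3200000-0xb32fffff
if window_inside 0x88000000 0x8bffffff 0xb3200000 0xb32fffff; then
    echo "mem window: inside parent"
else
    echo "mem window: outside parent"
fi
```

For the values transcribed from the video, the I/O window falls inside the parent's window and the non-prefetchable memory window falls outside it.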

> 
> Thanks
> 
> Yinghai


* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-21  9:26                                                 ` Ram Pai
@ 2012-01-21 10:35                                                   ` Yinghai Lu
  0 siblings, 0 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-21 10:35 UTC (permalink / raw)
  To: Ram Pai
  Cc: Rogério Brito, Linus Torvalds, Jesse Barnes,
	Ivan Kokshaysky, Edward Donovan, Thomas Gleixner, Bjorn Helgaas,
	linux-kernel, Márcia Coutinho de Brito, rui.zhang

2012/1/21 Ram Pai <linuxram@us.ibm.com>:
> On Thu, Jan 19, 2012 at 11:48:29AM -0800, Yinghai Lu wrote:
>> 2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
>> >
>> > OK, now it seems that we are getting farther. The system boots, even
>> > mounts the root filesystem, udev starts and, then, when it seems that
>> > some USB messages are going to appear on the screen, the notebook
>> > crashes with the same distortion that happened before (with some of
>> > the last messages appearing being related to OHCI).
>>
>> in that case, Can you try to disable OHCI in .config
>>
>> CONFIG_USB_OHCI_HCD
>>
>> OHCI controller 00:0b.0 is using
>> Memory at b0004000 (32-bit, non-prefetchable) [size=4K]
>>
>> VGA controller 00:05.0
>>       Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
>>       Memory at c0000000 (64-bit, prefetchable) [size=256M]
>>       Memory at b1000000 (64-bit, non-prefetchable) [size=16M]
>>       [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
>>
>> it seems not related. or could be ioremap etc have problem?
>>
>> So can you try to boot with nopat?
>
> Yinghai/Bjorn,
>
>        After carefully examination of the allocations done by the vanilla upstream
> kernel; the one without the 'skip transparent bridge while sizing' code,
> I find that the allocation to one of the Bar; BAR[16], of the CardBus Bridge at 05:07.0
> does not overlap that of the parent transparent bridge at 00:10.0
>
>        Here is what I see in the video at
>        http://www.youtube.com/watch?v=_dYhkWHfep0&feature=youtu.be
>
> [268.002554] pci 0000:05:07.0: Cardbus bridge to  [bus 06-09]
> [269.002582] pci 0000:05:07.0:  bridge window [io  0x1400-0x14ff]
> [270.009090] pci 0000:05:07.0:  bridge window [io  0x1000-0x10ff]
> [271.012358] pci 0000:05:07.0:  bridge window [mem  0x84000000-0x87ffffff pref]
> [272.015626] pci 0000:05:07.0:  bridge window [mem  0x88000000-0x8bffffff]
>                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ offending allocation
>
> [273.018894] pci 0000:00:10.0: PCI bridge to [bus  05-06]
> [274.022162] pci 0000:00:10.0:  bridge window [io  0x1000-0x1fff]
> [275.025430] pci 0000:00:10.0:  bridge window [mem  0xb3200000-0xb32fffff]
> [276.028698] pci 0000:00:10.0:  bridge window [mem  0xb8400000-0x87ffffff pref]
>
>
> To me, the issue seems to be that due to some reason the OS either fails to Or does not try to
> reallocate the addresses assigned to the the non-pref mem BAR of the  the transparent bridge
> at 00:10.0.

The allocation looks right, even with the transparent bus.

I assume that skipping the resizing of the transparent bridge just makes
some allocations different, which then happens not to trigger the
overwriting of the VGA controller.

Anyway, let's wait for the test without the OHCI driver.

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-19 19:48                                               ` Yinghai Lu
  2012-01-21  9:26                                                 ` Ram Pai
@ 2012-01-24 22:18                                                 ` Rogério Brito
  2012-01-24 22:53                                                   ` Bjorn Helgaas
  2012-01-25  0:19                                                   ` Yinghai Lu
  1 sibling, 2 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-24 22:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Dear Yinghai, Bjorn, Ram, Linus and other people,

On Thu, Jan 19, 2012 at 17:48, Yinghai Lu <yinghai@kernel.org> wrote:
> 2012/1/19 Rogério Brito <rbrito@ime.usp.br>:
>>
>> OK, now it seems that we are getting farther. The system boots, even
>> mounts the root filesystem, udev starts and, then, when it seems that
>> some USB messages are going to appear on the screen, the notebook
>> crashes with the same distortion that happened before (with some of
>> the last messages appearing being related to OHCI).
>
> in that case, Can you try to disable OHCI in .config
>
> CONFIG_USB_OHCI_HCD
>
> OHCI controller 00:0b.0 is using
> Memory at b0004000 (32-bit, non-prefetchable) [size=4K]
>
> VGA controller 00:05.0
>        Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
>        Memory at c0000000 (64-bit, prefetchable) [size=256M]
>        Memory at b1000000 (64-bit, non-prefetchable) [size=16M]
>        [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
>
> It seems unrelated. Or could ioremap etc. have a problem?
>
> So can you try to boot with nopat?
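
The observation that the OHCI window is unrelated to the VGA windows can be checked with plain range arithmetic. The following is a minimal userspace sketch in C, not kernel code: `struct bar`, `bars_overlap()` and `ohci_overlaps_vga()` are names made up for illustration, and the addresses and sizes are the ones from the listing quoted above (the disabled expansion ROM is left out).

```c
#include <stdbool.h>
#include <stdint.h>

/* A memory BAR as listed above: base address plus size in bytes. */
struct bar {
	uint64_t base;
	uint64_t size;
};

/*
 * Two half-open ranges [base, base + size) overlap if and only if
 * each one starts before the other one ends.
 */
bool bars_overlap(struct bar a, struct bar b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Values copied from the quoted listing; the names are illustrative. */
const struct bar ohci_bar = { 0xb0004000ULL, 4ULL << 10 };
const struct bar vga_bars[] = {
	{ 0xb2000000ULL, 16ULL << 20 },
	{ 0xc0000000ULL, 256ULL << 20 },
	{ 0xb1000000ULL, 16ULL << 20 },
};

/* Returns true if the OHCI BAR collides with any of the VGA BARs. */
bool ohci_overlaps_vga(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(vga_bars) / sizeof(vga_bars[0]); i++)
		if (bars_overlap(ohci_bar, vga_bars[i]))
			return true;
	return false;
}
```

With these values `ohci_overlaps_vga()` returns false, so the decoded windows themselves do not collide. That only covers the address ranges as listed; it says nothing about what the hardware or SMM does with them.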

I have experimented with Linux 3.3.0-rc1-00080-g16111ea, which is a merge of:

* yinghai's usb_smi_disable_early branch with HEAD = a3a6c096
* linus's master branch with HEAD = 4a7cbb56

I booted the kernel above, configured with OHCI disabled as Yinghai
asked, but with no other changes from the previous runs.  I recorded
two boots on video, with only a slight change in the options passed to
the kernel:

1 - http://youtu.be/fKYubKaNhuc, booted with options
    `acpi=off pnpbios=off noapic debug ignore_loglevel pci=earlydump boot_delay=10`

2 - http://youtu.be/Gll9RLfXS_c, booted with options
    `acpi=off pnpbios=off noapic debug ignore_loglevel pci=earlydump nopat boot_delay=10`

The only change is the `nopat` option, as suggested by Yinghai.

As you can see in the two videos, some things seem to have progressed:
the kernel actually gets to boot, gets to mount/load the initramfs,
passes things to init and, when it is loading things, I get a hang
with a messed screen, as before.

Among other things, one thing caught my attention: some messages like

    ACPI Exception: AE_BAD_PARAMETER, Thread 9340 could not acquire Mutex [0x1] (20120111/utmutex-276)

which seems strange, particularly as I passed the option `acpi=off` to
my kernel.

I don't know if that's a red herring or not. I can perform other
tests, of course.


Thanks as usual,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-24 22:18                                                 ` Rogério Brito
@ 2012-01-24 22:53                                                   ` Bjorn Helgaas
  2012-01-25  0:19                                                   ` Yinghai Lu
  1 sibling, 0 replies; 53+ messages in thread
From: Bjorn Helgaas @ 2012-01-24 22:53 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Tue, Jan 24, 2012 at 3:18 PM, Rogério Brito <rbrito@ime.usp.br> wrote:
> The only change is the `nopat` option, as suggested by Yinghai.

I like this idea.  Last August, when we first looked at the hang when
enabling ACPI, Len Brown suggested that the outb() that does the
transition triggers SMM, and the hang is likely in SMM (which is
basically invisible and undebuggable from Linux).  And he said "we've
had problems in the past with SMM relying on something that Windows
just happened to do -- such as different MTRRs."

I couldn't see any MTRR info in your video (probably goes by too fast
as it's very early).  You might be able to get that info by adding
another call to print_mtrr_state() later in boot.  Then it would be
interesting to try to get the corresponding info from Windows.
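
If the machine stays up long enough with `acpi=off` to reach a shell, the same information may also be readable from `/proc/mtrr` without patching the kernel. The sketch below parses one line of that file in userspace C. It assumes the usual `regNN: base=... (...MB), size=...MB, count=N: type` layout; `struct mtrr_entry` and `parse_mtrr_line()` are hypothetical names, and the sample line in the usage note is made up, not taken from this notebook.

```c
#include <stdio.h>

/* One decoded /proc/mtrr line (illustrative structure, not a kernel type). */
struct mtrr_entry {
	unsigned int reg;         /* register number, e.g. 0 for reg00 */
	unsigned long long base;  /* base address in bytes */
	unsigned long size;       /* size, in the unit given by 'unit' */
	char unit;                /* 'K' or 'M' */
	unsigned int count;       /* usage count */
	char type[32];            /* e.g. "write-back", "uncachable" */
};

/*
 * Parse a single line, assuming the layout described above.
 * Returns 1 if all fields were found, 0 otherwise.
 */
int parse_mtrr_line(const char *line, struct mtrr_entry *e)
{
	unsigned long base_mb; /* the "(... MB)" field; redundant with base */
	int n;

	n = sscanf(line,
		   "reg%u: base=0x%llx (%luMB), size=%lu%cB, count=%u: %31[^\n]",
		   &e->reg, &e->base, &base_mb, &e->size, &e->unit,
		   &e->count, e->type);
	return n == 7;
}
```

For a made-up line such as `reg01: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-combining`, the function fills in register 1, base 0xc0000000, size 256 and unit `M`. Comparing such entries with what Windows programs would be the interesting part.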

>    ACPI Exception: AE_BAD_PARAMETER, Thread 9340 could not acquire
> Mutex [0x1] (20120111/utmutex-276)
>
> which seems strange, particularly as I passed the option `acpi=off` to
> my kernel.

Definitely odd.  I don't think that should happen.  I see a couple
places where we might be able to get there even with ACPI disabled,
e.g., eeepc_wmi_check_atkd() -> acpi_get_devices() ->
acpi_ut_acquire_mutex().  You could find out by adding a dump_stack()
where that error is printed.  It'd be nice to fix it, but I don't
think it's related to the real problems on your box.

Bjorn

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-24 22:18                                                 ` Rogério Brito
  2012-01-24 22:53                                                   ` Bjorn Helgaas
@ 2012-01-25  0:19                                                   ` Yinghai Lu
  2012-01-25  0:34                                                     ` Linus Torvalds
  2012-01-25  0:57                                                     ` Yinghai Lu
  1 sibling, 2 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-25  0:19 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]

2012/1/24 Rogério Brito <rbrito@ime.usp.br>:
>> it seems not related. or could be ioremap etc have problem?
>>
>> So can you try to boot with nopat?
>
> I have experimented with Linux 3.3.0-rc1-00080-g16111ea, which is a merge of:
>
> * yinghai's usb_smi_disable_early branch with HEAD = a3a6c096
> * linus's master branch with HEAD = 4a7cbb56
>
> I have booted the kernel as above, configured, as Yinghai asked, with
> OHCI disabled, but no other changes from the previous times.  I
> registered in video two boots with only a slight change in options
> passed to the kernel, with:
>
> 1 - http://youtu.be/fKYubKaNhuc being with options
>    `acpi=off pnpbios=off noapic debug ignore_loglevel pci=earlydump
> boot_delay=10`
>
> 2 - http://youtu.be/Gll9RLfXS_c being with options
>    `acpi=off pnpbios=off noapic debug ignore_loglevel pci=earlydump
> nopat boot_delay=10`
>
> The only change is the `nopat` option, as suggested by Yinghai.
>
> As you can see in the two videos, some things seem to have progressed:
> the kernel actually gets to boot, gets to mount/load the initramfs,
> passes things to init and, when it is loading things, I get a hang
> with a messed screen, as before.

Your bios does not have RAM page aligned...

[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
[    0.000000]  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)

I assume that the partial page is used by SMI. Let's leave the whole page alone for SMI.

Can you please apply the attached patch for e820 RAM page alignment?
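
To make the arithmetic concrete, here is a small userspace sketch of the trimming idea: round the start of a RAM range up and its end down to page boundaries. It is not the patch itself. `SKETCH_PAGE_SIZE`, `struct range` and the helper names are made up for illustration, and 4 KiB pages are assumed.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Half-open address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round down to a multiple of the page size (a power of two). */
uint64_t page_round_down(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round up to a multiple of the page size. */
uint64_t page_round_up(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Shrink a RAM range inward so that it covers whole pages only.
 * The result may be empty (start >= end) for very small ranges.
 */
struct range ram_whole_pages(struct range r)
{
	struct range out;

	out.start = page_round_up(r.start);
	out.end = page_round_down(r.end);
	return out;
}
```

Applied to the first entry of the map above, [0, 0x9dc00), this yields [0, 0x9d000). The remaining 0xc00 bytes from 0x9d000 to 0x9dc00 are the partial page that would be marked reserved.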

Thanks

Yinghai

[-- Attachment #2: e820_ram_aligned.patch --]
[-- Type: text/x-patch, Size: 2030 bytes --]

Subject: [PATCH] x86: Align e820 ram range to page

Some BIOS-provided e820 RAM entries are not page aligned.

To make later processing simpler, we can trim those ranges to
page boundaries.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/e820.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -880,6 +880,47 @@ static int __init parse_memmap_opt(char
 }
 early_param("memmap", parse_memmap_opt);
 
+static void __init e820_align_ram_page(void)
+{
+	int i;
+	bool changed = false;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *entry = &e820.map[i];
+		u64 start, end;
+		u64 start_aligned, end_aligned;
+
+		if (entry->type != E820_RAM)
+			continue;
+
+		start = entry->addr;
+		end = start + entry->size;
+
+		start_aligned = round_up(start, PAGE_SIZE);
+		end_aligned = round_down(end, PAGE_SIZE);
+
+		if (end_aligned <= start_aligned) {
+			e820_update_range(start, end - start, E820_RAM, E820_RESERVED);
+			changed = true;
+			continue;
+		}
+		if (start < start_aligned) {
+			e820_update_range(start, start_aligned - start, E820_RAM, E820_RESERVED);
+			changed = true;
+		}
+		if (end_aligned < end) {
+			e820_update_range(end_aligned, end - end_aligned, E820_RAM, E820_RESERVED);
+			changed = true;
+		}
+	}
+
+	if (changed) {
+		sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+		printk(KERN_INFO "aligned physical RAM map:\n");
+		e820_print_map("aligned");
+	}
+}
+
 void __init finish_e820_parsing(void)
 {
 	if (userdef) {
@@ -892,6 +933,9 @@ void __init finish_e820_parsing(void)
 		printk(KERN_INFO "user-defined physical RAM map:\n");
 		e820_print_map("user");
 	}
+
+	/* In case we have RAM entries that are not page aligned */
+	e820_align_ram_page();
 }
 
 static inline const char *e820_type_to_string(int e820_type)

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  0:19                                                   ` Yinghai Lu
@ 2012-01-25  0:34                                                     ` Linus Torvalds
  2012-01-25  0:57                                                     ` Yinghai Lu
  1 sibling, 0 replies; 53+ messages in thread
From: Linus Torvalds @ 2012-01-25  0:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rogério Brito, Jesse Barnes, Ivan Kokshaysky,
	Edward Donovan, Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/24 Yinghai Lu <yinghai@kernel.org>:
>
> Your bios does not have RAM page aligned...

That's pretty common. BIOS tables etc are seldom page-aligned, and
afaik we handle it all correctly. Look for PFN_UP(start) and
PFN_DOWN(start+length) patterns in both e820.c and memblock.c.

Sure, it is _possible_ that we have some place we screw up in, but I
think it's unlikely. As mentioned, non-page-aligned memory is quite
common, especially in the low BIOS region just under the 640kB mark.
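
The `PFN_UP(start)` / `PFN_DOWN(start+length)` pattern rounds the start of a range up and its end down to page-frame numbers, so a partial page at either edge is never handed out as RAM. A userspace sketch of the same arithmetic follows. `pfn_up()`, `pfn_down()`, `usable_pages()` and `SKETCH_PAGE_SHIFT` are illustrative stand-ins rather than the kernel's definitions, and 4 KiB pages are assumed.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* First page frame that starts at or after addr. */
uint64_t pfn_up(uint64_t addr)
{
	return (addr + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
}

/* Page frame that contains addr (equivalently, frames ending at or before it). */
uint64_t pfn_down(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Number of whole page frames that fit inside [start, start + length). */
uint64_t usable_pages(uint64_t start, uint64_t length)
{
	uint64_t first = pfn_up(start);
	uint64_t last = pfn_down(start + length);

	return last > first ? last - first : 0;
}
```

For the e820 entry quoted earlier in the thread, `usable_pages(0, 0x9dc00)` gives 157 whole pages (0x9d), so the partial page starting at 0x9d000 is simply not used.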

                     Linus

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  0:19                                                   ` Yinghai Lu
  2012-01-25  0:34                                                     ` Linus Torvalds
@ 2012-01-25  0:57                                                     ` Yinghai Lu
  2012-01-25  1:55                                                       ` Rogério Brito
  1 sibling, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-25  0:57 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/24 Yinghai Lu <yinghai@kernel.org>:
> 2012/1/24 Rogério Brito <rbrito@ime.usp.br>:
>>> it seems not related. or could be ioremap etc have problem?
>>>
>>> So can you try to boot with nopat?
>>
>> I have experimented with Linux 3.3.0-rc1-00080-g16111ea, which is a merge of:
>>
>> * yinghai's usb_smi_disable_early branch with HEAD = a3a6c096
>> * linus's master branch with HEAD = 4a7cbb56
>>
>> I have booted the kernel as above, configured, as Yinghai asked, with
>> OHCI disabled, but no other changes from the previous times.  I
>> registered in video two boots with only a slight change in options
>> passed to the kernel, with:

You really need the disable_cardbus_mem1_pref.patch that I sent out before.

Also, please try to boot with
pci=nobios

It will disable PCI BIOS32 probing.

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  0:57                                                     ` Yinghai Lu
@ 2012-01-25  1:55                                                       ` Rogério Brito
  2012-01-25  2:33                                                         ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-25  1:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi, people.

On Tue, Jan 24, 2012 at 22:57, Yinghai Lu <yinghai@kernel.org> wrote:
> 2012/1/24 Yinghai Lu <yinghai@kernel.org>:
>> 2012/1/24 Rogério Brito <rbrito@ime.usp.br>:
>>>> it seems not related. or could be ioremap etc have problem?
>>>>
>>>> So can you try to boot with nopat?
>>>
>>> I have experimented with Linux 3.3.0-rc1-00080-g16111ea, which is a merge of:
>>>
>>> * yinghai's usb_smi_disable_early branch with HEAD = a3a6c096
>>> * linus's master branch with HEAD = 4a7cbb56
>>>
>>> I have booted the kernel as above, configured, as Yinghai asked, with
>>> OHCI disabled, but no other changes from the previous times.  I
>>> registered in video two boots with only a slight change in options
>>> passed to the kernel, with:
>
> you really need disable_cardbus_mem1_pref.patch that i sent out before.

I thought that it was already in your git branch, but I applied it this time.

> also please try to boot with
> pci=nobios
>
> it will disable pci bios32 probing.

Did that (with both the disable_cardbus... and with e820_ram_aligned
patches applied), but I still got the same results. This was booting
with nopat and pci=nobios.

The next thing that I'm doing is recording a video of how the computer
boots when that transparent sizing bridge patch is reverted, so that
you can have that as a reference. Then, I will rip apart the ACPI
stuff from the configuration of the kernel to see if it is ACPI that
is getting in the way of something here.

I'm open to other suggestions.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  1:55                                                       ` Rogério Brito
@ 2012-01-25  2:33                                                         ` Rogério Brito
  2012-01-25  2:39                                                           ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-25  2:33 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi there, Yinghai, Linus, Bjorn and others.

On Tue, Jan 24, 2012 at 23:55, Rogério Brito <rbrito@ime.usp.br> wrote:
> The next thing that I'm doing is recording a video of how the computer
> boots when that transparent sizing bridge patch is reverted, so that
> you can have that as a reference. Then, I will rip apart the ACPI
> stuff from the configuration of the kernel to see if it is ACPI that
> is getting in the way of something here.

I started ripping things apart, first with the whole ACPI subsystem,
then with PM, with sound and I noticed that things always hung with
the nvidia watchdog/TCO module. So, I disabled it and the boot doesn't
hang there anymore. In fact, it goes further, and, for the first time
since I got my mother's notebook, the nouveau module was loaded.

Unfortunately, it died with a kernel panic and didn't reach the point
of X firing up.

So, there is something fishy indeed with these things.

Still open to suggestions. I will now compile a vanilla kernel from
Linus's tree gradually putting things in, but from what I already
talked with Bjorn, I think that ACPI will be a barrier (and it is sad
that so many things *don't* work with ACPI disabled).


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  2:33                                                         ` Rogério Brito
@ 2012-01-25  2:39                                                           ` Rogério Brito
  2012-01-25 23:58                                                             ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-25  2:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Wed, Jan 25, 2012 at 00:33, Rogério Brito <rbrito@ime.usp.br> wrote:
> I started ripping things apart, first with the whole ACPI subsystem,
> then with PM, with sound and I noticed that things always hung with
> the nvidia watchdog/TCO module. So, I disabled it and the boot doesn't
> hang there anymore. In fact, it goes further, and, for the first time
> since I got my mother's notebook, the nouveau module was loaded.
>
> Unfortunately, it died with a kernel panic and didn't reach the point
> of X firing up.

Update: X *does* start and the panic occurs when GNOME is loading. I
will try next to run a kernel with the nvidia tco thing blacklisted
*but* with the transparent bridging resizing thing on to see if the
notebook is at least usable (as I suspect that nouveau may be having a
hard time with the current status of the PCI rework).


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25  2:39                                                           ` Rogério Brito
@ 2012-01-25 23:58                                                             ` Rogério Brito
  2012-01-26  2:03                                                               ` Yinghai Lu
  0 siblings, 1 reply; 53+ messages in thread
From: Rogério Brito @ 2012-01-25 23:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Another update:

On Wed, Jan 25, 2012 at 00:39, Rogério Brito <rbrito@ime.usp.br> wrote:
> On Wed, Jan 25, 2012 at 00:33, Rogério Brito <rbrito@ime.usp.br> wrote:
>> Unfortunately, it died with a kernel panic and didn't reach the point
>> of X firing up.
>
> Update: X *does* start and the panic occurs when GNOME is loading. I
> will try next to run a kernel with the nvidia tco thing blacklisted
> *but* with the transparent bridging resizing thing on to see if the
> notebook is at least usable (as I suspect that nouveau may be having a
> hard time with the current status of the PCI rework).

The kernel panic does indeed happen when I compile nouveau in, some
time *after* X is loaded.

Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
EDID information and, as ACPI is broken on this notebook, could that
be a potential reason for the panic that I'm seeing? (Yes, I can take
a photo of the screen).

I'm simply curious about what knowledgeable people would think.


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-25 23:58                                                             ` Rogério Brito
@ 2012-01-26  2:03                                                               ` Yinghai Lu
  2012-01-26 11:16                                                                 ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Yinghai Lu @ 2012-01-26  2:03 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/25 Rogério Brito <rbrito@ime.usp.br>:
>
> Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
> EDID information and, as ACPI is broken on this notebook, could that
> be a potential reason for the panic that I'm seeing? (Yes, I can take
> a photo of the screen).
>

Which 2.6.x kernel works on your laptop without disabling acpi?

It looks like we should first make Linux work on your system without
disabling ACPI.

BTW, can you post your .config?

Thanks

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-26  2:03                                                               ` Yinghai Lu
@ 2012-01-26 11:16                                                                 ` Rogério Brito
  2012-01-27  3:41                                                                   ` Bjorn Helgaas
  2012-01-27  6:53                                                                   ` Yinghai Lu
  0 siblings, 2 replies; 53+ messages in thread
From: Rogério Brito @ 2012-01-26 11:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Thu, Jan 26, 2012 at 00:03, Yinghai Lu <yinghai@kernel.org> wrote:
> 2012/1/25 Rogério Brito <rbrito@ime.usp.br>:
>>
>> Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
>> EDID information and, as ACPI is broken on this notebook, could that
>> be a potential reason for the panic that I'm seeing? (Yes, I can take
>> a photo of the screen).
>
> Which 2.6.x kernel works on your laptop without disabling acpi?

Unfortunately, no 2.6 kernel has worked on this laptop without
acpi=off. I also once booted FreeBSD here just to see how it would
behave and it also hung with ACPI enabled.

From what I see on this notebook, if ACPI is not turned off, the
computer simply hangs once the ACPI is started (it gives one layman
like me the impression of entering an infinite loop). But, again, this
is just a layman observation of what the system appears to be doing.

> looks like we should make linux without disabling acpi work on your
> system at first.

I'm OK with whatever you people want me to do to get things right.

> BTW, can you post you .config?

Sure, it is here:

    http://www.ime.usp.br/~rbrito/linux/clevo/linux-configs/config-3.2.0-rc5-12270-g6fe13a6-yinghai

If you want me to change anything, please let me know and I'll try to
do my best.


Regards,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-26 11:16                                                                 ` Rogério Brito
@ 2012-01-27  3:41                                                                   ` Bjorn Helgaas
  2012-02-06 23:11                                                                     ` Bjorn Helgaas
  2012-01-27  6:53                                                                   ` Yinghai Lu
  1 sibling, 1 reply; 53+ messages in thread
From: Bjorn Helgaas @ 2012-01-27  3:41 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Thu, Jan 26, 2012 at 4:16 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
> On Thu, Jan 26, 2012 at 00:03, Yinghai Lu <yinghai@kernel.org> wrote:
>> 2012/1/25 Rogério Brito <rbrito@ime.usp.br>:
>>>
>>> Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
>>> EDID information and, as ACPI is broken on this notebook, could that
>>> be a potential reason for the panic that I'm seeing? (Yes, I can take
>>> a photo of the screen).
>>
>> Which 2.6.x kernel works on your laptop without disabling acpi?
>
> Unfortunately, no 2.6 kernel has worked on this laptop without
> acpi=off. I also once booted FreeBSD here just to see how it would
> behave and it also hang with ACPI enabled.
>
> From what I see on this notebook, if ACPI is not turned off, the
> computer simply hangs once the ACPI is started (it gives one layman
> like me the impression of entering an infinite loop). But, again, this
> is just a layman observation of what the system appears to be doing.

My guess is that most of these problems are related, so if we fixed
whatever causes the ACPI hang, the other problems would probably go
away, too.

Did you figure out anything about the MTRRs?

Bjorn

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-26 11:16                                                                 ` Rogério Brito
  2012-01-27  3:41                                                                   ` Bjorn Helgaas
@ 2012-01-27  6:53                                                                   ` Yinghai Lu
  1 sibling, 0 replies; 53+ messages in thread
From: Yinghai Lu @ 2012-01-27  6:53 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Linus Torvalds, Jesse Barnes, Ivan Kokshaysky, Edward Donovan,
	Thomas Gleixner, Bjorn Helgaas, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

2012/1/26 Rogério Brito <rbrito@ime.usp.br>:
>
>> BTW, can you post you .config?
>
> Sure, it is here:
>
>    http://www.ime.usp.br/~rbrito/linux/clevo/linux-configs/config-3.2.0-rc5-12270-g6fe13a6-yinghai

Are you sure?

Your dmesg is 32-bit:

http://www.ime.usp.br/~rbrito/linux/clevo/2012-01-13-0951-yanghai-for-pci2-branch/dmesg-3.2.0-rc5+.txt

and this config is 64-bit.
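
One quick way to tell which architecture a given config was built for is to look for a `CONFIG_X86_64=y` or `CONFIG_X86_32=y` line in it. The sketch below does that on an in-memory copy of the file. `config_enables()` is a hypothetical helper written for this illustration, and reading the file from disk is left out.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the config text contains a line that is exactly
 * "<symbol>=y". Commented-out lines such as "# CONFIG_FOO is not set"
 * do not match because they do not start with the symbol name.
 */
bool config_enables(const char *config, const char *symbol)
{
	size_t len = strlen(symbol);
	const char *line = config;

	while (line && *line) {
		if (strncmp(line, symbol, len) == 0 &&
		    strncmp(line + len, "=y", 2) == 0 &&
		    (line[len + 2] == '\n' || line[len + 2] == '\0'))
			return true;

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

A 64-bit config would make `config_enables(text, "CONFIG_X86_64")` return true, while a 32-bit one would match `CONFIG_X86_32` instead.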

Yinghai

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-01-27  3:41                                                                   ` Bjorn Helgaas
@ 2012-02-06 23:11                                                                     ` Bjorn Helgaas
  2012-02-07  0:21                                                                       ` Rogério Brito
  0 siblings, 1 reply; 53+ messages in thread
From: Bjorn Helgaas @ 2012-02-06 23:11 UTC (permalink / raw)
  To: Rogério Brito
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

On Thu, Jan 26, 2012 at 7:41 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Jan 26, 2012 at 4:16 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
>> On Thu, Jan 26, 2012 at 00:03, Yinghai Lu <yinghai@kernel.org> wrote:
>>> 2012/1/25 Rogério Brito <rbrito@ime.usp.br>:
>>>>
>>>> Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
>>>> EDID information and, as ACPI is broken on this notebook, could that
>>>> be a potential reason for the panic that I'm seeing? (Yes, I can take
>>>> a photo of the screen).
>>>
>>> Which 2.6.x kernel works on your laptop without disabling acpi?
>>
>> Unfortunately, no 2.6 kernel has worked on this laptop without
>> acpi=off. I also once booted FreeBSD here just to see how it would
>> behave and it also hang with ACPI enabled.
>>
>> From what I see on this notebook, if ACPI is not turned off, the
>> computer simply hangs once the ACPI is started (it gives one layman
>> like me the impression of entering an infinite loop). But, again, this
>> is just a layman observation of what the system appears to be doing.
>
> My guess is that most of these problems are related, so if we fixed
> whatever causes the ACPI hang, the other problems would probably go
> away, too.
>
> Did you figure out anything about the MTRRs?

Ping :)  Let me know if you'd like a patch to print out the MTRR info
later in boot where it will be slow enough to capture in a video.

 Bjorn

* Re: [Bug 41722] Clevo M5X0JE hangs in ACPI init
  2012-02-06 23:11                                                                     ` Bjorn Helgaas
@ 2012-02-07  0:21                                                                       ` Rogério Brito
  0 siblings, 0 replies; 53+ messages in thread
From: Rogério Brito @ 2012-02-07  0:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yinghai Lu, Linus Torvalds, Jesse Barnes, Ivan Kokshaysky,
	Edward Donovan, Thomas Gleixner, linux-kernel,
	Márcia Coutinho de Brito, Ram Pai, rui.zhang

Hi there.

On Feb 06 2012, Bjorn Helgaas wrote:
> On Thu, Jan 26, 2012 at 7:41 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Thu, Jan 26, 2012 at 4:16 AM, Rogério Brito <rbrito@ime.usp.br> wrote:
> >> On Thu, Jan 26, 2012 at 00:03, Yinghai Lu <yinghai@kernel.org> wrote:
> >>> 2012/1/25 Rogério Brito <rbrito@ime.usp.br>:
> >>>>
> >>>> Wild thoughts: I just saw that nouveau calls some ACPI stuff to get
> >>>> EDID information and, as ACPI is broken on this notebook, could that
> >>>> be a potential reason for the panic that I'm seeing? (Yes, I can take
> >>>> a photo of the screen).
> >>>
> >>> Which 2.6.x kernel works on your laptop without disabling acpi?
> >>
> >> Unfortunately, no 2.6 kernel has worked on this laptop without
> >> acpi=off. I also once booted FreeBSD here just to see how it would
> >> behave and it also hang with ACPI enabled.
> >>
> >> From what I see on this notebook, if ACPI is not turned off, the
> >> computer simply hangs once the ACPI is started (it gives one layman
> >> like me the impression of entering an infinite loop). But, again, this
> >> is just a layman observation of what the system appears to be doing.
> >
> > My guess is that most of these problems are related, so if we fixed
> > whatever causes the ACPI hang, the other problems would probably go
> > away, too.
> >
> > Did you figure out anything about the MTRRs?
> 
> Ping :)  Let me know if you'd like a patch to print out the MTRR info
> later in boot where it will be slow enough to capture in a video.

It's quite late here and I'm going to bed, but tomorrow I will send the
things that I collected so far (not much more, but still).


Thanks for the reminder,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://rb.doesntexist.org : Packages for LaTeX : algorithms.berlios.de
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

end of thread, other threads:[~2012-02-07  0:22 UTC | newest]

Thread overview: 53+ messages
     [not found] <bug-41722-5003@https.bugzilla.kernel.org/>
     [not found] ` <201109062222.p86MMlK9023363@demeter2.kernel.org>
2012-01-07  1:31   ` [Bug 41722] Clevo M5X0JE hangs in ACPI init Rogério Brito
2012-01-07  1:50     ` Linus Torvalds
2012-01-07  4:19       ` Edward Donovan
2012-01-08 22:27         ` Rogério Brito
2012-01-09  4:20           ` Bjorn Helgaas
2012-01-10  2:12             ` Rogério Brito
2012-01-08 22:13       ` Rogério Brito
2012-01-08 22:23         ` Linus Torvalds
2012-01-09 19:22           ` Jesse Barnes
2012-01-09 19:41             ` Linus Torvalds
2012-01-10  1:32               ` Yinghai Lu
2012-01-10  2:41                 ` Rogério Brito
2012-01-10  5:07                   ` Yinghai Lu
2012-01-11  7:04                     ` Rogério Brito
2012-01-12  5:06                       ` Yinghai Lu
2012-01-13 11:59                         ` Rogério Brito
2012-01-13 17:29                           ` Yinghai Lu
2012-01-13 22:24                             ` Yinghai Lu
2012-01-14  2:01                             ` Rogério Brito
2012-01-14  7:09                               ` Rogério Brito
2012-01-14 21:05                                 ` Yinghai Lu
2012-01-16 23:08                                   ` Bjorn Helgaas
2012-01-19  3:50                                   ` Rogério Brito
2012-01-19  5:06                                     ` Yinghai Lu
2012-01-19 13:48                                       ` Rogério Brito
2012-01-19 16:12                                         ` Yinghai Lu
2012-01-19 16:15                                           ` Rogério Brito
2012-01-19 17:20                                             ` Rogério Brito
2012-01-19 19:48                                               ` Yinghai Lu
2012-01-21  9:26                                                 ` Ram Pai
2012-01-21 10:35                                                   ` Yinghai Lu
2012-01-24 22:18                                                 ` Rogério Brito
2012-01-24 22:53                                                   ` Bjorn Helgaas
2012-01-25  0:19                                                   ` Yinghai Lu
2012-01-25  0:34                                                     ` Linus Torvalds
2012-01-25  0:57                                                     ` Yinghai Lu
2012-01-25  1:55                                                       ` Rogério Brito
2012-01-25  2:33                                                         ` Rogério Brito
2012-01-25  2:39                                                           ` Rogério Brito
2012-01-25 23:58                                                             ` Rogério Brito
2012-01-26  2:03                                                               ` Yinghai Lu
2012-01-26 11:16                                                                 ` Rogério Brito
2012-01-27  3:41                                                                   ` Bjorn Helgaas
2012-02-06 23:11                                                                     ` Bjorn Helgaas
2012-02-07  0:21                                                                       ` Rogério Brito
2012-01-27  6:53                                                                   ` Yinghai Lu
2012-01-10  5:24                   ` Bjorn Helgaas
2012-01-11  7:05                     ` Rogério Brito
2012-01-11 10:45                       ` Ram Pai
2012-01-10  1:57           ` Rogério Brito
2012-01-10  9:25         ` Edward Donovan
2012-01-11  7:15           ` Rogério Brito
2012-01-08  0:55     ` Márcia Brito
