linux-kernel.vger.kernel.org archive mirror
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
       [not found]       ` <fa.5o6E6S0UWnARbQPxLe30TvLQIiY@ifi.uio.no>
@ 2007-12-08 18:24         ` Robert Hancock
  2007-12-09  5:59           ` Tejun Heo
  2007-12-09 21:36           ` Andreas Mohr
  0 siblings, 2 replies; 74+ messages in thread
From: Robert Hancock @ 2007-12-08 18:24 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Andrew Morton, Andreas Mohr, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown,
	linux-acpi

Matthew Garrett wrote:
> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>> ata1.01: _GTF evaluation failed (AE 0x300d)
> 
> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
> 

I should think it should have.

I think we're too aggressive about disabling the libata ACPI support, 
even. One of my laptop's _GTF commands on resume is a DEVICE 
CONFIGURATION FREEZE LOCK command, which gets rejected by the drive 
(maybe it worked on the original Hitachi disk, but I've upgraded it to a 
  newer Samsung). I'd say if the drive returns command aborted on one of 
these, we should just ignore that command and continue to the next one 
without trying to retry or disabling the ACPI support entirely.
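
To make the proposed policy concrete, here is a minimal sketch (not the actual libata code) of a loop that issues _GTF commands and skips any single command the drive aborts instead of giving up on ACPI support. All names below (`gtf_result`, `gtf_issue_fn`, `demo_issue`, `run_gtf_cmds`) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of issuing one _GTF taskfile command. */
enum gtf_result { GTF_OK, GTF_ABORTED, GTF_FATAL };

/* Hypothetical callback that issues command number idx to the drive. */
typedef enum gtf_result (*gtf_issue_fn)(size_t idx, void *ctx);

/* Demo callback: the drive aborts exactly the command whose index is *ctx. */
static enum gtf_result demo_issue(size_t idx, void *ctx)
{
    const size_t *abort_at = ctx;

    return (idx == *abort_at) ? GTF_ABORTED : GTF_OK;
}

/*
 * Issue n commands in order. A command the drive aborts is counted and
 * skipped; only a fatal error stops the loop. Returns 0 when the whole
 * list was walked, -1 on a fatal error. *skipped receives the number of
 * aborted commands.
 */
static int run_gtf_cmds(size_t n, gtf_issue_fn issue, void *ctx, size_t *skipped)
{
    size_t i;

    assert(issue != NULL && skipped != NULL);
    *skipped = 0;
    for (i = 0; i < n; i++) {
        enum gtf_result r = issue(i, ctx);

        if (r == GTF_ABORTED) {
            (*skipped)++;       /* ignore this command, move on */
            continue;
        }
        if (r == GTF_FATAL)
            return -1;          /* something worse than an abort */
    }
    return 0;
}
```

With `demo_issue` aborting command 1 out of 3, `run_gtf_cmds()` still returns 0 and reports one skipped command, which is the behaviour argued for above.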

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 18:24         ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Robert Hancock
@ 2007-12-09  5:59           ` Tejun Heo
  2007-12-09 21:36           ` Andreas Mohr
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2007-12-09  5:59 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Robert Hancock wrote:
> Matthew Garrett wrote:
>> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index
>>>> (0FFFFFFFF) is beyond end of object [20070126]
>>>> ACPI Error (psparse-0537): Method parse/execution failed
>>>> [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>>> ACPI Error (psparse-0537): Method parse/execution failed
>>>> [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>>> ata1.01: _GTF evaluation failed (AE 0x300d)
>>
>> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
>>
> 
> I should think it should have.
> 
> I think we're too aggressive about disabling the libata ACPI support,
> even. One of my laptop's _GTF commands on resume is a DEVICE
> CONFIGURATION FREEZE LOCK command, which gets rejected by the drive
> (maybe it worked on the original Hitachi disk, but I've upgraded it to a
>  newer Samsung). I'd say if the drive returns command aborted on one of
> these, we should just ignore that command and continue to the next one
> without trying to retry or disabling the ACPI support entirely.

Yeap, my pending patchset does exactly that.  It's currently being
tested by bug reporters.  I'll soon post the patchset.

Thanks.

-- 
tejun


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 18:24         ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Robert Hancock
  2007-12-09  5:59           ` Tejun Heo
@ 2007-12-09 21:36           ` Andreas Mohr
  2007-12-10  0:04             ` Andreas Mohr
  1 sibling, 1 reply; 74+ messages in thread
From: Andreas Mohr @ 2007-12-09 21:36 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo,
	Len Brown, linux-acpi

Hi,

[ACPI _GTM suspend issue sorta fixed, read below]

On Sat, Dec 08, 2007 at 12:24:16PM -0600, Robert Hancock wrote:
> Matthew Garrett wrote:
>> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
>>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
>>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
>>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
>>>> ata1.01: _GTF evaluation failed (AE 0x300d)
>>
>> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?
>>
>
> I should think it should have.

Yup, the _GTF problem is certainly fixed, but this is a dead-end
since I showed the -rc1 vs. -rc2 behaviour, whereas I still have
failing suspend in -rc4 with this patch confirmed to be applied
(source does contain the changes) and confirmed to apparently be working
(no errors in dmesg any more).

IOW, what I'm concerned about is not a _GTF error on boot any more,
but a seemingly fatally handled _GTM error on suspend.


...OK, dug some more into this, and now I managed to get it to work,
and it was indeed _GTM which broke my whole suspend:

Since _GTM is failing on me (and the point is, it's failing
catastrophically, not "normally"!), ata_acpi_on_suspend()
calling ata_acpi_gtm() fails with -EINVAL instead of -ENOENT,
however ata_acpi_on_suspend() has the following correction only:

        if (rc == -ENOENT)
                rc = 0;

to make sure a suspend doesn't get aborted (fatal error) when
_GTM is simply empty.

Changing this into

        if ((rc == -ENOENT) || (rc == -EINVAL))
                rc = 0;

to additionally account for _invalid_ _GTM execution makes my suspend
(and resume!) work again on -rc4.


Now the question is whether this error code correction is ok, or whether a
catastrophically failing _GTM should have been truly registered on boot
already (where it does gtm to fetch cable timings) to subsequently avoid
doing any ATA ACPI things on suspend at all.


And the second, possibly much more lucrative, question would be
whether we're actually doing something wrong with our ACPI _GTM execution
which triggers the AE_AML_PACKAGE_LIMIT problem.

This might help here, perhaps (relevant snippets of AML dump):

                Device (CHN0)
                {
                    Name (_ADR, 0x00)
                    Method (_GTM, 0, NotSerialized)
                    {
                        Return (GTM (PMPT, PMUE, PMUT, PSPT, PSUE, PSUT))
                    }

                    Method (_STM, 3, NotSerialized)
                    {
                        Store (Arg0, TMD0)
                        Store (PMPT, GMPT)
                        Store (PMUE, GMUE)
                        Store (PMUT, GMUT)
                        Store (PSPT, GSPT)
                        Store (PSUE, GSUE)
                        Store (PSUT, GSUT)
                        STM ()
                        Store (GMPT, PMPT)
                        Store (GMUE, PMUE)
                        Store (GMUT, PMUT)
                        Store (GSPT, PSPT)
                        Store (GSUE, PSUE)
                        Store (GSUT, PSUT)
                    }

                Device (CHN1)
                {
                    Name (_ADR, 0x01)
                    Method (_GTM, 0, NotSerialized)
                    {
                        Return (GTM (SMPT, SMUE, SMUT, SSPT, SSUE, SSUT))
                    }

                    Method (_STM, 3, NotSerialized)
                    {
                        Store (Arg0, TMD0)
                        Store (SMPT, GMPT)
                        Store (SMUE, GMUE)
                        Store (SMUT, GMUT)
                        Store (SSPT, GSPT)
                        Store (SSUE, GSUE)
                        Store (SSUT, GSUT)
                        STM ()
                        Store (GMPT, SMPT)
                        Store (GMUE, SMUE)
                        Store (GMUT, SMUT)
                        Store (GSPT, SSPT)
                        Store (GSUE, SSUE)
                        Store (GSUT, SSUT)
                    }


                Method (GTM, 6, Serialized)
                {
                    Store (Ones, PIO0)
                    Store (Ones, PIO1)
                    Store (Ones, DMA0)
                    Store (Ones, DMA1)
                    Store (0x10, CHNF)
                    If (REGF) {}
                    Else
                    {
                        Return (TMD0)
                    }

                    Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg0, MTR,
                        0x00, 0x00), Local6)
                    Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6)),
                        Local7)
                    Store (Local7, DMA0)
                    Store (Local7, PIO0)
                    Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg3, MTR,
                        0x00, 0x00), Local6)
                    Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6)),
                        Local7)
                    Store (Local7, DMA1)
                    Store (Local7, PIO1)
                    If (Arg1)
                    {
                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)), Arg2)),
                                Local5)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)), Arg2)),
                                Local5)
                        }

                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0C)), Local5)),
                                DMA0)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)), Local5)),
                                DMA0)
                        }

                        Or (CHNF, 0x01, CHNF)
                    }

                    If (Arg4)
                    {
                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)), Arg5)),
                                Local5)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)), Arg5)),
                                Local5)
                        }

                        If (A133 ())
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0C)), Local5)),
                                DMA1)
                        }
                        Else
                        {
                            Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)), Local5)),
                                DMA1)
                        }

                        Or (CHNF, 0x04, CHNF)
                    }

                    Return (TMD0)
                }


Reminder: issue tracked at #9530.


Thanks,

Andreas Mohr


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 21:36           ` Andreas Mohr
@ 2007-12-10  0:04             ` Andreas Mohr
  2007-12-10  0:49               ` Andreas Mohr
  0 siblings, 1 reply; 74+ messages in thread
From: Andreas Mohr @ 2007-12-10  0:04 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

Hi,

On Sun, Dec 09, 2007 at 10:36:42PM +0100, Andreas Mohr wrote:
> And the second, possibly much more lucrative, question would be
> whether we're actually doing something wrong with our ACPI _GTM execution
> which triggers the AE_AML_PACKAGE_LIMIT problem.
> 
> This might help here, perhaps (relevant snippets of AML dump):

Indeed, after looking over this horrid ASL stuff for ages I'm now starting
to believe that our IDE controller state is wrong,
since the Match()ing etc. in this particular _GTM implementation
is heavily dependent on actual PCI values
(it references some PCI_Config OperationRegion:s),
and some indexing seems to go wrong due to this.

IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
actually wrongly implemented but simply expects IDE controller values
to have been set up ""differently"".


Or... one could possibly even infer from this that - maybe -
the _GTM invocation spot is wrong, it should be done somewhere
different during bootup. Or whatever.



This seems to tell me again that we're often quick to blacklist
or whitelist things left and right when instead fundamental problems
are hidden somewhere.

Still investigating,

Andreas Mohr


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:04             ` Andreas Mohr
@ 2007-12-10  0:49               ` Andreas Mohr
  2007-12-10  1:28                 ` Robert Hancock
  2007-12-10  2:20                 ` Tejun Heo
  0 siblings, 2 replies; 74+ messages in thread
From: Andreas Mohr @ 2007-12-10  0:49 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:
> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
> actually wrongly implemented but simply expects IDE controller values
> to have been set up ""differently"".
> 
> 
> Or... one could possibly even infer from this that - maybe -
> the _GTM invocation spot is wrong, it should be done somewhere
> different during bootup. Or whatever.

"Whatever" indeed:

There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
and the possible register values are:

                    Package (0x04)
                    {
                        0x20,
                        0x31,
                        0x65,
                        0xA8
                    },

and from

                OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
                Field (CFG2, DWordAcc, NoLock, Preserve)
                {
                            Offset (0x08),
                    SSPT,   8,
                    SMPT,   8,
                    PSPT,   8,
                    PMPT,   8,
                            Offset (0x10),
...
we can infer that at PCI_Config offset 0x48 those values should be located.
However after bootup or resume there are:

# lspci -s 00:11.1 -xxx
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00
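
A small sketch of how the four timing fields map onto this dump, assuming the CFG2 region base of 0x40 and the field layout quoted above, so that SSPT, SMPT, PSPT and PMPT sit at config offsets 0x48 to 0x4b. The helper names (`cfg2_field`, `timing_is_valid`) are made up for illustration; the row and the table of accepted values are copied from this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Row 0x40..0x4f of the lspci dump above. */
static const uint8_t cfg_row_40[16] = {
    0x0b, 0x32, 0x09, 0x0a, 0x18, 0x1c, 0xc0, 0x00,
    0x99, 0x99, 0x20, 0x20, 0xff, 0x00, 0xa8, 0x20
};

/* Register values listed in the AML Package (0x04) that Match() searches. */
static const uint8_t valid_timing[4] = { 0x20, 0x31, 0x65, 0xA8 };

/* CFG2 starts at PCI config offset 0x40; the fields begin at Offset (0x08). */
enum {
    CFG2_BASE = 0x40,
    SSPT_OFF  = 0x08,   /* config offset 0x48 */
    SMPT_OFF  = 0x09,   /* config offset 0x49 */
    PSPT_OFF  = 0x0a,   /* config offset 0x4a */
    PMPT_OFF  = 0x0b    /* config offset 0x4b */
};

/* Read one 8-bit CFG2 field out of the dumped row (which starts at 0x40). */
static uint8_t cfg2_field(unsigned int field_off)
{
    return cfg_row_40[(CFG2_BASE + field_off) - 0x40];
}

/* Return 1 if v is one of the values the AML table knows about, else 0. */
static int timing_is_valid(uint8_t v)
{
    size_t i;

    for (i = 0; i < sizeof(valid_timing); i++)
        if (valid_timing[i] == v)
            return 1;
    return 0;
}
```

Here `cfg2_field(SSPT_OFF)` and `cfg2_field(SMPT_OFF)` both yield 0x99, which `timing_is_valid()` rejects, while the two primary-channel fields yield 0x20, which it accepts.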


As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
99 99 20 20, which are not quite entirely valid judging from the array above,
and this is because the secondary port is unused, as can also be seen
from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)


Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure message on suspend -
only to reappear right on resume due to 99 99 20 20 combo happening again.
If I don't tweak, I get _GTM failure at both suspend and resume.


As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
(however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Andreas Mohr


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:49               ` Andreas Mohr
@ 2007-12-10  1:28                 ` Robert Hancock
  2007-12-10  2:25                   ` Tejun Heo
  2007-12-10  2:20                 ` Tejun Heo
  1 sibling, 1 reply; 74+ messages in thread
From: Robert Hancock @ 2007-12-10  1:28 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown,
	linux-acpi

Andreas Mohr wrote:
> On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote:
>> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't
>> actually wrongly implemented but simply expects IDE controller values
>> to have been set up ""differently"".
>>
>>
>> Or... one could possibly even infer from this that - maybe -
>> the _GTM invocation spot is wrong, it should be done somewhere
>> different during bootup. Or whatever.
> 
> "Whatever" indeed:
> 
> There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register,
> and the possible register values are:
> 
>                     Package (0x04)
>                     {
>                         0x20,
>                         0x31,
>                         0x65,
>                         0xA8
>                     },
> 
> and from
> 
>                 OperationRegion (CFG2, PCI_Config, 0x40, 0x20)
>                 Field (CFG2, DWordAcc, NoLock, Preserve)
>                 {
>                             Offset (0x08),
>                     SSPT,   8,
>                     SMPT,   8,
>                     PSPT,   8,
>                     PMPT,   8,
>                             Offset (0x10),
> ...
> we can infer that at PCI_Config offset 0x48 those values should be located.
> However after bootup or resume there are:
> 
> # lspci -s 00:11.1 -xxx
> 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
> 30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
> 40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
> 50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
> 60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
> 70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
> 80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00
> 
> 
> As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are
> 99 99 20 20, which are not quite entirely valid judging from the array above,
> and this is because the secondary port is unused, as can also be seen
> from my bootup log:
> 
> scsi0 : pata_via
> scsi1 : pata_via
> ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
> ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
> ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
> ata1.00: 234441648 sectors, multi 16: LBA
> ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
> Switched to high resolution mode on CPU 0
> ata1.00: configured for UDMA/100
> ata1.01: configured for UDMA/33
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
> ata2: ACPI get timing mode failed (AE 0x300d)
> 
> 
> Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure message on suspend -
> only to reappear right on resume due to 99 99 20 20 combo happening again.
> If I don't tweak, I get _GTM failure at both suspend and resume.
> 
> 
> As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
> unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
> expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
> (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)
> 
> Andreas Mohr
> 

Probably Windows doesn't call _GTM on a port with no devices connected, 
and so the BIOS people never tested that case. Likely we can just avoid 
doing this - if no devices are connected the timing settings for that 
channel are irrelevant.

And you're quite right in your comment that we are often too quick to 
blacklist hardware instead of looking into why it really is failing. 
ACPI is one of those areas where we often just need to figure out how to 
be bug-to-bug compatible with what Windows is doing.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/




* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  0:49               ` Andreas Mohr
  2007-12-10  1:28                 ` Robert Hancock
@ 2007-12-10  2:20                 ` Tejun Heo
  1 sibling, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2007-12-10  2:20 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Robert Hancock, Matthew Garrett, Andrew Morton,
	Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Len Brown, linux-acpi

Andreas Mohr wrote:
> As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely
> unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really
> expect anyone to call _GTM on an unused physical port. I'd bet on the latter...
> (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet)

Thanks a lot for finding this out.  One of the two reports in bug 9320
seems to be the same problem although the other doesn't seem to be.  So,
it seems we'll have to check that both primary and secondary slots are
empty and skip _GTM if so.  :-(

Also, right, there's no need to fail suspend on _GTM failure whatever
the error is.  That was me being anal again.  Will incorporate both into
the ACPI fixes patchset.
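
Roughly, the two changes described here could look like the sketch below. This is only an illustration under assumed names (`port_slots`, `should_eval_gtm`, `on_suspend` are hypothetical), not the pending patchset: _GTM is skipped when neither slot on the port is occupied, and a _GTM failure is remembered but never turned into a suspend failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port view: which device slots are occupied. */
struct port_slots {
    bool master_present;
    bool slave_present;
};

/* _GTM is only worth evaluating when at least one device is attached. */
static bool should_eval_gtm(const struct port_slots *p)
{
    assert(p != NULL);
    return p->master_present || p->slave_present;
}

/*
 * gtm_rc stands in for the result a _GTM evaluation would produce.
 * The function always returns 0, so suspend is never aborted; a failure
 * is only recorded in *gtm_failed so that later steps can avoid relying
 * on the timing data.
 */
static int on_suspend(const struct port_slots *p, int gtm_rc, bool *gtm_failed)
{
    *gtm_failed = false;
    if (!should_eval_gtm(p))
        return 0;               /* empty port: _GTM is not called at all */
    if (gtm_rc != 0)
        *gtm_failed = true;     /* remember the failure, keep suspending */
    return 0;
}
```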

Thanks.

-- 
tejun


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:28                 ` Robert Hancock
@ 2007-12-10  2:25                   ` Tejun Heo
  2007-12-10  3:20                     ` Robert Hancock
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-10  2:25 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Robert Hancock wrote:
> And you're quite right in your comment that we are often too quick to
> blacklist hardware instead of looking into why it really is failing.
> ACPI is one of those areas where we often just need to figure out how to
> be bug-to-bug compatible with what Windows is doing.

In the spirit of not blacklisting without looking deep into ACPI code,
can somebody familiar with ASL take a look at comment 11 of bug 9320?

  http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11

This is libata calling _GTM to find out how the BIOS configured the
device to determine cable type.

Thanks.

-- 
tejun


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  2:25                   ` Tejun Heo
@ 2007-12-10  3:20                     ` Robert Hancock
  0 siblings, 0 replies; 74+ messages in thread
From: Robert Hancock @ 2007-12-10  3:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki,
	LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown,
	linux-acpi

Tejun Heo wrote:
> Robert Hancock wrote:
>> And you're quite right in your comment that we are often too quick to
>> blacklist hardware instead of looking into why it really is failing.
>> ACPI is one of those areas where we often just need to figure out how to
>> be bug-to-bug compatible with what Windows is doing.
> 
> In the spirit of not blacklisting without looking deep into ACPI code,
> can somebody familiar with ASL take a look at comment 11 of bug 9320?
> 
>   http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11
> 
> This is libata calling _GTM to find out how the BIOS configured the
> device to determine cable type.
> 
> Thanks.

I suspect it's somewhat similar, though perhaps with a different cause: the
code is trying to look up a value (presumably register contents) in a
table using Match, gets a value that's not in the table (which makes
Match return the Ones value FFFFFFFF, meaning not found), and so the
lookup of the corresponding output value with that index fails. We'd
need the full ASL dump to know exactly what's going on there.
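
The failure mode can be modelled in a few lines of C. This is a simplified stand-in for the AML semantics, not the ACPICA interpreter: `match_meq` and `index_deref` are hypothetical helpers, and the 32-bit Ones value is an assumption taken from the "Index (0FFFFFFFF)" message in the logs.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit Ones, as seen in the "Index (0FFFFFFFF)" log line. */
#define AML_ONES 0xFFFFFFFFu

/* Stand-in for Match (pkg, MEQ, key, MTR, 0, 0): first equal element or Ones. */
static uint32_t match_meq(const uint8_t *pkg, size_t n, uint8_t key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pkg[i] == key)
            return (uint32_t)i;
    return AML_ONES;            /* not found */
}

/*
 * Stand-in for DerefOf (Index (pkg, idx)). Returns 0 and stores the element
 * on success, or -1 when idx is beyond the end of the package, which is the
 * case the interpreter reports as AE_AML_PACKAGE_LIMIT.
 */
static int index_deref(const uint8_t *pkg, size_t n, uint32_t idx, uint8_t *out)
{
    if (idx >= n)
        return -1;
    *out = pkg[idx];
    return 0;
}
```

Feeding a key that is not in the table to `match_meq()` yields `AML_ONES`, and passing that result straight to `index_deref()` fails, mirroring the unchecked Match-then-Index pattern in the method.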

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-13 10:49         ` Takashi Iwai
@ 2007-12-20 15:42           ` Takashi Iwai
  0 siblings, 0 replies; 74+ messages in thread
From: Takashi Iwai @ 2007-12-20 15:42 UTC (permalink / raw)
  To: perex
  Cc: Theodore Tso, Rafael J. Wysocki, Andrew Morton, LKML,
	Linus Torvalds, Ingo Molnar, Roland Dreier

At Thu, 13 Dec 2007 11:49:51 +0100,
I wrote:
> 
> [Sorry for the late response as I've been on vacation]
> 
> At Sat, 8 Dec 2007 21:15:44 -0500,
> Theodore Tso wrote:
> > 
> > On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> > > On Saturday, 8 of December 2007, Theodore Tso wrote:
> > > > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > > > the above regression.  Rafael, could you please make a note of this in
> > > > your regression list,
> > > 
> > > Done, thanks.
> > 
> > Great, thanks.  I should add that technically this wasn't a regression
> > since I had been seeing this since before 2.6.23.  Also, it isn't a
> > big deal, since aside from noise in the syslog, falling back to
> > polling more doesn't make any functional or user-visible difference
> > (although I guess it's less efficient).  
> > 
> > Regardless of whether it is a regression, it would be nice to get the
> > > patch applied and this issue fixed for 2.6.25!
> 
> You mean 2.6.24 ? ;-)
> 
> Yes, if it solves the problem, not only improves the latency, it's
> definitely nice to have now.  I was just too conservative to mark it
> for 2.6.24 merge although it looks safe.
> 
> Jaroslav, could you prepare this for the push?  It corresponds to
> alsa-kernel HG changeset 5557.

Jaroslav, what about this now?


Takashi


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01           ` Ingo Molnar
  2007-12-11 21:10             ` Stefano Brivio
@ 2007-12-19  0:58             ` Stefano Brivio
  1 sibling, 0 replies; 74+ messages in thread
From: Stefano Brivio @ 2007-12-19  0:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to 
> this problem category (and all the trickle effects). 3 are upstream 
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
> cleanup patch - which leaves us 4 patches to sort out.
> 
> So i'd suggest for you to try latest -git - that will tell us whether 
> udelay() is acceptable on your box right now.
> 
> i've attached those 4 patches:
> 
>  x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>  x86-cpu-clock-idle-event.patch
>  sched-printk-recursion-fix.patch
>  sched-printk-clock-fix.patch
> 
> none of them is _supposed_ to have any effect on udelay(), but the 
> interactions in this area are weird.

Exactly, none of them have any effect on udelay().

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
>   use some other time metric for determining mdelay quality. ]
> 
> plus then there's this patch:
> 
>   http://lkml.org/lkml/2007/12/7/100
> 
> is it perhaps this one that fixed udelay for you? [ which would be much 
> more expected, as this patch changes udelay ;-) ]

Yes, this one did. mdelay(2000) still gives delays between 2 and 2.9s, which is
acceptable. I have marked the regression as CODE_FIX.


--
Ciao
Stefano


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  2:15       ` Theodore Tso
@ 2007-12-13 10:49         ` Takashi Iwai
  2007-12-20 15:42           ` Takashi Iwai
  0 siblings, 1 reply; 74+ messages in thread
From: Takashi Iwai @ 2007-12-13 10:49 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Rafael J. Wysocki, Andrew Morton, LKML, Linus Torvalds,
	Ingo Molnar, Roland Dreier, perex

[Sorry for the late response as I've been on vacation]

At Sat, 8 Dec 2007 21:15:44 -0500,
Theodore Tso wrote:
> 
> On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, 8 of December 2007, Theodore Tso wrote:
> > > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > > the above regression.  Rafael, could you please make a note of this in
> > > your regression list,
> > 
> > Done, thanks.
> 
> Great, thanks.  I should add that technically this wasn't a regression
> since I had been seeing this since before 2.6.23.  Also, it isn't a
> big deal, since aside from noise in the syslog, falling back to
> polling more doesn't make any functional or user-visible difference
> (although I guess it's less efficient).  
> 
> Regardless of whether it is a regression, it would be nice to get the
> patch applied and this issue fixed for 2.6.25!

You mean 2.6.24 ? ;-)

Yes, if it solves the problem, not only improves the latency, it's
definitely nice to have now.  I was just too conservative to mark it
for 2.6.24 merge although it looks safe.

Jaroslav, could you prepare this for the push?  It corresponds to
alsa-kernel HG changeset 5557.


thanks,

Takashi


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01           ` Ingo Molnar
@ 2007-12-11 21:10             ` Stefano Brivio
  2007-12-19  0:58             ` Stefano Brivio
  1 sibling, 0 replies; 74+ messages in thread
From: Stefano Brivio @ 2007-12-11 21:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to 
> this problem category (and all the trickle effects). 3 are upstream 
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
> cleanup patch - which leaves us 4 patches to sort out.
> 
> So i'd suggest for you to try latest -git - that will tell us whether 
> udelay() is acceptable on your box right now.

Yes, it is (mdelay(2000), as said, gives delays between 2 and 2.9s on my
box, and drivers are happy).

The commit which fixed this (it seems) is
fa2dd441df28b9fdfc68f84ae66f1b507cfff0e4. I'll bisect and tell you more in the
next days.

> i've attached those 4 patches:
> 
>  x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>  x86-cpu-clock-idle-event.patch
>  sched-printk-recursion-fix.patch
>  sched-printk-clock-fix.patch
> 
> none of them is _supposed_ to have any effect on udelay(), but the 
> interactions in this area are weird.

No effects here IIRC.

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
>   use some other time metric for determining mdelay quality. ]
> 
> plus then there's this patch:
> 
>   http://lkml.org/lkml/2007/12/7/100
> 
> is it perhaps this one that fixed udelay for you? [ which would be much 
> more expected, as this patch changes udelay ;-) ]

Will try it ASAP, again, in the next few days anyway.


--
Ciao
Stefano


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
  2007-12-10 23:56           ` Arjan van de Ven
@ 2007-12-11  9:01           ` Ingo Molnar
  2007-12-11 21:10             ` Stefano Brivio
  2007-12-19  0:58             ` Stefano Brivio
  2 siblings, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-11  9:01 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]


* Stefano Brivio <stefano.brivio@polimi.it> wrote:

> > Stefano, could you please try to sum up your experiences with that 
> > issue? Is it reproducible, and did the 5 patches i made fix it? (if yes, 
> > could you try to re-do the mdelay verifications perhaps, to make 
> > sure it's not some other effect interacting here. In theory 
> > sched-clock scaling has no effect on udelay behavior.)
> 
> Sorry for disappearing. Anyway, yes, those patches fixed it. Precision 
> in delays isn't that good when using my crappy unstable TSC 
> (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not 
> depending on frequency changes anymore. So I'd say it's fixed, but 
> please tell me if you want me to do any other test so as to be sure it 
> is.

ok, just to make sure we are all synced up. I made 8 patches related to 
this problem category (and all the trickle effects). 3 are upstream 
already, 5 are pending for v2.6.25. One out of those 5 is an immaterial 
cleanup patch - which leaves us 4 patches to sort out.

So i'd suggest for you to try latest -git - that will tell us whether 
udelay() is acceptable on your box right now.

i've attached those 4 patches:

 x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
 x86-cpu-clock-idle-event.patch
 sched-printk-recursion-fix.patch
 sched-printk-clock-fix.patch

none of them is _supposed_ to have any effect on udelay(), but the 
interactions in this area are weird.

[ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so 
  use some other time metric for determining mdelay quality. ]

plus then there's this patch:

  http://lkml.org/lkml/2007/12/7/100

is it perhaps this one that fixed udelay for you? [ which would be much 
more expected, as this patch changes udelay ;-) ]

	Ingo

[-- Attachment #2: x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch --]
[-- Type: text/plain, Size: 7410 bytes --]

Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif
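
The fixed-point math described in the comment above can be checked in
userspace. This is a minimal sketch, not part of the patch: the helper
names demo_cyc2ns_scale() and demo_cycles_2_ns() are hypothetical, and
only the constant CYC2NS_SCALE_FACTOR and the formula are taken from the
patch itself.

```c
#include <stdint.h>

/* Same constant as in the patch: scale factors carry 10 fractional bits. */
#define CYC2NS_SCALE_FACTOR 10

/*
 * Hypothetical userspace stand-in for the scale computation in
 * set_cyc2ns_scale(): returns (10^6 << 10) / cpu_khz, or 0 when
 * cpu_khz is 0 (the patch leaves the old scale untouched in that case).
 */
static uint64_t demo_cyc2ns_scale(uint64_t cpu_khz)
{
	if (!cpu_khz)
		return 0;
	return (UINT64_C(1000000) << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* ns = cycles * scale / 2^10, with the divide done as a shift. */
static uint64_t demo_cycles_2_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With a 2 GHz clock (cpu_khz = 2000000) the scale is 512, so 2 * 10^9
cycles convert to 10^9 ns. If the CPU then drops to 1 GHz while the
stale scale of 512 is still in use, one real second (10^9 cycles) is
reported as only 5 * 10^8 ns, which is the kind of drift that rescaling
on cpufreq changes is meant to avoid.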

[-- Attachment #3: x86-cpu-clock-idle-event.patch --]
[-- Type: text/plain, Size: 2055 bytes --]

Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
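
The idea behind the wakeup event can be modelled in userspace. This is
only a sketch under stated assumptions: struct demo_clock and both
helpers are hypothetical and do not mirror the scheduler's real data
structures. It assumes a clock driven by a raw counter that may stop
during HLT, with the idle time measured by an independent source (the
patch uses ktime_get() before and after safe_halt()).

```c
#include <stdint.h>

/*
 * Hypothetical model of a per-CPU clock fed by a raw counter that may
 * stop while the CPU is halted. Not the kernel's data structure.
 */
struct demo_clock {
	uint64_t ns;		/* accumulated clock value in nanoseconds */
	uint64_t prev_raw;	/* raw counter value seen at the last update */
};

/* Normal update: advance by however far the raw counter moved. */
static void demo_clock_update(struct demo_clock *c, uint64_t raw_now)
{
	c->ns += raw_now - c->prev_raw;
	c->prev_raw = raw_now;
}

/*
 * Wakeup after idle: the raw counter may not have advanced, so add the
 * idle time measured elsewhere and resynchronise the raw baseline.
 */
static void demo_idle_wakeup(struct demo_clock *c, uint64_t raw_now,
			     uint64_t idle_delta_ns)
{
	c->ns += idle_delta_ns;
	c->prev_raw = raw_now;
}
```

If the raw counter reads 600 both before and after a 2000 ns halt, a
plain update would add nothing, while the wakeup helper credits the
2000 ns that the counter missed.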

[-- Attachment #4: sched-printk-recursion-fix.patch --]
[-- Type: text/plain, Size: 3181 bytes --]

Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = strlen(printk_recursion_bug_msg);
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf) - printed_len, fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 

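The recursion handling in the patch above can be illustrated with a
single-threaded userspace sketch. Everything here is hypothetical
(demo_log(), demo_hook, demo_out and the message text); it assumes no
locking and no crash path, and only shows the pattern of dropping a
nested call, flagging it, and reporting it at the start of the next
call.

```c
#include <stddef.h>
#include <string.h>

static int demo_in_logger;	/* nonzero while demo_log() is running */
static int demo_recursion_bug;	/* set when a nested call was dropped */
static char demo_out[256];	/* stand-in for the log buffer */

/* Hypothetical hook run from inside the logger, used to provoke recursion. */
static void (*demo_hook)(void);

/* Append s to demo_out without overflowing it; returns bytes appended. */
static size_t demo_append(const char *s)
{
	size_t used = strlen(demo_out);
	size_t room = sizeof(demo_out) - used - 1;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(demo_out + used, s, n);
	demo_out[used + n] = '\0';
	return n;
}

static size_t demo_log(const char *msg)
{
	size_t len = 0;

	if (demo_in_logger) {
		/* Nested call: drop it, but remember that it happened. */
		demo_recursion_bug = 1;
		return 0;
	}
	demo_in_logger = 1;

	/* Report a recursion flagged during an earlier call. */
	if (demo_recursion_bug) {
		demo_recursion_bug = 0;
		len += demo_append("BUG: recent recursion!\n");
	}
	len += demo_append(msg);

	if (demo_hook)
		demo_hook();	/* stands in for code that logs from the logger */

	demo_in_logger = 0;
	return len;
}

/* Example hook that calls back into the logger. */
static void demo_nested_call(void)
{
	demo_log("nested\n");
}
```

With demo_hook set to demo_nested_call, a first demo_log("first\n")
drops the nested message; a following demo_log("second\n") then emits
the recursion notice ahead of its own text.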
[-- Attachment #5: sched-printk-clock-fix.patch --]
[-- Type: text/plain, Size: 943 bytes --]

Subject: sched: fix CONFIG_PRINTK_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINTK_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",
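
The timestamp formatting in the context lines above splits a nanosecond
clock value into seconds and microseconds. A minimal userspace sketch,
assuming a hypothetical helper demo_format_timestamp() and omitting the
"<%c>" loglevel prefix:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a nanosecond timestamp as "[sssss.uuuuuu] ". The kernel code
 * uses do_div(t, 1000000000) to get the quotient and remainder; plain
 * 64-bit division is used here instead.
 */
static int demo_format_timestamp(char *buf, size_t size, uint64_t t_ns)
{
	unsigned long sec = (unsigned long)(t_ns / 1000000000ULL);
	unsigned long usec = (unsigned long)((t_ns % 1000000000ULL) / 1000);

	return snprintf(buf, size, "[%5lu.%06lu] ", sec, usec);
}
```

For t_ns = 41057335000 this yields "[   41.057335] ", the same shape as
the timestamps quoted in the printk recursion patch description.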


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:53           ` Guillaume Chazarain
@ 2007-12-11  8:48             ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-11  8:48 UTC (permalink / raw)
  To: Guillaume Chazarain
  Cc: Stefano Brivio, Andrew Morton, rjw, linux-kernel, torvalds


* Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> Stefano Brivio <stefano.brivio@polimi.it> wrote:
> 
> > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
> > delays isn't that good when using my crappy unstable TSC (mdelay(2000)
> > causes delays between 2 and 2.9 seconds) but it's not depending on frequency
> > changes anymore. So I'd say it's fixed, but please tell me if you want me
> > to do any other test so as to be sure it is.
> 
> Ingo,
> 
> it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() 
> based udelay), so how can udelay be affected by your proposed changes?

was this needed for you to get stable udelay()? (that cpu_clock() based 
udelay patch was buggy, i got the units wrong. udelay does wacky 
conversions between various units. So i dropped it for the time being.)

the last rollup you tested didn't show udelay problems, and it didn't 
include the sched_clock() based udelay patch.

so it would be nice if you could re-examine exactly what is needed. 
Please try latest -git and the concatenation of the 4 patches below. 
What would be the best info is to see which (if any!) patches are needed 
against latest -git to get a stable udelay() on your box.

	Ingo

--------------------------------------->
* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Subject		: jiffies counter leaps in 2.6.24-rc3
> Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> References	: http://lkml.org/lkml/2007/11/24/53
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> Handled-By	: Ingo Molnar <mingo@elte.hu>
> Patch		: http://lkml.org/lkml/2007/12/7/132

Linus, Andrew, i need some help deciding what to do about this 
regression. The fixes for this have been tested and resolve the 
regression, but they change printk and other code that runs by default 
and is thus rather invasive so late in the v2.6.24 cycle. This bug 
should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
option) - although some claimed effect was on udelay()/mdelay() too.

i've attached below the queue of 5 patches that fix this problem. They 
have been built and boot tested with more than 1000 random kernels in 
the past few days, so i certainly trust the core and x86 bits of this.

what do you think? Right now i've got them queued up for 2.6.25 in both 
the scheduler-devel and the x86-devel git trees - but can submit them 
for 2.6.24 if it's better if we did them there. I've got no strong 
opinion either way.

	Ingo

-------------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif
--------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
--------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion and return - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = strlen(printk_recursion_bug_msg);
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf) - printed_len, fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 
--------------->
Subject: sched: fix CONFIG_PRINTK_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINTK_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  1:06               ` Arjan van de Ven
@ 2007-12-11  8:43                 ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-11  8:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Guillaume Chazarain, Stefano Brivio, Andrew Morton, rjw,
	linux-kernel, torvalds


* Arjan van de Ven <arjan@infradead.org> wrote:

> > That sounds like a big problem.
> 
> it'll get way worse going forward. (but even on today's systems, the 
> tsc no longer represents frequency, but is some fixed clock totally 
> unrelated to cpu frequency)

X86_FEATURE_CONSTANT_TSC CPUs (all modern Intel CPUs) should be fine - 
we don't do any TSC frequency fixups for them. The loops_per_jiffy fixup 
looks like this:

                if (!(freq->flags & CPUFREQ_CONST_LOOPS))
                        cpu_data(freq->cpu).loops_per_jiffy =
                                cpufreq_scale(loops_per_jiffy_ref,
                                                ref_freq, freq->new);

i.e. X86_FEATURE_CONSTANT_TSC is excluded. The sched_clock() scaling factor 
is modified like this:

                        if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
                                tsc_khz = cpu_khz;
                                preempt_disable();
                                set_cyc2ns_scale(cpu_khz, smp_processor_id());

so here X86_FEATURE_CONSTANT_TSC is excluded again. So over time the whole 
frequency scaling issue will become a purely legacy one.
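
For reference, a minimal userspace sketch of the fixed-point scaling that
set_cyc2ns_scale() maintains, following the formula in the timer.h comment
(ns = cycles * (10^6 * SC / cpu_khz) / SC, with SC = 2^10). The function
names cyc2ns_scale() and cycles_to_ns() are hypothetical stand-ins for
illustration only; the kernel code is per-CPU and runs with irqs disabled.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR	10		/* SC = 2^10, as in the patch */
#define NSEC_PER_MSEC		1000000ULL

/* Hypothetical helper: scale = (10^6 << 10) / cpu_khz */
static uint64_t cyc2ns_scale(uint64_t cpu_khz)
{
	assert(cpu_khz != 0);	/* the patch skips the update when cpu_khz is 0 */
	return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* Hypothetical helper: ns = (cycles * scale) >> 10 */
static uint64_t cycles_to_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With a TSC that ticks at the core frequency, 2000000 cycles at 2 GHz
(cpu_khz = 2000000) convert to 1000000 ns. If the core then drops to 1 GHz
but the 2 GHz scale is kept, 1 ms of real time (1000000 cycles) is reported
as only 500000 ns, which is the kind of jump the notifier fixup is meant to
avoid on non-constant-TSC CPUs.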

	Ingo

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  0:01             ` Guillaume Chazarain
@ 2007-12-11  1:06               ` Arjan van de Ven
  2007-12-11  8:43                 ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Arjan van de Ven @ 2007-12-11  1:06 UTC (permalink / raw)
  To: Guillaume Chazarain
  Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

On Tue, 11 Dec 2007 01:01:25 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:

> Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > the frequency of both cores is the maximum of what linux sets each
> > core to;
> 
> Do you mean that the cpufreq code can be confused about the actual
> frequency of the cores? 

it means that cpufreq doesn't know the actual frequency (although bios sometimes tells us about the relationship, often the bios just lies through its teeth); it only knows what it asks for, not what it gets. We know it'll get at least what it asks for, but it can get more than it asks for basically.

>That sounds like a big problem.

it'll get way worse going forward.
(but even on today's systems, the tsc no longer represents frequency, but is some fixed clock totally unrelated to cpu frequency)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:56           ` Arjan van de Ven
@ 2007-12-11  0:01             ` Guillaume Chazarain
  2007-12-11  1:06               ` Arjan van de Ven
  0 siblings, 1 reply; 74+ messages in thread
From: Guillaume Chazarain @ 2007-12-11  0:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

Arjan van de Ven <arjan@infradead.org> wrote:

> the frequency of both cores is the maximum of what linux sets each core to;

Do you mean that the cpufreq code can be confused about the actual
frequency of the cores? That sounds like a big problem.

Thanks for any insight.

-- 
Guillaume

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
@ 2007-12-10 23:56           ` Arjan van de Ven
  2007-12-11  0:01             ` Guillaume Chazarain
  2007-12-11  9:01           ` Ingo Molnar
  2 siblings, 1 reply; 74+ messages in thread
From: Arjan van de Ven @ 2007-12-10 23:56 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds,
	Guillaume Chazarain

On Tue, 11 Dec 2007 00:34:33 +0100
Stefano Brivio <stefano.brivio@polimi.it> wrote:

> On Tue, 11 Dec 2007 00:04:25 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > > what do you think? Right now i've got them queued up for
> > > > > 2.6.25 in both the scheduler-devel and the x86-devel git
> > > > > trees - but can submit them for 2.6.24 if it's better if we
> > > > > did them there. I've got no strong opinion either way.
> > > > 
> > > > printk_clock() doesn't seem terribly important but what's this
> > > > stuff about effects on udelay/mdelay?  That can be serious if
> > > > they're getting shortened.
> > > 
> > > > since udelay depends on loops_per_jiffy, which is fixed up in 
> > > > time_cpufreq_notifier(), i don't see how it could be affected by 
> > > frequency changes. (but that's the theory - practice might be 
> > > different)
> > 
> > Stefano Brivio reported udelay()/mdelay() effects in the b43
> > driver. (and it caused driver failures for him.)
> > 
> > Stefano, could you please try to sum up your experiences with that 
> > issue? Is it reproducible, and do the 5 patches i did fix it? (if yes, 
> > could you try to re-do the mdelay verifications perhaps, to make
> > sure it's not some other effect interacting here. In theory
> > sched-clock scaling has no effect on udelay behavior.)
> 
> Sorry for disappearing. Anyway, yes, those patches fixed it.
> Precision in delays isn't that good when using my crappy unstable TSC
> (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not
> depending on frequency changes anymore. So I'd say it's fixed, but
> please tell me if you want me to do any other test so as to be sure
> it is.
> 
> 
I'm still quite concerned about this in dual/quad core scenarios;
the frequency of both cores is the maximum of what linux sets each core to;
this means that if you're THIS sensitive to that there still is quite a nasty issue there.

I wonder if the various delay functions (maybe only in .25) should always use the maximum observed loops_per_jiffy (across cpus) instead, to be super safe here.
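
One way to read the "maximum observed loops_per_jiffy" idea, as a minimal
userspace sketch. max_loops_per_jiffy() and delay_loops() are hypothetical
names, the per-CPU values are passed in as a plain array, and hz stands in
for the kernel's HZ; none of this is existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest loops_per_jiffy seen on any CPU. */
static unsigned long max_loops_per_jiffy(const unsigned long *lpj, size_t ncpus)
{
	unsigned long max = 0;
	size_t i;

	assert(lpj != NULL || ncpus == 0);
	for (i = 0; i < ncpus; i++)
		if (lpj[i] > max)
			max = lpj[i];
	return max;
}

/*
 * Hypothetical helper: busy-loop iterations for a delay of 'usecs'
 * microseconds, given loops per jiffy and jiffies per second (hz).
 */
static unsigned long long delay_loops(unsigned long lpj, unsigned long hz,
				      unsigned long usecs)
{
	return (unsigned long long)lpj * hz * usecs / 1000000ULL;
}
```

Sizing the loop count from the maximum means a core that really runs faster
than cpufreq asked for still spins at least as long as requested; the cost
is that slower cores delay longer than necessary.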

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:34         ` Stefano Brivio
@ 2007-12-10 23:53           ` Guillaume Chazarain
  2007-12-11  8:48             ` Ingo Molnar
  2007-12-10 23:56           ` Arjan van de Ven
  2007-12-11  9:01           ` Ingo Molnar
  2 siblings, 1 reply; 74+ messages in thread
From: Guillaume Chazarain @ 2007-12-10 23:53 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds

Stefano Brivio <stefano.brivio@polimi.it> wrote:

> Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
> delays isn't that good when using my crappy unstable TSC (mdelay(2000)
> causes delays between 2 and 2.9 seconds) but it's not depending on frequency
> changes anymore. So I'd say it's fixed, but please tell me if you want me
> to do any other test so as to be sure it is.

Ingo,

it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock()
based udelay), so how can udelay be affected by your proposed changes?

Thanks.

-- 
Guillaume

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 23:04       ` Ingo Molnar
@ 2007-12-10 23:34         ` Stefano Brivio
  2007-12-10 23:53           ` Guillaume Chazarain
                             ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Stefano Brivio @ 2007-12-10 23:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 00:04:25 +0100
Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > > what do you think? Right now i've got them queued up for 2.6.25 in 
> > > > both the scheduler-devel and the x86-devel git trees - but can 
> > > > submit them for 2.6.24 if it's better if we did them there. I've got 
> > > > no strong opinion either way.
> > > 
> > > printk_clock() doesn't seem terribly important but what's this stuff 
> > > about effects on udelay/mdelay?  That can be serious if they're 
> > > getting shortened.
> > 
> > > since udelay depends on loops_per_jiffy, which is fixed up in 
> > > time_cpufreq_notifier(), i don't see how it could be affected by 
> > frequency changes. (but that's the theory - practice might be 
> > different)
> 
> Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. 
> (and it caused driver failures for him.)
> 
> Stefano, could you please try to sum up your experiences with that 
> issue? Is it reproducible, and do the 5 patches i did fix it? (if yes, 
> could you try to re-do the mdelay verifications perhaps, to make sure 
> it's not some other effect interacting here. In theory sched-clock 
> scaling has no effect on udelay behavior.)

Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
delays isn't that good when using my crappy unstable TSC (mdelay(2000)
causes delays between 2 and 2.9 seconds) but it's not depending on frequency
changes anymore. So I'd say it's fixed, but please tell me if you want me
to do any other test so as to be sure it is.


--
Ciao
Stefano

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 22:45     ` Ingo Molnar
@ 2007-12-10 23:04       ` Ingo Molnar
  2007-12-10 23:34         ` Stefano Brivio
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2007-12-10 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rjw, linux-kernel, torvalds, Stefano Brivio, Guillaume Chazarain


* Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > > what do you think? Right now i've got them queued up for 2.6.25 in 
> > > both the scheduler-devel and the x86-devel git trees - but can 
> > > submit them for 2.6.24 if it's better if we did them there. I've got 
> > > no strong opinion either way.
> > 
> > printk_clock() doesn't seem terribly important but what's this stuff 
> > about effects on udelay/mdelay?  That can be serious if they're 
> > getting shortened.
> 
> since udelay depends on loops_per_jiffy, which is fixed up in 
> time_cpufreq_notifier(), i don't see how it could be affected by 
> frequency changes. (but that's the theory - practice might be 
> different)

Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. 
(and it caused driver failures for him.)

Stefano, could you please try to sum up your experiences with that 
issue? Is it reproducible, and do the 5 patches i did fix it? (if yes, 
could you try to re-do the mdelay verifications perhaps, to make sure 
it's not some other effect interacting here. In theory sched-clock 
scaling has no effect on udelay behavior.)

	Ingo

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:59   ` Andrew Morton
@ 2007-12-10 22:45     ` Ingo Molnar
  2007-12-10 23:04       ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2007-12-10 22:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rjw, linux-kernel, torvalds


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > what do you think? Right now i've got them queued up for 2.6.25 in 
> > both the scheduler-devel and the x86-devel git trees - but can 
> > submit them for 2.6.24 if it's better if we did them there. I've got 
> > no strong opinion either way.
> 
> printk_clock() doesn't seem terribly important but what's this stuff 
> about effects on udelay/mdelay?  That can be serious if they're 
> getting shortened.

since udelay depends on loops_per_jiffy, which is fixed up in 
time_cpufreq_notifier(), i don't see how it could be affected by 
frequency changes. (but that's the theory - practice might be different)
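
To make the fixup concrete, here is a minimal userspace sketch of the
proportional rescaling the notifier applies to loops_per_jiffy through
cpufreq_scale(). scale_lpj() is a hypothetical stand-in name rather than the
kernel helper, and the 64-bit intermediate is an assumption made so the
illustration cannot overflow on 32-bit longs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rescale a reference loops_per_jiffy value,
 * measured at ref_khz, to what it should be at new_khz:
 *
 *	lpj_new = lpj_ref * new_khz / ref_khz
 */
static unsigned long scale_lpj(unsigned long lpj_ref, unsigned int ref_khz,
			       unsigned int new_khz)
{
	assert(ref_khz != 0);
	return (unsigned long)((uint64_t)lpj_ref * new_khz / ref_khz);
}
```

For example, a reference value of 4000000 loops per jiffy at 2 GHz becomes
2000000 at 1 GHz, so a udelay() computed from the rescaled value spins half
as many loops and should last the same wall-clock time, provided the core
really runs at the frequency that was requested.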

	Ingo

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:42 ` Ingo Molnar
  2007-12-10 20:57   ` Guillaume Chazarain
@ 2007-12-10 20:59   ` Andrew Morton
  2007-12-10 22:45     ` Ingo Molnar
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2007-12-10 20:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rjw, linux-kernel, torvalds

On Mon, 10 Dec 2007 21:42:12 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Subject		: jiffies counter leaps in 2.6.24-rc3
> > Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> > References	: http://lkml.org/lkml/2007/11/24/53
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> > Handled-By	: Ingo Molnar <mingo@elte.hu>
> > Patch		: http://lkml.org/lkml/2007/12/7/132
> 
> Linus, Andrew, i need some help deciding what to do about this 
> regression. The fixes for this have been tested and resolve the 
> regression, but they change printk and other code that runs by default 
> and is thus rather invasive so late in the v2.6.24 cycle. This bug 
> should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
> option) - although some claimed effect was on udelay()/mdelay() too.
> 
> i've attached below the queue of 5 patches that fix this problem. They 
> have been build and boot tested with more than 1000 random kernels in 
> the past few days, so i certainly trust the core and x86 bits of this.
> 
> what do you think? Right now i've got them queued up for 2.6.25 in both 
> the scheduler-devel and the x86-devel git trees - but can submit them 
> for 2.6.24 if it's better if we did them there. I've got no strong 
> opinion either way.

printk_clock() doesn't seem terribly important but what's this stuff about
effects on udelay/mdelay?  That can be serious if they're getting
shortened.


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10 20:42 ` Ingo Molnar
@ 2007-12-10 20:57   ` Guillaume Chazarain
  2007-12-10 20:59   ` Andrew Morton
  1 sibling, 0 replies; 74+ messages in thread
From: Guillaume Chazarain @ 2007-12-10 20:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds

On Dec 10, 2007 9:42 PM, Ingo Molnar <mingo@elte.hu> wrote:
> although some claimed effect was on udelay()/mdelay() too.

Any specific report?
The jumping sched_clock on frequency change caused some
scheduling oddities for me, but CFS attenuated the effect.

Thanks.

-- 
Guillaume

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (7 preceding siblings ...)
  2007-12-09 11:54 ` Andrew Morton
@ 2007-12-10 20:42 ` Ingo Molnar
  2007-12-10 20:57   ` Guillaume Chazarain
  2007-12-10 20:59   ` Andrew Morton
  8 siblings, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-10 20:42 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Subject		: jiffies counter leaps in 2.6.24-rc3
> Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
> References	: http://lkml.org/lkml/2007/11/24/53
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
> Handled-By	: Ingo Molnar <mingo@elte.hu>
> Patch		: http://lkml.org/lkml/2007/12/7/132

Linus, Andrew, i need some help deciding what to do about this 
regression. The fixes for this have been tested and resolve the 
regression, but they change printk and other code that runs by default 
and is thus rather invasive so late in the v2.6.24 cycle. This bug 
should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug 
option) - although some claimed effect was on udelay()/mdelay() too.

i've attached below the queue of 5 patches that fix this problem. They 
have been build and boot tested with more than 1000 random kernels in 
the past few days, so i certainly trust the core and x86 bits of this.

what do you think? Right now i've got them queued up for 2.6.25 in both 
the scheduler-devel and the x86-devel git trees - but can submit them 
for 2.6.24 if it's better if we did them there. I've got no strong 
opinion either way.

	Ingo

-------------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <guichaz@yahoo.fr>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_32.c |   43 ++++++++++++++++++++++++++++++-----
 arch/x86/kernel/tsc_64.c |   57 ++++++++++++++++++++++++++++++++++++++---------
 include/asm-x86/timer.h  |   23 ++++++++++++++----
 3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors for all CPUs, assuming the same
+	 * speed as the bootup CPU. (cpufreq notifiers will fix this
+	 * up if their speed diverges)
+	 */
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(cpu_khz, cpu);
+
 	use_tsc_delay();
 
 	/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@
 
 #include <asm/hpet.h>
 #include <asm/timex.h>
+#include <asm/timer.h>
 
 static int notsc __initdata = 0;
 
@@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz);
 unsigned int tsc_khz;
 EXPORT_SYMBOL(tsc_khz);
 
-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ *  basic equation:
+ *		ns = cycles / (freq / ns_per_sec)
+ *		ns = cycles * (ns_per_sec / freq)
+ *		ns = cycles * (10^9 / (cpu_khz * 10^3))
+ *		ns = cycles * (10^6 / cpu_khz)
+ *
+ *	Then we use scaling math (suggested by george@mvista.com) to get:
+ *		ns = cycles * (10^6 * SC / cpu_khz) / SC
+ *		ns = cycles * cyc2ns_scale / SC
+ *
+ *	And since SC is a constant power of two, we can convert the div
+ *  into a shift.
+ *
+ *  We can use khz divisor instead of mhz to keep a better precision, since
+ *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ *  (mathieu.desnoyers@polymtl.ca)
+ *
+ *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
 
-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 unsigned long long sched_clock(void)
@@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct 
 			mark_tsc_unstable("cpufreq changes");
 	}
 
-	set_cyc2ns_scale(tsc_khz_ref);
+	preempt_disable();
+	set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+	preempt_enable();
 
 	return 0;
 }
@@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref
 void __init tsc_calibrate(void)
 {
 	unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
-	int hpet = is_hpet_enabled();
+	int hpet = is_hpet_enabled(), cpu;
 
 	local_irq_save(flags);
 
@@ -206,7 +241,9 @@ void __init tsc_calibrate(void)
 	}
 
 	tsc_khz = tsc2 / tsc1;
-	set_cyc2ns_scale(tsc_khz);
+
+	for_each_possible_cpu(cpu)
+		set_cyc2ns_scale(tsc_khz, cpu);
 }
 
 /*
Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
 #define _ASMi386_TIMER_H
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/percpu.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
 #define calculate_cpu_khz() native_calculate_cpu_khz()
 #endif
 
-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
  * convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
  *		ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
  *	And since SC is a constant power of two, we can convert the div
  *  into a shift.
  *
- *  We can use khz divisor instead of mhz to keep a better percision, since
+ *  We can use khz divisor instead of mhz to keep a better precision, since
  *  cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
  *  (mathieu.desnoyers@polymtl.ca)
  *
  *			-johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);
 
 #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
-	return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+	return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
 }
 
+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ns = __cycles_2_ns(cyc);
+	local_irq_restore(flags);
+
+	return ns;
+}
 
 #endif
--------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <mingo@elte.hu>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |   15 ++++++++++++---
 arch/x86/kernel/process_64.c |   13 ++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-x86.q/arch/x86/kernel/process_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_32.c
+++ linux-x86.q/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
 		smp_mb();
 
 		local_irq_disable();
-		if (!need_resched())
+		if (!need_resched()) {
+			ktime_t t0, t1;
+			u64 t0n, t1n;
+
+			t0 = ktime_get();
+			t0n = ktime_to_ns(t0);
 			safe_halt();	/* enables interrupts racelessly */
-		else
-			local_irq_enable();
+			local_irq_disable();
+			t1 = ktime_get();
+			t1n = ktime_to_ns(t1);
+			sched_clock_idle_wakeup_event(t1n - t0n);
+		}
+		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
 	} else {
 		/* loop is done by the caller */
Index: linux-x86.q/arch/x86/kernel/process_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/process_64.c
+++ linux-x86.q/arch/x86/kernel/process_64.c
@@ -116,9 +116,16 @@ static void default_idle(void)
 	smp_mb();
 	local_irq_disable();
 	if (!need_resched()) {
-		/* Enables interrupts one instruction before HLT.
-		   x86 special cases this so there is no race. */
-		safe_halt();
+		ktime_t t0, t1;
+		u64 t0n, t1n;
+
+		t0 = ktime_get();
+		t0n = ktime_to_ns(t0);
+		safe_halt();	/* enables interrupts racelessly */
+		local_irq_disable();
+		t1 = ktime_get();
+		t1n = ktime_to_ns(t1);
+		sched_clock_idle_wakeup_event(t1n - t0n);
 	} else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
--------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <mingo@elte.hu>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[   41.057335] SysRq : <2>BUG: recent printk recursion!
[   41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |   48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int printk_cpu = UINT_MAX;
 
+const char printk_recursion_bug_msg [] =
+			KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
 asmlinkage int vprintk(const char *fmt, va_list args)
 {
+	static int log_level_unknown = 1;
+	static char printk_buf[1024];
+
 	unsigned long flags;
-	int printed_len;
+	int printed_len = 0;
+	int this_cpu;
 	char *p;
-	static char printk_buf[1024];
-	static int log_level_unknown = 1;
 
 	boot_delay_msec();
 
 	preempt_disable();
-	if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
-		/* If a crash is occurring during printk() on this CPU,
-		 * make sure we can't deadlock */
-		zap_locks();
-
 	/* This stops the holder of console_sem just where we want him */
 	raw_local_irq_save(flags);
+	this_cpu = smp_processor_id();
+
+	/*
+	 * Ouch, printk recursed into itself!
+	 */
+	if (unlikely(printk_cpu == this_cpu)) {
+		/*
+		 * If a crash is occurring during printk() on this CPU,
+		 * then try to get the crash message out but make sure
+		 * we can't deadlock. Otherwise just return to avoid the
+		 * recursion - but flag the recursion so that
+		 * it can be printed at the next appropriate moment:
+		 */
+		if (!oops_in_progress) {
+			printk_recursion_bug = 1;
+			goto out_restore_irqs;
+		}
+		zap_locks();
+	}
+
 	lockdep_off();
 	spin_lock(&logbuf_lock);
-	printk_cpu = smp_processor_id();
+	printk_cpu = this_cpu;
 
+	if (printk_recursion_bug) {
+		printk_recursion_bug = 0;
+		strcpy(printk_buf, printk_recursion_bug_msg);
+		printed_len = sizeof(printk_recursion_bug_msg) - 1;
+	}
 	/* Emit the output into the temporary buffer */
-	printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	printed_len += vscnprintf(printk_buf + printed_len,
+				  sizeof(printk_buf) - printed_len, fmt, args);
 
 	/*
 	 * Copy the output into log_buf.  If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, 
 		printk_cpu = UINT_MAX;
 		spin_unlock(&logbuf_lock);
 		lockdep_on();
+out_restore_irqs:
 		raw_local_irq_restore(flags);
 	}
 
--------------->
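For readers following the recursion logic above, here is a minimal
user-space sketch of the same idea, not the kernel code: a logger
refuses re-entry, remembers that it happened, and reports it in front
of the next normal message. All names (guarded_log, in_logger,
recursion_seen, log_hook, recursing_hook) are made up for illustration;
the sketch is single-threaded and leaves out the locking, irq and
oops_in_progress handling entirely.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static const char recursion_msg[] = "BUG: recent log recursion!\n";
static int in_logger;          /* nonzero while guarded_log() runs */
static int recursion_seen;     /* a nested call was dropped earlier */
static void (*log_hook)(void); /* test hook, called from inside the logger */

/*
 * Format into buf. Returns the number of characters stored (without
 * the trailing NUL), or -1 if the call was a dropped recursion.
 */
int guarded_log(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	size_t len = 0;
	int n;

	assert(buf != NULL && size > sizeof(recursion_msg));

	if (in_logger) {
		/* Recursed: leave the buffer alone, just flag it. */
		recursion_seen = 1;
		return -1;
	}
	in_logger = 1;

	if (recursion_seen) {
		recursion_seen = 0;
		len = strlen(recursion_msg);	/* length without the NUL */
		memcpy(buf, recursion_msg, len);
	}

	if (log_hook)
		log_hook();	/* may try to re-enter guarded_log() */

	va_start(args, fmt);
	/* the space left shrinks by what is already in the buffer */
	n = vsnprintf(buf + len, size - len, fmt, args);
	va_end(args);
	if (n < 0) {
		buf[len] = '\0';
		n = 0;
	} else if ((size_t)n >= size - len) {
		n = (int)(size - len - 1);	/* output was truncated */
	}

	in_logger = 0;
	return (int)len + n;
}

/* Example hook: logs from inside the logger, which must be refused. */
int nested_result;
void recursing_hook(void)
{
	char tmp[64];

	nested_result = guarded_log(tmp, sizeof(tmp), "nested\n");
}
```

With log_hook pointing at recursing_hook(), the first call prints
normally and the nested call returns -1; the next call then starts with
the "BUG: recent log recursion!" line.

--------------->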
Subject: sched: fix CONFIG_PRINTK_TIME's reliance on sched_clock()
From: Ingo Molnar <mingo@elte.hu>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

  http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINTK_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/printk.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = printk_clock();
+				t = cpu_clock(printk_cpu);
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",
--------------->
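The timestamp prefix built in the hunk above splits a nanosecond count
into whole seconds and a microsecond remainder. do_div() divides its
first argument in place and returns the remainder; plain C can use /
and % for the same result. A small user-space sketch, where
format_log_time() is a made-up name and the clock value is passed in
by the caller rather than read from cpu_clock():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "<level>[secs.usecs] " prefix from a nanosecond timestamp:
 * seconds padded to five columns, microseconds zero-padded to six.
 * Returns what snprintf() returns.
 */
int format_log_time(char *buf, size_t size, char loglevel,
		    unsigned long long ns)
{
	unsigned long long sec = ns / 1000000000ULL;
	unsigned long rem_ns = (unsigned long)(ns % 1000000000ULL);

	assert(buf != NULL);
	return snprintf(buf, size, "<%c>[%5llu.%06lu] ",
			loglevel, sec, rem_ns / 1000);
}
```

For example, 41057335123 ns at level '2' gives "<2>[   41.057335] ",
the same shape as the timestamps in the test output quoted earlier.

--------------->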
Subject: sched: remove printk_clock()
From: Ingo Molnar <mingo@elte.hu>

printk_clock() is obsolete - it has been replaced with cpu_clock().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/arm/kernel/time.c  |   11 -----------
 arch/ia64/kernel/time.c |   27 ---------------------------
 kernel/printk.c         |    5 -----
 3 files changed, 43 deletions(-)

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
 }
 #endif
 
-/*
- * An implementation of printk_clock() independent from
- * sched_clock().  This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
-			(1000000000 / HZ);
-}
-
 static unsigned long next_rtc_update;
 
 /*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
 }
 EXPORT_SYMBOL(udelay);
 
-static unsigned long long ia64_itc_printk_clock(void)
-{
-	if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
-		return sched_clock();
-	return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
-	return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
-		(1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
-	return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
-	if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
-		ia64_printk_clock = ia64_itc_printk_clock;
-}
-
 /* IA64 doesn't cache the timezone */
 void update_vsyscall_tz(void)
 {
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char
 
 __setup("time", printk_time_setup);
 
-__attribute__((weak)) unsigned long long printk_clock(void)
-{
-	return sched_clock();
-}
-
 /* Check if we have any console registered that can be called early in boot. */
 static int have_callable_console(void)
 {
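
For reference, the jiffies-based fallbacks removed above turn a tick
count into nanoseconds by multiplying with 1000000000 / HZ. A
user-space sketch of that arithmetic, assuming a hypothetical
ticks_to_ns() helper and a tick count that is already relative to boot
(the kernel code subtracts INITIAL_JIFFIES for that):

```c
#include <assert.h>

/* Convert a tick count to nanoseconds for a given tick rate (HZ). */
unsigned long long ticks_to_ns(unsigned long long ticks, unsigned int hz)
{
	assert(hz > 0);
	/* integer division first, as in the removed code */
	return ticks * (1000000000ULL / hz);
}
```

Such a clock only advances once per tick, and the integer division
loses a little when HZ does not divide 1000000000: with HZ=300, 300
ticks come out as 999999900 ns rather than a full second.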

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  3:38             ` Alan Cox
@ 2007-12-10 15:38               ` Linus Torvalds
  0 siblings, 0 replies; 74+ messages in thread
From: Linus Torvalds @ 2007-12-10 15:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Mon, 10 Dec 2007, Alan Cox wrote:
> 
> And as I keep pointing out but you keep ignoring - not doing it breaks
> even more things, by a factor of quite a lot.

But we've never done it before in libata, right?

So the "not doing it breaks" argument is about stuff that isn't 
regressions.

Can you really not see the difference?

		Linus

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  8:27               ` Tejun Heo
@ 2007-12-10  8:41                 ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-10  8:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML


* Tejun Heo <htejun@gmail.com> wrote:

> The following git tree contains patches pending review for 2.6.25.
> 
> http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio
> 
> And we're getting close to fixing the regression.  I don't think 
> there's too much worry about this one.  Just need a bit more time to 
> test a few more things.

ah, i see, the joys of the kernel running BIOS written code (AML):

  http://bugzilla.kernel.org/attachment.cgi?id=13932&action=view

cute!

	Ingo

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  8:21             ` Ingo Molnar
@ 2007-12-10  8:27               ` Tejun Heo
  2007-12-10  8:41                 ` Ingo Molnar
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-10  8:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> Tejun already reported that this apparently gets fixed _properly_ with 
>> the more extensive cleanups and fixes that are pending for 2.6.25.
> 
> btw., how extensive are those cleanups and fixes in reality, is there a 
> rollup somewhere one could take a look at? Those fixes and cleanups were 
> deferred to v2.6.25 in the knowledge of having the current code included 
> in v2.6.24 - but now that the current approach seems to regress, maybe 
> those cleanups are still safe enough. (compared to an outright revert)

The following git tree contains patches pending review for 2.6.25.

http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio

And we're getting close to fixing the regression.  I don't think there's
too much worry about this one.  Just need a bit more time to test a few
more things.

Thanks.

-- 
tejun

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
  2007-12-10  3:38             ` Alan Cox
@ 2007-12-10  8:21             ` Ingo Molnar
  2007-12-10  8:27               ` Tejun Heo
  2 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2007-12-10  8:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Tejun already reported that this apparently gets fixed _properly_ with 
> the more extensive cleanups and fixes that are pending for 2.6.25.

btw., how extensive are those cleanups and fixes in reality, is there a 
rollup somewhere one could take a look at? Those fixes and cleanups were 
deferred to v2.6.25 in the knowledge of having the current code included 
in v2.6.24 - but now that the current approach seems to regress, maybe 
those cleanups are still safe enough. (compared to an outright revert)

	Ingo

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
@ 2007-12-10  3:38             ` Alan Cox
  2007-12-10 15:38               ` Linus Torvalds
  2007-12-10  8:21             ` Ingo Molnar
  2 siblings, 1 reply; 74+ messages in thread
From: Alan Cox @ 2007-12-10  3:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Have you even *read* the thread?

In detail, as it unfolds and while testing variants of Tejun's code on
the hardware I have access to - none of which has this bug, making it
rather trickier to help.

> In other words, the stuff you call so critically important (yet we've been 
> able to live without it until now!) is apparently simply NOT YET READY. 
> It's breaking things.

And as I keep pointing out but you keep ignoring - not doing it breaks
even more things, by a factor of quite a lot.

> .. and what the hell does that matter? If the code doesn't work, it 
> doesn't work, and you might as well point to some random scribblings done 
> by a three-year-old on toilet paper rather than any "specs".

The code without the changes doesn't work either. So pick your toilet
paper.. by your argument both are toilet paper.

> causes regressions should be reverted, so that 2.6.24 is at least no worse 
> than 2.6.23 (and all earlier kernels) in this respect.

Which as the distro bug lists for ATAPI will tell you - ain't good. Still
distro vendors can ship patches.

> We used to allow regressions. It was really painful. It's hard to debug 
> things when things sometimes break. It's much better to have a nice 
> constant monotonic improvement.

Linus, the kernel regresses all over the place every release. If it
didn't do that you'd never get any changes in. Your kernel would
fossilize like RHEL or SLES and you'd be spending weeks analysing each
changeset for possible side effects, or - as happens by necessity -
adding code paths so a fix vital to one driver ceases to share core code
with another driver - to reduce regression risk. Been there, done that
and it's not the way progress happens.

> It's better for users, but it's much better also for developers, even if 
> you may be frustrated right now because some new code effectively gets 
> shut down until it works for everybody.

Have fun. I trust you'll be fixing the other 11 I think it was listed
regressions before 2.6.24  - or backing out every changeset that could be
responsible ?

No I thought not - because that wouldn't be sensible either.

Alan

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-10  1:57           ` Linus Torvalds
@ 2007-12-10  3:28             ` Alan Cox
  2007-12-10  3:38             ` Alan Cox
  2007-12-10  8:21             ` Ingo Molnar
  2 siblings, 0 replies; 74+ messages in thread
From: Alan Cox @ 2007-12-10  3:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

It's your kernel. It's your call, and your privilege to be wrong.

And anyone with ATAPI problems should probably test the -mm tree before
reporting anything.

Alan

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 22:01         ` Alan Cox
  2007-12-09 22:51           ` Ray Lee
@ 2007-12-10  1:57           ` Linus Torvalds
  2007-12-10  3:28             ` Alan Cox
                               ` (2 more replies)
  1 sibling, 3 replies; 74+ messages in thread
From: Linus Torvalds @ 2007-12-10  1:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
> 
> The one off regression is probably not one off, but this is IDE so
> actually it's quite probable it's a single broken firmware.
>
> The alternative is that you cripple just about every user of various
> other standards compliant devices and controllers whose hardware we
> finally fixed.

Alan, you're so full of shit that it's not even funny.

Have you even *read* the thread?

Tejun already reported that this apparently gets fixed _properly_ with the 
more extensive cleanups and fixes that are pending for 2.6.25.

In other words, the stuff you call so critically important (yet we've been 
able to live without it until now!) is apparently simply NOT YET READY. 
It's breaking things.

In this case, Tejun seems to be right on the money.  I also agree 100% 
with him when he says

   "Blacklist takes time to develop and temporary blacklist for just one
    release doesn't sound like a good idea."

because if we create some blacklist for that one reported device, not only 
is it likely going to be wrong (it's almost never just one firmware or one 
chip that has a particular issue), but we tend to create these blacklists
and later realize that we shouldn't have blacklisted things at all, we 
should just have done things differently.

For examples of that, see the NCQ blacklist that was just _us_ doing 
things wrong (over-reacting to things we shouldn't care about), and 
there's currently another totally unrelated discussion on a very similar 
thing wrt libata and the ACPI startup commands for an unused controller 
port.

> Finally you need to remember that the 'regression' is caused by the fact
> we now do the _right_ thing both in terms of 'old IDE' and specs.

.. and what the hell does that matter? If the code doesn't work, it 
doesn't work, and you might as well point to some random scribblings done 
by a three-year-old on toilet paper rather than any "specs".

Real life matters more. Regressions matter more.

We apparently do have a full fix, but it seems to be too invasive for 
2.6.24, which means that the thing that currently DOES NOT WORK and 
causes regressions should be reverted, so that 2.6.24 is at least no worse 
than 2.6.23 (and all earlier kernels) in this respect.

And then we should just hope that the more complete fix that Tejun has 
doesn't cause any issues on its own. I would suggest that if you care so 
deeply about this issue, you press Fedora into putting Tejun's tree into 
Fedora testing, and get that thing tested out extensively.

So the fact is, we have a way forward, but we should *not* take steps 
backwards just because you want to push something out that isn't quite 
ready. We should revert the change that causes the current trouble, safe 
in the knowledge (or at least "strong hope") that we have a way forward 
that makes *both* 2.6.24 and 2.6.25 be continual improvements.

We used to allow regressions. It was really painful. It's hard to debug 
things when things sometimes break. It's much better to have a nice 
constant monotonic improvement.

It's better for users, but it's much better also for developers, even if 
you may be frustrated right now because some new code effectively gets 
shut down until it works for everybody.

		Linus

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 22:01         ` Alan Cox
@ 2007-12-09 22:51           ` Ray Lee
  2007-12-10  1:57           ` Linus Torvalds
  1 sibling, 0 replies; 74+ messages in thread
From: Ray Lee @ 2007-12-09 22:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Tejun Heo, Andrew Morton, Rafael J. Wysocki,
	LKML, Ingo Molnar

On Dec 9, 2007 2:01 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Btw, Alan, that "math" is total and utter BULLSH*T, and you should know
> > that.
>
> To blindly argue regressions are critical is sometimes (as in this case)
> to argue that "this freeway is no longer compatible with a horse and
> cart" means the freeway should be turned back into a dirt road.

Honest question: If you allow regressions, then how does one guarantee
forward progress? (If it were a finite set of systems, all within one
group's control, then the answer is simple: count how many work.
However, in this case we only have a statistical sampling available to
us.)

Ray

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 18:41       ` Linus Torvalds
@ 2007-12-09 22:01         ` Alan Cox
  2007-12-09 22:51           ` Ray Lee
  2007-12-10  1:57           ` Linus Torvalds
  0 siblings, 2 replies; 74+ messages in thread
From: Alan Cox @ 2007-12-09 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Btw, Alan, that "math" is total and utter BULLSH*T, and you should know 
> that.

The one off regression is probably not one off, but this is IDE so
actually it's quite probable it's a single broken firmware.

The alternative is that you cripple just about every user of various
other standards compliant devices and controllers whose hardware we
finally fixed.

Finally you need to remember that the 'regression' is caused by the fact
we now do the _right_ thing both in terms of 'old IDE' and specs.

Believe it or not I did actually think in quite some detail about this
case, and the relative probabilities, and go back and re-review the old
IDE code (whose behaviour we now follow) and the spec. I spend a
measurable amount of my time reviewing code and weighing risks,
regressions and progress for an enterprise Linux vendor, so it's something
I do every day of the week.

To blindly argue regressions are critical is sometimes (as in this case)
to argue that "this freeway is no longer compatible with a horse and
cart" means the freeway should be turned back into a dirt road.

The horse and cart happened to work by chance because the road was quiet
that day. We clearly need to add a horse & cart lane in the long term,
but for 2.6.24 it may well be the right thing to do just to blacklist
that specific drive back to old behaviour until we can tidy it more
nicely.

Alan

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 18:36       ` Linus Torvalds
@ 2007-12-09 21:54         ` Alan Cox
  0 siblings, 0 replies; 74+ messages in thread
From: Alan Cox @ 2007-12-09 21:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar

> Regressions are worse. It doesn't matter AT ALL if you think that it 
> breaks ten times more devices, if it's a regression and those devices 
> didn't work in the past, they simply DO NOT COUNT.

Must be time for an -ac tree again.

Alan

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:46         ` Tejun Heo
@ 2007-12-09 19:59           ` Andreas Mohr
  0 siblings, 0 replies; 74+ messages in thread
From: Andreas Mohr @ 2007-12-09 19:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andreas Mohr, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi

Hi,

On Mon, Dec 10, 2007 at 12:46:57AM +0900, Tejun Heo wrote:
> Please post full kernel boot log and the result of 'lspci -nn'.

Done, on #9530.

Will try some of the promising patches/suggestions now, hopefully this will
show me what's up. Will add further results there.

Andreas Mohr

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
  2007-12-09 18:36       ` Linus Torvalds
@ 2007-12-09 18:41       ` Linus Torvalds
  2007-12-09 22:01         ` Alan Cox
  2 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2007-12-09 18:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
> 
> Great, make everyone else wait another three months for a working CD
> drive. The one off regression appears far less harmful than a revert.

Btw, Alan, that "math" is total and utter BULLSH*T, and you should know 
that.

"The one off regression" is likely the tip of an iceberg. If something 
regresses for one person, for that one person who tested and noticed and 
made a bug-report, there's probably a thousand people who haven't even 
tested the development kernel, or who had problems and just went back to 
the previous version.

In contrast, reverting something will be guaranteed to not have those 
kinds of issues, since the only people who could notice are people for whom
it never worked in the first place. There's no "silent mass of people" 
that can be affected.

This is why regressions are so important. They don't trump _everything_, 
but basically ignoring and letting them slide is *much* more painful than 
just reverting it.

The biggest reason to ignore a regression is if nobody can even figure 
out where it came from, or reverting simply isn't an option for some 
really deep and fundamental issue. That doesn't seem to be the case here. 

So we should revert unless there is some known acceptable real fix.

		Linus


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
@ 2007-12-09 18:36       ` Linus Torvalds
  2007-12-09 21:54         ` Alan Cox
  2007-12-09 18:41       ` Linus Torvalds
  2 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2007-12-09 18:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar



On Sun, 9 Dec 2007, Alan Cox wrote:
>
> > If we fail to find out the solution in time, we always have the
> > alternative of backing out the ATAPI transfer chunk size update.  This
> 
> Which will break far more controllers and drives than it fixes, so
> backing it out is nonsensical and not in the general good.

No.

Regressions are worse. It doesn't matter AT ALL if you think that it 
breaks ten times more devices, if it's a regression and those devices 
didn't work in the past, they simply DO NOT COUNT.

				Linus

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:55       ` Andreas Mohr
@ 2007-12-09 15:46         ` Tejun Heo
  2007-12-09 19:59           ` Andreas Mohr
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-09 15:46 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Len Brown, linux-acpi

Andreas Mohr wrote:
> Hi,
> 
> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
>>> Does this report now win me the lucky draw, pretty please? ;)
>> nah, you have to cc the acpi guys to get a prize ;)
> 
> Thought so shortly, but missed it.
> 
>> Andreas, please do separately report that WOL problem too..
> 
> Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;)
> 
>> Our list just reached 30.
> 
> Oh, so this is in fact a separate issue? Wasn't sure, couldn't do
> enough analysis of similar cases.
> 
> Will test any (already submitted!) suggestions ASAP.

Please post full kernel boot log and the result of 'lspci -nn'.

Thanks.

-- 
tejun

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:25         ` Alan Cox
@ 2007-12-09 15:39           ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2007-12-09 15:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

Alan Cox wrote:
>> Newly broken ones will be regressions.  How many do we fix by the
>> change?  On SATA, setting the correct transfer chunk size doesn't seem
>> to fix many.
> 
> Regressions are not some kind of grand evil. Better to regress the odd
> device than continue to break entire controllers.

We need to put more weight on regressions as it at least makes releases
predictable to users.  Anyways, I wasn't saying it was some absolute
maxim.  I was literally asking how many so that we can evaluate the
trade off.

>>> Tejun - instead of backing out important updates for 2.6.24 we should
>>> just blacklist that specific drive for now and sort it nicely in 2.6.25,
>>> not revert stuff and break everyone else's ATAPI devices.
>> We'll need to blacklist setting transfer chunk size, eek, and let's
>> leave that as the last resort and hope that we find the solution soon.
>> Blacklist takes time to develop and temporary blacklist for just one
>> release doesn't sound like a good idea.
> 
> It seems to be sensible to me *if* it is just this one device we are
> somehow confusing and that one device is holding up fixing everything
> else.

Yeah, if it's this one device, I fully agree.  Let's see how debugging
turns out.

-- 
tejun

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 15:09       ` Tejun Heo
@ 2007-12-09 15:25         ` Alan Cox
  2007-12-09 15:39           ` Tejun Heo
  0 siblings, 1 reply; 74+ messages in thread
From: Alan Cox @ 2007-12-09 15:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

> Newly broken ones will be regressions.  How many do we fix by the
> change?  On SATA, setting the correct transfer chunk size doesn't seem
> to fix many.

Regressions are not some kind of grand evil. Better to regress the odd
device than continue to break entire controllers.

> > Tejun - instead of backing out important updates for 2.6.24 we should
> > just blacklist that specific drive for now and sort it nicely in 2.6.25,
> > not revert stuff and break everyone else's ATAPI devices.
> 
> We'll need to blacklist setting transfer chunk size, eek, and let's
> leave that as the last resort and hope that we find the solution soon.
> Blacklist takes time to develop and temporary blacklist for just one
> release doesn't sound like a good idea.

It seems to be sensible to me *if* it is just this one device we are
somehow confusing and that one device is holding up fixing everything
else.

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 14:20     ` Rafael J. Wysocki
@ 2007-12-09 15:11       ` Tejun Heo
  0 siblings, 0 replies; 74+ messages in thread
From: Tejun Heo @ 2007-12-09 15:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide

Rafael J. Wysocki wrote:
>>> If any machines _are_ breaking then this could cause real problems
>>> and I'd prefer that we either go for a whitelist or arrange to
>>> detect the condition and fall back to non-acpi ata.
>> The pending patchset should make ATA ACPI quite resistant to failures.
> 
> Are you going to push it for 2.6.24?

Yeah, I'm hoping so.  Maybe command filtering should wait till 2.6.25
but the rest, yeap.

-- 
tejun

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 13:42     ` Alan Cox
@ 2007-12-09 15:09       ` Tejun Heo
  2007-12-09 15:25         ` Alan Cox
  2007-12-09 18:36       ` Linus Torvalds
  2007-12-09 18:41       ` Linus Torvalds
  2 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-09 15:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

Hello, Alan.

Alan Cox wrote:
>> will break some other cases which were fixed by the change but those
>> won't be regressions at least and we can add transfer chunk size
>> update with other changes to 2.6.25.
> 
> Great, make everyone else wait another three months for a working CD
> drive. The one off regression appears far less harmful than a revert.

Newly broken ones will be regressions.  How many do we fix by the
change?  On SATA, setting the correct transfer chunk size doesn't seem
to fix many.

> Tejun - instead of backing out important updates for 2.6.24 we should
> just blacklist that specific drive for now and sort it nicely in 2.6.25,
> not revert stuff and break everyone else's ATAPI devices.

We'll need to blacklist setting transfer chunk size, eek, and let's
leave that as the last resort and hope that we find the solution soon.
Blacklist takes time to develop and temporary blacklist for just one
release doesn't sound like a good idea.

-- 
tejun

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 11:54 ` Andrew Morton
  2007-12-09 12:05   ` Ingo Molnar
@ 2007-12-09 14:24   ` Rafael J. Wysocki
  1 sibling, 0 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-09 14:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Linus Torvalds, Ingo Molnar

On Sunday, 9 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.
> 
> Here's one for you - I have a new Lenovo t61p with which to irritate
> everyone. 
> 
> suspend-to-ram is a wipeout, but suspend-to-disk works OK under
> 2.6.23.
> 
> However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of
> resume-from-disk.

It's http://bugzilla.kernel.org/show_bug.cgi?id=9258 , I think.

Does it do that if you unload ehci-hcd before the hibernation?

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  6:52   ` Tejun Heo
@ 2007-12-09 14:20     ` Rafael J. Wysocki
  2007-12-09 15:11       ` Tejun Heo
  0 siblings, 1 reply; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-09 14:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide

On Sunday, 9 of December 2007, Tejun Heo wrote:
> Hello,
> 
> Andrew Morton wrote:
> >> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> >> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> >> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> >> Handled-By	: Robert Moore <Robert.Moore@intel.com>
> >> 		  Tejun Heo <htejun@gmail.com>
> >> 		  Fu Michael <michael.fu@intel.com>
> >> Patch		: 
> >>
> > 
> > A number of other people are seeing the same thing and Tejun is
> > putting in a blacklist of machines which cannot use libata+acpi.
> > That patch is not yet in any git tree which I pull.
> > 
> > AFAICT the machines keep working OK - there's just some nasty dmesg
> > spew.
> > 
> > If any machines _are_ breaking then this could cause real problems
> > and I'd prefer that we either go for a whitelist or arrange to
> > detect the condition and fall back to non-acpi ata.
> 
> The pending patchset should make ATA ACPI quite resistant to failures.

Are you going to push it for 2.6.24?

> Known bad boards can be blacklisted (currently only one is on the
> list), ATA ACPI is disabled quicker if ACPI evaluation fails, execution
> errors are handled better and commands which are intended to help the
> vendor instead of the user are filtered.  So, I think we have enough
> safety nets.

Sounds good.  :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09  7:00   ` Tejun Heo
@ 2007-12-09 13:42     ` Alan Cox
  2007-12-09 15:09       ` Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Alan Cox @ 2007-12-09 13:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

> If we fail to find out the solution in time, we always have the
> alternative of backing out the ATAPI transfer chunk size update.  This

Which will break far more controllers and drives than it fixes, so
backing it out is nonsensical and not in the general good.

> will break some other cases which were fixed by the change but those
> won't be regressions at least and we can add transfer chunk size
> update with other changes to 2.6.25.

Great, make everyone else wait another three months for a working CD
drive. The one off regression appears far less harmful than a revert.

Tejun - instead of backing out important updates for 2.6.24 we should
just blacklist that specific drive for now and sort it nicely in 2.6.25,
not revert stuff and break everyone else's ATAPI devices.

Alan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-09 11:54 ` Andrew Morton
@ 2007-12-09 12:05   ` Ingo Molnar
  2007-12-09 14:24   ` Rafael J. Wysocki
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-09 12:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds


* Andrew Morton <akpm@linux-foundation.org> wrote:

> Am trying to do a git-bisect on it but it seems that someone has been 
> screwing with ata Kconfig and I'm hitting a pile of 
> cant-find-root-disk bisection points and I can't immediately work out 
> why.  I'll try to find time to look at it again next week.

the way i solve such bisection problems is to have the patch like below 
applied by a "git-bisect run" scriptlet (and popped off after the test). 
This way all must-have drivers and kernel features are selected for that 
particular testbox, no matter what Kconfig complications there are. 
(except outright config option renaming but those are rare)
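
A minimal sketch of such a "git-bisect run" scriptlet, in Python. It is
not the scriptlet referred to above: the patch path and the ./build.sh and
./boot-test.sh helpers are hypothetical placeholders. Only the exit-code
convention (0 = good, 125 = skip, other codes from 1 to 127 = bad) comes
from git-bisect itself.

```python
import subprocess

SKIP = 125  # "git bisect run" treats exit code 125 as "cannot test, skip"

# Hypothetical placeholders, to be adjusted for the testbox:
PATCH = "/tmp/kconfig-needed.patch"   # the must-have-options patch
BUILD_CMD = ["./build.sh"]            # configures and builds the kernel
TEST_CMD = ["./boot-test.sh"]         # boots it, exits 0 if the box comes up


def bisect_step(run=subprocess.call, patch=PATCH,
                build_cmd=BUILD_CMD, test_cmd=TEST_CMD):
    """Apply the patch, build, test, and pop the patch off again.

    Returns the exit code to hand back to "git bisect run".
    """
    if run(["git", "apply", patch]) != 0:
        return SKIP                   # patch does not apply at this commit
    try:
        if run(build_cmd) != 0:
            return SKIP               # unbuildable commit, not necessarily bad
        return 0 if run(test_cmd) == 0 else 1
    finally:
        # Pop the patch so git can check out the next bisection point.
        run(["git", "apply", "-R", patch])
```

Saved as an executable script whose last line is
raise SystemExit(bisect_step()), this could be driven with
"git bisect run ./bisect-step.py" after the usual bisect start/good/bad.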

	Ingo

Index: linux/arch/x86/Kconfig.needed
===================================================================
--- /dev/null
+++ linux/arch/x86/Kconfig.needed
@@ -0,0 +1,88 @@
+config FORCE_MINIMAL_CONFIG
+	bool
+	default y
+
+select EXPERIMENTAL
+
+select EXT3_FS
+select EXT3_FS_XATTR
+select EXT3_FS_POSIX_ACL
+select EXT3_FS_SECURITY
+select BLOCK
+select HOTPLUG
+#select INOTIFY
+#select INOTIFY_USER
+
+# so that capset() works (sudo, etc.):
+select SECURITY
+select SECURITY_CAPABILITIES
+
+select BINFMT_ELF
+select MSDOS_PARTITION
+select PARTITION_ADVANCED
+select BSD_DISKLABEL
+
+select SYSFS
+select SYSFS_DEPRECATED
+select PROC_FS
+select FUTEX
+
+select ATA
+select SATA_AHCI
+select ATA_PIIX
+select PATA_AMD
+select PATA_OLDPIIX
+select BLK_DEV_SD
+
+select E100
+select E1000
+select NET_ETHERNET
+select NET_PCI
+select MII
+select CRC32
+
+select 8139TOO
+select FORCEDETH
+
+select PACKET
+
+select NETPOLL
+select NETCONSOLE
+select NET_POLL_CONTROLLER
+select INET
+select NET
+select UNIX
+select NETDEVICES
+
+select SERIAL_8250
+select SERIAL_8250_CONSOLE
+select MAGIC_SYSRQ
+
+select INPUT
+select INPUT_MOUSEDEV
+select INPUT_POLLDEV
+select INPUT_KEYBOARD
+select KEYBOARD_ATKBD
+select SERIO
+select SERIO_I8042
+
+select VT
+select VT_CONSOLE
+select HW_CONSOLE
+select VGA_CONSOLE
+select EARLY_PRINTK
+select PRINTK
+select UNIX98_PTYS
+
+select USB
+select USB_MOUSE
+select USB_EHCI_HCD
+select USB_OHCI_HCD
+select USB_UHCI_HCD
+select USB_SUPPORT
+
+select PCI
+
+select STANDALONE
+select PREVENT_FIRMWARE_BUILD
+
Index: linux/lib/Kconfig
===================================================================
--- linux.orig/lib/Kconfig
+++ linux/lib/Kconfig
@@ -142,3 +142,6 @@ config CHECK_SIGNATURE
 	bool
 
 endmenu
+
+source "arch/x86/Kconfig.needed"
+
Index: linux/lib/Kconfig.debug

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2007-12-08 10:44 ` Richard Purdie
@ 2007-12-09 11:54 ` Andrew Morton
  2007-12-09 12:05   ` Ingo Molnar
  2007-12-09 14:24   ` Rafael J. Wysocki
  2007-12-10 20:42 ` Ingo Molnar
  8 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2007-12-09 11:54 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.

Here's one for you - I have a new Lenovo t61p with which to irritate
everyone. 

suspend-to-ram is a wipeout, but suspend-to-disk works OK under
2.6.23.

However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of
resume-from-disk.

Am trying to do a git-bisect on it but it seems that someone has been
screwing with ata Kconfig and I'm hitting a pile of cant-find-root-disk
bisection points and I can't immediately work out why.  I'll try to find
time to look at it again next week.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:52 ` Andrew Morton
@ 2007-12-09  7:00   ` Tejun Heo
  2007-12-09 13:42     ` Alan Cox
  0 siblings, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-09  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, Alan Cox

Hello, (cc'ing Alan)

Andrew Morton wrote:
>> Subject		: cd/dvd inaccessible in 2.6.24-rc2
>> Submitter	: Will Trives <will@trivescon.com.au>
>> References	: http://lkml.org/lkml/2007/11/9/290
>> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
>> Handled-By	: Len Brown <lenb@kernel.org>
>> 		  Tejun Heo <htejun@gmail.com>
>> Patch		: 
>>
> 
> Nasty one.  Tejun and several diligent reporters are doing sterling
> work there and things have improved.  I don't know whether any of
> Tejun's patches have been merged yet, but we'll probably be OK on
> this one.

I'm still trying to find out what's really going on.  That drive is
quite peculiar.

> What is unclear (to me) is what actually caused those people's machines to
> break?

It's introduced by setting ATAPI transfer chunk size to actual
transfer size which is the right thing to do generally.  However, with
the change, the ATAPI HSM should be ready to drain full extra transfer
chunks which libata HSM wasn't doing.  With that part changed, most
regressions should go away.

Unfortunately, simply adding that doesn't fix the case in bug 9346 and
I'm still trying to find out why.  The good news is that the drive
works fine with proposed more extensive improvements to libata ATAPI
which will probably be included in 2.6.25, so we at least have a
long-term solution.

If we fail to find out the solution in time, we always have the
alternative of backing out the ATAPI transfer chunk size update.  This
will break some other cases which were fixed by the change but those
won't be regressions at least and we can add transfer chunk size
update with other changes to 2.6.25.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:36 ` Andrew Morton
  2007-12-08 10:12   ` Andreas Mohr
@ 2007-12-09  6:52   ` Tejun Heo
  2007-12-09 14:20     ` Rafael J. Wysocki
  1 sibling, 1 reply; 74+ messages in thread
From: Tejun Heo @ 2007-12-09  6:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide

Hello,

Andrew Morton wrote:
>> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
>> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
>> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
>> Handled-By	: Robert Moore <Robert.Moore@intel.com>
>> 		  Tejun Heo <htejun@gmail.com>
>> 		  Fu Michael <michael.fu@intel.com>
>> Patch		: 
>>
> 
> A number of other people are seeing the same thing and Tejun is
> putting in a blacklist of machines which cannot use libata+acpi.
> That patch is not yet in any git tree which I pull.
> 
> AFAICT the machines keep working OK - there's just some nasty dmesg
> spew.
> 
> If any machines _are_ breaking then this could cause real problems
> and I'd prefer that we either go for a whitelist or arrange to
> detect the condition and fall back to non-acpi ata.

The pending patchset should make ATA ACPI quite resistant to failures.
Known bad boards can be blacklisted (currently only one is on the
list), ATA ACPI is disabled quicker if ACPI evaluation fails, execution
errors are handled better and commands which are intended to help the
vendor instead of the user are filtered.  So, I think we have enough
safety nets.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 22:30     ` Rafael J. Wysocki
@ 2007-12-09  2:15       ` Theodore Tso
  2007-12-13 10:49         ` Takashi Iwai
  0 siblings, 1 reply; 74+ messages in thread
From: Theodore Tso @ 2007-12-09  2:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier,
	Takashi Iwai

On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote:
> On Saturday, 8 of December 2007, Theodore Tso wrote:
> > However, as far as I am concerned, Ingo's patch, first posted to LKML
> > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> > the above regression.  Rafael, could you please make a note of this in
> > your regression list,
> 
> Done, thanks.

Great, thanks.  I should add that technically this wasn't a regression
since I had been seeing this since before 2.6.23.  Also, it isn't a
big deal, since aside from noise in the syslog, falling back to
polling mode doesn't make any functional or user-visible difference
(although I guess it's less efficient).  

Regardless of whether it is a regression, it would be nice to get the
patch applied and this issue fixed for 2.6.25!

					- Ted

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:44 ` Richard Purdie
@ 2007-12-08 22:32   ` Rafael J. Wysocki
  0 siblings, 0 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:32 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

On Saturday, 8 of December 2007, Richard Purdie wrote:
> On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote:
> > Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> > Submitter	: Márton Németh <nm127@freemail.hu>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> > Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view
> 
> The fix is now in mainline:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65

Yes, already dropped.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 19:40   ` Theodore Tso
  2007-12-08 19:55     ` Ingo Molnar
@ 2007-12-08 22:30     ` Rafael J. Wysocki
  2007-12-09  2:15       ` Theodore Tso
  1 sibling, 1 reply; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:30 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier,
	Takashi Iwai

On Saturday, 8 of December 2007, Theodore Tso wrote:
> On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote:
> > > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> > > Submitter	: Roland Dreier <rdreier@cisco.com>
> > > References	: http://lkml.org/lkml/2007/11/8/255
> > > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> > > Handled-By	: 
> > > Patch		: 
> > 
> > Takashi had a patch and that has been merged.  AFAIK this regression
> > has been fixed and we're left with a new but harmless warning.
> > 
> > However Roland reported other problems and it appears that the trail went
> > cold (http://lkml.org/lkml/2007/11/14/251)
> > 
> > Ted was hitting some of the same problems but that trail appears to also
> > have gone cold (http://lkml.org/lkml/2007/11/23/17).
> 
> Actually, not gone cold, but I stopped posting about it because it's
> been solved and I thought agreement had been reached that it should be
> pushed to mainline before 2.6.25.
> 
> I am very happily running with Ingo's "snd hda suspend latency:
> shorten codec read" patch, which was originally intended to speed up
> resuming from hibernation, but which as I discovered, also has the
> nice side effect of eliminating the reported error.  
> 
> On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17)
> and suggested that Jaroslav push this patch to Linus immediately
> instead of waiting for 2.6.25, since it apparently solves two problems
> with one stone.  However, I just checked Linus's public tree, and
> Ingo's patch is *not* in mainline.
> 
> However, as far as I am concerned, Ingo's patch, first posted to LKML
> here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
> the above regression.  Rafael, could you please make a note of this in
> your regression list,

Done, thanks.

> and could we please get this patch pushed into mainline?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:29 ` Andrew Morton
@ 2007-12-08 22:17   ` Rafael J. Wysocki
  0 siblings, 0 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.  If any of them have been fixed already, please let me
> > know.
> > 
> > If you know of any other unresolved regressions from 2.6.23, please let me know
> > as well and I'll add them to the list.
> 
> Twenty nine, huh?
> 
> It would be useful if these records were sorted in date-of-reportage order
> and had a date stamp so we could see how long they've been hanging about.

They are sorted by the bugzilla number which reflects the date-of-reportage
order pretty well.  For a technical reason, it's easier for me if they're sorted
like this.

Adding date stamps should be easy, though; I'll try to add them to the next
report.
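
For illustration only, sorting by bugzilla number could be sketched as
below. The entry layout is the Subject/References format used in this
list; the helper names are hypothetical, and entries that carry no
bugzilla link are simply kept at the end in their original order.

```python
import re

# Matches the bug number in URLs such as
# http://bugzilla.kernel.org/show_bug.cgi?id=9264
BUG_ID = re.compile(r"show_bug\.cgi\?id=(\d+)")


def bug_id(entry):
    """Return the first bugzilla number found in an entry, or None."""
    match = BUG_ID.search(entry)
    return int(match.group(1)) if match else None


def sort_entries(entries):
    """Sort entries by bugzilla number; entries without one go last.

    sorted() is stable, so entries without a number keep their order.
    """
    return sorted(entries, key=lambda e: (bug_id(e) is None, bug_id(e) or 0))
```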

> Something to think about for the post-2.6.24 regression if you'll be handling
> those?

Yes, I'm going to handle the post-2.6.24 regressions too (in the hope there
will be less of them ;-)).

> > Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> > Submitter	: Márton Németh <nm127@freemail.hu>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> > Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view
> 
> That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
> assuming Márton has tested the latest git snapshot
> (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
> cross it off?

Yes, will drop.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:23     ` Andrew Morton
@ 2007-12-08 22:11       ` Rafael J. Wysocki
  0 siblings, 0 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08 22:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Fabio Comolli, LKML, Linus Torvalds, Greg KH,
	Len Brown, Alexey Starikovskiy

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> 
> > 
> > * Fabio Comolli <fabio.comolli@gmail.com> wrote:
> > 
> > > <snip>
> > > 
> > > > Subject         : Battery shows up twice in kpowersave
> > > > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > > Patch           :
> > > >
> > > 
> > > I don't think that this is a regression: I reported on RedHat bugzilla 
> > > when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> > > It looks to me like a HAL regression, but of course I may be wrong :-) as 
> > > the reporter bisected to a bad commit.
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> > > 
> > > By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5 
> > > custom kernel and Gnome desktop and the problem is still present, even 
> > > with gnome-power-manager.
> > 
> > to me this looks like an ABI regression - utilities should work without 
> > change. Something changed in /sys output that caused HAL to think that 
> > there are two batteries:
> 
> Yep.  Although HAL is of course a most special case of "userspace".
> 
> > | The output of lshal shows that there are two UDI's with 
> > | info.capabilities = { 'battery' }:
> > |
> > | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> > | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
> > 
> > whether it's a HAL bug or a kernel bug, the original state should be 
> > restored and it should be worked out without breaking users of older HAL 
> > versions.
> 
> "breaking users of older HAL versions" == "breaking machines".
> 
> The patch should be reverted.  Do we know which one it was?
> 
> > grumble: way too many times do various system utilities break when i 
> > upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
> > start fingerprinting the exact /sys and /proc output and enforce that 
> > it's immutable across kernel releases as long as the hardware is 
> > unmodified?
> 
> That would be neat.  It would need to be executed on a lot of different
> machines.

Hm, that wouldn't allow us to add new attributes ...
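
One possible way around that, as a rough sketch: fingerprint only the names
found under a tree such as /sys, treat an entry that disappears as a
failure, and merely report additions. The function names below are made up
for illustration. Comparing names only would not catch a changed value
format, and a new entry like the second battery discussed above would only
show up in the "added" report, not as a failure.

```python
import os


def fingerprint(root):
    """Return the set of paths (relative to root) of every entry under root.

    os.walk() does not follow symlinked directories by default, which
    avoids looping through the many links a tree like /sys contains.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            seen.add(os.path.relpath(os.path.join(dirpath, name), root))
    return seen


def compare(old, new):
    """Report which entries vanished and which appeared between two runs."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def abi_ok(old, new):
    """New entries are tolerated; removing an existing one is not."""
    return not (old - new)
```

The fingerprint of the old kernel would be stored (for example one path
per line) and compared against the fingerprint taken after booting the new
kernel on the same, unmodified hardware.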

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 19:40   ` Theodore Tso
@ 2007-12-08 19:55     ` Ingo Molnar
  2007-12-08 22:30     ` Rafael J. Wysocki
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2007-12-08 19:55 UTC (permalink / raw)
  To: Theodore Tso, Andrew Morton, Rafael J. Wysocki, LKML,
	Linus Torvalds, Roland Dreier, Takashi Iwai


* Theodore Tso <tytso@mit.edu> wrote:

> I am very happily running with Ingo's "snd hda suspend latency: 
> shorten codec read" patch, which was originally intended to speed up 
> resuming from hibernation, but which as I discovered, also has the 
> nice side effect of eliminating the reported error.
> 
> On 11/23, Takashi replied to my note 
> (http://lkml.org/lkml/2007/11/23/17) and suggested that Jaroslav push 
> this patch to Linus immediately instead of waiting for 2.6.25, since 
> it apparently solves two problems with one stone.  However, I just 
> checked Linus's public tree, and Ingo's patch is *not* in mainline.
> 
> However, as far as I am concerned, Ingo's patch, first posted to LKML 
> here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing 
> the above regression.  Rafael, could you please make a note of this in 
> your regression list, and could we please get this patch pushed into 
> mainline?

ha! I'd never have expected _that_ to happen. Cool. Fixing a driver bug 
by accident :-)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:42 ` Andrew Morton
  2007-12-08 18:57   ` Roland Dreier
@ 2007-12-08 19:40   ` Theodore Tso
  2007-12-08 19:55     ` Ingo Molnar
  2007-12-08 22:30     ` Rafael J. Wysocki
  1 sibling, 2 replies; 74+ messages in thread
From: Theodore Tso @ 2007-12-08 19:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar,
	Roland Dreier, Takashi Iwai

On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote:
> > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> > Submitter	: Roland Dreier <rdreier@cisco.com>
> > References	: http://lkml.org/lkml/2007/11/8/255
> > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> > Handled-By	: 
> > Patch		: 
> 
> Takashi had a patch and that has been merged.  AFAIK this regression
> has been fixed and we're left with a new but harmless warning.
> 
> However Roland reported other problems and it appears that the trail went
> cold (http://lkml.org/lkml/2007/11/14/251)
> 
> Ted was hitting some of the same problems but that trail appears to also
> have gone cold (http://lkml.org/lkml/2007/11/23/17).

Actually, not gone cold, but I stopped posting about it because it's
been solved and I thought agreement had been reached that it should be
pushed to mainline before 2.6.25.

I am very happily running with Ingo's "snd hda suspend latency:
shorten codec read" patch, which was originally intended to speed up
resuming from hibernation, but which as I discovered, also has the
nice side effect of eliminating the reported error.  

On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17)
and suggested that Jaroslav push this patch to Linus immediately
instead of waiting for 2.6.25, since it apparently solves two problems
with one stone.  However, I just checked Linus's public tree, and
Ingo's patch is *not* in mainline.

However, as far as I am concerned, Ingo's patch, first posted to LKML
here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing
the above regression.  Rafael, could you please make a note of this in
your regression list, and could we please get this patch pushed into
mainline?

Thanks!!

						- Ted

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:42 ` Andrew Morton
@ 2007-12-08 18:57   ` Roland Dreier
  2007-12-08 19:40   ` Theodore Tso
  1 sibling, 0 replies; 74+ messages in thread
From: Roland Dreier @ 2007-12-08 18:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar,
	Roland Dreier, Takashi Iwai, Theodore Ts'o

 > > Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
 > > Submitter	: Roland Dreier <rdreier@cisco.com>
 > > References	: http://lkml.org/lkml/2007/11/8/255
 > > 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
 > > Handled-By	: 
 > > Patch		: 
 > 
 > Takashi had a patch and that has been merged.  AFAIK this regression
 > has been fixed and we're left with a new but harmless warning.
 > 
 > However Roland reported other problems and it appears that the trail went
 > cold (http://lkml.org/lkml/2007/11/14/251)

A fix for the most likely cause of this problem was merged (7eba5c9d
"[ALSA] hda-codec - Check PINCAP only for PIN widgets") but it seems
that setting CONFIG_SND_HDA_POWER_SAVE can cause the "azx_get_response
timeout, switching to polling mode" message sometimes too.  However
according to Takashi this is really just a cosmetic problem -- polling
mode is not so bad.

 - R.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:46 ` Andrew Morton
@ 2007-12-08 15:49   ` Alan Stern
  0 siblings, 0 replies; 74+ messages in thread
From: Alan Stern @ 2007-12-08 15:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar

On Sat, 8 Dec 2007, Andrew Morton wrote:

> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of.  If any of them have been fixed already, please let me
> > know.
> > 
> > If you know of any other unresolved regressions from 2.6.23, please let me know
> > as well and I'll add them to the list.
> > 
> > 
> > ..
> >
> > Subject		: system hangs after a few minutes
> > Submitter	: Marcus Better <marcus@better.se>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
> > Handled-By	: Andrew Morton <akpm@linux-foundation.org>
> > 		  Alan Stern <stern@rowland.harvard.edu>
> > Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view
> > 
> 
> This one we have a confirmed fix from Alan but it doesn't appear to be in
> anyone's tree.

An expanded version of that fix is in Greg's queue:

http://marc.info/?l=linux-usb-devel&m=119697043410947&w=2

Since he's away until Tuesday, nothing will happen for a few days.  
However you might want to replace the old fix that got added to -mm.

> There is a second bug in here, applicable to core x86: Marcus's machine
> won't boot with nmi_watchdog=1.

Alan Stern


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:20     ` Andrew Morton
  2007-12-08 10:28       ` Matthew Garrett
@ 2007-12-08 10:55       ` Andreas Mohr
  2007-12-09 15:46         ` Tejun Heo
  1 sibling, 1 reply; 74+ messages in thread
From: Andreas Mohr @ 2007-12-08 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi

Hi,

On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
> > Does this report now win me the lucky draw, pretty please? ;)
> 
> nah, you have to cc the acpi guys to get a prize ;)

Thought so shortly, but missed it.

> Andreas, please do separately report that WOL problem too..

Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;)

> Our list just reached 30.

Oh, so this is in fact a separate issue? Wasn't sure, couldn't do
enough analysis of similar cases.

Will test any (already submitted!) suggestions ASAP.

Andreas Mohr

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2007-12-08  9:52 ` Andrew Morton
@ 2007-12-08 10:44 ` Richard Purdie
  2007-12-08 22:32   ` Rafael J. Wysocki
  2007-12-09 11:54 ` Andrew Morton
  2007-12-10 20:42 ` Ingo Molnar
  8 siblings, 1 reply; 74+ messages in thread
From: Richard Purdie @ 2007-12-08 10:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote:
> Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> Submitter	: Márton Németh <nm127@freemail.hu>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view

The fix is now in mainline:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65

Cheers,

Richard




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:20     ` Andrew Morton
@ 2007-12-08 10:28       ` Matthew Garrett
  2007-12-08 10:55       ` Andreas Mohr
  1 sibling, 0 replies; 74+ messages in thread
From: Matthew Garrett @ 2007-12-08 10:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds,
	Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi

On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote:
> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:
> > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> > ata1.01: _GTF evaluation failed (AE 0x300d)

037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this?

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08 10:12   ` Andreas Mohr
@ 2007-12-08 10:20     ` Andrew Morton
  2007-12-08 10:28       ` Matthew Garrett
  2007-12-08 10:55       ` Andreas Mohr
  0 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2007-12-08 10:20 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo, Len Brown, linux-acpi

On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote:

> Hi,
> 
> On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote:
> > > Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> > > Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> > > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> > > Handled-By	: Robert Moore <Robert.Moore@intel.com>
> > > 		  Tejun Heo <htejun@gmail.com>
> > > 		  Fu Michael <michael.fu@intel.com>
> > > Patch		: 
> > > 
> > 
> > A number of other people are seeing the same thing and Tejun is putting in
> > a blacklist of machines which cannot use libata+acpi.  That patch is not
> > yet in any git tree which I pull.
> > 
> > > AFAICT the machines keep working OK - there's just some nasty dmesg spew.
> > 
> > If any machines _are_ breaking then this could cause real problems and I'd
> > prefer that we either go for a whitelist or arrange to detect the condition
> > and fall back to non-acpi ata.
> 
> Does this report now win me the lucky draw, pretty please? ;)

nah, you have to cc the acpi guys to get a prize ;)

Len&co, could you please take a look?

Andreas, please do separately report that WOL problem too..

Our list just reached 30.

> STD regression rc1 -> rc234, suspend fails completely, recovering is
> pretty much useless since HDD is DEAD from this point on anyway.
> Managed to capture -rc2 suspend logging via still-alive ssh session.
> 
> 2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to
> well-recovered AML failure):
> 
> swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
> swsusp: Basic memory bitmaps created
> Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.00 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> Shrinking memory... done (0 pages freed)
> Freed 0 kbytes in 0.02 seconds (0.00 MB/s)
> Suspending console(s)
> hub 4-0:1.0: hub_suspend
> usb usb4: bus suspend
> ehci_hcd 0000:00:10.3: suspend root hub
> hub 3-0:1.0: hub_suspend
> usb usb3: bus suspend
> usb usb3: suspend_rh
> hub 2-0:1.0: hub_suspend
> usb usb2: bus suspend
> usb usb2: suspend_rh
> hub 1-0:1.0: hub_suspend
> usb usb1: bus suspend
> usb usb1: suspend_rh
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> parport_pc 00:09: disabled
> serial 00:08: disabled
> serial 00:07: disabled
> ACPI: PCI interrupt for device 0000:00:11.5 disabled
> ACPI handle has no context!
> ACPI: PCI interrupt for device 0000:00:11.1 disabled
> ACPI: PCI interrupt for device 0000:00:10.3 disabled
> ehci_hcd 0000:00:10.3: --> PCI D3/wakeup
> uhci_hcd 0000:00:10.2: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.2 disabled
> uhci_hcd 0000:00:10.2: --> PCI D3
> uhci_hcd 0000:00:10.1: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.1 disabled
> uhci_hcd 0000:00:10.1: --> PCI D3
> uhci_hcd 0000:00:10.0: uhci_suspend
> ACPI: PCI interrupt for device 0000:00:10.0 disabled
> uhci_hcd 0000:00:10.0: --> PCI D3
> ACPI: PCI interrupt for device 0000:00:0d.0 disabled
> ACPI handle has no context!
> ACPI: PCI interrupt for device 0000:00:0c.0 disabled
> ACPI handle has no context!
> pci_set_power_state(): 0000:00:00.0: state=3, current state=5
> swsusp: critical section:
> swsusp: Need to copy 51195 pages
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> evxfevnt-0079 [00] enable                : System is already in ACPI mode
> ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
> ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
> ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
> ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
> evxfevnt-0079 [00] enable                : System is already in ACPI mode
> ACPI: Unable to turn cooling device [c180ff60] 'off'
> PCI: Setting latency timer of device 0000:00:01.0 to 64
> ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19]  MMIO=[db140000-db1407ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
> ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18
> e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex
> PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003)
> ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22
> uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.0: uhci_resume
> uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.0: Performing full reset
> usb usb1: root hub lost power or was reset
> usb usb1: suspend_rh
> uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.1: uhci_resume
> uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.1: Performing full reset
> usb usb2: root hub lost power or was reset
> usb usb2: suspend_rh
> uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> uhci_hcd 0000:00:10.2: uhci_resume
> uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000
> uhci_hcd 0000:00:10.2: Performing full reset
> usb usb3: root hub lost power or was reset
> usb usb3: suspend_rh
> ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3
> ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
> PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010)
> PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017)
> PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007)
> ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17
> ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
> PCI: Setting latency timer of device 0000:00:11.5 to 64
> ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> serial 00:07: activated
> serial 00:08: activated
> parport_pc 00:09: activated
> i8042 aux 00:0a: activation failed
> i8042 kbd 00:0b: activation failed
> sd 0:0:0:0: [sda] Starting disk
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> ata1.01: _GTF evaluation failed (AE 0x300d)
> ata1.01: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
> ata1.01: _GTF evaluation failed (AE 0x300d)
> ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5)
> ata1.01: revalidation failed (errno=1)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
> ata1.00: _GTF evaluation failed (AE 0x300d)
> ata1.00: revalidation failed (errno=-5)
> ata1: failed to recover some devices, retrying in 5 secs
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
> ata1.00: _GTF evaluation failed (AE 0x300d)
> ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
> ata1.00: revalidation failed (errno=1)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1.00: configured for UDMA/100
> ata1.01: configured for UDMA/33
> sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> usb usb1: usb resume
> usb usb1: wakeup_rh
> hub 1-0:1.0: trying to enable port power on non-switchable hub
> usb usb2: usb resume
> usb usb2: wakeup_rh
> hub 2-0:1.0: trying to enable port power on non-switchable hub
> usb usb3: usb resume
> usb usb3: wakeup_rh
> hub 3-0:1.0: trying to enable port power on non-switchable hub
> usb usb4: usb resume
> ehci_hcd 0000:00:10.3: resume root hub
> hub 4-0:1.0: hub_resume
> Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.0: port 1 portsc 018a,00
> hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
> done.
> swsusp: Basic memory bitmaps freed
> hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
> uhci_hcd 0000:00:10.0: port 2 portsc 008a,00
> hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.1: port 1 portsc 018a,00
> hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
> hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
> uhci_hcd 0000:00:10.1: port 2 portsc 008a,00
> hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006
> uhci_hcd 0000:00:10.2: port 1 portsc 008a,00
> hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
> hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
> uhci_hcd 0000:00:10.2: port 2 portsc 008a,00
> hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
> hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
> hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000
> hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
> hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
> hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
> usb usb1: suspend_rh (auto-stop)
> usb usb2: suspend_rh (auto-stop)
> usb usb3: suspend_rh (auto-stop)
> agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
> agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
> agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
> [drm] Loading R200 Microcode
> 
> 
> 
> 2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL:
> 
> serial 00:07: disabled
> ACPI: PCI interrupt for device 0000:00:11.5 disabled
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node c180b9a8), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT
> ata2: ACPI get timing mode failed (AE 0x300d)
> pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22
> suspend_device(): pci_device_suspend+0x0/0x70() returns -22
> Could not suspend device 0000:00:11.1: error -22
> ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
> ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> serial 00:07: activated
> serial 00:08: activated
> parport_pc 00:09: activated
> i8042 aux 00:0a: activation failed
> i8042 kbd 00:0b: activation failed
> sd 0:0:0:0: [sda] Starting disk
> sd 0:0:0:0: timing out command, waited 180s
> sd 0:0:0:0: [sda] START_STOP FAILED
> sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
> Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000
> done.
> swsusp: Basic memory bitmaps freed
> swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
> swsusp: Basic memory bitmaps created
> Syncing filesystems ...
> 
> 
> 
> # lspci
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
> 00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
> 00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe)
> 00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
> 00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10)
> 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
> 00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
> 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
> 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
> 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
> 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01)
> 01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01)
> 
> 
> # dmidecode 2.9
> SMBIOS 2.2 present.
> 39 structures occupying 1035 bytes.
> Table at 0x000F0800.
> 
> Handle 0x0000, DMI type 0, 19 bytes
> BIOS Information
>         Vendor: Award Software International, Inc.
>         Version: 6.00 PG
>         Release Date: 09/16/2003
>         Address: 0xE0000
>         Runtime Size: 128 kB
>         ROM Size: 512 kB
>         Characteristics:
>                 ISA is supported
>                 PCI is supported
>                 PNP is supported
>                 APM is supported
>                 BIOS is upgradeable
>                 BIOS shadowing is allowed
>                 ESCD support is available
>                 Boot from CD is supported
>                 Selectable boot is supported
>                 BIOS ROM is socketed
>                 EDD is supported
>                 5.25"/360 KB floppy services are supported (int 13h)
>                 5.25"/1.2 MB floppy services are supported (int 13h)
>                 3.5"/720 KB floppy services are supported (int 13h)
>                 3.5"/2.88 MB floppy services are supported (int 13h)
>                 Print screen service is supported (int 5h)
>                 8042 keyboard services are supported (int 9h)
>                 Serial services are supported (int 14h)
>                 Printer services are supported (int 17h)
>                 CGA/mono video services are supported (int 10h)
>                 ACPI is supported
>                 USB legacy is supported
>                 AGP is supported
>                 LS-120 boot is supported
>                 ATAPI Zip drive boot is supported
> 
> Handle 0x0001, DMI type 1, 25 bytes
> System Information
>         Manufacturer: VIA Technologies, Inc.
>         Product Name: VT8367-8235
>         Version:
>         Serial Number:
>         UUID: Not Present
>         Wake-up Type: Power Switch
> 
> Handle 0x0002, DMI type 2, 8 bytes
> Base Board Information
>         Manufacturer:
>         Product Name: VT8367-8235
>         Version:
>         Serial Number:
> 
> Handle 0x0003, DMI type 3, 13 bytes
> Chassis Information
>         Manufacturer:
>         Type: Desktop
>         Lock: Not Present
>         Version:
>         Serial Number:
>         Asset Tag:
>         Boot-up State: Unknown
>         Power Supply State: Unknown
>         Thermal State: Unknown
>         Security Status: Unknown
> 
> Handle 0x0004, DMI type 4, 32 bytes
> Processor Information
>         Socket Designation: Socket A
>         Type: Central Processor
>         Family: Duron
>         Manufacturer: AMD
>         ID: 81 06 00 00 FF FB 83 03
>         Signature: Family 6, Model 8, Stepping 1
>         Flags:
>                 FPU (Floating-point unit on-chip)
>                 VME (Virtual mode extension)
>                 DE (Debugging extension)
>                 PSE (Page size extension)
>                 TSC (Time stamp counter)
>                 MSR (Model specific registers)
>                 PAE (Physical address extension)
>                 MCE (Machine check exception)
>                 CX8 (CMPXCHG8 instruction supported)
>                 APIC (On-chip APIC hardware supported)
>                 SEP (Fast system call)
>                 MTRR (Memory type range registers)
>                 PGE (Page global enable)
>                 MCA (Machine check architecture)
>                 CMOV (Conditional move instruction supported)
>                 PAT (Page attribute table)
>                 PSE-36 (36-bit page size extension)
>                 MMX (MMX technology supported)
>                 FXSR (Fast floating-point save and restore)
>                 SSE (Streaming SIMD extensions)
>         Version: AMD K7 processor
>         Voltage: 3.3 V
>         External Clock: 133 MHz
>         Max Speed: 1500 MHz
>         Current Speed: 1200 MHz
>         Status: Populated, Enabled
>         Upgrade: ZIF Socket
>         L1 Cache Handle: 0x000A
>         L2 Cache Handle: 0x000B
>         L3 Cache Handle: No L3 Cache
> 
> Handle 0x0005, DMI type 5, 24 bytes
> Memory Controller Information
>         Error Detecting Method: None
>         Error Correcting Capabilities:
>                 None
>         Supported Interleave: One-way Interleave
>         Current Interleave: Four-way Interleave
>         Maximum Memory Module Size: 32 MB
>         Maximum Total Memory Size: 128 MB
>         Supported Speeds:
>                 70 ns
>                 60 ns
>         Supported Memory Types:
>                 Standard
>                 EDO
>         Memory Module Voltage: 5.0 V
>         Associated Memory Slots: 4
>                 0x0006
>                 0x0007
>                 0x0008
>                 0x0009
>         Enabled Error Correcting Capabilities: None
> 
> .
> .
> .
> 
> 
> # hdparm -i /dev/sda
> 
> /dev/sda:
> 
>  Model=WDC WD1200JB-00CRA1                     , FwRev=17.07W17, SerialNo=WD-WCA8C4285629
>  Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
>  RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
>  BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?
>  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
>  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes:  pio0 pio1 pio2 pio3 pio4
>  DMA modes:  mdma0 mdma1 mdma2
>  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
>  AdvancedPM=no WriteCache=enabled
>  Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5
> 
>  * signifies the current active mode
> 
> 
> 
> Athlon on EPOX 8K5A2+ board.
> 
> 
> 
> Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL.
> 
> Probably won't be able to do any reporting over the weekend (WOL is
> inoperable ATM for some weird reason), let me know what you need.
> Took too much time to gather this report already anyway ;)
> 
> Thanks,
> 
> Andreas Mohr


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  9:36 ` Andrew Morton
@ 2007-12-08 10:12   ` Andreas Mohr
  2007-12-08 10:20     ` Andrew Morton
  2007-12-09  6:52   ` Tejun Heo
  1 sibling, 1 reply; 74+ messages in thread
From: Andreas Mohr @ 2007-12-08 10:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide,
	Tejun Heo

Hi,

On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote:
> > Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> > Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> > References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> > Handled-By	: Robert Moore <Robert.Moore@intel.com>
> > 		  Tejun Heo <htejun@gmail.com>
> > 		  Fu Michael <michael.fu@intel.com>
> > Patch		: 
> > 
> 
> A number of other people are seeing the same thing and Tejun is putting in
> a blacklist of machines which cannot use libata+acpi.  That patch is not
> yet in any git tree which I pull.
> 
> AFAICT the machines keep working OK - there's just some nasty dmesg spew.
> 
> If any machines _are_ breaking then this could cause real problems and I'd
> prefer that we either go for a whitelist or arrange to detect the condition
> and fall back to non-acpi ata.

Does this report now win me the lucky draw, pretty please? ;)

STD (suspend-to-disk) regression from -rc1 to -rc2/-rc3/-rc4: suspend fails
completely, and recovering is pretty much useless since the HDD is DEAD
from this point on anyway.
Managed to capture -rc2 suspend logging via still-alive ssh session.

2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to
well-recovered AML failure):

swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Shrinking memory... done (0 pages freed)
Freed 0 kbytes in 0.02 seconds (0.00 MB/s)
Suspending console(s)
hub 4-0:1.0: hub_suspend
usb usb4: bus suspend
ehci_hcd 0000:00:10.3: suspend root hub
hub 3-0:1.0: hub_suspend
usb usb3: bus suspend
usb usb3: suspend_rh
hub 2-0:1.0: hub_suspend
usb usb2: bus suspend
usb usb2: suspend_rh
hub 1-0:1.0: hub_suspend
usb usb1: bus suspend
usb usb1: suspend_rh
sd 0:0:0:0: [sda] Synchronizing SCSI cache
parport_pc 00:09: disabled
serial 00:08: disabled
serial 00:07: disabled
ACPI: PCI interrupt for device 0000:00:11.5 disabled
ACPI handle has no context!
ACPI: PCI interrupt for device 0000:00:11.1 disabled
ACPI: PCI interrupt for device 0000:00:10.3 disabled
ehci_hcd 0000:00:10.3: --> PCI D3/wakeup
uhci_hcd 0000:00:10.2: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.2 disabled
uhci_hcd 0000:00:10.2: --> PCI D3
uhci_hcd 0000:00:10.1: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.1 disabled
uhci_hcd 0000:00:10.1: --> PCI D3
uhci_hcd 0000:00:10.0: uhci_suspend
ACPI: PCI interrupt for device 0000:00:10.0 disabled
uhci_hcd 0000:00:10.0: --> PCI D3
ACPI: PCI interrupt for device 0000:00:0d.0 disabled
ACPI handle has no context!
ACPI: PCI interrupt for device 0000:00:0c.0 disabled
ACPI handle has no context!
pci_set_power_state(): 0000:00:00.0: state=3, current state=5
swsusp: critical section:
swsusp: Need to copy 51195 pages
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
evxfevnt-0079 [00] enable                : System is already in ACPI mode
ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20
ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21
ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22
ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23
evxfevnt-0079 [00] enable                : System is already in ACPI mode
ACPI: Unable to turn cooling device [c180ff60] 'off'
PCI: Setting latency timer of device 0000:00:01.0 to 64
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19]  MMIO=[db140000-db1407ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18
e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex
PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003)
ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22
uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.0: uhci_resume
uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.0: Performing full reset
usb usb1: root hub lost power or was reset
usb usb1: suspend_rh
uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.1: uhci_resume
uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.1: Performing full reset
usb usb2: root hub lost power or was reset
usb usb2: suspend_rh
uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
uhci_hcd 0000:00:10.2: uhci_resume
uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.2: Performing full reset
usb usb3: root hub lost power or was reset
usb usb3: suspend_rh
ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3
ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20
PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010)
PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017)
PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007)
ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:11.5 to 64
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
serial 00:07: activated
serial 00:08: activated
parport_pc 00:09: activated
i8042 aux 00:0a: activation failed
i8042 kbd 00:0b: activation failed
sd 0:0:0:0: [sda] Starting disk
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)
ata1.01: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)
ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5)
ata1.01: revalidation failed (errno=1)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
ata1.00: _GTF evaluation failed (AE 0x300d)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
ata1.00: _GTF evaluation failed (AE 0x300d)
ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5)
ata1.00: revalidation failed (errno=1)
ata1: failed to recover some devices, retrying in 5 secs
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb usb1: usb resume
usb usb1: wakeup_rh
hub 1-0:1.0: trying to enable port power on non-switchable hub
usb usb2: usb resume
usb usb2: wakeup_rh
hub 2-0:1.0: trying to enable port power on non-switchable hub
usb usb3: usb resume
usb usb3: wakeup_rh
hub 3-0:1.0: trying to enable port power on non-switchable hub
usb usb4: usb resume
ehci_hcd 0000:00:10.3: resume root hub
hub 4-0:1.0: hub_resume
Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.0: port 1 portsc 018a,00
hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
done.
swsusp: Basic memory bitmaps freed
hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
uhci_hcd 0000:00:10.0: port 2 portsc 008a,00
hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.1: port 1 portsc 018a,00
hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s
hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300
uhci_hcd 0000:00:10.1: port 2 portsc 008a,00
hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006
uhci_hcd 0000:00:10.2: port 1 portsc 008a,00
hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
uhci_hcd 0000:00:10.2: port 2 portsc 008a,00
hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000
hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
usb usb1: suspend_rh (auto-stop)
usb usb2: suspend_rh (auto-stop)
usb usb3: suspend_rh (auto-stop)
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
[drm] Loading R200 Microcode
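
The repeated _GTF failures in the log above can be tallied mechanically.
A minimal sketch in Python (not part of the original report; the function
name summarize_acpi_failures and the regular expression are illustrative
assumptions keyed to the message format shown in this log):

```python
import re
from collections import Counter

# Matches lines such as:
#   ACPI Error (psparse-0537): Method parse/execution failed
#   [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
# (shown wrapped here; the pattern expects the whole message on one line).
METHOD_FAILED = re.compile(
    r"Method parse/execution failed \[(?P<path>[^\]]+)\].*?(?P<status>AE_[A-Z_]+)"
)


def summarize_acpi_failures(log_text):
    """Count (method path, ACPI status) pairs found in a dmesg excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = METHOD_FAILED.search(line)
        if match:
            counts[(match.group("path"), match.group("status"))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE = r"""
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT
ata1.01: _GTF evaluation failed (AE 0x300d)
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT
"""

if __name__ == "__main__":
    for (path, status), n in sorted(summarize_acpi_failures(SAMPLE).items()):
        print(f"{n}x {path}: {status}")
```

Log lines that a mail client has wrapped across two lines need to be
rejoined before matching, since the pattern works one line at a time.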



2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL:

serial 00:07: disabled
ACPI: PCI interrupt for device 0000:00:11.5 disabled
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node c180b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)
pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22
suspend_device(): pci_device_suspend+0x0/0x70() returns -22
Could not suspend device 0000:00:11.1: error -22
ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
serial 00:07: activated
serial 00:08: activated
parport_pc 00:09: activated
i8042 aux 00:0a: activation failed
i8042 kbd 00:0b: activation failed
sd 0:0:0:0: [sda] Starting disk
sd 0:0:0:0: timing out command, waited 180s
sd 0:0:0:0: [sda] START_STOP FAILED
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000
done.
swsusp: Basic memory bitmaps freed
swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Syncing filesystems ...
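
The -22 and -5 values in these logs are negated errno codes (EINVAL and
EIO on Linux). A small sketch, assuming a Linux host where Python's errno
table uses the same numbers as the kernel, that pulls them out of log
text and names them; the helper names extract_errnos and decode_errno are
made up for illustration:

```python
import errno
import os
import re

# Negative return codes as they appear in the messages above:
#   "... returns -22", "... error -22", "(errno=-5)"
ERRNO_IN_LOG = re.compile(r"(?:returns |error |errno=)(-\d+)")


def extract_errnos(log_text):
    """Return every negative error code found in the log, in order."""
    return [int(value) for value in ERRNO_IN_LOG.findall(log_text)]


def decode_errno(value):
    """Map a (possibly negated) errno number to its symbolic name and text."""
    code = abs(value)
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


# Lines copied from the logs in this report, used as sample input.
SAMPLE = """
pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22
Could not suspend device 0000:00:11.1: error -22
ata1.01: revalidation failed (errno=-5)
"""

if __name__ == "__main__":
    for value in extract_errnos(SAMPLE):
        print(value, decode_errno(value))
```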



# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe)
00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01)
01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01)


# dmidecode 2.9
SMBIOS 2.2 present.
39 structures occupying 1035 bytes.
Table at 0x000F0800.

Handle 0x0000, DMI type 0, 19 bytes
BIOS Information
        Vendor: Award Software International, Inc.
        Version: 6.00 PG
        Release Date: 09/16/2003
        Address: 0xE0000
        Runtime Size: 128 kB
        ROM Size: 512 kB
        Characteristics:
                ISA is supported
                PCI is supported
                PNP is supported
                APM is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/360 KB floppy services are supported (int 13h)
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 KB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                AGP is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported

Handle 0x0001, DMI type 1, 25 bytes
System Information
        Manufacturer: VIA Technologies, Inc.
        Product Name: VT8367-8235
        Version:
        Serial Number:
        UUID: Not Present
        Wake-up Type: Power Switch

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer:
        Product Name: VT8367-8235
        Version:
        Serial Number:

Handle 0x0003, DMI type 3, 13 bytes
Chassis Information
        Manufacturer:
        Type: Desktop
        Lock: Not Present
        Version:
        Serial Number:
        Asset Tag:
        Boot-up State: Unknown
        Power Supply State: Unknown
        Thermal State: Unknown
        Security Status: Unknown

Handle 0x0004, DMI type 4, 32 bytes
Processor Information
        Socket Designation: Socket A
        Type: Central Processor
        Family: Duron
        Manufacturer: AMD
        ID: 81 06 00 00 FF FB 83 03
        Signature: Family 6, Model 8, Stepping 1
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                MMX (MMX technology supported)
                FXSR (Fast floating-point save and restore)
                SSE (Streaming SIMD extensions)
        Version: AMD K7 processor
        Voltage: 3.3 V
        External Clock: 133 MHz
        Max Speed: 1500 MHz
        Current Speed: 1200 MHz
        Status: Populated, Enabled
        Upgrade: ZIF Socket
        L1 Cache Handle: 0x000A
        L2 Cache Handle: 0x000B
        L3 Cache Handle: No L3 Cache

Handle 0x0005, DMI type 5, 24 bytes
Memory Controller Information
        Error Detecting Method: None
        Error Correcting Capabilities:
                None
        Supported Interleave: One-way Interleave
        Current Interleave: Four-way Interleave
        Maximum Memory Module Size: 32 MB
        Maximum Total Memory Size: 128 MB
        Supported Speeds:
                70 ns
                60 ns
        Supported Memory Types:
                Standard
                EDO
        Memory Module Voltage: 5.0 V
        Associated Memory Slots: 4
                0x0006
                0x0007
                0x0008
                0x0009
        Enabled Error Correcting Capabilities: None

.
.
.


# hdparm -i /dev/sda

/dev/sda:

 Model=WDC WD1200JB-00CRA1                     , FwRev=17.07W17, SerialNo=WD-WCA8C4285629
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5

 * signifies the current active mode



Athlon on EPOX 8K5A2+ board.



Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL.

Probably won't be able to do any reporting over the weekend (WOL is
inoperable ATM for some weird reason), let me know what you need.
Took too much time to gather this report already anyway ;)

Thanks,

Andreas Mohr


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2007-12-08  9:46 ` Andrew Morton
@ 2007-12-08  9:52 ` Andrew Morton
  2007-12-09  7:00   ` Tejun Heo
  2007-12-08 10:44 ` Richard Purdie
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:52 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Tejun Heo

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.
> 
> ...
> 
> Subject		: cd/dvd inaccessible in 2.6.24-rc2
> Submitter	: Will Trives <will@trivescon.com.au>
> References	: http://lkml.org/lkml/2007/11/9/290
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
> Handled-By	: Len Brown <lenb@kernel.org>
> 		  Tejun Heo <htejun@gmail.com>
> Patch		: 
> 

Nasty one.  Tejun and several diligent reporters are doing sterling work
there and things have improved.  I don't know whether any of Tejun's
patches have been merged yet, but we'll probably be OK on this one.

What is unclear (to me) is what actually caused those people's machines to
break?



* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2007-12-08  9:42 ` Andrew Morton
@ 2007-12-08  9:46 ` Andrew Morton
  2007-12-08 15:49   ` Alan Stern
  2007-12-08  9:52 ` Andrew Morton
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Alan Stern

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.
> 
> 
> ..
>
> Subject		: system hangs after a few minutes
> Submitter	: Marcus Better <marcus@better.se>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
> Handled-By	: Andrew Morton <akpm@linux-foundation.org>
> 		  Alan Stern <stern@rowland.harvard.edu>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view
> 

This one we have a confirmed fix from Alan but it doesn't appear to be in
anyone's tree.

There is a second bug in here, applicable to core x86: Marcus's machine
won't boot with nmi_watchdog=1.



* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2007-12-08  9:36 ` Andrew Morton
@ 2007-12-08  9:42 ` Andrew Morton
  2007-12-08 18:57   ` Roland Dreier
  2007-12-08 19:40   ` Theodore Tso
  2007-12-08  9:46 ` Andrew Morton
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai,
	Theodore Ts'o

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.
> 
> ...
> 
> Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
> Submitter	: Roland Dreier <rdreier@cisco.com>
> References	: http://lkml.org/lkml/2007/11/8/255
> 		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
> Handled-By	: 
> Patch		: 

Takashi had a patch and that has been merged.  AFAIK this regression
has been fixed and we're left with a new but harmless warning.

However Roland reported other problems and it appears that the trail went
cold (http://lkml.org/lkml/2007/11/14/251)

Ted was hitting some of the same problems but that trail appears to also
have gone cold (http://lkml.org/lkml/2007/11/23/17).

Guys, can we have a status update on all of this please?


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
  2007-12-08  9:29 ` Andrew Morton
@ 2007-12-08  9:36 ` Andrew Morton
  2007-12-08 10:12   ` Andreas Mohr
  2007-12-09  6:52   ` Tejun Heo
  2007-12-08  9:42 ` Andrew Morton
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.
> 
> 
> Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
> Handled-By	: Robert Moore <Robert.Moore@intel.com>
> 		  Tejun Heo <htejun@gmail.com>
> 		  Fu Michael <michael.fu@intel.com>
> Patch		: 
> 

A number of other people are seeing the same thing and Tejun is putting in
a blacklist of machines which cannot use libata+acpi.  That patch is not
yet in any git tree which I pull.

AFAICT the machines keep working OK - there's just some nasty dmesg spew.

If any machines _are_ breaking then this could cause real problems and I'd
prefer that we either go for a whitelist or arrange to detect the condition
and fall back to non-acpi ata.
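
The "detect the condition and fall back" option can be pictured with a small model. This is only an illustrative Python sketch, not libata code: AcpiEvalError, configure_device and the two callbacks are hypothetical names, and the real driver logic is considerably more involved.

```python
class AcpiEvalError(Exception):
    """Stand-in for a failed ACPI method evaluation (hypothetical)."""


def configure_device(dev_name, evaluate_acpi, apply_command, log=print):
    """Try ACPI-provided setup; if evaluation fails, carry on without it.

    Returns True when the ACPI commands were applied and False when the
    device was left to plain, non-ACPI configuration.
    """
    try:
        commands = evaluate_acpi(dev_name)
    except AcpiEvalError as exc:
        # Detect the broken firmware case and fall back instead of
        # needing the machine to appear on a blacklist.
        log(f"{dev_name}: ACPI evaluation failed ({exc}), using non-ACPI setup")
        return False
    for command in commands:
        apply_command(dev_name, command)
    return True
```

The point of the design is that no per-machine list is needed: a firmware that fails evaluation is handled at run time, and a working one is used as before.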



* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
@ 2007-12-08  9:29 ` Andrew Morton
  2007-12-08 22:17   ` Rafael J. Wysocki
  2007-12-08  9:36 ` Andrew Morton
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of.  If any of them have been fixed already, please let me
> know.
> 
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.

Twenty nine, huh?

It would be useful if these records were sorted in date-of-reportage order
and had a date stamp so we could see how long they've been hanging about.
Something to think about for the post-2.6.24 regressions if you'll be handling
those?
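
One way such a sort could be approximated with the data already in the list: the lkml.org links in each References field carry a year/month/day in their path. The sketch below is an assumption-laden illustration, not a tool used for the actual list. It treats the earliest lkml.org date in References as the report date, ignores links in the Patch field, and puts records with only a Bugzilla link last. All function names are hypothetical.

```python
import re
from datetime import date

# lkml.org links look like http://lkml.org/lkml/2007/11/9/290
LKML_URL = re.compile(r"lkml\.org/lkml/(\d{4})/(\d{1,2})/(\d{1,2})/\d+")
# A field line starts at column 0, e.g. "References\t: ..."
FIELD = re.compile(r"^([A-Za-z-]+)\s*:")


def split_records(text):
    """Split the list into records separated by blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b for b in blocks if b.lstrip().startswith("Subject")]


def report_date(record):
    """Earliest lkml.org date in the References field, or None."""
    field = None
    dates = []
    for line in record.splitlines():
        match = FIELD.match(line)
        if match:
            field = match.group(1)  # indented continuation lines keep the field
        if field == "References":
            for year, month, day in LKML_URL.findall(line):
                dates.append(date(int(year), int(month), int(day)))
    return min(dates) if dates else None


def sort_by_report_date(text):
    """Return records oldest first; undated records keep their order at the end."""
    def key(record):
        found = report_date(record)
        return (found is None, found or date.max)
    return sorted(split_records(text), key=key)
```

The difference between today and report_date() would then give the "how long they've been hanging about" figure for each dated entry.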

> Subject		: leds: ledtrig-timer calls sleeping function from invalid context
> Submitter	: Márton Németh <nm127@freemail.hu>
> References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
> Handled-By	: Richard Purdie <rpurdie@rpsys.net>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view

That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
assuming Márton has tested the latest git snapshot
(ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
cross it off?  


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  8:28   ` Ingo Molnar
@ 2007-12-08  9:23     ` Andrew Morton
  2007-12-08 22:11       ` Rafael J. Wysocki
  0 siblings, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2007-12-08  9:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fabio Comolli, Rafael J. Wysocki, LKML, Linus Torvalds, Greg KH,
	Len Brown, Alexey Starikovskiy

On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Fabio Comolli <fabio.comolli@gmail.com> wrote:
> 
> > <snip>
> > 
> > > Subject         : Battery shows up twice in kpowersave
> > > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > Patch           :
> > >
> > 
> > I don't think that this is a regression: I reported on RedHat bugzilla 
> > when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> > It looks to me like a HAL regression, but of course I may be wrong :-) as 
> > the reporter bisected to a bad commit.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> > 
> > By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5 
> > custom kernel and Gnome desktop and the problem is still present, even 
> > with gnome-power-manager.
> 
> to me this looks like an ABI regression - utilities should work without 
> change. Something changed in /sys output that caused HAL to think that 
> there are two batteries:

Yep.  Although HAL is of course a most special case of "userspace".

> | The output of lshal shows that there are two UDI's with 
> | info.capabilities = { 'battery' }:
> |
> | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
> 
> whether it's a HAL bug or a kernel bug, the original state should be 
> restored and it should be worked out without breaking users of older HAL 
> versions.

"breaking users of older HAL versions" == "breaking machines".

The patch should be reverted.  Do we know which one it was?

> grumble: way too many times do various system utilities break when i 
> upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
> start fingerprinting the exact /sys and /proc output and enforce that 
> it's immutable across kernel releases as long as the hardware is 
> unmodified?

That would be neat.  It would need to be executed on a lot of different
machines.

I wonder if there's something sneaky we can do here.  Install the script in
/lib/modules/$(uname -r) and then run it from the kernel when the fork
count reaches 1000 ;)

(hey, I've seen worse: /proc files which start with #!/bin/sh)


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  6:53 ` Fabio Comolli
@ 2007-12-08  8:28   ` Ingo Molnar
  2007-12-08  9:23     ` Andrew Morton
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2007-12-08  8:28 UTC (permalink / raw)
  To: Fabio Comolli
  Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds, Greg KH,
	Len Brown


* Fabio Comolli <fabio.comolli@gmail.com> wrote:

> <snip>
> 
> > Subject         : Battery shows up twice in kpowersave
> > Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> > Patch           :
> >
> 
> I don't think that this is a regression: I reported on RedHat bugzilla 
> when I switched from F7 to F8 and I was using 2.6.23.8 at that time. 
> It looks to me like a HAL regression, but of course I may be wrong :-) as 
> the reporter bisected to a bad commit.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=373041
> 
> By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5 
> custom kernel and Gnome desktop and the problem is still present, even 
> with gnome-power-manager.

to me this looks like an ABI regression - utilities should work without 
change. Something changed in /sys output that caused HAL to think that 
there are two batteries:

| The output of lshal shows that there are two UDI's with 
| info.capabilities = { 'battery' }:
|
| udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
| udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'

whether it's a HAL bug or a kernel bug, the original state should be 
restored and it should be worked out without breaking users of older HAL 
versions.

grumble: way too many times do various system utilities break when i 
upgrade the kernel on my laptop. Maybe a new debug mechanism: we should 
start fingerprinting the exact /sys and /proc output and enforce that 
it's immutable across kernel releases as long as the hardware is 
unmodified?
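
A rough sketch of what such a fingerprint could look like, written in Python purely as an illustration. The function name and parameters are hypothetical, and it is an assumption that hashing only the names under /sys or /proc is a useful default: file contents include counters and other values that change while the system runs, and reading some files may block, so content hashing is opt-in here.

```python
import hashlib
import os


def fingerprint_tree(root, include_contents=False, max_bytes=4096):
    """Return a SHA-256 hex digest describing the layout under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        dirnames.sort()  # make the traversal order deterministic
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            digest.update(rel.encode("utf-8", "surrogateescape") + b"/\0")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest.update(rel.encode("utf-8", "surrogateescape") + b"\0")
            if include_contents:
                try:
                    with open(path, "rb") as handle:
                        digest.update(handle.read(max_bytes))
                except OSError:
                    # Many /sys and /proc files are write-only or restricted.
                    digest.update(b"<unreadable>")
                digest.update(b"\0")
    return digest.hexdigest()


if __name__ == "__main__":
    print(fingerprint_tree("/sys/class/power_supply"))
```

Comparing the digest from the same machine under two kernel versions would flag a case like the duplicated battery entry, where a new node appears without any hardware change.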

	Ingo


* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-08  2:40 Rafael J. Wysocki
@ 2007-12-08  6:53 ` Fabio Comolli
  2007-12-08  8:28   ` Ingo Molnar
  2007-12-08  9:29 ` Andrew Morton
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 74+ messages in thread
From: Fabio Comolli @ 2007-12-08  6:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

Hi.

On Dec 8, 2007 3:40 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of. If any of them have been fixed already, please let me
> know.
>
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.

<snip>

> Subject         : Battery shows up twice in kpowersave
> Submitter       : Rolf Eike Beer <eike-kernel@sf-tec.de>
> References      : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> Handled-By      : Alexey Starikovskiy <astarikovskiy@suse.de>
> Patch           :
>

I don't think that this is a regression: I reported on RedHat bugzilla
when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
It looks to me like a HAL regression, but of course I may be wrong :-) as
the reporter bisected to a bad commit.

https://bugzilla.redhat.com/show_bug.cgi?id=373041

By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5
custom kernel and Gnome desktop and the problem is still present, even
with gnome-power-manager.

Hope this helps.
Regards,
Fabio


* 2.6.24-rc4-git5: Reported regressions from 2.6.23
@ 2007-12-08  2:40 Rafael J. Wysocki
  2007-12-08  6:53 ` Fabio Comolli
                   ` (8 more replies)
  0 siblings, 9 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2007-12-08  2:40 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Linus Torvalds, Ingo Molnar

This message contains a list of some regressions from 2.6.23 which have been
reported since 2.6.24-rc1 was released and for which there are no fixes in the
mainline that I know of.  If any of them have been fixed already, please let me
know.

If you know of any other unresolved regressions from 2.6.23, please let me know
as well and I'll add them to the list.


Subject		: EHCI causes system to resume instantly from S4
Submitter	: Maxim Levitsky <maximlevitsky@gmail.com>
References	: http://lkml.org/lkml/2007/10/27/66
		  http://bugzilla.kernel.org/show_bug.cgi?id=9258
Handled-By	: "Rafael J. Wysocki" <rjw@sisk.pl>
		  David Brownell <david-b@pacbell.net>
		  Alan Stern <stern@rowland.harvard.edu>
Patch		: 
Note		: the problem appears to heavily depend on hardware


Subject		: leds: ledtrig-timer calls sleeping function from invalid context
Submitter	: Márton Németh <nm127@freemail.hu>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9264
Handled-By	: Richard Purdie <rpurdie@rpsys.net>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view


Subject		: PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
Submitter	: Hans de Bruin <bruinjm@xs4all.nl>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9320
Handled-By	: Robert Moore <Robert.Moore@intel.com>
		  Tejun Heo <htejun@gmail.com>
		  Fu Michael <michael.fu@intel.com>
Patch		: 


Subject		: snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s
Submitter	: Roland Dreier <rdreier@cisco.com>
References	: http://lkml.org/lkml/2007/11/8/255
		  http://bugzilla.kernel.org/show_bug.cgi?id=9332
Handled-By	: 
Patch		: 


Subject		: system hangs after a few minutes
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9335
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Alan Stern <stern@rowland.harvard.edu>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view


Subject		: cd/dvd inaccessible in 2.6.24-rc2
Submitter	: Will Trives <will@trivescon.com.au>
References	: http://lkml.org/lkml/2007/11/9/290
		  http://bugzilla.kernel.org/show_bug.cgi?id=9346
Handled-By	: Len Brown <lenb@kernel.org>
		  Tejun Heo <htejun@gmail.com>
Patch		: 


Subject		: The keyboard doesn't work
Submitter	: Francois Valenduc <francois.valenduc@skynet.be>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9362
Handled-By	: Dmitry Torokhov <dtor@insightbb.com>
		  Ingo Molnar <mingo@elte.hu>
		  Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13892&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=13893&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=13907&action=view
Note		: patches to apply in this order, top-down


Subject		: v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
Submitter	: Thomas Meyer <thomas@m3y3r.de>
References	: http://lkml.org/lkml/2007/11/13/250
		  http://bugzilla.kernel.org/show_bug.cgi?id=9370
Handled-By	: Matthew Wilcox <matthew@wil.cx>
Patch		: 


Subject		: SError: { DevExch } occuring and causing disruption
Submitter	: Avuton Olrich <avuton@gmail.com>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9393
Handled-By	: Tejun Heo <htejun@gmail.com>
		  Mark Lord <mlord@pobox.com>
Patch		: 


Subject		: nfsd gets stuck when underlying filesystem is XFS
Submitter	: Christian Kujau <lists@nerdbynature.de>
		  Chris Wedgwood <cw@f00f.org>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9400
Handled-By	: "J. Bruce Fields" <bfields@fieldses.org>
		  Christoph Hellwig <hch@infradead.org>
Patch		: http://lkml.org/lkml/2007/11/25/39


Subject		: 2.6.24-rc3: find complains about /proc/net
Submitter	: Pavel Machek <pavel@ucw.cz>
References	: http://lkml.org/lkml/2007/11/19/253
		  http://bugzilla.kernel.org/show_bug.cgi?id=9411
Handled-By	: "Eric W. Biederman" <ebiederm@xmission.com>
Patch		:
Note		: the existing fix needs fixing


Subject		: Not work light of button-led with module b43 in chipset broadcom 4318
Submitter	: Cristian Aravena Romero <caravena@gmail.com>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9414
Handled-By	: 
Patch		: 


Subject		: unable to turn cooling device 'off' - LG LE50 Express laptop
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9432
Handled-By	: Len Brown <lenb@kernel.org>
		  Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: 


Subject		: 2.6.24-rc3 can't see sd partitions on Alpha
Submitter	: Bob Tracy <rct@gherkin.frus.com>
References	: http://lkml.org/lkml/2007/11/18/3
		  http://bugzilla.kernel.org/show_bug.cgi?id=9457
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Kay Sievers <kay.sievers@vrfy.org>
		  Ingo Molnar <mingo@elte.hu>
Patch		: 


Subject		: 2.6.24-rc3-git2 softlockup detected
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
References	: http://lkml.org/lkml/2007/11/28/16
		  http://bugzilla.kernel.org/show_bug.cgi?id=9472
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
		  Ingo Molnar <mingo@elte.hu>
Patch		: 


Subject		: jiffies counter leaps in 2.6.24-rc3
Submitter	: Stefano Brivio <stefano.brivio@polimi.it>
References	: http://lkml.org/lkml/2007/11/24/53
		  http://bugzilla.kernel.org/show_bug.cgi?id=9475
Handled-By	: Ingo Molnar <mingo@elte.hu>
Patch		: http://lkml.org/lkml/2007/12/7/132


Subject		: kernel GPF in 2.6.24 (g09f345da)
Submitter	: Jon Nelson <jnelson-kernel-bugzilla@jamponi.net>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9482
Handled-By	: Andrew Morton <akpm@linux-foundation.org>
Patch		: 


Subject		: 20000+ wake-ups/second in 2.6.24
Submitter	: Mark Lord <lkml@rtr.ca>
References	: http://lkml.org/lkml/2007/12/1/141
		  http://bugzilla.kernel.org/show_bug.cgi?id=9489
Handled-By	: Arjan van de Ven <arjan@infradead.org>
Patch		: 


Subject		: 2.6.24: false double-clicks from USB mouse
Submitter	: Mark Lord <lkml@rtr.ca>
References	: http://lkml.org/lkml/2007/12/2/86
		  http://bugzilla.kernel.org/show_bug.cgi?id=9492
Handled-By	: Jiri Kosina <jkosina@suse.cz>
Patch		: 


Subject		: Battery shows up twice in kpowersave
Submitter	: Rolf Eike Beer <eike-kernel@sf-tec.de>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9494
Handled-By	: Alexey Starikovskiy <astarikovskiy@suse.de>
Patch		: 


Subject		: kobject ->k_name memory leak
Submitter	: Alexey Dobriyan <adobriyan@sw.ru>
References	: http://lkml.org/lkml/2007/12/3/20
		  http://bugzilla.kernel.org/show_bug.cgi?id=9496
Handled-By	: Greg KH <gregkh@suse.de>
Patch		: 


Subject		: tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
Submitter	: Ingo Molnar <mingo@elte.hu>
References	: http://lkml.org/lkml/2007/11/29/157
		  http://bugzilla.kernel.org/show_bug.cgi?id=9497
Handled-By	: Matt Mackall <mpm@selenic.com>
Patch		: http://lkml.org/lkml/2007/11/29/387


Subject		: Regression - 2.6.24-rc3 - umem nvram card driver oops
Submitter	: David Chinner <dgc@sgi.com>
References	: http://lkml.org/lkml/2007/12/3/216
		  http://bugzilla.kernel.org/show_bug.cgi?id=9498
Handled-By	: Neil Brown <neilb@suse.de>
Patch		: http://lkml.org/lkml/2007/12/3/266


Subject		: PS3: trouble with SPARSEMEM_VMEMMAP and kexec
Submitter	: Geoff Levand <geoffrey.levand@am.sony.com>
References	: http://lkml.org/lkml/2007/12/3/137
		  http://bugzilla.kernel.org/show_bug.cgi?id=9499
Handled-By	: Milton Miller <miltonm@bga.com>
		  Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
		  Yasunori Goto <y-goto@jp.fujitsu.com>
Patch		: 


Subject		: binfmt_misc file system is empty
Submitter	: Marcus Better <marcus@better.se>
References	: http://bugzilla.kernel.org/show_bug.cgi?id=9504
Handled-By	: Denis V. Lunev <den@openvz.org>
Patch		:


Subject		: 2.6.24-rc4 hwmon it87 probe fails
Submitter	: Mike Houston <mikeserv@bmts.com>
References	: http://lkml.org/lkml/2007/12/4/466
		  http://bugzilla.kernel.org/show_bug.cgi?id=9514
Handled-By	: 
Patch		: 


Subject		: Major regression on hackbench with SLUB
Submitter	: Steven Rostedt <rostedt@goodmis.org>
References	: http://lkml.org/lkml/2007/12/7/181
		  http://bugzilla.kernel.org/show_bug.cgi?id=9521
Handled-By	: Linus Torvalds <torvalds@linux-foundation.org>
Patch		: 


Subject		: 2.6.24-rc3-git4 NFS crossmnt regression
Submitter	: Shane <gnome42@gmail.com>
References	: http://lkml.org/lkml/2007/12/6/410
		  http://bugzilla.kernel.org/show_bug.cgi?id=9522
Handled-By	: "Trond Myklebust" <trond.myklebust@fys.uio.no>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=13908&action=view


Subject		: soft lockup - CPU#1 stuck for 15s! [swapper:0]
Submitter	: "Parag Warudkar" <parag.warudkar@gmail.com>
References	: http://lkml.org/lkml/2007/12/7/299
		  http://bugzilla.kernel.org/show_bug.cgi?id=9525
Handled-By	: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Patch		: 


For details, please follow the links given in references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.23,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9243

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael


end of thread, other threads:[~2007-12-20 17:41 UTC | newest]

Thread overview: 74+ messages
     [not found] <fa.qQhD8aJpiTOFaZqjRYwoaG7YT1c@ifi.uio.no>
     [not found] ` <fa.yaP86AixGhz5Q7eXSu04pIQp6ho@ifi.uio.no>
     [not found]   ` <fa.c3k8VKWAx4HIo9zXWbL5Ek0oSBw@ifi.uio.no>
     [not found]     ` <fa.C8ACnOhs8bXB++vmugf5F34JcJg@ifi.uio.no>
     [not found]       ` <fa.5o6E6S0UWnARbQPxLe30TvLQIiY@ifi.uio.no>
2007-12-08 18:24         ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Robert Hancock
2007-12-09  5:59           ` Tejun Heo
2007-12-09 21:36           ` Andreas Mohr
2007-12-10  0:04             ` Andreas Mohr
2007-12-10  0:49               ` Andreas Mohr
2007-12-10  1:28                 ` Robert Hancock
2007-12-10  2:25                   ` Tejun Heo
2007-12-10  3:20                     ` Robert Hancock
2007-12-10  2:20                 ` Tejun Heo
2007-12-08  2:40 Rafael J. Wysocki
2007-12-08  6:53 ` Fabio Comolli
2007-12-08  8:28   ` Ingo Molnar
2007-12-08  9:23     ` Andrew Morton
2007-12-08 22:11       ` Rafael J. Wysocki
2007-12-08  9:29 ` Andrew Morton
2007-12-08 22:17   ` Rafael J. Wysocki
2007-12-08  9:36 ` Andrew Morton
2007-12-08 10:12   ` Andreas Mohr
2007-12-08 10:20     ` Andrew Morton
2007-12-08 10:28       ` Matthew Garrett
2007-12-08 10:55       ` Andreas Mohr
2007-12-09 15:46         ` Tejun Heo
2007-12-09 19:59           ` Andreas Mohr
2007-12-09  6:52   ` Tejun Heo
2007-12-09 14:20     ` Rafael J. Wysocki
2007-12-09 15:11       ` Tejun Heo
2007-12-08  9:42 ` Andrew Morton
2007-12-08 18:57   ` Roland Dreier
2007-12-08 19:40   ` Theodore Tso
2007-12-08 19:55     ` Ingo Molnar
2007-12-08 22:30     ` Rafael J. Wysocki
2007-12-09  2:15       ` Theodore Tso
2007-12-13 10:49         ` Takashi Iwai
2007-12-20 15:42           ` Takashi Iwai
2007-12-08  9:46 ` Andrew Morton
2007-12-08 15:49   ` Alan Stern
2007-12-08  9:52 ` Andrew Morton
2007-12-09  7:00   ` Tejun Heo
2007-12-09 13:42     ` Alan Cox
2007-12-09 15:09       ` Tejun Heo
2007-12-09 15:25         ` Alan Cox
2007-12-09 15:39           ` Tejun Heo
2007-12-09 18:36       ` Linus Torvalds
2007-12-09 21:54         ` Alan Cox
2007-12-09 18:41       ` Linus Torvalds
2007-12-09 22:01         ` Alan Cox
2007-12-09 22:51           ` Ray Lee
2007-12-10  1:57           ` Linus Torvalds
2007-12-10  3:28             ` Alan Cox
2007-12-10  3:38             ` Alan Cox
2007-12-10 15:38               ` Linus Torvalds
2007-12-10  8:21             ` Ingo Molnar
2007-12-10  8:27               ` Tejun Heo
2007-12-10  8:41                 ` Ingo Molnar
2007-12-08 10:44 ` Richard Purdie
2007-12-08 22:32   ` Rafael J. Wysocki
2007-12-09 11:54 ` Andrew Morton
2007-12-09 12:05   ` Ingo Molnar
2007-12-09 14:24   ` Rafael J. Wysocki
2007-12-10 20:42 ` Ingo Molnar
2007-12-10 20:57   ` Guillaume Chazarain
2007-12-10 20:59   ` Andrew Morton
2007-12-10 22:45     ` Ingo Molnar
2007-12-10 23:04       ` Ingo Molnar
2007-12-10 23:34         ` Stefano Brivio
2007-12-10 23:53           ` Guillaume Chazarain
2007-12-11  8:48             ` Ingo Molnar
2007-12-10 23:56           ` Arjan van de Ven
2007-12-11  0:01             ` Guillaume Chazarain
2007-12-11  1:06               ` Arjan van de Ven
2007-12-11  8:43                 ` Ingo Molnar
2007-12-11  9:01           ` Ingo Molnar
2007-12-11 21:10             ` Stefano Brivio
2007-12-19  0:58             ` Stefano Brivio
