* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 [not found] ` <fa.5o6E6S0UWnARbQPxLe30TvLQIiY@ifi.uio.no> @ 2007-12-08 18:24 ` Robert Hancock 2007-12-09 5:59 ` Tejun Heo 2007-12-09 21:36 ` Andreas Mohr 0 siblings, 2 replies; 74+ messages in thread From: Robert Hancock @ 2007-12-08 18:24 UTC (permalink / raw) To: Matthew Garrett Cc: Andrew Morton, Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi Matthew Garrett wrote: > On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: >> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote: >>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] >>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT >>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT >>> ata1.01: _GTF evaluation failed (AE 0x300d) > > 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this? > I should think it should have. I think we're too aggressive about disabling the libata ACPI support, even. One of my laptop's _GTF commands on resume is a DEVICE CONFIGURATION FREEZE LOCK command, which gets rejected by the drive (maybe it worked on the original Hitachi disk, but I've upgraded it to a newer Samsung). I'd say if the drive returns command aborted on one of these, we should just ignore that command and continue to the next one without trying to retry or disabling the ACPI support entirely. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 18:24 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Robert Hancock @ 2007-12-09 5:59 ` Tejun Heo 2007-12-09 21:36 ` Andreas Mohr 1 sibling, 0 replies; 74+ messages in thread From: Tejun Heo @ 2007-12-09 5:59 UTC (permalink / raw) To: Robert Hancock Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Robert Hancock wrote: > Matthew Garrett wrote: >> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: >>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote: >>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index >>>> (0FFFFFFFF) is beyond end of object [20070126] >>>> ACPI Error (psparse-0537): Method parse/execution failed >>>> [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT >>>> ACPI Error (psparse-0537): Method parse/execution failed >>>> [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT >>>> ata1.01: _GTF evaluation failed (AE 0x300d) >> >> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this? >> > > I should think it should have. > > I think we're too aggressive about disabling the libata ACPI support, > even. One of my laptop's _GTF commands on resume is a DEVICE > CONFIGURATION FREEZE LOCK command, which gets rejected by the drive > (maybe it worked on the original Hitachi disk, but I've upgraded it to a > newer Samsung). I'd say if the drive returns command aborted on one of > these, we should just ignore that command and continue to the next one > without trying to retry or disabling the ACPI support entirely. Yeap, my pending patchset does exactly that. It's currently being tested by bug reporters. I'll soon post the patchset. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 18:24 ` 2.6.24-rc4-git5: Reported regressions from 2.6.23 Robert Hancock 2007-12-09 5:59 ` Tejun Heo @ 2007-12-09 21:36 ` Andreas Mohr 2007-12-10 0:04 ` Andreas Mohr 1 sibling, 1 reply; 74+ messages in thread From: Andreas Mohr @ 2007-12-09 21:36 UTC (permalink / raw) To: Robert Hancock Cc: Matthew Garrett, Andrew Morton, Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi Hi, [ACPI _GTM suspend issue sorta fixed, read below] On Sat, Dec 08, 2007 at 12:24:16PM -0600, Robert Hancock wrote: > Matthew Garrett wrote: >> On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: >>> On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote: >>>> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] >>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT >>>> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT >>>> ata1.01: _GTF evaluation failed (AE 0x300d) >> >> 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this? >> > > I should think it should have. Yup, the _GTF problem is certainly fixed, but this is a dead-end since I showed the -rc1 vs. -rc2 behaviour, whereas I still have failing suspend in -rc4 with this patch confirmed to be applied (source does contain the changes) and confirmed to apparently be working (no errors in dmesg any more). IOW, what I'm concerned about is not a _GTF error on boot any more, but a seemingly fatally handled _GTM error on suspend. 
...OK, dug some more into this, and now I managed to get it to work, and it was indeed _GTM which broke my whole suspend: Since _GTM is failing on me (and the point is, it's failing catastrophically, not "normally"!), ata_acpi_on_suspend() calling ata_acpi_gtm() fails with -EINVAL instead of -ENOENT, however ata_acpi_on_suspend() has the following correction only: if (rc == -ENOENT) rc = 0; to make sure a suspend doesn't get aborted (fatal error) when _GTM is simply empty. Changing this into if ((rc == -ENOENT) || (rc == -EINVAL)) rc = 0; to additionally account for _invalid_ _GTM execution makes my suspend (and resume!) work again on -rc4. Now the question is whether this error code correction is ok, or whether a catastrophically failing _GTM should have been truly registered on boot already (where it does gtm to fetch cable timings) to subsequently avoid doing any ATA ACPI things on suspend at all. And the second, possibly much more lucrative, question would be whether we're actually doing something wrong with our ACPI _GTM execution which triggers the AE_AML_PACKAGE_LIMIT problem. 
This might help here, perhaps (relevant snippets of AML dump): Device (CHN0) { Name (_ADR, 0x00) Method (_GTM, 0, NotSerialized) { Return (GTM (PMPT, PMUE, PMUT, PSPT, PSUE, PSUT)) } Method (_STM, 3, NotSerialized) { Store (Arg0, TMD0) Store (PMPT, GMPT) Store (PMUE, GMUE) Store (PMUT, GMUT) Store (PSPT, GSPT) Store (PSUE, GSUE) Store (PSUT, GSUT) STM () Store (GMPT, PMPT) Store (GMUE, PMUE) Store (GMUT, PMUT) Store (GSPT, PSPT) Store (GSUE, PSUE) Store (GSUT, PSUT) } Device (CHN1) { Name (_ADR, 0x01) Method (_GTM, 0, NotSerialized) { Return (GTM (SMPT, SMUE, SMUT, SSPT, SSUE, SSUT)) } Method (_STM, 3, NotSerialized) { Store (Arg0, TMD0) Store (SMPT, GMPT) Store (SMUE, GMUE) Store (SMUT, GMUT) Store (SSPT, GSPT) Store (SSUE, GSUE) Store (SSUT, GSUT) STM () Store (GMPT, SMPT) Store (GMUE, SMUE) Store (GMUT, SMUT) Store (GSPT, SSPT) Store (GSUE, SSUE) Store (GSUT, SSUT) } Method (GTM, 6, Serialized) { Store (Ones, PIO0) Store (Ones, PIO1) Store (Ones, DMA0) Store (Ones, DMA1) Store (0x10, CHNF) If (REGF) {} Else { Return (TMD0) } Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg0, MTR, 0x00, 0x00), Local6) Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6) ), Local7) Store (Local7, DMA0) Store (Local7, PIO0) Store (Match (DerefOf (Index (TIM0, 0x01)), MEQ, Arg3, MTR, 0x00, 0x00), Local6) Store (DerefOf (Index (DerefOf (Index (TIM0, 0x00)), Local6) ), Local7) Store (Local7, DMA1) Store (Local7, PIO1) If (Arg1) { If (A133 ()) { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)), Arg2)), Local5) } Else { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)), Arg2)), Local5) } If (A133 ()) { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0C)), Local5)), DMA0) } Else { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)), Local5)), DMA0) } Or (CHNF, 0x01, CHNF) } If (Arg4) { If (A133 ()) { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0D)), Arg5)), Local5) } Else { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x0A)), Arg5)), Local5) } If (A133 ()) { Store 
(DerefOf (Index (DerefOf (Index (TIM0, 0x0C)), Local5)), DMA1) } Else { Store (DerefOf (Index (DerefOf (Index (TIM0, 0x04)), Local5)), DMA1) } Or (CHNF, 0x04, CHNF) } Return (TMD0) } Reminder: issue tracked at #9530. Thanks, Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
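The error-code change Andreas describes in the message above can be looked at in isolation. What follows is only a minimal sketch, not the libata source: gtm_result_for_suspend() is a hypothetical name introduced here, and the assumption (taken from his description) is that -ENOENT means _GTM is simply absent or empty while -EINVAL is what he sees when _GTM evaluation fails outright.

```c
#include <errno.h>

/*
 * Hypothetical helper, not the actual libata code: map the return
 * value of the suspend-time _GTM call to the value the suspend path
 * should propagate. Zero lets the suspend continue.
 */
int gtm_result_for_suspend(int rc)
{
	/*
	 * -ENOENT: _GTM is simply absent or empty (already tolerated).
	 * -EINVAL: _GTM was evaluated but failed, as reported above.
	 * Both are treated as non-fatal in this sketch.
	 */
	if (rc == -ENOENT || rc == -EINVAL)
		return 0;

	/* Anything else is still passed through unchanged. */
	return rc;
}
```

Whether filtering -EINVAL here is the right place, or whether the failing _GTM should be noticed once at boot instead, is exactly the open question raised in the message; the sketch only shows the narrower of the two options.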
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 21:36 ` Andreas Mohr @ 2007-12-10 0:04 ` Andreas Mohr 2007-12-10 0:49 ` Andreas Mohr 0 siblings, 1 reply; 74+ messages in thread From: Andreas Mohr @ 2007-12-10 0:04 UTC (permalink / raw) To: Andreas Mohr Cc: Robert Hancock, Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi Hi, On Sun, Dec 09, 2007 at 10:36:42PM +0100, Andreas Mohr wrote: > And the second, possibly much more lucrative, question would be > whether we're actually doing something wrong with our ACPI _GTM execution > which triggers the AE_AML_PACKAGE_LIMIT problem. > > This might help here, perhaps (relevant snippets of AML dump): Indeed, after looking over this horrid ASL stuff for ages I'm now starting to believe that our IDE controller state is wrong, since the Match()ing etc. in this particular _GTM implementation is heavily dependent on actual PCI values (it references some PCI_Config OperationRegion:s), and some indexing seems to go wrong due to this. IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't actually wrongly implemented but simply expects IDE controller values to have been set up ""differently"". Or... one could possibly even infer from this that - maybe - the _GTM invocation spot is wrong, it should be done somewhere different during bootup. Or whatever. This seems to tell me again that we're often quick to blacklist or whitelist things left and right when instead fundamental problems are hidden somewhere. Still investigating, Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 0:04 ` Andreas Mohr @ 2007-12-10 0:49 ` Andreas Mohr 2007-12-10 1:28 ` Robert Hancock 2007-12-10 2:20 ` Tejun Heo 0 siblings, 2 replies; 74+ messages in thread From: Andreas Mohr @ 2007-12-10 0:49 UTC (permalink / raw) To: Andreas Mohr Cc: Robert Hancock, Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote: > IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't > actually wrongly implemented but simply expects IDE controller values > to have been set up ""differently"". > > > Or... one could possibly even infer from this that - maybe - > the _GTM invocation spot is wrong, it should be done somewhere > different during bootup. Or whatever. "Whatever" indeed: There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register, and the possible register values are: Package (0x04) { 0x20, 0x31, 0x65, 0xA8 }, and from OperationRegion (CFG2, PCI_Config, 0x40, 0x20) Field (CFG2, DWordAcc, NoLock, Preserve) { Offset (0x08),· SSPT, 8,· SMPT, 8,· PSPT, 8,· PMPT, 8,· Offset (0x10),· ... we can infer that at PCI_Config offset 0x48 those values should be located. However after bootup or resume there are: # lspci -s 00:11.1 -xxx 00:11.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00
40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20
50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00
60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00
70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00
80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00

As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are 99 99 20 20, which are not quite entirely valid judging from the array above, and this is because the secondary port is unused, as can also be seen from my bootup log:

scsi0 : pata_via
scsi1 : pata_via
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100
ata1.00: 234441648 sectors, multi 16: LBA
ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33
Switched to high resolution mode on CPU 0
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAGE_LIMIT
ata2: ACPI get timing mode failed (AE 0x300d)

Manually tweaking the values to 20 20
20 20 truly does skip the _GTM failure message on suspend - only to reappear right on resume due to 99 99 20 20 combo happening again. If I don't tweak, I get _GTM failure at both suspend and resume. As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really expect anyone to call _GTM on an unused physical port. I'd bet on the latter... (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet) Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
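The offset arithmetic in the message above can be checked mechanically. The following is a rough sketch under stated assumptions: the 16 bytes are the 0x40 row of the lspci dump quoted in the message, the four valid values are the ASL package quoted there, and the field offsets follow from the CFG2 region starting at config offset 0x40 with the fields declared from Offset (0x08). The names cfg_read() and timing_is_known() are made up for this illustration and exist in no kernel or ACPICA source.

```c
#include <stddef.h>
#include <stdint.h>

/* Config-space bytes 0x40..0x4f, copied from the lspci dump in the message. */
const uint8_t cfg_row_40[16] = {
	0x0b, 0x32, 0x09, 0x0a, 0x18, 0x1c, 0xc0, 0x00,
	0x99, 0x99, 0x20, 0x20, 0xff, 0x00, 0xa8, 0x20,
};

/* The four timing values listed in the ASL package. */
const uint8_t valid_timings[4] = { 0x20, 0x31, 0x65, 0xa8 };

/*
 * Region base 0x40 plus Offset (0x08), then four consecutive 8-bit
 * fields in declaration order.
 */
enum { SSPT = 0x48, SMPT = 0x49, PSPT = 0x4a, PMPT = 0x4b };

/* Hypothetical accessor: only valid for offsets 0x40..0x4f. */
uint8_t cfg_read(unsigned int offset)
{
	return cfg_row_40[offset - 0x40];
}

/* Returns 1 if the value appears in the ASL package, 0 otherwise. */
int timing_is_known(uint8_t value)
{
	for (size_t i = 0; i < sizeof(valid_timings); i++)
		if (valid_timings[i] == value)
			return 1;
	return 0;
}
```

With these bytes, timing_is_known(cfg_read(PMPT)) and timing_is_known(cfg_read(PSPT)) hold for the primary channel, while the two secondary-channel fields read 0x99 and are not in the package, which matches the observation that only CHN1._GTM fails.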
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 0:49 ` Andreas Mohr @ 2007-12-10 1:28 ` Robert Hancock 2007-12-10 2:25 ` Tejun Heo 2007-12-10 2:20 ` Tejun Heo 1 sibling, 1 reply; 74+ messages in thread From: Robert Hancock @ 2007-12-10 1:28 UTC (permalink / raw) To: Andreas Mohr Cc: Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi Andreas Mohr wrote: > On Mon, Dec 10, 2007 at 01:04:31AM +0100, Andreas Mohr wrote: >> IOW, it seems very likely that _GTM on these BIOSes (VIA chipsets) isn't >> actually wrongly implemented but simply expects IDE controller values >> to have been set up ""differently"". >> >> >> Or... one could possibly even infer from this that - maybe - >> the _GTM invocation spot is wrong, it should be done somewhere >> different during bootup. Or whatever. > > "Whatever" indeed: > > There's an ASL Match() for a "PMPT" (Primary Master PorT) PCI register, > and the possible register values are: > > Package (0x04) > { > 0x20, > 0x31, > 0x65, > 0xA8 > }, > > and from > > OperationRegion (CFG2, PCI_Config, 0x40, 0x20) > Field (CFG2, DWordAcc, NoLock, Preserve) > { > Offset (0x08),· > SSPT, 8,· > SMPT, 8,· > PSPT, 8,· > PMPT, 8,· > Offset (0x10),· > ... > we can infer that at PCI_Config offset 0x48 those values should be located. > However after bootup or resume there are: > > # lspci -s 00:11.1 -xxx > 00:11.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) > 00: 06 11 71 05 07 00 90 02 06 8a 01 01 00 20 00 00 > 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 20: 01 e4 00 00 00 00 00 00 00 00 00 00 06 11 71 05 > 30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 01 00 00 > 40: 0b 32 09 0a 18 1c c0 00 99 99 20 20 ff 00 a8 20 > 50: 07 07 f6 f1 14 03 00 00 a8 a8 a8 a8 00 00 00 00 > 60: 00 02 00 00 00 00 00 00 00 02 00 00 00 00 00 00 > 70: 02 01 00 00 00 00 00 00 82 01 00 00 00 00 00 00 > 80: 00 e0 a1 1f 00 00 00 00 00 00 00 00 00 00 00 00 > 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 > d0: 06 00 71 05 06 11 71 05 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 00 00 00 00 00 07 00 00 00 00 00 00 00 00 00 > > > As one can see, the relevant values for SSPT, SMPT, PSPT and PMPT are > 99 99 20 20, which are not quite entirely valid judging from the array above, > and this is because the secondary port is unused, as can also be seen > from my bootup log: > > scsi0 : pata_via > scsi1 : pata_via > ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14 > ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15 > ata1.00: ATA-5: WDC WD1200JB-00CRA1, 17.07W17, max UDMA/100 > ata1.00: 234441648 sectors, multi 16: LBA > ata1.01: ATAPI: TOSHIBA DVD-ROM SD-M1612, 1004, max UDMA/33 > Switched to high resolution mode on CPU 0 > ata1.00: configured for UDMA/100 > ata1.01: configured for UDMA/33 > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node df80b9a8), AE_AML_PACKAGE_LIM > IT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node df80b8d0), AE_AML_PACKAG > E_LIMIT > ata2: 
ACPI get timing mode failed (AE 0x300d) > > > Manually tweaking the values to 20 20 20 20 truly does skip the _GTM failure message on suspend - > only to reappear right on resume due to 99 99 20 20 combo happening again. > If I don't tweak, I get _GTM failure at both suspend and resume. > > > As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely > unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really > expect anyone to call _GTM on an unused physical port. I'd bet on the latter... > (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet) > > Andreas Mohr > Probably Windows doesn't call _GTM on a port with no devices connected, and so the BIOS people never tested that case. Likely we can just avoid doing this - if no devices are connected the timing settings for that channel are irrelevant.. And you're quite right in your comment that we are often too quick to blacklist hardware instead of looking into why it really is failing. ACPI is one of those areas where we often just need to figure out how to be bug-to-bug compatible with what Windows is doing.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 1:28 ` Robert Hancock @ 2007-12-10 2:25 ` Tejun Heo 2007-12-10 3:20 ` Robert Hancock 0 siblings, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-10 2:25 UTC (permalink / raw) To: Robert Hancock Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Robert Hancock wrote: > And you're quite right in your comment that we are often too quick to > blacklist hardware instead of looking into why it really is failing. > ACPI is one of those areas where we often just need to figure out how to > be bug-to-bug compatibile with what Windows is doing.. In the spirit of not blacklisting without looking deep into ACPI code, can somebody familiar with ASL take a look at comment 11 of bug 9320? http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11 This is libata calling _GTM to find out how the BIOS configured the device to determine cable type. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 2:25 ` Tejun Heo @ 2007-12-10 3:20 ` Robert Hancock 0 siblings, 0 replies; 74+ messages in thread From: Robert Hancock @ 2007-12-10 3:20 UTC (permalink / raw) To: Tejun Heo Cc: Andreas Mohr, Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Tejun Heo wrote: > Robert Hancock wrote: >> And you're quite right in your comment that we are often too quick to >> blacklist hardware instead of looking into why it really is failing. >> ACPI is one of those areas where we often just need to figure out how to >> be bug-to-bug compatibile with what Windows is doing.. > > In the spirit of not blacklisting without looking deep into ACPI code, > can somebody familiar with ASL take a look at comment 11 of bug 9320? > > http://bugzilla.kernel.org/show_bug.cgi?id=9320#c11 > > This is libata calling _GTM to find out how the BIOS configured the > device to determine cable type. > > Thanks. I suspect it's somewhat similar (though perhaps a different cause), the code is trying to lookup a value (presumably register contents) in a table using Match, gets a value that's not in the table (which makes Match return the ONES value FFFFFFFF meaning not found) and so the lookup of the corresponding output value with that index fails. We'd need the full ASL dump to know exactly what's going on there. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ ^ permalink raw reply [flat|nested] 74+ messages in thread
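The failure chain Robert describes above (Match finds nothing, returns Ones, and the following Index with that value runs off the end of the package) can be modelled in a few lines. This is a simplified stand-in written for illustration, not ACPICA code: match_eq(), index_deref() and lookup_timing() are hypothetical names, the key table is the package quoted earlier in the thread, and the output table holds placeholder values because the real TIM0 contents are not shown anywhere in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define ONES 0xFFFFFFFFu	/* value Match yields when nothing matches */

/* Keys: the timing register values from the ASL package in the thread. */
const uint8_t timing_keys[4] = { 0x20, 0x31, 0x65, 0xa8 };

/* Placeholder outputs only; the real table is not shown in the thread. */
const uint8_t timing_out[4] = { 10, 20, 30, 40 };

/* Simplified model of Match (pkg, MEQ, key, MTR, 0x00, 0x00). */
uint32_t match_eq(const uint8_t *pkg, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		if (pkg[i] == key)
			return (uint32_t)i;
	return ONES;
}

/*
 * Simplified model of DerefOf (Index (pkg, idx)). A return of -1
 * stands in for AE_AML_PACKAGE_LIMIT.
 */
int index_deref(const uint8_t *pkg, size_t len, uint32_t idx)
{
	if (idx >= len)
		return -1;
	return pkg[idx];
}

/* The two steps chained the way the GTM method chains them. */
int lookup_timing(uint8_t reg)
{
	uint32_t idx = match_eq(timing_keys, sizeof(timing_keys), reg);

	return index_deref(timing_out, sizeof(timing_out), idx);
}
```

In this model lookup_timing(0x99) fails because the index is 0xFFFFFFFF, the same index value that appears in the "Index (0FFFFFFFF) is beyond end of object" log line, while a register value that is present in the key table resolves normally.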
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 0:49 ` Andreas Mohr 2007-12-10 1:28 ` Robert Hancock @ 2007-12-10 2:20 ` Tejun Heo 1 sibling, 0 replies; 74+ messages in thread From: Tejun Heo @ 2007-12-10 2:20 UTC (permalink / raw) To: Andreas Mohr Cc: Robert Hancock, Matthew Garrett, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Andreas Mohr wrote: > As such one can conclude that this BIOS is rather very confused when being called for _GTM on an entirely > unused controller port. And this is either because the BIOS is dumb or because ACPI doesn't really > expect anyone to call _GTM on an unused physical port. I'd bet on the latter... > (however I haven't found ACPI 3.0b explicitly mentioning this somewhere yet) Thanks a lot for finding this out. One of the two reports in bug 9320 seems to be the same problem although the other doesn't seem to be. So, it seems we'll have to check that both primary and secondary slots are empty and skip _GTM if so. :-( Also, right, there's no need to fail suspend on _GTM failure whatever the error is. That was me being anal again. Will incorporate both into the ACPI fixes patchset. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* 2.6.24-rc4-git5: Reported regressions from 2.6.23 @ 2007-12-08 2:40 Rafael J. Wysocki 2007-12-08 6:53 ` Fabio Comolli ` (8 more replies) 0 siblings, 9 replies; 74+ messages in thread From: Rafael J. Wysocki @ 2007-12-08 2:40 UTC (permalink / raw) To: LKML; +Cc: Andrew Morton, Linus Torvalds, Ingo Molnar This message contains a list of some regressions from 2.6.23 which have been reported since 2.6.24-rc1 was released and for which there are no fixes in the mainline that I know of. If any of them have been fixed already, please let me know. If you know of any other unresolved regressions from 2.6.23, please let me know either and I'll add them to the list. Subject : EHCI causes system to resume instantly from S4 Submitter : Maxim Levitsky <maximlevitsky@gmail.com> References : http://lkml.org/lkml/2007/10/27/66 http://bugzilla.kernel.org/show_bug.cgi?id=9258 Handled-By : "Rafael J. Wysocki" <rjw@sisk.pl> David Brownell <david-b@pacbell.net> Alan Stern <stern@rowland.harvard.edu> Patch : Note: : the problem appears to heavily depend on hardware Subject : leds: ledtrig-timer calls sleeping function from invalid context Submitter : Márton Németh <nm127@freemail.hu> References : http://bugzilla.kernel.org/show_bug.cgi?id=9264 Handled-By : Richard Purdie <rpurdie@rpsys.net> Patch : http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... 
is beyond end of object Submitter : Hans de Bruin <bruinjm@xs4all.nl> References : http://bugzilla.kernel.org/show_bug.cgi?id=9320 Handled-By : Robert Moore <Robert.Moore@intel.com> Tejun Heo <htejun@gmail.com> Fu Michael <michael.fu@intel.com> Patch : Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s Submitter : Roland Dreier <rdreier@cisco.com> References : http://lkml.org/lkml/2007/11/8/255 http://bugzilla.kernel.org/show_bug.cgi?id=9332 Handled-By : Patch : Subject : system hangs after a few minutes Submitter : Marcus Better <marcus@better.se> References : http://bugzilla.kernel.org/show_bug.cgi?id=9335 Handled-By : Andrew Morton <akpm@linux-foundation.org> Alan Stern <stern@rowland.harvard.edu> Patch : http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view Subject : cd/dvd inaccessible in 2.6.24-rc2 Submitter : Will Trives <will@trivescon.com.au> References : http://lkml.org/lkml/2007/11/9/290 http://bugzilla.kernel.org/show_bug.cgi?id=9346 Handled-By : Len Brown <lenb@kernel.org> Tejun Heo <htejun@gmail.com> Patch : Subject : The keyboard doesn't work Submitter : Francois Valenduc <francois.valenduc@skynet.be> References : http://bugzilla.kernel.org/show_bug.cgi?id=9362 Handled-By : Dmitry Torokhov <dtor@insightbb.com> Ingo Molnar <mingo@elte.hu> Alexey Starikovskiy <astarikovskiy@suse.de> Patch : http://bugzilla.kernel.org/attachment.cgi?id=13892&action=view http://bugzilla.kernel.org/attachment.cgi?id=13893&action=view http://bugzilla.kernel.org/attachment.cgi?id=13907&action=view Note : patches to apply in this order, top-down Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device Submitter : Thomas Meyer <thomas@m3y3r.de> References : http://lkml.org/lkml/2007/11/13/250 http://bugzilla.kernel.org/show_bug.cgi?id=9370 Handled-By : Matthew Wilcox <matthew@wil.cx> Patch : Subject : SError: { DevExch } occuring and causing disruption Submitter : Avuton Olrich <avuton@gmail.com> References : 
http://bugzilla.kernel.org/show_bug.cgi?id=9393 Handled-By : Tejun Heo <htejun@gmail.com> Mark Lord <mlord@pobox.com> Patch : Subject : nfsd gets stuck when underlying filesystem is XFS Submitter : Christian Kujau <lists@nerdbynature.de> Chris Wedgwood <cw@f00f.org> References : http://bugzilla.kernel.org/show_bug.cgi?id=9400 Handled-By : "J. Bruce Fields" <bfields@fieldses.org> Christoph Hellwig <hch@infradead.org> Patch : http://lkml.org/lkml/2007/11/25/39 Subject : 2.6.24-rc3: find complains about /proc/net Submitter : Pavel Machek <pavel@ucw.cz> References : http://lkml.org/lkml/2007/11/19/253 http://bugzilla.kernel.org/show_bug.cgi?id=9411 Handled-By : "Eric W. Biederman" <ebiederm@xmission.com> Patch : Note : the existing fix needs fixing Subject : Not work light of button-led with module b43 in chipset broadcom 4318 Submitter : Cristian Aravena Romero <caravena@gmail.com> References : http://bugzilla.kernel.org/show_bug.cgi?id=9414 Handled-By : Patch : Subject : unable to turn cooling device 'off' - LG LE50 Express laptop Submitter : Marcus Better <marcus@better.se> References : http://bugzilla.kernel.org/show_bug.cgi?id=9432 Handled-By : Len Brown <lenb@kernel.org> Alexey Starikovskiy <astarikovskiy@suse.de> Patch : Subject : 2.6.24-rc3 can't see sd partitions on Alpha Submitter : Bob Tracy <rct@gherkin.frus.com> References : http://lkml.org/lkml/2007/11/18/3 http://bugzilla.kernel.org/show_bug.cgi?id=9457 Handled-By : Andrew Morton <akpm@linux-foundation.org> Kay Sievers <kay.sievers@vrfy.org> Ingo Molnar <mingo@elte.hu> Patch : Subject : 2.6.24-rc3-git2 softlockup detected Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> References : http://lkml.org/lkml/2007/11/28/16 http://bugzilla.kernel.org/show_bug.cgi?id=9472 Handled-By : Andrew Morton <akpm@linux-foundation.org> Ingo Molnar <mingo@elte.hu> Patch : Subject : jiffies counter leaps in 2.6.24-rc3 Submitter : Stefano Brivio <stefano.brivio@polimi.it> References : 
             http://lkml.org/lkml/2007/11/24/53
             http://bugzilla.kernel.org/show_bug.cgi?id=9475
Handled-By : Ingo Molnar <mingo@elte.hu>
Patch      : http://lkml.org/lkml/2007/12/7/132

Subject    : kernel GPF in 2.6.24 (g09f345da)
Submitter  : Jon Nelson <jnelson-kernel-bugzilla@jamponi.net>
References : http://bugzilla.kernel.org/show_bug.cgi?id=9482
Handled-By : Andrew Morton <akpm@linux-foundation.org>
Patch      :

Subject    : 20000+ wake-ups/second in 2.6.24
Submitter  : Mark Lord <lkml@rtr.ca>
References : http://lkml.org/lkml/2007/12/1/141
             http://bugzilla.kernel.org/show_bug.cgi?id=9489
Handled-By : Arjan van de Ven <arjan@infradead.org>
Patch      :

Subject    : 2.6.24: false double-clicks from USB mouse
Submitter  : Mark Lord <lkml@rtr.ca>
References : http://lkml.org/lkml/2007/12/2/86
             http://bugzilla.kernel.org/show_bug.cgi?id=9492
Handled-By : Jiri Kosina <jkosina@suse.cz>
Patch      :

Subject    : Battery shows up twice in kpowersave
Submitter  : Rolf Eike Beer <eike-kernel@sf-tec.de>
References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
Patch      :

Subject    : kobject ->k_name memory leak
Submitter  : Alexey Dobriyan <adobriyan@sw.ru>
References : http://lkml.org/lkml/2007/12/3/20
             http://bugzilla.kernel.org/show_bug.cgi?id=9496
Handled-By : Greg KH <gregkh@suse.de>
Patch      :

Subject    : tipc_init(), WARNING: at arch/x86/mm/highmem_32.c:52 kmap_atomic_prot()
Submitter  : Ingo Molnar <mingo@elte.hu>
References : http://lkml.org/lkml/2007/11/29/157
             http://bugzilla.kernel.org/show_bug.cgi?id=9497
Handled-By : Matt Mackall <mpm@selenic.com>
Patch      : http://lkml.org/lkml/2007/11/29/387

Subject    : Regression - 2.6.24-rc3 - umem nvram card driver oops
Submitter  : David Chinner <dgc@sgi.com>
References : http://lkml.org/lkml/2007/12/3/216
             http://bugzilla.kernel.org/show_bug.cgi?id=9498
Handled-By : Neil Brown <neilb@suse.de>
Patch      : http://lkml.org/lkml/2007/12/3/266

Subject    : PS3: trouble with SPARSEMEM_VMEMMAP and kexec
Submitter  : Geoff Levand <geoffrey.levand@am.sony.com>
References : http://lkml.org/lkml/2007/12/3/137
             http://bugzilla.kernel.org/show_bug.cgi?id=9499
Handled-By : Milton Miller <miltonm@bga.com>
             Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
             Yasunori Goto <y-goto@jp.fujitsu.com>
Patch      :

Subject    : binfmt_misc file system is empty
Submitter  : Marcus Better <marcus@better.se>
References : http://bugzilla.kernel.org/show_bug.cgi?id=9504
Handled-By : Denis V. Lunev <den@openvz.org>
Patch      :

Subject    : 2.6.24-rc4 hwmon it87 probe fails
Submitter  : Mike Houston <mikeserv@bmts.com>
References : http://lkml.org/lkml/2007/12/4/466
             http://bugzilla.kernel.org/show_bug.cgi?id=9514
Handled-By :
Patch      :

Subject    : Major regression on hackbench with SLUB
Submitter  : Steven Rostedt <rostedt@goodmis.org>
References : http://lkml.org/lkml/2007/12/7/181
             http://bugzilla.kernel.org/show_bug.cgi?id=9521
Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
Patch      :

Subject    : 2.6.24-rc3-git4 NFS crossmnt regression
Submitter  : Shane <gnome42@gmail.com>
References : http://lkml.org/lkml/2007/12/6/410
             http://bugzilla.kernel.org/show_bug.cgi?id=9522
Handled-By : "Trond Myklebust" <trond.myklebust@fys.uio.no>
Patch      : http://bugzilla.kernel.org/attachment.cgi?id=13908&action=view

Subject    : soft lockup - CPU#1 stuck for 15s! [swapper:0]
Submitter  : "Parag Warudkar" <parag.warudkar@gmail.com>
References : http://lkml.org/lkml/2007/12/7/299
             http://bugzilla.kernel.org/show_bug.cgi?id=9525
Handled-By : "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Patch      :

For details, please follow the links given in references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.23,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9243

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Fabio Comolli @ 2007-12-08 6:53 UTC
To: Rafael J. Wysocki
Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar

Hi.

On Dec 8, 2007 3:40 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of. If any of them have been fixed already, please let me
> know.
>
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.

<snip>

> Subject : Battery shows up twice in kpowersave
> Submitter : Rolf Eike Beer <eike-kernel@sf-tec.de>
> References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
> Patch :
>

I don't think that this is a regression: I reported it on the Red Hat bugzilla
when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
It looks to me like a HAL regression, but of course I may be wrong :-) as
the reporter bisected it to a bad commit.

https://bugzilla.redhat.com/show_bug.cgi?id=373041

By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5
custom kernel and Gnome desktop and the problem is still present, even
with gnome-power-manager.

Hope this helps.

Regards,
Fabio

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Ingo Molnar @ 2007-12-08 8:28 UTC
To: Fabio Comolli
Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds, Greg KH, Len Brown

* Fabio Comolli <fabio.comolli@gmail.com> wrote:

> <snip>
>
> > Subject : Battery shows up twice in kpowersave
> > Submitter : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
> > Patch :
> >
>
> I don't think that this is a regression: I reported it on the Red Hat bugzilla
> when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
> It looks to me like a HAL regression, but of course I may be wrong :-) as
> the reporter bisected it to a bad commit.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=373041
>
> By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5
> custom kernel and Gnome desktop and the problem is still present, even
> with gnome-power-manager.

to me this looks like an ABI regression - utilities should work without
change. Something changed in /sys output that caused HAL to think that
there are two batteries:

| The output of lshal shows that there are two UDI's with
| info.capabilities = { 'battery' }:
|
| udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
| udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'

whether it's a HAL bug or a kernel bug, the original state should be
restored and it should be worked out without breaking users of older HAL
versions.

grumble: way too many times do various system utilities break when i
upgrade the kernel on my laptop. Maybe a new debug mechanism: we should
start fingerprinting the exact /sys and /proc output and enforce that
it's immutable across kernel releases as long as the hardware is
unmodified?

	Ingo
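
The fingerprinting idea above can be prototyped from userspace. What follows
is only a minimal sketch under stated assumptions, not an existing kernel or
HAL tool: the function name fingerprint_tree and its hash_contents flag are
hypothetical, and by default it records only which attribute files exist,
because reading arbitrary /sys or /proc files can block, fail, or return
values that legitimately change between reads.

```python
import hashlib
import os


def fingerprint_tree(root, hash_contents=False):
    """Return {relative_path: digest} for regular files under root.

    Hypothetical helper for illustration. With hash_contents=False the
    digest is an empty string, so only the set of attribute names is
    captured. With hash_contents=True the first 64 KiB of each file is
    hashed with SHA-256; unreadable files are recorded as "unreadable".
    """
    result = {}
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; sysfs is full of links.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digest = ""
            if hash_contents:
                try:
                    with open(full, "rb") as fh:
                        digest = hashlib.sha256(fh.read(65536)).hexdigest()
                except OSError:
                    digest = "unreadable"
            result[os.path.relpath(full, root)] = digest
    return result
```

Running fingerprint_tree("/sys/class") under two kernel versions on the same
hardware would give two dictionaries whose keys can then be compared.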

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Andrew Morton @ 2007-12-08 9:23 UTC
To: Ingo Molnar
Cc: Fabio Comolli, Rafael J. Wysocki, LKML, Linus Torvalds, Greg KH, Len Brown, Alexey Starikovskiy

On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:

>
> * Fabio Comolli <fabio.comolli@gmail.com> wrote:
>
> > <snip>
> >
> > > Subject : Battery shows up twice in kpowersave
> > > Submitter : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > Patch :
> > >
> >
> > I don't think that this is a regression: I reported it on the Red Hat bugzilla
> > when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
> > It looks to me like a HAL regression, but of course I may be wrong :-) as
> > the reporter bisected it to a bad commit.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> >
> > By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5
> > custom kernel and Gnome desktop and the problem is still present, even
> > with gnome-power-manager.
>
> to me this looks like an ABI regression - utilities should work without
> change. Something changed in /sys output that caused HAL to think that
> there are two batteries:

Yep. Although HAL is of course a most special case of "userspace".

> | The output of lshal shows that there are two UDI's with
> | info.capabilities = { 'battery' }:
> |
> | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
>
> whether it's a HAL bug or a kernel bug, the original state should be
> restored and it should be worked out without breaking users of older HAL
> versions.

"breaking users of older HAL versions" == "breaking machines".

The patch should be reverted. Do we know which one it was?

> grumble: way too many times do various system utilities break when i
> upgrade the kernel on my laptop. Maybe a new debug mechanism: we should
> start fingerprinting the exact /sys and /proc output and enforce that
> it's immutable across kernel releases as long as the hardware is
> unmodified?

That would be neat. It would need to be executed on a lot of different
machines.

I wonder if there's something sneaky we can do here. Install the script in
/lib/modules/$(uname -r) and then run it from the kernel when the fork
count reaches 1000 ;)

(hey, I've seen worse: /proc files which start with #!/bin/sh)

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Rafael J. Wysocki @ 2007-12-08 22:11 UTC
To: Andrew Morton
Cc: Ingo Molnar, Fabio Comolli, LKML, Linus Torvalds, Greg KH, Len Brown, Alexey Starikovskiy

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 09:28:15 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Fabio Comolli <fabio.comolli@gmail.com> wrote:
> >
> > > <snip>
> > >
> > > > Subject : Battery shows up twice in kpowersave
> > > > Submitter : Rolf Eike Beer <eike-kernel@sf-tec.de>
> > > > References : http://bugzilla.kernel.org/show_bug.cgi?id=9494
> > > > Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
> > > > Patch :
> > > >
> > >
> > > I don't think that this is a regression: I reported it on the Red Hat bugzilla
> > > when I switched from F7 to F8 and I was using 2.6.23.8 at that time.
> > > It looks to me like a HAL regression, but of course I may be wrong :-) as
> > > the reporter bisected it to a bad commit.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=373041
> > >
> > > By the way, I now switched to Fedora Rawhide with a 2.6.24-rc4-git5
> > > custom kernel and Gnome desktop and the problem is still present, even
> > > with gnome-power-manager.
> >
> > to me this looks like an ABI regression - utilities should work without
> > change. Something changed in /sys output that caused HAL to think that
> > there are two batteries:
>
> Yep. Although HAL is of course a most special case of "userspace".
>
> > | The output of lshal shows that there are two UDI's with
> > | info.capabilities = { 'battery' }:
> > |
> > | udi = '/org/freedesktop/Hal/devices/acpi_BAT0'
> > | udi = '/org/freedesktop/Hal/devices/computer_power_supply_0'
> >
> > whether it's a HAL bug or a kernel bug, the original state should be
> > restored and it should be worked out without breaking users of older HAL
> > versions.
>
> "breaking users of older HAL versions" == "breaking machines".
>
> The patch should be reverted. Do we know which one it was?
>
> > grumble: way too many times do various system utilities break when i
> > upgrade the kernel on my laptop. Maybe a new debug mechanism: we should
> > start fingerprinting the exact /sys and /proc output and enforce that
> > it's immutable across kernel releases as long as the hardware is
> > unmodified?
>
> That would be neat. It would need to be executed on a lot of different
> machines.

Hm, that wouldn't allow us to add new attributes ...
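
One way to reconcile a fingerprint check with the objection that new
attributes must stay possible is to treat additions as informational and only
removals or changes as failures. A minimal sketch, assuming fingerprints are
plain dictionaries mapping an attribute path to some recorded value; both
function names are hypothetical and nothing here is an existing kernel
facility.

```python
def compare_fingerprints(old, new):
    """Classify differences between two {path: value} fingerprints."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    # Only paths present on both sides can count as "changed".
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"removed": removed, "changed": changed, "added": added}


def is_regression(report):
    """Additions are tolerated; removals and changes are flagged."""
    return bool(report["removed"] or report["changed"])
```

A purely additive change may still confuse userspace, so the "added" list is
probably worth reviewing rather than ignoring.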

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Andrew Morton @ 2007-12-08 9:29 UTC
To: Rafael J. Wysocki
Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of. If any of them have been fixed already, please let me
> know.
>
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.

Twenty nine, huh?

It would be useful if these records were sorted in date-of-reportage order
and had a date stamp so we could see how long they've been hanging about.
Something to think about for the post-2.6.24 regressions if you'll be handling
those?

> Subject : leds: ledtrig-timer calls sleeping function from invalid context
> Submitter : Márton Németh <nm127@freemail.hu>
> References : http://bugzilla.kernel.org/show_bug.cgi?id=9264
> Handled-By : Richard Purdie <rpurdie@rpsys.net>
> Patch : http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view

That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
assuming Márton has tested the latest git snapshot
(ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
cross it off?

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Rafael J. Wysocki @ 2007-12-08 22:17 UTC
To: Andrew Morton
Cc: LKML, Linus Torvalds, Ingo Molnar, Márton Németh

On Saturday, 8 of December 2007, Andrew Morton wrote:
> On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > This message contains a list of some regressions from 2.6.23 which have been
> > reported since 2.6.24-rc1 was released and for which there are no fixes in the
> > mainline that I know of. If any of them have been fixed already, please let me
> > know.
> >
> > If you know of any other unresolved regressions from 2.6.23, please let me know
> > as well and I'll add them to the list.
>
> Twenty nine, huh?
>
> It would be useful if these records were sorted in date-of-reportage order
> and had a date stamp so we could see how long they've been hanging about.

They are sorted by the bugzilla number which reflects the date-of-reportage
order pretty well. For a technical reason, it's easier for me if they're
sorted like this.

Adding date stamps should be easy, though; I'll try to add them to the next
report.

> Something to think about for the post-2.6.24 regressions if you'll be handling
> those?

Yes, I'm going to handle the post-2.6.24 regressions too (in the hope there
will be fewer of them ;-)).

> > Subject : leds: ledtrig-timer calls sleeping function from invalid context
> > Submitter : Márton Németh <nm127@freemail.hu>
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=9264
> > Handled-By : Richard Purdie <rpurdie@rpsys.net>
> > Patch : http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view
>
> That patch has been merged (dc47206e552c0850ad11f7e9a1fca0a3c92f5d65) and
> assuming Márton has tested the latest git snapshot
> (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots) successfully we can
> cross it off?

Yes, will drop.

Thanks,
Rafael

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Andrew Morton @ 2007-12-08 9:36 UTC
To: Rafael J. Wysocki
Cc: LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo

On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> This message contains a list of some regressions from 2.6.23 which have been
> reported since 2.6.24-rc1 was released and for which there are no fixes in the
> mainline that I know of. If any of them have been fixed already, please let me
> know.
>
> If you know of any other unresolved regressions from 2.6.23, please let me know
> as well and I'll add them to the list.
>
>
> Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> Submitter : Hans de Bruin <bruinjm@xs4all.nl>
> References : http://bugzilla.kernel.org/show_bug.cgi?id=9320
> Handled-By : Robert Moore <Robert.Moore@intel.com>
>              Tejun Heo <htejun@gmail.com>
>              Fu Michael <michael.fu@intel.com>
> Patch :
>

A number of other people are seeing the same thing and Tejun is putting in
a blacklist of machines which cannot use libata+acpi. That patch is not
yet in any git tree which I pull.

AFAICT the machines keep working OK - there's just some nasty dmesg spew.

If any machines _are_ breaking then this could cause real problems and I'd
prefer that we either go for a whitelist or arrange to detect the condition
and fall back to non-acpi ata.

* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
From: Andreas Mohr @ 2007-12-08 10:12 UTC
To: Andrew Morton
Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo

Hi,

On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote:
> > Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object
> > Submitter : Hans de Bruin <bruinjm@xs4all.nl>
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=9320
> > Handled-By : Robert Moore <Robert.Moore@intel.com>
> >              Tejun Heo <htejun@gmail.com>
> >              Fu Michael <michael.fu@intel.com>
> > Patch :
> >
>
> A number of other people are seeing the same thing and Tejun is putting in
> a blacklist of machines which cannot use libata+acpi. That patch is not
> yet in any git tree which I pull.
>
> AFAICT the machines keep working OK - there's just some nasty dmesg spew.
>
> If any machines _are_ breaking then this could cause real problems and I'd
> prefer that we either go for a whitelist or arrange to detect the condition
> and fall back to non-acpi ata.

Does this report now win me the lucky draw, pretty please? ;)

STD regression rc1 -> rc234, suspend fails completely, recovering is
pretty much useless since HDD is DEAD from this point on anyway.
Managed to capture -rc2 suspend logging via still-alive ssh session.

2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to
well-recovered AML failure):

swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000
swsusp: Basic memory bitmaps created
Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Shrinking memory...
done (0 pages freed) Freed 0 kbytes in 0.02 seconds (0.00 MB/s) Suspending console(s) hub 4-0:1.0: hub_suspend usb usb4: bus suspend ehci_hcd 0000:00:10.3: suspend root hub hub 3-0:1.0: hub_suspend usb usb3: bus suspend usb usb3: suspend_rh hub 2-0:1.0: hub_suspend usb usb2: bus suspend usb usb2: suspend_rh hub 1-0:1.0: hub_suspend usb usb1: bus suspend usb usb1: suspend_rh sd 0:0:0:0: [sda] Synchronizing SCSI cache parport_pc 00:09: disabled serial 00:08: disabled serial 00:07: disabled ACPI: PCI interrupt for device 0000:00:11.5 disabled ACPI handle has no context! ACPI: PCI interrupt for device 0000:00:11.1 disabled ACPI: PCI interrupt for device 0000:00:10.3 disabled ehci_hcd 0000:00:10.3: --> PCI D3/wakeup uhci_hcd 0000:00:10.2: uhci_suspend ACPI: PCI interrupt for device 0000:00:10.2 disabled uhci_hcd 0000:00:10.2: --> PCI D3 uhci_hcd 0000:00:10.1: uhci_suspend ACPI: PCI interrupt for device 0000:00:10.1 disabled uhci_hcd 0000:00:10.1: --> PCI D3 uhci_hcd 0000:00:10.0: uhci_suspend ACPI: PCI interrupt for device 0000:00:10.0 disabled uhci_hcd 0000:00:10.0: --> PCI D3 ACPI: PCI interrupt for device 0000:00:0d.0 disabled ACPI handle has no context! ACPI: PCI interrupt for device 0000:00:0c.0 disabled ACPI handle has no context! pci_set_power_state(): 0000:00:00.0: state=3, current state=5 swsusp: critical section: swsusp: Need to copy 51195 pages Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. 
evxfevnt-0079 [00] enable : System is already in ACPI mode ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20 ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21 ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22 ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23 evxfevnt-0079 [00] enable : System is already in ACPI mode ACPI: Unable to turn cooling device [c180ff60] 'off' PCI: Setting latency timer of device 0000:00:01.0 to 64 ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[db140000-db1407ff] Max Packet=[2048] IR/IT contexts=[4/8] ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18 e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003) ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22 uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3 ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 uhci_hcd 0000:00:10.0: uhci_resume uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:10.0: Performing full reset usb usb1: root hub lost power or was reset usb usb1: suspend_rh uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3 ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 uhci_hcd 0000:00:10.1: uhci_resume uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:10.1: Performing full reset usb usb2: root hub lost power or was reset usb usb2: suspend_rh uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3 ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 uhci_hcd 0000:00:10.2: uhci_resume uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000 uhci_hcd 0000:00:10.2: Performing full reset usb usb3: root hub lost power or was reset usb usb3: suspend_rh ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3 ACPI: PCI Interrupt 
0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010) PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017) PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007) ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17 ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23 PCI: Setting latency timer of device 0000:00:11.5 to 64 ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 serial 00:07: activated serial 00:08: activated parport_pc 00:09: activated i8042 aux 00:0a: activation failed i8042 kbd 00:0b: activation failed sd 0:0:0:0: [sda] Starting disk ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT ata1.01: _GTF evaluation failed (AE 0x300d) ata1.01: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT ata1.01: _GTF evaluation failed (AE 0x300d) ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5) ata1.01: revalidation failed (errno=1) ata1: failed to recover some devices, retrying in 5 secs ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] ACPI Error 
(psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT ata1.00: _GTF evaluation failed (AE 0x300d) ata1.00: revalidation failed (errno=-5) ata1: failed to recover some devices, retrying in 5 secs ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT ata1.00: _GTF evaluation failed (AE 0x300d) ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5) ata1.00: revalidation failed (errno=1) ata1: failed to recover some devices, retrying in 5 secs ata1.00: configured for UDMA/100 ata1.01: configured for UDMA/33 sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA usb usb1: usb resume usb usb1: wakeup_rh hub 1-0:1.0: trying to enable port power on non-switchable hub usb usb2: usb resume usb usb2: wakeup_rh hub 2-0:1.0: trying to enable port power on non-switchable hub usb usb3: usb resume usb usb3: wakeup_rh hub 3-0:1.0: trying to enable port power on non-switchable hub usb usb4: usb resume ehci_hcd 0000:00:10.3: resume root hub hub 4-0:1.0: hub_resume Restarting tasks ... 
<7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006 uhci_hcd 0000:00:10.0: port 1 portsc 018a,00 hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s done. swsusp: Basic memory bitmaps freed hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300 uhci_hcd 0000:00:10.0: port 2 portsc 008a,00 hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006 uhci_hcd 0000:00:10.1: port 1 portsc 018a,00 hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300 uhci_hcd 0000:00:10.1: port 2 portsc 008a,00 hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006 uhci_hcd 0000:00:10.2: port 1 portsc 008a,00 hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100 uhci_hcd 0000:00:10.2: port 2 portsc 008a,00 hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000 hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000 hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000 usb usb1: suspend_rh (auto-stop) usb usb2: suspend_rh (auto-stop) usb usb3: suspend_rh (auto-stop) agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0. 
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode [drm] Loading R200 Microcode 2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL: serial 00:07: disabled ACPI: PCI interrupt for device 0000:00:11.5 disabled ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] (Node c180b9a8), AE_AML_PACKAGE_LIMIT ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT ata2: ACPI get timing mode failed (AE 0x300d) pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22 suspend_device(): pci_device_suspend+0x0/0x70() returns -22 Could not suspend device 0000:00:11.1: error -22 ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23 ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 serial 00:07: activated serial 00:08: activated parport_pc 00:09: activated i8042 aux 00:0a: activation failed i8042 kbd 00:0b: activation failed sd 0:0:0:0: [sda] Starting disk sd 0:0:0:0: timing out command, waited 180s sd 0:0:0:0: [sda] START_STOP FAILED sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000 done. swsusp: Basic memory bitmaps freed swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000 swsusp: Basic memory bitmaps created Syncing filesystems ... # lspci 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333] 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP] 00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. 
IEEE 1394 Host Controller (rev 46) 00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe) 00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08) 00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10) 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) 00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50) 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74) 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01) 01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01) # dmidecode 2.9 SMBIOS 2.2 present. 39 structures occupying 1035 bytes. Table at 0x000F0800. Handle 0x0000, DMI type 0, 19 bytes BIOS Information Vendor: Award Software International, Inc. 
Version: 6.00 PG Release Date: 09/16/2003 Address: 0xE0000 Runtime Size: 128 kB ROM Size: 512 kB Characteristics: ISA is supported PCI is supported PNP is supported APM is supported BIOS is upgradeable BIOS shadowing is allowed ESCD support is available Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported 5.25"/360 KB floppy services are supported (int 13h) 5.25"/1.2 MB floppy services are supported (int 13h) 3.5"/720 KB floppy services are supported (int 13h) 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) CGA/mono video services are supported (int 10h) ACPI is supported USB legacy is supported AGP is supported LS-120 boot is supported ATAPI Zip drive boot is supported Handle 0x0001, DMI type 1, 25 bytes System Information Manufacturer: VIA Technologies, Inc. Product Name: VT8367-8235 Version: Serial Number: UUID: Not Present Wake-up Type: Power Switch Handle 0x0002, DMI type 2, 8 bytes Base Board Information Manufacturer: Product Name: VT8367-8235 Version: Serial Number: Handle 0x0003, DMI type 3, 13 bytes Chassis Information Manufacturer: Type: Desktop Lock: Not Present Version: Serial Number: Asset Tag: Boot-up State: Unknown Power Supply State: Unknown Thermal State: Unknown Security Status: Unknown Handle 0x0004, DMI type 4, 32 bytes Processor Information Socket Designation: Socket A Type: Central Processor Family: Duron Manufacturer: AMD ID: 81 06 00 00 FF FB 83 03 Signature: Family 6, Model 8, Stepping 1 Flags: FPU (Floating-point unit on-chip) VME (Virtual mode extension) DE (Debugging extension) PSE (Page size extension) TSC (Time stamp counter) MSR (Model specific registers) PAE (Physical address extension) MCE (Machine check exception) CX8 (CMPXCHG8 instruction supported) APIC (On-chip APIC hardware supported) SEP (Fast system call) MTRR 
(Memory type range registers) PGE (Page global enable) MCA (Machine check architecture) CMOV (Conditional move instruction supported) PAT (Page attribute table) PSE-36 (36-bit page size extension) MMX (MMX technology supported) FXSR (Fast floating-point save and restore) SSE (Streaming SIMD extensions) Version: AMD K7 processor Voltage: 3.3 V External Clock: 133 MHz Max Speed: 1500 MHz Current Speed: 1200 MHz Status: Populated, Enabled Upgrade: ZIF Socket L1 Cache Handle: 0x000A L2 Cache Handle: 0x000B L3 Cache Handle: No L3 Cache Handle 0x0005, DMI type 5, 24 bytes Memory Controller Information Error Detecting Method: None Error Correcting Capabilities: None Supported Interleave: One-way Interleave Current Interleave: Four-way Interleave Maximum Memory Module Size: 32 MB Maximum Total Memory Size: 128 MB Supported Speeds: 70 ns 60 ns Supported Memory Types: Standard EDO Memory Module Voltage: 5.0 V Associated Memory Slots: 4 0x0006 0x0007 0x0008 0x0009 Enabled Error Correcting Capabilities: None . . . # hdparm -i /dev/sda /dev/sda: Model=WDC WD1200JB-00CRA1 , FwRev=17.07W17, SerialNo=WD-WCA8C4285629 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq } RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16? CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=no WriteCache=enabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5 * signifies the current active mode Athlon on EPOX 8K5A2+ board. Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL. Probably won't be able to do any reporting over the weekend (WOL is inoperable ATM for some weird reason), let me know what you need. 
Took too much time to gather this report already anyway ;) Thanks, Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 10:12 ` Andreas Mohr @ 2007-12-08 10:20 ` Andrew Morton 2007-12-08 10:28 ` Matthew Garrett 2007-12-08 10:55 ` Andreas Mohr 0 siblings, 2 replies; 74+ messages in thread From: Andrew Morton @ 2007-12-08 10:20 UTC (permalink / raw) To: Andreas Mohr Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote: > Hi, > > On Sat, Dec 08, 2007 at 01:36:31AM -0800, Andrew Morton wrote: > > > Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object > > > Submitter : Hans de Bruin <bruinjm@xs4all.nl> > > > References : http://bugzilla.kernel.org/show_bug.cgi?id=9320 > > > Handled-By : Robert Moore <Robert.Moore@intel.com> > > > Tejun Heo <htejun@gmail.com> > > > Fu Michael <michael.fu@intel.com> > > > Patch : > > > > > > > A number of other people are seeing the same thing and Tejun is putting in > > a blacklist of machines which cannot use libata+acpi. That patch is not > > yet in any git tree which I pull. > > > > AFAICT the machines keep working OK - there's just some nasty dmesg spew. > > > > If any machines _are_ breaking then this could cause real problems and I'd > > prefer that we either go for a whitelist or arrange to detect the condition > > and fall back to non-acpi ata. > > Does this report now win me the lucky draw, pretty please? ;) nah, you have to cc the acpi guys to get a prize ;) Len&co, could you please take a look? Andreas, please do separately report that WOL problem too.. Our list just reached 30. > STD regression rc1 -> rc234, suspend fails completely, recovering is > pretty much useless since HDD is DEAD from this point on anyway. > Managed to capture -rc2 suspend logging via still-alive ssh session. 
> > 2.6.24-rc1 suspend/resume log, successful (well, a couple seconds delay, most likely due to > well-recovered AML failure): > > swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000 > swsusp: Basic memory bitmaps created > Syncing filesystems ... done. > Freezing user space processes ... (elapsed 0.00 seconds) done. > Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done. > Shrinking memory... done (0 pages freed) > Freed 0 kbytes in 0.02 seconds (0.00 MB/s) > Suspending console(s) > hub 4-0:1.0: hub_suspend > usb usb4: bus suspend > ehci_hcd 0000:00:10.3: suspend root hub > hub 3-0:1.0: hub_suspend > usb usb3: bus suspend > usb usb3: suspend_rh > hub 2-0:1.0: hub_suspend > usb usb2: bus suspend > usb usb2: suspend_rh > hub 1-0:1.0: hub_suspend > usb usb1: bus suspend > usb usb1: suspend_rh > sd 0:0:0:0: [sda] Synchronizing SCSI cache > parport_pc 00:09: disabled > serial 00:08: disabled > serial 00:07: disabled > ACPI: PCI interrupt for device 0000:00:11.5 disabled > ACPI handle has no context! > ACPI: PCI interrupt for device 0000:00:11.1 disabled > ACPI: PCI interrupt for device 0000:00:10.3 disabled > ehci_hcd 0000:00:10.3: --> PCI D3/wakeup > uhci_hcd 0000:00:10.2: uhci_suspend > ACPI: PCI interrupt for device 0000:00:10.2 disabled > uhci_hcd 0000:00:10.2: --> PCI D3 > uhci_hcd 0000:00:10.1: uhci_suspend > ACPI: PCI interrupt for device 0000:00:10.1 disabled > uhci_hcd 0000:00:10.1: --> PCI D3 > uhci_hcd 0000:00:10.0: uhci_suspend > ACPI: PCI interrupt for device 0000:00:10.0 disabled > uhci_hcd 0000:00:10.0: --> PCI D3 > ACPI: PCI interrupt for device 0000:00:0d.0 disabled > ACPI handle has no context! > ACPI: PCI interrupt for device 0000:00:0c.0 disabled > ACPI handle has no context! > pci_set_power_state(): 0000:00:00.0: state=3, current state=5 > swsusp: critical section: > swsusp: Need to copy 51195 pages > Intel machine check architecture supported. > Intel machine check reporting enabled on CPU#0. 
> evxfevnt-0079 [00] enable : System is already in ACPI mode > ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20 > ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21 > ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22 > ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23 > evxfevnt-0079 [00] enable : System is already in ACPI mode > ACPI: Unable to turn cooling device [c180ff60] 'off' > PCI: Setting latency timer of device 0000:00:01.0 to 64 > ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[db140000-db1407ff] Max Packet=[2048] IR/IT contexts=[4/8] > ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 18 > e100: eth-intel: e100_watchdog: link up, 100Mbps, full-duplex > PM: Writing back config space on device 0000:00:0d.0 at offset 1 (was 2100007, writing 2100003) > ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 22 > uhci_hcd 0000:00:10.0: PCI D0, from previous PCI D3 > ACPI: PCI Interrupt 0000:00:10.0[A] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 > uhci_hcd 0000:00:10.0: uhci_resume > uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000 > uhci_hcd 0000:00:10.0: Performing full reset > usb usb1: root hub lost power or was reset > usb usb1: suspend_rh > uhci_hcd 0000:00:10.1: PCI D0, from previous PCI D3 > ACPI: PCI Interrupt 0000:00:10.1[B] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 > uhci_hcd 0000:00:10.1: uhci_resume > uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000 > uhci_hcd 0000:00:10.1: Performing full reset > usb usb2: root hub lost power or was reset > usb usb2: suspend_rh > uhci_hcd 0000:00:10.2: PCI D0, from previous PCI D3 > ACPI: PCI Interrupt 0000:00:10.2[C] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 > uhci_hcd 0000:00:10.2: uhci_resume > uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000 > uhci_hcd 0000:00:10.2: Performing full reset > usb usb3: root hub lost power or was reset > usb usb3: suspend_rh > 
ehci_hcd 0000:00:10.3: PCI D0, from previous PCI D3 > ACPI: PCI Interrupt 0000:00:10.3[D] -> Link [ALKB] -> GSI 21 (level, low) -> IRQ 20 > PM: Writing back config space on device 0000:00:10.3 at offset 3 (was 2008, writing 2010) > PM: Writing back config space on device 0000:00:10.3 at offset 1 (was 2100007, writing 2100017) > PM: Writing back config space on device 0000:00:11.1 at offset 1 (was 2900003, writing 2900007) > ACPI: PCI Interrupt 0000:00:11.1[A] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 17 > ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23 > PCI: Setting latency timer of device 0000:00:11.5 to 64 > ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 > serial 00:07: activated > serial 00:08: activated > parport_pc 00:09: activated > i8042 aux 00:0a: activation failed > i8042 kbd 00:0b: activation failed > sd 0:0:0:0: [sda] Starting disk > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT > ata1.01: _GTF evaluation failed (AE 0x300d) > ata1.01: revalidation failed (errno=-5) > ata1: failed to recover some devices, retrying in 5 secs > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT > ata1.01: _GTF evaluation failed (AE 0x300d) > ata1.01: ACPI on devcfg failed the second time, disabling (errno=-5) > ata1.01: revalidation failed (errno=1) > ata1: failed to recover some devices, retrying in 5 secs 
> ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT > ata1.00: _GTF evaluation failed (AE 0x300d) > ata1.00: revalidation failed (errno=-5) > ata1: failed to recover some devices, retrying in 5 secs > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV0._GTF] (Node c180b840), AE_AML_PACKAGE_LIMIT > ata1.00: _GTF evaluation failed (AE 0x300d) > ata1.00: ACPI on devcfg failed the second time, disabling (errno=-5) > ata1.00: revalidation failed (errno=1) > ata1: failed to recover some devices, retrying in 5 secs > ata1.00: configured for UDMA/100 > ata1.01: configured for UDMA/33 > sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB) > sd 0:0:0:0: [sda] Write Protect is off > sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 > sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > usb usb1: usb resume > usb usb1: wakeup_rh > hub 1-0:1.0: trying to enable port power on non-switchable hub > usb usb2: usb resume > usb usb2: wakeup_rh > hub 2-0:1.0: trying to enable port power on non-switchable hub > usb usb3: usb resume > usb usb3: wakeup_rh > hub 3-0:1.0: trying to enable port power on non-switchable hub > usb usb4: usb resume > 
ehci_hcd 0000:00:10.3: resume root hub > hub 4-0:1.0: hub_resume > Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0006 > uhci_hcd 0000:00:10.0: port 1 portsc 018a,00 > hub 1-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s > done. > swsusp: Basic memory bitmaps freed > hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300 > uhci_hcd 0000:00:10.0: port 2 portsc 008a,00 > hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s > hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 > hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0006 > uhci_hcd 0000:00:10.1: port 1 portsc 018a,00 > hub 2-0:1.0: port 1, status 0300, change 0003, 1.5 Mb/s > hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x300 > uhci_hcd 0000:00:10.1: port 2 portsc 008a,00 > hub 2-0:1.0: port 2, status 0100, change 0003, 12 Mb/s > hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 > hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0006 > uhci_hcd 0000:00:10.2: port 1 portsc 008a,00 > hub 3-0:1.0: port 1, status 0100, change 0003, 12 Mb/s > hub 3-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100 > uhci_hcd 0000:00:10.2: port 2 portsc 008a,00 > hub 3-0:1.0: port 2, status 0100, change 0003, 12 Mb/s > hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 > hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0000 > hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000 > hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 > hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000 > usb usb1: suspend_rh (auto-stop) > usb usb2: suspend_rh (auto-stop) > usb usb3: suspend_rh (auto-stop) > agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0. 
> agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode > agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode > [drm] Loading R200 Microcode > > > > 2.6.24-rc2 suspend log (one screenful), UNSUCCESSFUL: > > serial 00:07: disabled > ACPI: PCI interrupt for device 0000:00:11.5 disabled > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTM_] > (Node c180b9a8), AE_AML_PACKAGE_LIMIT > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN1._GTM] (Node c180b8d0), AE_AML_PACKAGE_LIMIT > ata2: ACPI get timing mode failed (AE 0x300d) > pci_device_suspend(): ata_pci_device_suspend+0x0/0x40() returns -22 > suspend_device(): pci_device_suspend+0x0/0x70() returns -22 > Could not suspend device 0000:00:11.1: error -22 > ACPI: PCI Interrupt 0000:00:11.5[C] -> Link [ALKC] -> GSI 22 (level, low) -> IRQ 23 > ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16 > serial 00:07: activated > serial 00:08: activated > parport_pc 00:09: activated > i8042 aux 00:0a: activation failed > i8042 kbd 00:0b: activation failed > sd 0:0:0:0: [sda] Starting disk > sd 0:0:0:0: timing out command, waited 180s > sd 0:0:0:0: [sda] START_STOP FAILED > sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK > Restarting tasks ... <7>hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000 > done. > swsusp: Basic memory bitmaps freed > swsusp: Marking nosave pages: 000000000009f000 - 0000000000100000 > swsusp: Basic memory bitmaps created > Syncing filesystems ... > > > > # lspci > 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333] > 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP] > 00:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. 
IEEE 1394 Host Controller (rev 46) > 00:0a.0 Multimedia audio controller: Aureal Semiconductor Vortex 2 (rev fe) > 00:0c.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08) > 00:0d.0 Multimedia audio controller: Aztech System Ltd 3328 Audio (rev 10) > 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) > 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) > 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) > 00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) > 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge > 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) > 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50) > 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74) > 01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 If [Radeon 9000] (rev 01) > 01:00.1 Display controller: ATI Technologies Inc Radeon RV250 [Radeon 9000] (Secondary) (rev 01) > > > # dmidecode 2.9 > SMBIOS 2.2 present. > 39 structures occupying 1035 bytes. > Table at 0x000F0800. > > Handle 0x0000, DMI type 0, 19 bytes > BIOS Information > Vendor: Award Software International, Inc. 
> Version: 6.00 PG > Release Date: 09/16/2003 > Address: 0xE0000 > Runtime Size: 128 kB > ROM Size: 512 kB > Characteristics: > ISA is supported > PCI is supported > PNP is supported > APM is supported > BIOS is upgradeable > BIOS shadowing is allowed > ESCD support is available > Boot from CD is supported > Selectable boot is supported > BIOS ROM is socketed > EDD is supported > 5.25"/360 KB floppy services are supported (int 13h) > 5.25"/1.2 MB floppy services are supported (int 13h) > 3.5"/720 KB floppy services are supported (int 13h) > 3.5"/2.88 MB floppy services are supported (int 13h) > Print screen service is supported (int 5h) > 8042 keyboard services are supported (int 9h) > Serial services are supported (int 14h) > Printer services are supported (int 17h) > CGA/mono video services are supported (int 10h) > ACPI is supported > USB legacy is supported > AGP is supported > LS-120 boot is supported > ATAPI Zip drive boot is supported > > Handle 0x0001, DMI type 1, 25 bytes > System Information > Manufacturer: VIA Technologies, Inc. 
> Product Name: VT8367-8235 > Version: > Serial Number: > UUID: Not Present > Wake-up Type: Power Switch > > Handle 0x0002, DMI type 2, 8 bytes > Base Board Information > Manufacturer: > Product Name: VT8367-8235 > Version: > Serial Number: > > Handle 0x0003, DMI type 3, 13 bytes > Chassis Information > Manufacturer: > Type: Desktop > Lock: Not Present > Version: > Serial Number: > Asset Tag: > Boot-up State: Unknown > Power Supply State: Unknown > Thermal State: Unknown > Security Status: Unknown > > Handle 0x0004, DMI type 4, 32 bytes > Processor Information > Socket Designation: Socket A > Type: Central Processor > Family: Duron > Manufacturer: AMD > ID: 81 06 00 00 FF FB 83 03 > Signature: Family 6, Model 8, Stepping 1 > Flags: > FPU (Floating-point unit on-chip) > VME (Virtual mode extension) > DE (Debugging extension) > PSE (Page size extension) > TSC (Time stamp counter) > MSR (Model specific registers) > PAE (Physical address extension) > MCE (Machine check exception) > CX8 (CMPXCHG8 instruction supported) > APIC (On-chip APIC hardware supported) > SEP (Fast system call) > MTRR (Memory type range registers) > PGE (Page global enable) > MCA (Machine check architecture) > CMOV (Conditional move instruction supported) > PAT (Page attribute table) > PSE-36 (36-bit page size extension) > MMX (MMX technology supported) > FXSR (Fast floating-point save and restore) > SSE (Streaming SIMD extensions) > Version: AMD K7 processor > Voltage: 3.3 V > External Clock: 133 MHz > Max Speed: 1500 MHz > Current Speed: 1200 MHz > Status: Populated, Enabled > Upgrade: ZIF Socket > L1 Cache Handle: 0x000A > L2 Cache Handle: 0x000B > L3 Cache Handle: No L3 Cache > > Handle 0x0005, DMI type 5, 24 bytes > Memory Controller Information > Error Detecting Method: None > Error Correcting Capabilities: > None > Supported Interleave: One-way Interleave > Current Interleave: Four-way Interleave > Maximum Memory Module Size: 32 MB > Maximum Total Memory Size: 128 MB > Supported Speeds: > 
70 ns > 60 ns > Supported Memory Types: > Standard > EDO > Memory Module Voltage: 5.0 V > Associated Memory Slots: 4 > 0x0006 > 0x0007 > 0x0008 > 0x0009 > Enabled Error Correcting Capabilities: None > > . > . > . > > > # hdparm -i /dev/sda > > /dev/sda: > > Model=WDC WD1200JB-00CRA1 , FwRev=17.07W17, SerialNo=WD-WCA8C4285629 > Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq } > RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40 > BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16? > CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648 > IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} > PIO modes: pio0 pio1 pio2 pio3 pio4 > DMA modes: mdma0 mdma1 mdma2 > UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 > AdvancedPM=no WriteCache=enabled > Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5 > > * signifies the current active mode > > > > Athlon on EPOX 8K5A2+ board. > > > > Again, 2.6.23 and 2.6.24-rc1 work, yet 2.6.24 -rc2, -rc3 and -rc4 FAIL. > > Probably won't be able to do any reporting over the weekend (WOL is > inoperable ATM for some weird reason), let me know what you need. > Took too much time to gather this report already anyway ;) > > Thanks, > > Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 10:20 ` Andrew Morton @ 2007-12-08 10:28 ` Matthew Garrett 2007-12-08 10:55 ` Andreas Mohr 1 sibling, 0 replies; 74+ messages in thread From: Matthew Garrett @ 2007-12-08 10:28 UTC (permalink / raw) To: Andrew Morton Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: > On Sat, 8 Dec 2007 11:12:57 +0100 Andreas Mohr <andi@lisas.de> wrote: > > ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126] > > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.GTF_] (Node c180b990), AE_AML_PACKAGE_LIMIT > > ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.IDE0.CHN0.DRV1._GTF] (Node c180b888), AE_AML_PACKAGE_LIMIT > > ata1.01: _GTF evaluation failed (AE 0x300d) 037f6bb79f753c014bc84bca0de9bf98bb5ab169 ought to have fixed this? -- Matthew Garrett | mjg59@srcf.ucam.org ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 10:20 ` Andrew Morton 2007-12-08 10:28 ` Matthew Garrett @ 2007-12-08 10:55 ` Andreas Mohr 2007-12-09 15:46 ` Tejun Heo 1 sibling, 1 reply; 74+ messages in thread From: Andreas Mohr @ 2007-12-08 10:55 UTC (permalink / raw) To: Andrew Morton Cc: Andreas Mohr, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Tejun Heo, Len Brown, linux-acpi Hi, On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: > > Does this report now win me the lucky draw, pretty please? ;) > > nah, you have to cc the acpi guys to get a prize ;) Thought so shortly, but missed it. > Andreas, please do separately report that WOL problem too.. Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;) > Our list just reached 30. Oh, so this is in fact a separate issue? Wasn't sure, couldn't do enough analysis of similar cases. Will test any (already submitted!) suggestions ASAP. Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 10:55 ` Andreas Mohr @ 2007-12-09 15:46 ` Tejun Heo 2007-12-09 19:59 ` Andreas Mohr 0 siblings, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-09 15:46 UTC (permalink / raw) To: Andreas Mohr Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Andreas Mohr wrote: > Hi, > > On Sat, Dec 08, 2007 at 02:20:01AM -0800, Andrew Morton wrote: >>> Does this report now win me the lucky draw, pretty please? ;) >> nah, you have to cc the acpi guys to get a prize ;) > > Thought so shortly, but missed it. > >> Andreas, please do separately report that WOL problem too.. > > Local setup issue only, at least this one *isn't* a 2.6.24-rc regression. ;) > >> Our list just reached 30. > > Oh, so this is in fact a separate issue? Wasn't sure, couldn't do > enough analysis of similar cases. > > Will test any (already submitted!) suggestions ASAP. Please post full kernel boot log and the result of 'lspci -nn'. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 15:46 ` Tejun Heo @ 2007-12-09 19:59 ` Andreas Mohr 0 siblings, 0 replies; 74+ messages in thread From: Andreas Mohr @ 2007-12-09 19:59 UTC (permalink / raw) To: Tejun Heo Cc: Andreas Mohr, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide, Len Brown, linux-acpi Hi, On Mon, Dec 10, 2007 at 12:46:57AM +0900, Tejun Heo wrote: > Please post full kernel boot log and the result of 'lspci -nn'. Done, on #9530. Will try some of the promising patches/suggestions now, hopefully this will show me what's up. Will add further results there. Andreas Mohr ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 9:36 ` Andrew Morton 2007-12-08 10:12 ` Andreas Mohr @ 2007-12-09 6:52 ` Tejun Heo 2007-12-09 14:20 ` Rafael J. Wysocki 1 sibling, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-09 6:52 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, linux-ide Hello, Andrew Morton wrote: >> Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object >> Submitter : Hans de Bruin <bruinjm@xs4all.nl> >> References : http://bugzilla.kernel.org/show_bug.cgi?id=9320 >> Handled-By : Robert Moore <Robert.Moore@intel.com> >> Tejun Heo <htejun@gmail.com> >> Fu Michael <michael.fu@intel.com> >> Patch : >> > > A number of other people are seeing the same thing and Tejun is > putting in a blacklist of machines which cannot use libata+acpi. > That patch is not yet in any git tree which I pull. > > AFAICT the machines keep working OK - there's just some nasty dmesg > spew. > > If any machines _are_ breaking then this could cause real problems > and I'd prefer that we either go for a whitelist or arrange to > detect the condition and fall back to non-acpi ata. The pending patchset should make ATA ACPI quite resistant to failures. Known bad boards can be blacklisted (currently only one is on the list), ATA ACPI is disabled quicker if ACPI evaluation fails, execution errors are handled better and commands which are intended to help the vendor instead of the user are filtered. So, I think we have enough safety nets. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
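The safety nets listed in the message above (filter vendor-oriented commands, tolerate commands the drive aborts, give up on ACPI when execution really fails) can be illustrated with a small stand-alone C sketch. This is not the pending patchset or any libata code: the types `gtf_cmd`, `gtf_stats`, `tf_result` and the functions `is_filtered()` and `run_gtf()` are hypothetical names invented for the illustration, and the drive's response is simulated by a field in each command. The only value taken from the ATA command set is DEVICE CONFIGURATION FREEZE LOCK (command 0xB1, feature 0xC1), the command mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of issuing one _GTF taskfile to the drive. */
enum tf_result {
    TF_OK,          /* drive accepted the command */
    TF_DEV_ABORTED, /* drive rejected it (command aborted) */
    TF_XPORT_ERROR  /* execution failed in some other way */
};

/* Hypothetical, simplified view of one taskfile returned by _GTF. */
struct gtf_cmd {
    unsigned char command;  /* ATA command opcode */
    unsigned char feature;  /* ATA feature register */
    enum tf_result result;  /* simulated drive response */
};

/* Counters so a caller can see what happened to each command. */
struct gtf_stats {
    int applied;   /* accepted by the drive */
    int ignored;   /* aborted by the drive, skipped */
    int filtered;  /* never sent to the drive */
};

/* Illustrative filter: drop DEVICE CONFIGURATION FREEZE LOCK only. */
bool is_filtered(const struct gtf_cmd *cmd)
{
    return cmd->command == 0xB1 && cmd->feature == 0xC1;
}

/*
 * Walk the taskfiles in order. Returns true if ACPI taskfile handling
 * should stay enabled, false if the caller should disable it.
 */
bool run_gtf(const struct gtf_cmd *cmds, size_t n, struct gtf_stats *st)
{
    assert(st != NULL);
    assert(cmds != NULL || n == 0);

    st->applied = st->ignored = st->filtered = 0;

    for (size_t i = 0; i < n; i++) {
        if (is_filtered(&cmds[i])) {
            st->filtered++;          /* not sent at all */
            continue;
        }
        switch (cmds[i].result) {
        case TF_OK:
            st->applied++;
            break;
        case TF_DEV_ABORTED:
            st->ignored++;           /* skip it, move on to the next one */
            break;
        case TF_XPORT_ERROR:
            return false;            /* real failure: stop using ACPI here */
        }
    }
    return true;
}
```

In this sketch a command aborted by the drive never disables ACPI handling and never triggers a retry; only a non-abort execution failure does, which mirrors the behaviour asked for earlier in the thread.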
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 6:52 ` Tejun Heo @ 2007-12-09 14:20 ` Rafael J. Wysocki 2007-12-09 15:11 ` Tejun Heo 0 siblings, 1 reply; 74+ messages in thread From: Rafael J. Wysocki @ 2007-12-09 14:20 UTC (permalink / raw) To: Tejun Heo; +Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide On Sunday, 9 of December 2007, Tejun Heo wrote: > Hello, > > Andrew Morton wrote: > >> Subject : PATA scan: ACPI Exception AE_AML_PACKAGE_LIMIT... is beyond end of object > >> Submitter : Hans de Bruin <bruinjm@xs4all.nl> > >> References : http://bugzilla.kernel.org/show_bug.cgi?id=9320 > >> Handled-By : Robert Moore <Robert.Moore@intel.com> > >> Tejun Heo <htejun@gmail.com> > >> Fu Michael <michael.fu@intel.com> > >> Patch : > >> > > > > A number of other people are seeing the same thing and Tejun is > > putting in a blacklist of machines which cannot use libata+acpi. > > That patch is not yet in any git tree which I pull. > > > > AFAICT the machines keep working OK - there's just some nasty dmesg > > spew. > > > > If any machines _are_ breaking then this could cause real problems > > and I'd prefer that we either go for a whitelist or arrange to > > detect the condition and fall back to non-acpi ata. > > The pending patchset should make ATA ACPI quite resistant to failures. Are you going to push it for 2.6.24? > Known bad boards can be blacklisted (currently only one is on the > list), ATA ACPI is disabled quicker if ACPI evaluation fails, execution > errors are handled better and commands which are intended to help the > vendor instead of the user are filtered. So, I think we have enough > safety nets. Sounds good. :-) Thanks, Rafael ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 14:20 ` Rafael J. Wysocki @ 2007-12-09 15:11 ` Tejun Heo 0 siblings, 0 replies; 74+ messages in thread From: Tejun Heo @ 2007-12-09 15:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, linux-ide Rafael J. Wysocki wrote: >>> If any machines _are_ breaking then this could cause real problems >>> and I'd prefer that we either go for a whitelist or arrange to >>> detect the condition and fall back to non-acpi ata. >> The pending patchset should make ATA ACPI quite resistant to failures. > > Are you going to push it for 2.6.24? Yeah, I'm hoping so. Maybe command filtering should wait till 2.6.25 but the rest, yeap. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (2 preceding siblings ...) 2007-12-08 9:36 ` Andrew Morton @ 2007-12-08 9:42 ` Andrew Morton 2007-12-08 18:57 ` Roland Dreier 2007-12-08 19:40 ` Theodore Tso 2007-12-08 9:46 ` Andrew Morton ` (4 subsequent siblings) 8 siblings, 2 replies; 74+ messages in thread From: Andrew Morton @ 2007-12-08 9:42 UTC (permalink / raw) To: Rafael J. Wysocki Cc: LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai, Theodore Ts'o On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message contains a list of some regressions from 2.6.23 which have been > reported since 2.6.24-rc1 was released and for which there are no fixes in the > mainline that I know of. If any of them have been fixed already, please let me > know. > > If you know of any other unresolved regressions from 2.6.23, please let me know > either and I'll add them to the list. > > ... > > Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s > Submitter : Roland Dreier <rdreier@cisco.com> > References : http://lkml.org/lkml/2007/11/8/255 > http://bugzilla.kernel.org/show_bug.cgi?id=9332 > Handled-By : > Patch : Takashi had a patch and that has been merged. AFAIK this regression has been fixed and we're left with a new but harmless warning. However Roland reported other problems and it appears that the trail went cold (http://lkml.org/lkml/2007/11/14/251) Ted was hitting some of the same problems but that trail appears to also have gone cold (http://lkml.org/lkml/2007/11/23/17). Guys, can we have a status update on all of this please? ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 9:42 ` Andrew Morton @ 2007-12-08 18:57 ` Roland Dreier 2007-12-08 19:40 ` Theodore Tso 1 sibling, 0 replies; 74+ messages in thread From: Roland Dreier @ 2007-12-08 18:57 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai, Theodore Ts'o > > Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s > > Submitter : Roland Dreier <rdreier@cisco.com> > > References : http://lkml.org/lkml/2007/11/8/255 > > http://bugzilla.kernel.org/show_bug.cgi?id=9332 > > Handled-By : > > Patch : > > Takashi had a patch and that has been merged. AFAIK this regression > has been fixed and we're left with a new but harmless warning. > > However Roland reported other problems and it appears that the trail went > cold (http://lkml.org/lkml/2007/11/14/251) A fix for the most likely cause of this problem was merged (7eba5c9d "[ALSA] hda-codec - Check PINCAP only for PIN widgets") but it seems that setting CONFIG_SND_HDA_POWER_SAVE can cause the "azx_get_response timeout, switching to polling mode" message sometimes too. However according to Takashi this is really just a cosmetic problem -- polling mode is not so bad. - R. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 9:42 ` Andrew Morton 2007-12-08 18:57 ` Roland Dreier @ 2007-12-08 19:40 ` Theodore Tso 2007-12-08 19:55 ` Ingo Molnar 2007-12-08 22:30 ` Rafael J. Wysocki 1 sibling, 2 replies; 74+ messages in thread From: Theodore Tso @ 2007-12-08 19:40 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote: > > Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s > > Submitter : Roland Dreier <rdreier@cisco.com> > > References : http://lkml.org/lkml/2007/11/8/255 > > http://bugzilla.kernel.org/show_bug.cgi?id=9332 > > Handled-By : > > Patch : > > Takashi had a patch and that has been merged. AFAIK this regression > has been fixed and we're left with a new but harmless warning. > > However Roland reported other problems and it appears that the trail went > cold (http://lkml.org/lkml/2007/11/14/251) > > Ted was hitting some of the same problems but that trail appears to also > have gone cold (http://lkml.org/lkml/2007/11/23/17). Actually, not gone cold, but I stopped posting about it because it's been solved and I thought agreement had been reached that it should be pushed to mainline before 2.6.25. I am very happily running with Ingo's "snd hda suspend latency: shorten codec read" patch, which was originally intended to speed up resuming from hibernation, but which as I discovered, also has the nice side effect of eliminating the reported error. On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17) and suggested that Jaroslav push this patch to Linus immediately instead of waiting for 2.6.25, since it apparently solves two problems with one stone. However, I just checked Linus's public tree, and Ingo's patch is *not* in mainline. 
However, as far as I am concerned, Ingo's patch, first posted to LKML here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing the above regression. Rafael, could you please make a note of this in your regression list, and could we please get this patch pushed into mainline? Thanks!! - Ted ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 19:40 ` Theodore Tso @ 2007-12-08 19:55 ` Ingo Molnar 2007-12-08 22:30 ` Rafael J. Wysocki 1 sibling, 0 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-08 19:55 UTC (permalink / raw) To: Theodore Tso, Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Roland Dreier, Takashi Iwai * Theodore Tso <tytso@mit.edu> wrote: > I am very happily running with Ingo's "snd hda suspend latency: > shorten codec read" patch, which was originally intended to speed up > resuming from hibernation, but which as I discovered, also has the > nice side effect of eliminating the reported error. > > On 11/23, Takashi replied to my note > (http://lkml.org/lkml/2007/11/23/17) and suggested that Jaroslav push > this patch to Linus immediately instead of waiting for 2.6.25, since > it appearly solves two problems with one stone. However, I just > checked, as of Linus's public, and Ingo's patch is *not* in mainline. > > However, as far as I am concerned, Ingo's patch, first posted to LKML > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing > the above regression. Rafael, could you please make a note of this in > your regression list, and could we please get this patch pushed into > mainline? ha! I'd never have expected _that_ to happen. Cool. Fixing a driver bug by accident :-) Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 19:40 ` Theodore Tso 2007-12-08 19:55 ` Ingo Molnar @ 2007-12-08 22:30 ` Rafael J. Wysocki 2007-12-09 2:15 ` Theodore Tso 1 sibling, 1 reply; 74+ messages in thread From: Rafael J. Wysocki @ 2007-12-08 22:30 UTC (permalink / raw) To: Theodore Tso Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai On Saturday, 8 of December 2007, Theodore Tso wrote: > On Sat, Dec 08, 2007 at 01:42:41AM -0800, Andrew Morton wrote: > > > Subject : snd_hda_intel 2.6.24-rc2 bug: interrupts don't always work on Lenovo X60s > > > Submitter : Roland Dreier <rdreier@cisco.com> > > > References : http://lkml.org/lkml/2007/11/8/255 > > > http://bugzilla.kernel.org/show_bug.cgi?id=9332 > > > Handled-By : > > > Patch : > > > > Takashi had a patch and that has been merged. AFAIK this regression > > has been fixed and we're left with a new but harmless warning. > > > > However Roland reported other problems and it appears that the trail went > > cold (http://lkml.org/lkml/2007/11/14/251) > > > > Ted was hitting some of the same problems but that trail appears to also > > have gone cold (http://lkml.org/lkml/2007/11/23/17). > > Actually, not gone cold, but I stopped posting about it because it's > been solved and I thought agreement had been reached that it should be > pushed to mainline before 2.6.25. > > I am very happily running with Ingo's "snd hda suspend latency: > shorten codec read" patch, which was originally intended to speed up > resuming from hibernation, but which as I discovered, also has the > nice side effect of eliminating the reported error. > > On 11/23, Takashi replied to my note (http://lkml.org/lkml/2007/11/23/17) > and suggested that Jaroslav push this patch to Linus immediately > instead of waiting for 2.6.25, since it appearly solves two problems > with one stone. However, I just checked, as of Linus's public, and > Ingo's patch is *not* in mainline. 
> > However, as far as I am concerned, Ingo's patch, first posted to LKML > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing > the above regression. Rafael, could you please make a note of this in > your regression list, Done, thanks. > and could we please get this patch pushed into mainline? ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 22:30 ` Rafael J. Wysocki @ 2007-12-09 2:15 ` Theodore Tso 2007-12-13 10:49 ` Takashi Iwai 0 siblings, 1 reply; 74+ messages in thread From: Theodore Tso @ 2007-12-09 2:15 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, Takashi Iwai On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote: > On Saturday, 8 of December 2007, Theodore Tso wrote: > > However, as far as I am concerned, Ingo's patch, first posted to LKML > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing > > the above regression. Rafael, could you please make a note of this in > > your regression list, > > Done, thanks. Great, thanks. I should add that technically this wasn't a regression since I had been seeing this since before 2.6.23. Also, it isn't a big deal, since aside from noise in the syslog, falling back to polling mode doesn't make any functional or user-visible difference (although I guess it's less efficient). Regardless of whether it is a regression, it would be nice to get the patch applied and this issue fixed for 2.6.25! - Ted ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 2:15 ` Theodore Tso @ 2007-12-13 10:49 ` Takashi Iwai 2007-12-20 15:42 ` Takashi Iwai 0 siblings, 1 reply; 74+ messages in thread From: Takashi Iwai @ 2007-12-13 10:49 UTC (permalink / raw) To: Theodore Tso Cc: Rafael J. Wysocki, Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier, perex [Sorry for the late response as I've been on vacation] At Sat, 8 Dec 2007 21:15:44 -0500, Theodore Tso wrote: > > On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote: > > On Saturday, 8 of December 2007, Theodore Tso wrote: > > > However, as far as I am concerned, Ingo's patch, first posted to LKML > > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing > > > the above regression. Rafael, could you please make a note of this in > > > your regression list, > > > > Done, thanks. > > Great, thanks. I should add that technically this wasn't a regression > since I had been seeing this since before 2.6.23. Also, it isn't a > big deal, since aside from noise in the syslog, falling back to > polling more doesn't make any functional or user-visible difference > (although I guess it's less efficient). > > Regardless of whether it is a regression, it would be nice to get the > patch applied and and this issue fixed for 2.6.25! You mean 2.6.24 ? ;-) Yes, if it solves the problem, not only improves the latency, it's definitely nice to have now. I was just too conservative to mark it for 2.6.24 merge although it looks safe. Jaroslav, could you prepare this for the push? It corresponds to alsa-kernel HG changeset 5557. thanks, Takashi ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-13 10:49 ` Takashi Iwai @ 2007-12-20 15:42 ` Takashi Iwai 0 siblings, 0 replies; 74+ messages in thread From: Takashi Iwai @ 2007-12-20 15:42 UTC (permalink / raw) To: perex Cc: Theodore Tso, Rafael J. Wysocki, Andrew Morton, LKML, Linus Torvalds, Ingo Molnar, Roland Dreier At Thu, 13 Dec 2007 11:49:51 +0100, I wrote: > > [Sorry for the late response as I've been on vacation] > > At Sat, 8 Dec 2007 21:15:44 -0500, > Theodore Tso wrote: > > > > On Sat, Dec 08, 2007 at 11:30:53PM +0100, Rafael J. Wysocki wrote: > > > On Saturday, 8 of December 2007, Theodore Tso wrote: > > > > However, as far as I am concerned, Ingo's patch, first posted to LKML > > > > here: http://lkml.org/lkml/2007/11/16/66 should be listed as fixing > > > > the above regression. Rafael, could you please make a note of this in > > > > your regression list, > > > > > > Done, thanks. > > > > Great, thanks. I should add that technically this wasn't a regression > > since I had been seeing this since before 2.6.23. Also, it isn't a > > big deal, since aside from noise in the syslog, falling back to > > polling more doesn't make any functional or user-visible difference > > (although I guess it's less efficient). > > > > Regardless of whether it is a regression, it would be nice to get the > > patch applied and and this issue fixed for 2.6.25! > > You mean 2.6.24 ? ;-) > > Yes, if it solves the problem, not only improves the latency, it's > definitely nice to have now. I was just too conservative to mark it > for 2.6.24 merge although it looks safe. > > Jaroslav, could you prepare this for the push? It corresponds to > alsa-kernel HG changeset 5557. Jaroslav, what about this now? Takashi ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (3 preceding siblings ...) 2007-12-08 9:42 ` Andrew Morton @ 2007-12-08 9:46 ` Andrew Morton 2007-12-08 15:49 ` Alan Stern 2007-12-08 9:52 ` Andrew Morton ` (3 subsequent siblings) 8 siblings, 1 reply; 74+ messages in thread From: Andrew Morton @ 2007-12-08 9:46 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Alan Stern On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message contains a list of some regressions from 2.6.23 which have been > reported since 2.6.24-rc1 was released and for which there are no fixes in the > mainline that I know of. If any of them have been fixed already, please let me > know. > > If you know of any other unresolved regressions from 2.6.23, please let me know > either and I'll add them to the list. > > > .. > > Subject : system hangs after a few minutes > Submitter : Marcus Better <marcus@better.se> > References : http://bugzilla.kernel.org/show_bug.cgi?id=9335 > Handled-By : Andrew Morton <akpm@linux-foundation.org> > Alan Stern <stern@rowland.harvard.edu> > Patch : http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view > This one we have a confirmed fix from Alan but it doesn't appear to be in anyone's tree. There is a second bug in here, applicable to core x86: Marcus's machine won't boot with nmi_watchdog=1. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 9:46 ` Andrew Morton @ 2007-12-08 15:49 ` Alan Stern 0 siblings, 0 replies; 74+ messages in thread From: Alan Stern @ 2007-12-08 15:49 UTC (permalink / raw) To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar On Sat, 8 Dec 2007, Andrew Morton wrote: > On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > This message contains a list of some regressions from 2.6.23 which have been > > reported since 2.6.24-rc1 was released and for which there are no fixes in the > > mainline that I know of. If any of them have been fixed already, please let me > > know. > > > > If you know of any other unresolved regressions from 2.6.23, please let me know > > either and I'll add them to the list. > > > > > > .. > > > > Subject : system hangs after a few minutes > > Submitter : Marcus Better <marcus@better.se> > > References : http://bugzilla.kernel.org/show_bug.cgi?id=9335 > > Handled-By : Andrew Morton <akpm@linux-foundation.org> > > Alan Stern <stern@rowland.harvard.edu> > > Patch : http://bugzilla.kernel.org/attachment.cgi?id=13871&action=view > > > > This one we have a confirmed fix from Alan but it doesn't appear to be in > anyone's tree. An expanded version of that fix is in Greg's queue: http://marc.info/?l=linux-usb-devel&m=119697043410947&w=2 Since he's away until Tuesday, nothing will happen for a few days. However you might want to replace the old fix that got added to -mm. > There is a second bug in here, applicable to core x86: Marcus's machine > won't boot with nmi_watchdog=1. Alan Stern ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (4 preceding siblings ...) 2007-12-08 9:46 ` Andrew Morton @ 2007-12-08 9:52 ` Andrew Morton 2007-12-09 7:00 ` Tejun Heo 2007-12-08 10:44 ` Richard Purdie ` (2 subsequent siblings) 8 siblings, 1 reply; 74+ messages in thread From: Andrew Morton @ 2007-12-08 9:52 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar, Tejun Heo On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message contains a list of some regressions from 2.6.23 which have been > reported since 2.6.24-rc1 was released and for which there are no fixes in the > mainline that I know of. If any of them have been fixed already, please let me > know. > > If you know of any other unresolved regressions from 2.6.23, please let me know > either and I'll add them to the list. > > ... > > Subject : cd/dvd inaccessible in 2.6.24-rc2 > Submitter : Will Trives <will@trivescon.com.au> > References : http://lkml.org/lkml/2007/11/9/290 > http://bugzilla.kernel.org/show_bug.cgi?id=9346 > Handled-By : Len Brown <lenb@kernel.org> > Tejun Heo <htejun@gmail.com> > Patch : > Nasty one. Tejun and several diligent reporters are doing sterling work there and things have improved. I don't know whether any of Tejun's patches have been merged yet, but we'll probably be OK on this one. What is unclear (to me) is what actually caused those people's machines to break? ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 9:52 ` Andrew Morton @ 2007-12-09 7:00 ` Tejun Heo 2007-12-09 13:42 ` Alan Cox 0 siblings, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-09 7:00 UTC (permalink / raw) To: Andrew Morton Cc: Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar, Alan Cox Hello, (cc'ing Alan) Andrew Morton wrote: >> Subject : cd/dvd inaccessible in 2.6.24-rc2 >> Submitter : Will Trives <will@trivescon.com.au> >> References : http://lkml.org/lkml/2007/11/9/290 >> http://bugzilla.kernel.org/show_bug.cgi?id=9346 >> Handled-By : Len Brown <lenb@kernel.org> >> Tejun Heo <htejun@gmail.com> >> Patch : >> > > Nasty one. Tejun and several diligent reporters are doing sterling > work there and things have improved. I don't know whether any of > Tejun's patches have been merged yet, but we'll probably be OK on > this one. I'm still trying to find out what's really going on. That drive is quite peculiar. > What is unclear (to me) is what actually caused those people's machines to > break? It's introduced by setting ATAPI transfer chunk size to actual transfer size which is the right thing to do generally. However, with the change, the ATAPI HSM should be ready to drain full extra transfer chunks which libata HSM wasn't doing. With that part changed, most regressions should go away. Unfortunately, simply adding that doesn't fix the case in bug 9346 and I'm still trying to find out why. The good news is that the drive works fine with proposed more extensive improvements to libata ATAPI which will probably be included into 2.6.25, so we at least have long term solution. If we fail to find out the solution in time, we always have the alternative of backing out the ATAPI transfer chunk size update. This will break some other cases which were fixed by the change but those won't be regressions at least and we can add transfer chunk size update with other changes to 2.6.25. Thanks. 
-- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
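For readers following the ATAPI part of the thread, here is a minimal user-space sketch of the "drain the extra transfer chunks" idea described above. It is not libata code and not the pending patch: struct fake_dev, dev_read() and pio_read_and_drain() are hypothetical names made up for illustration. It only assumes that the device announces more bytes than the host buffer can hold and that the host must read and discard the excess so the device can finish the command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device's PIO data port (not a libata type). */
struct fake_dev {
    const unsigned char *data; /* bytes the device will hand out */
    size_t len;                /* total bytes available from the device */
    size_t pos;                /* how many bytes have been read so far */
};

/* Read up to n bytes from the fake device into dst; returns bytes read. */
static size_t dev_read(struct fake_dev *d, unsigned char *dst, size_t n)
{
    size_t avail = d->len - d->pos;

    if (n > avail)
        n = avail;
    memcpy(dst, d->data + d->pos, n);
    d->pos += n;
    return n;
}

/*
 * Copy at most buf_len bytes into buf, then read and discard whatever the
 * device still wants to send (dev_bytes is the byte count it announced).
 * Returns the number of bytes that were drained and thrown away.
 */
static size_t pio_read_and_drain(struct fake_dev *d, unsigned char *buf,
                                 size_t buf_len, size_t dev_bytes)
{
    unsigned char scratch[64]; /* throwaway buffer for the excess data */
    size_t want = dev_bytes < buf_len ? dev_bytes : buf_len;
    size_t got, drained = 0;

    assert(d != NULL && buf != NULL);
    got = dev_read(d, buf, want);

    while (got + drained < dev_bytes) {
        size_t chunk = dev_bytes - got - drained;
        size_t n;

        if (chunk > sizeof(scratch))
            chunk = sizeof(scratch);
        n = dev_read(d, scratch, chunk);
        if (n == 0) /* device ran dry earlier than announced */
            break;
        drained += n;
    }
    return drained;
}
```

In this toy model, a host that stops reading once its own buffer is full leaves the device with unread data, which corresponds to the situation described above where the HSM was not prepared to drain full extra chunks.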
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 7:00 ` Tejun Heo @ 2007-12-09 13:42 ` Alan Cox 2007-12-09 15:09 ` Tejun Heo ` (2 more replies) 0 siblings, 3 replies; 74+ messages in thread From: Alan Cox @ 2007-12-09 13:42 UTC (permalink / raw) To: Tejun Heo Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar > If we fail to find out the solution in time, we always have the > alternative of backing out the ATAPI transfer chunk size update. This Which will break far more controllers and drives than it fixes, so backing it out is nonsensical and not in the general good. > will break some other cases which were fixed by the change but those > won't be regressions at least and we can add transfer chunk size > update with other changes to 2.6.25. Great, make everyone else wait another three months for a working CD drive. The one-off regression appears far less harmful than a revert. Tejun - instead of backing out important updates for 2.6.24 we should just blacklist that specific drive for now and sort it nicely in 2.6.25, not revert stuff and break everyone else's ATAPI devices. Alan ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 13:42 ` Alan Cox @ 2007-12-09 15:09 ` Tejun Heo 2007-12-09 15:25 ` Alan Cox 2007-12-09 18:36 ` Linus Torvalds 2007-12-09 18:41 ` Linus Torvalds 2 siblings, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-09 15:09 UTC (permalink / raw) To: Alan Cox Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar Hello, Alan. Alan Cox wrote: >> will break some other cases which were fixed by the change but those >> won't be regressions at least and we can add transfer chunk size >> update with other changes to 2.6.25. > > Great, make everyone else wait another three months for a working CD > drive. The one off regression appears far less harmful than a revert. Newly broken ones will be regressions. How many do we fix by the change? On SATA, setting the correct transfer chunk size doesn't seem to fix many. > Tejun - instead of backing out important updates for 2.6.24 we should > just blacklist that specific drive for now and sort it nicely in 2.6.25, > not revert stuff and break everyone elses ATAPI devices. We'll need to blacklist setting transfer chunk size, eek, and let's leave that as the last resort and hope that we find the solution soon. Blacklist takes time to develop and temporary blacklist for just one release doesn't sound like a good idea. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 15:09 ` Tejun Heo @ 2007-12-09 15:25 ` Alan Cox 2007-12-09 15:39 ` Tejun Heo 0 siblings, 1 reply; 74+ messages in thread From: Alan Cox @ 2007-12-09 15:25 UTC (permalink / raw) To: Tejun Heo Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar > Newly broken ones will be regressions. How many do we fix by the > change? On SATA, setting the correct transfer chunk size doesn't seem > to fix many. Regressions are not some kind of grand evil. Better to regress the odd device than continue to break entire controllers. > > Tejun - instead of backing out important updates for 2.6.24 we should > > just blacklist that specific drive for now and sort it nicely in 2.6.25, > > not revert stuff and break everyone elses ATAPI devices. > > We'll need to blacklist setting transfer chunk size, eek, and let's > leave that as the last resort and hope that we find the solution soon. > Blacklist takes time to develop and temporary blacklist for just one > release doesn't sound like a good idea. It seems to be sensible to me *if* it is just this one device we are somehow confusing and that one device is holding up fixing everything else. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 15:25 ` Alan Cox @ 2007-12-09 15:39 ` Tejun Heo 0 siblings, 0 replies; 74+ messages in thread From: Tejun Heo @ 2007-12-09 15:39 UTC (permalink / raw) To: Alan Cox Cc: Andrew Morton, Rafael J. Wysocki, LKML, Linus Torvalds, Ingo Molnar Alan Cox wrote: >> Newly broken ones will be regressions. How many do we fix by the >> change? On SATA, setting the correct transfer chunk size doesn't seem >> to fix many. > > Regressions are not some kind of grand evil. Better to regress the odd > device than continue to break entire controllers. We need to put more weight on regressions as it at least makes releases predictable to users. Anyways, I wasn't saying it was some absolute maxim. I was literally asking how many so that we can evaluate the trade off. >>> Tejun - instead of backing out important updates for 2.6.24 we should >>> just blacklist that specific drive for now and sort it nicely in 2.6.25, >>> not revert stuff and break everyone elses ATAPI devices. >> We'll need to blacklist setting transfer chunk size, eek, and let's >> leave that as the last resort and hope that we find the solution soon. >> Blacklist takes time to develop and temporary blacklist for just one >> release doesn't sound like a good idea. > > It seems to be sensible to me *if* it is just this one device we are > somehow confusing and that one device is holding up fixing everything > else. Yeah, if it's this one device, I fully agree. Let's see how debugging turns out. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 13:42 ` Alan Cox 2007-12-09 15:09 ` Tejun Heo @ 2007-12-09 18:36 ` Linus Torvalds 2007-12-09 21:54 ` Alan Cox 2007-12-09 18:41 ` Linus Torvalds 2 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2007-12-09 18:36 UTC (permalink / raw) To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar On Sun, 9 Dec 2007, Alan Cox wrote: > > > If we fail to find out the solution in time, we always have the > > alternative of backing out the ATAPI transfer chunk size update. This > > Which will break far more controllers and drives than it fixes, so > backing it out is nonsensical and not in the general good. No. Regressions are worse. It doesn't matter AT ALL if you think that it breaks ten times more devices, if it's a regression and those devices didn't work in the past, they simply DO NOT COUNT. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 18:36 ` Linus Torvalds @ 2007-12-09 21:54 ` Alan Cox 0 siblings, 0 replies; 74+ messages in thread From: Alan Cox @ 2007-12-09 21:54 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar > Regressions are worse. It doesn't matter AT ALL if you think that it > breaks ten times more devices, if it's a regression and those devices > didn't work in the past, they simply DO NOT COUNT. Must be time for an -ac tree again. Alan ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 13:42 ` Alan Cox 2007-12-09 15:09 ` Tejun Heo 2007-12-09 18:36 ` Linus Torvalds @ 2007-12-09 18:41 ` Linus Torvalds 2007-12-09 22:01 ` Alan Cox 2 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2007-12-09 18:41 UTC (permalink / raw) To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar On Sun, 9 Dec 2007, Alan Cox wrote: > > Great, make everyone else wait another three months for a working CD > drive. The one off regression appears far less harmful than a revert. Btw, Alan, that "math" is total and utter BULLSH*T, and you should know that. "The one off regression" is likely the tip of an iceberg. If something regresses for one person, for that one person who tested and noticed and made a bug-report, there's probably a thousand people who haven't even tested the development kernel, or who had problems and just went back to the previous version. In contrast, reverting something will be guaranteed to not have those kinds of issues, since the only people who could notice are people for who it never worked in the first place. There's no "silent mass of people" that can be affected. This is why regressions are so important. They don't trump _everything_, but basically ignoring and letting them slide is *much* more painful than just reverting it. The biggest reason to ignore a regression is if nobody can even figure out where it came from, or reverting simply isn't an option for some really deep and fundamental issue. That doesn't seem to be the case here. So we should revert unless there is some known acceptable real fix. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 18:41 ` Linus Torvalds @ 2007-12-09 22:01 ` Alan Cox 2007-12-09 22:51 ` Ray Lee 2007-12-10 1:57 ` Linus Torvalds 0 siblings, 2 replies; 74+ messages in thread From: Alan Cox @ 2007-12-09 22:01 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar > Btw, Alan, that "math" is total and utter BULLSH*T, and you should know > that. The one-off regression is probably not one-off, but this is IDE so actually it's quite probable it's a single broken firmware. The alternative is that you cripple just about every user of various other standards-compliant devices and controllers whose hardware we finally fixed. Finally you need to remember that the 'regression' is caused by the fact we now do the _right_ thing both in terms of 'old IDE' and specs. Believe it or not I did actually think in quite some detail about this case, and the relative probabilities, and go back and re-review the old IDE code (whose behaviour we now follow) and the spec. I spend a measurable amount of my time reviewing code and weighing risks, regressions and progress for an enterprise Linux vendor, so it's something I do every day of the week. To blindly argue regressions are critical is sometimes (as in this case) to argue that "this freeway is no longer compatible with a horse and cart" means the freeway should be turned back into a dirt road. The horse and cart happened to work by chance because the road was quiet that day. We clearly need to add a horse & cart lane in the long term, but for 2.6.24 it may well be the right thing to do just to blacklist that specific drive back to old behaviour until we can tidy it more nicely. Alan ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 22:01 ` Alan Cox @ 2007-12-09 22:51 ` Ray Lee 2007-12-10 1:57 ` Linus Torvalds 1 sibling, 0 replies; 74+ messages in thread From: Ray Lee @ 2007-12-09 22:51 UTC (permalink / raw) To: Alan Cox Cc: Linus Torvalds, Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar On Dec 9, 2007 2:01 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > > Btw, Alan, that "math" is total and utter BULLSH*T, and you should know > > that. > > To blindly argue regressions are critical is sometimes (as in this case) > to argue that "this freeway is no longer compatible with a horse and > cart" means the freeway should be turned back into a dirt road. Honest question: If you allow regressions, then how does one guarantee forward progress? (If it were a finite set of systems, all within one group's control, then the answer is simple: count how many work. However, in this case we only have a statistical sampling available to us.) Ray ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 22:01 ` Alan Cox 2007-12-09 22:51 ` Ray Lee @ 2007-12-10 1:57 ` Linus Torvalds 2007-12-10 3:28 ` Alan Cox ` (2 more replies) 1 sibling, 3 replies; 74+ messages in thread From: Linus Torvalds @ 2007-12-10 1:57 UTC (permalink / raw) To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar On Sun, 9 Dec 2007, Alan Cox wrote: > > The one off regression is probably not one off, but this is IDE so > actually its quite probable its a single broken firmware. > > The alternative is that you cripple just about every user of various > other standards compliant devices and controllers whose hardware we > finally fixed. Alan, you're so full of shit that it's not even funny. Have you even *read* the thread? Tejun already reported that this apparently gets fixed _properly_ with the more extensive cleanups and fixes that are pending for 2.6.25. In other words, the stuff you call so critically important (yet we've been able to live without it until now!) is apparently simply NOT YET READY. It's breaking things. In this case, Tejun seems to be right on the money. I also agree 100% with him when he says "Blacklist takes time to develop and temporary blacklist for just one release doesn't sound like a good idea." because if we create some blacklist for that one reported device, not only is it likely going to be wrong (it's almost never just one firmware or one chip that has a particular issue), but we tend to create these blacklists and later realize that we shouldn't have blacklisted things at all, we should just have done things differently. For examples of that, see the NCQ blacklist that was just _us_ doing things wrong (over-reacting to things we shouldn't care about), and there's currently another totally unrelated discussion on a very similar thing wrt libata and the ACPI startup commands for an unused controller port.
> Finally you need to remember that the 'regression' is caused by the fact > we now do the _right_ thing both in terms of 'old IDE' and specs. .. and what the hell does that matter? If the code doesn't work, it doesn't work, and you might as well point to some random scribblings done by a three-year-old on toilet paper rather than any "specs". Real life matters more. Regressions matter more. We apparently do have a full fix, but it seems to be too invasive for 2.6.24, which means that the thing that currently DOES NOT WORK and causes regressions should be reverted, so that 2.6.24 is at least no worse than 2.6.23 (and all earlier kernels) in this respect. And then we should just hope that the more complete fix that Tejun has doesn't cause any issues on its own. I would suggest that if you care so deeply about this issue, you press Fedora into putting Tejun's tree into Fedora testing, and get that thing tested out extensively. So the fact is, we have a way forward, but we should *not* take steps backwards just because you want to push something out that isn't quite ready. We should revert the change that causes the current trouble, safe in the knowledge (or at least "strong hope") that we have a way forward that makes *both* 2.6.24 and 2.6.25 be continual improvements. We used to allow regressions. It was really painful. It's hard to debug things when things sometimes break. It's much better to have a nice constant monotonic improvement. It's better for users, but it's much better also for developers, even if you may be frustrated right now because some new code effectively gets shut down until it works for everybody. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 1:57 ` Linus Torvalds @ 2007-12-10 3:28 ` Alan Cox 2007-12-10 3:38 ` Alan Cox 2007-12-10 8:21 ` Ingo Molnar 2 siblings, 0 replies; 74+ messages in thread From: Alan Cox @ 2007-12-10 3:28 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar It's your kernel. It's your call, and your privilege to be wrong. And anyone with ATAPI problems should probably test the -mm tree before reporting anything. Alan ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 1:57 ` Linus Torvalds 2007-12-10 3:28 ` Alan Cox @ 2007-12-10 3:38 ` Alan Cox 2007-12-10 15:38 ` Linus Torvalds 2007-12-10 8:21 ` Ingo Molnar 2 siblings, 1 reply; 74+ messages in thread From: Alan Cox @ 2007-12-10 3:38 UTC (permalink / raw) To: Linus Torvalds Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar > Have you even *read* the thread? In detail, as it unfolds and while testing variants of Tejun's code on the hardware I have access to - none of which has this bug making it rather trickier to help. > In other words, the stuff you call so critically important (yet we've been > able to live without it until now!) is apparently simply NOT YET READY. > It's breaking things. And as I keep pointing out but you keep ignoring - not doing it breaks even more things, by a factor of quite a lot. > .. and what the hell does that matter? If the code doesn't work, it > doesn't work, and you might as well point to some random scribblings done > by a three-year-old on toilet paper rather than any "specs". The code without the changes doesn't work either. So pick your toilet paper.. by your argument both are toilet paper. > causes regressions should be reverted, so that 2.6.24 is at least no worse > than 2.6.23 (and all earlier kernels) in this respect. Which as the distro bug lists for ATAPI will tell you - aint good. Still distro vendors can ship patches. > We used to allow regressions. It was really painful. It's hard to debug > things when things sometimes break. It's much better to have a nice > constant monotonic improvement. Linus, the kernel regresses all over the place every release. If it didn't do that you'd never get any changes in. 
Your kernel would fossilize like RHEL or SLES and you'd be spending weeks analysing each changeset for possible side effects, or - as happens by necessity - adding code paths so a fix vital to one driver ceases to share core code with another driver - to reduce regression risk. Been there, done that and it's not the way progress happens. > It's better for users, but it's much better also for developers, even if > you may be frustrated right now because some new code effectively gets > shut down until it works for everybody. Have fun. I trust you'll be fixing the other 11 (I think it was) listed regressions before 2.6.24 - or backing out every changeset that could be responsible? No, I thought not - because that wouldn't be sensible either. Alan ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 3:38 ` Alan Cox @ 2007-12-10 15:38 ` Linus Torvalds 0 siblings, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2007-12-10 15:38 UTC (permalink / raw) To: Alan Cox; +Cc: Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML, Ingo Molnar On Mon, 10 Dec 2007, Alan Cox wrote: > > And as I keep pointing out but you keep ignoring - not doing it breaks > even more things, by a factor of quite a lot. But we've never done it before in libata, right? So the "not doing it breaks" argument is about stuff that isn't regressions. Can you really not see the difference? Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 1:57 ` Linus Torvalds 2007-12-10 3:28 ` Alan Cox 2007-12-10 3:38 ` Alan Cox @ 2007-12-10 8:21 ` Ingo Molnar 2007-12-10 8:27 ` Tejun Heo 2 siblings, 1 reply; 74+ messages in thread From: Ingo Molnar @ 2007-12-10 8:21 UTC (permalink / raw) To: Linus Torvalds Cc: Alan Cox, Tejun Heo, Andrew Morton, Rafael J. Wysocki, LKML * Linus Torvalds <torvalds@linux-foundation.org> wrote: > Tejun already reported that this apparently gets fixed _properly_ with > the more extensive cleanups and fixes that are pending for 2.6.25. btw., how extensive are those cleanups and fixes in reality, is there a rollup somewhere one could take a look at? Those fixes and cleanups were deferred to v2.6.25 in the knowledge of having the current code included in v2.6.24 - but now that the current approach seems to regress, maybe those cleanups are still safe enough. (compared to an outright revert) Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 8:21 ` Ingo Molnar @ 2007-12-10 8:27 ` Tejun Heo 2007-12-10 8:41 ` Ingo Molnar 0 siblings, 1 reply; 74+ messages in thread From: Tejun Heo @ 2007-12-10 8:27 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML Ingo Molnar wrote: > * Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> Tejun already reported that this apparently gets fixed _properly_ with >> the more extensive cleanups and fixes that are pending for 2.6.25. > > btw., how extensive are those cleanups and fixes in reality, is there a > rollup somewhere one could take a look at? Those fixes and cleanups were > deferred to v2.6.25 in the knowledge of having the current code included > in v2.6.24 - but now that the current approach seems to regress, maybe > those cleanups are still safe enough. (compared to an outright revert) The following git tree contains patches pending review for 2.6.25. http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio And we're getting close to fixing the regression. I don't think there's too much worry about this one. Just need a bit more time to test a few more things. Thanks. -- tejun ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 8:27 ` Tejun Heo @ 2007-12-10 8:41 ` Ingo Molnar 0 siblings, 0 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-10 8:41 UTC (permalink / raw) To: Tejun Heo Cc: Linus Torvalds, Alan Cox, Andrew Morton, Rafael J. Wysocki, LKML * Tejun Heo <htejun@gmail.com> wrote: > The following git tree contains patches pending review for 2.6.25. > > http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=shortlog;h=improve-ATAPI-data-transfer-no-pio > > And we're getting close to fixing the regression. I don't think > there's too much worry about this one. Just need a bit more time to > test few more things. ah, i see, the joys of the kernel running BIOS written code (AML): http://bugzilla.kernel.org/attachment.cgi?id=13932&action=view cute! Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (5 preceding siblings ...) 2007-12-08 9:52 ` Andrew Morton @ 2007-12-08 10:44 ` Richard Purdie 2007-12-08 22:32 ` Rafael J. Wysocki 2007-12-09 11:54 ` Andrew Morton 2007-12-10 20:42 ` Ingo Molnar 8 siblings, 1 reply; 74+ messages in thread From: Richard Purdie @ 2007-12-08 10:44 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote: > Subject : leds: ledtrig-timer calls sleeping function from invalid context > Submitter : Márton Németh <nm127@freemail.hu> > References : http://bugzilla.kernel.org/show_bug.cgi?id=9264 > Handled-By : Richard Purdie <rpurdie@rpsys.net> > Patch : http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view The fix is now in mainline: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65 Cheers, Richard ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 10:44 ` Richard Purdie @ 2007-12-08 22:32 ` Rafael J. Wysocki 0 siblings, 0 replies; 74+ messages in thread From: Rafael J. Wysocki @ 2007-12-08 22:32 UTC (permalink / raw) To: Richard Purdie; +Cc: LKML, Andrew Morton, Linus Torvalds, Ingo Molnar On Saturday, 8 of December 2007, Richard Purdie wrote: > On Sat, 2007-12-08 at 03:40 +0100, Rafael J. Wysocki wrote: > > Subject : leds: ledtrig-timer calls sleeping function from invalid context > > Submitter : Márton Németh <nm127@freemail.hu> > > References : http://bugzilla.kernel.org/show_bug.cgi?id=9264 > > Handled-By : Richard Purdie <rpurdie@rpsys.net> > > Patch : http://bugzilla.kernel.org/attachment.cgi?id=13493&action=view > > The fix is now in mainline: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dc47206e552c0850ad11f7e9a1fca0a3c92f5d65 Yes, already dropped. Thanks, Rafael ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (6 preceding siblings ...) 2007-12-08 10:44 ` Richard Purdie @ 2007-12-09 11:54 ` Andrew Morton 2007-12-09 12:05 ` Ingo Molnar 2007-12-09 14:24 ` Rafael J. Wysocki 2007-12-10 20:42 ` Ingo Molnar 8 siblings, 2 replies; 74+ messages in thread From: Andrew Morton @ 2007-12-09 11:54 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, Linus Torvalds, Ingo Molnar On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > This message contains a list of some regressions from 2.6.23 which have been > reported since 2.6.24-rc1 was released and for which there are no fixes in the > mainline that I know of. Here's one for you - I have a new Lenovo t61p with which to irritate everyone. suspend-to-ram is a wipeout, but suspend-to-disk works OK under 2.6.23. However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of resume-from-disk. Am trying to do a git-bisect on it but it seems that someone has been screwing with ata Kconfig and I'm hitting a pile of cant-find-root-disk bisection points and I can't immediately work out why. I'll try to find time to look at it again next week. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 11:54 ` Andrew Morton @ 2007-12-09 12:05 ` Ingo Molnar 2007-12-09 14:24 ` Rafael J. Wysocki 1 sibling, 0 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-09 12:05 UTC (permalink / raw) To: Andrew Morton; +Cc: Rafael J. Wysocki, LKML, Linus Torvalds * Andrew Morton <akpm@linux-foundation.org> wrote: > Am trying to do a git-bisect on it but it seems that someone has been > screwing with ata Kconfig and I'm hitting a pile of > cant-find-root-disk bisection points and I can't immediately work out > why. I'll try to find time to look at it again next week. the way i solve such bisection problems is to have a patch like the one below applied by a "git-bisect run" scriptlet (and popped off after the test). This way all must-have drivers and kernel features are selected for that particular testbox, no matter what Kconfig complications there are. (except outright config option renaming but those are rare) Ingo Index: linux/arch/x86/Kconfig.needed =================================================================== --- /dev/null +++ linux/arch/x86/Kconfig.needed @@ -0,0 +1,88 @@ +config FORCE_MINIMAL_CONFIG + bool + default y + +select EXPERIMENTAL + +select EXT3_FS +select EXT3_FS_XATTR +select EXT3_FS_POSIX_ACL +select EXT3_FS_SECURITY +select BLOCK +select HOTPLUG +#select INOTIFY +#select INOTIFY_USER + +# so that capset() works (sudo, etc.): +select SECURITY +select SECURITY_CAPABILITIES + +select BINFMT_ELF +select MSDOS_PARTITION +select PARTITION_ADVANCED +select BSD_DISKLABEL + +select SYSFS +select SYSFS_DEPRECATED +select PROC_FS +select FUTEX + +select ATA +select SATA_AHCI +select ATA_PIIX +select PATA_AMD +select PATA_OLDPIIX +select BLK_DEV_SD + +select E100 +select E1000 +select NET_ETHERNET +select NET_PCI +select MII +select CRC32 + +select 8139TOO +select FORCEDETH + +select PACKET + +select NETPOLL +select NETCONSOLE +select NET_POLL_CONTROLLER +select INET +select NET +select UNIX
+select NETDEVICES + +select SERIAL_8250 +select SERIAL_8250_CONSOLE +select MAGIC_SYSRQ + +select INPUT +select INPUT_MOUSEDEV +select INPUT_POLLDEV +select INPUT_KEYBOARD +select KEYBOARD_ATKBD +select SERIO +select SERIO_I8042 + +select VT +select VT_CONSOLE +select HW_CONSOLE +select VGA_CONSOLE +select EARLY_PRINTK +select PRINTK +select UNIX98_PTYS + +select USB +select USB_MOUSE +select USB_EHCI_HCD +select USB_OHCI_HCD +select USB_UHCI_HCD +select USB_SUPPORT + +select PCI + +select STANDALONE +select PREVENT_FIRMWARE_BUILD + Index: linux/lib/Kconfig =================================================================== --- linux.orig/lib/Kconfig +++ linux/lib/Kconfig @@ -142,3 +142,6 @@ config CHECK_SIGNATURE bool endmenu + +source "arch/x86/Kconfig.needed" + Index: linux/lib/Kconfig.debug ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-09 11:54 ` Andrew Morton 2007-12-09 12:05 ` Ingo Molnar @ 2007-12-09 14:24 ` Rafael J. Wysocki 1 sibling, 0 replies; 74+ messages in thread From: Rafael J. Wysocki @ 2007-12-09 14:24 UTC (permalink / raw) To: Andrew Morton; +Cc: LKML, Linus Torvalds, Ingo Molnar On Sunday, 9 of December 2007, Andrew Morton wrote: > On Sat, 8 Dec 2007 03:40:49 +0100 "Rafael J. Wysocki" <rjw@sisk.pl> wrote: > > > This message contains a list of some regressions from 2.6.23 which have been > > reported since 2.6.24-rc1 was released and for which there are no fixes in the > > mainline that I know of. > > Here's one for you - I have a new Lenovo t61p with which to irritate > everyone. > > suspend-to-ram is a wipeout, but suspend-to-disk works OK under > 2.6.23. > > However under 2.6.24-rc1 and -rc4 the machine reboots right at the end of > resume-from-disk. It's http://bugzilla.kernel.org/show_bug.cgi?id=9258 , I think. Does it do that if you unload ehci-hcd before the hibernation? ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-08 2:40 Rafael J. Wysocki ` (7 preceding siblings ...) 2007-12-09 11:54 ` Andrew Morton @ 2007-12-10 20:42 ` Ingo Molnar 2007-12-10 20:57 ` Guillaume Chazarain 2007-12-10 20:59 ` Andrew Morton 8 siblings, 2 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-10 20:42 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: LKML, Andrew Morton, Linus Torvalds * Rafael J. Wysocki <rjw@sisk.pl> wrote: > Subject : jiffies counter leaps in 2.6.24-rc3 > Submitter : Stefano Brivio <stefano.brivio@polimi.it> > References : http://lkml.org/lkml/2007/11/24/53 > http://bugzilla.kernel.org/show_bug.cgi?id=9475 > Handled-By : Ingo Molnar <mingo@elte.hu> > Patch : http://lkml.org/lkml/2007/12/7/132 Linus, Andrew, i need some help deciding what to do about this regression. The fixes for this have been tested and resolve the regression, but they change printk and other code that runs by default and is thus rather invasive so late in the v2.6.24 cycle. This bug should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug option) - although some claimed effect was on udelay()/mdelay() too. i've attached below the queue of 5 patches that fix this problem. They have been build and boot tested with more than 1000 random kernels in the past few days, so i certainly trust the core and x86 bits of this. what do you think? Right now i've got them queued up for 2.6.25 in both the scheduler-devel and the x86-devel git trees - but can submit them for 2.6.24 if it's better if we did them there. I've got no strong opinion either way. Ingo --------------------> Subject: x86: scale cyc_2_nsec according to CPU frequency From: "Guillaume Chazarain" <guichaz@yahoo.fr> scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. 
] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/kernel/tsc_32.c | 43 ++++++++++++++++++++++++++++++----- arch/x86/kernel/tsc_64.c | 57 ++++++++++++++++++++++++++++++++++++++--------- include/asm-x86/timer.h | 23 ++++++++++++++---- 3 files changed, 102 insertions(+), 21 deletions(-) Index: linux/arch/x86/kernel/tsc_32.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_32.c +++ linux/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include <linux/jiffies.h> #include <linux/init.h> #include <linux/dmi.h> +#include <linux/percpu.h> #include <asm/delay.h> #include <asm/tsc.h> @@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * * -johnstul@us.ibm.com "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz; + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; + + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + /* + * Start smoothly with the new frequency: + */ + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } /* @@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - set_cyc2ns_scale(cpu_khz); + preempt_disable(); + set_cyc2ns_scale(cpu_khz, smp_processor_id()); + preempt_enable(); /* * TSC based sched_clock turns * to junk w/ cpufreq @@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia void __init tsc_init(void) 
{ + int cpu; + if (!cpu_has_tsc || tsc_disable) goto out_no_tsc; @@ -380,7 +403,15 @@ void __init tsc_init(void) (unsigned long)cpu_khz / 1000, (unsigned long)cpu_khz % 1000); - set_cyc2ns_scale(cpu_khz); + /* + * Secondary CPUs do not run through tsc_init(), so set up + * all the scale factors for all CPUs, assuming the same + * speed as the bootup CPU. (cpufreq notifiers will fix this + * up if their speed diverges) + */ + for_each_possible_cpu(cpu) + set_cyc2ns_scale(cpu_khz, cpu); + use_tsc_delay(); /* Check and install the TSC clocksource */ Index: linux/arch/x86/kernel/tsc_64.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_64.c +++ linux/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include <asm/hpet.h> #include <asm/timex.h> +#include <asm/timer.h> static int notsc __initdata = 0; @@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +/* Accelerators for sched_clock() + * convert from cycles(64bits) => nanoseconds (64bits) + * basic equation: + * ns = cycles / (freq / ns_per_sec) + * ns = cycles * (ns_per_sec / freq) + * ns = cycles * (10^9 / (cpu_khz * 10^3)) + * ns = cycles * (10^6 / cpu_khz) + * + * Then we use scaling math (suggested by george@mvista.com) to get: + * ns = cycles * (10^6 * SC / cpu_khz) / SC + * ns = cycles * cyc2ns_scale / SC + * + * And since SC is a constant power of two, we can convert the div + * into a shift. + * + * We can use khz divisor instead of mhz to keep a better precision, since + * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. + * (mathieu.desnoyers@polymtl.ca) + * + * -johnstul@us.ibm.com "math is hard, lets go shopping!" 
+ */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz; -} + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; -static unsigned long long cycles_2_ns(unsigned long long cyc) -{ - return (cyc * cyc2ns_scale) >> NS_SCALE; + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } unsigned long long sched_clock(void) @@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct mark_tsc_unstable("cpufreq changes"); } - set_cyc2ns_scale(tsc_khz_ref); + preempt_disable(); + set_cyc2ns_scale(tsc_khz_ref, smp_processor_id()); + preempt_enable(); return 0; } @@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref void __init tsc_calibrate(void) { unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2; - int hpet = is_hpet_enabled(); + int hpet = is_hpet_enabled(), cpu; local_irq_save(flags); @@ -206,7 +241,9 @@ void __init tsc_calibrate(void) } tsc_khz = tsc2 / tsc1; - set_cyc2ns_scale(tsc_khz); + + for_each_possible_cpu(cpu) + set_cyc2ns_scale(tsc_khz, cpu); } /* Index: linux/include/asm-x86/timer.h =================================================================== --- linux.orig/include/asm-x86/timer.h +++ linux/include/asm-x86/timer.h @@ -2,6 +2,7 @@ #define _ASMi386_TIMER_H #include <linux/init.h> #include <linux/pm.h> +#include <linux/percpu.h> #define TICK_SIZE (tick_nsec / 1000) @@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void); #define calculate_cpu_khz() native_calculate_cpu_khz() #endif -/* Accellerators for sched_clock() +/* Accelerators for sched_clock() * convert from cycles(64bits) => nanoseconds 
(64bits) * basic equation: * ns = cycles / (freq / ns_per_sec) @@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void); * And since SC is a constant power of two, we can convert the div * into a shift. * - * We can use khz divisor instead of mhz to keep a better percision, since + * We can use khz divisor instead of mhz to keep a better precision, since * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * (mathieu.desnoyers@polymtl.ca) * * -johnstul@us.ibm.com "math is hard, lets go shopping!" */ -extern unsigned long cyc2ns_scale __read_mostly; + +DECLARE_PER_CPU(unsigned long, cyc2ns); #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline unsigned long long cycles_2_ns(unsigned long long cyc) +static inline unsigned long long __cycles_2_ns(unsigned long long cyc) { - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; + return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR; } +static inline unsigned long long cycles_2_ns(unsigned long long cyc) +{ + unsigned long long ns; + unsigned long flags; + + local_irq_save(flags); + ns = __cycles_2_ns(cyc); + local_irq_restore(flags); + + return ns; +} #endif ---------------> Subject: x86: idle wakeup event in the HLT loop From: Ingo Molnar <mingo@elte.hu> do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (the ACPI idle code already does this.) [ update the 64-bit side too, as noticed by Jiri Slaby. 
] Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/kernel/process_32.c | 15 ++++++++++++--- arch/x86/kernel/process_64.c | 13 ++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-) Index: linux-x86.q/arch/x86/kernel/process_32.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_32.c +++ linux-x86.q/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt(); /* enables interrupts racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux-x86.q/arch/x86/kernel/process_64.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_64.c +++ linux-x86.q/arch/x86/kernel/process_64.c @@ -116,9 +116,16 @@ static void default_idle(void) smp_mb(); local_irq_disable(); if (!need_resched()) { - /* Enables interrupts one instruction before HLT. - x86 special cases this so there is no race. */ - safe_halt(); + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); + safe_halt(); /* enables interrupts racelessly */ + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); } else local_irq_enable(); current_thread_info()->status |= TS_POLLING; ---------------> Subject: printk: make printk more robust by not allowing recursion From: Ingo Molnar <mingo@elte.hu> make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. 
I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk-debug logic with irqs disabled. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, . /* cpu currently holding logbuf_lock */ static volatile unsigned int printk_cpu = UINT_MAX; +const char printk_recursion_bug_msg [] = + KERN_CRIT "BUG: recent printk recursion!\n"; +static int printk_recursion_bug; + asmlinkage int vprintk(const char *fmt, va_list args) { + static int log_level_unknown = 1; + static char printk_buf[1024]; + unsigned long flags; - int printed_len; + int printed_len = 0; + int this_cpu; char *p; - static char printk_buf[1024]; - static int log_level_unknown = 1; boot_delay_msec(); preempt_disable(); - if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id()) - /* If a crash is occurring during printk() on this CPU, - * make sure we can't deadlock */ - zap_locks(); - /* This stops the holder of console_sem just where we want him */ raw_local_irq_save(flags); + this_cpu = smp_processor_id(); + + /* + * Ouch, printk recursed into itself! + */ + if (unlikely(printk_cpu == this_cpu)) { + /* + * If a crash is occurring during printk() on this CPU, + * then try to get the crash message out but make sure + * we can't deadlock. 
Otherwise just return to avoid the + * recursion and return - but flag the recursion so that + * it can be printed at the next appropriate moment: + */ + if (!oops_in_progress) { + printk_recursion_bug = 1; + goto out_restore_irqs; + } + zap_locks(); + } + lockdep_off(); spin_lock(&logbuf_lock); - printk_cpu = smp_processor_id(); + printk_cpu = this_cpu; + if (printk_recursion_bug) { + printk_recursion_bug = 0; + strcpy(printk_buf, printk_recursion_bug_msg); + printed_len = sizeof(printk_recursion_bug_msg); + } /* Emit the output into the temporary buffer */ - printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args); + printed_len += vscnprintf(printk_buf + printed_len, + sizeof(printk_buf), fmt, args); /* * Copy the output into log_buf. If the caller didn't provide @@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, printk_cpu = UINT_MAX; spin_unlock(&logbuf_lock); lockdep_on(); +out_restore_irqs: raw_local_irq_restore(flags); } ---------------> Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock() From: Ingo Molnar <mingo@elte.hu> Stefano Brivio reported weird printk timestamp behavior during CPU frequency changes: http://bugzilla.kernel.org/show_bug.cgi?id=9475 fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock() instead. 
Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, loglev_char = default_message_loglevel + '0'; } - t = printk_clock(); + t = cpu_clock(printk_cpu); nanosec_rem = do_div(t, 1000000000); tlen = sprintf(tbuf, "<%c>[%5lu.%06lu] ", ---------------> Subject: sched: remove printk_clock() From: Ingo Molnar <mingo@elte.hu> printk_clock() is obsolete - it has been replaced with cpu_clock(). Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/arm/kernel/time.c | 11 ----------- arch/ia64/kernel/time.c | 27 --------------------------- kernel/printk.c | 5 ----- 3 files changed, 43 deletions(-) Index: linux/arch/arm/kernel/time.c =================================================================== --- linux.orig/arch/arm/kernel/time.c +++ linux/arch/arm/kernel/time.c @@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset } #endif -/* - * An implementation of printk_clock() independent from - * sched_clock(). This avoids non-bootable kernels when - * printk_clock is enabled. 
- */ -unsigned long long printk_clock(void) -{ - return (unsigned long long)(jiffies - INITIAL_JIFFIES) * - (1000000000 / HZ); -} - static unsigned long next_rtc_update; /* Index: linux/arch/ia64/kernel/time.c =================================================================== --- linux.orig/arch/ia64/kernel/time.c +++ linux/arch/ia64/kernel/time.c @@ -344,33 +344,6 @@ udelay (unsigned long usecs) } EXPORT_SYMBOL(udelay); -static unsigned long long ia64_itc_printk_clock(void) -{ - if (ia64_get_kr(IA64_KR_PER_CPU_DATA)) - return sched_clock(); - return 0; -} - -static unsigned long long ia64_default_printk_clock(void) -{ - return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) * - (1000000000/HZ); -} - -unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock; - -unsigned long long printk_clock(void) -{ - return ia64_printk_clock(); -} - -void __init -ia64_setup_printk_clock(void) -{ - if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT)) - ia64_printk_clock = ia64_itc_printk_clock; -} - /* IA64 doesn't cache the timezone */ void update_vsyscall_tz(void) { Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -573,11 +573,6 @@ static int __init printk_time_setup(char __setup("time", printk_time_setup); -__attribute__((weak)) unsigned long long printk_clock(void) -{ - return sched_clock(); -} - /* Check if we have any console registered that can be called early in boot. */ static int have_callable_console(void) { ^ permalink raw reply [flat|nested] 74+ messages in thread
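For readers following the first patch in the series above: the cycles-to-nanoseconds scaling described in the tsc_64.c comment (ns = cycles * cyc2ns_scale / SC, with SC = 2^10) can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the kernel code: the helper names cyc2ns_scale_for() and cycles_to_ns() are hypothetical, the per-CPU storage and irq handling from the patch are left out, and the constants simply mirror CYC2NS_SCALE_FACTOR and NSEC_PER_MSEC as they appear in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10      /* SC = 2^10, same value as in the patch */
#define NSEC_PER_MSEC 1000000ULL    /* 10^6 */

/*
 * Hypothetical helper: compute the scale factor for a given CPU
 * frequency in kHz, i.e. scale = 10^6 * SC / cpu_khz.
 */
static uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
    assert(cpu_khz != 0);           /* the patch skips the update when cpu_khz is 0 */
    return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/*
 * Hypothetical helper: ns = cycles * scale / SC, with the division
 * by the power-of-two SC done as a right shift.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With these helpers, 2000 cycles at 2 GHz convert to 1000 ns when the scale matches the current frequency, but to 2000 ns if the scale computed for 1 GHz is still in use. That stale-scale mismatch is, as far as the thread describes it, the kind of timestamp jump the per-CPU rescaling on cpufreq changes is meant to avoid.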
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 20:42 ` Ingo Molnar @ 2007-12-10 20:57 ` Guillaume Chazarain 2007-12-10 20:59 ` Andrew Morton 1 sibling, 0 replies; 74+ messages in thread From: Guillaume Chazarain @ 2007-12-10 20:57 UTC (permalink / raw) To: Ingo Molnar; +Cc: Rafael J. Wysocki, LKML, Andrew Morton, Linus Torvalds On Dec 10, 2007 9:42 PM, Ingo Molnar <mingo@elte.hu> wrote: > although some claimed effect was on udelay()/mdelay() too. Any specific report? The jumping sched_clock on frequency change caused some scheduling oddities for me, but CFS attenuated the effect. Thanks. -- Guillaume ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 20:42 ` Ingo Molnar 2007-12-10 20:57 ` Guillaume Chazarain @ 2007-12-10 20:59 ` Andrew Morton 2007-12-10 22:45 ` Ingo Molnar 1 sibling, 1 reply; 74+ messages in thread From: Andrew Morton @ 2007-12-10 20:59 UTC (permalink / raw) To: Ingo Molnar; +Cc: rjw, linux-kernel, torvalds On Mon, 10 Dec 2007 21:42:12 +0100 Ingo Molnar <mingo@elte.hu> wrote: > * Rafael J. Wysocki <rjw@sisk.pl> wrote: > > > Subject : jiffies counter leaps in 2.6.24-rc3 > > Submitter : Stefano Brivio <stefano.brivio@polimi.it> > > References : http://lkml.org/lkml/2007/11/24/53 > > http://bugzilla.kernel.org/show_bug.cgi?id=9475 > > Handled-By : Ingo Molnar <mingo@elte.hu> > > Patch : http://lkml.org/lkml/2007/12/7/132 > > Linus, Andrew, i need some help deciding what to do about this > regression. The fixes for this have been tested and resolve the > regression, but they change printk and other code that runs by default > and is thus rather invasive so late in the v2.6.24 cycle. This bug > should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug > option) - although some claimed effect was on udelay()/mdelay() too. > > i've attached below the queue of 5 patches that fix this problem. They > have been build and boot tested with more than 1000 random kernels in > the past few days, so i certainly trust the core and x86 bits of this. > > what do you think? Right now i've got them queued up for 2.6.25 in both > the scheduler-devel and the x86-devel git trees - but can submit them > for 2.6.24 if it's better if we did them there. I've got no strong > opinion either way. printk_clock() doesn't seem terribly important but what's this stuff about effects on udelay/mdelay? That can be serious if they're getting shortened. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 20:59 ` Andrew Morton @ 2007-12-10 22:45 ` Ingo Molnar 2007-12-10 23:04 ` Ingo Molnar 0 siblings, 1 reply; 74+ messages in thread From: Ingo Molnar @ 2007-12-10 22:45 UTC (permalink / raw) To: Andrew Morton; +Cc: rjw, linux-kernel, torvalds * Andrew Morton <akpm@linux-foundation.org> wrote: > > what do you think? Right now i've got them queued up for 2.6.25 in > > both the scheduler-devel and the x86-devel git trees - but can > > submit them for 2.6.24 if it's better if we did them there. I've got > > no strong opinion either way. > > printk_clock() doesn't seem terribly important but what's this stuff > about effects on udelay/mdelay? That can be serious if they're > getting shortened. since udelay depends on loops_per_jiffy, which is fixed up in time_cpufreq_notifier(), i don't see how it could be affected by frequency changes. (but that's the theory - practice might be different) Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
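The argument above rests on loops_per_jiffy being rescaled in proportion to the CPU frequency whenever the cpufreq notifier fires, which would keep a udelay() busy-loop at roughly the same wall-clock length. A minimal user-space sketch of that proportional rescaling follows. It is an illustration under assumptions, not the notifier's actual code: the function name scale_loops_per_jiffy() and its parameters are hypothetical, and the rounding behaviour of the kernel's own scaling helper is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rescale a reference loops_per_jiffy value,
 * calibrated at ref_khz, to a new frequency new_khz:
 *
 *     new_lpj = ref_lpj * new_khz / ref_khz
 *
 * The multiplication is done in 64 bits so it cannot overflow a
 * 32-bit unsigned long.
 */
static unsigned long scale_loops_per_jiffy(unsigned long ref_lpj,
                                           unsigned int ref_khz,
                                           unsigned int new_khz)
{
    assert(ref_khz != 0);
    return (unsigned long)(((uint64_t)ref_lpj * new_khz) / ref_khz);
}
```

For example, a value of 4000000 calibrated at 2 GHz becomes 2000000 at 1 GHz, so the delay loop spins half as many iterations on a CPU running at half the speed.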
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 22:45 ` Ingo Molnar @ 2007-12-10 23:04 ` Ingo Molnar 2007-12-10 23:34 ` Stefano Brivio 0 siblings, 1 reply; 74+ messages in thread From: Ingo Molnar @ 2007-12-10 23:04 UTC (permalink / raw) To: Andrew Morton Cc: rjw, linux-kernel, torvalds, Stefano Brivio, Guillaume Chazarain * Ingo Molnar <mingo@elte.hu> wrote: > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > what do you think? Right now i've got them queued up for 2.6.25 in > > > both the scheduler-devel and the x86-devel git trees - but can > > > submit them for 2.6.24 if it's better if we did them there. I've got > > > no strong opinion either way. > > > > printk_clock() doesn't seem terribly important but what's this stuff > > about effects on udelay/mdelay? That can be serious if they're > > getting shortened. > > since udelay depends on loops_per_jiffy, which is fixed up > time_cpufreq_notifier(), i dont see how it could be affected by > frequency changes. (but that's the theory - practice might be > different) Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. (and it caused driver failures for him.) Stefano, could you please try to sum up your experiences with that issue? Is it reproducible, and do the 5 patches i did fix it? (if yes, could you try to re-do the mdelay verifications perhaps, to make sure it's not some other effect interacting here. In theory sched-clock scaling has no effect on udelay behavior.) Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:04 ` Ingo Molnar @ 2007-12-10 23:34 ` Stefano Brivio 2007-12-10 23:53 ` Guillaume Chazarain ` (2 more replies) 0 siblings, 3 replies; 74+ messages in thread From: Stefano Brivio @ 2007-12-10 23:34 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain On Tue, 11 Dec 2007 00:04:25 +0100 Ingo Molnar <mingo@elte.hu> wrote: > > * Ingo Molnar <mingo@elte.hu> wrote: > > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > what do you think? Right now i've got them queued up for 2.6.25 in > > > > both the scheduler-devel and the x86-devel git trees - but can > > > > submit them for 2.6.24 if it's better if we did them there. I've got > > > > no strong opinion either way. > > > > > > printk_clock() doesn't seem terribly important but what's this stuff > > > about effects on udelay/mdelay? That can be serious if they're > > > getting shortened. > > > > since udelay depends on loops_per_jiffy, which is fixed up > > time_cpufreq_notifier(), i dont see how it could be affected by > > frequency changes. (but that's the theory - practice might be > > different) > > Stefano Brivio reported udelay()/mdelay() effects in the b43 driver. > (and it caused driver failures for him.) > > Stefano, could you please try to sum up your experiences with that > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, > could you try to re-do the mdelay verifications perhaps, to make sure > it's not some other effect interacting here. In theory sched-clock > scaling has no effect on udelay behavior.) Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in delays isn't that good when using my crappy unstable TSC (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not depending on frequency changes anymore. So I'd say it's fixed, but please tell me if you want me to do any other test so as to be sure it is. 
-- Ciao Stefano ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:34 ` Stefano Brivio @ 2007-12-10 23:53 ` Guillaume Chazarain 2007-12-11 8:48 ` Ingo Molnar 2007-12-10 23:56 ` Arjan van de Ven 2007-12-11 9:01 ` Ingo Molnar 2 siblings, 1 reply; 74+ messages in thread From: Guillaume Chazarain @ 2007-12-10 23:53 UTC (permalink / raw) To: Stefano Brivio; +Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds Stefano Brivio <stefano.brivio@polimi.it> wrote: > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in > delays isn't that good when using my crappy unstable TSC (mdelay(2000) > causes delays between 2 and 2.9 seconds) but it's not depending on frequency > changes anymore. So I'd say it's fixed, but please tell me if you want me > to do any other test so as to be sure it is. Ingo, it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() based udelay), so how can udelay be affected by your proposed changes? Thanks. -- Guillaume ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:53 ` Guillaume Chazarain @ 2007-12-11 8:48 ` Ingo Molnar 0 siblings, 0 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-11 8:48 UTC (permalink / raw) To: Guillaume Chazarain Cc: Stefano Brivio, Andrew Morton, rjw, linux-kernel, torvalds * Guillaume Chazarain <guichaz@yahoo.fr> wrote: > Stefano Brivio <stefano.brivio@polimi.it> wrote: > > > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in > > delays isn't that good when using my crappy unstable TSC (mdelay(2000) > > causes delays between 2 and 2.9 seconds) but it's not depending on frequency > > changes anymore. So I'd say it's fixed, but please tell me if you want me > > to do any other test so as to be sure it is. > > Ingo, > > it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() > based udelay), so how udelay can be affected by your proposed changes? was this needed for you to get stable udelay()? (that cpu_clock() based udelay patch was buggy, i got the units wrong. udelay does wacky conversions between various units. So i dropped it for the time being.) the last rollup you tested didnt show udelay problems, and it didnt include the sched_clock() based udelay patch. so it would be nice if you could re-examine exactly what is needed. Please try latest -git and the concatenation of the 4 patches below. What would be the best info is to see which (if any!) patches are needed against latest -git to get a stable udelay() on your box. Ingo ---------------------------------------> * Rafael J. Wysocki <rjw@sisk.pl> wrote: > Subject : jiffies counter leaps in 2.6.24-rc3 > Submitter : Stefano Brivio <stefano.brivio@polimi.it> > References : http://lkml.org/lkml/2007/11/24/53 > http://bugzilla.kernel.org/show_bug.cgi?id=9475 > Handled-By : Ingo Molnar <mingo@elte.hu> > Patch : http://lkml.org/lkml/2007/12/7/132 Linus, Andrew, i need some help deciding what to do about this regression. 
The fixes for this have been tested and resolve the regression, but they change printk and other code that runs by default and is thus rather invasive so late in the v2.6.24 cycle. This bug should only affect CONFIG_PRINTK_TIME=y kernels (a non-default debug option) - although some claimed effect was on udelay()/mdelay() too. i've attached below the queue of 5 patches that fix this problem. They have been build and boot tested with more than 1000 random kernels in the past few days, so i certainly trust the core and x86 bits of this. what do you think? Right now i've got them queued up for 2.6.25 in both the scheduler-devel and the x86-devel git trees - but can submit them for 2.6.24 if it's better if we did them there. I've got no strong opinion either way. Ingo --------------------> Subject: x86: scale cyc_2_nsec according to CPU frequency From: "Guillaume Chazarain" <guichaz@yahoo.fr> scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/kernel/tsc_32.c | 43 ++++++++++++++++++++++++++++++----- arch/x86/kernel/tsc_64.c | 57 ++++++++++++++++++++++++++++++++++++++--------- include/asm-x86/timer.h | 23 ++++++++++++++---- 3 files changed, 102 insertions(+), 21 deletions(-) Index: linux/arch/x86/kernel/tsc_32.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_32.c +++ linux/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include <linux/jiffies.h> #include <linux/init.h> #include <linux/dmi.h> +#include <linux/percpu.h> #include <asm/delay.h> #include <asm/tsc.h> @@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * * -johnstul@us.ibm.com "math is hard, lets go shopping!" 
*/ -unsigned long cyc2ns_scale __read_mostly; -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz; + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; + + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + /* + * Start smoothly with the new frequency: + */ + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } /* @@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - set_cyc2ns_scale(cpu_khz); + preempt_disable(); + set_cyc2ns_scale(cpu_khz, smp_processor_id()); + preempt_enable(); /* * TSC based sched_clock turns * to junk w/ cpufreq @@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia void __init tsc_init(void) { + int cpu; + if (!cpu_has_tsc || tsc_disable) goto out_no_tsc; @@ -380,7 +403,15 @@ void __init tsc_init(void) (unsigned long)cpu_khz / 1000, (unsigned long)cpu_khz % 1000); - set_cyc2ns_scale(cpu_khz); + /* + * Secondary CPUs do not run through tsc_init(), so set up + * all the scale factors for all CPUs, assuming the same + * speed as the bootup CPU. 
(cpufreq notifiers will fix this + * up if their speed diverges) + */ + for_each_possible_cpu(cpu) + set_cyc2ns_scale(cpu_khz, cpu); + use_tsc_delay(); /* Check and install the TSC clocksource */ Index: linux/arch/x86/kernel/tsc_64.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_64.c +++ linux/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include <asm/hpet.h> #include <asm/timex.h> +#include <asm/timer.h> static int notsc __initdata = 0; @@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +/* Accelerators for sched_clock() + * convert from cycles(64bits) => nanoseconds (64bits) + * basic equation: + * ns = cycles / (freq / ns_per_sec) + * ns = cycles * (ns_per_sec / freq) + * ns = cycles * (10^9 / (cpu_khz * 10^3)) + * ns = cycles * (10^6 / cpu_khz) + * + * Then we use scaling math (suggested by george@mvista.com) to get: + * ns = cycles * (10^6 * SC / cpu_khz) / SC + * ns = cycles * cyc2ns_scale / SC + * + * And since SC is a constant power of two, we can convert the div + * into a shift. + * + * We can use khz divisor instead of mhz to keep a better precision, since + * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. + * (mathieu.desnoyers@polymtl.ca) + * + * -johnstul@us.ibm.com "math is hard, lets go shopping!" 
+ */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz; -} + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; -static unsigned long long cycles_2_ns(unsigned long long cyc) -{ - return (cyc * cyc2ns_scale) >> NS_SCALE; + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } unsigned long long sched_clock(void) @@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct mark_tsc_unstable("cpufreq changes"); } - set_cyc2ns_scale(tsc_khz_ref); + preempt_disable(); + set_cyc2ns_scale(tsc_khz_ref, smp_processor_id()); + preempt_enable(); return 0; } @@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref void __init tsc_calibrate(void) { unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2; - int hpet = is_hpet_enabled(); + int hpet = is_hpet_enabled(), cpu; local_irq_save(flags); @@ -206,7 +241,9 @@ void __init tsc_calibrate(void) } tsc_khz = tsc2 / tsc1; - set_cyc2ns_scale(tsc_khz); + + for_each_possible_cpu(cpu) + set_cyc2ns_scale(tsc_khz, cpu); } /* Index: linux/include/asm-x86/timer.h =================================================================== --- linux.orig/include/asm-x86/timer.h +++ linux/include/asm-x86/timer.h @@ -2,6 +2,7 @@ #define _ASMi386_TIMER_H #include <linux/init.h> #include <linux/pm.h> +#include <linux/percpu.h> #define TICK_SIZE (tick_nsec / 1000) @@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void); #define calculate_cpu_khz() native_calculate_cpu_khz() #endif -/* Accellerators for sched_clock() +/* Accelerators for sched_clock() * convert from cycles(64bits) => nanoseconds 
(64bits) * basic equation: * ns = cycles / (freq / ns_per_sec) @@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void); * And since SC is a constant power of two, we can convert the div * into a shift. * - * We can use khz divisor instead of mhz to keep a better percision, since + * We can use khz divisor instead of mhz to keep a better precision, since * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * (mathieu.desnoyers@polymtl.ca) * * -johnstul@us.ibm.com "math is hard, lets go shopping!" */ -extern unsigned long cyc2ns_scale __read_mostly; + +DECLARE_PER_CPU(unsigned long, cyc2ns); #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline unsigned long long cycles_2_ns(unsigned long long cyc) +static inline unsigned long long __cycles_2_ns(unsigned long long cyc) { - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; + return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR; } +static inline unsigned long long cycles_2_ns(unsigned long long cyc) +{ + unsigned long long ns; + unsigned long flags; + + local_irq_save(flags); + ns = __cycles_2_ns(cyc); + local_irq_restore(flags); + + return ns; +} #endif ---------------> Subject: x86: idle wakeup event in the HLT loop From: Ingo Molnar <mingo@elte.hu> do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (the ACPI idle code already does this.) [ update the 64-bit side too, as noticed by Jiri Slaby. 
] Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/kernel/process_32.c | 15 ++++++++++++--- arch/x86/kernel/process_64.c | 13 ++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-) Index: linux-x86.q/arch/x86/kernel/process_32.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_32.c +++ linux-x86.q/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt(); /* enables interrupts racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux-x86.q/arch/x86/kernel/process_64.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_64.c +++ linux-x86.q/arch/x86/kernel/process_64.c @@ -116,9 +116,16 @@ static void default_idle(void) smp_mb(); local_irq_disable(); if (!need_resched()) { - /* Enables interrupts one instruction before HLT. - x86 special cases this so there is no race. */ - safe_halt(); + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); + safe_halt(); /* enables interrupts racelessly */ + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); } else local_irq_enable(); current_thread_info()->status |= TS_POLLING; ---------------> Subject: printk: make printk more robust by not allowing recursion From: Ingo Molnar <mingo@elte.hu> make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. 
I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk-debug logic with irqs disabled. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, . /* cpu currently holding logbuf_lock */ static volatile unsigned int printk_cpu = UINT_MAX; +const char printk_recursion_bug_msg [] = + KERN_CRIT "BUG: recent printk recursion!\n"; +static int printk_recursion_bug; + asmlinkage int vprintk(const char *fmt, va_list args) { + static int log_level_unknown = 1; + static char printk_buf[1024]; + unsigned long flags; - int printed_len; + int printed_len = 0; + int this_cpu; char *p; - static char printk_buf[1024]; - static int log_level_unknown = 1; boot_delay_msec(); preempt_disable(); - if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id()) - /* If a crash is occurring during printk() on this CPU, - * make sure we can't deadlock */ - zap_locks(); - /* This stops the holder of console_sem just where we want him */ raw_local_irq_save(flags); + this_cpu = smp_processor_id(); + + /* + * Ouch, printk recursed into itself! + */ + if (unlikely(printk_cpu == this_cpu)) { + /* + * If a crash is occurring during printk() on this CPU, + * then try to get the crash message out but make sure + * we can't deadlock. 
Otherwise just return to avoid the + * recursion and return - but flag the recursion so that + * it can be printed at the next appropriate moment: + */ + if (!oops_in_progress) { + printk_recursion_bug = 1; + goto out_restore_irqs; + } + zap_locks(); + } + lockdep_off(); spin_lock(&logbuf_lock); - printk_cpu = smp_processor_id(); + printk_cpu = this_cpu; + if (printk_recursion_bug) { + printk_recursion_bug = 0; + strcpy(printk_buf, printk_recursion_bug_msg); + printed_len = sizeof(printk_recursion_bug_msg); + } /* Emit the output into the temporary buffer */ - printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args); + printed_len += vscnprintf(printk_buf + printed_len, + sizeof(printk_buf), fmt, args); /* * Copy the output into log_buf. If the caller didn't provide @@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, printk_cpu = UINT_MAX; spin_unlock(&logbuf_lock); lockdep_on(); +out_restore_irqs: raw_local_irq_restore(flags); } ---------------> Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock() From: Ingo Molnar <mingo@elte.hu> Stefano Brivio reported weird printk timestamp behavior during CPU frequency changes: http://bugzilla.kernel.org/show_bug.cgi?id=9475 fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock() instead. Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, loglev_char = default_message_loglevel + '0'; } - t = printk_clock(); + t = cpu_clock(printk_cpu); nanosec_rem = do_div(t, 1000000000); tlen = sprintf(tbuf, "<%c>[%5lu.%06lu] ", ^ permalink raw reply [flat|nested] 74+ messages in thread
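The fixed-point math described in the "Accelerators for sched_clock()" comment of the first patch above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: the helper names cyc2ns_scale_for() and cycles_to_ns() are hypothetical, the per-CPU storage and irq handling of the patch are left out, and only CYC2NS_SCALE_FACTOR and NSEC_PER_MSEC mirror names used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10          /* SC = 2^10, as in the patch */
#define NSEC_PER_MSEC       1000000ULL  /* 10^6 */

/*
 * scale = (10^6 * SC) / cpu_khz
 * Hypothetical helper: the caller must pass a non-zero frequency
 * (the patch itself simply skips the update when cpu_khz is 0).
 */
uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
    assert(cpu_khz != 0);
    return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* ns = cycles * scale / SC, with the division done as a shift */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With a 1 GHz clock (cpu_khz = 1000000) the scale is 1024, so 1000 cycles convert to 1000 ns. If the CPU drops to 1 GHz while the scale still holds the 2 GHz value (512), the same 1000 cycles would be reported as 500 ns, which is the kind of error rescaling on frequency changes is meant to avoid.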
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:34 ` Stefano Brivio 2007-12-10 23:53 ` Guillaume Chazarain @ 2007-12-10 23:56 ` Arjan van de Ven 2007-12-11 0:01 ` Guillaume Chazarain 2007-12-11 9:01 ` Ingo Molnar 2 siblings, 1 reply; 74+ messages in thread From: Arjan van de Ven @ 2007-12-10 23:56 UTC (permalink / raw) To: Stefano Brivio Cc: Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain On Tue, 11 Dec 2007 00:34:33 +0100 Stefano Brivio <stefano.brivio@polimi.it> wrote: > On Tue, 11 Dec 2007 00:04:25 +0100 > Ingo Molnar <mingo@elte.hu> wrote: > > > > * Ingo Molnar <mingo@elte.hu> wrote: > > > > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > what do you think? Right now i've got them queued up for > > > > > 2.6.25 in both the scheduler-devel and the x86-devel git > > > > > trees - but can submit them for 2.6.24 if it's better if we > > > > > did them there. I've got no strong opinion either way. > > > > > > > > printk_clock() doesn't seem terribly important but what's this > > > > stuff about effects on udelay/mdelay? That can be serious if > > > > they're getting shortened. > > > > > > since udelay depends on loops_per_jiffy, which is fixed up > > > time_cpufreq_notifier(), i dont see how it could be affected by > > > frequency changes. (but that's the theory - practice might be > > > different) > > > > Stefano Brivio reported udelay()/mdelay() effects in the b43 > > driver. (and it caused driver failures for him.) > > > > Stefano, could you please try to sum up your experiences with that > > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, > > could you try to re-do the mdelay verifications perhaps, to make > > sure it's not some other effect interacting here. In theory > > sched-clock scaling has no effect on udelay behavior.) > > Sorry for disappearing. Anyway, yes, those patches fixed it. 
> Precision in delays isn't that good when using my crappy unstable TSC > (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not > depending on frequency changes anymore. So I'd say it's fixed, but > please tell me if you want me to do any other test so as to be sure > it is. > > I'm still quite concerned about this in dual/quad core scenarios; the frequency of both cores is the maximum of what linux sets each core to; this means that if you're THIS sensitive to that there still is quite a nasty issue there. I wonder if the various delay functions (maybe only in .25) should use the maximum observed loops_per_jiffie instead always (across cpus) to be super safe here. -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 74+ messages in thread
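The suggestion above, using the maximum observed loops_per_jiffy across CPUs so that delays can only come out too long and never too short, might look roughly like the following user-space sketch. It is an assumption about what such a change could do, not code from any posted patch: max_loops_per_jiffy() is a hypothetical helper and the array stands in for the per-CPU values the kernel keeps.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the largest loops_per_jiffy value seen
 * across all CPUs. A delay loop calibrated with this value spins at
 * least as long as requested on every CPU, at the cost of waiting
 * too long on the slower ones.
 */
unsigned long max_loops_per_jiffy(const unsigned long *lpj, size_t ncpus)
{
    unsigned long max = 0;
    size_t i;

    assert(lpj != NULL || ncpus == 0);

    for (i = 0; i < ncpus; i++) {
        if (lpj[i] > max)
            max = lpj[i];
    }
    return max;
}
```

The trade-off is deliberate: overshooting a delay is usually harmless for a driver, while undershooting it can break hardware timing requirements.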
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:56 ` Arjan van de Ven @ 2007-12-11 0:01 ` Guillaume Chazarain 2007-12-11 1:06 ` Arjan van de Ven 0 siblings, 1 reply; 74+ messages in thread From: Guillaume Chazarain @ 2007-12-11 0:01 UTC (permalink / raw) To: Arjan van de Ven Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds Arjan van de Ven <arjan@infradead.org> wrote: > the frequency of both cores is the maximum of what linux sets each core to; Do you mean that the cpufreq code can be confused about the actual frequency of the cores? That sounds like a big problem. Thanks for any insight. -- Guillaume ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-11 0:01 ` Guillaume Chazarain @ 2007-12-11 1:06 ` Arjan van de Ven 2007-12-11 8:43 ` Ingo Molnar 0 siblings, 1 reply; 74+ messages in thread From: Arjan van de Ven @ 2007-12-11 1:06 UTC (permalink / raw) To: Guillaume Chazarain Cc: Stefano Brivio, Ingo Molnar, Andrew Morton, rjw, linux-kernel, torvalds On Tue, 11 Dec 2007 01:01:25 +0100 Guillaume Chazarain <guichaz@yahoo.fr> wrote: > Arjan van de Ven <arjan@infradead.org> wrote: > > > the frequency of both cores is the maximum of what linux sets each > > core to; > > Do you mean that the cpufreq code can be confused about the actual > frequency of the cores? it means that cpufreq doesn't know the actual frequency (although bios sometimes tells us about the relationship, often the bios just lies through its teeth); it only knows what it asks for, not what it gets. We know it'll get at least what it asks for, but it can get more than it asks for basically. >That sounds like a big problem. it'll get way worse going forward. (but even on today's systems, the tsc no longer represents frequency, but is some fixed clock totally unrelated to cpu frequency) -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-11 1:06 ` Arjan van de Ven @ 2007-12-11 8:43 ` Ingo Molnar 0 siblings, 0 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-11 8:43 UTC (permalink / raw) To: Arjan van de Ven Cc: Guillaume Chazarain, Stefano Brivio, Andrew Morton, rjw, linux-kernel, torvalds * Arjan van de Ven <arjan@infradead.org> wrote: > > That sounds like a big problem. > > it'll get way worse going forward. (but even on todays systems, the > tsc no longer represents frequency, but is some fixed clock totally > unrelated to cpu frequency) X86_FEATURE_CONSTANT_TSC CPUs (all modern Intel CPUs) should be fine - we dont do any TSC frequency fixups for them. The loops_per_jiffy fixup looks like this: if (!(freq->flags & CPUFREQ_CONST_LOOPS)) cpu_data(freq->cpu).loops_per_jiffy = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new); i.e. X86_FEATURE_CONSTANT_TSC excluded. The sched_clock() scaling factor is modified like this: if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; preempt_disable(); set_cyc2ns_scale(cpu_khz, smp_processor_id()); so here X86_FEATURE_CONSTANT_TSC is excluded again. So the whole frequency scaling issue will become a pure legacy issue only with time. Ingo ^ permalink raw reply [flat|nested] 74+ messages in thread
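The loops_per_jiffy fixup quoted above rescales a reference value by the ratio of the new frequency to the reference frequency. A minimal user-space sketch of that proportional scaling follows; scale_loops_per_jiffy() is a hypothetical stand-in written for illustration, and it is an assumption that the kernel's cpufreq_scale() performs an equivalent old * new / ref computation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the scaling step:
 *   lpj_new = lpj_ref * new_khz / ref_khz
 * The multiplication is done in 64 bits so it cannot overflow on
 * systems where unsigned long is 32 bits wide.
 */
unsigned long scale_loops_per_jiffy(unsigned long lpj_ref,
                                    unsigned int ref_khz,
                                    unsigned int new_khz)
{
    assert(ref_khz != 0);
    return (unsigned long)(((uint64_t)lpj_ref * new_khz) / ref_khz);
}
```

For example, a reference of 4000000 loops per jiffy calibrated at 2 GHz becomes 2000000 when the CPU is switched to 1 GHz, so a loop-counting udelay() keeps roughly the same wall-clock duration.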
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23 2007-12-10 23:34 ` Stefano Brivio 2007-12-10 23:53 ` Guillaume Chazarain 2007-12-10 23:56 ` Arjan van de Ven @ 2007-12-11 9:01 ` Ingo Molnar 2007-12-11 21:10 ` Stefano Brivio 2007-12-19 0:58 ` Stefano Brivio 2 siblings, 2 replies; 74+ messages in thread From: Ingo Molnar @ 2007-12-11 9:01 UTC (permalink / raw) To: Stefano Brivio Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain [-- Attachment #1: Type: text/plain, Size: 1791 bytes --] * Stefano Brivio <stefano.brivio@polimi.it> wrote: > > Stefano, could you please try to sum up your experiences with that > > issue? Is it reproducable, and the 5 patches i did fix it? (if yes, > > could you try to re-do the mdelay verifications perhaps, to make > > sure it's not some other effect interacting here. In theory > > sched-clock scaling has no effect on udelay behavior.) > > Sorry for disappearing. Anyway, yes, those patches fixed it. Precision > in delays isn't that good when using my crappy unstable TSC > (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not > depending on frequency changes anymore. So I'd say it's fixed, but > please tell me if you want me to do any other test so as to be sure it > is. ok, just to make sure we are all synced up. I made 8 patches related to this problem category (and all the trickle effects). 3 are upstream already, 5 are pending for v2.6.25. One out of those 5 is an immaterial cleanup patch - which leaves us 4 patches to sort out. So i'd suggest for you to try latest -git - that will tell us whether udelay() is acceptable on your box right now. i've attached those 4 patches: x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch x86-cpu-clock-idle-event.patch sched-printk-recursion-fix.patch sched-printk-clock-fix.patch none of them is _supposed_ to have any effect on udelay(), but the interactions in this area are weird. 
[ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so use some other time metric for determining mdelay quality. ] plus then there's this patch: http://lkml.org/lkml/2007/12/7/100 is it perhaps this one that fixed udelay for you? [ which would be much more expected, as this patch changes udelay ;-) ] Ingo [-- Attachment #2: x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch --] [-- Type: text/plain, Size: 7410 bytes --] Subject: x86: scale cyc_2_nsec according to CPU frequency From: "Guillaume Chazarain" <guichaz@yahoo.fr> scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/kernel/tsc_32.c | 43 ++++++++++++++++++++++++++++++----- arch/x86/kernel/tsc_64.c | 57 ++++++++++++++++++++++++++++++++++++++--------- include/asm-x86/timer.h | 23 ++++++++++++++---- 3 files changed, 102 insertions(+), 21 deletions(-) Index: linux/arch/x86/kernel/tsc_32.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_32.c +++ linux/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include <linux/jiffies.h> #include <linux/init.h> #include <linux/dmi.h> +#include <linux/percpu.h> #include <asm/delay.h> #include <asm/tsc.h> @@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * * -johnstul@us.ibm.com "math is hard, lets go shopping!" 
*/ -unsigned long cyc2ns_scale __read_mostly; -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz; + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; + + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + /* + * Start smoothly with the new frequency: + */ + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } /* @@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_bl ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - set_cyc2ns_scale(cpu_khz); + preempt_disable(); + set_cyc2ns_scale(cpu_khz, smp_processor_id()); + preempt_enable(); /* * TSC based sched_clock turns * to junk w/ cpufreq @@ -367,6 +388,8 @@ static inline void check_geode_tsc_relia void __init tsc_init(void) { + int cpu; + if (!cpu_has_tsc || tsc_disable) goto out_no_tsc; @@ -380,7 +403,15 @@ void __init tsc_init(void) (unsigned long)cpu_khz / 1000, (unsigned long)cpu_khz % 1000); - set_cyc2ns_scale(cpu_khz); + /* + * Secondary CPUs do not run through tsc_init(), so set up + * all the scale factors for all CPUs, assuming the same + * speed as the bootup CPU. 
(cpufreq notifiers will fix this + * up if their speed diverges) + */ + for_each_possible_cpu(cpu) + set_cyc2ns_scale(cpu_khz, cpu); + use_tsc_delay(); /* Check and install the TSC clocksource */ Index: linux/arch/x86/kernel/tsc_64.c =================================================================== --- linux.orig/arch/x86/kernel/tsc_64.c +++ linux/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include <asm/hpet.h> #include <asm/timex.h> +#include <asm/timer.h> static int notsc __initdata = 0; @@ -18,16 +19,48 @@ EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +/* Accelerators for sched_clock() + * convert from cycles(64bits) => nanoseconds (64bits) + * basic equation: + * ns = cycles / (freq / ns_per_sec) + * ns = cycles * (ns_per_sec / freq) + * ns = cycles * (10^9 / (cpu_khz * 10^3)) + * ns = cycles * (10^6 / cpu_khz) + * + * Then we use scaling math (suggested by george@mvista.com) to get: + * ns = cycles * (10^6 * SC / cpu_khz) / SC + * ns = cycles * cyc2ns_scale / SC + * + * And since SC is a constant power of two, we can convert the div + * into a shift. + * + * We can use khz divisor instead of mhz to keep a better precision, since + * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. + * (mathieu.desnoyers@polymtl.ca) + * + * -johnstul@us.ibm.com "math is hard, lets go shopping!" 
+ */ +DEFINE_PER_CPU(unsigned long, cyc2ns); -static inline void set_cyc2ns_scale(unsigned long khz) +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) { - cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz; -} + unsigned long flags, prev_scale, *scale; + unsigned long long tsc_now, ns_now; -static unsigned long long cycles_2_ns(unsigned long long cyc) -{ - return (cyc * cyc2ns_scale) >> NS_SCALE; + local_irq_save(flags); + sched_clock_idle_sleep_event(); + + scale = &per_cpu(cyc2ns, cpu); + + rdtscll(tsc_now); + ns_now = __cycles_2_ns(tsc_now); + + prev_scale = *scale; + if (cpu_khz) + *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + + sched_clock_idle_wakeup_event(0); + local_irq_restore(flags); } unsigned long long sched_clock(void) @@ -100,7 +133,9 @@ static int time_cpufreq_notifier(struct mark_tsc_unstable("cpufreq changes"); } - set_cyc2ns_scale(tsc_khz_ref); + preempt_disable(); + set_cyc2ns_scale(tsc_khz_ref, smp_processor_id()); + preempt_enable(); return 0; } @@ -151,7 +186,7 @@ static unsigned long __init tsc_read_ref void __init tsc_calibrate(void) { unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2; - int hpet = is_hpet_enabled(); + int hpet = is_hpet_enabled(), cpu; local_irq_save(flags); @@ -206,7 +241,9 @@ void __init tsc_calibrate(void) } tsc_khz = tsc2 / tsc1; - set_cyc2ns_scale(tsc_khz); + + for_each_possible_cpu(cpu) + set_cyc2ns_scale(tsc_khz, cpu); } /* Index: linux/include/asm-x86/timer.h =================================================================== --- linux.orig/include/asm-x86/timer.h +++ linux/include/asm-x86/timer.h @@ -2,6 +2,7 @@ #define _ASMi386_TIMER_H #include <linux/init.h> #include <linux/pm.h> +#include <linux/percpu.h> #define TICK_SIZE (tick_nsec / 1000) @@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void); #define calculate_cpu_khz() native_calculate_cpu_khz() #endif -/* Accellerators for sched_clock() +/* Accelerators for sched_clock() * convert from cycles(64bits) => nanoseconds 
(64bits) * basic equation: * ns = cycles / (freq / ns_per_sec) @@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void); * And since SC is a constant power of two, we can convert the div * into a shift. * - * We can use khz divisor instead of mhz to keep a better percision, since + * We can use khz divisor instead of mhz to keep a better precision, since * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * (mathieu.desnoyers@polymtl.ca) * * -johnstul@us.ibm.com "math is hard, lets go shopping!" */ -extern unsigned long cyc2ns_scale __read_mostly; + +DECLARE_PER_CPU(unsigned long, cyc2ns); #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline unsigned long long cycles_2_ns(unsigned long long cyc) +static inline unsigned long long __cycles_2_ns(unsigned long long cyc) { - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; + return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR; } +static inline unsigned long long cycles_2_ns(unsigned long long cyc) +{ + unsigned long long ns; + unsigned long flags; + + local_irq_save(flags); + ns = __cycles_2_ns(cyc); + local_irq_restore(flags); + + return ns; +} #endif [-- Attachment #3: x86-cpu-clock-idle-event.patch --] [-- Type: text/plain, Size: 2055 bytes --] Subject: x86: idle wakeup event in the HLT loop From: Ingo Molnar <mingo@elte.hu> do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (the ACPI idle code already does this.) [ update the 64-bit side too, as noticed by Jiri Slaby. 
] Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/kernel/process_32.c | 15 ++++++++++++--- arch/x86/kernel/process_64.c | 13 ++++++++++--- 2 files changed, 22 insertions(+), 6 deletions(-) Index: linux-x86.q/arch/x86/kernel/process_32.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_32.c +++ linux-x86.q/arch/x86/kernel/process_32.c @@ -113,10 +113,19 @@ void default_idle(void) smp_mb(); local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); safe_halt(); /* enables interrupts racelessly */ - else - local_irq_enable(); + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); + } + local_irq_enable(); current_thread_info()->status |= TS_POLLING; } else { /* loop is done by the caller */ Index: linux-x86.q/arch/x86/kernel/process_64.c =================================================================== --- linux-x86.q.orig/arch/x86/kernel/process_64.c +++ linux-x86.q/arch/x86/kernel/process_64.c @@ -116,9 +116,16 @@ static void default_idle(void) smp_mb(); local_irq_disable(); if (!need_resched()) { - /* Enables interrupts one instruction before HLT. - x86 special cases this so there is no race. */ - safe_halt(); + ktime_t t0, t1; + u64 t0n, t1n; + + t0 = ktime_get(); + t0n = ktime_to_ns(t0); + safe_halt(); /* enables interrupts racelessly */ + local_irq_disable(); + t1 = ktime_get(); + t1n = ktime_to_ns(t1); + sched_clock_idle_wakeup_event(t1n - t0n); } else local_irq_enable(); current_thread_info()->status |= TS_POLLING; [-- Attachment #4: sched-printk-recursion-fix.patch --] [-- Type: text/plain, Size: 3181 bytes --] Subject: printk: make printk more robust by not allowing recursion From: Ingo Molnar <mingo@elte.hu> make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. 
I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk-debug logic with irqs disabled. Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, . /* cpu currently holding logbuf_lock */ static volatile unsigned int printk_cpu = UINT_MAX; +const char printk_recursion_bug_msg [] = + KERN_CRIT "BUG: recent printk recursion!\n"; +static int printk_recursion_bug; + asmlinkage int vprintk(const char *fmt, va_list args) { + static int log_level_unknown = 1; + static char printk_buf[1024]; + unsigned long flags; - int printed_len; + int printed_len = 0; + int this_cpu; char *p; - static char printk_buf[1024]; - static int log_level_unknown = 1; boot_delay_msec(); preempt_disable(); - if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id()) - /* If a crash is occurring during printk() on this CPU, - * make sure we can't deadlock */ - zap_locks(); - /* This stops the holder of console_sem just where we want him */ raw_local_irq_save(flags); + this_cpu = smp_processor_id(); + + /* + * Ouch, printk recursed into itself! + */ + if (unlikely(printk_cpu == this_cpu)) { + /* + * If a crash is occurring during printk() on this CPU, + * then try to get the crash message out but make sure + * we can't deadlock. 
Otherwise just return to avoid the + * recursion and return - but flag the recursion so that + * it can be printed at the next appropriate moment: + */ + if (!oops_in_progress) { + printk_recursion_bug = 1; + goto out_restore_irqs; + } + zap_locks(); + } + lockdep_off(); spin_lock(&logbuf_lock); - printk_cpu = smp_processor_id(); + printk_cpu = this_cpu; + if (printk_recursion_bug) { + printk_recursion_bug = 0; + strcpy(printk_buf, printk_recursion_bug_msg); + printed_len = sizeof(printk_recursion_bug_msg); + } /* Emit the output into the temporary buffer */ - printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args); + printed_len += vscnprintf(printk_buf + printed_len, + sizeof(printk_buf), fmt, args); /* * Copy the output into log_buf. If the caller didn't provide @@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt, printk_cpu = UINT_MAX; spin_unlock(&logbuf_lock); lockdep_on(); +out_restore_irqs: raw_local_irq_restore(flags); } [-- Attachment #5: sched-printk-clock-fix.patch --] [-- Type: text/plain, Size: 943 bytes --] Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock() From: Ingo Molnar <mingo@elte.hu> Stefano Brivio reported weird printk timestamp behavior during CPU frequency changes: http://bugzilla.kernel.org/show_bug.cgi?id=9475 fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock() instead. 
Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/printk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c +++ linux/kernel/printk.c @@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt, loglev_char = default_message_loglevel + '0'; } - t = printk_clock(); + t = cpu_clock(printk_cpu); nanosec_rem = do_div(t, 1000000000); tlen = sprintf(tbuf, "<%c>[%5lu.%06lu] ", ^ permalink raw reply [flat|nested] 74+ messages in thread
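The scaling math described in the timer.h comment of the first patch (ns = cycles * cyc2ns_scale / SC, with SC = 2^10) can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: the helper names cyc2ns_scale_for() and cycles_to_ns() are made up for illustration, a plain function argument stands in for the per-CPU cyc2ns variable, and interrupt and preemption handling are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10          /* SC = 2^10, same value as in the patch */
#define NSEC_PER_MSEC       1000000ULL  /* 10^6 */

/*
 * Hypothetical helper: compute the scale for a CPU running at cpu_khz.
 *   scale = 10^6 * SC / cpu_khz
 * The patch skips the update when cpu_khz is 0; this sketch asserts instead.
 */
static uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
	assert(cpu_khz != 0);
	return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/*
 * Hypothetical helper: convert TSC cycles to nanoseconds.
 *   ns = cycles * scale / SC, with the division done as a right shift.
 * The 64-bit product wraps for very large cycle counts, and the integer
 * division in cyc2ns_scale_for() truncates, so the result is approximate.
 */
static uint64_t cycles_to_ns(uint64_t cyc, uint64_t scale)
{
	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}
```

With an assumed 2 GHz CPU (cpu_khz = 2000000) the scale comes out as 512, and 2000000000 cycles convert to exactly 1000000000 ns. At 1.5 GHz the scale truncates from 682.67 to 682, so one second's worth of cycles converts to roughly 0.999 s. That is the precision trade-off the "khz divisor instead of mhz" remark in the comment is about.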
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01 ` Ingo Molnar
@ 2007-12-11 21:10   ` Stefano Brivio
  2007-12-19  0:58   ` Stefano Brivio
  1 sibling, 0 replies; 74+ messages in thread
From: Stefano Brivio @ 2007-12-11 21:10 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to
> this problem category (and all the trickle effects). 3 are upstream
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial
> cleanup patch - which leaves us 4 patches to sort out.
>
> So i'd suggest for you to try latest -git - that will tell us whether
> udelay() is acceptable on your box right now.

Yes, it is (msleep(2000), as said, gives delays between 2 and 2.9s on my
box, and drivers are happy). The commit which fixed this (it seems) is
fa2dd441df28b9fdfc68f84ae66f1b507cfff0e4. I'll bisect and tell you more in
the next days.

> i've attached those 4 patches:
>
>   x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>   x86-cpu-clock-idle-event.patch
>   sched-printk-recursion-fix.patch
>   sched-printk-clock-fix.patch
>
> none of them is _supposed_ to have any effect on udelay(), but the
> interactions in this area are weird.

No effects here IIRC.

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so
>   use some other time metric for determining mdelay quality. ]
>
> plus then there's this patch:
>
>   http://lkml.org/lkml/2007/12/7/100
>
> is it perhaps this one that fixed udelay for you? [ which would be much
> more expected, as this patch changes udelay ;-) ]

Will try it ASAP, again, in the next few days anyway.

--
Ciao
Stefano
* Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
  2007-12-11  9:01 ` Ingo Molnar
  2007-12-11 21:10   ` Stefano Brivio
@ 2007-12-19  0:58   ` Stefano Brivio
  1 sibling, 0 replies; 74+ messages in thread
From: Stefano Brivio @ 2007-12-19 0:58 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, rjw, linux-kernel, torvalds, Guillaume Chazarain

On Tue, 11 Dec 2007 10:01:20 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> ok, just to make sure we are all synced up. I made 8 patches related to
> this problem category (and all the trickle effects). 3 are upstream
> already, 5 are pending for v2.6.25. One out of those 5 is an immaterial
> cleanup patch - which leaves us 4 patches to sort out.
>
> So i'd suggest for you to try latest -git - that will tell us whether
> udelay() is acceptable on your box right now.
>
> i've attached those 4 patches:
>
>   x86-sched_clock-re-scheduler-fix-x86-regression-in-native-sched-clock.patch
>   x86-cpu-clock-idle-event.patch
>   sched-printk-recursion-fix.patch
>   sched-printk-clock-fix.patch
>
> none of them is _supposed_ to have any effect on udelay(), but the
> interactions in this area are weird.

Exactly, none of them have any effect on udelay().

> [ note: CONFIG_PRINTK_TIME will be broken and only fixed in v2.6.25, so
>   use some other time metric for determining mdelay quality. ]
>
> plus then there's this patch:
>
>   http://lkml.org/lkml/2007/12/7/100
>
> is it perhaps this one that fixed udelay for you? [ which would be much
> more expected, as this patch changes udelay ;-) ]

Yes, this one did. mdelay(2000) still gives delays between 2 and 2.9s,
which is acceptable. I have marked the regression as CODE_FIX.

--
Ciao
Stefano