* 4.1-rc6: ATA link is slow to respond, please be patient
@ 2015-08-07 10:29 Christian Kujau
  2015-08-07 12:09 ` Denis Kirjanov
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Christian Kujau @ 2015-08-07 10:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel

Hi,

this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
to latest mainline. However, during bootup the following happens:

===============================
[    2.237102] ata1: PATA max UDMA/100 irq 39
[    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
[    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
[    2.417633] ata1.00: configured for UDMA/100
[   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   44.920452] ata1.00: failed command: READ DMA
[   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
[   44.927257] ata1.00: status: { DRDY }
[   49.971784] ata1.00: qc timeout (cmd 0xec)
[   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   49.978908] ata1.00: revalidation failed (errno=-5)
[   55.019662] ata1: link is slow to respond, please be patient (ready=0)
[   60.007677] ata1: device not ready (errno=-16), forcing hardreset
[   60.012670] ata1: soft resetting link
[   60.193638] ata1.00: configured for UDMA/100
[   60.196158] ata1.00: device reported invalid CHS sector 0
[   60.198610] ata1: EH complete
===============================
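
To put a number on the stall in the log above, the kernel timestamps can simply be subtracted. A minimal Python sketch, assuming the usual "[ seconds] message" dmesg prefix; the helper names parse and eh_duration are made up for illustration and are not part of any tool:

```python
import re

# Excerpt of the boot log quoted above (kernel timestamps are in seconds).
LOG = """\
[    2.417633] ata1.00: configured for UDMA/100
[   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   49.971784] ata1.00: qc timeout (cmd 0xec)
[   55.019662] ata1: link is slow to respond, please be patient (ready=0)
[   60.007677] ata1: device not ready (errno=-16), forcing hardreset
[   60.198610] ata1: EH complete
"""

# Assumed line layout: "[" seconds "]" followed by the message text.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return (timestamp, message) pairs for every well-formed dmesg line."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def eh_duration(entries):
    """Seconds between the first 'exception' line and the next 'EH complete'."""
    start = next(t for t, msg in entries if "exception Emask" in msg)
    end = next(t for t, msg in entries if t >= start and "EH complete" in msg)
    return end - start


entries = parse(LOG)
print(f"error handling took {eh_duration(entries):.2f} s")
```

For the excerpt above this comes out at roughly 15 seconds between the first exception and "EH complete", which is presumably long enough for a boot-time unit to time out.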

This happens only once, but systemd thinks there's a hard problem and will 
drop to a recovery shell. I can start sshd and login remotely and then the 
system appears to be running just fine.

This happened in 4.2.0-rc5 so I went back a few versions and found that
4.1-rc5 was OK (the error does not show up and the system boots just fine)
and 4.1-rc6 is not.

Unfortunately a git-bisect between these two versions went completely off 
the charts, I don't know what happened here:

==================================
first bad commit:

0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed May 27 16:17:19 2015 +0200

    ALSA: hda - Fix noise on AMD radeon 290x controller
==================================

I don't have this driver (or ALSA) even selected. I can reproduce this 
error pretty reliably and I'd like to attempt another git-bisect
run when I'm more awake. But maybe somebody recognizes this error and
has a hint where this could come from?

dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/

Thanks,
Christian.
-- 
BOFH excuse #225:

It's those computer people in X {city of world}.  They keep stuffing things up.


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-07 10:29 4.1-rc6: ATA link is slow to respond, please be patient Christian Kujau
@ 2015-08-07 12:09 ` Denis Kirjanov
  2015-08-08  8:57 ` Denis Kirjanov
  2015-08-09  4:17   ` Christian Kujau
  2 siblings, 0 replies; 10+ messages in thread
From: Denis Kirjanov @ 2015-08-07 12:09 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linuxppc-dev, linux-kernel

On 8/7/15, Christian Kujau <lists@nerdbynature.de> wrote:
> Hi,
>
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
> to latest mainline. However, during bootup the following happens:
>
> ===============================
> [    2.237102] ata1: PATA max UDMA/100 irq 39
> [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
> [    2.417633] ata1.00: configured for UDMA/100
> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   44.920452] ata1.00: failed command: READ DMA
> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma
> 69632 in
> [   44.927257] ata1.00: status: { DRDY }
> [   49.971784] ata1.00: qc timeout (cmd 0xec)
> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   49.978908] ata1.00: revalidation failed (errno=-5)
> [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [   60.012670] ata1: soft resetting link
> [   60.193638] ata1.00: configured for UDMA/100
> [   60.196158] ata1.00: device reported invalid CHS sector 0
> [   60.198610] ata1: EH complete
> ===============================

Interesting, I'll try to reproduce it on my G4.

>
> This happens only once, but systemd thinks there's a hard problem and will
> drop to a recovery shell. I can start sshd and login remotely and then the
> system appears to be running just fine.
>
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
>
> Unfortunately a git-bisect between these two versions went completely off
> the charts, I don't know what happened here:
>
> ==================================
> first bad commit:
>
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <tiwai@suse.de>
> Date:   Wed May 27 16:17:19 2015 +0200
>
>     ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
>
> I don't have this driver (or ALSA) even selected. I can reproduce this
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
>
> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
>
> Thanks,
> Christian.
> --
> BOFH excuse #225:
>
> It's those computer people in X {city of world}.  They keep stuffing things
> up.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-07 10:29 4.1-rc6: ATA link is slow to respond, please be patient Christian Kujau
  2015-08-07 12:09 ` Denis Kirjanov
@ 2015-08-08  8:57 ` Denis Kirjanov
  2015-08-08 21:34     ` Christian Kujau
  2015-08-09  4:17   ` Christian Kujau
  2 siblings, 1 reply; 10+ messages in thread
From: Denis Kirjanov @ 2015-08-08  8:57 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linuxppc-dev, linux-kernel

On 8/7/15, Christian Kujau <lists@nerdbynature.de> wrote:
> Hi,
>
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
> to latest mainline. However, during bootup the following happens:
>
> ===============================
> [    2.237102] ata1: PATA max UDMA/100 irq 39
> [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
> [    2.417633] ata1.00: configured for UDMA/100
> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   44.920452] ata1.00: failed command: READ DMA
> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma
> 69632 in
> [   44.927257] ata1.00: status: { DRDY }
> [   49.971784] ata1.00: qc timeout (cmd 0xec)
> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   49.978908] ata1.00: revalidation failed (errno=-5)
> [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [   60.012670] ata1: soft resetting link
> [   60.193638] ata1.00: configured for UDMA/100
> [   60.196158] ata1.00: device reported invalid CHS sector 0
> [   60.198610] ata1: EH complete
> ===============================

Just tried 4.2.0-rc5+ and haven't hit the issue.

[   17.180034] pata-pci-macio 0002:20:0d.0: enabling device (0000 -> 0002)
[   17.185862] adb: starting probe task...
[   17.196011] pata-pci-macio 0002:20:0d.0: Activating pata-macio
chipset UniNorth ATA-6, Apple bus ID 3
[   17.202312] scsi host0: pata_macio
[   17.203698] ata1: PATA max UDMA/100 irq 39
[   17.219397] adb devices: [2]: 2 c4 [7]: 7 1f
[   17.225400] ADB keyboard at 2, handler 1
[   17.225560] Detected ADB keyboard, type ISO, swapping keys.
[   17.226642] input: ADB keyboard as /devices/virtual/input/input0
[   17.227590] input: ADB Powerbook buttons as /devices/virtual/input/input1
[   17.227795] adb: finished probe task...
[   17.368537] ata1.00: ATA-6: TOSHIBA MK8026GAX, PA005B, max UDMA/100
[   17.368717] ata1.00: 156301488 sectors, multi 16: LBA48
[   17.376346] ata1.00: configured for UDMA/100
[   17.377544] scsi 0:0:0:0: Direct-Access     ATA      TOSHIBA
MK8026GA 5B   PQ: 0 ANSI: 5
[   17.386989] sd 0:0:0:0: [sda] 156301488 512-byte logical blocks:
(80.0 GB/74.5 GiB)
[   17.393144] sd 0:0:0:0: [sda] Write Protect is off
[   17.397579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   17.398215] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   17.404124] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[   17.661225]  sda: [mac] sda1 sda2 sda3 sda4
[   17.672937] sd 0:0:0:0: [sda] Attached SCSI disk
[   18.223985] pata-macio 0.00020000:ata-3: Activating pata-macio
chipset KeyLargo ATA-3, Apple bus ID 0
[   18.233397] scsi host1: pata_macio
[   18.239172] ata2: PATA max MWDMA2 irq 24


>
> This happens only once, but systemd thinks there's a hard problem and will
> drop to a recovery shell. I can start sshd and login remotely and then the
> system appears to be running just fine.
>
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
>
> Unfortunately a git-bisect between these two versions went completely off
> the charts, I don't know what happened here:
>
> ==================================
> first bad commit:
>
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <tiwai@suse.de>
> Date:   Wed May 27 16:17:19 2015 +0200
>
>     ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
>
> I don't have this driver (or ALSA) even selected. I can reproduce this
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
>
> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
>
> Thanks,
> Christian.
> --
> BOFH excuse #225:
>
> It's those computer people in X {city of world}.  They keep stuffing things
> up.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-08  8:57 ` Denis Kirjanov
@ 2015-08-08 21:34     ` Christian Kujau
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2015-08-08 21:34 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: linuxppc-dev, linux-kernel

On August 8, 2015 1:57:05 AM PDT, Denis Kirjanov <kda@linux-powerpc.org> wrote:
>On 8/7/15, Christian Kujau <lists@nerdbynature.de> wrote:
>> Hi,
>>
>> this PowerBook G4 was running 3.16 for a while but now I wanted to
>upgrade
>> to latest mainline. However, during bootup the following happens:
>>
>> ===============================
>> [    2.237102] ata1: PATA max UDMA/100 irq 39
>> [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max
>UDMA/100
>> [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
>> [    2.417633] ata1.00: configured for UDMA/100
>> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
>0x0
>> [   44.920452] ata1.00: failed command: READ DMA
>> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0
>dma
>> 69632 in
>> [   44.927257] ata1.00: status: { DRDY }
>> [   49.971784] ata1.00: qc timeout (cmd 0xec)
>> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   49.978908] ata1.00: revalidation failed (errno=-5)
>> [   55.019662] ata1: link is slow to respond, please be patient
>(ready=0)
>> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
>> [   60.012670] ata1: soft resetting link
>> [   60.193638] ata1.00: configured for UDMA/100
>> [   60.196158] ata1.00: device reported invalid CHS sector 0
>> [   60.198610] ata1: EH complete
>> ===============================
>
>Just tried 4.2.0-rc5+ and haven't hit the issue.
>
>[   17.180034] pata-pci-macio 0002:20:0d.0: enabling device (0000 ->
>0002)
>[   17.185862] adb: starting probe task...
>[   17.196011] pata-pci-macio 0002:20:0d.0: Activating pata-macio
>chipset UniNorth ATA-6, Apple bus ID 3
>[   17.202312] scsi host0: pata_macio
>[   17.203698] ata1: PATA max UDMA/100 irq 39
>[   17.219397] adb devices: [2]: 2 c4 [7]: 7 1f
>[   17.225400] ADB keyboard at 2, handler 1
>[   17.225560] Detected ADB keyboard, type ISO, swapping keys.
>[   17.226642] input: ADB keyboard as /devices/virtual/input/input0
>[   17.227590] input: ADB Powerbook buttons as
>/devices/virtual/input/input1
>[   17.227795] adb: finished probe task...
>[   17.368537] ata1.00: ATA-6: TOSHIBA MK8026GAX, PA005B, max UDMA/100
>[   17.368717] ata1.00: 156301488 sectors, multi 16: LBA48
>[   17.376346] ata1.00: configured for UDMA/100
>[   17.377544] scsi 0:0:0:0: Direct-Access     ATA      TOSHIBA
>MK8026GA 5B   PQ: 0 ANSI: 5
>[   17.386989] sd 0:0:0:0: [sda] 156301488 512-byte logical blocks:
>(80.0 GB/74.5 GiB)
>[   17.393144] sd 0:0:0:0: [sda] Write Protect is off
>[   17.397579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>[   17.398215] sd 0:0:0:0: Attached scsi generic sg0 type 0
>[   17.404124] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
>enabled, doesn't support DPO or FUA
>[   17.661225]  sda: [mac] sda1 sda2 sda3 sda4
>[   17.672937] sd 0:0:0:0: [sda] Attached SCSI disk
>[   18.223985] pata-macio 0.00020000:ata-3: Activating pata-macio
>chipset KeyLargo ATA-3, Apple bus ID 0
>[   18.233397] scsi host1: pata_macio
>[   18.239172] ata2: PATA max MWDMA2 irq 24
>
>
>>
>> This happens only once, but systemd thinks there's a hard problem and
>will
>> drop to a recovery shell. I can start sshd and login remotely and
>then the
>> system appears to be running just fine.
>>
>> This happened in 4.2.0-rc5 so I went back a few versions and found
>that
>> 4.1-rc5 was OK (the error does not show up and the system boots just
>fine)
>> and 4.1-rc6 is not.
>>
>> Unfortunately a git-bisect between these two versions went completely
>off
>> the charts, I don't know what happened here:
>>
>> ==================================
>> first bad commit:
>>
>> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
>> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
>> Author: Takashi Iwai <tiwai@suse.de>
>> Date:   Wed May 27 16:17:19 2015 +0200
>>
>>     ALSA: hda - Fix noise on AMD radeon 290x controller
>> ==================================
>>
>> I don't have this driver (or ALSA) even selected. I can reproduce
>this
>> error pretty reliably and I'd like to attempt another git-bisect
>> run when I'm more awake. But maybe somebody recognizes this error and
>> has a hint where this could come from?
>>
>> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
>>
>> Thanks,
>> Christian.
>> --
>> BOFH excuse #225:
>>
>> It's those computer people in X {city of world}.  They keep stuffing
>things
>> up.
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

Can you send me your .config or did you use my .config, verbatim?

I'll try another git-bisect later today.

Thanks,
Christian.
-- 
make bzImage, not war


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-07 10:29 4.1-rc6: ATA link is slow to respond, please be patient Christian Kujau
@ 2015-08-09  4:17   ` Christian Kujau
  2015-08-08  8:57 ` Denis Kirjanov
  2015-08-09  4:17   ` Christian Kujau
  2 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2015-08-09  4:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-ide, linux-kernel

[Adding linux-ide@vger.kernel.org]

On Fri, 7 Aug 2015, Christian Kujau wrote:
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
> to latest mainline. However, during bootup the following happens:
> 
> ===============================
> [    2.237102] ata1: PATA max UDMA/100 irq 39
> [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
> [    2.417633] ata1.00: configured for UDMA/100
> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   44.920452] ata1.00: failed command: READ DMA
> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
> [   44.927257] ata1.00: status: { DRDY }
> [   49.971784] ata1.00: qc timeout (cmd 0xec)
> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   49.978908] ata1.00: revalidation failed (errno=-5)
> [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [   60.012670] ata1: soft resetting link
> [   60.193638] ata1.00: configured for UDMA/100
> [   60.196158] ata1.00: device reported invalid CHS sector 0
> [   60.198610] ata1: EH complete
> ===============================
> 
> This happens only once, but systemd thinks there's a hard problem and will 
> drop to a recovery shell. I can start sshd and login remotely and then the 
> system appears to be running just fine.
> 
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
> 

After more digging around I noticed that the same error (with 
changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
doesn't appear to be a recent regression as I suspected at first:

==================================
[   46.907147] ata1: drained 572 bytes to clear DRQ
[   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   46.908419] ata1.00: failed command: READ DMA
[   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in
         res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
[   46.910303] ata1.00: status: { DRDY }
[   46.970579] ata1.00: configured for UDMA/100
[   46.971853] ata1.00: device reported invalid CHS sector 0
[   46.972524] ata1: EH complete
==================================
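
The "cmd" lines in both logs can be decoded by hand. A sketch in Python, assuming the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah, then the five "hob" bytes, then the device register) and 512-byte sectors; decode_cmd is a hypothetical helper, not an existing tool:

```python
import re

H = r"([0-9a-f]{2})"
# Assumed layout: cmd CC/FF:NS:L0:L1:L2/hf:hn:h0:h1:h2/DV tag T dma BYTES in|out
CMD_RE = re.compile(
    rf"cmd {H}/{H}:{H}:{H}:{H}:{H}/{H}:{H}:{H}:{H}:{H}/{H} "
    r"tag (\d+) dma (\d+) (in|out)"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_cmd(line):
    """Decode a libata 'cmd' record for a 28-bit command such as READ DMA."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no libata 'cmd' record found")
    regs = [int(g, 16) for g in match.groups()[:12]]
    command, _feature, nsect, lbal, lbam, lbah = regs[:6]
    device = regs[11]
    # For 28-bit commands a sector count of 0 means 256 sectors.
    sectors = nsect or 256
    # LBA bits 24-27 live in the low nibble of the device register;
    # the "hob" bytes are only used by 48-bit commands and are ignored here.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": command,
        "sectors": sectors,
        "lba": lba,
        "bytes": sectors * SECTOR_SIZE,
        "dma_bytes": int(match.group(14)),
    }


LINES = [
    "ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in",
    "ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in",
]

for line in LINES:
    d = decode_cmd(line)
    print(f"cmd=0x{d['command']:02x} lba=0x{d['lba']:x} "
          f"sectors={d['sectors']} bytes={d['bytes']}")
```

Under these assumptions both records are plain READ DMA (0xc8) requests, and the sector counts (0x88 = 136 and 0x80 = 128) agree with the "dma 69632" and "dma 65536" byte counts the kernel printed, so the failing commands themselves look like ordinary reads.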

Also, the error cannot be reproduced as reliably as I thought: sometimes the 
machine just boots w/o a hitch - and that might be the reason why my 
bisect attempts failed and incorrectly blamed totally unrelated commits: 
after each "git bisect {good,bad}" (+compiling) I rebooted, but there was a 
chance that the system came up just fine (or showed the same ATA error) 
regardless of the commit under test, and that falsified the git-bisect results.
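
How many clean boots a commit needs before it can be called "good" can be estimated with a little arithmetic. A minimal sketch in Python, assuming boots are independent and a bad kernel shows the error with a fixed probability per boot; the probabilities used below are made-up examples, not measurements from this machine:

```python
import math


def clean_boots_needed(p_fail, max_false_good):
    """Consecutive clean boots required before marking a commit 'good'.

    p_fail:          chance that a genuinely bad kernel shows the error on a boot
    max_false_good:  accepted chance of mislabeling a bad kernel as good
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    # A bad kernel survives n boots unnoticed with probability (1 - p_fail)**n,
    # so n must satisfy (1 - p_fail)**n <= max_false_good.
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


def bisect_survival(p_fail, boots_per_step, steps):
    """Chance that no step mislabels a bad commit as good.

    Pessimistic bound: treats every tested commit as if it were bad.
    """
    p_step_ok = 1.0 - (1.0 - p_fail) ** boots_per_step
    return p_step_ok ** steps


# Hypothetical numbers: the error shows up on half of all boots of a bad kernel.
print(clean_boots_needed(0.5, 0.05))
print(f"{bisect_survival(0.5, 1, 9):.4f}")
```

With a single boot per step, even a modest number of bisect steps makes a wrong "good" verdict very likely under these assumptions, which would fit a bisect that lands on an unrelated commit.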

I noticed that with this Debian 3.16 kernel, it happens less often when I 
use the "irqpoll" option. But with 4.2-rc5 this doesn't seem to help much: 
the system still hangs during boot but continues after the "EH complete" 
message. The error doesn't appear again afterwards: I can read from my 
root disk just fine and a long SMART check also comes back fine.

Because the error only appears to happen on the very first access after a 
reboot, I tried to boot with rootdelay=30 - but of course then it just 
waits "before" accessing the root disk. I'd need a magic option to wait a 
few seconds "after" the first disk access, so that the boot framework 
("systemd") won't be thrown off when /dev/sda isn't responding as fast as 
expected.

What _does_ seem to help a bit is to disable the swap device, which 
is configured as an encrypted dm-device here - and systemd was almost 
always stumbling over this particular service during bootup. Because of 
the ATA timeout, the dm-device could not be set up correctly and systemd 
would bail out and drop me into a recovery shell. Without the swap device, 
systemd skips setting up swap and boots just fine (most of the time) 
and I can set up swap once the system has been booted. So...there's that.

It's still a mystery to me why /dev/sda is only behaving weird on its 
first access.

I've cc'ed linux-ide, maybe somebody has an idea on that?

dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/

Thanks,
Christian.


> Unfortunately a git-bisect between these two versions went completely off 
> the charts, I don't know what happened here:
> ==================================
> first bad commit:
> 
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <tiwai@suse.de>
> Date:   Wed May 27 16:17:19 2015 +0200
> 
>     ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
> 
> I don't have this driver (or ALSA) even selected. I can reproduce this 
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
> 
> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
> 
> Thanks,
> Christian.
> -- 
> BOFH excuse #225:
> 
> It's those computer people in X {city of world}.  They keep stuffing things up.
> 

-- 
BOFH excuse #263:

It's stuck in the Web.


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
@ 2015-08-09  4:17   ` Christian Kujau
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2015-08-09  4:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel, linux-ide

[Adding linux-ide@vger.kernel.org]

On Fri, 7 Aug 2015, Christian Kujau wrote:
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
> to latest mainline. However, during bootup the following happens:
> 
> ===============================
> [    2.237102] ata1: PATA max UDMA/100 irq 39
> [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
> [    2.417633] ata1.00: configured for UDMA/100
> [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   44.920452] ata1.00: failed command: READ DMA
> [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
> [   44.927257] ata1.00: status: { DRDY }
> [   49.971784] ata1.00: qc timeout (cmd 0xec)
> [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   49.978908] ata1.00: revalidation failed (errno=-5)
> [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [   60.012670] ata1: soft resetting link
> [   60.193638] ata1.00: configured for UDMA/100
> [   60.196158] ata1.00: device reported invalid CHS sector 0
> [   60.198610] ata1: EH complete
> ===============================
> 
> This happens only once, but systemd thinks there's a hard problem and will 
> drop to a recovery shell. I can start sshd and login remotely and then the 
> system appears to be running just fine.
> 
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
> 

After more digging around I noticed that the same error (with 
changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
doesn't appear to be a recent regression as I suspected at first:

==================================
[   46.907147] ata1: drained 572 bytes to clear DRQ
[   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   46.908419] ata1.00: failed command: READ DMA
[   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in
         res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
[   46.910303] ata1.00: status: { DRDY }
[   46.970579] ata1.00: configured for UDMA/100
[   46.971853] ata1.00: device reported invalid CHS sector 0
[   46.972524] ata1: EH complete
==================================

Also, the error cannot repduced as reliably as I thought: sometimes, the 
machine just boots w/o a hitch - and that might be the reasons why my 
bisect attempts failed and incorrectly blamed totally unrelated commits: 
after each "git bisect {good,bad}" (+compiling) I rebooted but there was a 
chance that the system came up just fine / showed the same ATA error and 
thus falsified the git-bisect results.

I noticed that with this Debian 3.16 kernel, it happens less often when I 
use the "irqpoll" option. But with 4.2-rc5 this doesn't seem to help much, 
the system still hangs during boot but continues after the "EH complete " 
message. And it doesn't appear afterwards, I can read from my root disk 
just fine and a long SMART check also comes back fine.

Because the error only appears to happen on the very first access after a 
reboot, I tried to boot with rootdelay=30 - but of course then it just 
waits "before" accessing the root disk. I'd need a magic option to wait a 
few seconds "after" the first disk access, so that the boot framework 
("systemd") won't be thrown off when /dev/sda isn't responding as fast as 
expected.

What _does_ seem to help a bit was to disable the the swap device, which 
is configured as an encrypted dm-device here - and systemd was almost 
always stumbling over this particular service during bootup. Because of 
the ATA timeout, the dm-device could not be setup correctly and systemd 
would bail out and drop me into a recovery shell. Without the swap device, 
systemd would skip setting up swap and boot just fine (most of the time) 
and I can setup swap once the system has been booted. So...there's that.

It's still a mystery to me why /dev/sda is only behaving weird on its 
first access.

I've cc'ed linux-ide, maybe somebody has an idea on that?

dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/

Thanks,
Christian.


> Unfortunately a git-bisect between these two versions went completly off 
> the charts, I don't know what happened here:
> ==================================
> first bad commit:
> 
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <tiwai@suse.de>
> Date:   Wed May 27 16:17:19 2015 +0200
> 
>     ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
> 
> I don't have this driver (or ALSA) even selected. I can reproduce this 
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
> 
> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
> 
> Thanks,
> Christian.
> -- 
> BOFH excuse #225:
> 
> It's those computer people in X {city of world}.  They keep stuffing things up.
> 

-- 
BOFH excuse #263:

It's stuck in the Web.


* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-09  4:17   ` Christian Kujau
@ 2015-08-09  6:43     ` Christian Kujau
  -1 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2015-08-09  6:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-ide, linux-kernel

On Sat, 8 Aug 2015, Christian Kujau wrote:

> [Adding linux-ide@vger.kernel.org]
> 
> On Fri, 7 Aug 2015, Christian Kujau wrote:
> > this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
> > to latest mainline. However, during bootup the following happens:
> > 
> > ===============================
> > [    2.237102] ata1: PATA max UDMA/100 irq 39
> > [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> > [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
> > [    2.417633] ata1.00: configured for UDMA/100
> > [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [   44.920452] ata1.00: failed command: READ DMA
> > [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
> > [   44.927257] ata1.00: status: { DRDY }
> > [   49.971784] ata1.00: qc timeout (cmd 0xec)
> > [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [   49.978908] ata1.00: revalidation failed (errno=-5)
> > [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> > [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> > [   60.012670] ata1: soft resetting link
> > [   60.193638] ata1.00: configured for UDMA/100
> > [   60.196158] ata1.00: device reported invalid CHS sector 0
> > [   60.198610] ata1: EH complete
> > ===============================
> > 
> > This happens only once, but systemd thinks there's a hard problem and will 
> > drop to a recovery shell. I can start sshd and login remotely and then the 
> > system appears to be running just fine.

I played around with libata* kernel parameters; the only "success" I had 
was with libata.dma=0, which disables DMA, and the system booted without 
the error. But of course the disk throughput was much slower. Is there a 
way to enable DMA again once the system is booted? "hdparm -d" would 
just fail on HDIO_SET_DMA, of course[0].
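
For reference, a small sketch of how the libata.dma value is interpreted. It assumes the bitmask described in the kernel's parameter documentation (0x1 for ATA disks, 0x2 for ATAPI, 0x4 for CompactFlash); the function name is made up for illustration and is not part of any kernel or hdparm interface.

```python
# Assumed meaning of the libata.dma bits, per the kernel parameter docs.
LIBATA_DMA_BITS = {
    0x1: "ATA disks",
    0x2: "ATAPI",
    0x4: "CompactFlash",
}


def dma_enabled_for(mask):
    """Return the device classes for which a libata.dma value leaves DMA on."""
    return [name for bit, name in LIBATA_DMA_BITS.items() if mask & bit]


for value in (0, 1, 3, 7):
    print(f"libata.dma={value}: DMA enabled for {dma_enabled_for(value) or 'nothing'}")
```

Under that assumption, libata.dma=0 turns DMA off for every device class, which fits the slower throughput seen here.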

I then tried something more drastic: I disabled libata completely and 
enabled CONFIG_IDE (and CONFIG_BLK_DEV_IDE_PMAC) again, and a similar error 
appears (sometimes) during bootup:

[   39.971392] ide-pmac lost interrupt, dma status: 8480
[   39.972704] hda: lost interrupt
[   39.973951] hda: dma_intr: status=0xd8 { Busy }
[   39.975231] hda: possibly failed opcode: 0x25
[   39.978855] hda: DMA disabled
[   40.019388] ide0: reset: success

However, the host seems to recover more quickly, and systemd wasn't thrown 
off by the short ATA delay. But DMA got disabled again :-\
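
The raw values in the logs of this thread can be decoded with a short script. This is only a sketch: it assumes the standard ATA status register layout, the field order libata uses in its "cmd" line (command/feature:nsect:lbal:lbam:lbah/five HOB bytes/device) and 512-byte sectors. The function names are hypothetical.

```python
# ATA status register bits (older names for bits 4, 2 and 1).
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Only the opcodes that appear in the logs of this thread.
ATA_OPCODES = {0xC8: "READ DMA", 0x25: "READ DMA EXT", 0xEC: "IDENTIFY DEVICE"}


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_lba28_cmd(taskfile):
    """Decode a libata 'cmd' string for a 28-bit LBA command such as READ DMA."""
    command, regs, _hob, device = taskfile.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in regs.split(":"))
    opcode = int(command, 16)
    sectors = nsect or 256  # a count of 0 means 256 sectors for LBA28 commands
    # LBA bits 27:24 live in the low nibble of the device register.
    lba = ((int(device, 16) & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "opcode": ATA_OPCODES.get(opcode, hex(opcode)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte sectors
    }


print(decode_status(0xD8))  # status from the ide-pmac log
print(decode_status(0x40))  # status from the libata log
print(decode_lba28_cmd("c8/00:88:64:c2:12/00:00:00:00:00/e0"))
```

For the libata command line this gives 136 sectors, i.e. 69632 bytes, which matches the "dma 69632 in" printed by the kernel and serves as a sanity check on the assumed field order. While BSY is set the remaining status bits are not considered valid, which is presumably why the IDE driver prints only { Busy } for 0xd8.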

Ideas welcome! :-)

Christian.

[0] https://ata.wiki.kernel.org/index.php/Libata_FAQ

> > This happened in 4.2.0-rc5 so I went back a few versions and found that
> > 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> > and 4.1-rc6 is not.
> > 
> 
> After more digging around I noticed that the same error (with 
> changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
> doesn't appear to be a recent regression as I suspected at first:
> 
> ==================================
> [   46.907147] ata1: drained 572 bytes to clear DRQ
> [   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   46.908419] ata1.00: failed command: READ DMA
> [   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in
>          res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
> [   46.910303] ata1.00: status: { DRDY }
> [   46.970579] ata1.00: configured for UDMA/100
> [   46.971853] ata1.00: device reported invalid CHS sector 0
> [   46.972524] ata1: EH complete
> ==================================
> 
> Also, the error cannot repduced as reliably as I thought: sometimes, the 
> machine just boots w/o a hitch - and that might be the reasons why my 
> bisect attempts failed and incorrectly blamed totally unrelated commits: 
> after each "git bisect {good,bad}" (+compiling) I rebooted but there was a 
> chance that the system came up just fine / showed the same ATA error and 
> thus falsified the git-bisect results.
> 
> I noticed that with this Debian 3.16 kernel, it happens less often when I 
> use the "irqpoll" option. But with 4.2-rc5 this doesn't seem to help much, 
> the system still hangs during boot but continues after the "EH complete " 
> message. And it doesn't appear afterwards, I can read from my root disk 
> just fine and a long SMART check also comes back fine.
> 
> Because the error only appears to happen on the very first access after a 
> reboot, I tried to boot with rootdelay=30 - but of course then it just 
> waits "before" accessing the root disk. I'd need a magic option to wait a 
> few seconds "after" the first disk access, so that the boot framework 
> ("systemd") won't be thrown off when /dev/sda isn't responding as fast as 
> expected.
> 
> What _does_ seem to help a bit was to disable the the swap device, which 
> is configured as an encrypted dm-device here - and systemd was almost 
> always stumbling over this particular service during bootup. Because of 
> the ATA timeout, the dm-device could not be setup correctly and systemd 
> would bail out and drop me into a recovery shell. Without the swap device, 
> systemd would skip setting up swap and boot just fine (most of the time) 
> and I can setup swap once the system has been booted. So...there's that.
> 
> It's still a mystery to me why /dev/sda is only behaving weird on its 
> first access.
> 
> I've cc'ed linux-ide, maybe somebody has an idea on that?
> 
> dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
> 
> Thanks,
> Christian.
> 
> 
> > Unfortunately a git-bisect between these two versions went completly off 
> > the charts, I don't know what happened here:
> > ==================================
> > first bad commit:
> > 
> > 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> > commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> > Author: Takashi Iwai <tiwai@suse.de>
> > Date:   Wed May 27 16:17:19 2015 +0200
> > 
> >     ALSA: hda - Fix noise on AMD radeon 290x controller
> > ==================================
> > 
> > I don't have this driver (or ALSA) even selected. I can reproduce this 
> > error pretty reliably and I'd like to attempt another git-bisect
> > run when I'm more awake. But maybe somebody recognizes this error and
> > has a hint where this could come from?
> > 
> > dmesg & .config:  http://nerdbynature.de/bits/v4.1-rc6/
> > 
> > Thanks,
> > Christian.
> > -- 
> > BOFH excuse #225:
> > 
> > It's those computer people in X {city of world}.  They keep stuffing things up.
> > 
> 
> -- 
> BOFH excuse #263:
> 
> It's stuck in the Web.
> 

-- 
BOFH excuse #316:

Elves on strike. (Why do they call EMAG Elf Magic)

* Re: 4.1-rc6: ATA link is slow to respond, please be patient
  2015-08-09  4:17   ` Christian Kujau
  (?)
  (?)
@ 2015-08-10  1:37   ` Michael Ellerman
  -1 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2015-08-10  1:37 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linuxppc-dev, linux-ide, linux-kernel

On Sat, 2015-08-08 at 21:17 -0700, Christian Kujau wrote:
> [Adding linux-ide@vger.kernel.org]
> 
> On Fri, 7 Aug 2015, Christian Kujau wrote:
> > this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade 
> > to latest mainline. However, during bootup the following happens:
> > 
> > ===============================
> > [    2.237102] ata1: PATA max UDMA/100 irq 39
> > [    2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> > [    2.401764] ata1.00: 117231408 sectors, multi 16: LBA48 
> > [    2.417633] ata1.00: configured for UDMA/100
> > [   44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [   44.920452] ata1.00: failed command: READ DMA
> > [   44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
> > [   44.927257] ata1.00: status: { DRDY }
> > [   49.971784] ata1.00: qc timeout (cmd 0xec)
> > [   49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [   49.978908] ata1.00: revalidation failed (errno=-5)
> > [   55.019662] ata1: link is slow to respond, please be patient (ready=0)
> > [   60.007677] ata1: device not ready (errno=-16), forcing hardreset
> > [   60.012670] ata1: soft resetting link
> > [   60.193638] ata1.00: configured for UDMA/100
> > [   60.196158] ata1.00: device reported invalid CHS sector 0
> > [   60.198610] ata1: EH complete
> > ===============================
> > 
> > This happens only once, but systemd thinks there's a hard problem and will 
> > drop to a recovery shell. I can start sshd and login remotely and then the 
> > system appears to be running just fine.
> > 
> > This happened in 4.2.0-rc5 so I went back a few versions and found that
> > 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> > and 4.1-rc6 is not.
> > 
> 
> After more digging around I noticed that the same error (with 
> changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
> doesn't appear to be a recent regression as I suspected at first:
> 
> ==================================
> [   46.907147] ata1: drained 572 bytes to clear DRQ
> [   46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   46.908419] ata1.00: failed command: READ DMA
> [   46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in
>          res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
> [   46.910303] ata1.00: status: { DRDY }
> [   46.970579] ata1.00: configured for UDMA/100
> [   46.971853] ata1.00: device reported invalid CHS sector 0
> [   46.972524] ata1: EH complete
> ==================================
> 
> Also, the error cannot repduced as reliably as I thought: sometimes, the 
> machine just boots w/o a hitch - and that might be the reasons why my 
> bisect attempts failed and incorrectly blamed totally unrelated commits: 
> after each "git bisect {good,bad}" (+compiling) I rebooted but there was a 
> chance that the system came up just fine / showed the same ATA error and 
> thus falsified the git-bisect results.

Yes, that would explain why the bisect went wrong. If you have an intermittent
bug like that, you have to be very careful about which commits you mark good or
bad.
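
To put a rough number on "very careful": a minimal sketch, assuming a bad kernel shows the error on each boot independently with some probability p. The thread gives no figure for p, so the values below are made up, and the function names are hypothetical. After n clean boots the chance of wrongly marking a bad commit as good is (1 - p)^n.

```python
import math


def false_good_probability(p_fail, boots):
    """Chance that a bad commit survives `boots` reboots without showing the error."""
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail, max_false_good):
    """Smallest number of clean boots before 'git bisect good' is reasonably safe."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


# Hypothetical reproduction rates; measure the real one on a known-bad kernel.
for p in (0.8, 0.5, 0.2):
    n = boots_needed(p, 0.05)
    print(f"p={p}: {n} clean boots, residual risk {false_good_probability(p, n):.3f}")
```

A single boot with the error is enough to mark a commit bad; only the "good" verdict needs the repeated reboots. The lower the reproduction rate, the more reboots each bisect step needs.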

I don't really know anything about disk drivers, so hopefully someone who does
can chime in.

cheers


