* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
       [not found] <1500801941.22097.24.camel@hellion.org.uk>
@ 2017-07-26 15:22 ` Andrew Lunn
  2017-07-26 16:18   ` Ian Campbell
  2017-07-29 15:50 ` Andrew Lunn
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2017-07-26 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jul 23, 2017 at 10:25:41AM +0100, Ian Campbell wrote:
> Hello kirkwood folks,
> 
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
> 
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
> 
> and
> 
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

Hi Ian

I have a 6282 system i can try to reproduce this on. It will probably
be a few days before i get around to it.

   Andrew


* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  2017-07-26 15:22 ` "external abort on linefetch (0x814)" on Kirkwood 6282 SoC Andrew Lunn
@ 2017-07-26 16:18   ` Ian Campbell
  2017-07-26 17:55     ` Andrew Lunn
  2017-07-28 16:33     ` Andrew Lunn
  0 siblings, 2 replies; 6+ messages in thread
From: Ian Campbell @ 2017-07-26 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> I have a 6282 system i can try to reproduce this on. It will probably
> be a few days before i get around to it.

Thanks!

For some reason my original mail never made it to debian-arm or linux-
arm-kernel, suspiciously the mail which I attached _also_ doesn't
appear in the archives. I suspect something has decided (false +ve)
that it was spam or a virus or something and blocked it.

FTR below is the full text of my original mail. I'd attach boot-7.log
as well but I worry it might get nobbled again, let me know if anyone
wants it...

Ian.

Hello kirkwood folks,

We have been seeing reports on the Debian arm list about
instability/errors running Debian Stretch (4.9 based) on
various Kirkwood 6282 based QNAP systems. Errors are things like [0,
actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
though]:

[   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
[  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
[  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
[  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
[  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
[ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1

and

[   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
[   71.041037] pgd = ead9c000
[   71.043747] [b6c73db0] *pgd=3fd72831
[   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
[...]

Many of the affected systems were running Debian Jessie (3.16 based)
fine (as is my own 6282 based system). Some reports have been on
intermediate kernels during the Stretch development cycle, it appears
(again from [0]) that 4.3 was ok but 4.7 was not.

From the reports it seems that 6281 SoCs are not affected, I only have
a spare 6281 to test on and can confirm that it appears to be fine when
running 4.9.

Some other reports:
- https://lists.debian.org/debian-arm/2017/04/msg00056.html
  (might have been an unrelated failing disk though?)
- https://lists.debian.org/debian-arm/2017/07/msg00010.html
  which also includes a "corrupted status flag!!: 0" message making me
  wonder about possible RAM issues.
- https://lists.debian.org/debian-arm/2017/07/msg00011.html
  Rob, author of [0], confirming 6281 is ok.
- The attached mail (which was copied to debian-arm but didn't make
  it to the list archives for some reason so I think it is ok to
  share) has the results of various experiments by Rob (of [0] fame)
  including boot-7.log which is a full log with the error occurring.

I've had a look through the kernel git logs, both in the 4.3..4.7 range
for possible culprits and in the 4.9..now range for possible fixes but
couldn't spot anything obvious (I didn't spot very much at all touching
these processors, mostly it looks like changes for the newer Armada
platforms).

I'm afraid I've not been able to find someone to try with a newer
kernel, for my part my only 6282 based system is in "production" as
storage for a mythtv setup so it is tricky to experiment with.

Any ideas what may be going on here?

Cheers,
Ian.

[0] https://lists.debian.org/debian-arm/2016/10/msg00041.html
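
The two failure signatures quoted above can be picked out of a saved kernel log with a short script. What follows is only a minimal sketch: the regular expressions are derived from the excerpts in this mail, and the helper names (`scan`, `decode_fsr`) are hypothetical. The field split in `decode_fsr` assumes the classic ARM short-descriptor fault status layout (status in bits [3:0] plus bit 10, domain in bits [7:4]). Bit 11 is reported raw because nothing in this thread establishes what it means on this core. The split at least makes it easy to see that the 0x814 in the subject line and the 0x014 in the excerpts differ only in bit 11.

```python
import re

# Patterns derived from the log excerpts quoted in this thread.
RSS_RE = re.compile(
    r"BUG: Bad rss-counter state mm:(?P<mm>[0-9a-f]+) "
    r"idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)
ABORT_RE = re.compile(
    r"Unhandled fault: (?P<name>.+?) "
    r"\((?P<fsr>0x[0-9a-fA-F]+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


def decode_fsr(fsr):
    """Split a fault status value into fields.

    Assumption: status is bits [3:0] with bit 10 as a fifth status bit,
    domain is bits [7:4]. Bit 11 is returned uninterpreted.
    """
    return {
        "status": (fsr & 0xF) | ((fsr >> 6) & 0x10),
        "domain": (fsr >> 4) & 0xF,
        "bit11": bool(fsr & (1 << 11)),
    }


def scan(lines):
    """Return (mm values of rss-counter reports, decoded abort reports)."""
    rss, aborts = [], []
    for line in lines:
        m = RSS_RE.search(line)
        if m:
            rss.append(m.group("mm"))
            continue
        m = ABORT_RE.search(line)
        if m:
            fsr = int(m.group("fsr"), 16)
            aborts.append((m.group("addr"), fsr, decode_fsr(fsr)))
    return rss, aborts


# Two lines copied from the excerpts above, used as sample input.
SAMPLE = [
    "[   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1",
    "[   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0",
]

if __name__ == "__main__":
    rss, aborts = scan(SAMPLE)
    print("rss-counter reports:", rss)
    for addr, fsr, fields in aborts:
        print("abort at", addr, hex(fsr), fields)
```

To run it against a real log, pass an open file (for example a saved copy of the dmesg output) to `scan` instead of `SAMPLE`.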


* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  2017-07-26 16:18   ` Ian Campbell
@ 2017-07-26 17:55     ` Andrew Lunn
  2017-07-26 18:40       ` Ian Campbell
  2017-07-28 16:33     ` Andrew Lunn
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2017-07-26 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 26, 2017 at 05:18:05PM +0100, Ian Campbell wrote:
> On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> > I have a 6282 system i can try to reproduce this on. It will probably
> > be a few days before i get around to it.
> 
> Thanks!
> 
> For some reason my original mail never made it to debian-arm or linux-
> arm-kernel, suspiciously the mail which I attached _also_ doesn't
> appear in the archives.

I suspect it is because you used attachments. They are frowned upon.

  Andrew


* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  2017-07-26 17:55     ` Andrew Lunn
@ 2017-07-26 18:40       ` Ian Campbell
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2017-07-26 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2017-07-26 at 19:55 +0200, Andrew Lunn wrote:
> On Wed, Jul 26, 2017 at 05:18:05PM +0100, Ian Campbell wrote:
> > On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> > > I have a 6282 system i can try to reproduce this on. It will probably
> > > be a few days before i get around to it.
> > 
> > Thanks!
> > 
> > For some reason my original mail never made it to debian-arm or linux-
> > arm-kernel, suspiciously the mail which I attached _also_ doesn't
> > appear in the archives.
> 
> I suspect it is because you used attachments. They are frowned upon.

Ah yes, that might explain it, I remember now that l-a-k frowns on
them. debian-arm is generally ok with them, but perhaps they were too
big in this case.

Thanks for the tip!

Ian.


* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  2017-07-26 16:18   ` Ian Campbell
  2017-07-26 17:55     ` Andrew Lunn
@ 2017-07-28 16:33     ` Andrew Lunn
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2017-07-28 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

> Hello kirkwood folks,
> 
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
> 
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
> 
> and
> 
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

So far, i've not been able to reproduce this. I have a 6282 based QNAP
NAS box, with a single disk. Since this is a kernel hacking box, i
tftpboot and don't use an initrd. I've been using the
mvebu_v5_defconfig kernel configuration and i have tried v4.13-rc2,
v4.12, v4.10.0 and v3.9.30. And i have sid for user space.

       Andrew


* "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
       [not found] <1500801941.22097.24.camel@hellion.org.uk>
  2017-07-26 15:22 ` "external abort on linefetch (0x814)" on Kirkwood 6282 SoC Andrew Lunn
@ 2017-07-29 15:50 ` Andrew Lunn
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2017-07-29 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jul 23, 2017 at 10:25:41AM +0100, Ian Campbell wrote:
> Hello kirkwood folks,
> 
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
> 
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
> 
> and
> 
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

I've now tried the debian kernel configuration from sid which is for
4.11. That also has not provoked the issue.

So i'm thinking this has to be related to bits of hardware i'm not
using. I don't have anything on the PCIe bus, i don't have any USB
devices plugged in, i don't use the mtd devices, etc.

Could somebody who does have the issue describe their system? Could
they pull out all their USB devices and see if that stops the
issues? Remove the driver for PCIe devices, if possible.

 Andrew
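
For anyone answering the request above, a system description can be gathered in one go. This is a minimal sketch rather than an agreed reporting format: the function names are hypothetical, and the choice of `lsusb`, `lspci`, `lsmod`, `/proc/mtd` and `/proc/cpuinfo` is an assumption about what is useful and installed on the affected boxes. Missing tools and unreadable files are reported inline instead of raising.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed tools (usbutils, pciutils, kmod); any of them may be absent.
COMMANDS = {
    "usb": ["lsusb"],
    "pci": ["lspci"],
    "modules": ["lsmod"],
}
# Assumed procfs files; /proc/mtd only exists when MTD support is enabled.
FILES = {
    "mtd": "/proc/mtd",
    "cpuinfo": "/proc/cpuinfo",
}


def run(argv):
    """Run a command and return its output, or a note if it cannot run."""
    if shutil.which(argv[0]) is None:
        return "(%s not installed)" % argv[0]
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        return "(failed: %s)" % exc
    return done.stdout.strip()


def read(path):
    """Return the file contents, or a note if the file cannot be read."""
    try:
        return Path(path).read_text(errors="replace").strip()
    except OSError as exc:
        return "(unreadable: %s)" % exc


def describe_system():
    """Collect every section into a dict keyed by section name."""
    report = {name: run(argv) for name, argv in COMMANDS.items()}
    report.update({name: read(path) for name, path in FILES.items()})
    return report


if __name__ == "__main__":
    for name, text in describe_system().items():
        print("==== %s ====" % name)
        print(text)
        print()
```

Running it once with all USB devices attached and once with them removed gives two reports that can be compared and pasted inline into a reply.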
