* BUG: Bad rss-counter state ...
@ 2018-04-18 16:53 René Rebe
  2018-04-18 21:34 ` Meelis Roos
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: René Rebe @ 2018-04-18 16:53 UTC (permalink / raw)
  To: sparclinux

Hi,

I recently dusted off some sparc systems to help test the latest versions and such (https://www.youtube.com/watch?v=10q2OxHAzQ4&t=500s).

On my Ultra 30 I get some BUG printk in dmesg:

[    0.000145] PROMLIB: Sun IEEE Boot Prom 'OBP 3.9.5 1997/04/11 10:03'
[    0.000168] PROMLIB: Root node compatible: 
[    0.000251] Linux version 4.16.2-dist (root@somewhere.exactcode.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic 
[    0.000828] bootconsole [earlyprom0] enabled
[    0.000835] ARCH: SUN4U
…
[ 3290.076290] BUG: Bad rss-counter state mm:00000000e37e5ba7 idx:1 val:1
[ 3290.080937] BUG: non-zero pgtables_bytes on freeing mm: 8192
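For readers who want to decode reports like the two lines above, here is a minimal sketch of a parser for the rss-counter message. The mapping from `idx` to a counter name is an assumption based on the enum order in include/linux/mm_types_task.h in 4.x kernels and should be checked against the kernel source actually in use; the function name `parse_rss_bug` is hypothetical.

```python
import re

# Assumed order of the per-mm RSS counters in 4.x kernels
# (enum in include/linux/mm_types_task.h); verify against your source tree.
RSS_COUNTER_NAMES = ["MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS", "MM_SHMEMPAGES"]

# Matches a dmesg line such as:
# [ 3290.076290] BUG: Bad rss-counter state mm:00000000e37e5ba7 idx:1 val:1
RSS_BUG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: Bad rss-counter state "
    r"mm:(?P<mm>[0-9a-f]+) idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)


def parse_rss_bug(line):
    """Return the decoded fields of one rss-counter BUG line, or None."""
    m = RSS_BUG_RE.search(line)
    if m is None:
        return None
    idx = int(m.group("idx"))
    name = RSS_COUNTER_NAMES[idx] if idx < len(RSS_COUNTER_NAMES) else "unknown"
    return {
        "uptime_s": float(m.group("ts")),  # seconds since boot
        "mm": m.group("mm"),               # mm pointer as printed by the kernel
        "counter": name,                   # which counter was non-zero
        "leaked": int(m.group("val")),     # leftover count when the mm was freed
    }
```

Under that assumption, `idx:1` in the report above would refer to the anonymous-page counter, which was still 1 when the mm was freed.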



-- 
 René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 https://exactcode.com | https://t2sde.org | https://rene.rebe.de

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
@ 2018-04-18 21:34 ` Meelis Roos
  2018-04-18 21:40 ` René Rebe
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Meelis Roos @ 2018-04-18 21:34 UTC (permalink / raw)
  To: sparclinux

> On my Ultra 30 I get some BUG printk in dmesg:
> 
> [    0.000145] PROMLIB: Sun IEEE Boot Prom 'OBP 3.9.5 1997/04/11 10:03'
> [    0.000168] PROMLIB: Root node compatible: 
> [    0.000251] Linux version 4.16.2-dist (root@somewhere.exactcode.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic 
> [    0.000828] bootconsole [earlyprom0] enabled
> [    0.000835] ARCH: SUN4U
> …
> [ 3290.076290] BUG: Bad rss-counter state mm:00000000e37e5ba7 idx:1 val:1
> [ 3290.080937] BUG: non-zero pgtables_bytes on freeing mm: 8192

Turn off Transparent Hugepages in kernel configuration - they are known 
to cause this. I turned it off on all my sparc64 machines and crashes 
like that are gone.

-- 
Meelis Roos (mroos@linux.ee)

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
  2018-04-18 21:34 ` Meelis Roos
@ 2018-04-18 21:40 ` René Rebe
  2018-04-18 21:51 ` John Paul Adrian Glaubitz
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-18 21:40 UTC (permalink / raw)
  To: sparclinux

Hi Meelis,

thank you for the quick reply!

On 18 Apr 2018, at 23:34, Meelis Roos <mroos@linux.ee> wrote:

>> On my Ultra 30 I get some BUG printk in dmesg:
>> 
>> [    0.000145] PROMLIB: Sun IEEE Boot Prom 'OBP 3.9.5 1997/04/11 10:03'
>> [    0.000168] PROMLIB: Root node compatible:
>> [    0.000251] Linux version 4.16.2-dist (root@somewhere.exactcode.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic
>> [    0.000828] bootconsole [earlyprom0] enabled
>> [    0.000835] ARCH: SUN4U
>> …
>> [ 3290.076290] BUG: Bad rss-counter state mm:00000000e37e5ba7 idx:1 val:1
>> [ 3290.080937] BUG: non-zero pgtables_bytes on freeing mm: 8192
> 
> Turn off Transparent Hugepages in kernel configuration - they are known 
> to cause this. I turned it off on all my sparc64 machines and crashes 
> like that are gone.

unfortunately the .config option is off:

# CONFIG_TRANSPARENT_HUGEPAGE is not set

:-/
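The check above can also be scripted when comparing several machines. The following is a minimal sketch that extracts the Transparent Hugepage options from the text of a kernel config file; the helper name `thp_settings` is hypothetical, and it only understands the two line forms that appear in this thread (`NAME=value` and `# NAME is not set`).

```python
def thp_settings(config_text):
    """Map each CONFIG_TRANSPARENT_HUGEPAGE* option to its value ('n' if unset)."""
    unset_suffix = " is not set"
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_TRANSPARENT_HUGEPAGE") and line.endswith(unset_suffix):
            # "# CONFIG_FOO is not set" -> option explicitly disabled
            name = line[2:-len(unset_suffix)]
            settings[name] = "n"
        elif line.startswith("CONFIG_TRANSPARENT_HUGEPAGE") and "=" in line:
            # "CONFIG_FOO=y" -> option enabled with the given value
            name, value = line.split("=", 1)
            settings[name] = value
    return settings
```

The config text can come from the build tree's `.config` or from `/boot/config-$(uname -r)` on distributions that ship it; an empty result means the file does not mention the option at all.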

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
  2018-04-18 21:34 ` Meelis Roos
  2018-04-18 21:40 ` René Rebe
@ 2018-04-18 21:51 ` John Paul Adrian Glaubitz
  2018-04-19  1:30 ` David Miller
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: John Paul Adrian Glaubitz @ 2018-04-18 21:51 UTC (permalink / raw)
  To: sparclinux

On 04/18/2018 11:34 PM, Meelis Roos wrote:
> Turn off Transparent Hugepages in kernel configuration - they are known 
> to cause this. I turned it off on all my sparc64 machines and crashes 
> like that are gone.

It's enabled by default in the Debian kernel:

root@landau:~# grep CONFIG_TRANSPARENT_HUGEPAGE /boot/config-$(uname -r)
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
root@landau:~#

We haven't received any reports regarding this issue and I also haven't
seen it on any of the machines the project is running (UltraSPARC IIIi,
SPARC T5, Sun Fire 2000).

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (2 preceding siblings ...)
  2018-04-18 21:51 ` John Paul Adrian Glaubitz
@ 2018-04-19  1:30 ` David Miller
  2018-04-19  5:47 ` René Rebe
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2018-04-19  1:30 UTC (permalink / raw)
  To: sparclinux

From: René Rebe <rene@exactcode.com>
Date: Wed, 18 Apr 2018 23:40:06 +0200

> On 18 Apr 2018, at 23:34, Meelis Roos <mroos@linux.ee> wrote:
> 
>> Turn off Transparent Hugepages in kernel configuration - they are known 
>> to cause this. I turned it off on all my sparc64 machines and crashes 
>> like that are gone.
> 
> unfortunately the .config option is off:
> 
> # CONFIG_TRANSPARENT_HUGEPAGE is not set
> 
> :-/

Indeed, I never believed this bug was related to huge pages.

Hopefully I can get back to trying to narrow this one down.

I've personally never seen this problem on any of my sun4v
machines, and perhaps that is a clue.

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (3 preceding siblings ...)
  2018-04-19  1:30 ` David Miller
@ 2018-04-19  5:47 ` René Rebe
  2018-04-19 14:10 ` John Paul Adrian Glaubitz
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-19  5:47 UTC (permalink / raw)
  To: sparclinux


On 19 Apr 2018, at 03:30, David Miller <davem@davemloft.net> wrote:

> From: René Rebe <rene@exactcode.com>
> Date: Wed, 18 Apr 2018 23:40:06 +0200
> 
>> On 18 Apr 2018, at 23:34, Meelis Roos <mroos@linux.ee> wrote:
>> 
>>> Turn off Transparent Hugepages in kernel configuration - they are known 
>>> to cause this. I turned it off on all my sparc64 machines and crashes 
>>> like that are gone.
>> 
>> unfortunately the .config option is off:
>> 
>> # CONFIG_TRANSPARENT_HUGEPAGE is not set
>> 
>> :-/
> 
> Indeed, I never believed this bug was related to huge pages.
> 
> Hopefully I can get back to trying to narrow this one down.
> 
> I've personally never seen this problem on any of my sun4v
> machines, and perhaps that is a clue.

Any chance this is non-sun4v related? This is an Ultra30 sun4u, ...

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (4 preceding siblings ...)
  2018-04-19  5:47 ` René Rebe
@ 2018-04-19 14:10 ` John Paul Adrian Glaubitz
  2018-04-19 15:41 ` David Miller
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: John Paul Adrian Glaubitz @ 2018-04-19 14:10 UTC (permalink / raw)
  To: sparclinux

On 04/19/2018 07:47 AM, René Rebe wrote:
> Any chance this is non-sun4v related? This is an Ultra30 sun4u, ...

Don't think so:

root@ravirin:~# dmesg |grep -i bug
[    3.616698] tg3 0000:00:02.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
[    3.968679] tg3 0003:00:02.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
root@ravirin:~# grep CONFIG_TRANSPARENT_HUGEPAGE /boot/config-$(uname -r)
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
root@ravirin:~# cat /proc/cpuinfo
cpu             : TI UltraSparc IIIi (Jalapeno)
fpu             : UltraSparc IIIi integrated FPU
pmu             : ultra3i
prom            : OBP 4.22.33 2007/06/18 12:45
type            : sun4u
ncpus probed    : 1
ncpus active    : 1
D$ parity tl1   : 0
I$ parity tl1   : 0
cpucaps         : flush,stbar,swap,muldiv,v9,ultra3,mul32,div32,v8plus,vis,vis2
Cpu0ClkTck      : 000000005995f5c0
MMU Type        : Cheetah+
MMU PGSZs       : 8K,64K,512K,4MB
State:
CPU0:           online
root@ravirin:~#

-- 
  .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (5 preceding siblings ...)
  2018-04-19 14:10 ` John Paul Adrian Glaubitz
@ 2018-04-19 15:41 ` David Miller
  2018-04-19 17:21 ` John Paul Adrian Glaubitz
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2018-04-19 15:41 UTC (permalink / raw)
  To: sparclinux

From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Thu, 19 Apr 2018 16:10:54 +0200

> On 04/19/2018 07:47 AM, René Rebe wrote:
>> Any chance this is non-sun4v related? This is an Ultra30 sun4u, ...
> 
> Don't think so:
 ...
> root@ravirin:~# cat /proc/cpuinfo
> cpu             : TI UltraSparc IIIi (Jalapeno)
> fpu             : UltraSparc IIIi integrated FPU

Then it could be pre-Ultra-IIIi systems that trigger the problem.
Ultra30 is as such.


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (6 preceding siblings ...)
  2018-04-19 15:41 ` David Miller
@ 2018-04-19 17:21 ` John Paul Adrian Glaubitz
  2018-04-19 19:04 ` René Rebe
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: John Paul Adrian Glaubitz @ 2018-04-19 17:21 UTC (permalink / raw)
  To: sparclinux

On 04/19/2018 05:41 PM, David Miller wrote:
> Then it could be pre-Ultra-IIIi systems that trigger the problem.
> Ultra30 is as such.

Possible. But I haven't received any such bug report from Debian users
yet and we have people running the sparc64 port on machines as old
as the Ultra 10.

Rene, would you mind testing Debian's sparc64 port on your machine?

Just booting the installation image is probably enough provided the
bug shows immediately once the kernel is booted. See [1].

Adrian

> [1] https://cdimage.debian.org/cdimage/ports/

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (7 preceding siblings ...)
  2018-04-19 17:21 ` John Paul Adrian Glaubitz
@ 2018-04-19 19:04 ` René Rebe
  2018-04-20 13:25 ` René Rebe
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-19 19:04 UTC (permalink / raw)
  To: sparclinux

Hi,

On 19 Apr 2018, at 19:21, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:

> On 04/19/2018 05:41 PM, David Miller wrote:
>> Then it could be pre-Ultra-IIIi systems that trigger the problem.
>> Ultra30 is as such.
> 
> Possible. But I haven't received any such bug report from Debian users
> yet and we have people running the sparc64 port on machines as old
> as the Ultra 10.
> 
> Rene, would you mind testing Debian's sparc64 port on your machine?

I would; however, I realize this bug message does not reliably appear.
(I should note the kernel continued to run through the whole installation,
so at first glance it was “only” of cosmetic nature.)
My thinking was to post it in case it rings a bell or reminds someone of something.

I will keep you posted what else I find, e.g. trying to mount a floppy on
the Ultra 5 oopsed. Will double check and re-post details separately
(unless someone warns me floppy support is known to be broken right
now ;-)

On the plus side: I could start latest xorg-server w/ sunffb driver, though
someone deleted the nice Mesa driver since the last time I turned it on,
… :-/

> Just booting the installation image is probably enough provided the
> bug shows immediately once the kernel is booted. See [1].
> 
> Adrian
> 
>> [1] https://cdimage.debian.org/cdimage/ports/

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (8 preceding siblings ...)
  2018-04-19 19:04 ` René Rebe
@ 2018-04-20 13:25 ` René Rebe
  2018-04-20 13:30 ` John Paul Adrian Glaubitz
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-20 13:25 UTC (permalink / raw)
  To: sparclinux

Hi,

On 19 Apr 2018, at 21:04, René Rebe <rene@exactcode.com> wrote:

> Hi,
> 
> On 19 Apr 2018, at 19:21, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> 
>> On 04/19/2018 05:41 PM, David Miller wrote:
>>> Then it could be pre-Ultra-IIIi systems that trigger the problem.
>>> Ultra30 is as such.
>> 
>> Possible. But I haven't received any such bug report from Debian users
>> yet and we have people running the sparc64 port on machines as old
>> as the Ultra 10.
>> 
>> Rene, would you mind testing Debian's sparc64 port on your machine?
> 
> I would; however, I realize this bug message does not reliably appear.
> (I should note the kernel continued to run through the whole installation,
> so at first glance it was “only” of cosmetic nature.)
> My thinking was to post it in case it rings a bell or reminds someone of something.
> 
> I will keep you posted what else I find, e.g. trying to mount a floppy on
> the Ultra 5 oopsed. Will double check and re-post details separately
> (unless someone warns me floppy support is known to be broken right
> now ;-)
> 
> On the plus side: I could start latest xorg-server w/ sunffb driver, though
> someone deleted the nice Mesa driver since the last time I turned it on,
> … :-/


For what it’s worth, it just occurred on my Ultra 5 after being up for some minutes,
while running a larger svn up or compiling something:

[    0.000151] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[    0.000174] PROMLIB: Root node compatible: 
[    0.000258] Linux version 4.16.2-dist (root@somewhere.exactco.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic 
[    0.040628] bootconsole [btext0] enabled
[    0.040655] ARCH: SUN4U
[    0.040837] Ethernet address: 08:00:20:13:de:ad
[    0.041214] MM: PAGE_OFFSET is 0xfffff80000000000 (max_phys_bits == 40)
[    0.041227] MM: VMALLOC [0x0000000100000000 --> 0x0000060000000000]
[    0.041237] MM: VMEMMAP [0x0000060000000000 --> 0x00000c0000000000]
[    0.045556] Kernel: Using 2 locked TLB entries for main kernel image.
...
[  177.065984] internal 
[  177.065992] transceiver at 
[  177.066042] 100Mb/s, Full Duplex.
[  503.110890] eth0: Happy Meal receive FIFO overflow.
[  503.111079] eth0: Happy Meal receive FIFO overflow.
[  503.111354] eth0: Happy Meal receive FIFO overflow.
[  503.111578] eth0: Happy Meal receive FIFO overflow.
[  503.111850] eth0: Happy Meal receive FIFO overflow.
[ 1056.709749] BUG: Bad rss-counter state mm:000000000c3312a7 idx:1 val:2
[ 1056.710009] BUG: non-zero pgtables_bytes on freeing mm: 8192

The machine continues running “fine”; if there is anything to test, let me know.
Maybe later this weekend I will find the time to boot the Debian image.

But given it occurs sporadically I currently have no way to reproduce this on demand anyways.

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (9 preceding siblings ...)
  2018-04-20 13:25 ` René Rebe
@ 2018-04-20 13:30 ` John Paul Adrian Glaubitz
  2018-04-20 13:45 ` René Rebe
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: John Paul Adrian Glaubitz @ 2018-04-20 13:30 UTC (permalink / raw)
  To: sparclinux

On 04/20/2018 03:25 PM, René Rebe wrote:
> [    0.000258] Linux version 4.16.2-dist (root@somewhere.exactco.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic

It *might* also be related to the toolchain.

Your version of gcc is pretty old (Debian builds the kernel with gcc-7.3.0) and
we found many different bugs in gcc and binutils that were fixed over the time
for SPARC. Not sure which of those fixes were backported to gcc-5.

So, in case you are unable to track down the issue in the kernel, try a newer
toolchain and the latest version of binutils (2.30 + branch updates).

Adrian

-- 
  .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (10 preceding siblings ...)
  2018-04-20 13:30 ` John Paul Adrian Glaubitz
@ 2018-04-20 13:45 ` René Rebe
  2018-04-20 15:31 ` René Rebe
  2018-04-20 18:46 ` John Paul Adrian Glaubitz
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-20 13:45 UTC (permalink / raw)
  To: sparclinux

Hey John,

On 20 Apr 2018, at 15:30, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:

> On 04/20/2018 03:25 PM, René Rebe wrote:
>> [    0.000258] Linux version 4.16.2-dist (root@somewhere.exactco.de) (gcc version 5.3.0 (GCC)) #1 SMP Tue Apr 17 08:21:44 Local time zone must be set--see zic
> 
> It *might* also be related to the toolchain.
> 
> Your version of gcc is pretty old (Debian builds the kernel with gcc-7.3.0) and
> we found many different bugs in gcc and binutils that were fixed over the time
> for SPARC. Not sure which of those fixes were backported to gcc-5.

Well, remember, I only went down with binutils / gcc versions because latest
binutils/ld segfaults linking newer glibcs:
	https://twitter.com/renebln/status/984741114757439488

#2  in bfd_malloc at ../../bfd/libbfd.c:193
#3  in _bfd_elf_strtab_finalize at ../../bfd/elf-strtab.c:368
#4  in _bfd_elf_assign_file_positions_for_non_load at ../../bfd/elf.c:6318
#5  _bfd_elf_write_object_contents at ../../bfd/elf.c:6354
#6  in bfd_close at ../../bfd/…

This is how I ended up downgrading binutils, glibc and gcc until I had a user-land
that built, … (yes, I neglected t2/sparc* support the last years, sorry for that)

> So, in case you are unable to track down the issue in the kernel, try a newer
> toolchain and the latest version of binutils (2.30 + branch updates).

Speaking of “updates”, if you have a patch for that segfault, …
But today is your lucky day: as my latest attempt at financing open source work
with YouTube videos (http://youtube.com/renerebe) shows some results, I am now
rebuilding the latest t2/svn:HEAD toolchain and the Linux kernel for you (as
the binutils/ld crash happened with glibc, I only need the toolchain for
the kernel build):

# scripts/Build-Target  -cfg sparc64 -job 0-binutils
== 13:34:33 =[0]=> Building develop/binutils [2.30 9.0-svn].
# scripts/Build-Target  -cfg sparc64 -job 0-gcc 
== 13:36:00 =[0]=> Building develop/gcc [7.3.0 9.0-svn].
# scripts/Build-Target  -cfg sparc64 -job 1-linux
== 13:42:19 =[1]=> Building base/linux [4.16.3 9.0-svn].

Waiting for the kernel to finish on my more Epyc CPU and then
rsync and reboot. So maybe less than an hour or so ;-)

	René

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de


* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (11 preceding siblings ...)
  2018-04-20 13:45 ` René Rebe
@ 2018-04-20 15:31 ` René Rebe
  2018-04-20 18:46 ` John Paul Adrian Glaubitz
  13 siblings, 0 replies; 15+ messages in thread
From: René Rebe @ 2018-04-20 15:31 UTC (permalink / raw)
  To: sparclinux

Hi,

On 20 Apr 2018, at 15:44, René Rebe <rene@exactcode.com> wrote:

> Speaking of “updates”, if you have a patch for that segfault, …
> But today is your lucky day: as my latest attempt at financing open source work
> with YouTube videos (http://youtube.com/renerebe) shows some results, I am now
> rebuilding the latest t2/svn:HEAD toolchain and the Linux kernel for you (as
> the binutils/ld crash happened with glibc, I only need the toolchain for
> the kernel build):
> 
> # scripts/Build-Target  -cfg sparc64 -job 0-binutils
> == 13:34:33 =[0]=> Building develop/binutils [2.30 9.0-svn].
> # scripts/Build-Target  -cfg sparc64 -job 0-gcc 
> == 13:36:00 =[0]=> Building develop/gcc [7.3.0 9.0-svn].
> # scripts/Build-Target  -cfg sparc64 -job 1-linux
> == 13:42:19 =[1]=> Building base/linux [4.16.3 9.0-svn].
> 
> Waiting for the kernel to finish on my more Epyc CPU and then
> rsync and reboot. So maybe less than an hour or so ;-)

Did not change a thing, w/ GCC-7.3, latest Binutils 2.30, Ultra5:

[    0.000147] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[    0.000173] PROMLIB: Root node compatible: 
[    0.000260] Linux version 4.16.3-dist (root@builder.exactcode.de) (gcc version 7.3.0 (GCC)) #1 SMP Fri Apr 20 13:44:06 Local time zone must be set--see zic 
[    0.040693] bootconsole [btext0] enabled
[    0.040723] ARCH: SUN4U
[    0.040915] Ethernet address: 08:00:20:13:de:ad
[    0.041302] MM: PAGE_OFFSET is 0xfffff80000000000 (max_phys_bits == 40)
[    0.041314] MM: VMALLOC [0x0000000100000000 --> 0x0000060000000000]
[    0.041323] MM: VMEMMAP [0x0000060000000000 --> 0x00000c0000000000]
[    0.045674] Kernel: Using 2 locked TLB entries for main kernel image.
...
[   28.162995] random: crng init done
[   30.858246] BUG: Bad rss-counter state mm:000000003b665864 idx:1 val:1
[   30.858499] BUG: non-zero pgtables_bytes on freeing mm: 8192

Full disclosure:
a) this is a cross compiler, but that never mattered for me in the last decade (this is my fastest Sun @360MHz, …)
b) this system is using btrfs; however, I initially reported this from the Ultra30 w/ only Ext3
c) this system has some non-Sun PCI cards plugged in, but again, I initially reported this on the U30:

00:01.0 PCI bridge: Oracle/SUN Simba Advanced PCI Bridge (rev 13)
00:01.1 PCI bridge: Oracle/SUN Simba Advanced PCI Bridge (rev 13)
01:01.0 Bridge: Oracle/SUN EBUS (rev 01)
01:01.1 Ethernet controller: Oracle/SUN Happy Meal 10/100 Ethernet [hme] (rev 01)
01:02.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] 3D Rage Pro PCI (rev 5c)
01:03.0 IDE interface: Silicon Image, Inc. PCI0646 (rev 03)
02:01.0 SCSI storage controller: Adaptec AIC-7861 (rev 03)
02:02.0 RAID bus controller: Promise Technology, Inc. PDC20371 (FastTrak S150 TX2plus) (rev 02)
02:03.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
02:03.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
02:03.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)

As I said, the system does continue to run “just fine” (as far as I can see, so far), so for now this is cosmetic, and meant as “heads up” informal report, ...

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de

* Re: BUG: Bad rss-counter state ...
  2018-04-18 16:53 BUG: Bad rss-counter state René Rebe
                   ` (12 preceding siblings ...)
  2018-04-20 15:31 ` René Rebe
@ 2018-04-20 18:46 ` John Paul Adrian Glaubitz
  13 siblings, 0 replies; 15+ messages in thread
From: John Paul Adrian Glaubitz @ 2018-04-20 18:46 UTC (permalink / raw)
  To: sparclinux

On 04/20/2018 03:44 PM, René Rebe wrote:
> Well, remember, I only went down with binutils / gcc versions because latest
> binutils/ld segfaults linking newer glibcs:
> https://twitter.com/renebln/status/984741114757439488
> 
> #2  in bfd_malloc at ../../bfd/libbfd.c:193
> #3  in _bfd_elf_strtab_finalize at ../../bfd/elf-strtab.c:368
> #4  in _bfd_elf_assign_file_positions_for_non_load at ../../bfd/elf.c:6318
> #5  _bfd_elf_write_object_contents at ../../bfd/elf.c:6354
> #6  in bfd_close at ../../bfd/…
> 
> This is how I ended up downgrading binutils, glibc and gcc until I had a user-land
> that built, … (yes, I neglected t2/sparc* support the last years, sorry for that)

I remember there was recently an issue with glibc and 32-bit sparc userland
that the Gentoo folks saw. I don't remember what it was though. We don't
have any binutils problems on a 64-bit userland in Debian that I know of.

Usually, James Clarke and Eric Botcazou are very fast fixing those issues
in binutils or gcc. binutils had several regressions with the first
2.30 versions but all that stuff was fixed in the latest 2.30 branch
and glibc builds fine. So, if you have something reproducible for
binutils, please file it in the upstream bug report tracker and
CC them and me.

Also, we have a fast SPARC-T5 porterbox running Debian sparc64 unstable
which is used by many upstream developers (qemu, gcc, binutils, FPC
and so on) and those people constantly test their stuff on this machine.
Thus, lots of stuff was already fixed for sparc64. Though there can
always be surprises, especially in binutils. Oracle itself is focusing
on the 64-bit SPARC userland, although I don't know how busy they are
at the moment.

In general, I would recommend you joining #debian-ports on OFTC and
#gentoo-sparc on Freenode and #sparc on Freenode. There are lots of
knowledgeable people around who also usually know about the latest
regressions or previous issues.

>> So, in case you are unable to track down the issue in the kernel, try a newer
>> toolchain and the latest version of binutils (2.30 + branch updates).
> 
> Speaking of “updates”, if you have a patch for that segfault, …
> But today is your lucky day: as my latest attempt at financing open source work
> with YouTube videos (http://youtube.com/renerebe) shows some results, I am now
> rebuilding the latest t2/svn:HEAD toolchain and the Linux kernel for you (as
> the binutils/ld crash happened with glibc, I only need the toolchain for
> the kernel build):
> 
> # scripts/Build-Target  -cfg sparc64 -job 0-binutils
> == 13:34:33 =[0]=> Building develop/binutils [2.30 9.0-svn].
> # scripts/Build-Target  -cfg sparc64 -job 0-gcc 
> == 13:36:00 =[0]=> Building develop/gcc [7.3.0 9.0-svn].
> # scripts/Build-Target  -cfg sparc64 -job 1-linux
> == 13:42:19 =[1]=> Building base/linux [4.16.3 9.0-svn].
> 
> Waiting for the kernel to finish on my more Epyc CPU and then
> rsync and reboot. So maybe less than an hour or so ;-)

I just checked the latest build logs. The glibc testsuite recently
failed for 2.27 but did not fail for a previous 2.27 build:

FAIL: https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=sparc64&ver=2.27-3%2Bb1&stamp=1523840708&raw=0
PASS: https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=sparc64&ver=2.27-3&stamp=1522364505&raw=0

There was also a userland memory corruption with kernel 4.14+ which got fixed with
4.16-rc6. After upgrading the kernel, many packages built fine again but surprisingly
not glibc. I should probably poke glibc upstream after triggering another rebuild
of glibc.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
