* linux 4.10 on ast2400
@ 2017-11-07  1:12 Patrick Venture
  2017-11-07  9:39 ` Joel Stanley
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Venture @ 2017-11-07  1:12 UTC
  To: OpenBMC Maillist, Joel Stanley, Nancy Yuen

I've been testing Linux 4.10 on the ast2400, and on roughly 20% of
systems they boot but are then unable to launch applications.  The one
we see failing is agetty, but ipmid also ends up not running.  Here is
the log from what we're seeing on the quanta-q71l:

[  OK  ] Started Clear one time boot overrides.
[  OK  ] Found device /dev/ttyS4.
[  OK  ] Found device /dev/ttyVUART0.
[   42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
[  OK  ] Started Network Service.
[   42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
[  OK  ] Started Phosphor Inventory Manager.
[  OK  ] Started Phosphor Settings Daemon.
[  OK  ] Reached target Network.
         Starting Permit User Sessions...
[  OK  ] Started Lightweight SLP Server.
[  OK  ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
[  OK  ] Started Phosphor Inband IPMI.
[  OK  ] Created slice system-xyz.openbmc_project.Hwmon.slice.
[  OK  ] Started Permit User Sessions.
[  OK  ] Started Serial Getty on ttyS4.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.
[   44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
[   45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
[   49.430000] Unable to handle kernel paging request at virtual address e1a00006
[   49.430000] pgd = 85354000
[   49.430000] [e1a00006] *pgd=00000000
[   49.430000] Internal error: Oops: 1 [#1] ARM
[   49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
[   49.430000] Hardware name: ASpeed SoC
[   49.430000] task: 86e1c000 task.stack: 858f6000
[   49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
[   49.430000] LR is at 0x853ea140
[   49.430000] pc : [<801e0e90>]    lr : [<853ea140>]    psr: 80000013
[   49.430000] sp : 858f7d58  ip : 00000000  fp : 858f7d8c
[   49.430000] r10: 8081ca58  r9 : 858572fc  r8 : 858572c0
[   49.430000] r7 : 85154280  r6 : e1a00006  r5 : 853dfff8  r4 : 853dfff8
[   49.430000] r3 : 85154280  r2 : 853e0000  r1 : 85c08e9c  r0 : 00000000
[   49.430000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   49.430000] Control: 0005317f  Table: 45354000  DAC: 00000051
[   49.430000] Process (agetty) (pid: 932, stack limit = 0x858f6190)
[   49.430000] Stack: (0x858f7d58 to 0x858f8000)
[   49.430000] 7d40:                                                       801dacd8 801db374
[   49.430000] 7d60: 858f7d8c 85857318 858572c0 858f7dcc 00002000 00000000 769c3000 858571b8
[   49.430000] 7d80: 858f7dc4 858f7d90 801d5da0 801e0e08 769c3000 801d6b68 00000000 852e49a0
[   49.430000] 7da0: 858c1520 8080300c 85228e00 858c1520 86fa9040 000003a4 858f7e34 858f7dc8
[   49.430000] 7dc0: 801dc724 801d5d1c 00000000 858c1520 00000001 00000000 00000000 ffffffff
[   49.430000] 7de0: 00000000 8010f6b8 0000030d 00000400 85353000 00000800 801ef480 86e1c000
[   49.430000] 7e00: 858c1520 86e1c000 85228e00 00000000 86fa9040 1f69b357 858c1a00 858c1520
[   49.430000] 7e20: 00000000 86e1c000 858f7e4c 858f7e38 8010f47c 801dc628 858c1520 858c1a00
[   49.430000] 7e40: 858f7e84 858f7e50 801f6098 8010f44c 858f7e84 858f7e60 80239ad8 8587ff34
[   49.430000] 7e60: 8587ff00 00000001 8524f520 00000000 859d34e0 000003a4 858f7f0c 858f7e88
[   49.430000] 7e80: 80239ef0 801f5ca4 8587ff34 00000034 858f7e88 86e1c000 00000000 8010b4fc
[   49.430000] 7ea0: 858f7ec4 858f7eb0 8010b4fc 801c8f74 00000000 86c20c00 85c6f180 859d34e0
[   49.430000] 7ec0: 85228e00 85228e00 859255d8 86bbfe38 8581a000 858c1a00 858f7ef4 1f69b357
[   49.430000] 7ee0: 801f9994 85228e00 fffffff8 8081cae4 8081f914 8581a000 000003a4 000003a4
[   49.430000] 7f00: 858f7f2c 858f7f10 801f6734 80239bd4 85228e00 86e1c000 ffffe000 00000000
[   49.430000] 7f20: 858f7f74 858f7f30 801f6afc 801f66ec 80845538 8080300c 55d72618 55df14c0
[   49.430000] 7f40: 00000000 1f69b357 801f9648 55df14c0 55d72618 55df14c0 0000000b 80102644
[   49.430000] 7f60: 858f6000 00000000 858f7f8c 858f7f78 801f6e1c 801f67ac 00000000 801f960c
[   49.430000] 7f80: 858f7fa4 858f7f90 801f707c 801f6df8 00000010 55d72618 00000000 858f7fa8
[   49.430000] 7fa0: 801024a0 801f7060 00000010 55d72618 55d93710 55df14c0 55d72618 55da9ed8
[   49.430000] 7fc0: 00000010 55d72618 55df14c0 0000000b 7e99e758 55df17e0 55da8788 7e99e68c
[   49.430000] 7fe0: 54c3573c 7e99e464 54b741fc 76c37d3c 60000010 55d93710 477fd871 477fdc71
[   49.430000] [<801e0e90>] (unlink_anon_vmas) from [<801d5da0>] (free_pgtables+0x94/0xb0)
[   49.430000] [<801d5da0>] (free_pgtables) from [<801dc724>] (exit_mmap+0x10c/0x220)
[   49.430000] [<801dc724>] (exit_mmap) from [<8010f47c>] (mmput+0x40/0xc8)
[   49.430000] [<8010f47c>] (mmput) from [<801f6098>] (flush_old_exec+0x404/0x5cc)
[   49.430000] [<801f6098>] (flush_old_exec) from [<80239ef0>] (load_elf_binary+0x32c/0x1068)
[   49.430000] [<80239ef0>] (load_elf_binary) from [<801f6734>] (search_binary_handler+0x58/0xc0)
[   49.430000] [<801f6734>] (search_binary_handler) from [<801f6afc>] (do_execveat_common+0x360/0x64c)
[   49.430000] [<801f6afc>] (do_execveat_common) from [<801f6e1c>] (do_execve+0x34/0x3c)
[   49.430000] [<801f6e1c>] (do_execve) from [<801f707c>] (SyS_execve+0x2c/0x30)
[   49.430000] [<801f707c>] (SyS_execve) from [<801024a0>] (ret_fast_syscall+0x0/0x38)
[   49.430000] Code: 1a00002f e24bd028 e89daff0 e5946004 (e5967000)
[   49.880000] ---[ end trace 587620580325ca16 ]---
[  OK  ] Stopped Serial Getty on ttyS4.
[   61.100000] ftgmac100 1e660000.ethernet eth0: no vlan ids left to set
[   61.100000] ------------[ cut here ]------------
[   61.100000] WARNING: CPU: 0 PID: 936 at /build/tmp/quanta/kernel-source/net/ncsi/ncsi-manage.c:256 ncsi_start_channel_monitor+0x54/0x8c
[   61.100000] CPU: 0 PID: 936 Comm: kworker/0:4 Tainted: G      D     4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
[   61.100000] Hardware name: ASpeed SoC
[   61.100000] Workqueue: events ncsi_dev_work
[   61.100000] [<801087d4>] (unwind_backtrace) from [<80105f44>] (show_stack+0x20/0x24)
[   61.100000] [<80105f44>] (show_stack) from [<802e070c>] (dump_stack+0x20/0x28)
[   61.100000] [<802e070c>] (dump_stack) from [<80111b7c>] (__warn+0xe8/0x104)
[   61.100000] [<80111b7c>] (__warn) from [<80111cb0>] (warn_slowpath_null+0x30/0x38)
[   61.100000] [<80111cb0>] (warn_slowpath_null) from [<804c5370>] (ncsi_start_channel_monitor+0x54/0x8c)
[   61.100000] [<804c5370>] (ncsi_start_channel_monitor) from [<804c64cc>] (ncsi_configure_channel+0x4e4/0x568)
[   61.100000] [<804c64cc>] (ncsi_configure_channel) from [<804c6d18>] (ncsi_dev_work+0x3b8/0x3e8)
[   61.100000] [<804c6d18>] (ncsi_dev_work) from [<80127130>] (process_one_work+0x1ac/0x384)
[   61.100000] [<80127130>] (process_one_work) from [<801275f8>] (worker_thread+0x2b0/0x428)
[   61.100000] [<801275f8>] (worker_thread) from [<8012cde4>] (kthread+0x13c/0x154)
[   61.100000] [<8012cde4>] (kthread) from [<80102550>] (ret_from_fork+0x14/0x24)
[   61.100000] ---[ end trace 587620580325ca17 ]---
[   61.250000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
[  176.400000] random: crng init done

Any suggestions would be appreciated.

Thanks,
Patrick


* Re: linux 4.10 on ast2400
  2017-11-07  1:12 linux 4.10 on ast2400 Patrick Venture
@ 2017-11-07  9:39 ` Joel Stanley
  2017-11-07  9:56   ` Joel Stanley
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Stanley @ 2017-11-07  9:39 UTC
  To: Patrick Venture; +Cc: OpenBMC Maillist, Nancy Yuen

On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture@google.com> wrote:
> I've been doing testing with linux 4.10 on the ast2400 and on some
> percentage (20% of systems) when they boot they're not able to really
> launch applications.  The one we see failing is agetty, but ipmid also
> ends up not running.  Here is the log from what we're seeing on the
> quanta-q71l:
>
> [  OK  ] Started Clear one time boot overrides.
> [  OK  ] Found device /dev/ttyS4.
> [  OK  ] Found device /dev/ttyVUART0.
> [   42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
> [  OK  ] Started Network Service.
> [   42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
> [  OK  ] Started Phosphor Inventory Manager.
> [  OK  ] Started Phosphor Settings Daemon.
> [  OK  ] Reached target Network.
>          Starting Permit User Sessions...
> [  OK  ] Started Lightweight SLP Server.
> [  OK  ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
> [  OK  ] Started Phosphor Inband IPMI.
> [  OK  ] Created slice system-xyz.openbmc_project.Hwmon.slice.
> [  OK  ] Started Permit User Sessions.
> [  OK  ] Started Serial Getty on ttyS4.
> [  OK  ] Reached target Login Prompts.
> [  OK  ] Reached target Multi-User System.
> [   44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
> [   45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
> [   49.430000] Unable to handle kernel paging request at virtual
> address e1a00006
> [   49.430000] pgd = 85354000
> [   49.430000] [e1a00006] *pgd=00000000
> [   49.430000] Internal error: Oops: 1 [#1] ARM
> [   49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted
> 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1

This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting
your kernel sources from?

Wherever you've grabbed it from it's out of date as the line numbers
don't quite make sense.

> [   49.430000] Hardware name: ASpeed SoC
> [   49.430000] task: 86e1c000 task.stack: 858f6000
> [   49.430000] PC is at unlink_anon_vmas+0x98/0x1b0

We have seen memory corruption when running under Qemu. This is the
first time I've had a report of it happening on hardware.

 https://github.com/openbmc/qemu/issues/9
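
As a rough cross-check of the corruption theory, the words in the oops
can be decoded by hand.  The sketch below is a minimal decoder that only
understands the two ARM encodings needed here (LDR with an immediate
offset, and MOV with an unshifted register operand); it is not a general
disassembler, and the helper name decode_arm is made up for illustration.
The input words are the last two entries of the "Code:" line and the
value of r6 from the register dump.

```python
def decode_arm(word):
    """Decode one 32-bit ARM word. Handles only the two forms needed here."""
    if (word >> 28) & 0xF != 0xE:      # only the "always" condition code
        return None
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF

    # LDR with immediate offset, pre-indexed, no writeback:
    # bits 27:25 = 010, P=1, B=0, W=0, L=1
    if (word >> 25) & 0x7 == 0b010:
        p, u, b, w, l = ((word >> s) & 1 for s in (24, 23, 22, 21, 20))
        if (p, b, w, l) != (1, 0, 0, 1):
            return None
        offset = word & 0xFFF
        if offset == 0:
            return f"ldr r{rd}, [r{rn}]"
        return f"ldr r{rd}, [r{rn}, #{'' if u else '-'}{offset}]"

    # MOV Rd, Rm with no shift: bits 27:21 = 0001101, S=0, bits 11:4 = 0
    if (word >> 21) & 0x7F == 0b0001101 and (word >> 20) & 1 == 0 \
            and (word >> 4) & 0xFF == 0:
        return f"mov r{rd}, r{word & 0xF}"

    return None  # anything else is outside this sketch


for label, word in [("previous insn", 0xE5946004),
                    ("faulting insn", 0xE5967000),
                    ("value in r6", 0xE1A00006)]:
    print(f"{label}: {word:08x} -> {decode_arm(word)}")

# Address the previous instruction loaded r6 from: r4 + 4
print(hex(0x853DFFF8 + 4))
```

If that decoding is right, the faulting instruction is a load through
r6, and the value in r6 (also the fault address) is itself a valid ARM
instruction encoding rather than a plausible kernel pointer.  That would
be consistent with something having overwritten the list pointer, though
it doesn't say what did the overwriting.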

Can you share some information about how you're booting?

Are you netbooting?

Which u-boot tree are you using? Does it enable networking before
jumping to the kernel? Or trigger any other kinds of DMA?

Cheers,

Joel



* Re: linux 4.10 on ast2400
  2017-11-07  9:39 ` Joel Stanley
@ 2017-11-07  9:56   ` Joel Stanley
  2017-11-07 15:29     ` Patrick Venture
  2017-11-09 19:47     ` Patrick Venture
  0 siblings, 2 replies; 8+ messages in thread
From: Joel Stanley @ 2017-11-07  9:56 UTC
  To: Patrick Venture; +Cc: OpenBMC Maillist, Nancy Yuen

Can you reproduce with some debugging turned on? Build your kernel with:

 DEBUG_LIST
 PAGE_POISONING
 DEBUG_PAGEALLOC
 DEBUG_SLAB

Or even more. Take a look through the Kernel hacking menu in
menuconfig and enable things until the system slows down too much to
reproduce the issue :)
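
To confirm the options actually made it into the build, something like
the following can be run against the generated .config.  It is only a
sketch: the CONFIG_ prefix is the usual Kconfig spelling of the names
above, the function name missing_options is made up, and the sample text
stands in for a real .config file.

```python
WANTED = ["DEBUG_LIST", "PAGE_POISONING", "DEBUG_PAGEALLOC", "DEBUG_SLAB"]


def missing_options(config_text, wanted=WANTED):
    """Return the wanted options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.startswith("CONFIG_") and value == "y":
            enabled.add(name[len("CONFIG_"):])
    return [opt for opt in wanted if opt not in enabled]


# Illustrative input only; for a real build use open(".config").read().
sample = """\
CONFIG_DEBUG_LIST=y
# CONFIG_PAGE_POISONING is not set
CONFIG_DEBUG_PAGEALLOC=y
"""
print(missing_options(sample))
```

An empty list means all four options are enabled.  Some of them depend
on other settings (DEBUG_SLAB, for instance, only applies when the SLAB
allocator is selected), so a missing entry may just mean a dependency
isn't met.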

Does it reproduce if you disable the FTGMAC100 devices (set them to
status = "disabled" in your device tree, or disable them in the kernel
config)?


* Re: linux 4.10 on ast2400
  2017-11-07  9:56   ` Joel Stanley
@ 2017-11-07 15:29     ` Patrick Venture
  2017-11-09 19:47     ` Patrick Venture
  1 sibling, 0 replies; 8+ messages in thread
From: Patrick Venture @ 2017-11-07 15:29 UTC
  To: Joel Stanley; +Cc: OpenBMC Maillist, Nancy Yuen

On Tue, Nov 7, 2017 at 1:56 AM, Joel Stanley <joel@jms.id.au> wrote:
> On Tue, Nov 7, 2017 at 8:09 PM, Joel Stanley <joel@jms.id.au> wrote:
>> On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture@google.com> wrote:
>>> I've been doing testing with linux 4.10 on the ast2400 and on some
>>> percentage (20% of systems) when they boot they're not able to really
>>> launch applications.  The one we see failing is agetty, but ipmid also
>>> ends up not running.  Here is the log from what we're seeing on the
>>> quanta-q71l:
>>>
>>> [  OK  ] Started Clear one time boot overrides.
>>> [  OK  ] Found device /dev/ttyS4.
>>> [  OK  ] Found device /dev/ttyVUART0.
>>> [   42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
>>> [  OK  ] Started Network Service.
>>> [   42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
>>> [  OK  ] Started Phosphor Inventory Manager.
>>> [  OK  ] Started Phosphor Settings Daemon.
>>> [  OK  ] Reached target Network.
>>>          Starting Permit User Sessions...
>>> [  OK  ] Started Lightweight SLP Server.
>>> [  OK  ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
>>> [  OK  ] Started Phosphor Inband IPMI.
>>> [  OK  ] Created slice system-xyz.openbmc_project.Hwmon.slice.
>>> [  OK  ] Started Permit User Sessions.
>>> [  OK  ] Started Serial Getty on ttyS4.
>>> [  OK  ] Reached target Login Prompts.
>>> [  OK  ] Reached target Multi-User System.
>>> [   44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
>>> [   45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
>>> [   49.430000] Unable to handle kernel paging request at virtual
>>> address e1a00006
>>> [   49.430000] pgd = 85354000
>>> [   49.430000] [e1a00006] *pgd=00000000
>>> [   49.430000] Internal error: Oops: 1 [#1] ARM
>>> [   49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted
>>> 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
>>
>> This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting
>> your kernel sources from?

We're on a branch based on the dev-4.10 tree.  It only adds a few extra
drivers, etc., and those are for a different platform, so they aren't
compiled in here.  For the quanta-q71l (ast2400 defconfig) this should
be nearly identical to dev-4.10.

>>
>> Wherever you've grabbed it from it's out of date as the line numbers
>> don't quite make sense.
>>
>>> [   49.430000] Hardware name: ASpeed SoC
>>> [   49.430000] task: 86e1c000 task.stack: 858f6000
>>> [   49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
>>
>> We have seen memory corruption when running under Qemu. This is the
>> first time I've had a report of it happening on hardware.
>>
>>  https://github.com/openbmc/qemu/issues/9

Looks like you were seeing this with 4.7 kernel in qemu as well.

We're seeing it on about 20 machines and not every boot.

>>
>> Can you share some information with how you're booting?

Booting from flash chip.

>>
>> Are you netbooting?
>>
>> Which u-boot tree are you using? Does it enable networking before
>> jumping to the kernel? Or trigger any other kinds of DMA?

I'll have to check, we do have minor customization in u-boot, but I'll
check whether it does anything with DMA.

>
> Can you reproduce with some debugging turned on? Build your kernel with:
>
>  DEBUG_LIST
>  PAGE_POISONING
>  DEBUG_PAGEALLOC
>  DEBUG_SLAB

I'll give that a try.

>
> Or even more. Take a look through the Kernel hacking menu in
> menuconfig and enable things until the system slows down too much to
> reproduce the issue :)
>
> Does it reproduce if you disable the FTGMAC100 devices (set them to
> status = "disabled" in your device tree, or disable them in the kernel
> config)?

Are you suggesting this because of the ncsi crash?  Because that's
always happened for us on these systems, even with the 4.7 kernel --
which has been very stable.

Patrick


* Re: linux 4.10 on ast2400
  2017-11-07  9:56   ` Joel Stanley
  2017-11-07 15:29     ` Patrick Venture
@ 2017-11-09 19:47     ` Patrick Venture
  2017-12-18 21:57       ` Patrick Venture
  1 sibling, 1 reply; 8+ messages in thread
From: Patrick Venture @ 2017-11-09 19:47 UTC
  To: Joel Stanley; +Cc: OpenBMC Maillist, Nancy Yuen

I added these configurations and after ~10 reboots it wasn't
reproducing, but I'll keep an eye out and update over the coming days.

Thanks!



* Re: linux 4.10 on ast2400
  2017-11-09 19:47     ` Patrick Venture
@ 2017-12-18 21:57       ` Patrick Venture
  2017-12-18 23:11         ` David Duffey (dduffey)
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Venture @ 2017-12-18 21:57 UTC
  To: Joel Stanley; +Cc: OpenBMC Maillist, Nancy Yuen

I loaded 4.10 with some of the memory debugging enabled, and noticed
that free memory can differ wildly from one boot to the next.  Here are
the results from dumping /proc/meminfo immediately after boot and then
rebooting:

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           42228 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            1668 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            1876 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           27464 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           12140 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            2084 kB
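
For reference, the spread across those six boots can be summarized with
a few lines of Python.  This is just a sketch over the values pasted
above; the variable names are arbitrary.

```python
import re

# The six post-boot dumps from above, MemTotal/MemFree lines only.
DUMPS = """\
MemTotal:         115076 kB
MemFree:           42228 kB
MemTotal:         115076 kB
MemFree:            1668 kB
MemTotal:         115076 kB
MemFree:            1876 kB
MemTotal:         115076 kB
MemFree:           27464 kB
MemTotal:         115076 kB
MemFree:           12140 kB
MemTotal:         115076 kB
MemFree:            2084 kB
"""

# Collect every value per field name, in the order it appears.
fields = {}
for name, kb in re.findall(r"^(\w+):\s+(\d+) kB$", DUMPS, flags=re.M):
    fields.setdefault(name, []).append(int(kb))

total = fields["MemTotal"][0]
free = fields["MemFree"]

print(f"boots sampled: {len(free)}")
print(f"MemFree min:   {min(free)} kB ({100 * min(free) / total:.1f}% of MemTotal)")
print(f"MemFree max:   {max(free)} kB ({100 * max(free) / total:.1f}% of MemTotal)")
print(f"max/min ratio: {max(free) / min(free):.1f}")
```

MemTotal is identical on every boot, so whatever differs is how much
gets used after the kernel has sized memory, not how much it sees.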



* RE: linux 4.10 on ast2400
  2017-12-18 21:57       ` Patrick Venture
@ 2017-12-18 23:11         ` David Duffey (dduffey)
  2017-12-19  1:09           ` Patrick Venture
  0 siblings, 1 reply; 8+ messages in thread
From: David Duffey (dduffey) @ 2017-12-18 23:11 UTC
  To: Patrick Venture, Joel Stanley; +Cc: OpenBMC Maillist


/proc/zoneinfo may provide some useful hints (if it exists)

Not exactly applicable, but on x86 hosts I've seen hardware (via e820) reserve different amounts and addresses from boot to boot.  Additionally, there is some logic in the kernel to reserve those ranges at certain boundaries, so some addresses would cause more reserved memory than others.
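
For comparing /proc/zoneinfo between boots, a minimal sketch along these lines may help. It is not from this thread: the names `parse_zoneinfo` and `reserved_pages` are hypothetical, the sample text uses illustrative numbers rather than output captured from the board, and the layout assumed is the "Node N, zone NAME" header followed by "pages free", "min", "low", "high", "spanned", "present" and "managed" lines, which can differ between kernel versions.

```python
# Per-zone counters to keep; other lines (per-cpu pagesets, vm stats) are ignored.
WANTED = {"min", "low", "high", "spanned", "present", "managed"}

# Illustrative sample only (hypothetical numbers, not captured from the q71l).
SAMPLE = """\
Node 0, zone   Normal
  pages free     10557
        min      682
        low      852
        high     1022
   node_scanned  0
        spanned  32768
        present  30720
        managed  28769
"""

def parse_zoneinfo(text):
    """Return {zone_name: {counter: pages}}; assumes a single memory node."""
    zones = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Node" and "zone" in parts[:-1]:
            # Header line, e.g. "Node 0, zone   Normal"
            current = zones.setdefault(parts[parts.index("zone") + 1], {})
        elif current is not None:
            if parts[0] == "pages" and len(parts) == 3 and parts[2].isdigit():
                # "pages free N" carries the first counter on the same line
                current.setdefault(parts[1], int(parts[2]))
            elif len(parts) == 2 and parts[0] in WANTED and parts[1].isdigit():
                current.setdefault(parts[0], int(parts[1]))
    return zones

def reserved_pages(zone):
    """Pages present in the zone but not handed to the page allocator."""
    return zone["present"] - zone["managed"]

if __name__ == "__main__":
    for name, zone in parse_zoneinfo(SAMPLE).items():
        print(name, "free:", zone["free"], "reserved:", reserved_pages(zone))
```

If "present" minus "managed" stays constant across reboots while "free" swings, the variation is coming from allocations after boot rather than from differing boot-time reservations.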

-----Original Message-----
From: openbmc [mailto:openbmc-bounces+dduffey=cisco.com@lists.ozlabs.org] On Behalf Of Patrick Venture
Sent: Monday, December 18, 2017 3:58 PM
To: Joel Stanley <joel@jms.id.au>
Cc: OpenBMC Maillist <openbmc@lists.ozlabs.org>
Subject: Re: linux 4.10 on ast2400

I loaded 4.10 with some memory debugging enabled, but I noticed that free memory can differ wildly from one reboot to the next.  So, here are the results from just dumping /proc/meminfo immediately after boot and then rebooting.

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           42228 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            1668 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            1876 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           27464 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:           12140 kB

root@quanta-q71l:~# cat /proc/meminfo
MemTotal:         115076 kB
MemFree:            2084 kB
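
The spread across these six boots can be summarised with a short script. This is only a sketch under stated assumptions: the helper names `parse_meminfo` and `summarise` are hypothetical, and each sample is assumed to be /proc/meminfo text in the usual "Name:   value kB" layout. The MemFree and MemTotal values are the ones listed above.

```python
import re

def parse_meminfo(text):
    """Return {field: kB} for every 'Name:   <n> kB' line in a meminfo dump."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def summarise(samples):
    """Report the lowest and highest MemFree seen and the gap between them."""
    free = [sample["MemFree"] for sample in samples]
    total = samples[0]["MemTotal"]
    return {
        "min_kB": min(free),
        "max_kB": max(free),
        "spread_kB": max(free) - min(free),
        "min_pct_of_total": round(100.0 * min(free) / total, 1),
        "max_pct_of_total": round(100.0 * max(free) / total, 1),
    }

# MemFree after each of the six reboots reported above, in kB.
MEMFREE_KB = [42228, 1668, 1876, 27464, 12140, 2084]
BOOT_SAMPLES = [
    parse_meminfo(f"MemTotal:         115076 kB\nMemFree: {free:>15} kB\n")
    for free in MEMFREE_KB
]

if __name__ == "__main__":
    for key, value in summarise(BOOT_SAMPLES).items():
        print(f"{key}: {value}")
```

Running the same summary over the rest of the meminfo fields (for example Slab or Cached, if present in the full dump) would be one way to see where the missing memory is being accounted.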

On Thu, Nov 9, 2017 at 11:47 AM, Patrick Venture <venture@google.com> wrote:
> I added these configurations and after ~10 reboots it wasn't 
> reproducing, but I'll keep an eye out and update over the coming days.
>
> Thanks!
>
> On Tue, Nov 7, 2017 at 1:56 AM, Joel Stanley <joel@jms.id.au> wrote:
>> On Tue, Nov 7, 2017 at 8:09 PM, Joel Stanley <joel@jms.id.au> wrote:
>>> On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture@google.com> wrote:
>>>> I've been doing testing with linux 4.10 on the ast2400 and on some 
>>>> percentage (20% of systems) when they boot they're not able to 
>>>> really launch applications.  The one we see failing is agetty, but 
>>>> ipmid also ends up not running.  Here is the log from what we're 
>>>> seeing on the
>>>> quanta-q71l:
>>>>
>>>> [  OK  ] Started Clear one time boot overrides.
>>>> [  OK  ] Found device /dev/ttyS4.
>>>> [  OK  ] Found device /dev/ttyVUART0.
>>>> [   42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
>>>> [  OK  ] Started Network Service.
>>>> [   42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
>>>> [  OK  ] Started Phosphor Inventory Manager.
>>>> [  OK  ] Started Phosphor Settings Daemon.
>>>> [  OK  ] Reached target Network.
>>>>          Starting Permit User Sessions...
>>>> [  OK  ] Started Lightweight SLP Server.
>>>> [  OK  ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
>>>> [  OK  ] Started Phosphor Inband IPMI.
>>>> [  OK  ] Created slice system-xyz.openbmc_project.Hwmon.slice.
>>>> [  OK  ] Started Permit User Sessions.
>>>> [  OK  ] Started Serial Getty on ttyS4.
>>>> [  OK  ] Reached target Login Prompts.
>>>> [  OK  ] Reached target Multi-User System.
>>>> [   44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
>>>> [   45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
>>>> [   49.430000] Unable to handle kernel paging request at virtual
>>>> address e1a00006
>>>> [   49.430000] pgd = 85354000
>>>> [   49.430000] [e1a00006] *pgd=00000000
>>>> [   49.430000] Internal error: Oops: 1 [#1] ARM
>>>> [   49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted
>>>> 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
>>>
>>> This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting 
>>> your kernel sources from?
>>>
>>> Wherever you've grabbed it from it's out of date as the line numbers 
>>> don't quite make sense.
>>>
>>>> [   49.430000] Hardware name: ASpeed SoC
>>>> [   49.430000] task: 86e1c000 task.stack: 858f6000
>>>> [   49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
>>>
>>> We have seen memory corruption when running under Qemu. This is the 
>>> first time I've had a report of it happening on hardware.
>>>
>>>  https://github.com/openbmc/qemu/issues/9
>>>
>>> Can you share some information with how you're booting?
>>>
>>> Are you netbooting?
>>>
>>> Which u-boot tree are you using? Does it enable networking before 
>>> jumping to the kernel? Or trigger any other kinds of DMA?
>>
>> Can you reproduce with some debugging turned on? Build your kernel with:
>>
>>  DEBUG_LIST
>>  PAGE_POISONING
>>  DEBUG_PAGEALLOC
>>  DEBUG_SLAB
>>
>> Or even more. Take a look through the Kernel hacking menu in 
>> menuconfig and enable things until the system slows down too much to 
>> reproduce the issue :)
>>
>> Does it reproduce if you disable the FTGMAC100 devices (set them to 
>> status = "disabled" in your device tree, or disable them in the 
>> kernel config)?

* Re: linux 4.10 on ast2400
  2017-12-18 23:11         ` David Duffey (dduffey)
@ 2017-12-19  1:09           ` Patrick Venture
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick Venture @ 2017-12-19  1:09 UTC (permalink / raw)
  To: David Duffey (dduffey); +Cc: Joel Stanley, OpenBMC Maillist

I'll check it out -- I don't imagine the kernel is deliberately
reserving such varying amounts of memory -- I believe some driver is
corrupting memory.  With the 4.7 kernel running, there's a massive
memory leak, but the system always starts with approximately the same
amount of free memory.

Patrick

On Mon, Dec 18, 2017 at 3:11 PM, David Duffey (dduffey)
<dduffey@cisco.com> wrote:
>
> /proc/zoneinfo may provide some useful hints (if it exists)
>
> Not exactly applicable, but on x86 hosts I've seen hardware (via e820) reserve different amounts and addresses from boot to boot.  Additionally, there is some logic in the kernel to reserve those ranges at certain boundaries, so some addresses would cause more reserved memory than others.
>
> -----Original Message-----
> From: openbmc [mailto:openbmc-bounces+dduffey=cisco.com@lists.ozlabs.org] On Behalf Of Patrick Venture
> Sent: Monday, December 18, 2017 3:58 PM
> To: Joel Stanley <joel@jms.id.au>
> Cc: OpenBMC Maillist <openbmc@lists.ozlabs.org>
> Subject: Re: linux 4.10 on ast2400
>
> I loaded 4.10 with some memory debugging enabled, but I noticed that free memory can differ wildly from one reboot to the next.  So, here are the results from just dumping /proc/meminfo immediately after boot and then rebooting.
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:           42228 kB
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:            1668 kB
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:            1876 kB
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:           27464 kB
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:           12140 kB
>
> root@quanta-q71l:~# cat /proc/meminfo
> MemTotal:         115076 kB
> MemFree:            2084 kB
>
> On Thu, Nov 9, 2017 at 11:47 AM, Patrick Venture <venture@google.com> wrote:
>> I added these configurations and after ~10 reboots it wasn't
>> reproducing, but I'll keep an eye out and update over the coming days.
>>
>> Thanks!
>>
>> On Tue, Nov 7, 2017 at 1:56 AM, Joel Stanley <joel@jms.id.au> wrote:
>>> On Tue, Nov 7, 2017 at 8:09 PM, Joel Stanley <joel@jms.id.au> wrote:
>>>> On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture@google.com> wrote:
>>>>> I've been doing testing with linux 4.10 on the ast2400 and on some
>>>>> percentage (20% of systems) when they boot they're not able to
>>>>> really launch applications.  The one we see failing is agetty, but
>>>>> ipmid also ends up not running.  Here is the log from what we're
>>>>> seeing on the
>>>>> quanta-q71l:
>>>>>
>>>>> [  OK  ] Started Clear one time boot overrides.
>>>>> [  OK  ] Found device /dev/ttyS4.
>>>>> [  OK  ] Found device /dev/ttyVUART0.
>>>>> [   42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
>>>>> [  OK  ] Started Network Service.
>>>>> [   42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
>>>>> [  OK  ] Started Phosphor Inventory Manager.
>>>>> [  OK  ] Started Phosphor Settings Daemon.
>>>>> [  OK  ] Reached target Network.
>>>>>          Starting Permit User Sessions...
>>>>> [  OK  ] Started Lightweight SLP Server.
>>>>> [  OK  ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
>>>>> [  OK  ] Started Phosphor Inband IPMI.
>>>>> [  OK  ] Created slice system-xyz.openbmc_project.Hwmon.slice.
>>>>> [  OK  ] Started Permit User Sessions.
>>>>> [  OK  ] Started Serial Getty on ttyS4.
>>>>> [  OK  ] Reached target Login Prompts.
>>>>> [  OK  ] Reached target Multi-User System.
>>>>> [   44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
>>>>> [   45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
>>>>> [   49.430000] Unable to handle kernel paging request at virtual
>>>>> address e1a00006
>>>>> [   49.430000] pgd = 85354000
>>>>> [   49.430000] [e1a00006] *pgd=00000000
>>>>> [   49.430000] Internal error: Oops: 1 [#1] ARM
>>>>> [   49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted
>>>>> 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
>>>>
>>>> This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting
>>>> your kernel sources from?
>>>>
>>>> Wherever you've grabbed it from it's out of date as the line numbers
>>>> don't quite make sense.
>>>>
>>>>> [   49.430000] Hardware name: ASpeed SoC
>>>>> [   49.430000] task: 86e1c000 task.stack: 858f6000
>>>>> [   49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
>>>>
>>>> We have seen memory corruption when running under Qemu. This is the
>>>> first time I've had a report of it happening on hardware.
>>>>
>>>>  https://github.com/openbmc/qemu/issues/9
>>>>
>>>> Can you share some information with how you're booting?
>>>>
>>>> Are you netbooting?
>>>>
>>>> Which u-boot tree are you using? Does it enable networking before
>>>> jumping to the kernel? Or trigger any other kinds of DMA?
>>>
>>> Can you reproduce with some debugging turned on? Build your kernel with:
>>>
>>>  DEBUG_LIST
>>>  PAGE_POISONING
>>>  DEBUG_PAGEALLOC
>>>  DEBUG_SLAB
>>>
>>> Or even more. Take a look through the Kernel hacking menu in
>>> menuconfig and enable things until the system slows down too much to
>>> reproduce the issue :)
>>>
>>> Does it reproduce if you disable the FTGMAC100 devices (set them to
>>> status = "disabled" in your device tree, or disable them in the
>>> kernel config)?

end of thread, other threads:[~2017-12-19  1:09 UTC | newest]

Thread overview: 8+ messages
2017-11-07  1:12 linux 4.10 on ast2400 Patrick Venture
2017-11-07  9:39 ` Joel Stanley
2017-11-07  9:56   ` Joel Stanley
2017-11-07 15:29     ` Patrick Venture
2017-11-09 19:47     ` Patrick Venture
2017-12-18 21:57       ` Patrick Venture
2017-12-18 23:11         ` David Duffey (dduffey)
2017-12-19  1:09           ` Patrick Venture
