* arm: xl vcpu-pin leads to oom-killer slashing processes
@ 2018-12-05 10:26 Andrii Anisov
2018-12-05 10:28 ` Andrii Anisov
2018-12-05 10:49 ` Julien Grall
0 siblings, 2 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:26 UTC (permalink / raw)
To: xen-devel
Hello,
On the current
6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
and
7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke) pci: apply workaround for Intel errata HSE43 and BDF2/BDX2
`xl vcpu-pin` makes the oom-killer go mad and start killing processes:
Poky (Yocto Project Reference Distro) 2.4.2 salvator-x hvc0
salvator-x login: root
Last login: Tue Sep 18 14:40:04 UTC 2018 on tty2
root@salvator-x:~# xl vcpu-list
Name ID VCPU CPU State Time(s) Affinity (Hard / Soft)
Domain-0 0 0 0 -b- 4.9 all / all
Domain-0 0 1 1 r-- 4.5 all / all
root@salvator-x:~# xl vcpu-pin Domain-0 all 0-1
root@salvator-x:~# xl vcpu-list
Name ID VCPU CPU State Time(s) Affinity (Hard / Soft)
Domain-0 0 0 0 r-- 5.0 0-1 / all
Domain-0 0 1 1 r-- 6.1 0-1 / all
root@salvator-x:~# [ 38.041306] systemd invoked oom-killer: gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
[ 38.052531] systemd cpuset=/ mems_allowed=0
[ 38.056791] CPU: 1 PID: 1 Comm: systemd Tainted: G O 4.14.35-yocto-standard #1
[ 38.065112] Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
[ 38.073579] Call trace:
[ 38.076095] [<ffff000008089a50>] dump_backtrace+0x0/0x3c8
[ 38.081530] [<ffff000008089e2c>] show_stack+0x14/0x20
[ 38.086631] [<ffff000008ac8640>] dump_stack+0x9c/0xbc
[ 38.091729] [<ffff00000819b048>] dump_header+0x90/0x1e8
[ 38.096996] [<ffff00000819a53c>] oom_kill_process+0x26c/0x568
[ 38.102784] [<ffff00000819ac20>] out_of_memory+0x198/0x4c0
[ 38.108317] [<ffff0000081a07d4>] __alloc_pages_nodemask+0xb5c/0xbf0
[ 38.114622] [<ffff0000081f2a24>] alloc_pages_current+0x7c/0xe8
[ 38.120494] [<ffff0000081fc354>] new_slab+0x404/0x548
[ 38.125590] [<ffff0000081fe6f8>] ___slab_alloc+0x480/0x5f0
[ 38.131121] [<ffff0000081fe88c>] __slab_alloc.isra.23+0x24/0x38
[ 38.137082] [<ffff0000081ff0bc>] kmem_cache_alloc+0x19c/0x1e0
[ 38.142875] [<ffff0000083278ec>] nfs_readhdr_alloc+0x1c/0x30
[ 38.148574] [<ffff00000832667c>] nfs_generic_pg_pgios+0x1c/0xc8
[ 38.154534] [<ffff000008326084>] nfs_pageio_doio+0x34/0x70
[ 38.160065] [<ffff0000083275d0>] nfs_pageio_complete+0x50/0xc8
[ 38.165939] [<ffff0000083286f8>] nfs_readpages+0xd8/0x1b8
[ 38.171385] [<ffff0000081a60a0>] __do_page_cache_readahead+0x180/0x268
[ 38.177955] [<ffff000008197b90>] filemap_fault+0x2c0/0x600
[ 38.183481] [<ffff0000081c8ba8>] __do_fault+0x20/0x78
[ 38.188577] [<ffff0000081ce8a4>] __handle_mm_fault+0xafc/0x1050
[ 38.194538] [<ffff0000081cef24>] handle_mm_fault+0x12c/0x1d8
[ 38.200243] [<ffff00000809d120>] do_page_fault+0x1a8/0x3d0
[ 38.205770] [<ffff00000809d384>] do_translation_fault+0x3c/0x48
[ 38.211732] [<ffff000008081310>] do_mem_abort+0x40/0xa0
[ 38.217002] [<ffff0000080813f8>] do_el0_ia_bp_hardening+0x38/0x98
[ 38.223137] Exception stack(0xffff000008013ec0 to 0xffff000008014000)
[ 38.229618] 3ec0: 0000ffffc49a4860 0000ffffc49a4bb8 0000ffffc49a4bf7 0000000000000000
[ 38.237484] 3ee0: 0000000000000000 0000ffffc49a49a0 000000000000001f 0000ffffaa8cf2a8
[ 38.245342] 3f00: 0000ffffc49a4a60 ffffff80ffffffe8 0000ffffc49a4a80 0000ffffc49a4a80
[ 38.253204] 3f20: 0000000000000018 000000005ba10e62 003ae0c145b56d95 000018ebb6886232
[ 38.261066] 3f40: 0000ffffaa99aae0 0000ffffaa63ad10 0000ffffc49a47fe 0000ffffc49a4860
[ 38.268928] 3f60: 0000ffffc49a4bb8 0000ffffc49a4bf7 0000ffffc49a4bb8 0000ffffaa8d9bc8
[ 38.276791] 3f80: 0000000000000001 0000ffffc49a4bb8 0000ffffaa914f00 0000ffffc49a4bb8
[ 38.284653] 3fa0: 00000000000f4240 0000ffffc49a47d0 0000ffffaa5cfad0 0000ffffc49a47d0
[ 38.292516] 3fc0: 0000ffffaa5cdf50 0000000080000000 0000aaaafb9d1370 00000000ffffffff
[ 38.300378] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 38.308241] [<ffff0000080834d4>] el0_ia+0x18/0x1c
[ 38.313028] Mem-Info:
[ 38.315331] active_anon:3647 inactive_anon:5174 isolated_anon:0
[ 38.315331] active_file:4 inactive_file:3 isolated_file:0
[ 38.315331] unevictable:3 dirty:0 writeback:0 unstable:0
[ 38.315331] slab_reclaimable:1527 slab_unreclaimable:4757
[ 38.315331] mapped:3350 shmem:5255 pagetables:204 bounce:0
[ 38.315331] free:102395 free_pcp:171 free_cma:90122
[ 38.348452] Node 0 active_anon:14588kB inactive_anon:20696kB active_file:0kB inactive_file:4kB unevictable:12kB isolated(anon):0kB isolated(file):0kB mapped:13344kB dirty:0kB writeback:0kB shmem:21020kB shmems
[ 38.375385] Node 0 DMA free:388912kB min:24368kB low:30460kB high:36552kB active_anon:72kB inactive_anon:0kB active_file:0kB inactive_file:196kB unevictable:0kB writepending:0kB present:1703936kB managed:4267B
[ 38.403038] lowmem_reserve[]: 0 1342 1342
[ 38.407091] Node 0 Normal free:20800kB min:20684kB low:25852kB high:31020kB active_anon:14516kB inactive_anon:20696kB active_file:0kB inactive_file:0kB unevictable:12kB writepending:0kB present:1441792kB manaB
[ 38.435343] lowmem_reserve[]: 0 0 0
[ 38.438885] Node 0 DMA: 38*4kB (MC) 36*8kB (UMC) 12*16kB (UMC) 4*32kB (UE) 14*64kB (UMEC) 9*128kB (UMEC) 3*256kB (ME) 5*512kB (UMEC) 6*1024kB (UME) 4*2048kB (MEC) 90*4096kB (MC) = 389112kB
[ 38.455651] Node 0 Normal: 804*4kB (UMH) 413*8kB (UMH) 263*16kB (UMH) 139*32kB (UMH) 72*64kB (UMH) 9*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20936kB
[ 38.470339] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 38.478803] 5266 total pagecache pages
[ 38.482604] 0 pages in swap cache
[ 38.485974] Swap cache stats: add 0, delete 0, find 0/0
[ 38.491244] Free swap = 0kB
[ 38.494181] Total swap = 0kB
[ 38.497119] 786432 pages RAM
[ 38.500047] 0 pages HighMem/MovableOnly
[ 38.503945] 649566 pages reserved
[ 38.507319] 98304 pages cma reserved
[ 38.510943] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 38.519552] [ 1884] 0 1884 3375 405 9 4 0 0 systemd-journal
[ 38.529011] [ 2475] 0 2475 3262 255 8 3 0 -1000 systemd-udevd
[ 38.538336] [ 3098] 997 3098 1523 76 7 3 0 0 systemd-network
[ 38.547844] [ 3099] 999 3099 1237 89 6 3 0 0 avahi-daemon
[ 38.557085] [ 3101] 0 3101 763 18 5 3 0 0 syslogd
[ 38.565897] [ 3102] 998 3102 1115 87 6 3 0 -900 dbus-daemon
[ 38.575057] [ 3103] 999 3103 1204 68 6 3 0 0 avahi-daemon
[ 38.584303] [ 3104] 0 3104 1502 119 7 4 0 0 systemd-logind
[ 38.593719] [ 3106] 0 3106 763 18 5 3 0 0 klogd
[ 38.602360] [ 3125] 0 3125 977 80 6 3 0 0 xenstored
[ 38.611344] [ 3138] 0 3138 1757 123 7 3 0 0 weston-launch
[ 38.620692] [ 3141] 996 3141 1670 76 7 3 0 0 systemd-resolve
[ 38.630200] [ 3159] 0 3159 1802 157 7 3 0 0 systemd
[ 38.639001] [ 3161] 0 3161 579 32 5 3 0 0 agetty
[ 38.647718] [ 3165] 0 3165 2382 344 8 3 0 0 (sd-pam)
[ 38.656618] [ 3194] 0 3194 17196 48 6 3 0 0 xenconsoled
[ 38.665777] [ 3203] 0 3203 17196 57 7 3 0 0 xenconsoled
[ 38.674936] [ 3206] 0 3206 36895 3598 42 3 0 0 weston
[ 38.683661] [ 3229] 0 3229 680 43 5 3 0 0 xenwatchdogd
[ 38.692906] [ 3230] 0 3230 1403 125 7 3 0 0 login
[ 38.701563] [ 3232] 0 3232 2612 431 9 3 0 0 weston-keyboard
[ 38.711075] [ 3234] 0 3234 5359 3150 14 3 0 0 weston-desktop-
[ 38.720556] [ 3237] 0 3237 913 99 5 3 0 0 sh
[ 38.728934] Out of memory: Kill process 3206 (weston) score 25 or sacrifice child
[ 38.736484] Killed process 3234 (weston-desktop-) total-vm:21436kB, anon-rss:1232kB, file-rss:8kB, shmem-rss:11360kB
[ 45.856773] systemd invoked oom-killer: gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
[ 45.867983] systemd cpuset=/ mems_allowed=0
--
Sincerely,
Andrii Anisov.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 10:26 arm: xl vcpu-pin leads to oom-killer slashing processes Andrii Anisov
@ 2018-12-05 10:28 ` Andrii Anisov
2018-12-05 10:49 ` Julien Grall
1 sibling, 0 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:28 UTC (permalink / raw)
To: xen-devel
It happens with both the credit and credit2 schedulers, and with both the old and the new vGIC.
On 05.12.18 12:26, Andrii Anisov wrote:
> Hello,
>
> On the current
> 6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
> and
> 7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke) pci: apply workaround for Intel errata HSE43 and BDF2/BDX2
>
> `xl vcpu-pin` makes the oom-killer go mad and start killing processes:
>
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 10:26 arm: xl vcpu-pin leads to oom-killer slashing processes Andrii Anisov
2018-12-05 10:28 ` Andrii Anisov
@ 2018-12-05 10:49 ` Julien Grall
2018-12-05 10:59 ` Andrii Anisov
1 sibling, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 10:49 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
On 05/12/2018 10:26, Andrii Anisov wrote:
> Hello,
>
> On the current
> 6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
> and
> 7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke)
> pci: apply workaround for Intel errata HSE43 and BDF2/BDX2
>
> `xl vcpu-pin` makes the oom-killer go mad and start killing processes:
I am not sure I understand the relation between the two. What is the latest Xen
commit where the oom-killer does not trigger?
How much memory do you have in Dom0? Do you have any memory hungry process running?
>
> Poky (Yocto Project Reference Distro) 2.4.2 salvator-x hvc0
>
> salvator-x login: root
> Last login: Tue Sep 18 14:40:04 UTC 2018 on tty2
> root@salvator-x:~# xl vcpu-list
> Name ID VCPU CPU State Time(s) Affinity
> (Hard / Soft)
> Domain-0 0 0 0 -b- 4.9 all / all
> Domain-0 0 1 1 r-- 4.5 all / all
> root@salvator-x:~# xl vcpu-pin Domain-0 all 0-1
> root@salvator-x:~# xl vcpu-list
> Name ID VCPU CPU State Time(s) Affinity
> (Hard / Soft)
> Domain-0 0 0 0 r-- 5.0 0-1 / all
> Domain-0 0 1 1 r-- 6.1 0-1 / all
> root@salvator-x:~# [ 38.041306] systemd invoked oom-killer:
> gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0,
> oom_score_adj=0
> [ 38.052531] systemd cpuset=/ mems_allowed=0
> [ 38.056791] CPU: 1 PID: 1 Comm: systemd Tainted: G O
> 4.14.35-yocto-standard #1
> [ 38.065112] Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> [ 38.073579] Call trace:
> [ 38.076095] [<ffff000008089a50>] dump_backtrace+0x0/0x3c8
> [ 38.081530] [<ffff000008089e2c>] show_stack+0x14/0x20
> [ 38.086631] [<ffff000008ac8640>] dump_stack+0x9c/0xbc
> [ 38.091729] [<ffff00000819b048>] dump_header+0x90/0x1e8
> [ 38.096996] [<ffff00000819a53c>] oom_kill_process+0x26c/0x568
> [ 38.102784] [<ffff00000819ac20>] out_of_memory+0x198/0x4c0
> [ 38.108317] [<ffff0000081a07d4>] __alloc_pages_nodemask+0xb5c/0xbf0
> [ 38.114622] [<ffff0000081f2a24>] alloc_pages_current+0x7c/0xe8
> [ 38.120494] [<ffff0000081fc354>] new_slab+0x404/0x548
> [ 38.125590] [<ffff0000081fe6f8>] ___slab_alloc+0x480/0x5f0
> [ 38.131121] [<ffff0000081fe88c>] __slab_alloc.isra.23+0x24/0x38
> [ 38.137082] [<ffff0000081ff0bc>] kmem_cache_alloc+0x19c/0x1e0
> [ 38.142875] [<ffff0000083278ec>] nfs_readhdr_alloc+0x1c/0x30
> [ 38.148574] [<ffff00000832667c>] nfs_generic_pg_pgios+0x1c/0xc8
> [ 38.154534] [<ffff000008326084>] nfs_pageio_doio+0x34/0x70
> [ 38.160065] [<ffff0000083275d0>] nfs_pageio_complete+0x50/0xc8
> [ 38.165939] [<ffff0000083286f8>] nfs_readpages+0xd8/0x1b8
> [ 38.171385] [<ffff0000081a60a0>] __do_page_cache_readahead+0x180/0x268
> [ 38.177955] [<ffff000008197b90>] filemap_fault+0x2c0/0x600
> [ 38.183481] [<ffff0000081c8ba8>] __do_fault+0x20/0x78
> [ 38.188577] [<ffff0000081ce8a4>] __handle_mm_fault+0xafc/0x1050
> [ 38.194538] [<ffff0000081cef24>] handle_mm_fault+0x12c/0x1d8
> [ 38.200243] [<ffff00000809d120>] do_page_fault+0x1a8/0x3d0
> [ 38.205770] [<ffff00000809d384>] do_translation_fault+0x3c/0x48
> [ 38.211732] [<ffff000008081310>] do_mem_abort+0x40/0xa0
> [ 38.217002] [<ffff0000080813f8>] do_el0_ia_bp_hardening+0x38/0x98
> [ 38.223137] Exception stack(0xffff000008013ec0 to 0xffff000008014000)
> [ 38.229618] 3ec0: 0000ffffc49a4860 0000ffffc49a4bb8 0000ffffc49a4bf7
> 0000000000000000
> [ 38.237484] 3ee0: 0000000000000000 0000ffffc49a49a0 000000000000001f
> 0000ffffaa8cf2a8
> [ 38.245342] 3f00: 0000ffffc49a4a60 ffffff80ffffffe8 0000ffffc49a4a80
> 0000ffffc49a4a80
> [ 38.253204] 3f20: 0000000000000018 000000005ba10e62 003ae0c145b56d95
> 000018ebb6886232
> [ 38.261066] 3f40: 0000ffffaa99aae0 0000ffffaa63ad10 0000ffffc49a47fe
> 0000ffffc49a4860
> [ 38.268928] 3f60: 0000ffffc49a4bb8 0000ffffc49a4bf7 0000ffffc49a4bb8
> 0000ffffaa8d9bc8
> [ 38.276791] 3f80: 0000000000000001 0000ffffc49a4bb8 0000ffffaa914f00
> 0000ffffc49a4bb8
> [ 38.284653] 3fa0: 00000000000f4240 0000ffffc49a47d0 0000ffffaa5cfad0
> 0000ffffc49a47d0
> [ 38.292516] 3fc0: 0000ffffaa5cdf50 0000000080000000 0000aaaafb9d1370
> 00000000ffffffff
> [ 38.300378] 3fe0: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [ 38.308241] [<ffff0000080834d4>] el0_ia+0x18/0x1c
> [ 38.313028] Mem-Info:
> [ 38.315331] active_anon:3647 inactive_anon:5174 isolated_anon:0
> [ 38.315331] active_file:4 inactive_file:3 isolated_file:0
> [ 38.315331] unevictable:3 dirty:0 writeback:0 unstable:0
> [ 38.315331] slab_reclaimable:1527 slab_unreclaimable:4757
> [ 38.315331] mapped:3350 shmem:5255 pagetables:204 bounce:0
> [ 38.315331] free:102395 free_pcp:171 free_cma:90122
> [ 38.348452] Node 0 active_anon:14588kB inactive_anon:20696kB active_file:0kB
> inactive_file:4kB unevictable:12kB isolated(anon):0kB isolated(file):0kB
> mapped:13344kB dirty:0kB writeback:0kB shmem:21020kB shmems
> [ 38.375385] Node 0 DMA free:388912kB min:24368kB low:30460kB high:36552kB
> active_anon:72kB inactive_anon:0kB active_file:0kB inactive_file:196kB
> unevictable:0kB writepending:0kB present:1703936kB managed:4267B
> [ 38.403038] lowmem_reserve[]: 0 1342 1342
> [ 38.407091] Node 0 Normal free:20800kB min:20684kB low:25852kB high:31020kB
> active_anon:14516kB inactive_anon:20696kB active_file:0kB inactive_file:0kB
> unevictable:12kB writepending:0kB present:1441792kB manaB
> [ 38.435343] lowmem_reserve[]: 0 0 0
> [ 38.438885] Node 0 DMA: 38*4kB (MC) 36*8kB (UMC) 12*16kB (UMC) 4*32kB (UE)
> 14*64kB (UMEC) 9*128kB (UMEC) 3*256kB (ME) 5*512kB (UMEC) 6*1024kB (UME)
> 4*2048kB (MEC) 90*4096kB (MC) = 389112kB
> [ 38.455651] Node 0 Normal: 804*4kB (UMH) 413*8kB (UMH) 263*16kB (UMH)
> 139*32kB (UMH) 72*64kB (UMH) 9*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 20936kB
> [ 38.470339] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> [ 38.478803] 5266 total pagecache pages
> [ 38.482604] 0 pages in swap cache
> [ 38.485974] Swap cache stats: add 0, delete 0, find 0/0
> [ 38.491244] Free swap = 0kB
> [ 38.494181] Total swap = 0kB
> [ 38.497119] 786432 pages RAM
> [ 38.500047] 0 pages HighMem/MovableOnly
> [ 38.503945] 649566 pages reserved
> [ 38.507319] 98304 pages cma reserved
> [ 38.510943] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents
> oom_score_adj name
> [ 38.519552] [ 1884] 0 1884 3375 405 9 4
> 0 0 systemd-journal
> [ 38.529011] [ 2475] 0 2475 3262 255 8 3
> 0 -1000 systemd-udevd
> [ 38.538336] [ 3098] 997 3098 1523 76 7 3
> 0 0 systemd-network
> [ 38.547844] [ 3099] 999 3099 1237 89 6 3
> 0 0 avahi-daemon
> [ 38.557085] [ 3101] 0 3101 763 18 5 3
> 0 0 syslogd
> [ 38.565897] [ 3102] 998 3102 1115 87 6 3
> 0 -900 dbus-daemon
> [ 38.575057] [ 3103] 999 3103 1204 68 6 3
> 0 0 avahi-daemon
> [ 38.584303] [ 3104] 0 3104 1502 119 7 4
> 0 0 systemd-logind
> [ 38.593719] [ 3106] 0 3106 763 18 5 3
> 0 0 klogd
> [ 38.602360] [ 3125] 0 3125 977 80 6 3
> 0 0 xenstored
> [ 38.611344] [ 3138] 0 3138 1757 123 7 3
> 0 0 weston-launch
> [ 38.620692] [ 3141] 996 3141 1670 76 7 3
> 0 0 systemd-resolve
> [ 38.630200] [ 3159] 0 3159 1802 157 7 3
> 0 0 systemd
> [ 38.639001] [ 3161] 0 3161 579 32 5 3
> 0 0 agetty
> [ 38.647718] [ 3165] 0 3165 2382 344 8 3
> 0 0 (sd-pam)
> [ 38.656618] [ 3194] 0 3194 17196 48 6 3
> 0 0 xenconsoled
> [ 38.665777] [ 3203] 0 3203 17196 57 7 3
> 0 0 xenconsoled
> [ 38.674936] [ 3206] 0 3206 36895 3598 42 3
> 0 0 weston
> [ 38.683661] [ 3229] 0 3229 680 43 5 3
> 0 0 xenwatchdogd
> [ 38.692906] [ 3230] 0 3230 1403 125 7 3
> 0 0 login
> [ 38.701563] [ 3232] 0 3232 2612 431 9 3
> 0 0 weston-keyboard
> [ 38.711075] [ 3234] 0 3234 5359 3150 14 3
> 0 0 weston-desktop-
> [ 38.720556] [ 3237] 0 3237 913 99 5 3
> 0 0 sh
> [ 38.728934] Out of memory: Kill process 3206 (weston) score 25 or sacrifice
> child
> [ 38.736484] Killed process 3234 (weston-desktop-) total-vm:21436kB,
> anon-rss:1232kB, file-rss:8kB, shmem-rss:11360kB
> [ 45.856773] systemd invoked oom-killer:
> gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0,
> oom_score_adj=0
> [ 45.867983] systemd cpuset=/ mems_allowed=0
>
>
--
Julien Grall
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 10:49 ` Julien Grall
@ 2018-12-05 10:59 ` Andrii Anisov
2018-12-05 11:45 ` Julien Grall
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:59 UTC (permalink / raw)
To: Julien Grall, xen-devel
Hello Julien,
On 05.12.18 12:49, Julien Grall wrote:
> I am not sure I understand the relation between the two.
I am confused as well. I just reported my observations.
> What is the latest Xen commit where the oom-killer does not trigger?
I didn't bisect it or dig into it. I'm trying to measure IRQ latency as Stefano did.
> How much memory do you have in Dom0? Do you have any memory hungry process running?
Dom0 has 3 GB of RAM. But I'm pretty sure it's not about the memory. Until I decided to pin vCPUs, I did all my routine without any issues.
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 10:59 ` Andrii Anisov
@ 2018-12-05 11:45 ` Julien Grall
2018-12-05 11:59 ` Andrii Anisov
0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 11:45 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
On 05/12/2018 10:59, Andrii Anisov wrote:
> Hello Julien,
Hi,
> On 05.12.18 12:49, Julien Grall wrote:
>> I am not sure I understand the relation between the two.
> I am confused as well. I just reported my observations.
>
>> What is the latest Xen commit where the oom-killer does not trigger?
> I didn't bisect it or dig into it. I'm trying to measure IRQ latency as
> Stefano did.
>
>> How much memory do you have in Dom0? Do you have any memory hungry process
>> running?
> Dom0 has 3 GB of RAM. But I'm pretty sure it's not about the memory. Until I
> decided to pin vCPUs, I did all my routine without any issues.
Well, at least the kernel thinks it does not have any more memory (see the call
trace).
What do you mean by all your routine? How much work did you do on the platform
before triggering the oom-killer?
Looking at your log, you don't seem to have swap. With all your routine, how
often are you close to the maximum memory?
Cheers,
--
Julien Grall
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 11:45 ` Julien Grall
@ 2018-12-05 11:59 ` Andrii Anisov
2018-12-05 12:15 ` Julien Grall
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 11:59 UTC (permalink / raw)
To: Julien Grall, xen-devel
On 05.12.18 13:45, Julien Grall wrote:
> Well, at least the kernel thinks it does not have anymore memory (see the call trace).
Yes, it thinks so. But it is not linked to domain .
> What do you mean by all your routine?
I mean all the things I'm playing with now: running the TBM bare-metal app in different use cases, bringing it up with the new vGIC.
> How much work did you do on the platform before triggering oem-killer?
Without CPU pinning I do all I need without the oom-killer being triggered.
After CPU pinning it takes 5-10 seconds until the oom-killer starts to kill everything.
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 11:59 ` Andrii Anisov
@ 2018-12-05 12:15 ` Julien Grall
2018-12-05 12:40 ` Andrii Anisov
0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 12:15 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
On 05/12/2018 11:59, Andrii Anisov wrote:
>
> On 05.12.18 13:45, Julien Grall wrote:
>> Well, at least the kernel thinks it does not have anymore memory (see the call
>> trace).
> Yes, it thinks so. But it is not linked to domain .
What do you mean? A memory corruption by Xen is extremely unlikely. So it looks
to me like this is related to your domain (kernel or userspace), possibly
because memory has not been freed correctly.
>
>> What do you mean by all your routine?
> I mean all things I'm playing with now. Running tbm baremetal app in different
> use-cases, bringing it up with new vgic.
>
>> How much work did you do on the platform before triggering the oom-killer?
> Without cpu pinning I do all I need without oom-killer being triggered.
> After cpu pinning it takes 5-10 seconds until oom-killer starts to kill everything.
Below a list of questions to answer:
- Can you give the steps to reproduce it from boot?
- How much memory left do you have before calling xl vcpu-pin?
- When exactly do you pin the vCPUs? (i.e how long after boot)
- What are the other programs running? How much memory are they using?
Cheers,
--
Julien Grall
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 12:15 ` Julien Grall
@ 2018-12-05 12:40 ` Andrii Anisov
2018-12-05 13:13 ` Julien Grall
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 12:40 UTC (permalink / raw)
To: Julien Grall, xen-devel
On 05.12.18 14:15, Julien Grall wrote:
>> Yes, it thinks so. But it is not linked to domain .
>
> What do you mean?
It should be read as "But it is not linked to domain memory size".
> A memory corruption by Xen is extremely unlikely.
I believe in that.
> So it looks to me like this is related to your domain (kernel or userspace), possibly because memory has not been freed correctly.
It might be. But it happens only with, and right after, vCPU pinning.
> Below a list of questions to answer:
> - Can you give the steps to reproduce it from boot?
There is a single, trivial step: just pin vCPUs from Dom0.
> - How much memory left do you have before calling xl vcpu-pin?
Meminfo says
root@salvator-x:~# cat /proc/meminfo
MemTotal: 2995828 kB
MemFree: 2810360 kB
MemAvailable: 2758420 kB
Top says:
Mem: 185592K used, 2810236K free, 21020K shrd, 0K buff, 53000K cached
CPU: 0% usr 8% sys 0% nic 91% idle 0% io 0% irq 0% sirq
> - When exactly do you pin the vCPUs? (i.e how long after boot)
Right after login.
> - What are the other programs running? How much memory are they using?
Weston, systemd daemons, Xen daemons, other Yocto daemons. Actually, nothing special.
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 12:40 ` Andrii Anisov
@ 2018-12-05 13:13 ` Julien Grall
2018-12-05 14:46 ` Andrii Anisov
0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 13:13 UTC (permalink / raw)
To: Andrii Anisov, xen-devel
On 05/12/2018 12:40, Andrii Anisov wrote:
>
>
> On 05.12.18 14:15, Julien Grall wrote:
>>> Yes, it thinks so. But it is not linked to domain .
>>
>> What do you mean?
> It should be read as "But it is not linked to domain memory size".
So if you increase Dom0's memory, will you still see the error?
>
>> A memory corruption by Xen is extremely unlikely.
> I believe in that.
I need at least some sort of proof that Xen might corrupt the kernel. I don't
believe we could reliably corrupt the kernel memory subsystem with a value that
happens to look good enough. So maybe we should start looking at a more
plausible cause.
>
>> So it looks to me like this is related to your domain (kernel or userspace),
>> possibly because memory has not been freed correctly.
> It might be. But happens only with and right after vcpu pinning.
It does not mean this is because of Xen. It might just be because of Xen drivers
that do not free memory.
>
>> Below a list of questions to answer:
>> - Can you give the steps to reproduce it from boot?
> The step is single and trivial, just try to pin vcpus from Dom0.
I tried and can't reproduce it. But I am using 4.20-rc4 and not 4.14.35. If you
think the bug was introduced in recent Xen, then the first step is to downgrade
Xen. If it does not happen on the downgraded version, then you can bisect it.
>
>> - How much memory left do you have before calling xl vcpu-pin?
> Meminfo says
> root@salvator-x:~# cat /proc/meminfo
> MemTotal: 2995828 kB
> MemFree: 2810360 kB
> MemAvailable: 2758420 kB
How about after vCPU pinning? Do you see the free memory going down?
>
> Top says:
> Mem: 185592K used, 2810236K free, 21020K shrd, 0K buff, 53000K cached
> CPU: 0% usr 8% sys 0% nic 91% idle 0% io 0% irq 0% sirq
>
>> - When exactly do you pin the vCPUs? (i.e how long after boot)
> Right after login.
So that's reliably happening? Are you sure there is nothing else on the system
using memory? For instance, you seem to have NFS in place.
>
>> - What are the other programs running? How much memory are they using?
> Weston, systemd daemons, Xen daemons, other Yocto daemons. Actually, nothing
> special.
I would try to remove unnecessary programs, so you can narrow down the issue.
Cheers,
--
Julien Grall
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 13:13 ` Julien Grall
@ 2018-12-05 14:46 ` Andrii Anisov
2018-12-13 14:13 ` Andrii Anisov
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 14:46 UTC (permalink / raw)
To: Julien Grall, xen-devel
On 05.12.18 15:13, Julien Grall wrote:
> I need at least some sort of proof that Xen might corrupt the kernel. I don't believe we manage to just corrupt the kernel memory subsystem with good enough value reliably. So maybe we should start looking at more plausible cause.
I think I would be able to look deeper into it by the end of the week.
> It does not mean this is because of Xen. It might just be because of Xen drivers that do not free memory.
Totally agree. I have not dug into it yet. I just sent a notification, so other interested parties can check/verify the issue on different setups.
> I tried and can't reproduce it.
Great. So it might be a problem with my setup.
> But I am using 4.20-rc4 and not 4.14.35. If you think the bug was introduced in recent Xen, then the first step is to downgrade Xen. If it does not happen on the downgraded version, then you can bisect it.
It definitely does not reproduce with the Xen 4.10 release. But that is pretty old.
> How about after vCPU pining? Do you see the memory free going down?
I've checked and seen a strange thing: MemTotal has shrunk.
Meminfo before vcpu pin:
MemTotal: 2995828 kB
MemFree: 2810360 kB
MemAvailable: 2758420 kB
Buffers: 0 kB
Cached: 53092 kB
SwapCached: 0 kB
Active: 26716 kB
Inactive: 40980 kB
Active(anon): 14976 kB
Inactive(anon): 20648 kB
Active(file): 11740 kB
Inactive(file): 20332 kB
Unevictable: 12 kB
Mlocked: 12 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 14644 kB
Mapped: 31632 kB
Shmem: 21020 kB
Slab: 29204 kB
SReclaimable: 9924 kB
SUnreclaim: 19280 kB
KernelStack: 2848 kB
PageTables: 828 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1497912 kB
Committed_AS: 58556 kB
VmallocTotal: 135290290112 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 393216 kB
CmaFree: 360488 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Meminfo right before the oom-killer triggered:
root@salvator-x:~# cat /proc/meminfo
MemTotal: 549000 kB
MemFree: 412108 kB
MemAvailable: 347472 kB
Buffers: 0 kB
Cached: 19356 kB
SwapCached: 0 kB
Active: 15920 kB
Inactive: 17512 kB
Active(anon): 14516 kB
Inactive(anon): 9104 kB
Active(file): 1404 kB
Inactive(file): 8408 kB
Unevictable: 12 kB
Mlocked: 12 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 14160 kB
Mapped: 9016 kB
Shmem: 9496 kB
Slab: 26412 kB
SReclaimable: 6792 kB
SUnreclaim: 19620 kB
KernelStack: 2880 kB
PageTables: 776 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 274500 kB
Committed_AS: 49264 kB
VmallocTotal: 135290290112 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 393216 kB
CmaFree: 360488 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
> So that's reliably happening?
It is 100% reproducible on my setup.
> Are you sure there is nothing else on the system using memory? For instance, you seem to have NFS in place.
Yes, Dom0 root is nfs.
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-05 14:46 ` Andrii Anisov
@ 2018-12-13 14:13 ` Andrii Anisov
2018-12-13 16:37 ` Juergen Gross
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-13 14:13 UTC (permalink / raw)
To: xen-devel; +Cc: Julien Grall, Stefano Stabellini
Hello All,
OK, I've discovered the mechanism of the issue.
It is because of `d->max_pages = ~0U;` in `construct_dom0()`.
When I do vcpu-pin, libxl updates the memory nodes in xenstore for Dom0. The kernel's xenstore watch then sees those changes and tries to set a new target for the balloon, but the target becomes extremely high, and the balloon sucks up all the pages.
In my kernel (4.14) in `watch_target()` function there is a code:
target_diff = xen_pv_domain() ? 0
: static_max - balloon_stats.target_pages;
Here `xen_pv_domain()` is zero, so `target_diff` is big. Then, a few lines below:
balloon_set_new_target(new_target - target_diff);
`balloon_set_new_target()` receives a value that has wrapped around 64 bits, which kills the system.
Now I'm looking for an appropriate kernel patch to fix that. Any suggestions?
--
Sincerely,
Andrii Anisov.
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-13 14:13 ` Andrii Anisov
@ 2018-12-13 16:37 ` Juergen Gross
2018-12-14 10:36 ` Andrii Anisov
0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2018-12-13 16:37 UTC (permalink / raw)
To: Andrii Anisov, xen-devel; +Cc: Julien Grall, Stefano Stabellini
On 12/13/18 3:13 PM, Andrii Anisov wrote:
> Hello All,
>
> OK, I've discovered a mechanism of the issue.
> It is because of `d->max_pages = ~0U;` in `construct_dom0()`.
> When I do vcpu-pin, libxl updates memory nodes in xenstore for Dom0.
> The kernel's xenstore watch then sees those changes and tries to set a new
> target for the balloon, but the target becomes extremely high, and the
> balloon sucks up all the pages.
>
> In my kernel (4.14) in `watch_target()` function there is a code:
>
> target_diff = xen_pv_domain() ? 0
> : static_max - balloon_stats.target_pages;
>
> Here `xen_pv_domain()` is zero, so `target_diff` is big.
> Then, a few lines below:
>
> balloon_set_new_target(new_target - target_diff);
>
> `balloon_set_new_target()` receives a value that has wrapped around 64 bits,
> which kills the system.
>
> Now I'm looking for an appropriate kernel patch to fix that. Any
> suggestions?
>
You should use Linux kernel commit 3596924a233e45aa918.
Juergen
* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
2018-12-13 16:37 ` Juergen Gross
@ 2018-12-14 10:36 ` Andrii Anisov
0 siblings, 0 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-14 10:36 UTC (permalink / raw)
To: Juergen Gross, xen-devel; +Cc: Julien Grall, Stefano Stabellini
Hello Juergen,
On 13.12.18 18:37, Juergen Gross wrote:
> You should use linux kernel commit 3596924a233e45aa918.
That is exactly what is needed.
Thank you!
--
Sincerely,
Andrii Anisov.